Kubernetes GPU

Scheduling GPUs On-Premises or on EKS, GKE, and AKS

How are companies using Kubernetes on GPUs?

Kubernetes is a highly popular container orchestrator, which can be deployed on-premises, in the cloud, and in hybrid environments.

To support compute-intensive workloads like machine learning (ML), Kubernetes can be used with graphics processing units (GPUs). GPUs provide hardware acceleration that is especially beneficial for deep learning and other machine learning algorithms. Kubernetes can also scale up multi-GPU setups for large-scale ML projects.

GPU scheduling on Kubernetes is currently supported for NVIDIA and AMD GPUs, and requires the use of vendor-provided drivers and device plugins.

You can run Kubernetes on GPU machines in your local data center, or leverage GPU-powered compute instances on managed Kubernetes services, including Google Kubernetes Engine (GKE), Amazon Elastic Kubernetes Service (EKS), and Azure Kubernetes Service (AKS).

In this article, you will learn:

  • GPU Scheduling with Kubernetes
  • Using GPUs on Google Kubernetes Engine (GKE)
  • Using GPUs on Azure Kubernetes Service (AKS)
  • Running GPU Accelerated Linux AMIs on Amazon EKS

GPU Scheduling with Kubernetes

Kubernetes lets you manage GPUs across multiple nodes. GPU scheduling on Kubernetes is available primarily for AMD and NVIDIA accelerators.

To enable GPU scheduling, Kubernetes uses device plugins, which let pods access specialized hardware such as GPUs. This is not set up by default; you need to configure GPU scheduling before you can use it.

First, you need to choose a GPU vendor—AMD or NVIDIA—and install your chosen GPU drivers on the nodes. You can then run the device plugin provided by the GPU vendor.

After the driver is installed and the device plugin is running, Kubernetes exposes either nvidia.com/gpu or amd.com/gpu as a schedulable resource.

To consume GPUs from containers, you request nvidia.com/gpu or amd.com/gpu in the same way you request memory or cpu resources (see the example after the following list). Note that there are certain limitations in how you can specify resource requirements for GPUs:

  • You cannot overcommit GPUs—containers and pods do not share GPUs.
  • A container cannot request part of a GPU—each container can receive access to a full GPU or multiple GPUs.
  • Limits must be equal to requests—requests are what a container is guaranteed to get, while limits ensure the resources it receives do not exceed a certain value. GPUs are less flexible than other resources; when it comes to GPU resources, you may only specify limits. If you specify a request, it must be equal to the limit.
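For example, here is a minimal pod manifest that requests a single NVIDIA GPU (the pod name and container image are illustrative; substitute amd.com/gpu for AMD accelerators):

apiVersion: v1
kind: Pod
metadata:
  name: gpu-example
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda-container
    image: nvidia/cuda:11.0-base    # illustrative CUDA base image
    command: ["nvidia-smi"]         # prints GPU information and exits
    resources:
      limits:
        nvidia.com/gpu: 1           # one full GPU; a request, if specified, must equal this limit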

Deploying AMD Device Plugin on Kubernetes Nodes

To run AMD GPUs on a node, you first need to install an AMD GPU Linux driver. Once your nodes have the driver, you can deploy the AMD device plugin using the command below:

kubectl create -f https://raw.githubusercontent.com/RadeonOpenCompute/k8s-device-plugin/v1.10/k8s-ds-amdgpu-dp.yaml
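If the plugin deploys successfully, each GPU node advertises amd.com/gpu as an allocatable resource. A quick way to verify this (a hedged example, analogous to the NVIDIA check shown later in this article):

kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.amd\.com/gpu"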

Deploying NVIDIA Device Plugin on Kubernetes Nodes

You can configure NVIDIA GPUs on Kubernetes using the official NVIDIA GPU device plugin. The plugin has several prerequisites:

  • The relevant NVIDIA drivers are installed on the nodes.
  • nvidia-docker 2.0 is installed on the nodes.
  • Docker is used as the container runtime by kubelet.
  • nvidia-container-runtime is configured as the default runtime for Docker instead of runc (see the daemon.json sketch after this list).
  • Your NVIDIA driver version must match the constraint ~= 384.81.
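The default-runtime requirement is usually met by editing /etc/docker/daemon.json on each GPU node. A minimal sketch, assuming the standard nvidia-docker 2.0 installation paths:

{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}

Restart Docker (for example, sudo systemctl restart docker) after changing this file so kubelet picks up the new runtime.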

Once all the prerequisites are met, you can deploy the NVIDIA device plugin using this command:

kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta4/nvidia-device-plugin.yml
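To confirm the plugin is running and that nodes advertise GPUs, you can check the device plugin pods and node resources (the pod label below matches the upstream manifest and is an assumption):

kubectl get pods -n kube-system -l name=nvidia-device-plugin-ds
kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"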

Using GPUs on Google Kubernetes Engine (GKE)

Google Kubernetes Engine lets you run Kubernetes nodes with several types of GPUs, including NVIDIA Tesla K80, P4, V100, P100, A100, and T4.

To reduce costs, you can use preemptible virtual machines (also known as spot instances)—as long as your workloads can tolerate frequent node disruptions.

There are several prerequisites to using GPUs on GKE:

  • Kubernetes version—GKE version 1.9 or higher supports GPUs for node pools with the Container-Optimized OS node images. GKE version 1.11.3 or higher supports GPUs using the standard Ubuntu node image.
  • GPU quota—to create nodes running GPUs, you must set up a GPU quota in the relevant Compute Engine zone.
  • NVIDIA GPU drivers—install the relevant NVIDIA drivers on all nodes that need to run GPUs.
  • A100 GPUs—only a2 machine types with GKE version 1.18.6-gke.3504 or higher can support A100 GPUs.

Here are several limitations of GPUs on GKE:

  • It is not possible to add GPUs to an existing node pool (see the node pool sketch after this list).
  • You cannot live-migrate GPU nodes during maintenance events.
  • GPUs are only supported on the general-purpose N1 machine type or the accelerator-optimized A2 machine type.
  • Windows Server node pools do not support GPUs.
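Because GPUs cannot be added to an existing node pool, you typically create a dedicated GPU node pool instead. A rough sketch using gcloud (the cluster name, zone, machine type, and accelerator type are illustrative assumptions):

gcloud container node-pools create gpu-pool \
  --cluster my-cluster \
  --zone us-central1-a \
  --machine-type n1-standard-4 \
  --accelerator type=nvidia-tesla-t4,count=1 \
  --num-nodes 1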

How to install NVIDIA drivers

After you add GPU nodes to a cluster, install the relevant NVIDIA drivers. You can do this using the DaemonSet provided by Google.

Here is a command you can run to deploy the installation DaemonSet:

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml

You can now run GPU workloads on your GKE cluster.
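For example, a pod can target a specific accelerator type through the cloud.google.com/gke-accelerator node label while requesting the GPU resource (the pod name, image, and accelerator type below are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: gke-gpu-example
spec:
  restartPolicy: OnFailure
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-tesla-t4   # assumed accelerator type
  containers:
  - name: cuda-container
    image: nvidia/cuda:11.0-base    # illustrative CUDA base image
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1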

Using GPUs on Azure Kubernetes Service (AKS)

AKS also supports creating Kubernetes nodes that are GPU-enabled. Currently you can only use GPUs for Linux node pools.

Prerequisites:

  • Kubernetes 1.10 or higher running on the cluster
  • You must install and configure Azure CLI 2.0.64 or later

How to install NVIDIA device plugin

Before using GPUs on nodes, you need to deploy the DaemonSet for the NVIDIA device plugin. This DaemonSet runs a pod on each node and makes its GPUs available to the Kubernetes scheduler.

To install the device plugin on Azure nodes:

1. Create a namespace using this command:

kubectl create namespace gpu-resources

2. Create a file named nvidia-device-plugin-ds.yaml and paste into it the YAML manifest provided in the Azure documentation (a rough sketch of such a manifest appears after these steps).

3. Run kubectl apply -f nvidia-device-plugin-ds.yaml to create the DaemonSet.
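For reference, a device plugin DaemonSet manifest typically looks roughly like the sketch below, modeled on the upstream NVIDIA k8s-device-plugin project; the image tag and labels are assumptions, and the manifest in the Azure documentation may differ:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: gpu-resources
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin-ds
  template:
    metadata:
      labels:
        name: nvidia-device-plugin-ds
    spec:
      tolerations:
      - key: nvidia.com/gpu            # allow scheduling onto tainted GPU nodes
        operator: Exists
        effect: NoSchedule
      containers:
      - name: nvidia-device-plugin-ctr
        image: nvcr.io/nvidia/k8s-device-plugin:v0.14.1   # assumed image and tag
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins      # kubelet device plugin socket directory
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins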

You can now run GPU-enabled workloads on your AKS cluster. The Azure documentation includes an example showing how to run TensorFlow training on AKS GPU nodes.

Running GPU Accelerated Linux AMIs on Amazon EKS

AWS offers an EKS-optimized AMI that comes with built-in GPU support. This accelerated AMI is configured to serve as the base image for Amazon EC2 P2 and P3 instances.

The GPU-accelerated AMI is an optional image you can use to run GPU workloads on EKS nodes. In addition to the standard EKS-optimized AMI configuration, the GPU AMI includes the NVIDIA drivers, the nvidia-container-runtime set as the default runtime, and the nvidia-docker2 package.

To enable GPU workloads with the EKS-optimized AMI and test that GPU nodes are configured correctly:

1. After the GPU node has joined the cluster, apply the NVIDIA device plugin for Kubernetes as a DaemonSet in the cluster using the following command:

kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.8.0/nvidia-device-plugin.yml

2. Verify that a node has an allocation of GPUs by using the following command:

kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"

3. Create a file called nvidia-smi.yaml and use the YAML configuration provided in the Amazon EKS documentation. The manifest starts a CUDA container that runs nvidia-smi on the node.
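As a rough illustration (the exact manifest in the Amazon documentation may differ, and the image tag here is an assumption), such a manifest looks like this:

apiVersion: v1
kind: Pod
metadata:
  name: nvidia-smi
spec:
  restartPolicy: OnFailure
  containers:
  - name: nvidia-smi
    image: nvidia/cuda:11.0-base    # assumed CUDA base image
    args: ["nvidia-smi"]            # print GPU status, then exit
    resources:
      limits:
        nvidia.com/gpu: 1           # request one GPU from the node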

4. Apply the manifest as follows:

kubectl apply -f nvidia-smi.yaml

5. Once the pod is running, check the logs using the following command:

kubectl logs nvidia-smi

Kubernetes GPU Scheduling with Run:AI

Run:AI’s Scheduler is a simple plug-in for Kubernetes clusters that adds optimized, high-performance orchestration to your containerized AI workloads. The Run:AI platform includes:

  • High-performance for scale-up infrastructures—pool resources and enable large workloads that require considerable resources to coexist efficiently with small workloads requiring fewer resources.
  • Batch scheduling—workloads can start, pause, restart, end, and then shut down, all without any manual intervention. Plus, when the container terminates, the resources are released and can be allocated to other workloads for greater system efficiency.
  • Topology awareness—inter-resource and inter-node communication enable consistent high performance of containerized workloads.
  • Gang scheduling—containers can be launched together, start together, and end together for distributed workloads that need considerable resources.

Run:AI simplifies Kubernetes scheduling for AI and HPC workloads, helping researchers accelerate their productivity and the quality of their work.

Learn more about the Run:AI Kubernetes Scheduler