MIG Support in Kubernetes
The Multi-Instance GPU (MIG) feature enables securely partitioning GPUs such as the NVIDIA A100 into several separate GPU instances for CUDA applications. For example, the NVIDIA A100 supports up to seven separate GPU instances.
MIG provides multiple users with separate GPU resources for optimal GPU utilization. This feature is particularly beneficial for workloads that do not fully saturate the GPU's compute capacity, where users may want to run different workloads in parallel to maximize utilization.
This document provides an overview of the necessary software to enable MIG support for Kubernetes. Refer to the MIG User Guide for more details on the technical concepts, setting up MIG and the NVIDIA Container Toolkit for running containers with MIG.
The deployment workflow requires these pre-requisites:
You have installed the NVIDIA R450+ datacenter (450.80.02+) drivers required for NVIDIA A100.
You have installed the NVIDIA Container Toolkit v2.5.0+
You already have a Kubernetes deployment up and running with access to at least one NVIDIA A100 GPU.
Once these prerequisites have been met, you can proceed to deploy a MIG-capable version of the NVIDIA k8s-device-plugin and (optionally) the gpu-feature-discovery component in your cluster, so that Kubernetes can schedule pods on the available MIG devices.
The minimum versions of the software components required are enumerated below:
NVIDIA R450+ datacenter driver: 450.80.02+
NVIDIA Container Toolkit (nvidia-docker2): v2.5.0+
NVIDIA k8s-device-plugin: v0.7.0+
NVIDIA gpu-feature-discovery: v0.2.0+
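Before deploying the plugins, it can be useful to spot-check these versions directly on the node. The commands below are a minimal sketch; the package query assumes a Debian-based distribution with the nvidia-docker2 package installed (use the equivalent rpm query on RHEL-based systems):
$ nvidia-smi --query-gpu=driver_version --format=csv,noheader
$ dpkg -l nvidia-docker2
$ kubectl version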
MIG Strategies
NVIDIA provides three strategies for exposing MIG devices on a Kubernetes node. For more details on the strategies, refer to the design document.
Using MIG Strategies in Kubernetes
This section walks through the steps necessary to deploy and run the k8s-device-plugin and gpu-feature-discovery components for the various MIG strategies. The preferred approach for deployment is through Helm.
For alternate deployment methods, refer to the installation instructions in the k8s-device-plugin and gpu-feature-discovery GitHub repositories.
First, add the nvidia-device-plugin and gpu-feature-discovery Helm repositories:
$ helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
$ helm repo add nvgfd https://nvidia.github.io/gpu-feature-discovery
$ helm repo update
Then, verify that the v0.7.0 version of the nvidia-device-plugin and the v0.2.0 version of gpu-feature-discovery are available:
$ helm search repo nvdp --devel
NAME                       CHART VERSION  APP VERSION  DESCRIPTION
nvdp/nvidia-device-plugin  0.7.0          0.7.0        A Helm chart for ...
$ helm search repo nvgfd --devel
NAME                         CHART VERSION  APP VERSION  DESCRIPTION
nvgfd/gpu-feature-discovery  0.2.0          0.2.0        A Helm chart for ...
Finally, select a MIG strategy and deploy the nvidia-device-plugin and gpu-feature-discovery components:
$ export MIG_STRATEGY=<none | single | mixed>
$ helm install \
--version=0.7.0 \
--generate-name \
--set migStrategy=${MIG_STRATEGY} \
nvdp/nvidia-device-plugin
$ helm install \
--version=0.2.0 \
--generate-name \
--set migStrategy=${MIG_STRATEGY} \
nvgfd/gpu-feature-discovery
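After the charts are installed, you can confirm that both components came up before moving on. This is a minimal check; the pod names will differ because --generate-name produces unique release names, and the namespace depends on how the charts were installed:
$ kubectl get pods --all-namespaces | grep -E 'nvidia-device-plugin|gpu-feature-discovery'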
Testing with Different Strategies
This section walks through the steps necessary to test each of the MIG strategies.
Note
With a default setup, only one device type can be requested by a container at a time for the mixed strategy. If more than one device type is requested by the container, then the device received is undefined. For example, a container cannot request both nvidia.com/gpu and nvidia.com/mig-3g.20gb at the same time. However, it can request multiple instances of the same resource type (e.g. nvidia.com/gpu: 2 or nvidia.com/mig-3g.20gb: 2) without restriction.
To mitigate this behavior, we recommend following the guidance outlined in the document.
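As an illustration of the restriction above, the following request is valid under the mixed strategy because the container asks for multiple instances of a single resource type only; adding an nvidia.com/gpu limit to the same container would not be. This is a sketch, with a placeholder pod name and image:
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: mig-mixed-request-example
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:11.0-base
    command: ["nvidia-smi", "-L"]
    resources:
      limits:
        nvidia.com/mig-3g.20gb: 2   # multiple instances of one resource type are allowed
EOF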
The none strategy
The none strategy is designed to keep the nvidia-device-plugin running the same as it always has. The plugin makes no distinction between GPUs with MIG enabled and those without, enumerating all GPUs on the node and making them available through the nvidia.com/gpu resource type.
Testing
To test this strategy, we check that a GPU is enumerated both with and without MIG enabled. The test assumes a single GPU on a single node in the cluster.
Verify that MIG is disabled on the GPU:
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  A100-SXM4-40GB      Off  | 00000000:36:00.0 Off |                    0 |
| N/A   29C    P0    62W / 400W |      0MiB / 40537MiB |      6%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
Start the nvidia-device-plugin with the none strategy as described in the previous section. Restart the plugin if it's already running.
Observe that 1 GPU is available on the node with the nvidia.com/gpu resource type:
$ kubectl describe node
...
Capacity:
  nvidia.com/gpu: 1
...
Allocatable:
  nvidia.com/gpu: 1
...
Start gpu-feature-discovery with the none strategy as described in the previous section. Restart it if it's already running.
Observe that the proper set of labels has been applied for this MIG strategy:
$ kubectl get node -o json | \
    jq '.items[0].metadata.labels | with_entries(select(.key | startswith("nvidia.com")))'
{
  "nvidia.com/cuda.driver.major": "450",
  "nvidia.com/cuda.driver.minor": "80",
  "nvidia.com/cuda.driver.rev": "02",
  "nvidia.com/cuda.runtime.major": "11",
  "nvidia.com/cuda.runtime.minor": "0",
  "nvidia.com/gfd.timestamp": "1605312111",
  "nvidia.com/gpu.compute.major": "8",
  "nvidia.com/gpu.compute.minor": "0",
  "nvidia.com/gpu.count": "1",
  "nvidia.com/gpu.family": "ampere",
  "nvidia.com/gpu.machine": "NVIDIA DGX",
  "nvidia.com/gpu.memory": "40537",
  "nvidia.com/gpu.product": "A100-SXM4-40GB"
}
Deploy a pod to consume the GPU and run nvidia-smi:
$ kubectl run -it --rm \
    --image=nvidia/cuda:11.0-base \
    --restart=Never \
    --limits=nvidia.com/gpu=1 \
    mig-none-example -- nvidia-smi -L
GPU 0: A100-SXM4-40GB (UUID: GPU-15f0798d-c807-231d-6525-a7827081f0f1)
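The --limits flag for kubectl run has been deprecated and removed in newer kubectl releases; if it is not available in your version, the same test can be expressed as a pod manifest instead. This is a sketch mirroring the command above:
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: mig-none-example
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:11.0-base
    command: ["nvidia-smi", "-L"]
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
$ kubectl logs mig-none-example      # once the pod has completed
$ kubectl delete pod mig-none-example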
Enable MIG on the GPU (requires stopping all GPU clients first)
$ sudo systemctl stop kubelet
$ sudo nvidia-smi -mig 1
Enabled MIG Mode for GPU 00000000:36:00.0
All done.
$ nvidia-smi --query-gpu=mig.mode.current --format=csv,noheader
Enabled
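If the node has more than one GPU, MIG mode can also be toggled per GPU by index rather than for all GPUs at once; on some platforms a GPU reset or node reboot is required before the mode change takes effect:
$ sudo nvidia-smi -i 0 -mig 1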
Restart the kubelet and the plugins:
$ sudo systemctl start kubelet
Observe that 1 GPU is available on the node with the nvidia.com/gpu resource type:
$ kubectl describe node
...
Capacity:
  nvidia.com/gpu: 1
...
Allocatable:
  nvidia.com/gpu: 1
...
Observe that the labels haven't changed:
$ kubectl get node -o json | \
    jq '.items[0].metadata.labels | with_entries(select(.key | startswith("nvidia.com")))'
{
  "nvidia.com/cuda.driver.major": "450",
  "nvidia.com/cuda.driver.minor": "80",
  "nvidia.com/cuda.driver.rev": "02",
  "nvidia.com/cuda.runtime.major": "11",
  "nvidia.com/cuda.runtime.minor": "0",
  "nvidia.com/gfd.timestamp": "1605312111",
  "nvidia.com/gpu.compute.major": "8",
  "nvidia.com/gpu.compute.minor": "0",
  "nvidia.com/gpu.count": "1",
  "nvidia.com/gpu.family": "ampere",
  "nvidia.com/gpu.machine": "NVIDIA DGX",
  "nvidia.com/gpu.memory": "40537",
  "nvidia.com/gpu.product": "A100-SXM4-40GB"
}
Deploy a pod to consume the GPU and run nvidia-smi:
$ kubectl run -it --rm \
    --image=nvidia/cuda:11.0-base \
    --restart=Never \
    --limits=nvidia.com/gpu=1 \
    mig-none-example -- nvidia-smi -L
GPU 0: A100-SXM4-40GB (UUID: GPU-15f0798d-c807-231d-6525-a7827081f0f1)
The single strategy
The single strategy is designed to keep the user experience of working with GPUs in Kubernetes the same as it has always been. MIG devices are enumerated with the nvidia.com/gpu resource type just as before. However, the properties associated with that resource type now map to the MIG devices available on that node, instead of the full GPUs.
Testing
To test this strategy, we check that MIG devices of a single type are enumerated using the traditional nvidia.com/gpu resource type. The test assumes a single GPU on a single node in the cluster with MIG enabled on it already.
Verify that MIG is enabled on the GPU and that no MIG devices are present:
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  A100-SXM4-40GB      On   | 00000000:00:04.0 Off |                   On |
| N/A   32C    P0    43W / 400W |      0MiB / 40537MiB |     N/A      Default |
|                               |                      |              Enabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| MIG devices:                                                                |
+------------------+----------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |         Memory-Usage |        Vol|      Shared           |
|      ID  ID  Dev |           BAR1-Usage | SM     Unc| CE  ENC  DEC  OFA  JPG|
|                  |                      |        ECC|                       |
|==================+======================+===========+=======================|
|  No MIG devices found                                                       |
+-----------------------------------------------------------------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Create 7 single-slice MIG devices on the GPU:
$ sudo nvidia-smi mig -cgi 19,19,19,19,19,19,19 -C
$ nvidia-smi -L
GPU 0: A100-SXM4-40GB (UUID: GPU-4200ccc0-2667-d4cb-9137-f932c716232a)
  MIG 1g.5gb Device 0: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/7/0)
  MIG 1g.5gb Device 1: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/8/0)
  MIG 1g.5gb Device 2: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/9/0)
  MIG 1g.5gb Device 3: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/10/0)
  MIG 1g.5gb Device 4: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/11/0)
  MIG 1g.5gb Device 5: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/12/0)
  MIG 1g.5gb Device 6: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/13/0)
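The profile IDs passed to the -cgi option (19 in this example, corresponding to the 1g.5gb profile on an A100-40GB) can be listed directly on the node; the exact table printed varies by driver and GPU:
$ nvidia-smi mig -lgip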
Start the nvidia-device-plugin with the single strategy as described in the previous section. If it's already running, restart the plugin.
Observe that 7 MIG devices are available on the node with the nvidia.com/gpu resource type:
$ kubectl describe node
...
Capacity:
  nvidia.com/gpu: 7
...
Allocatable:
  nvidia.com/gpu: 7
...
Start gpu-feature-discovery with the single strategy as described in the previous section. If it's already running, restart the plugin.
Observe that the proper set of labels has been applied for this MIG strategy:
$ kubectl get node -o json | \
    jq '.items[0].metadata.labels | with_entries(select(.key | startswith("nvidia.com")))'
{
  "nvidia.com/cuda.driver.major": "450",
  "nvidia.com/cuda.driver.minor": "80",
  "nvidia.com/cuda.driver.rev": "02",
  "nvidia.com/cuda.runtime.major": "11",
  "nvidia.com/cuda.runtime.minor": "0",
  "nvidia.com/gfd.timestamp": "1605657366",
  "nvidia.com/gpu.compute.major": "8",
  "nvidia.com/gpu.compute.minor": "0",
  "nvidia.com/gpu.count": "7",
  "nvidia.com/gpu.engines.copy": "1",
  "nvidia.com/gpu.engines.decoder": "0",
  "nvidia.com/gpu.engines.encoder": "0",
  "nvidia.com/gpu.engines.jpeg": "0",
  "nvidia.com/gpu.engines.ofa": "0",
  "nvidia.com/gpu.family": "ampere",
  "nvidia.com/gpu.machine": "NVIDIA DGX",
  "nvidia.com/gpu.memory": "4864",
  "nvidia.com/gpu.multiprocessors": "14",
  "nvidia.com/gpu.product": "A100-SXM4-40GB-MIG-1g.5gb",
  "nvidia.com/gpu.slices.ci": "1",
  "nvidia.com/gpu.slices.gi": "1",
  "nvidia.com/mig.strategy": "single"
}
Deploy 7 pods, each consuming one MIG device (then read their logs and delete them):
$ for i in $(seq 7); do
    kubectl run \
      --image=nvidia/cuda:11.0-base \
      --restart=Never \
      --limits=nvidia.com/gpu=1 \
      mig-single-example-${i} -- bash -c "nvidia-smi -L; sleep infinity"
  done
pod/mig-single-example-1 created
pod/mig-single-example-2 created
pod/mig-single-example-3 created
pod/mig-single-example-4 created
pod/mig-single-example-5 created
pod/mig-single-example-6 created
pod/mig-single-example-7 created
$ for i in $(seq 7); do
    echo "mig-single-example-${i}"
    kubectl logs mig-single-example-${i}
    echo ""
  done
mig-single-example-1
GPU 0: A100-SXM4-40GB (UUID: GPU-4200ccc0-2667-d4cb-9137-f932c716232a)
  MIG 1g.5gb Device 0: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/7/0)

mig-single-example-2
GPU 0: A100-SXM4-40GB (UUID: GPU-4200ccc0-2667-d4cb-9137-f932c716232a)
  MIG 1g.5gb Device 0: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/9/0)
...
$ for i in $(seq 7); do kubectl delete pod mig-single-example-${i}; done
pod "mig-single-example-1" deleted pod "mig-single-example-2" deleted ...
The mixed strategy
The mixed strategy is designed to enumerate a different resource type for every MIG device configuration available in the cluster.
Testing
To test this strategy, we check that all MIG devices are enumerated using their fully qualified name of the form nvidia.com/mig-<slice_count>g.<memory_size>gb. The test assumes a single GPU on a single node in the cluster with MIG enabled on it already.
Verify that MIG is enabled on the GPU and that no MIG devices are present:
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  A100-SXM4-40GB      On   | 00000000:00:04.0 Off |                   On |
| N/A   32C    P0    43W / 400W |      0MiB / 40537MiB |     N/A      Default |
|                               |                      |              Enabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| MIG devices:                                                                |
+------------------+----------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |         Memory-Usage |        Vol|      Shared           |
|      ID  ID  Dev |           BAR1-Usage | SM     Unc| CE  ENC  DEC  OFA  JPG|
|                  |                      |        ECC|                       |
|==================+======================+===========+=======================|
|  No MIG devices found                                                       |
+-----------------------------------------------------------------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Create 3 MIG devices of different sizes on the GPU:
$ sudo nvidia-smi mig -cgi 9,14,19 -C
$ nvidia-smi -L
GPU 0: A100-SXM4-40GB (UUID: GPU-4200ccc0-2667-d4cb-9137-f932c716232a)
  MIG 3g.20gb Device 0: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/2/0)
  MIG 2g.10gb Device 1: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/3/0)
  MIG 1g.5gb Device 2: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/9/0)
Start the nvidia-device-plugin with the mixed strategy as described in the previous section. If it's already running, restart the plugin.
Observe that 3 MIG devices are available on the node, each exposed under its own resource type:
$ kubectl describe node
...
Capacity:
  nvidia.com/mig-1g.5gb:  1
  nvidia.com/mig-2g.10gb: 1
  nvidia.com/mig-3g.20gb: 1
...
Allocatable:
  nvidia.com/mig-1g.5gb:  1
  nvidia.com/mig-2g.10gb: 1
  nvidia.com/mig-3g.20gb: 1
...
Start gpu-feature-discovery with the mixed strategy as described in the previous section. If it's already running, restart the plugin.
Observe that the proper set of labels has been applied for this MIG strategy:
$ kubectl get node -o json | \
    jq '.items[0].metadata.labels | with_entries(select(.key | startswith("nvidia.com")))'
{
  "nvidia.com/cuda.driver.major": "450",
  "nvidia.com/cuda.driver.minor": "80",
  "nvidia.com/cuda.driver.rev": "02",
  "nvidia.com/cuda.runtime.major": "11",
  "nvidia.com/cuda.runtime.minor": "0",
  "nvidia.com/gfd.timestamp": "1605658841",
  "nvidia.com/gpu.compute.major": "8",
  "nvidia.com/gpu.compute.minor": "0",
  "nvidia.com/gpu.count": "1",
  "nvidia.com/gpu.family": "ampere",
  "nvidia.com/gpu.machine": "NVIDIA DGX",
  "nvidia.com/gpu.memory": "40537",
  "nvidia.com/gpu.product": "A100-SXM4-40GB",
  "nvidia.com/mig-1g.5gb.count": "1",
  "nvidia.com/mig-1g.5gb.engines.copy": "1",
  "nvidia.com/mig-1g.5gb.engines.decoder": "0",
  "nvidia.com/mig-1g.5gb.engines.encoder": "0",
  "nvidia.com/mig-1g.5gb.engines.jpeg": "0",
  "nvidia.com/mig-1g.5gb.engines.ofa": "0",
  "nvidia.com/mig-1g.5gb.memory": "4864",
  "nvidia.com/mig-1g.5gb.multiprocessors": "14",
  "nvidia.com/mig-1g.5gb.slices.ci": "1",
  "nvidia.com/mig-1g.5gb.slices.gi": "1",
  "nvidia.com/mig-2g.10gb.count": "1",
  "nvidia.com/mig-2g.10gb.engines.copy": "2",
  "nvidia.com/mig-2g.10gb.engines.decoder": "1",
  "nvidia.com/mig-2g.10gb.engines.encoder": "0",
  "nvidia.com/mig-2g.10gb.engines.jpeg": "0",
  "nvidia.com/mig-2g.10gb.engines.ofa": "0",
  "nvidia.com/mig-2g.10gb.memory": "9984",
  "nvidia.com/mig-2g.10gb.multiprocessors": "28",
  "nvidia.com/mig-2g.10gb.slices.ci": "2",
  "nvidia.com/mig-2g.10gb.slices.gi": "2",
  "nvidia.com/mig-3g.20gb.count": "1",
  "nvidia.com/mig-3g.20gb.engines.copy": "3",
  "nvidia.com/mig-3g.20gb.engines.decoder": "2",
  "nvidia.com/mig-3g.20gb.engines.encoder": "0",
  "nvidia.com/mig-3g.20gb.engines.jpeg": "0",
  "nvidia.com/mig-3g.20gb.engines.ofa": "0",
  "nvidia.com/mig-3g.20gb.memory": "20096",
  "nvidia.com/mig-3g.20gb.multiprocessors": "42",
  "nvidia.com/mig-3g.20gb.slices.ci": "3",
  "nvidia.com/mig-3g.20gb.slices.gi": "3",
  "nvidia.com/mig.strategy": "mixed"
}
Deploy 3 pods, each consuming one of the available MIG devices. The example below requests the 1g.5gb device:
$ kubectl run -it --rm \
    --image=nvidia/cuda:11.0-base \
    --restart=Never \
    --limits=nvidia.com/mig-1g.5gb=1 \
    mig-mixed-example -- nvidia-smi -L
GPU 0: A100-SXM4-40GB (UUID: GPU-4200ccc0-2667-d4cb-9137-f932c716232a)
  MIG 1g.5gb Device 0: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/9/0)
pod "mig-mixed-example" deleted
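The remaining device types created earlier can be consumed in the same way by changing the resource name in the limit, for example (pod names are illustrative):
$ kubectl run -it --rm \
    --image=nvidia/cuda:11.0-base \
    --restart=Never \
    --limits=nvidia.com/mig-2g.10gb=1 \
    mig-mixed-example-2g -- nvidia-smi -L
$ kubectl run -it --rm \
    --image=nvidia/cuda:11.0-base \
    --restart=Never \
    --limits=nvidia.com/mig-3g.20gb=1 \
    mig-mixed-example-3g -- nvidia-smi -L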