MIG Support in OpenShift Container Platform
Introduction
NVIDIA Multi-Instance GPU (MIG) is useful whenever you have an application that does not require the full power of an entire GPU. The MIG feature of the NVIDIA Ampere architecture allows you to partition your hardware resources into multiple GPU instances, each exposed to the operating system as an independent CUDA-enabled GPU. The NVIDIA GPU Operator version 1.7.0 and above provides MIG support for the A100 and A30 Ampere cards. These GPU instances are designed to support up to seven independent CUDA applications, operating in complete isolation from each other with dedicated hardware resources.
The compute units of the GPU, in addition to its memory, can be partitioned into multiple MIG instances. Each of these instances presents as a stand-alone GPU device from the system perspective and can be bound to any application, container, or virtual machine running on the node.
From the perspective of the software consuming the GPU, each of these MIG instances looks like its own individual GPU.
MIG geometry
The NVIDIA GPU Operator version 1.7.0 and above enables OpenShift Container Platform administrators to dynamically reconfigure the geometry of the MIG partitioning. The geometry of the MIG partitioning is the way hardware resources are bound to MIG instances, so it directly influences their performance and the number of instances that can be allocated. The A100-40GB, for example, has eight compute units and 40 GB of RAM. When MIG mode is enabled, the eighth compute unit is reserved for resource management.
The table below provides a summary of the MIG instance properties of the NVIDIA A100-40GB product:
Profile | Memory | Compute Units | Maximum number of homogeneous instances
---|---|---|---
1g.5gb | 5 GB | 1 | 7
2g.10gb | 10 GB | 2 | 3
3g.20gb | 20 GB | 3 | 2
4g.20gb | 20 GB | 4 | 1
7g.40gb | 40 GB | 7 | 1
In addition to homogeneous instances, some heterogeneous combinations can be chosen. See the Multi-Instance GPU User Guide documentation for an exhaustive listing.
Here is an example, again for the A100-40GB, with heterogeneous (or “mixed”) geometries:
2x 1g.5gb
1x 2g.10gb
1x 3g.20gb
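If you have shell access to the GPU through the NVIDIA driver daemonset pod (as shown in the verification steps later in this document), you can also list the instance profiles a device supports directly with nvidia-smi. A minimal sketch, assuming $POD_NAME holds the name of a driver pod on the node:

$ oc rsh -n nvidia-gpu-operator $POD_NAME nvidia-smi mig -lgip

The output lists every GPU instance profile supported by the card, together with the number of instances of each profile that can still be created.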
Prerequisites
The deployment workflow requires these prerequisites.
You already have an OpenShift Container Platform cluster up and running with access to at least one MIG-capable GPU.
You have followed the guidance in the Installation and Upgrade Overview, proceeding as far as creating the cluster policy.
Note
The node must be free (drained) of GPU workloads before any reconfiguration is triggered. For guidance on draining a node, see the OpenShift Container Platform documentation Understanding how to evacuate pods on nodes.
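For reference, a minimal drain sketch, assuming $NODE_NAME holds the name of the GPU node (adjust the flags to your workload requirements):

$ oc adm cordon $NODE_NAME
$ oc adm drain $NODE_NAME --ignore-daemonsets --delete-emptydir-data

Uncordon the node with oc adm uncordon $NODE_NAME once the reconfiguration has completed.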
Configuring MIG Devices in OpenShift
MIG advertisement strategies
The NVIDIA GPU Operator exposes GPUs to Kubernetes as extended resources that can be requested by Pods and containers. The first step of the MIG configuration is to decide which advertisement strategy you want. The advertisement strategies are described here:
Single defines a homogeneous advertisement strategy, with MIG instances exposed as regular GPUs. This strategy exposes the MIG instances as nvidia.com/gpu resources, identically to non-MIG-capable devices (or devices with MIG disabled). In this strategy, all the GPUs in a single node must be configured in a homogeneous manner (same number of compute units, same memory size). This strategy is best for a large cluster where the infrastructure team can configure “node pools” of different MIG geometries and make them available to users. Another advantage of this strategy is backward compatibility: existing applications do not have to be modified to be scheduled this way.
Examples for the A100-40GB:
1g.5gb: 7 nvidia.com/gpu instances, or
2g.10gb: 3 nvidia.com/gpu instances, or
3g.20gb: 2 nvidia.com/gpu instances, or
7g.40gb: 1 nvidia.com/gpu instance
Mixed defines a heterogeneous advertisement strategy. There is no constraint on the geometry; all the combinations allowed by the GPU are permitted. This strategy is appropriate for a smaller cluster, where each GPU on a multi-GPU node can be configured with a different MIG geometry.
Examples for the A100-40GB:
All the single configurations are possible
A “balanced” configuration:
1g.5gb: 2 nvidia.com/mig-1g.5gb instances, and
2g.10gb: 1 nvidia.com/mig-2g.10gb instance, and
3g.20gb: 1 nvidia.com/mig-3g.20gb instance
Version 1.8 and greater of the NVIDIA GPU Operator supports updating the Strategy in the ClusterPolicy after deployment.
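For reference, the strategy lives under the mig section of the ClusterPolicy spec, which is the same path the patch commands later in this document target. A minimal excerpt, assuming the default ClusterPolicy name gpu-cluster-policy used throughout this guide:

$ oc edit clusterpolicy gpu-cluster-policy

spec:
  mig:
    strategy: mixed   # or "single"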
The default configmap defines the combination of single (homogeneous) and mixed (heterogeneous) profiles that are supported for A100-40GB, A100-80GB and A30-24GB. The configmap allows administrators to declaratively define a set of possible MIG configurations they would like applied to all GPUs on a node. The tables below describe these configurations:
GPU Type | Custom label | Profile | MIG instances
---|---|---|---
A100-40GB | all-1g.5gb | 1g.5gb | 7
 | all-2g.10gb | 2g.10gb | 3
 | all-3g.20gb | 3g.20gb | 2
 | all-7g.40gb | 7g.40gb | 1
A100-80GB | all-1g.10gb | 1g.10gb | 7
 | all-2g.20gb | 2g.20gb | 3
 | all-3g.40gb | 3g.40gb | 2
 | all-7g.80gb | 7g.80gb | 1
A30-24GB | all-1g.6gb | 1g.6gb | 4
 | all-2g.12gb | 2g.12gb | 2
 | all-4g.24gb | 4g.24gb | 1
The all-balanced label is composed of three distinct configurations, with a device filter that selects GPUs based on their device ID. The supported combinations are described below:
GPU Type | Custom label | Profile and MIG instances
---|---|---
A100-40GB | all-balanced | 1g.5gb: 2, 2g.10gb: 1, 3g.20gb: 1
A100-80GB | all-balanced | 1g.10gb: 2, 2g.20gb: 1, 3g.40gb: 1
A30-24GB | all-balanced | 1g.6gb: 2, 2g.12gb: 1
Set the MIG advertisement strategy and apply the MIG partitioning
Having decided on your advertisement strategy, set it by editing the default cluster policy and then apply the MIG partitioning profile.
For example, to set the advertisement strategy to mixed and the MIG partitioning profile to 3x 2g.10gb MIG devices, follow the steps below:
In the OpenShift Container Platform CLI, run the following:
$ STRATEGY=mixed && \
  oc patch clusterpolicy/gpu-cluster-policy --type='json' \
  -p='[{"op": "replace", "path": "/spec/mig/strategy", "value": "'$STRATEGY'"}]'
Note
This may take a while, so be patient and wait at least 10-20 minutes before digging deeper into any form of troubleshooting.
In the OpenShift Container Platform web console, from the side menu, select Operators > Installed Operators, then click the NVIDIA GPU Operator.
Select the ClusterPolicy tab. The status of the newly deployed ClusterPolicy gpu-cluster-policy for the NVIDIA GPU Operator displays State: ready once the installation has succeeded.
Apply the desired MIG partitioning profile. To configure 3x 2g.10gb MIG devices, run the following:
$ MIG_CONFIGURATION=all-2g.10gb && \
  oc label node/$NODE_NAME nvidia.com/mig.config=$MIG_CONFIGURATION --overwrite
Wait for the mig-manager to perform the reconfiguration:

$ oc -n nvidia-gpu-operator logs ds/nvidia-mig-manager --all-containers -f --prefix
The status of the reconfiguration should change from success → pending → success.
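As an alternative to following the logs, you can poll the state label directly, reusing the label-inspection command shown later in this document. A minimal sketch, assuming $NODE_NAME is set to the node being reconfigured:

$ oc get node/$NODE_NAME --show-labels | tr ',' '\n' | grep nvidia.com/mig.config.state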
Verify the new configuration is applied:
$ oc get pods -n nvidia-gpu-operator -lapp=nvidia-driver-daemonset -owide
Select the name of the Pod running on the MIG-enabled GPU node and run the following:
$ oc rsh -n nvidia-gpu-operator $POD_NAME nvidia-smi mig -lgi
+----------------------------------------------------+
| GPU instances:                                      |
| GPU   Name          Profile  Instance   Placement  |
|                     ID       ID         Start:Size |
|====================================================|
|   0  MIG 2g.10gb      19        3          4:2     |
+----------------------------------------------------+
|   0  MIG 2g.10gb      19        5          0:2     |
+----------------------------------------------------+
|   0  MIG 2g.10gb      19        6          2:2     |
+----------------------------------------------------+
With the profile in step 4 applied, the A100 is configured into 3 MIG devices.
Check the node has been labeled:
$ oc get nodes/$NODE_NAME --show-labels | tr ',' '\n' | grep nvidia.com
with labels:
nvidia.com/gpu.present=true
nvidia.com/cuda.driver.major=470
nvidia.com/cuda.driver.minor=57
nvidia.com/cuda.driver.rev=02
nvidia.com/cuda.runtime.major=11
nvidia.com/cuda.runtime.minor=4
nvidia.com/gpu.compute.major=8
nvidia.com/gpu.compute.minor=0
nvidia.com/gpu.count=1
nvidia.com/gpu.family=ampere
nvidia.com/gpu.machine=...
nvidia.com/gpu.memory=40536
nvidia.com/gpu.product=NVIDIA-A100-SXM4-40GB
nvidia.com/mig-2g.10gb.count=3
nvidia.com/mig-2g.10gb.engines.copy=2
nvidia.com/mig-2g.10gb.engines.decoder=1
nvidia.com/mig-2g.10gb.engines.encoder=0
nvidia.com/mig-2g.10gb.engines.jpeg=0
nvidia.com/mig-2g.10gb.engines.ofa=0
nvidia.com/mig-2g.10gb.memory=9984
nvidia.com/mig-2g.10gb.multiprocessors=28
nvidia.com/mig-2g.10gb.slices.ci=2
nvidia.com/mig-2g.10gb.slices.gi=2
nvidia.com/mig.config.state=success
nvidia.com/mig.config=all-2g.10gb
nvidia.com/mig.strategy=mixed
[...]
Note
The extract above shows that the strategy is set to mixed and the MIG configuration is set to all-2g.10gb.
Verify that the MIG instances are exposed:
$ oc get node/$NODE_NAME -ojsonpath={.status.allocatable} | jq . | grep nvidia
"nvidia.com/mig-2g.10gb": "3",
Note
You can ignore values set to 0.
Creating and applying a custom MIG configuration
Follow the guidance below to create a new slicing profile.
Prepare a custom configmap resource file, for example custom_configmap.yaml. Use the default configmap as guidance to help you build that custom configuration; a minimal sketch of the file format is shown after the note below. For more documentation about the file format, see mig-parted.
Note
For a list of all supported combinations and placements of profiles on A100 and A30, refer to the section on supported profiles.
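A minimal sketch of such a file, assuming the mig-parted configuration format referenced above and passed to oc create configmap with --from-file in the next step; the custom-mig name and the chosen profiles are illustrative only:

version: v1
mig-configs:
  all-disabled:
    - devices: all
      mig-enabled: false
  custom-mig:
    - devices: all
      mig-enabled: true
      mig-devices:
        "1g.5gb": 2
        "2g.10gb": 1
        "3g.20gb": 1

With a file like this, labeling a node with nvidia.com/mig.config=custom-mig would request two 1g.5gb, one 2g.10gb, and one 3g.20gb instance on every GPU of that node.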
Create the custom configuration within the nvidia-gpu-operator namespace:

$ CONFIG_FILE=/path/to/custom_configmap.yaml && \
  oc create configmap custom-mig-parted-config \
  --from-file=config.yaml=$CONFIG_FILE \
  -n nvidia-gpu-operator
Edit the cluster policy and enter the name of the config map in the field spec.migManager.config.name:

$ oc edit clusterpolicy

spec:
  migManager:
    config:
      name: custom-mig-parted-config
Label the node with this newly created profile following the guidance in Set the MIG advertisement strategy and apply the MIG partitioning.
Example: Mixed MIG strategy
Introduction and default MIG configuration
For each MIG configuration, you specify a strategy and a MIG configuration label.
This example shows how to configure a mixed strategy with the all-balanced configuration on one NVIDIA DGX H100 host with 8 x H100 80GB GPUs.
The DGX H100 host runs a single node installation of OpenShift.
By default, MIG is disabled and the node is configured with the single strategy:
$ oc describe node | grep nvidia.com/mig
Example Output
nvidia.com/mig.capable=true
nvidia.com/mig.config=all-disabled
nvidia.com/mig.config.state=success
nvidia.com/mig.strategy=single
With the default configuration, the host supports up to 8 pods with GPUs:
$ oc describe node | egrep "Name:|Roles:|Capacity|nvidia.com/gpu|Allocatable:|Requests +Limits"
Example Output
Name: myworker.redhat.com
Roles: control-plane,master,worker
Capacity:
nvidia.com/gpu: 8
Allocatable:
nvidia.com/gpu: 8
Resource Requests Limits
nvidia.com/gpu 0 0
Procedure
The following steps show how to apply the mixed strategy with the MIG configuration label all-balanced.
With this strategy and label, each H100 GPU enables these MIG profiles:
2 x 1g.10gb
1 x 2g.20gb
1 x 3g.40gb
For the NVIDIA DGX H100 that has 8 H100 GPUs, performing the steps results in the following GPU capacity on the cluster:
16 x 1g.10gb (8 x 2)
8 x 2g.20gb (8 x 1)
8 x 3g.40gb (8 x 1)
Specify the host name, strategy, and configuration label in environment variables:
$ NODE_NAME=myworker.redhat.com
$ STRATEGY=mixed
$ MIG_CONFIGURATION=all-balanced
Apply the strategy:
$ oc patch clusterpolicy/gpu-cluster-policy --type='json' \
  -p='[{"op": "replace", "path": "/spec/mig/strategy", "value": "'$STRATEGY'"}]'
Label the node with the configuration label:
$ oc label node $NODE_NAME nvidia.com/mig.config=$MIG_CONFIGURATION --overwrite
MIG manager applies a mig.config.state label to the node and then terminates all the GPU pods in preparation for enabling MIG mode and configuring the GPU into the specified configuration.
Optional: Verify that MIG manager configured the GPUs:
$ oc describe node | grep nvidia.com/mig.config
Example Output
nvidia.com/mig.config=all-balanced
nvidia.com/mig.config.state=success
Confirm that the GPU resources are available:
$ oc describe node | egrep "Name:|Roles:|Capacity|nvidia.com/gpu:|nvidia.com/mig-.* |Allocatable:|Requests +Limits"
The following sample output shows the expected 32 GPU resources:
16 x 1g.10gb
8 x 2g.20gb
8 x 3g.40gb
Name:               myworker.redhat.com
Roles:              control-plane,master,worker
Capacity:
  nvidia.com/gpu:          0
  nvidia.com/mig-1g.10gb:  16
  nvidia.com/mig-2g.20gb:  8
  nvidia.com/mig-3g.40gb:  8
Allocatable:
  nvidia.com/gpu:          0
  nvidia.com/mig-1g.10gb:  16
  nvidia.com/mig-2g.20gb:  8
  nvidia.com/mig-3g.40gb:  8
  Resource                 Requests  Limits
  nvidia.com/mig-1g.10gb   0         0
  nvidia.com/mig-2g.20gb   0         0
  nvidia.com/mig-3g.40gb   0         0
Optional: Start a pod to run the nvidia-smi command and display the GPU resources.
Start the pod:
$ cat <<EOF | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: command-nvidia-smi
spec:
  restartPolicy: Never
  containers:
    - name: cuda-container
      image: nvcr.io/nvidia/cuda:12.1.0-base-ubi8
      command: ["/bin/sh","-c"]
      args: ["nvidia-smi"]
EOF
Confirm the pod ran successfully:
$ oc get pods
Example Output
NAME                 READY   STATUS      RESTARTS   AGE
command-nvidia-smi   0/1     Completed   0          3m34s
Confirm that the nvidia-smi output includes 32 MIG devices:

$ oc logs command-nvidia-smi
Example Output
+---------------------------------------------------------------------------------------+ | NVIDIA-SMI 535.104.12 Driver Version: 535.104.12 CUDA Version: 12.2 | |-----------------------------------------+----------------------+----------------------+ | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | | | | MIG M. | |=========================================+======================+======================| | 0 NVIDIA H100 80GB HBM3 On | 00000000:1B:00.0 Off | On | | N/A 25C P0 71W / 700W | N/A | N/A Default | | | | Enabled | +-----------------------------------------+----------------------+----------------------+ | 1 NVIDIA H100 80GB HBM3 On | 00000000:43:00.0 Off | On | | N/A 26C P0 70W / 700W | N/A | N/A Default | | | | Enabled | +-----------------------------------------+----------------------+----------------------+ | 2 NVIDIA H100 80GB HBM3 On | 00000000:52:00.0 Off | On | | N/A 31C P0 72W / 700W | N/A | N/A Default | | | | Enabled | +-----------------------------------------+----------------------+----------------------+ | 3 NVIDIA H100 80GB HBM3 On | 00000000:61:00.0 Off | On | | N/A 29C P0 71W / 700W | N/A | N/A Default | | | | Enabled | +-----------------------------------------+----------------------+----------------------+ | 4 NVIDIA H100 80GB HBM3 On | 00000000:9D:00.0 Off | On | | N/A 26C P0 71W / 700W | N/A | N/A Default | | | | Enabled | +-----------------------------------------+----------------------+----------------------+ | 5 NVIDIA H100 80GB HBM3 On | 00000000:C3:00.0 Off | On | | N/A 25C P0 70W / 700W | N/A | N/A Default | | | | Enabled | +-----------------------------------------+----------------------+----------------------+ | 6 NVIDIA H100 80GB HBM3 On | 00000000:D1:00.0 Off | On | | N/A 29C P0 73W / 700W | N/A | N/A Default | | | | Enabled | +-----------------------------------------+----------------------+----------------------+ | 7 NVIDIA H100 80GB HBM3 On | 00000000:DF:00.0 Off | On | | N/A 31C P0 72W / 700W | N/A | N/A Default | | | | Enabled | +-----------------------------------------+----------------------+----------------------+ +---------------------------------------------------------------------------------------+ | MIG devices: | +------------------+--------------------------------+-----------+-----------------------+ | GPU GI CI MIG | Memory-Usage | Vol| Shared | | ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG | | | | ECC| | |==================+================================+===========+=======================| | 0 2 0 0 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 | | | 0MiB / 65535MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 0 3 0 1 | 11MiB / 20096MiB | 32 0 | 2 0 2 0 2 | | | 0MiB / 32767MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 0 9 0 2 | 5MiB / 9984MiB | 16 0 | 1 0 1 0 1 | | | 0MiB / 16383MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 0 10 0 3 | 5MiB / 9984MiB | 16 0 | 1 0 1 0 1 | | | 0MiB / 16383MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 1 2 0 0 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 | | | 0MiB / 65535MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 1 3 0 1 | 11MiB / 20096MiB | 32 0 | 2 0 2 0 2 | | | 0MiB / 32767MiB | | | 
+------------------+--------------------------------+-----------+-----------------------+ | 1 9 0 2 | 5MiB / 9984MiB | 16 0 | 1 0 1 0 1 | | | 0MiB / 16383MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 1 10 0 3 | 5MiB / 9984MiB | 16 0 | 1 0 1 0 1 | | | 0MiB / 16383MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 2 2 0 0 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 | | | 0MiB / 65535MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 2 3 0 1 | 11MiB / 20096MiB | 32 0 | 2 0 2 0 2 | | | 0MiB / 32767MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 2 9 0 2 | 5MiB / 9984MiB | 16 0 | 1 0 1 0 1 | | | 0MiB / 16383MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 2 10 0 3 | 5MiB / 9984MiB | 16 0 | 1 0 1 0 1 | | | 0MiB / 16383MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 3 2 0 0 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 | | | 0MiB / 65535MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 3 3 0 1 | 11MiB / 20096MiB | 32 0 | 2 0 2 0 2 | | | 0MiB / 32767MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 3 9 0 2 | 5MiB / 9984MiB | 16 0 | 1 0 1 0 1 | | | 0MiB / 16383MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 3 10 0 3 | 5MiB / 9984MiB | 16 0 | 1 0 1 0 1 | | | 0MiB / 16383MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 4 1 0 0 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 | | | 0MiB / 65535MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 4 5 0 1 | 11MiB / 20096MiB | 32 0 | 2 0 2 0 2 | | | 0MiB / 32767MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 4 13 0 2 | 5MiB / 9984MiB | 16 0 | 1 0 1 0 1 | | | 0MiB / 16383MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 4 14 0 3 | 5MiB / 9984MiB | 16 0 | 1 0 1 0 1 | | | 0MiB / 16383MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 5 1 0 0 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 | | | 0MiB / 65535MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 5 5 0 1 | 11MiB / 20096MiB | 32 0 | 2 0 2 0 2 | | | 0MiB / 32767MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 5 13 0 2 | 5MiB / 9984MiB | 16 0 | 1 0 1 0 1 | | | 0MiB / 16383MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 5 14 0 3 | 5MiB / 9984MiB | 16 0 | 1 0 1 0 1 | | | 0MiB / 16383MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 6 2 0 0 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 | | | 0MiB / 65535MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 6 3 0 1 | 11MiB / 20096MiB | 32 0 | 2 0 2 0 2 | | | 0MiB / 32767MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 6 9 0 2 | 5MiB / 9984MiB | 16 0 | 1 0 1 0 1 | | | 0MiB / 16383MiB | | | 
+------------------+--------------------------------+-----------+-----------------------+ | 6 10 0 3 | 5MiB / 9984MiB | 16 0 | 1 0 1 0 1 | | | 0MiB / 16383MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 7 2 0 0 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 | | | 0MiB / 65535MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 7 3 0 1 | 11MiB / 20096MiB | 32 0 | 2 0 2 0 2 | | | 0MiB / 32767MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 7 9 0 2 | 5MiB / 9984MiB | 16 0 | 1 0 1 0 1 | | | 0MiB / 16383MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 7 10 0 3 | 5MiB / 9984MiB | 16 0 | 1 0 1 0 1 | | | 0MiB / 16383MiB | | | +------------------+--------------------------------+-----------+-----------------------+ +---------------------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=======================================================================================| | No running processes found | +---------------------------------------------------------------------------------------+
Delete the sample pod:
$ oc delete pod command-nvidia-smi
Example Output
pod "command-nvidia-smi" deleted
Example: Single MIG strategy
This example shows how to configure a single strategy with the all-3g.40gb configuration on one NVIDIA DGX H100 host with 8 x H100 80GB GPUs.
The DGX H100 host runs a single node installation of OpenShift.
For information about the initial default MIG configuration and viewing it, refer to the beginning of Example: Mixed MIG strategy.
Specify the host name, strategy, and configuration label in environment variables:
$ NODE_NAME=myworker.redhat.com
$ STRATEGY=single
$ MIG_CONFIGURATION=all-3g.40gb
Apply the strategy:
$ oc patch clusterpolicy/gpu-cluster-policy --type='json' \
  -p='[{"op": "replace", "path": "/spec/mig/strategy", "value": "'$STRATEGY'"}]'
Label the node with the configuration label:
$ oc label node $NODE_NAME nvidia.com/mig.config=$MIG_CONFIGURATION --overwrite
MIG manager applies a mig.config.state label to the node and then terminates all the GPU pods in preparation for enabling MIG mode and configuring the GPU into the specified configuration.
Confirm that the GPU resources are available:
$ oc describe node | egrep "Name:|Roles:|Capacity|nvidia.com/gpu:|nvidia.com/mig-.* |Allocatable:|Requests +Limits"
The following sample output shows the expected 16 GPUs:
Name:               myworker.redhat.com
Roles:              control-plane,master,worker
Capacity:
  nvidia.com/gpu:          16
  nvidia.com/mig-1g.10gb:  0
  nvidia.com/mig-2g.20gb:  0
  nvidia.com/mig-3g.40gb:  0
Allocatable:
  nvidia.com/gpu:          16
  nvidia.com/mig-1g.10gb:  0
  nvidia.com/mig-2g.20gb:  0
  nvidia.com/mig-3g.40gb:  0
  Resource                 Requests  Limits
  nvidia.com/mig-1g.10gb   0         0
  nvidia.com/mig-2g.20gb   0         0
  nvidia.com/mig-3g.40gb   0         0
Optional: Start a pod to run the nvidia-smi command and display the GPU resources.
Start the pod:
$ cat <<EOF | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: command-nvidia-smi
spec:
  restartPolicy: Never
  containers:
    - name: cuda-container
      image: nvcr.io/nvidia/cuda:12.1.0-base-ubi8
      command: ["/bin/sh","-c"]
      args: ["nvidia-smi"]
EOF
Confirm the pod ran successfully:
$ oc get pods
Example Output
NAME                 READY   STATUS      RESTARTS   AGE
command-nvidia-smi   0/1     Completed   0          3m34s
Confirm that the nvidia-smi output includes 16 MIG devices:

$ oc logs command-nvidia-smi
Example Output
+---------------------------------------------------------------------------------------+ | NVIDIA-SMI 535.104.12 Driver Version: 535.104.12 CUDA Version: 12.2 | |-----------------------------------------+----------------------+----------------------+ | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | | | | MIG M. | |=========================================+======================+======================| | 0 NVIDIA H100 80GB HBM3 On | 00000000:1B:00.0 Off | On | | N/A 25C P0 75W / 700W | N/A | N/A Default | | | | Enabled | +-----------------------------------------+----------------------+----------------------+ | 1 NVIDIA H100 80GB HBM3 On | 00000000:43:00.0 Off | On | | N/A 27C P0 74W / 700W | N/A | N/A Default | | | | Enabled | +-----------------------------------------+----------------------+----------------------+ | 2 NVIDIA H100 80GB HBM3 On | 00000000:52:00.0 Off | On | | N/A 32C P0 75W / 700W | N/A | N/A Default | | | | Enabled | +-----------------------------------------+----------------------+----------------------+ | 3 NVIDIA H100 80GB HBM3 On | 00000000:61:00.0 Off | On | | N/A 30C P0 74W / 700W | N/A | N/A Default | | | | Enabled | +-----------------------------------------+----------------------+----------------------+ | 4 NVIDIA H100 80GB HBM3 On | 00000000:9D:00.0 Off | On | | N/A 27C P0 75W / 700W | N/A | N/A Default | | | | Enabled | +-----------------------------------------+----------------------+----------------------+ | 5 NVIDIA H100 80GB HBM3 On | 00000000:C3:00.0 Off | On | | N/A 25C P0 73W / 700W | N/A | N/A Default | | | | Enabled | +-----------------------------------------+----------------------+----------------------+ | 6 NVIDIA H100 80GB HBM3 On | 00000000:D1:00.0 Off | On | | N/A 30C P0 77W / 700W | N/A | N/A Default | | | | Enabled | +-----------------------------------------+----------------------+----------------------+ | 7 NVIDIA H100 80GB HBM3 On | 00000000:DF:00.0 Off | On | | N/A 31C P0 76W / 700W | N/A | N/A Default | | | | Enabled | +-----------------------------------------+----------------------+----------------------+ +---------------------------------------------------------------------------------------+ | MIG devices: | +------------------+--------------------------------+-----------+-----------------------+ | GPU GI CI MIG | Memory-Usage | Vol| Shared | | ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG | | | | ECC| | |==================+================================+===========+=======================| | 0 1 0 0 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 | | | 0MiB / 65535MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 0 2 0 1 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 | | | 0MiB / 65535MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 1 1 0 0 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 | | | 0MiB / 65535MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 1 2 0 1 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 | | | 0MiB / 65535MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 2 1 0 0 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 | | | 0MiB / 65535MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 2 2 0 1 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 | | | 0MiB / 65535MiB | | | 
+------------------+--------------------------------+-----------+-----------------------+ | 3 1 0 0 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 | | | 0MiB / 65535MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 3 2 0 1 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 | | | 0MiB / 65535MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 4 1 0 0 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 | | | 0MiB / 65535MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 4 2 0 1 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 | | | 0MiB / 65535MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 5 1 0 0 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 | | | 0MiB / 65535MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 5 2 0 1 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 | | | 0MiB / 65535MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 6 1 0 0 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 | | | 0MiB / 65535MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 6 2 0 1 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 | | | 0MiB / 65535MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 7 1 0 0 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 | | | 0MiB / 65535MiB | | | +------------------+--------------------------------+-----------+-----------------------+ | 7 2 0 1 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 | | | 0MiB / 65535MiB | | | +------------------+--------------------------------+-----------+-----------------------+ +---------------------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=======================================================================================| | No running processes found | +---------------------------------------------------------------------------------------+
Delete the sample pod:
$ oc delete pod command-nvidia-smi
Example Output
pod "command-nvidia-smi" deleted
Running a sample GPU application
Let’s run a simple CUDA sample, in this case vectorAdd, by requesting a GPU resource as you would normally do in Kubernetes. How you request the resource depends on the advertisement strategy configured on the cluster.
If the cluster is configured with the mixed advertisement strategy:
Request the MIG instance with nvidia.com/mig-2g.10gb: 1 as follows:
Note
There is no need for a nodeSelector, as the Pod is necessarily scheduled on a 2g.10gb MIG instance.

$ cat << EOF | oc create -f -
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vectoradd
      image: "nvidia/samples:vectoradd-cuda11.2.1"
      resources:
        limits:
          nvidia.com/mig-2g.10gb: 1
EOF
pod/cuda-vectoradd created
Check the logs of the container:
$ oc logs cuda-vectoradd
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
If the cluster is configured with the single advertisement strategy:
Request the MIG instance with nvidia.com/gpu: 1 and enforce the Pod scheduling on a node with a 2g.10gb MIG instance with the nodeSelector stanza as follows:

$ cat << EOF | oc create -f -
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vectoradd
      image: "nvidia/samples:vectoradd-cuda11.2.1"
      resources:
        limits:
          nvidia.com/gpu: 1
  nodeSelector:
    nvidia.com/gpu.product: A100-SXM4-40GB-MIG-2g.10gb
EOF
Disable the MIG mode
To turn MIG mode off so that you can utilize the full capacity of the GPU, run the following:

$ MIG_CONFIGURATION=all-disabled && \
  oc label node/$NODE_NAME nvidia.com/mig.config=$MIG_CONFIGURATION --overwrite
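To confirm that MIG mode has been disabled, re-check the node labels and the advertised resources with the commands used earlier in this document (a sketch; the exact GPU count depends on the node):

$ oc describe node/$NODE_NAME | grep nvidia.com/mig.config
$ oc get node/$NODE_NAME -ojsonpath={.status.allocatable} | jq . | grep nvidia

The mig.config label should report all-disabled with a state of success, and the node should advertise plain nvidia.com/gpu resources again.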
Troubleshooting
The MIG reconfiguration is handled exclusively by the controller deployed within the nvidia-mig-manager DaemonSet. Inspecting the logs of these Pods should give a clue about what went wrong.
Check the logs of the container:
$ oc -n nvidia-gpu-operator logs ds/nvidia-mig-manager
The cluster administrator is expected to drain the node of any GPU workload before requesting the MIG reconfiguration. If the node is not properly drained, the nvidia-mig-manager will fail with this error in the logs:

Updating MIG config: map[2g.10gb:3]
Error clearing MigConfig: error destroying Compute instance for profile '(0, 0)': In use by another client
Error clearing MIG config on GPU 0, erroneous devices may persist
Error setting MIGConfig: error attempting multiple config orderings: all orderings failed
Restarting all GPU clients previously shutdown by reenabling their component-specific nodeSelector labels
Changing the 'nvidia.com/mig.config.state' node label to 'failed'
Resolve this issue by:
Correctly draining the node. For guidance on draining a node, see the OpenShift Container Platform documentation Understanding how to evacuate pods on nodes.
Retriggering the reconfiguration by forcing the label update:
$ oc label node/$NODE_NAME nvidia.com/mig.config- --overwrite
$ oc label node/$NODE_NAME nvidia.com/mig.config=$MIG_CONFIGURATION --overwrite