Step #9: Optimize GPU Resources with NVIDIA Multi-Instance GPU (MIG)

It is rarely possible to match a GPU’s capabilities perfectly to the needs of an application, which can waste resources. To prevent this, we will use an advanced feature of NVIDIA GPUs called Multi-Instance GPU (MIG). MIG allows supported GPUs to be partitioned in firmware into multiple smaller instances for use across multiple applications. Once enabled, each partitioned instance presents itself as a unique GPU device. Fleet Command enables an administrator to configure MIG on supported systems directly from the console, without having to access the system’s remote console.

Not all GPUs support MIG. The table below lists the supported MIG options and the corresponding GPU types.

MIG Available Options    Supported GPUs
2 MIGs of 3c.20gb        A100
3 MIGs of 2c.10gb        A100
7 MIGs of 1c.5gb         A100
2 MIGs of 2c.12gb        A30
4 MIGs of 1c.6gb         A30
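
The labels in the table can be read mechanically. The sketch below assumes each entry has the form `<count> MIGs of <slices>c.<memory>gb` and that the number before `gb` is the memory per instance in gigabytes. The function name `mig_total_gb` is made up for illustration and is not part of Fleet Command.

```shell
# Hypothetical helper: total memory covered by one row of the table.
# Assumes a label of the form "<count> MIGs of <slices>c.<memory>gb".
mig_total_gb() {
  local label="$1"
  local count="${label%% *}"      # text before the first space, e.g. "7"
  local profile="${label##* }"    # text after the last space, e.g. "1c.5gb"
  local mem="${profile#*.}"       # drop everything through the dot, e.g. "5gb"
  mem="${mem%gb}"                 # drop the unit suffix, e.g. "5"
  echo $(( count * mem ))
}

mig_total_gb "7 MIGs of 1c.5gb"   # prints 35
mig_total_gb "4 MIGs of 1c.6gb"   # prints 24
```

Under this reading, both A30 options account for the full 24 GB of the A30-24GB used in the screenshots below.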

Explore this feature and enable MIG on your Fleet Command system. Once MIG is configured, we will deploy a second application on the same system.

  1. First, view the current MIG status of your system by navigating to the Details page for your Location. Click the Multi Instance GPUs (MIG) tab for your system.

    fc-024.png

  2. Review the MIG configuration of the available GPUs. Fleet Command lists all physical GPUs that can be configured. The screenshot below is from a system with an A30-24GB that is not yet configured.

    fc-032.png
    Note

    The Application Compatibility feature enables backward compatibility for the nvidia.com/gpu field in pod specs (defined in the Helm chart) as long as homogeneous MIG profiles are used. If a mixture of MIG profile types is used, backward compatibility is not possible, and the Helm chart must be updated for the appropriate MIG profiles.
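
    To make the note concrete, the sketch below prints the `resources` fragment a pod spec might carry in each case. It assumes the naming used by the NVIDIA Kubernetes device plugin, where mixed profiles are exposed as `nvidia.com/mig-<profile>` (for example `nvidia.com/mig-1g.6gb`). The exact resource names on a Fleet Command system, and the helper name `gpu_limit_fragment`, are assumptions to verify against your own node and Helm chart.

    ```shell
    # Hypothetical helper: print the GPU limit a pod spec would request.
    # $1 = resource name, $2 = number of devices.
    gpu_limit_fragment() {
      printf 'resources:\n  limits:\n    %s: %s\n' "$1" "$2"
    }

    # Homogeneous profiles: the existing field keeps working.
    gpu_limit_fragment "nvidia.com/gpu" 1

    # Mixed profiles (assumed naming): the chart must name the profile.
    gpu_limit_fragment "nvidia.com/mig-1g.6gb" 1
    ```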


  3. Select a MIG profile for each GPU, check the acknowledgement box, then click Save. This creates MIG instances on the remote system. This is a disruptive action because it requires a hard reset of the GPU device. Any interruption should be brief.

    fc-033.png

  4. Open a Remote Console and run nvidia-smi as the rcuser to further validate that MIG is enabled. Below is an NVIDIA A30 with 2 MIGs of 2c.12gb. Each MIG is presented to the operating system as a unique GPU device.


    $ nvidia-smi

    fc-034.png
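
    The same check can be scripted. `nvidia-smi -L` lists each GPU followed by its MIG devices, and the sketch below counts the MIG lines. The function name `count_mig_devices` is hypothetical, and the line format it matches (an indented line starting with `MIG`) is an assumption about the driver's output that may differ between versions.

    ```shell
    # Hypothetical helper: count MIG devices in `nvidia-smi -L` output read from stdin.
    # Assumes each MIG device appears on its own line beginning with "MIG".
    count_mig_devices() {
      grep -c '^[[:space:]]*MIG ' || true   # grep exits 1 on zero matches; still print 0
    }

    # Run against the live system only when nvidia-smi is present.
    if command -v nvidia-smi >/dev/null 2>&1; then
      nvidia-smi -L | count_mig_devices
    fi
    ```

    On the A30 configured above, a count of 2 would match the profile selected in step 3.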

  5. As the admin user, validate with the kubectl command that Kubernetes can use the devices. The embedded GPU Operator will automatically install the drivers needed.


    $ su admin
    Password:
    $ sudo kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
    NAME                      GPU
    system-1.egx.nvidia.com   2

    Applications running on this system now have access to two GPU devices that can be used for entirely different applications. The GPU firmware ensures that each application gets the performance it would expect if each MIG device were a dedicated GPU.
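
    For scripted checks, the GPU column of the kubectl output in this step can be summed. This is a minimal sketch: `sum_gpu_column` is a hypothetical name, and it assumes the two-column NAME/GPU layout produced by the custom-columns expression used above.

    ```shell
    # Hypothetical helper: add up the GPU column of the custom-columns output (stdin).
    # Skips the header row and ignores non-numeric values such as "<none>".
    sum_gpu_column() {
      awk 'NR > 1 && $2 ~ /^[0-9]+$/ { total += $2 } END { print total + 0 }'
    }

    # Sample taken from the output shown in this step.
    printf 'NAME GPU\nsystem-1.egx.nvidia.com 2\n' | sum_gpu_column   # prints 2
    ```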

    Let’s deploy a second copy of our video application following the same procedure in the previous step titled “Deploy a Sample Application”.

  6. Go to the Deployment page and click Create Deployment.

    fc-016.png

  7. Give your deployment a unique name. You will get a form validation error if you attempt to reuse the name from earlier. We will deploy the same car counter Application, but to avoid conflicts we need to change our deployment to run on a different port. Copy the 2 lines of YAML shown below into the Application Configuration field.

    fc-038.png


    service:
      webuinodePort: 31116
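
    Kubernetes assigns NodePort services from the range 30000-32767 by default, so a value such as 31116 is accepted while a typo outside that range would be rejected. The sketch below checks a candidate port before you paste it into the Application Configuration field. `valid_node_port` is a hypothetical helper, and the range is the Kubernetes default, which a cluster administrator can change.

    ```shell
    # Hypothetical helper: succeed if $1 is an integer in the default NodePort range.
    valid_node_port() {
      case "$1" in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
      esac
      [ "$1" -ge 30000 ] && [ "$1" -le 32767 ]
    }

    valid_node_port 31116 && echo "31116 is in the default NodePort range"
    ```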

  8. After a few minutes, both applications will be running on a shared GPU system with each application consuming a MIG device. Let’s refer back to the table at the top of this page and consider the possibilities that MIG enables.


© Copyright 2022-2023, NVIDIA. Last updated on Jan 10, 2023.