NVIDIA vGPU for Compute#
NVIDIA AI Enterprise is a cloud-native suite of software tools, libraries and frameworks designed to deliver optimized performance, robust security, and stability for production AI deployments. Easy-to-use microservices optimize model performance with enterprise-grade security, support, and stability, ensuring a streamlined transition from prototype to production for enterprises that run their businesses on AI. It consists of two primary layers: the application layer and the infrastructure layer.
NVIDIA vGPU for Compute is licensed exclusively through NVIDIA AI Enterprise. NVIDIA vGPU for Compute enables multiple virtual machines (VMs) to have simultaneous, direct access to a single physical GPU while offering the compute capabilities required for AI model training, fine-tuning, and inference workloads. By distributing GPU resources efficiently across multiple VMs, NVIDIA vGPU for Compute optimizes utilization and lowers overall hardware costs. In addition, it offers advanced monitoring and management capabilities, including Suspend/Resume, Live Migration, and Warm Updates, making it ideal for Cloud Service Providers (CSPs) and organizations that require scalable, cost-effective GPU acceleration.
Key Concepts#
Glossary#
| Term | Definition |
|---|---|
| NVIDIA Virtual GPU (vGPU) Manager | The Virtual GPU (vGPU) Manager enables GPU virtualization by allowing multiple VMs to share a physical GPU, optimizing GPU allocation for different workloads. The NVIDIA Virtual GPU Manager is installed on the hypervisor. |
| NVIDIA vGPU for Compute Guest Driver | The NVIDIA vGPU for Compute Guest Driver is installed on each VM’s operating system. It provides the necessary interface and support to ensure that applications running within the VM can fully leverage the virtualized GPU’s capabilities, similar to how they would on a physical machine with a dedicated GPU. |
| NVIDIA Licensing System | The NVIDIA Licensing System for NVIDIA AI Enterprise manages the software licenses required to use NVIDIA’s AI tools and infrastructure. This system ensures that organizations are compliant with licensing terms while providing flexibility in managing and deploying NVIDIA AI Enterprise across their infrastructure. |
| NVIDIA AI Enterprise Infra Collection | The NVIDIA AI Enterprise Infrastructure (Infra) Collection, hosted on the NVIDIA NGC Catalog, is a suite of software and tools designed to support the deployment and management of AI workloads in enterprise environments. It provides a robust and scalable foundation for running AI workloads, ensuring that enterprises can leverage the full power of NVIDIA GPUs and software to accelerate their AI initiatives. |
The NVIDIA vGPU for Compute Drivers can be downloaded from the NVIDIA AI Enterprise Infra Collection.
NVIDIA vGPU Architecture Overview#
The high-level architecture of the NVIDIA vGPU is illustrated in the following diagram. Under the control of the NVIDIA Virtual GPU Manager (running on the hypervisor), a single NVIDIA physical GPU is capable of supporting multiple virtual GPU devices (vGPUs) that can be assigned directly to guest VMs, each functioning like a dedicated GPU.
Guest VMs use NVIDIA vGPUs in the same manner as a physical GPU that has been passed through by the hypervisor: the NVIDIA vGPU for Compute driver loaded in the guest VM provides direct access to the GPU for performance-critical fast paths.

Each NVIDIA vGPU is analogous to a conventional GPU with a fixed amount of GPU framebuffer/memory. The vGPU’s framebuffer is allocated out of the physical GPU’s framebuffer at the time the vGPU is created, and the vGPU retains exclusive use of that framebuffer until it is destroyed.
NVIDIA vGPU for Compute Configurations#
Depending on the physical GPU, NVIDIA vGPU for Compute supports the following vGPU modes:
Time-sliced vGPUs can be created on all NVIDIA AI Enterprise supported GPUs.
Additionally, on GPUs that support the Multi-Instance GPU (MIG) feature, the following types of MIG-backed vGPU are supported:
MIG-backed vGPUs that occupy an entire GPU instance
Time-sliced, MIG-backed vGPUs
| vGPU Mode | Description | GPU Partitioning | Isolation | Use Cases |
|---|---|---|---|---|
| Time-Sliced vGPU | A time-sliced vGPU for Compute VM shares access to all of the GPU’s compute resources, including streaming multiprocessors (SMs) and GPU engines, with other vGPUs on the same GPU. Processes are scheduled sequentially, with each vGPU for Compute VM gaining exclusive use of GPU engines during its time slice. | Temporal | Strong hardware-based memory and fault isolation. Good performance and QoS with round-robin scheduling. | Deployments with non-strict isolation requirements, or in environments where MIG-backed vGPU is not available. Suitable for light to moderate AI workloads such as small-scale inferencing, preprocessing pipelines, and development/testing of models in a pre-training phase. |
| MIG-Backed vGPU | A MIG-backed vGPU for Compute VM is created from one or more MIG slices and assigned to a VM on a MIG-capable physical GPU. Each MIG-backed vGPU for Compute VM has exclusive access to the compute resources of its GPU instance, including SMs and GPU engines. Processes running on one VM execute in parallel with processes running on other vGPUs on the same physical GPU; each process runs only on its assigned vGPU. For more information on configuring MIG-backed vGPU VMs, refer to the Virtual GPU Types for Supported GPUs. | Spatial | Strong hardware-based memory and fault isolation. Better performance and QoS with dedicated cache/memory bandwidth and lower scheduling latency. | Most virtualization deployments that require strong isolation, multi-tenancy, and consistent performance. Well-suited for consistent high-performance AI inferencing, multi-tenant fine-tuning jobs, or parallel execution of small to medium training tasks with predictable throughput requirements. |
| Time-Sliced, MIG-Backed vGPU | A time-sliced, MIG-backed vGPU for Compute VM occupies only a fraction of a MIG instance on a MIG-capable physical GPU. Processes are scheduled sequentially as each VM shares access to the GPU instance’s compute resources, including the SMs and compute engines, with all other vGPUs on the same MIG instance. This mode was introduced with the RTX Pro 6000 Blackwell Server Edition. For more information on configuring MIG-backed vGPU VMs, refer to the Virtual GPU Types for Supported GPUs. | Spatial partitioning between MIG instances; temporal partitioning within each MIG instance. | Strong hardware-based memory and fault isolation. Better performance and QoS with dedicated cache/memory bandwidth and lower scheduling latency. | Most virtualization deployments that require strong isolation, multi-tenancy, and consistent performance while maximizing GPU utilization. Ideal for high-density AI workloads such as serving multiple concurrent inferencing endpoints, hosting AI models across multiple tenants, or running light training jobs on shared GPU resources. |
Installing NVIDIA vGPU for Compute#
Prerequisites#
System Requirements#
Before proceeding, ensure the following system prerequisites are met:
At least one NVIDIA data center GPU in a single NVIDIA AI Enterprise compatible NVIDIA-Certified System. NVIDIA recommends using the following GPUs based on your infrastructure.
System Requirements Use Cases#

| Use Case | GPU |
|---|---|
| Adding AI to mainstream servers (single to 4-GPU NVLink) | NVIDIA A30, 1-8x NVIDIA L4, NVIDIA L40S, NVIDIA H100 NVL, NVIDIA H200 NVL, NVIDIA RTX Pro 6000 Blackwell Server Edition |
| AI Model Inference | NVIDIA A100, NVIDIA H200 NVL, NVIDIA RTX Pro 6000 Blackwell Server Edition |
| AI Model Training (Large) and Inference (HGX Scale Up and Out Server) | NVIDIA H100 HGX, NVIDIA H200 HGX, NVIDIA B200 HGX |
If you are using GPUs based on the NVIDIA Ampere architecture or later, ensure that the following BIOS settings are enabled on your server platform:
Single Root I/O Virtualization (SR-IOV) - Enabled
VT-d/IOMMU - Enabled
NVIDIA AI Enterprise License
NVIDIA AI Enterprise Software:
NVIDIA Virtual GPU Manager
NVIDIA vGPU for Compute Guest Driver
You can leverage the NVIDIA System Management Interface (nvidia-smi) management and monitoring tool for testing and benchmarking.
The following server configuration details are considered best practices:
Hyperthreading - Enabled
Power Setting or System Profile - High Performance
CPU Performance - Enterprise or High Throughput (if available in the BIOS)
Memory Mapped I/O above 4-GB - Enabled (if available in the BIOS)
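After the platform is configured, a quick host-side sanity check can confirm that the IOMMU (VT-d/AMD-Vi) and the GPUs are visible to the hypervisor. The commands below are a minimal sketch; exact messages and available tools vary by platform and kernel.

```bash
# Confirm the IOMMU was enabled at boot (look for DMAR/IOMMU initialization messages).
dmesg | grep -i -e DMAR -e IOMMU

# Confirm the NVIDIA GPUs are visible on the PCIe bus.
lspci -nn | grep -i nvidia
```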
Installing NGC CLI#
To access the NVIDIA Virtual GPU Manager and NVIDIA vGPU for Compute Guest Driver, you must first download and install the NGC Catalog CLI.
To install the NGC Catalog CLI:
Log in to the NVIDIA NGC Catalog.
In the top right corner, click Welcome and then select Setup from the menu.
Click Downloads under Install NGC CLI from the Setup page.
From the CLI Install page, click the Windows, Linux, or MacOS tab, according to the platform from which you will be running NGC Catalog CLI.
Follow the instructions to install the CLI.
Verify the installation by entering ngc --version in a terminal or command prompt. The output should be NGC Catalog CLI x.y.z, where x.y.z indicates the version.
You must configure the NGC CLI for your use so that you can run the commands. You will be prompted to enter your NGC API key. Enter the following command:
$ ngc config set
Enter API key [no-apikey]. Choices: [<VALID_APIKEY>, 'no-apikey']: (COPY/PASTE API KEY)
Enter CLI output format type [ascii]. Choices: [ascii, csv, json]: ascii
Enter org [no-org]. Choices: ['no-org']:
Enter team [no-team]. Choices: ['no-team']:
Enter ace [no-ace]. Choices: ['no-ace']:
Successfully saved NGC configuration to /home/$username/.ngc/config
After the NGC Catalog CLI is installed, you will need to launch a command window and run the following commands to download the software.
NVIDIA Virtual GPU Manager
ngc registry resource download-version "nvidia/vgpu/vgpu-host-driver-X:X.X"
NVIDIA vGPU for Compute Guest Driver
ngc registry resource download-version "nvidia/vgpu/vgpu-guest-driver-X:X.X"
For more information on configuring the NGC CLI, refer to the Getting Started with the NGC CLI documentation.
Installing NVIDIA Virtual GPU Manager#
The process of installing the NVIDIA Virtual GPU Manager depends on the hypervisor that you are using. This section assumes the following:
You have downloaded the Virtual GPU Manager software from NVIDIA NGC Catalog
You want to deploy the NVIDIA vGPU for Compute on a single server node
| Hypervisor Platform | Installation Instructions |
|---|---|
| Red Hat Enterprise Linux KVM | Installing and Configuring the NVIDIA Virtual GPU Manager for Red Hat Enterprise Linux KVM |
| Ubuntu KVM | Installing and Configuring the NVIDIA Virtual GPU Manager for Ubuntu |
| VMware vSphere | Installing and Configuring the NVIDIA Virtual GPU Manager for VMware vSphere |
After you complete this process, you can install the vGPU Guest Driver on your Guest VM.
Installing NVIDIA Fabric Manager on HGX Servers#
NVIDIA Fabric Manager must be installed in addition to the Virtual GPU Manager on NVIDIA HGX platforms to enable the multi-GPU VM configurations required for AI training, complex simulations, and processing massive datasets. Fabric Manager is responsible for enabling and managing high-bandwidth interconnect topologies between multiple GPUs on the same node.
On Ampere, Hopper, and Blackwell HGX systems equipped with NVSwitch, Fabric Manager configures the NVSwitch memory fabric to create a unified memory fabric among all participating GPUs and monitors the supporting NVLinks, enabling the deployment of multi-GPU VMs with 1, 2, 4, or 8 GPUs.
Note
For information about NVIDIA Fabric Manager integration or support for deploying 1‑, 2‑, 4- or 8‑GPU VMs on your hypervisor, consult the documentation from your hypervisor vendor.
The Fabric Manager service must be running before creating VMs with multi-GPU configurations. Failure to enable Fabric Manager on HGX platforms may result in incomplete or non-functional GPU topologies inside the VM. For details on capabilities, configuration, and usage, refer to the NVIDIA Fabric Manager User Guide.
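As an illustration, on a hypervisor host where Fabric Manager has been installed from NVIDIA's packages, the service is typically managed through systemd; the following sketch assumes the standard nvidia-fabricmanager service name and is not hypervisor-specific guidance.

```bash
# Enable and start the Fabric Manager service before creating multi-GPU VMs.
sudo systemctl enable nvidia-fabricmanager
sudo systemctl start nvidia-fabricmanager

# Verify that the service is active.
systemctl status nvidia-fabricmanager
```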
Installing NVIDIA vGPU Guest Driver#
The process for installing the driver is the same in a VM configured with vGPU, in a VM that is running pass-through GPU, or on a physical host in a bare-metal deployment. This section assumes the following:
You have downloaded the vGPU for Compute Guest Driver from NVIDIA NGC Catalog
The Guest VM has been created and booted on the hypervisor
| Guest Operating System | Installation Instructions |
|---|---|
| Ubuntu | Installing the NVIDIA vGPU for Compute Guest Driver on Ubuntu from a Debian Package |
| Red Hat | Installing the NVIDIA vGPU for Compute Guest Driver on Red Hat Distributions from an RPM Package |
| Windows | Installing the NVIDIA vGPU for Compute Guest Driver and NVIDIA Control Panel |
| Other Linux distributions | Installing the NVIDIA vGPU for Compute Guest Driver on a Linux VM from a .run Package |
After you install the NVIDIA vGPU for Compute Guest driver, you are required to license the Guest VM. After a license from the NVIDIA License System is obtained, the Guest VM operates at full capability and can be used to run AI/ML workloads.
Licensing an NVIDIA vGPU for Compute Guest VM#
Note
The NVIDIA AI Enterprise license is enforced through software when you deploy NVIDIA vGPU for Compute VMs.
When booted on a supported GPU, a vGPU for Compute VM initially operates at full capability but its performance degrades over time if the VM fails to obtain a license. In such a scenario, the full capability of the VM is restored when the license is acquired.
Once licensing is configured, a vGPU VM automatically obtains a license from the license server when booted on a supported GPU. The VM retains the license until it is shut down. It then releases the license back to the license server. Licensing settings persist across reboots and need only be modified if the license server address changes, or the VM is switched to running GPU pass through.
For more information on how to license a vGPU for Compute VM from the NVIDIA License System, including step-by-step instructions, refer to the Virtual GPU Client Licensing User Guide.
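For a Linux guest using a networked license served by a CLS or DLS instance, the client-side configuration typically amounts to placing the client configuration token and restarting the licensing daemon. The following is a minimal sketch of that flow; the token filename is a placeholder, and the authoritative steps are in the Virtual GPU Client Licensing User Guide.

```bash
# On the guest VM: copy the client configuration token obtained from the NVIDIA Licensing Portal.
sudo cp client_configuration_token_*.tok /etc/nvidia/ClientConfigToken/

# Restart the licensing daemon so the license is checked out.
sudo systemctl restart nvidia-gridd

# Confirm that the license was acquired.
nvidia-smi -q | grep -A 2 "vGPU Software Licensed Product"
```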
Note
For vGPU for Compute deployments, one license per vGPU assigned to a VM is enforced through software. This license is valid for up to sixteen vGPU instances on a single GPU or for the assignment to a VM of one vGPU that is assigned all the physical GPU’s framebuffer. If multiple NVIDIA C‑series vGPUs are assigned to a single VM, a separate license must be obtained for each vGPU from the NVIDIA Licensing System, regardless of whether it is a Networked or Node‑Locked license.
Verifying the License Status of a Licensed NVIDIA vGPU for Compute Guest VM#
After configuring an NVIDIA vGPU for Compute client VM with a license, verify the license status by displaying the licensed product name and status.
To verify the license status of a licensed client, run nvidia-smi with the -q or --query option from within the client VM, not the hypervisor host. If the product is licensed, the expiration date is shown in the license status.
==============NVSMI LOG==============

Timestamp                                 : Tue Jun 17 16:49:09 2025
Driver Version                            : 580.46
CUDA Version                              : 13.0

Attached GPUs                             : 2
GPU 00000000:02:01.0
    Product Name                          : NVIDIA H100-80C
    Product Brand                         : NVIDIA Virtual Compute Server
    Product Architecture                  : Hopper
    Display Mode                          : Requested functionality has been deprecated
    Display Attached                      : Yes
    Display Active                        : Disabled
    Persistence Mode                      : Enabled
    Addressing Mode                       : HMM
    MIG Mode
        Current                           : N/A
        Pending                           : N/A
    Accounting Mode                       : Disabled
    Accounting Mode Buffer Size           : 4000
    Driver Model
        Current                           : N/A
        Pending                           : N/A
    Serial Number                         : N/A
    GPU UUID                              : GPU-a1833a31-1dd2-11b2-8e58-a589b8170988
    GPU PDI                               : N/A
    Minor Number                          : 0
    VBIOS Version                         : 00.00.00.00.00
    MultiGPU Board                        : No
    Board ID                              : 0x201
    Board Part Number                     : N/A
    GPU Part Number                       : 2331-882-A1
    FRU Part Number                       : N/A
    Platform Info
        Chassis Serial Number             : N/A
        Slot Number                       : N/A
        Tray Index                        : N/A
        Host ID                           : N/A
        Peer Type                         : N/A
        Module Id                         : N/A
        GPU Fabric GUID                   : N/A
    Inforom Version
        Image Version                     : N/A
        OEM Object                        : N/A
        ECC Object                        : N/A
        Power Management Object           : N/A
    Inforom BBX Object Flush
        Latest Timestamp                  : N/A
        Latest Duration                   : N/A
    GPU Operation Mode
        Current                           : N/A
        Pending                           : N/A
    GPU C2C Mode                          : Disabled
    GPU Virtualization Mode
        Virtualization Mode               : VGPU
        Host VGPU Mode                    : N/A
        vGPU Heterogeneous Mode           : N/A
    vGPU Software Licensed Product
        Product Name                      : NVIDIA Virtual Compute Server
        License Status                    : Licensed (Expiry: 2025-6-18 8:59:55 GMT)
    ...
Installing the NVIDIA GPU Operator Using a Bash Shell Script#
A bash shell script for installing the NVIDIA GPU Operator with the NVIDIA vGPU for Compute Driver is available for download from the NVIDIA AI Enterprise Infra Collection.
Note
This approach assumes there is no vGPU for Compute Driver installed on the Guest VM. The vGPU for Compute Guest Driver is installed by the GPU Operator.
Refer to the GPU Operator documentation for detailed instructions on deploying the NVIDIA vGPU for Compute Driver using the bash shell script.
Installing NVIDIA AI Enterprise Applications Software#
Installing NVIDIA AI Enterprise Applications Software using Docker and NVIDIA Container Toolkit#
Prerequisites#
Before you install any NVIDIA AI Enterprise container:
Ensure your vGPU for Compute Guest VM is running a supported OS distribution.
Ensure the VM has obtained a valid vGPU for Compute license from the NVIDIA License System.
Confirm that one or more NVIDIA GPUs are available and recognized by your system.
Make sure the vGPU for Compute Guest Driver is installed correctly. You can verify this by running nvidia-smi. If you see your GPU listed, you’re ready to proceed.
Installing Docker Engine#
Refer to the official Docker Installation Guide for your vGPU for Compute Guest VM OS Linux distribution.
Installing the NVIDIA Container Toolkit#
The NVIDIA Container Toolkit allows users to build and run GPU accelerated Docker containers. The toolkit includes a container runtime library and utilities to configure containers to leverage NVIDIA GPUs automatically. Complete documentation and frequently asked questions are available on the repository wiki. Refer to the Installing the NVIDIA Container Toolkit documentation to enable the Docker repository and install the NVIDIA Container Toolkit on the Guest VM.
Once the NVIDIA Container Toolkit is installed, to configure the Docker container runtime, refer to the Configuration documentation.
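On most Linux distributions, this amounts to pointing Docker at the NVIDIA runtime and restarting the daemon. A typical sequence, assuming the nvidia-ctk utility installed by the toolkit, looks like the following sketch.

```bash
# Register the NVIDIA runtime with Docker, then restart the Docker daemon.
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```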
Verifying the Installation: Run a Sample CUDA Container#
Refer to the Running a Sample Workload documentation to run a sample CUDA container test on your GPU.
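As an illustration, a minimal smoke test simply runs nvidia-smi inside a CUDA base image; the image tag below is an example and should be replaced with one that matches the CUDA version supported by your driver.

```bash
# If the vGPU is visible inside the container, the driver and toolkit are working.
sudo docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```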
Accessing NVIDIA AI Enterprise Containers on NGC#
NVIDIA AI Enterprise Application Software is available through the NVIDIA NGC Catalog and identifiable by the NVIDIA AI Enterprise Supported label.
The container image for each application or framework contains the entire user-space software stack required to run it, namely, the CUDA libraries, cuDNN, any required Magnum IO components, TensorRT, and the framework itself.
Generate an NGC API key to access the NVIDIA AI Enterprise Software in the NGC Catalog using the URL provided to you by NVIDIA.
Authenticate with Docker to NGC Registry. In your shell, run:
docker login nvcr.io
Username: $oauthtoken
Password: <paste-your-NGC_API_key-here>
A successful login (``Login Succeeded``) lets you pull containers from NGC.
From the NVIDIA vGPU for Compute VM, browse the NGC Catalog for containers labeled NVIDIA AI Enterprise Supported.
Copy the relevant docker pull command:
sudo docker pull nvcr.io/nvaie/rapids-pb25h1:x.y.z-runtime
Where x.y.z is the version of your container.
Run the container with GPU access:
sudo docker run --gpus all -it --rm nvcr.io/nvaie/rapids-pb25h1:x.y.z-runtime
Where x.y.z is the version of your container. This command launches an interactive container using the vGPUs available on the Guest VM.
Installing the NVIDIA AI Enterprise Software Components Using Podman#
You can use Podman (an alternative container runtime to Docker) for running NVIDIA AI Enterprise containers. The installation flow is similar to Docker. For more information, refer to the NVIDIA AI Enterprise: RHEL with KVM Deployment Guide.
Installing NVIDIA AI Enterprise Software Components Using Kubernetes and NVIDIA Cloud Native Stack#
NVIDIA provides the Cloud Native Stack (CNS), which is a collection of software to run cloud native workloads on NVIDIA GPUs. NVIDIA Cloud Native Stack is based on Ubuntu/RHEL, Kubernetes, Helm, and the NVIDIA GPU and Network Operator.
Refer to this repository for a series of installation guides with step-by-step instructions based on your OS distribution. The installation guides also offer instructions to deploy an application from the NGC Catalog to validate that GPU resources are accessible and functional.
NVIDIA vGPU for Compute Key Features#
MIG Backed vGPU#
A Multi Instance GPU (MIG)-backed vGPU is a vGPU that resides on a GPU instance in a MIG-capable physical GPU. MIG-backed vGPUs are created from individual MIG slices and assigned to virtual machines. Each MIG-backed vGPU resident on a GPU has exclusive access to the GPU instance’s engines, including the compute and video decode engines. This model combines MIG’s hardware-level spatial partitioning with the temporal partitioning capabilities of vGPU, offering flexibility in how GPU resources are shared across workloads.
In a MIG-backed vGPU, processes running on one vGPU execute in parallel with processes running on other vGPUs on the same physical GPU. Each process runs only on its assigned vGPU, alongside processes on other vGPUs.
Note
NVIDIA vGPU for Compute supports MIG-Backed vGPUs on all the GPU boards that support Multi Instance GPU (MIG).
Universal MIG technology on Blackwell enables both compute and graphics workloads to be consolidated and securely isolated on the same physical GPU.
A MIG-backed vGPU is ideal when running multiple high-priority workloads that require guaranteed, consistent performance and strong isolation, such as in multi-tenant environments, MLOps platforms, or shared research clusters. By partitioning a GPU into dedicated hardware instances, teams can run training, inference, video analytics, and data processing jobs simultaneously with consistent performance, maximizing utilization while ensuring each workload meets its SLA.
Supported MIG-Backed vGPU Configurations on a Single GPU#
NVIDIA vGPU supports both homogeneous and mixed MIG-backed virtual GPU configurations, and on GPUs with MIG time-slicing support, each MIG instance supports multiple time-sliced vGPU VMs.
On the NVIDIA RTX PRO 6000 Blackwell Server Edition, up to 4 MIG slices can be created on a single GPU. Within each MIG slice, one to three time-sliced vGPUs for Compute with 8 GB of frame buffer each can be created, depending on workload requirements and user density goals. Each of these vGPU instances can be assigned to a separate VM, enabling up to 12 virtual machines to share a single physical GPU while still benefiting from the isolation boundaries provided by MIG.

The figure above shows how each MIG slice on the NVIDIA RTX PRO 6000 Blackwell can be time-sliced across multiple VMs - supporting up to 3 NVIDIA vGPU for Compute VMs per slice - to maximize user density while maintaining performance isolation through hardware-level partitioning.
Note
You can determine whether time-sliced, MIG-backed vGPUs are supported with your GPU on your chosen hypervisor by running the nvidia-smi -q command.
$ nvidia-smi -q
vGPU Device Capability
MIG Time-Slicing : Supported
MIG Time-Slicing Mode : Enabled
If MIG Time-Slicing is shown as Supported, the GPU supports time-sliced, MIG-backed vGPUs. If MIG Time-Slicing Mode is shown as Enabled, your chosen hypervisor supports time-sliced, MIG-backed vGPUs on GPUs that also support this feature.
The Ampere NVIDIA A100 PCIe 40GB card has one physical GPU and can support several types of MIG-backed vGPU configurations. The following figure shows examples of valid homogeneous and mixed MIG-backed virtual GPU configurations on NVIDIA A100 PCIe 40GB.
A valid homogeneous configuration with 3 A100-2-10C vGPUs on 3 MIG.2g.10b GPU instances
A valid homogeneous configuration with 2 A100-3-20C vGPUs on 3 MIG.3g.20b GPU instances
A valid mixed configuration with 1 A100-4-20C vGPU on a MIG.4g.20b GPU instance, 1 A100-2-10C vGPU on a MIG.2g.10b GPU instance, and 1 A100-1-5C vGPU on a MIG.1g.5b instance

Configuring MIG-Backed vGPU#
Configuring a GPU for MIG-Backed vGPUs#
To support GPU Instances with NVIDIA vGPU, a GPU must be configured with MIG mode enabled, and GPU Instances and Compute Instances must be created and configured on the physical GPU.
Prerequisites
The NVIDIA Virtual GPU Manager is installed on the hypervisor host.
You have root user privileges on your hypervisor host machine.
You have determined which GPU instances correspond to the vGPU types of the MIG-backed vGPUs you will create.
Other processes, such as CUDA applications, monitoring applications, or the nvidia-smi command, do not use the GPU.
Steps
Follow the procedures in the subsections that follow: enable MIG mode on the GPU, then create the GPU instances (and, optionally, non-default compute instances) that correspond to the vGPU types you plan to use.
Note
For VMware vSphere, only enabling MIG mode is required because VMware vSphere creates the GPU Instances and Compute Instances.
After configuring a GPU for MIG-backed vGPUs, create the vGPUs you need and add them to their VMs.
Enabling MIG Mode for a GPU#
Perform this task in your hypervisor command shell.
Open a command shell as the root user on your hypervisor host machine. You can use a secure shell (SSH) on all supported hypervisors. Individual hypervisors may provide additional means for logging in. For details, refer to the documentation for your hypervisor.
Determine whether MIG mode is enabled. Use the nvidia-smi command for this purpose. By default, MIG mode is disabled. This example shows that MIG mode is disabled on GPU 0.
Note
In the output from nvidia-smi, the NVIDIA A100 HGX 40GB GPU is referred to as A100-SXM4-40GB.
$ nvidia-smi -i 0
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.16    Driver Version: 550.54.16    CUDA Version: 12.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  A100-SXM4-40GB      On   | 00000000:36:00.0 Off |                    0 |
| N/A   29C    P0    62W / 400W |      0MiB / 40537MiB |      6%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
If MIG mode is disabled, enable it.
$ nvidia-smi -i [gpu-ids] -mig 1
gpu-ids - A comma-separated list of GPU indexes, PCI bus IDs, or UUIDs that specifies the GPUs on which you want to enable MIG mode. If gpu-ids is omitted, MIG mode is enabled on all GPUs on the system.
This example enables MIG mode on GPU 0.
$ nvidia-smi -i 0 -mig 1
Enabled MIG Mode for GPU 00000000:36:00.0
All done.
Note
If another process is using the GPU, this command fails and displays a warning message that MIG mode for the GPU is in the pending enable state. In this situation, stop all GPU processes and retry the command.
VMware vSphere ESXi with GPUs based only on the NVIDIA Ampere architecture: Reboot the hypervisor host. If you are using a different hypervisor or GPUs based on the NVIDIA Hopper GPU architecture or a later architecture, omit this step.
Query the GPUs on which you enabled MIG mode to confirm that MIG mode is enabled. This example queries GPU 0 for the PCI bus ID and MIG mode in comma-separated values (CSV) format.
$ nvidia-smi -i 0 --query-gpu=pci.bus_id,mig.mode.current --format=csv
pci.bus_id, mig.mode.current
00000000:36:00.0, Enabled
Creating GPU Instances on a MIG-Enabled GPU#
Note
If you are using VMware vSphere, omit this task. VMware vSphere creates the GPU instances automatically.
Perform this task in your hypervisor command shell.
Open a command shell as the root user on your hypervisor host machine if necessary.
List the GPU instance profiles that are available on your GPU. When you create GPU instances, you must specify the profiles by their IDs, not their names.
$ nvidia-smi mig -lgip
+--------------------------------------------------------------------------+
| GPU instance profiles:                                                   |
| GPU   Name            ID   Instances   Memory   P2P   SM   DEC   ENC    |
|                            Free/Total   GiB           CE   JPEG  OFA    |
|==========================================================================|
|   0  MIG 1g.5gb       19     7/7        4.95    No    14     0     0    |
|                                                         1     0     0    |
+--------------------------------------------------------------------------+
|   0  MIG 2g.10gb      14     3/3        9.90    No    28     1     0    |
|                                                         2     0     0    |
+--------------------------------------------------------------------------+
|   0  MIG 3g.20gb       9     2/2       19.79    No    42     2     0    |
|                                                         3     0     0    |
+--------------------------------------------------------------------------+
|   0  MIG 4g.20gb       5     1/1       19.79    No    56     2     0    |
|                                                         4     0     0    |
+--------------------------------------------------------------------------+
|   0  MIG 7g.40gb       0     1/1       39.59    No    98     5     0    |
|                                                         7     1     1    |
+--------------------------------------------------------------------------+
Discover the GPU instance profiles that are mapped to the different vGPU types.
nvidia-smi vgpu -s -v
For example, for H100, some of the listing attributes of certain vGPU profiles look something like:
# nvidia-smi vgpu -s -v -i 1
GPU 00000000:1A:00.0
    vGPU Type ID                : 0x335
        Name                    : NVIDIA H100-1-10C
        Class                   : Compute
        GPU Instance Profile ID : 19
    ...
    vGPU Type ID                : 0x336
        Name                    : NVIDIA H100-2-20C
        Class                   : Compute
        GPU Instance Profile ID : 14
    ...
Create the GPU instances with a default compute instance corresponding to the vGPU types of the MIG-backed vGPUs you will create.
$ nvidia-smi mig -cgi gpu-instance-profile-ids -C
gpu-instance-profile-ids - A comma-separated list of GPU instance profile IDs that specifies the GPU instances you want to create.
This example creates two GPU instances of type 2g.10gb with profile ID 14.
$ nvidia-smi mig -cgi 14,14 -C
Successfully created GPU instance ID 5 on GPU 2 using profile MIG 2g.10gb (ID 14)
Successfully created GPU instance ID 3 on GPU 2 using profile MIG 2g.10gb (ID 14)
Note
If you are creating a GPU Instance to support a 1:1 MIG-backed vGPU on a platform other than VMware vSphere, you can optionally create non-default Compute Instances for that vGPU, by following the steps outlined in the Creating Non-Default Compute Instances in a GPU Instance for 1:1 vGPUs section.
Creating Non-Default Compute Instances in a GPU Instance for 1:1 vGPUs#
This task is required only if you plan to use a 1:1, MIG-backed vGPU on a GPU Instance and wish to create non-default Compute Instances for that vGPU. This option is only available on platforms other than VMware vSphere.
Perform this task in your hypervisor command shell.
Open a command shell as the root user on your hypervisor host machine if necessary.
List the available GPU instances.
$ nvidia-smi mig -lgi
+----------------------------------------------------+
| GPU instances:                                     |
| GPU   Name          Profile  Instance   Placement |
|                     ID       ID         Start:Size|
|====================================================|
|   2  MIG 2g.10gb    14        3          0:2      |
+----------------------------------------------------+
|   2  MIG 2g.10gb    14        5          4:2      |
+----------------------------------------------------+
Create the compute instances that you need within each GPU instance.
$ nvidia-smi mig -cci -gi gpu-instance-ids
gpu-instance-ids - A comma-separated list of GPU instance IDs that specifies the GPU instances within which you want to create the compute instances.
Caution
To avoid an inconsistent state between a guest VM and the hypervisor host, do not create compute instances from the hypervisor on a GPU instance on which an active guest VM is running. Runtime changes to the vGPU’s Compute Instance configuration may be done by the guest VM itself, as explained in Modifying a MIG-Backed vGPU’s Configuration.
This example creates a compute instance on each GPU instance 3 and 5.
$ nvidia-smi mig -cci -gi 3,5
Successfully created compute instance on GPU 0 GPU instance ID 1 using profile ID 2
Successfully created compute instance on GPU 0 GPU instance ID 2 using profile ID 2
Verify that the compute instances were created within each GPU instance.
$ nvidia-smi
+-----------------------------------------------------------------------------+
| MIG devices:                                                                |
+------------------+----------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |         Memory-Usage |        Vol|        Shared         |
|      ID  ID  Dev |           BAR1-Usage | SM     Unc| CE  ENC  DEC  OFA  JPG|
|                  |                      |        ECC|                       |
|==================+======================+===========+=======================|
|  2    3   0   0  |      0MiB /  9984MiB | 28      0 |  2    0    1    0    0|
|                  |      0MiB / 16383MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  2    5   0   1  |      0MiB /  9984MiB | 28      0 |  2    0    1    0    0|
|                  |      0MiB / 16383MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
Note
Additional Compute Instances created in a VM at runtime are destroyed when the VM is shut down or rebooted. After the shutdown or reboot, only one Compute Instance remains in the VM.
On NVIDIA B200 HGX, the following Compute Instance combinations are blocked in vGPU for Compute Guest VMs running on full-sized (7-slice) GPU Instances:
4-slice
2-slice + 2-slice + 2-slice
3-slice + 2-slice + 2-slice
2-slice + 2-slice + 3-slice
Disabling MIG Mode for One or More GPUs#
If a GPU you want to use for time-sliced vGPUs or GPU passthrough has previously been configured for MIG-backed vGPUs, disable MIG mode on the GPU.
Prerequisites
The NVIDIA Virtual GPU Manager is installed on the hypervisor host.
You have root user privileges on your hypervisor host machine.
Other processes, such as CUDA applications, monitoring applications, or the nvidia-smi command, do not use the GPU.
Steps
Perform this task in your hypervisor command shell.
Open a command shell as the root user on your hypervisor host machine. You can use a secure shell (SSH) on all supported hypervisors. Individual hypervisors may provide additional means for logging in. For details, refer to the documentation for your hypervisor.
Determine whether MIG mode is disabled. Use the nvidia-smi command for this purpose. By default, MIG mode is disabled but might have previously been enabled. This example shows that MIG mode is enabled on GPU 0.
Note
In the output from nvidia-smi, the NVIDIA A100 HGX 40GB GPU is referred to as A100-SXM4-40GB.
$ nvidia-smi -i 0
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.16    Driver Version: 550.54.16    CUDA Version: 12.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  A100-SXM4-40GB      Off  | 00000000:36:00.0 Off |                    0 |
| N/A   29C    P0    62W / 400W |      0MiB / 40537MiB |      6%      Default |
|                               |                      |              Enabled |
+-------------------------------+----------------------+----------------------+
If MIG mode is enabled, disable it.
$ nvidia-smi -i [gpu-ids] -mig 0
gpu-ids - A comma-separated list of GPU indexes, PCI bus IDs, or UUIDs that specifies the GPUs for which you want to disable MIG mode. If gpu-ids is omitted, MIG mode is disabled for all GPUs in the system.
This example disables MIG mode on GPU 0.
$ sudo nvidia-smi -i 0 -mig 0
Disabled MIG Mode for GPU 00000000:36:00.0
All done.
Confirm that MIG mode was disabled. Use the nvidia-smi command for this purpose. This example shows that MIG mode is disabled on GPU 0.
$ nvidia-smi -i 0
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.16    Driver Version: 550.54.16    CUDA Version: 12.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  A100-SXM4-40GB      Off  | 00000000:36:00.0 Off |                    0 |
| N/A   29C    P0    62W / 400W |      0MiB / 40537MiB |      6%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
Modifying a MIG-Backed vGPU’s Configuration From a Guest VM#
If you want to replace the compute instances created when the GPU was configured for MIG-backed vGPUs, you can delete them before adding the compute instances from within the guest VM.
Note
From within a guest VM, you can modify the configuration only of MIG-backed vGPUs that occupy an entire GPU instance. For time-sliced, MIG-backed vGPUs, you must create compute instances as explained in Creating Non-Default Compute Instances in a GPU Instance for 1:1 vGPUs.
On NVIDIA B200 HGX, the following Compute Instance combinations are blocked in vGPU for Compute Guest VMs running on full-sized (7-slice) GPU Instances:
4-slice
2-slice + 2-slice + 2-slice
3-slice + 2-slice + 2-slice
2-slice + 2-slice + 3-slice
A MIG-backed vGPU that occupies an entire GPU instance is assigned all of the instance’s framebuffer. For such vGPUs, the maximum vGPUs per GPU instance in the tables in Virtual GPU Types for Supported GPUs is always 1.
Prerequisites
You have root user privileges in the guest VM.
Other processes, such as CUDA applications, monitoring applications, or the nvidia-smi command, do not use the GPU instance.
Steps
Perform this task in a guest VM command shell.
Open a command shell as the root user in the guest VM. You can use a secure shell (SSH) on all supported hypervisors. Individual hypervisors may provide additional means for logging in. For details, refer to the documentation for your hypervisor.
List the available GPU instances.
$ nvidia-smi mig -lgi
+----------------------------------------------------+
| GPU instances:                                     |
| GPU   Name          Profile  Instance   Placement |
|                     ID       ID         Start:Size|
|====================================================|
|   0  MIG 2g.10gb     0        0          0:8      |
+----------------------------------------------------+
Optional: If compute instances were created when the GPU was configured for MIG-backed vGPUs that you no longer require, delete them.
$ nvidia-smi mig -dci -ci compute-instance-id -gi gpu-instance-id
compute-instance-id - The ID of the compute instance that you want to delete.
gpu-instance-id - The ID of the GPU instance from which you want to delete the compute instance.
Note
This command fails if another process is using the GPU instance. In this situation, stop all processes using the GPU instance and retry the command.
This example deletes compute instance 0 from GPU instance 0 on GPU 0.
$ nvidia-smi mig -dci -ci 0 -gi 0
Successfully destroyed compute instance ID 0 from GPU 0 GPU instance ID 0
List the compute instance profiles that are available for your GPU instance.
$ nvidia-smi mig -lcip
This example shows that one MIG 2g.10gb compute instance or two MIG 1c.2g.10gb compute instances can be created within the GPU instance.
$ nvidia-smi mig -lcip
+-------------------------------------------------------------------------------+
| Compute instance profiles:                                                    |
| GPU     GPU       Name             Profile  Instances   Exclusive     Shared |
|       Instance                     ID       Free/Total     SM      DEC ENC OFA|
|         ID                                                          CE JPEG  |
|===============================================================================|
|   0      0       MIG 1c.2g.10gb       0      2/2           14       1   0   0|
|                                                                      2   0    |
+-------------------------------------------------------------------------------+
|   0      0       MIG 2g.10gb          1*     1/1           28       1   0   0|
|                                                                      2   0    |
+-------------------------------------------------------------------------------+
Create the compute instances that you need within the available GPU instance. Run the following command to create each compute instance individually.
$ nvidia-smi mig -cci compute-instance-profile-id -gi gpu-instance-id
compute-instance-profile-id - The compute instance profile ID that specifies the compute instance.
gpu-instance-id - The GPU instance ID that specifies the GPU instance within which you want to create the compute instance.
Note
This command fails if another process is using the GPU instance. In this situation, stop all GPU processes and retry the command.
This example creates a MIG 2g.10gb compute instance on GPU instance 0.
$ nvidia-smi mig -cci 1 -gi 0
Successfully created compute instance ID 0 on GPU 0 GPU instance ID 0 using profile MIG 2g.10gb (ID 1)
This example creates two MIG 1c.2g.10gb compute instances on GPU instance 0 by running the same command twice.
$ nvidia-smi mig -cci 0 -gi 0
Successfully created compute instance ID 0 on GPU 0 GPU instance ID 0 using profile MIG 1c.2g.10gb (ID 0)
$ nvidia-smi mig -cci 0 -gi 0
Successfully created compute instance ID 1 on GPU 0 GPU instance ID 0 using profile MIG 1c.2g.10gb (ID 0)
Verify that the compute instances were created within the GPU instance. Use the nvidia-smi command for this purpose. This example confirms that a MIG 2g.10gb compute instance was created on GPU instance 0.
nvidia-smi
Mon Mar 25 19:01:24 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.16    Driver Version: 550.54.16    CUDA Version: 12.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GRID A100X-2-10C    On   | 00000000:00:08.0 Off |                   On |
| N/A   N/A    P0    N/A /  N/A |   1058MiB / 10235MiB |     N/A      Default |
|                               |                      |              Enabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| MIG devices:                                                                |
+------------------+----------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |         Memory-Usage |        Vol|        Shared         |
|      ID  ID  Dev |           BAR1-Usage | SM     Unc| CE  ENC  DEC  OFA  JPG|
|                  |                      |        ECC|                       |
|==================+======================+===========+=======================|
|  0    0   0   0  |   1058MiB / 10235MiB | 28      0 |  2    0    1    0    0|
|                  |      0MiB /  4096MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
This example confirms that two MIG 1c.2g.10gb compute instances were created on GPU instance 0.
$ nvidia-smi
Mon Mar 25 19:01:24 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.16    Driver Version: 550.54.16    CUDA Version: 12.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GRID A100X-2-10C    On   | 00000000:00:08.0 Off |                   On |
| N/A   N/A    P0    N/A /  N/A |   1058MiB / 10235MiB |     N/A      Default |
|                               |                      |              Enabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| MIG devices:                                                                |
+------------------+----------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |         Memory-Usage |        Vol|        Shared         |
|      ID  ID  Dev |           BAR1-Usage | SM     Unc| CE  ENC  DEC  OFA  JPG|
|                  |                      |        ECC|                       |
|==================+======================+===========+=======================|
|  0    0   0   0  |   1058MiB / 10235MiB | 14      0 |  2    0    1    0    0|
|                  |      0MiB /  4096MiB |           |                       |
+------------------+                      +-----------+-----------------------+
|  0    0   1   1  |                      | 14      0 |  2    0    1    0    0|
|                  |                      |           |                       |
+------------------+----------------------+-----------+-----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Monitoring MIG-backed vGPU Activity#
Note
MIG-backed vGPU activity cannot be monitored on GPUs based on the NVIDIA Ampere GPU architecture because the required hardware feature is absent.
On the NVIDIA RTX Pro 6000 Blackwell Server Edition, GPM metrics are supported only for 1:1 MIG-backed vGPUs and are not available for time-sliced, MIG-backed vGPUs.
The --gpm-metrics option is supported only on MIG-backed vGPUs that are allocated all of the GPU instance’s frame buffer.
For more information, refer to the Monitoring MIG-backed vGPU Activity documentation.
Device Groups#
Device Groups provide an abstraction layer for multi-device virtual hardware provisioning. They enable platforms to automatically detect sets of physically connected devices (such as GPUs linked via NVLink or GPU-NIC pairs) at the hardware level and present them as a single logical unit to VMs. This abstraction is particularly valuable for AI workloads that depend on low-latency, high-bandwidth communication, such as distributed model training, inference, and large-scale data processing, because it ensures maximum utilization of the underlying hardware topology.
Device groups can consist of two or more hardware devices that share a common PCIe switch or a direct interconnect. This simplifies virtual hardware assignment and enables:
Optimized Multi-GPU and GPU-NIC communication: NVLink-connected GPUs can be provisioned together to maximize peer-to-peer bandwidth and minimize latency, which is ideal for large-batch training and NCCL all-reduce-heavy workloads. Similarly, GPU-NIC pairs located under the same PCIe switch or capable of delivering optimal GPUDirect RDMA performance are grouped together, enabling high-throughput data ingestion directly into GPU memory for training or inference workloads. Adjacent NICs that do not meet the required performance thresholds are automatically excluded to avoid bottlenecks.
Topology consistency: Unlike manual device assignment, Device Groups guarantee correct placement across PCIe switches and interconnects, even after reboots or events like live migration.
Simplified and reliable provisioning: By abstracting the PCIe/NVLink topology into logical units, device groups eliminate the need for scripting or topology mapping, reducing the risk of misconfiguration and enabling faster deployment of AI clusters.

This figure illustrates how devices (GPUs and NICs) that share a common PCIe switch or a direct GPU interconnect can be presented as a device group. On the right side, we can see that although two NICs are connected to the same PCIe switch as the GPU, only one NIC is included in the device group. This is because the NVIDIA driver identifies and exposes only the GPU-NIC pairings that meet the necessary criteria like GPUDirect RDMA. Adjacent NICs that do not satisfy these requirements are excluded.
For more information regarding Hypervisor Platform support for Device Groups, refer to the vGPU Device Groups documentation.
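Although device groups themselves are surfaced by the hypervisor, the underlying GPU/NIC topology that drives them can be inspected with nvidia-smi. The following sketch shows the interconnect matrix; output is abridged and platform-dependent.

```bash
# Show the GPU/NIC interconnect topology (NVLink, PCIe switch, NUMA affinity).
nvidia-smi topo -m
```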
GPUDirect RDMA and GPUDirect Storage#
NVIDIA GPUDirect Remote Direct Memory Access (RDMA) is a technology in NVIDIA GPUs that enables direct data exchange between GPUs and a third-party peer device using PCIe. GPUDirect RDMA enables network devices to access the vGPU frame buffer directly, bypassing CPU host memory altogether. The third-party devices could be network interfaces such as NVIDIA ConnectX SmartNICs or BlueField DPUs, or video acquisition adapters.
GPUDirect Storage (GDS) enables a direct data path between local or remote storage, such as NFS servers or NVMe/NVMe over Fabric (NVMe-oF), and GPU memory. GDS performs direct memory access (DMA) transfers between GPU memory and storage. DMA avoids a bounce buffer through the CPU. This direct path increases system bandwidth and decreases the latency and utilization load on the CPU.
GPUDirect technology is supported only on a subset of vGPUs and guest OS releases.
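Inside a guest that meets these requirements, GPUDirect Storage readiness can be checked with the gdscheck utility shipped with GDS. The install path below is typical but may vary with your CUDA version.

```bash
# Report GDS platform support and the drivers/filesystems it detects.
/usr/local/cuda/gds/tools/gdscheck -p
```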
GPUDirect RDMA and GPUDirect Storage Known Issues and Limitations#
Starting with GPUDirect Storage technology release 1.7.2, the following limitations apply:
GPUDirect Storage technology is not supported on GPUs based on the NVIDIA Ampere GPU architecture.
On GPUs based on the NVIDIA Ada Lovelace, Hopper, and Blackwell GPU architectures, GPUDirect Storage technology is supported only with the guest driver for Linux based on NVIDIA Linux open GPU kernel modules.
GPUDirect Storage technology releases before 1.7.2 are supported only with guest drivers with Linux kernel versions earlier than 6.6.
GPUDirect Storage technology is supported only on the following guest OS releases:
Red Hat Enterprise Linux 8.8+
Ubuntu 22.04 LTS
Ubuntu 24.04 LTS
Hypervisor Platform Support for GPUDirect RDMA and GPUDirect Storage#
| Hypervisor Platform | Version |
|---|---|
| Red Hat Enterprise Linux with KVM | 8.8+ |
| Ubuntu | |
| VMware vSphere | |
vGPU Support for GPUDirect RDMA and GPUDirect Storage#
GPUDirect RDMA and GPUDirect Storage technology are supported on all time-sliced and MIG-backed NVIDIA vGPU for Compute on physical GPUs that support single root I/O virtualization (SR-IOV).
For a list of supported GPUs, refer to the Supported NVIDIA GPUs and Networking section in the NVIDIA AI Enterprise Infra Support Matrix.
Guest OS Releases Support for GPUDirect RDMA and GPUDirect Storage#
Linux only. GPUDirect technology is not supported on Windows.
Network Interface Cards Support for GPUDirect RDMA and GPUDirect Storage#
GPUDirect technology is supported on the following network interface cards:
NVIDIA ConnectX-8 SmartNIC
NVIDIA ConnectX-7 SmartNIC
Mellanox ConnectX-6 SmartNIC
Mellanox ConnectX-5 Ethernet adapter card
Heterogeneous vGPU#
Heterogeneous vGPU allows a single physical GPU to simultaneously support multiple vGPU profiles with different memory allocations (framebuffer sizes). This configuration is particularly beneficial for environments where VMs have diverse GPU resource requirements. By enabling the same physical GPU to host vGPUs of varying sizes, heterogeneous vGPU optimizes overall resource usage, ensuring VMs access only the necessary GPU resources and preventing underutilization.
When a GPU is configured for heterogeneous vGPU, its behavior during events like a host reboot, NVIDIA Virtual GPU Manager reload, or GPU reset varies by hypervisor. This configuration only supports the Best Effort and Equal Share schedulers.
Heterogeneous vGPU is supported on Volta and later GPUs. For additional information and operational instructions across different hypervisors, refer to the Heterogeneous vGPU documentation.
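The current setting is reported by nvidia-smi in the vGPU Heterogeneous Mode field, as seen in the query output shown earlier. For example, on the hypervisor host:

```bash
nvidia-smi -q | grep -i "vGPU Heterogeneous Mode"
```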
Platform Support for Heterogeneous vGPUs#
| Hypervisor Platform | NVIDIA AI Enterprise Infra Release | Documentation |
|---|---|---|
| Red Hat Enterprise Linux with KVM | | |
| Canonical Ubuntu with KVM | | |
| VMware vSphere | | |
Live Migration#
Live migration enables the seamless transfer of VMs configured with NVIDIA vGPUs from one physical host to another without downtime. This capability enables enterprises to maintain continuous operations during infrastructure changes, balancing workloads, or reallocating resources with minimal disruption. Live migration offers significant operational benefits, including enhanced business continuity, scalability, and agility.
For additional information about this feature and instructions on how to perform the operation across different hypervisors, refer to the vGPU Live Migration documentation.
Live Migration Known Issues and Limitations#
| Hypervisor Platform | Documentation |
|---|---|
| Red Hat Enterprise Linux with KVM | Known Issues and Limitations with NVIDIA vGPU for Compute Migration on RHEL KVM |
| Ubuntu with KVM | Known Issues and Limitations with NVIDIA vGPU for Compute Migration on Ubuntu KVM |
| VMware vSphere | Known Issues and Limitations with NVIDIA vGPU for Compute Migration on VMware vSphere |
Platform Support for Live Migration#
| Hypervisor Platform | Version | NVIDIA AI Enterprise Infra Release | Documentation |
|---|---|---|---|
| Red Hat Enterprise Linux with KVM | | | Migrating a VM Configured with NVIDIA vGPU for Compute on RHEL KVM |
| Ubuntu with KVM | 24.04 LTS | | Migrating a VM Configured with NVIDIA vGPU for Compute on Linux KVM |
| VMware vSphere | | All active NVIDIA AI Enterprise Infra Releases | Migrating a VM Configured with NVIDIA vGPU for Compute on VMware vSphere |
Note
Live Migration is not supported between RHEL 10 and RHEL 9.4.
vGPU Support for Live Migration#
For a list of supported GPUs, refer to the Supported NVIDIA GPUs and Networking section in the NVIDIA AI Enterprise Infra Support Matrix.
Note
Live Migration is not supported between 80GB PCIe and 94GB NVL variants of GPU Boards
Live Migration is not supported between H200 / H800 / H100 GPU Boards
Multi-vGPU and P2P#
Multi-vGPU technology allows a single VM to simultaneously leverage multiple vGPUs, significantly enhancing its computational capabilities. Unlike standard vGPU configurations that virtualize a single physical GPU for sharing across multiple VMs, Multi-vGPU presents resources from several vGPU devices to a single VM. These vGPU devices are not required to reside on the same physical GPU; they can be distributed across separate physical GPUs, pooling their collective power to meet the demands of high-performance workloads.
This technology is particularly advantageous for AI training and inference workloads that require extensive computational power. It optimizes resource allocation by enabling applications within a VM to access dedicated GPU resources. For instance, a VM configured with two NVIDIA A100 GPUs using Multi-vGPU can run large-scale AI models more efficiently than with a single GPU. This dedicated assignment eliminates resource contention between different AI processes within the same VM, ensuring optimal and predictable performance for critical tasks. The ability to aggregate computational power from multiple vGPUs makes Multi-vGPU an ideal solution for scaling complex AI model development and deployment.
Peer-To-Peer (P2P) CUDA Transfers#
Peer-to-Peer (P2P) CUDA transfers enable device memory between vGPUs on different GPUs that are assigned to the same VM to be accessed from within the CUDA kernels. NVLink is a high-bandwidth interconnect that enables fast communication between such vGPUs.
P2P CUDA transfers over NVLink are supported only on a subset of vGPUs, hypervisor releases, and guest OS releases.
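Within the guest VM, the NVLink peer-to-peer capability between the assigned vGPUs can be checked with nvidia-smi before running P2P-dependent CUDA code. This is a sketch; the -p2p query may not be available on every driver branch.

```bash
# Show the interconnect matrix and the peer-to-peer read capability between GPUs.
nvidia-smi topo -m
nvidia-smi topo -p2p r
```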
Peer-to-Peer CUDA Transfers Known Issues and Limitations#
Only time-sliced vGPUs are supported. MIG-backed vGPUs are not supported.
P2P transfers over PCIe are not supported.
Hypervisor Platform Support for Multi-vGPU and P2P#
| Hypervisor Platform | NVIDIA AI Enterprise Infra Release | Supported vGPU Types | Documentation |
|---|---|---|---|
| Red Hat Enterprise Linux with KVM | All active NVIDIA AI Enterprise Infra Releases | All NVIDIA vGPU for Compute with PCIe GPUs; on supported GPUs, both time-sliced and MIG-backed vGPUs are supported. | |
| Ubuntu with KVM | All active NVIDIA AI Enterprise Infra Releases | All NVIDIA vGPU for Compute with PCIe GPUs; on supported GPUs, both time-sliced and MIG-backed vGPUs are supported. | |
| VMware vSphere | All active NVIDIA AI Enterprise Infra Releases | All NVIDIA vGPU for Compute; on supported GPUs, both time-sliced and MIG-backed vGPUs are supported. | |
Note
P2P CUDA transfers are not supported on Windows. Only Linux OS distros as outlined in NVIDIA AI Enterprise Infrastructure Support Matrix are supported.
vGPU Support for Multi-vGPU#
You can assign multiple vGPUs with differing amounts of frame buffer to a single VM, provided the board type and the series of all the vGPUs are the same. For example, you can assign an A40-48C vGPU and an A40-16C vGPU to the same VM. However, you cannot assign an A30-8C vGPU and an A16-8C vGPU to the same VM.
Board |
vGPU [1] |
---|---|
NVIDIA HGX B200 180GB |
Generic Linux with KVM hypervisors [6], Red Hat Enterprise Linux KVM, and Ubuntu: - All NVIDIA vGPU for Compute |
NVIDIA RTX PRO 6000 Blackwell SE 96GB |
|
Board |
vGPU [1] |
---|---|
NVIDIA H800 PCIe 94GB (H800 NVL) |
All NVIDIA vGPU for Compute |
NVIDIA H800 PCIe 80GB |
All NVIDIA vGPU for Compute |
NVIDIA H800 SXM5 80GB |
NVIDIA vGPU for Compute [5] |
NVIDIA H200 PCIe 141GB (H200 NVL) |
All NVIDIA vGPU for Compute |
NVIDIA H200 SXM5 141GB |
NVIDIA vGPU for Compute [5] |
NVIDIA H100 PCIe 94GB (H100 NVL) |
All NVIDIA vGPU for Compute |
NVIDIA H100 SXM5 94GB |
NVIDIA vGPU for Compute [5] |
NVIDIA H100 PCIe 80GB |
All NVIDIA vGPU for Compute |
NVIDIA H100 SXM5 80GB |
NVIDIA vGPU for Compute [5] |
NVIDIA H100 SXM5 64GB |
NVIDIA vGPU for Compute [5] |
NVIDIA H20 SXM5 141GB |
NVIDIA vGPU for Compute [5] |
NVIDIA H20 SXM5 96GB |
NVIDIA vGPU for Compute [5] |
Board |
vGPU |
---|---|
NVIDIA L40 |
|
NVIDIA L40S |
|
NVIDIA L20 |
|
NVIDIA L4 |
|
NVIDIA L2 |
|
NVIDIA RTX 6000 Ada |
|
NVIDIA RTX 5880 Ada |
|
NVIDIA RTX 5000 Ada |
|
Board |
vGPU [1] |
---|---|
|
|
NVIDIA A800 PCIe 40GB active-cooled |
|
NVIDIA A800 HGX 80GB |
|
|
|
NVIDIA A100 HGX 80GB |
|
NVIDIA A100 PCIe 40GB |
|
NVIDIA A100 HGX 40GB |
|
NVIDIA A40 |
|
|
|
NVIDIA A16 |
|
NVIDIA A10 |
|
NVIDIA RTX A6000 |
|
NVIDIA RTX A5500 |
|
NVIDIA RTX A5000 |
|
Board |
vGPU |
---|---|
Tesla T4 |
|
Quadro RTX 6000 passive |
|
Quadro RTX 8000 passive |
|
Board |
vGPU |
---|---|
Tesla V100 SXM2 |
|
Tesla V100 SXM2 32GB |
|
Tesla V100 PCIe |
|
Tesla V100 PCIe 32GB |
|
Tesla V100S PCIe 32GB |
|
Tesla V100 FHHL |
|
vGPU Support for P2P#
Only NVIDIA vGPU for Compute time-sliced vGPUs that are allocated all of the framebuffer of a physical GPU that supports NVLink are supported.
Board |
vGPU |
---|---|
NVIDIA HGX B200 180GB |
NVIDIA B200X-180C |
Board |
vGPU |
---|---|
NVIDIA H800 PCIe 94GB (H800 NVL) |
H800L-94C |
NVIDIA H800 PCIe 80GB |
H800-80C |
NVIDIA H200 PCIe 141GB (H200 NVL) |
H200-141C |
NVIDIA H200 SXM5 141GB |
H200X-141C |
NVIDIA H100 PCIe 94GB (H100 NVL) |
H100L-94C |
NVIDIA H100 SXM5 94GB |
H100XL-94C |
NVIDIA H100 PCIe 80GB |
H100-80C |
NVIDIA H100 SXM5 80GB |
H100XM-80C |
NVIDIA H100 SXM5 64GB |
H100XS-64C |
NVIDIA H20 SXM5 141GB |
H20X-141C |
NVIDIA H20 SXM5 96GB |
H20-96C |
Board |
vGPU |
---|---|
NVIDIA A800 PCIe 80GB |
A800D-80C |
NVIDIA A800 PCIe 40GB active-cooled |
A800-40C |
NVIDIA A800 HGX 80GB |
A800DX-80C [2] |
NVIDIA A100 PCIe 80GB |
A100D-80C |
NVIDIA A100 HGX 80GB |
A100DX-80C [2] |
NVIDIA A100 PCIe 40GB |
A100-40C |
NVIDIA A100 HGX 40GB |
A100X-40C [2] |
NVIDIA A40 |
A40-48C |
NVIDIA A30 |
A30-24C |
NVIDIA A16 |
A16-16C |
NVIDIA A10 |
A10-24C |
NVIDIA RTX A6000 |
A6000-48C |
NVIDIA RTX A5500 |
A5500-24C |
NVIDIA RTX A5000 |
A5000-24C |
Board |
vGPU |
---|---|
Quadro RTX 8000 passive |
RTX8000P-48C |
Quadro RTX 6000 passive |
RTX6000P-24C |
Board |
vGPU |
---|---|
Tesla V100 SXM2 |
V100X-16C |
Tesla V100 SXM2 32GB |
V100DX-32C |
NVIDIA NVSwitch#
NVIDIA NVSwitch provides a high-bandwidth, low-latency interconnect fabric that enables seamless, direct communication between multiple GPUs within a system. NVIDIA NVSwitch enables peer-to-peer vGPU communication within a single node over the NVLink fabric. The NVSwitch acts as a high-speed crossbar, allowing any GPU to communicate with any other GPU at full NVLink speed, significantly improving communication efficiency and bandwidth compared to traditional PCIe-based interconnections. It facilitates the creation of large GPU clusters, enabling AI and deep learning applications to efficiently utilize pooled GPU memory and compute resources for complex, computationally intensive tasks. It is supported only on a subset of hardware platforms, vGPUs, hypervisor software releases, and guest OS releases.
For information about using the NVSwitch, refer to the NVIDIA Fabric Manager documentation.
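For illustration, the sketch below uses the NVML C API (shipped with the driver; not an NVSwitch-specific interface) to report the NVLink link state of the first device visible in a guest. Whether these queries are exposed on a given vGPU type depends on the driver release, so treat this as an optional diagnostic rather than a guaranteed interface.

```cpp
// Sketch: query NVLink link state with NVML from inside a guest VM.
// Build with a C++ compiler and link against -lnvidia-ml.
#include <cstdio>
#include <nvml.h>

int main() {
    if (nvmlInit_v2() != NVML_SUCCESS) {
        std::printf("NVML initialization failed.\n");
        return 1;
    }
    nvmlDevice_t device;
    if (nvmlDeviceGetHandleByIndex_v2(0, &device) == NVML_SUCCESS) {
        for (unsigned int link = 0; link < NVML_NVLINK_MAX_LINKS; ++link) {
            nvmlEnableState_t isActive;
            // Links that are not present or not exposed return an error and are skipped.
            if (nvmlDeviceGetNvLinkState(device, link, &isActive) == NVML_SUCCESS) {
                std::printf("NVLink link %u: %s\n", link,
                            isActive == NVML_FEATURE_ENABLED ? "active" : "inactive");
            }
        }
    }
    nvmlShutdown();
    return 0;
}
```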
Platform Support for NVIDIA NVSwitch#
NVIDIA HGX B200 8-GPU baseboard
NVIDIA HGX H200 8-GPU baseboard
NVIDIA HGX H100 8-GPU baseboard
NVIDIA HGX H800 8-GPU baseboard
NVIDIA HGX A100 8-GPU baseboard
NVIDIA NVSwitch Limitations#
Only time-sliced vGPUs are supported. MIG-backed vGPUs are not supported.
GPU passthrough is not supported on NVIDIA Systems that include NVSwitch when using VMware vSphere.
All vGPUs communicating peer-to-peer must be assigned to the same VM.
On GPUs based on the NVIDIA Hopper and Blackwell GPU architectures, multicast is supported when unified memory (UVM) is enabled.
VMware vSphere is not supported on NVIDIA HGX B200.
Hypervisor Platform Support for NVSwitch#
Consult the documentation from your hypervisor vendor for information about which generic Linux with KVM hypervisor software releases support NVIDIA NVSwitch.
All supported Red Hat Enterprise Linux KVM and Ubuntu KVM releases support NVIDIA NVSwitch.
The earliest VMware vSphere Hypervisor (ESXi) release that supports NVIDIA NVSwitch depends on the GPU architecture.
GPU Architecture |
Earliest Supported VMware vSphere Hypervisor (ESXi) Release |
---|---|
NVIDIA Blackwell |
Not supported on VMware |
NVIDIA Hopper |
VMware vSphere Hypervisor (ESXi) 8 update 2 |
NVIDIA Ampere |
VMware vSphere Hypervisor (ESXi) 8 update 1 |
vGPU Support for NVSwitch#
Only NVIDIA vGPU for Compute time-sliced vGPUs that are allocated all of the physical GPU’s framebuffer on the following GPU boards are supported:
NVIDIA A800
NVIDIA A100 HGX
NVIDIA B200 HGX
NVIDIA H800
NVIDIA H200 HGX
NVIDIA H100 SXM5
NVIDIA H20
Board |
vGPU |
---|---|
NVIDIA A800 HGX 80GB |
A800DX-80C |
NVIDIA A100 HGX 80GB |
A100DX-80C |
NVIDIA A100 HGX 40GB |
A100X-40C |
Board |
vGPU |
---|---|
NVIDIA B200 HGX 180GB |
B200X-180C |
Board |
vGPU |
---|---|
NVIDIA H800 SXM5 80GB |
H800XM-80C |
NVIDIA H200 SXM5 141GB |
H200X-141C |
NVIDIA H100 SXM5 80GB |
H100XM-80C |
NVIDIA H20 SXM5 141GB |
H20X-141C |
NVIDIA H20 SXM5 96GB |
H20-96C |
Guest OS Releases Support for NVSwitch#
Linux only. NVIDIA NVSwitch is not supported on Windows.
NVLink Multicast#
NVLink multicast support requires that unified memory is enabled. For more information about enabling unified memory, refer to the Enabling Unified Memory for a vGPU documentation.
vGPU Support for NVLink Multicast#
Only full-sized, time-sliced NVIDIA vGPU for Compute vGPUs (vGPUs that are allocated the entire framebuffer of the physical GPU) support NVLink multicast.
Scheduling Policies#
NVIDIA vGPU for Compute offers a range of scheduling policies that allow administrators to customize resource allocation based on workload intensity and organizational priorities, ensuring optimal resource utilization and alignment with business needs. These policies determine how GPU resources are shared across multiple VMs and directly impact factors like latency, throughput, and performance stability in multi-tenant environments.
For workloads with varying demands, time slicing plays a critical role in determining scheduling efficiency. The vGPU scheduler time slice represents the duration a VM’s work is allowed to run on the GPU before it is preempted. A longer time slice maximizes throughput for compute-heavy workloads, such as CUDA applications, by minimizing context switching. In contrast, a shorter time slice reduces latency, making it ideal for latency-sensitive tasks like graphics applications.
NVIDIA provides three scheduling modes: Best Effort, Equal Share, and Fixed Share, each designed for different workload requirements and environments. For more information, refer to the vGPU Schedulers documentation.
Refer to the Changing Scheduling Behavior for Time-Sliced vGPUs documentation for how to configure and adjust scheduling policies to meet specific resource distribution needs.
Suspend-Resume#
The suspend-resume feature allows NVIDIA vGPU-configured VMs to be temporarily paused and later resumed without losing their operational state. During suspension, the entire VM state, including GPU and compute resources, is saved to disk, thereby freeing these resources on the host. Upon resumption, the state is fully restored, enabling seamless workload continuation.
This capability provides operational flexibility and optimizes resource utilization. It is valuable for planned host maintenance, freeing up resources by pausing non-critical workloads, and ensuring consistent environments for development and testing.
Unlike live migration, suspend-resume involves downtime during both suspension and resumption. Cross-host operations require strict compatibility across hosts, encompassing GPU type, Virtual GPU manager version, memory configuration, and NVLink topology.
Suspend-resume is supported on all GPUs that enable vGPU functionality; however, compatibility varies by hypervisor, NVIDIA vGPU software release, and guest operating system.
For additional information and operational instructions across different hypervisors, refer to the vGPU Suspend-Resume documentation.
Suspend-Resume Known Issues and Limitations#
Hypervisor Platform |
Documentation |
---|---|
VMware vSphere |
Known Issues and Limitations with Suspend Resume on VMware vSphere |
Note
While a suspended VM can generally be resumed on any host running a compatible Virtual GPU Manager, a current bug in Red Hat Enterprise Linux 9.4 and Ubuntu 24.04 LTS limits suspend, resume, and migration to hosts with an identical Virtual GPU Manager version. The issue has been resolved in Red Hat Enterprise Linux 9.6 and later.
Platform Support for Suspend-Resume#
Suspend-resume is supported on all GPUs that support NVIDIA vGPU for Compute, but compatibility varies by hypervisor, release version, and guest operating system.
Hypervisor Platform |
Version |
NVIDIA AI Enterprise Infra Release |
Documentation |
---|---|---|---|
Red Hat Enterprise Linux with KVM |
|
|
Suspending and Resuming a VM Configured with NVIDIA vGPU for Compute on RHEL KVM |
Ubuntu with KVM |
24.04 LTS |
|
Suspending and Resuming a VM Configured with NVIDIA vGPU for Compute on Ubuntu KVM |
VMware vSphere |
|
All active NVIDIA AI Enterprise Infra Releases |
Suspending and Resuming a VM Configured with NVIDIA vGPU for Compute on VMware vSphere |
vGPU Support for Suspend-Resume#
For a list of supported GPUs, refer to the Supported NVIDIA GPUs and Networking section in the NVIDIA AI Enterprise Infra Support Matrix.
Unified Virtual Memory (UVM)#
Unified Virtual Memory (UVM) provides a single, cohesive memory address space accessible by both the CPUs and GPUs within a system. This feature creates a managed memory pool, allowing data to be allocated and accessed by code executing on either processor. The primary benefit is the simplification of programming and enhanced performance for GPU-accelerated workloads, as it eliminates the need for applications to explicitly manage data transfers between CPU and GPU memory. For additional information about this feature, refer to the Unified Virtual Memory documentation.
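Once unified memory has been enabled for the vGPU (see the hypervisor-specific instructions later in this section), guest applications use managed allocations exactly as they would on a bare-metal GPU. A minimal sketch using only the standard CUDA runtime API:

```cpp
// Sketch: a managed (unified memory) allocation touched by both CPU and GPU
// inside a guest VM whose vGPU has unified memory enabled. If UVM is not
// enabled for the vGPU, cudaMallocManaged is expected to fail.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    float* data = nullptr;
    cudaError_t status = cudaMallocManaged(&data, n * sizeof(float));
    if (status != cudaSuccess) {
        std::printf("cudaMallocManaged failed: %s (is UVM enabled for this vGPU?)\n",
                    cudaGetErrorString(status));
        return 1;
    }
    for (int i = 0; i < n; ++i) data[i] = 1.0f;      // CPU writes the managed buffer
    scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f);  // GPU updates the same pointer
    cudaDeviceSynchronize();
    std::printf("data[0] = %f\n", data[0]);          // CPU reads the result back
    cudaFree(data);
    return 0;
}
```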

UVM Known Issues and Limitations#
Unified Virtual Memory (UVM) is restricted to 1:1 time-sliced and MIG vGPU for Compute profiles that allocate the entire framebuffer of a compatible physical GPU or GPU Instance. Fractional time-sliced vGPUs do not support UVM.
UVM is only supported on Linux Guest OS distros. Windows Guest OS is not supported.
Enabling UVM disables vGPU migration for the VM, which may reduce operational flexibility in environments reliant on live migration.
UVM is disabled by default and must be explicitly enabled for each vGPU that requires it by setting a specific vGPU plugin parameter for the VM.
When deploying NVIDIA NIM, if UVM is enabled and an optimized engine is available, the model will run on the TensorRT-LLM (TRT-LLM) backend. Otherwise, it will typically run on the vLLM backend.
Hypervisor Platform Support for UVM#
Unified Virtual Memory (UVM) is disabled by default. To use it, you must enable unified memory individually for each vGPU for Compute VM that requires it by setting a vGPU plugin parameter. How you enable UVM for a vGPU VM depends on the hypervisor that you are using.
Hypervisor Platform |
Documentation |
---|---|
Red Hat Enterprise Linux with KVM |
Enabling Unified Memory for NVIDIA vGPU for Compute VM on Red Hat Enterprise Linux KVM |
Ubuntu with KVM |
Enabling Unified Memory for NVIDIA vGPU for Compute VM on Ubuntu KVM |
VMware vSphere |
Enabling Unified Memory for NVIDIA vGPU for Compute VM on VMware vSphere |
vGPU Support for UVM#
UVM is supported on 1:1 MIG-backed and time-sliced vGPUs, that is, vGPUs that are allocated the entire framebuffer of a MIG GPU Instance or physical GPU.
Board |
vGPU |
---|---|
NVIDIA HGX B200 SXM |
|
NVIDIA RTX PRO 6000 Blackwell SE |
|
Board |
vGPU |
---|---|
NVIDIA H800 PCIe 94GB (H800 NVL) |
|
NVIDIA H800 PCIe 80GB |
|
NVIDIA H800 SXM5 80GB |
|
NVIDIA H200 SXM5 |
|
NVIDIA H200 NVL |
|
NVIDIA H100 PCIe 94GB (H100 NVL) |
|
NVIDIA H100 SXM5 94GB |
|
NVIDIA H100 PCIe 80GB |
|
NVIDIA H100 SXM5 80GB |
|
NVIDIA H100 SXM5 64GB |
|
NVIDIA H20 SXM5 141GB |
|
NVIDIA H20 SXM5 96GB |
|
Board |
vGPU |
---|---|
NVIDIA L40 |
L40-48C |
NVIDIA L40S |
L40S-48C |
NVIDIA L20 |
L20-48C |
NVIDIA L4 |
L4-24C |
NVIDIA L2 |
L2-24C |
NVIDIA RTX 6000 Ada |
RTX 6000 Ada-48C |
NVIDIA RTX 5880 Ada |
RTX 5880 Ada-48C |
NVIDIA RTX 5000 Ada |
RTX 5000 Ada-32C |
Board |
vGPU |
---|---|
|
|
NVIDIA A800 PCIe 40GB active-cooled |
|
NVIDIA A800 HGX 80GB |
|
|
|
NVIDIA A100 HGX 80GB |
|
NVIDIA A100 PCIe 40GB |
|
NVIDIA A100 HGX 40GB |
|
NVIDIA A40 |
A40-48C |
|
|
NVIDIA A16 |
A16-16C |
NVIDIA A10 |
A10-24C |
NVIDIA RTX A6000 |
A6000-48C |
NVIDIA RTX A5500 |
A5500-24C |
NVIDIA RTX A5000 |
A5000-24C |
Product Limitations and Known Issues#
Red Hat Enterprise Linux with KVM Limitations and Known Issues#
Refer to the following lists of known Red Hat Enterprise Linux with KVM product limitations and product issues.
Ubuntu KVM Limitations and Known Issues#
Refer to the following lists of known Ubuntu KVM product limitations and product issues.
VMware vSphere Limitations and Known Issues#
Refer to the following lists of known VMware vSphere product limitations and product issues.
Requirements for Using vGPU for Compute on VMware vSphere for GPUs Requiring 64 GB+ of MMIO Space with Large-Memory VMs#
Some GPUs require 64 GB or more of MMIO space. When a vGPU on a GPU that requires 64 GB or more of MMIO space is assigned to a VM with 32 GB or more of memory on ESXi, the VM’s MMIO space must be increased to the amount of MMIO space that the GPU requires.
For detailed information about this limitation, refer to the Requirements for Using vGPU on GPUs Requiring 64 GB or More of MMIO Space with Large-Memory VMs documentation.
GPU |
MMIO Space Required |
---|---|
NVIDIA B200 |
768GB |
NVIDIA H200 (all variants) |
512GB |
NVIDIA H100 (all variants) |
256GB |
NVIDIA H800 (all variants) |
256GB |
NVIDIA H20 141GB |
512GB |
NVIDIA H20 96GB |
256GB |
NVIDIA L40 |
128GB |
NVIDIA L20 |
128GB |
NVIDIA L4 |
64GB |
NVIDIA L2 |
64GB |
NVIDIA RTX 6000 Ada |
128GB |
NVIDIA RTX 5000 Ada |
64GB |
NVIDIA A40 |
128GB |
NVIDIA A30 |
64GB |
NVIDIA A10 |
64GB |
NVIDIA A100 80GB (all variants) |
256GB |
NVIDIA A100 40GB (all variants) |
128GB |
NVIDIA RTX A6000 |
128GB |
NVIDIA RTX A5500 |
64GB |
NVIDIA RTX A5000 |
64GB |
Quadro RTX 8000 Passive |
64GB |
Quadro RTX 6000 Passive |
64GB |
Tesla V100 (all variants) |
64GB |
Microsoft Windows Server Limitations and Known Issues#
Refer to the following lists of known Microsoft Windows Server product limitations and product issues.
NVIDIA AI Enterprise supports only the Tesla Compute Cluster (TCC) driver model for Windows guest drivers.
Windows guest OS support is limited to running applications natively in Windows VMs without containers. NVIDIA AI Enterprise features that depend on the containerization of applications are not supported on Windows guest operating systems.
If you are using a generic Linux supported by the KVM hypervisor, consult the documentation from your hypervisor vendor for information about Windows releases supported as a guest OS.
For more information, refer to the Non-containerized Applications on Hypervisors and Guest Operating Systems Supported with vGPU table.
Virtual GPU Types for Supported GPUs#
NVIDIA Blackwell GPU Architecture#
MIG-Backed and Time-Sliced NVIDIA vGPU for Compute for the NVIDIA Blackwell Architecture#
Physical GPUs per board: 1
The maximum number of vGPUs per board is the product of the maximum number of vGPUs per GPU and the number of physical GPUs per board.
Required license edition: NVIDIA AI Enterprise
MIG-Backed NVIDIA vGPU for Compute
For details on GPU instance profiles, refer to the NVIDIA Multi-Instance GPU User Guide.
Time-Sliced NVIDIA vGPU for Compute
Intended use cases:
vGPUs with more than 40 GB of framebuffer: Training Workloads
vGPUs with 40 GB of framebuffer: Inference Workloads
These vGPU types support a single display with a fixed maximum resolution.
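As a quick in-guest sanity check (ordinary CUDA runtime calls, nothing vGPU-specific), an application can confirm which vGPU type it received by reporting the device name and the framebuffer the profile exposes, which should roughly match the Framebuffer column in the tables that follow.

```cpp
// Sketch: report the device name and framebuffer visible inside the guest to
// confirm the assigned vGPU profile.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop{};
    cudaGetDeviceProperties(&prop, 0);
    size_t freeBytes = 0, totalBytes = 0;
    cudaMemGetInfo(&freeBytes, &totalBytes);
    std::printf("Device 0: %s\n", prop.name);
    std::printf("Framebuffer: %.1f GiB total, %.1f GiB free\n",
                totalBytes / (1024.0 * 1024.0 * 1024.0),
                freeBytes / (1024.0 * 1024.0 * 1024.0));
    return 0;
}
```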
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU |
Slices per vGPU |
Compute Instances per vGPU |
Corresponding GPU Instance Profile |
---|---|---|---|---|---|
B200X-7-180C |
180 |
1 |
7 |
7 |
MIG 7g.180gb |
B200X-4-90C |
90 |
1 |
4 |
4 |
MIG 4g.90gb |
B200X-3-90C |
90 |
2 |
3 |
3 |
MIG 3g.90gb |
B200X-2-45C |
45 |
3 |
2 |
2 |
MIG 2g.45gb |
B200X-1-45C |
45 |
4 |
1 |
1 |
MIG 1g.45gb |
B200X-1-23C |
22.5 |
7 |
1 |
1 |
MIG 1g.23gb |
B200X-1-23CME |
22.5 |
1 |
1 |
1 |
MIG 1g.23gb+me |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU |
Slices per vGPU |
Compute Instances per vGPU |
Corresponding GPU Instance Profile |
---|---|---|---|---|---|
DC-4-96C |
96 |
1 |
4 |
4 |
MIG 4g.96gb |
DC-4-48C |
48 |
2 |
4 |
1 |
MIG 4g.48gb |
DC-2-48C |
48 |
1 |
2 |
2 |
MIG 2g.48gb |
DC-4-32C |
32 |
3 |
4 |
1 |
MIG 4g.32gb |
DC-4-24C |
24 |
4 |
4 |
1 |
MIG 4g.24gb |
DC-2-24C |
24 |
2 |
2 |
1 |
MIG 2g.24gb |
DC-1-24C |
24 |
1 |
1 |
1 |
MIG 1g.24gb |
DC-2-16C |
16 |
3 |
2 |
1 |
MIG 2g.16gb |
DC-2-12C |
12 |
4 |
2 |
1 |
MIG 2g.12gb |
DC-1-12C |
12 |
2 |
1 |
1 |
MIG 1g.12gb |
DC-1-8C |
8 |
3 |
1 |
1 |
MIG 1g.8gb |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
DC-96C |
96 |
1 |
1 |
3840x2400 |
1 |
DC-48C |
48 |
2 |
2 |
3840x2400 |
1 |
DC-32C |
32 |
3 |
3 |
3840x2400 |
1 |
DC-24C |
24 |
4 |
4 |
3840x2400 |
1 |
DC-16C |
16 |
6 |
6 |
3840x2400 |
1 |
DC-12C |
12 |
8 |
8 |
3840x2400 |
1 |
DC-8C |
8 |
12 |
12 |
3840x2400 |
1 |
NVIDIA Hopper GPU Architecture#
MIG-Backed and Time-Sliced NVIDIA vGPU for Compute for the NVIDIA Hopper GPU Architecture#
Physical GPUs per board: 1
The maximum number of vGPUs per board is the product of the maximum number of vGPUs per GPU and the number of physical GPUs per board.
Required license edition: NVIDIA AI Enterprise
MIG-Backed NVIDIA vGPU for Compute
For details on GPU instance profiles, refer to the NVIDIA Multi-Instance GPU User Guide.
Time-Sliced NVIDIA vGPU for Compute
Intended use cases:
vGPUs with more than 40 GB of framebuffer: Training Workloads
vGPUs with 40 GB of framebuffer: Inference Workloads
These vGPU types support a single display with a fixed maximum resolution.
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU |
Slices per vGPU |
Compute Instances per vGPU |
Corresponding GPU Instance Profile |
---|---|---|---|---|---|
H800L-7-94C |
96 |
1 |
7 |
7 |
MIG 7g.94gb |
H800L-4-47C |
48 |
1 |
4 |
4 |
MIG 4g.47gb |
H800L-3-47C |
48 |
2 |
3 |
3 |
MIG 3g.47gb |
H800L-2-24C |
24 |
3 |
2 |
2 |
MIG 2g.24gb |
H800L-1-24C |
24 |
4 |
1 |
1 |
MIG 1g.24gb |
H800L-1-12C |
12 |
7 |
1 |
1 |
MIG 1g.12gb |
H800L-1-12CME [3] |
12 |
1 |
1 |
1 |
MIG 1g.12gb+me |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
H800L-94C |
96 |
1 |
1 |
3840x2400 |
1 |
H800L-47C |
48 |
2 |
2 |
3840x2400 |
1 |
H800L-23C |
23 |
4 |
4 |
3840x2400 |
1 |
H800L-15C |
15 |
6 |
4 |
3840x2400 |
1 |
H800L-11C |
11 |
8 |
8 |
3840x2400 |
1 |
H800L-6C |
6 |
15 |
8 |
3840x2400 |
1 |
H800L-4C |
4 |
23 |
16 |
3840x2400 |
1 |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU |
Slices per vGPU |
Compute Instances per vGPU |
Corresponding GPU Instance Profile |
---|---|---|---|---|---|
H800-7-80C |
81 |
1 |
7 |
7 |
MIG 7g.80gb |
H800-4-40C |
40 |
1 |
4 |
4 |
MIG 4g.40gb |
H800-3-40C |
40 |
2 |
3 |
3 |
MIG 3g.40gb |
H800-2-20C |
20 |
3 |
2 |
2 |
MIG 2g.20gb |
H800-1-20C [3] |
20 |
4 |
1 |
1 |
MIG 1g.20gb |
H800-1-10C |
10 |
7 |
1 |
1 |
MIG 1g.10gb |
H800-1-10CME [3] |
10 |
1 |
1 |
1 |
MIG 1g.10gb+me |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
H800-80C |
81 |
1 |
1 |
3840x2400 |
1 |
H800-40C |
40 |
2 |
2 |
3840x2400 |
1 |
H800-20C |
20 |
4 |
4 |
3840x2400 |
1 |
H800-16C |
16 |
5 |
4 |
3840x2400 |
1 |
H800-10C |
10 |
8 |
8 |
3840x2400 |
1 |
H800-8C |
8 |
10 |
8 |
3840x2400 |
1 |
H800-5C |
5 |
16 |
16 |
3840x2400 |
1 |
H800-4C |
4 |
20 |
16 |
3840x2400 |
1 |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU |
Slices per vGPU |
Compute Instances per vGPU |
Corresponding GPU Instance Profile |
---|---|---|---|---|---|
H800XM-7-80C |
81 |
1 |
7 |
7 |
MIG 7g.80gb |
H800XM-4-40C |
40 |
1 |
4 |
4 |
MIG 4g.40gb |
H800XM-3-40C |
40 |
2 |
3 |
3 |
MIG 3g.40gb |
H800XM-2-20C |
20 |
3 |
2 |
2 |
MIG 2g.20gb |
H800XM-1-20C [3] |
20 |
4 |
1 |
1 |
MIG 1g.20gb |
H800XM-1-10C |
10 |
7 |
1 |
1 |
MIG 1g.10gb |
H800XM-1-10CME [3] |
10 |
1 |
1 |
1 |
MIG 1g.10gb+me |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
H800XM-80C |
81 |
1 |
1 |
3840x2400 |
1 |
H800XM-40C |
40 |
2 |
2 |
3840x2400 |
1 |
H800XM-20C |
20 |
4 |
4 |
3840x2400 |
1 |
H800XM-16C |
16 |
5 |
4 |
3840x2400 |
1 |
H800XM-10C |
10 |
8 |
8 |
3840x2400 |
1 |
H800XM-8C |
8 |
10 |
8 |
3840x2400 |
1 |
H800XM-5C |
5 |
16 |
16 |
3840x2400 |
1 |
H800XM-4C |
4 |
20 |
16 |
3840x2400 |
1 |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU |
Slices per vGPU |
Compute Instances per vGPU |
Corresponding GPU Instance Profile |
---|---|---|---|---|---|
H200-7-141C |
144 |
1 |
7 |
7 |
MIG 7g.141gb |
H200-4-71C |
72 |
1 |
4 |
4 |
MIG 4g.71gb |
H200-3-71C |
72 |
2 |
3 |
3 |
MIG 3g.71gb |
H200-2-35C |
35 |
3 |
2 |
2 |
MIG 2g.35gb |
H200-1-35C [3] |
35 |
4 |
1 |
1 |
MIG 1g.35gb |
H200-1-18C |
18 |
7 |
1 |
1 |
MIG 1g.18gb |
H200-1-18CME [3] |
18 |
1 |
1 |
1 |
MIG 1g.18gb+me |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
H200-141C |
144 |
1 |
1 |
3840x2400 |
1 |
H200-70C |
71 |
2 |
2 |
3840x2400 |
1 |
H200-35C |
35 |
4 |
4 |
3840x2400 |
1 |
H200-28C |
28 |
5 |
5 |
3840x2400 |
1 |
H200-17C |
17 |
8 |
8 |
3840x2400 |
1 |
H200-14C |
14 |
10 |
10 |
3840x2400 |
1 |
H200-8C |
8 |
16 |
16 |
3840x2400 |
1 |
H200-7C |
7 |
20 |
20 |
3840x2400 |
1 |
H200-4C |
4 |
32 |
32 |
3840x2400 |
1 |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU |
Slices per vGPU |
Compute Instances per vGPU |
Corresponding GPU Instance Profile |
---|---|---|---|---|---|
H200X-7-141C |
144 |
1 |
7 |
7 |
MIG 7g.141gb |
H200X-4-71C |
72 |
1 |
4 |
4 |
MIG 4g.71gb |
H200X-3-71C |
72 |
2 |
3 |
3 |
MIG 3g.71gb |
H200X-2-35C |
35 |
3 |
2 |
2 |
MIG 2g.35gb |
H200X-1-35C [3] |
35 |
4 |
1 |
1 |
MIG 1g.35gb |
H200X-1-18C |
18 |
7 |
1 |
1 |
MIG 1g.18gb |
H200X-1-18CME [3] |
18 |
1 |
1 |
1 |
MIG 1g.18gb+me |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
H200X-141C |
144 |
1 |
1 |
3840x2400 |
1 |
H200X-70C |
71 |
2 |
2 |
3840x2400 |
1 |
H200X-35C |
35 |
4 |
4 |
3840x2400 |
1 |
H200X-28C |
28 |
5 |
5 |
3840x2400 |
1 |
H200X-17C |
17 |
8 |
8 |
3840x2400 |
1 |
H200X-14C |
14 |
10 |
10 |
3840x2400 |
1 |
H200X-8C |
8 |
16 |
16 |
3840x2400 |
1 |
H200X-7C |
7 |
20 |
20 |
3840x2400 |
1 |
H200X-4C |
4 |
32 |
32 |
3840x2400 |
1 |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU |
Slices per vGPU |
Compute Instances per vGPU |
Corresponding GPU Instance Profile |
---|---|---|---|---|---|
H100L-7-94C |
96 |
1 |
7 |
7 |
MIG 7g.94gb |
H100L-4-47C |
48 |
1 |
4 |
4 |
MIG 4g.47gb |
H100L-3-47C |
48 |
2 |
3 |
3 |
MIG 3g.47gb |
H100L-2-24C |
24 |
3 |
2 |
2 |
MIG 2g.24gb |
H100L-1-24C [3] |
24 |
4 |
1 |
1 |
MIG 1g.24gb |
H100L-1-12C |
12 |
7 |
1 |
1 |
MIG 1g.12gb |
H100L-1-12CME [3] |
12 |
1 |
1 |
1 |
MIG 1g.12gb+me |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
H100L-94C |
96 |
1 |
1 |
3840x2400 |
1 |
H100L-47C |
48 |
2 |
2 |
3840x2400 |
1 |
H100L-23C |
23 |
4 |
4 |
3840x2400 |
1 |
H100L-15C |
15 |
6 |
4 |
3840x2400 |
1 |
H100L-11C |
11 |
8 |
8 |
3840x2400 |
1 |
H100L-6C |
6 |
15 |
8 |
3840x2400 |
1 |
H100L-4C |
4 |
23 |
16 |
3840x2400 |
1 |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU |
Slices per vGPU |
Compute Instances per vGPU |
Corresponding GPU Instance Profile |
---|---|---|---|---|---|
H100XL-7-94C |
96 |
1 |
7 |
7 |
MIG 7g.94gb |
H100XL-4-47C |
48 |
1 |
4 |
4 |
MIG 4g.47gb |
H100XL-3-47C |
48 |
2 |
3 |
3 |
MIG 3g.47gb |
H100XL-2-24C |
24 |
3 |
2 |
2 |
MIG 2g.24gb |
H100XL-1-24C [3] |
24 |
4 |
1 |
1 |
MIG 1g.24gb |
H100XL-1-12C |
12 |
7 |
1 |
1 |
MIG 1g.12gb |
H100XL-1-12CME [3] |
12 |
1 |
1 |
1 |
MIG 1g.12gb+me |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
H100XL-94C |
96 |
1 |
1 |
3840x2400 |
1 |
H100XL-47C |
48 |
2 |
2 |
3840x2400 |
1 |
H100XL-23C |
23 |
4 |
4 |
3840x2400 |
1 |
H100XL-15C |
15 |
6 |
4 |
3840x2400 |
1 |
H100XL-11C |
11 |
8 |
8 |
3840x2400 |
1 |
H100XL-6C |
6 |
15 |
8 |
3840x2400 |
1 |
H100XL-4C |
4 |
23 |
16 |
3840x2400 |
1 |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU |
Slices per vGPU |
Compute Instances per vGPU |
Corresponding GPU Instance Profile |
---|---|---|---|---|---|
H100-7-80C |
81 |
1 |
7 |
7 |
MIG 7g.80gb |
H100-4-40C |
40 |
1 |
4 |
4 |
MIG 4g.40gb |
H100-3-40C |
40 |
2 |
3 |
3 |
MIG 3g.40gb |
H100-2-20C |
20 |
3 |
2 |
2 |
MIG 2g.20gb |
H100-1-20C [3] |
20 |
4 |
1 |
1 |
MIG 1g.20gb |
H100-1-10C |
10 |
7 |
1 |
1 |
MIG 1g.10gb |
H100-1-10CME [3] |
10 |
1 |
1 |
1 |
MIG 1g.10gb+me |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
H100-80C |
81 |
1 |
1 |
3840x2400 |
1 |
H100-40C |
40 |
2 |
2 |
3840x2400 |
1 |
H100-20C |
20 |
4 |
4 |
3840x2400 |
1 |
H100-16C |
16 |
6 |
4 |
3840x2400 |
1 |
H100-10C |
10 |
8 |
8 |
3840x2400 |
1 |
H100-8C |
8 |
10 |
8 |
3840x2400 |
1 |
H100-5C |
5 |
16 |
16 |
3840x2400 |
1 |
H100-4C |
4 |
20 |
16 |
3840x2400 |
1 |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU |
Slices per vGPU |
Compute Instances per vGPU |
Corresponding GPU Instance Profile |
---|---|---|---|---|---|
H100XM-7-80C |
81 |
1 |
7 |
7 |
MIG 7g.80gb |
H100XM-4-40C |
40 |
1 |
4 |
4 |
MIG 4g.40gb |
H100XM-3-40C |
40 |
2 |
3 |
3 |
MIG 3g.40gb |
H100XM-2-20C |
20 |
3 |
2 |
2 |
MIG 2g.20gb |
H100XM-1-20C [3] |
20 |
4 |
1 |
1 |
MIG 1g.20gb |
H100XM-1-10C |
10 |
7 |
1 |
1 |
MIG 1g.10gb |
H100XM-1-10CME [3] |
10 |
1 |
1 |
1 |
MIG 1g.10gb+me |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
H100XM-80C |
81 |
1 |
1 |
3840x2400 |
1 |
H100XM-40C |
40 |
2 |
2 |
3840x2400 |
1 |
H100XM-20C |
20 |
4 |
4 |
3840x2400 |
1 |
H100XM-16C |
16 |
6 |
4 |
3840x2400 |
1 |
H100XM-10C |
10 |
8 |
8 |
3840x2400 |
1 |
H100XM-8C |
8 |
10 |
8 |
3840x2400 |
1 |
H100XM-5C |
5 |
16 |
16 |
3840x2400 |
1 |
H100XM-4C |
4 |
20 |
16 |
3840x2400 |
1 |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU |
Slices per vGPU |
Compute Instances per vGPU |
Corresponding GPU Instance Profile |
---|---|---|---|---|---|
H100XS-7-64C |
65 |
1 |
7 |
7 |
MIG 7g.64gb |
H100XS-4-32C |
32 |
1 |
4 |
4 |
MIG 4g.32gb |
H100XS-3-32C |
32 |
2 |
3 |
3 |
MIG 3g.32gb |
H100XS-2-16C |
16 |
3 |
2 |
2 |
MIG 2g.16gb |
H100XS-1-16C [3] |
16 |
4 |
1 |
1 |
MIG 1g.16gb |
H100XS-1-8C |
8 |
7 |
1 |
1 |
MIG 1g.8gb |
H100XS-1-8CME [3] |
8 |
1 |
1 |
1 |
MIG 1g.8gb+me |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
H100XS-64C |
65 |
1 |
1 |
3840x2400 |
1 |
H100XS-32C |
32 |
2 |
2 |
3840x2400 |
1 |
H100XS-16C |
16 |
4 |
4 |
3840x2400 |
1 |
H100XS-8C |
8 |
8 |
8 |
3840x2400 |
1 |
H100XS-4C |
4 |
16 |
16 |
3840x2400 |
1 |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU |
Slices per vGPU |
Compute Instances per vGPU |
Corresponding GPU Instance Profile |
---|---|---|---|---|---|
H20X-7-141C |
144 |
1 |
7 |
7 |
MIG 7g.141gb |
H20X-4-71C |
72 |
1 |
4 |
4 |
MIG 4g.71gb |
H20X-3-71C |
72 |
2 |
3 |
3 |
MIG 3g.71gb |
H20X-2-35C |
35 |
3 |
2 |
2 |
MIG 2g.35gb |
H20X-1-35C [3] |
35 |
4 |
1 |
1 |
MIG 1g.35gb |
H20X-1-18C |
18 |
7 |
1 |
1 |
MIG 1g.18gb |
H20X-1-18CME [3] |
18 |
1 |
1 |
1 |
MIG 1g.18gb+me |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
H20X-141C |
144 |
1 |
1 |
3840x2400 |
1 |
H20X-70C |
71 |
2 |
2 |
3840x2400 |
1 |
H20X-35C |
35 |
4 |
4 |
3840x2400 |
1 |
H20X-28C |
28 |
5 |
5 |
3840x2400 |
1 |
H20X-17C |
17 |
8 |
8 |
3840x2400 |
1 |
H20X-14C |
14 |
10 |
10 |
3840x2400 |
1 |
H20X-8C |
8 |
16 |
16 |
3840x2400 |
1 |
H20X-7C |
7 |
20 |
20 |
3840x2400 |
1 |
H20X-4C |
4 |
32 |
32 |
3840x2400 |
1 |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU |
Slices per vGPU |
Compute Instances per vGPU |
Corresponding GPU Instance Profile |
---|---|---|---|---|---|
H20-7-96C |
98 |
1 |
7 |
7 |
MIG 7g.96gb |
H20-4-48C |
49 |
1 |
4 |
4 |
MIG 4g.48gb |
H20-3-48C |
49 |
2 |
3 |
3 |
MIG 3g.48gb |
H20-2-24C |
24 |
3 |
2 |
2 |
MIG 2g.24gb |
H20-1-24C [3] |
24 |
4 |
1 |
1 |
MIG 1g.24gb |
H20-1-12C |
12 |
7 |
1 |
1 |
MIG 1g.12gb |
H20-1-12CME [3] |
12 |
1 |
1 |
1 |
MIG 1g.12gb+me |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
H20-96C |
98 |
1 |
1 |
3840x2400 |
1 |
H20-48C |
49 |
2 |
2 |
3840x2400 |
1 |
H20-24C |
24 |
4 |
2 |
3840x2400 |
1 |
H20-16C |
16 |
6 |
4 |
3840x2400 |
1 |
H20-12C |
12 |
8 |
4 |
3840x2400 |
1 |
H20-6C |
6 |
16 |
8 |
3840x2400 |
1 |
H20-4C |
4 |
24 |
8 |
3840x2400 |
1 |
NVIDIA Ada Lovelace GPU Architecture#
Physical GPUs per board: 1
The maximum number of vGPUs per board is the product of the maximum number of vGPUs per GPU and the number of physical GPUs per board.
Required license edition: NVIDIA AI Enterprise
Intended use cases:
vGPUs with more than 40 GB of framebuffer: Training Workloads
vGPUs with 40 GB of framebuffer: Inference Workloads
These vGPU types support a single display with a fixed maximum resolution.
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
L40-48C |
49 |
1 |
1 |
3840x2400 |
1 |
L40-24C |
24 |
2 |
2 |
3840x2400 |
1 |
L40-16C |
16 |
3 |
2 |
3840x2400 |
1 |
L40-12C |
12 |
4 |
4 |
3840x2400 |
1 |
L40-8C |
8 |
6 |
4 |
3840x2400 |
1 |
L40-6C |
6 |
8 |
8 |
3840x2400 |
1 |
L40-4C |
4 |
12 |
8 |
3840x2400 |
1 |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
L40S-48C |
49 |
1 |
1 |
3840x2400 |
1 |
L40S-24C |
24 |
2 |
2 |
3840x2400 |
1 |
L40S-16C |
16 |
3 |
2 |
3840x2400 |
1 |
L40S-12C |
12 |
4 |
4 |
3840x2400 |
1 |
L40S-8C |
8 |
6 |
4 |
3840x2400 |
1 |
L40S-6C |
6 |
8 |
8 |
3840x2400 |
1 |
L40S-4C |
4 |
12 |
8 |
3840x2400 |
1 |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
L20-48C |
49 |
1 |
1 |
3840x2400 |
1 |
L20-24C |
24 |
2 |
2 |
3840x2400 |
1 |
L20-16C |
16 |
3 |
2 |
3840x2400 |
1 |
L20-12C |
12 |
4 |
4 |
3840x2400 |
1 |
L20-8C |
8 |
6 |
4 |
3840x2400 |
1 |
L20-6C |
6 |
8 |
8 |
3840x2400 |
1 |
L20-4C |
4 |
12 |
8 |
3840x2400 |
1 |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
L20-48C |
49 |
1 |
1 |
3840x2400 |
1 |
L20-24C |
24 |
2 |
2 |
3840x2400 |
1 |
L20-16C |
16 |
3 |
2 |
3840x2400 |
1 |
L20-12C |
12 |
4 |
4 |
3840x2400 |
1 |
L20-8C |
8 |
6 |
4 |
3840x2400 |
1 |
L20-6C |
6 |
8 |
8 |
3840x2400 |
1 |
L20-4C |
4 |
12 |
8 |
3840x2400 |
1 |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
L4-24C |
24 |
1 |
1 |
3840x2400 |
1 |
L4-12C |
12 |
2 |
2 |
3840x2400 |
1 |
L4-8C |
8 |
3 |
2 |
3840x2400 |
1 |
L4-6C |
6 |
4 |
4 |
3840x2400 |
1 |
L4-4C |
4 |
6 |
4 |
3840x2400 |
1 |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
L2-24C |
24 |
1 |
1 |
3840x2400 |
1 |
L2-12C |
12 |
2 |
2 |
3840x2400 |
1 |
L2-8C |
8 |
3 |
2 |
3840x2400 |
1 |
L2-6C |
6 |
4 |
4 |
3840x2400 |
1 |
L2-4C |
4 |
6 |
4 |
3840x2400 |
1 |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
RTX 6000 Ada-48C |
49 |
1 |
1 |
3840x2400 |
1 |
RTX 6000 Ada-24C |
24 |
2 |
2 |
3840x2400 |
1 |
RTX 6000 Ada-16C |
16 |
3 |
2 |
3840x2400 |
1 |
RTX 6000 Ada-12C |
12 |
4 |
4 |
3840x2400 |
1 |
RTX 6000 Ada-8C |
8 |
6 |
4 |
3840x2400 |
1 |
RTX 6000 Ada-6C |
6 |
8 |
8 |
3840x2400 |
1 |
RTX 6000 Ada-4C |
4 |
12 |
8 |
3840x2400 |
1 |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
RTX 5880 Ada-48C |
49 |
1 |
1 |
3840x2400 |
1 |
RTX 5880 Ada-24C |
24 |
2 |
2 |
3840x2400 |
1 |
RTX 5880 Ada-16C |
16 |
3 |
2 |
3840x2400 |
1 |
RTX 5880 Ada-12C |
12 |
4 |
4 |
3840x2400 |
1 |
RTX 5880 Ada-8C |
8 |
6 |
4 |
3840x2400 |
1 |
RTX 5880 Ada-6C |
6 |
8 |
8 |
3840x2400 |
1 |
RTX 5880 Ada-4C |
4 |
12 |
8 |
3840x2400 |
1 |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
RTX 5000 Ada-32C |
32 |
1 |
1 |
3840x2400 |
1 |
RTX 5000 Ada-16C |
16 |
2 |
2 |
3840x2400 |
1 |
RTX 5000 Ada-8C |
8 |
4 |
4 |
3840x2400 |
1 |
RTX 5000 Ada-4C |
4 |
8 |
8 |
3840x2400 |
1 |
NVIDIA Ampere GPU Architecture#
Physical GPUs per board: 1 (with the exception of NVIDIA A16)
The maximum number of vGPUs per board is the product of the maximum number of vGPUs per GPU and the number of physical GPUs per board.
Required license edition: NVIDIA AI Enterprise
Intended use cases:
vGPUs with more than 40 GB of framebuffer: Training Workloads
vGPUs with 40 GB of framebuffer: Inference Workloads
These vGPU types support a single display with a fixed maximum resolution.
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
A40-48C |
49 |
1 |
1 |
3840x2400 |
1 |
A40-24C |
24 |
2 |
2 |
3840x2400 |
1 |
A40-16C |
16 |
3 |
2 |
3840x2400 |
1 |
A40-12C |
12 |
4 |
4 |
3840x2400 |
1 |
A40-8C |
8 |
6 |
4 |
3840x2400 |
1 |
A40-6C |
6 |
8 |
8 |
3840x2400 |
1 |
A40-4C |
4 |
12 |
8 |
3840x2400 |
1 |
Physical GPUs per board: 4
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
A10-24C |
24 |
1 |
1 |
3840x2400 |
1 |
A10-12C |
12 |
2 |
2 |
3840x2400 |
1 |
A10-8C |
8 |
3 |
2 |
3840x2400 |
1 |
A10-6C |
6 |
4 |
4 |
3840x2400 |
1 |
A10-4C |
4 |
6 |
4 |
3840x2400 |
1 |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
RTXA6000-48C |
49 |
1 |
1 |
3840x2400 |
1 |
RTXA6000-24C |
24 |
2 |
2 |
3840x2400 |
1 |
RTXA6000-16C |
16 |
3 |
2 |
3840x2400 |
1 |
RTXA6000-12C |
12 |
4 |
4 |
3840x2400 |
1 |
RTXA6000-8C |
8 |
6 |
4 |
3840x2400 |
1 |
RTXA6000-6C |
6 |
8 |
8 |
3840x2400 |
1 |
RTXA6000-4C |
4 |
12 |
8 |
3840x2400 |
1 |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
RTXA5500-24C |
24 |
1 |
1 |
3840x2400 |
1 |
RTXA5500-12C |
12 |
2 |
2 |
3840x2400 |
1 |
RTXA5500-8C |
8 |
3 |
2 |
3840x2400 |
1 |
RTXA5500-6C |
6 |
4 |
4 |
3840x2400 |
1 |
RTXA5500-4C |
4 |
6 |
4 |
3840x2400 |
1 |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
RTXA5000-24C |
24 |
1 |
1 |
3840x2400 |
1 |
RTXA5000-12C |
12 |
2 |
2 |
3840x2400 |
1 |
RTXA5000-8C |
8 |
3 |
2 |
3840x2400 |
1 |
RTXA5000-6C |
6 |
4 |
4 |
3840x2400 |
1 |
RTXA5000-4C |
4 |
6 |
4 |
3840x2400 |
1 |
MIG-Backed and Time-Sliced NVIDIA vGPU for Compute for the NVIDIA Ampere GPU Architecture#
Physical GPUs per board: 1
The maximum number of vGPUs per board is the product of the maximum number of vGPUs per GPU and the number of physical GPUs per board.
Required license edition: NVIDIA AI Enterprise
MIG-Backed NVIDIA vGPU for Compute
For details on GPU instance profiles, refer to the NVIDIA Multi-Instance GPU User Guide.
Time-Sliced NVIDIA vGPU for Compute
Intended use cases:
vGPUs with more than 40 GB of framebuffer: Training Workloads
vGPUs with 40 GB of framebuffer: Inference Workloads
These vGPU types support a single display with a fixed maximum resolution.
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU |
Slices per vGPU |
Compute Instances per vGPU |
Corresponding GPU Instance Profile |
---|---|---|---|---|---|
A800D-7-80C |
81 |
1 |
7 |
7 |
MIG 7g.80gb |
A800D-4-40C |
40 |
1 |
4 |
4 |
MIG 4g.40gb |
A800D-3-40C |
40 |
2 |
3 |
3 |
MIG 3g.40gb |
A800D-2-20C |
20 |
3 |
2 |
2 |
MIG 2g.20gb |
A800D-1-20C [3] |
20 |
4 |
1 |
1 |
MIG 1g.20gb |
A800D-1-10C |
10 |
7 |
1 |
1 |
MIG 1g.10gb |
A800D-1-10CME [3] |
10 |
1 |
1 |
1 |
MIG 1g.10gb+me |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
A800D-80C |
81 |
1 |
1 |
3840x2400 |
1 |
A800D-40C |
40 |
2 |
2 |
3840x2400 |
1 |
A800D-20C |
20 |
4 |
4 |
3840x2400 |
1 |
A800D-16C |
16 |
5 |
4 |
3840x2400 |
1 |
A800D-10C |
10 |
8 |
8 |
3840x2400 |
1 |
A800D-8C |
8 |
10 |
8 |
3840x2400 |
1 |
A800D-4C |
4 |
20 |
16 |
3840x2400 |
1 |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU |
Slices per vGPU |
Compute Instances per vGPU |
Corresponding GPU Instance Profile |
---|---|---|---|---|---|
A800D-7-80C |
81 |
1 |
7 |
7 |
MIG 7g.80gb |
A800D-4-40C |
40 |
1 |
4 |
4 |
MIG 4g.40gb |
A800D-3-40C |
40 |
2 |
3 |
3 |
MIG 3g.40gb |
A800D-2-20C |
20 |
3 |
2 |
2 |
MIG 2g.20gb |
A800D-1-20C [3] |
20 |
4 |
1 |
1 |
MIG 1g.20gb |
A800D-1-10C |
10 |
7 |
1 |
1 |
MIG 1g.10gb |
A800D-1-10CME [3] |
10 |
1 |
1 |
1 |
MIG 1g.10gb+me |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
A800D-80C |
81 |
1 |
1 |
3840x2400 |
1 |
A800D-40C |
40 |
2 |
2 |
3840x2400 |
1 |
A800D-20C |
20 |
4 |
4 |
3840x2400 |
1 |
A800D-16C |
16 |
5 |
4 |
3840x2400 |
1 |
A800D-10C |
10 |
8 |
8 |
3840x2400 |
1 |
A800D-8C |
8 |
10 |
8 |
3840x2400 |
1 |
A800D-4C |
4 |
20 |
16 |
3840x2400 |
1 |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU |
Slices per vGPU |
Compute Instances per vGPU |
Corresponding GPU Instance Profile |
---|---|---|---|---|---|
A800D-7-80C |
81 |
1 |
7 |
7 |
MIG 7g.80gb |
A800D-4-40C |
40 |
1 |
4 |
4 |
MIG 4g.40gb |
A800D-3-40C |
40 |
2 |
3 |
3 |
MIG 3g.40gb |
A800D-2-20C |
20 |
3 |
2 |
2 |
MIG 2g.20gb |
A800D-1-20C [3] |
20 |
4 |
1 |
1 |
MIG 1g.20gb |
A800D-1-10C |
10 |
7 |
1 |
1 |
MIG 1g.10gb |
A800D-1-10CME [3] |
10 |
1 |
1 |
1 |
MIG 1g.10gb+me |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
A800D-80C |
81 |
1 |
1 |
3840x2400 |
1 |
A800D-40C |
40 |
2 |
2 |
3840x2400 |
1 |
A800D-20C |
20 |
4 |
4 |
3840x2400 |
1 |
A800D-16C |
16 |
5 |
4 |
3840x2400 |
1 |
A800D-10C |
10 |
8 |
8 |
3840x2400 |
1 |
A800D-8C |
8 |
10 |
8 |
3840x2400 |
1 |
A800D-4C |
4 |
20 |
16 |
3840x2400 |
1 |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU |
Slices per vGPU |
Compute Instances per vGPU |
Corresponding GPU Instance Profile |
---|---|---|---|---|---|
A800-7-40C |
40 |
1 |
7 |
7 |
MIG 7g.40gb |
A800-4-20C |
20 |
1 |
4 |
4 |
MIG 4g.20gb |
A800-3-20C |
20 |
2 |
3 |
3 |
MIG 3g.20gb |
A800-2-10C |
10 |
3 |
2 |
2 |
MIG 2g.10gb |
A800-1-10C [3] |
10 |
4 |
1 |
1 |
MIG 1g.10gb |
A800-1-5C |
5 |
7 |
1 |
1 |
MIG 1g.5gb |
A800-1-5CME [3] |
5 |
1 |
1 |
1 |
MIG 1g.5gb+me |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
A800-40C |
40 |
1 |
1 |
3840x2400 |
1 |
A800-20C |
20 |
2 |
2 |
3840x2400 |
1 |
A800-10C |
10 |
4 |
4 |
3840x2400 |
1 |
A800-8C |
8 |
5 |
4 |
3840x2400 |
1 |
A800-5C |
5 |
8 |
8 |
3840x2400 |
1 |
A800-4C |
4 |
10 |
8 |
3840x2400 |
1 |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU |
Slices per vGPU |
Compute Instances per vGPU |
Corresponding GPU Instance Profile |
---|---|---|---|---|---|
A800DX-7-80C |
81 |
1 |
7 |
7 |
MIG 7g.80gb |
A800DX-4-40C |
40 |
1 |
4 |
4 |
MIG 4g.40gb |
A800DX-3-40C |
40 |
2 |
3 |
3 |
MIG 3g.40gb |
A800DX-2-20C |
20 |
3 |
2 |
2 |
MIG 2g.20gb |
A800DX-1-20C [3] |
20 |
4 |
1 |
1 |
MIG 1g.20gb |
A800DX-1-10C |
10 |
7 |
1 |
1 |
MIG 1g.10gb |
A800DX-1-10CME [3] |
10 |
1 |
1 |
1 |
MIG 1g.10gb+me |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
A800DX-80C |
81 |
1 |
1 |
3840x2400 |
1 |
A800DX-40C |
40 |
2 |
2 |
3840x2400 |
1 |
A800DX-20C |
20 |
4 |
4 |
3840x2400 |
1 |
A800DX-16C |
16 |
5 |
4 |
3840x2400 |
1 |
A800DX-10C |
10 |
8 |
8 |
3840x2400 |
1 |
A800DX-8C |
8 |
10 |
8 |
3840x2400 |
1 |
A800DX-4C |
4 |
20 |
16 |
3840x2400 |
1 |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU |
Slices per vGPU |
Compute Instances per vGPU |
Corresponding GPU Instance Profile |
---|---|---|---|---|---|
A100D-7-80C |
81 |
1 |
7 |
7 |
MIG 7g.80gb |
A100D-4-40C |
40 |
1 |
4 |
4 |
MIG 4g.40gb |
A100D-3-40C |
40 |
2 |
3 |
3 |
MIG 3g.40gb |
A100D-2-20C |
20 |
3 |
2 |
2 |
MIG 2g.20gb |
A100D-1-20C [3] |
20 |
4 |
1 |
1 |
MIG 1g.20gb |
A100D-1-10C |
10 |
7 |
1 |
1 |
MIG 1g.10gb |
A100D-1-10CME [3] |
10 |
1 |
1 |
1 |
MIG 1g.10gb+me |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
A100D-80C |
81 |
1 |
1 |
3840x2400 |
1 |
A100D-40C |
40 |
2 |
2 |
3840x2400 |
1 |
A100D-20C |
20 |
4 |
4 |
3840x2400 |
1 |
A100D-16C |
16 |
5 |
4 |
3840x2400 |
1 |
A100D-10C |
10 |
8 |
8 |
3840x2400 |
1 |
A100D-8C |
8 |
10 |
8 |
3840x2400 |
1 |
A100D-4C |
4 |
20 |
16 |
3840x2400 |
1 |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU |
Slices per vGPU |
Compute Instances per vGPU |
Corresponding GPU Instance Profile |
---|---|---|---|---|---|
A100D-7-80C |
81 |
1 |
7 |
7 |
MIG 7g.80gb |
A100D-4-40C |
40 |
1 |
4 |
4 |
MIG 4g.40gb |
A100D-3-40C |
40 |
2 |
3 |
3 |
MIG 3g.40gb |
A100D-2-20C |
20 |
3 |
2 |
2 |
MIG 2g.20gb |
A100D-1-20C [3] |
20 |
4 |
1 |
1 |
MIG 1g.20gb |
A100D-1-10C |
10 |
7 |
1 |
1 |
MIG 1g.10gb |
A100D-1-10CME [3] |
10 |
1 |
1 |
1 |
MIG 1g.10gb+me |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
A100D-80C |
81 |
1 |
1 |
3840x2400 |
1 |
A100D-40C |
40 |
2 |
2 |
3840x2400 |
1 |
A100D-20C |
20 |
4 |
4 |
3840x2400 |
1 |
A100D-16C |
16 |
5 |
4 |
3840x2400 |
1 |
A100D-10C |
10 |
8 |
8 |
3840x2400 |
1 |
A100D-8C |
8 |
10 |
8 |
3840x2400 |
1 |
A100D-4C |
4 |
20 |
16 |
3840x2400 |
1 |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU |
Slices per vGPU |
Compute Instances per vGPU |
Corresponding GPU Instance Profile |
---|---|---|---|---|---|
A100D-7-80C |
81 |
1 |
7 |
7 |
MIG 7g.80gb |
A100D-4-40C |
40 |
1 |
4 |
4 |
MIG 4g.40gb |
A100D-3-40C |
40 |
2 |
3 |
3 |
MIG 3g.40gb |
A100D-2-20C |
20 |
3 |
2 |
2 |
MIG 2g.20gb |
A100D-1-20C [3] |
20 |
4 |
1 |
1 |
MIG 1g.20gb |
A100D-1-10C |
10 |
7 |
1 |
1 |
MIG 1g.10gb |
A100D-1-10CME [3] |
10 |
1 |
1 |
1 |
MIG 1g.10gb+me |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
A100D-80C |
81 |
1 |
1 |
3840x2400 |
1 |
A100D-40C |
40 |
2 |
2 |
3840x2400 |
1 |
A100D-20C |
20 |
4 |
4 |
3840x2400 |
1 |
A100D-16C |
16 |
5 |
4 |
3840x2400 |
1 |
A100D-10C |
10 |
8 |
8 |
3840x2400 |
1 |
A100D-8C |
8 |
10 |
8 |
3840x2400 |
1 |
A100D-4C |
4 |
20 |
16 |
3840x2400 |
1 |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU |
Slices per vGPU |
Compute Instances per vGPU |
Corresponding GPU Instance Profile |
---|---|---|---|---|---|
A100DX-7-80C |
81 |
1 |
7 |
7 |
MIG 7g.80gb |
A100DX-4-40C |
40 |
1 |
4 |
4 |
MIG 4g.40gb |
A100DX-3-40C |
40 |
2 |
3 |
3 |
MIG 3g.40gb |
A100DX-2-20C |
20 |
3 |
2 |
2 |
MIG 2g.20gb |
A100DX-1-20C [3] |
20 |
4 |
1 |
1 |
MIG 1g.20gb |
A100DX-1-10C |
10 |
7 |
1 |
1 |
MIG 1g.10gb |
A100DX-1-10CME [3] |
10 |
1 |
1 |
1 |
MIG 1g.10gb+me |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
A100DX-80C |
81 |
1 |
1 |
3840x2400 |
1 |
A100DX-40C |
40 |
2 |
2 |
3840x2400 |
1 |
A100DX-20C |
20 |
4 |
4 |
3840x2400 |
1 |
A100DX-16C |
16 |
5 |
4 |
3840x2400 |
1 |
A100DX-10C |
10 |
8 |
8 |
3840x2400 |
1 |
A100DX-8C |
8 |
10 |
8 |
3840x2400 |
1 |
A100DX-4C |
4 |
20 |
16 |
3840x2400 |
1 |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU |
Slices per vGPU |
Compute Instances per vGPU |
Corresponding GPU Instance Profile |
---|---|---|---|---|---|
A100-7-40C |
40 |
1 |
7 |
7 |
MIG 7g.40gb |
A100-4-20C |
20 |
1 |
4 |
4 |
MIG 4g.20gb |
A100-3-20C |
20 |
2 |
3 |
3 |
MIG 3g.20gb |
A100-2-10C |
10 |
3 |
2 |
2 |
MIG 2g.10gb |
A100-1-10C [3] |
10 |
4 |
1 |
1 |
MIG 1g.10gb |
A100-1-5C |
5 |
7 |
1 |
1 |
MIG 1g.5gb |
A100-1-5CME [3] |
5 |
1 |
1 |
1 |
MIG 1g.5gb+me |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
A100-40C |
40 |
1 |
1 |
3840x2400 |
1 |
A100-20C |
20 |
2 |
2 |
3840x2400 |
1 |
A100-10C |
10 |
4 |
4 |
3840x2400 |
1 |
A100-8C |
8 |
5 |
4 |
3840x2400 |
1 |
A100-5C |
5 |
8 |
8 |
3840x2400 |
1 |
A100-4C |
4 |
10 |
8 |
3840x2400 |
1 |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU |
Slices per vGPU |
Compute Instances per vGPU |
Corresponding GPU Instance Profile |
---|---|---|---|---|---|
A100X-7-40C |
40 |
1 |
7 |
7 |
MIG 7g.40gb |
A100X-4-20C |
20 |
1 |
4 |
4 |
MIG 4g.20gb |
A100X-3-20C |
20 |
2 |
3 |
3 |
MIG 3g.20gb |
A100X-2-10C |
10 |
3 |
2 |
2 |
MIG 2g.10gb |
A100X-1-10C [3] |
10 |
4 |
1 |
1 |
MIG 1g.10gb |
A100X-1-5C |
5 |
7 |
1 |
1 |
MIG 1g.5gb |
A100X-1-5CME [3] |
5 |
1 |
1 |
1 |
MIG 1g.5gb+me |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
A100X-40C |
40 |
1 |
1 |
3840x2400 |
1 |
A100X-20C |
20 |
2 |
2 |
3840x2400 |
1 |
A100X-10C |
10 |
4 |
4 |
3840x2400 |
1 |
A100X-8C |
8 |
5 |
4 |
3840x2400 |
1 |
A100X-5C |
5 |
8 |
8 |
3840x2400 |
1 |
A100X-4C |
4 |
10 |
8 |
3840x2400 |
1 |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU |
Slices per vGPU |
Compute Instances per vGPU |
Corresponding GPU Instance Profile |
---|---|---|---|---|---|
A30-4-24C |
24 |
1 |
4 |
4 |
MIG 4g.24gb |
A30-2-12C |
12 |
2 |
2 |
2 |
MIG 2g.12gb |
A30-2-12CME [3] |
12 |
1 |
2 |
2 |
MIG 2g.12gb+me |
A30-1-6C |
6 |
4 |
1 |
1 |
MIG 1g.6gb |
A30-1-6CME [3] |
6 |
1 |
1 |
1 |
MIG 1g.6gb+me |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
A30-24C |
24 |
1 |
1 |
3840x2400 |
1 |
A30-12C |
12 |
2 |
2 |
3840x2400 |
1 |
A30-8C |
8 |
3 |
2 |
3840x2400 |
1 |
A30-6C |
6 |
4 |
4 |
3840x2400 |
1 |
A30-4C |
4 |
6 |
4 |
3840x2400 |
1 |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU |
Slices per vGPU |
Compute Instances per vGPU |
Corresponding GPU Instance Profile |
---|---|---|---|---|---|
A30-4-24C |
24 |
1 |
4 |
4 |
MIG 4g.24gb |
A30-2-12C |
12 |
2 |
2 |
2 |
MIG 2g.12gb |
A30-2-12CME [3] |
12 |
1 |
2 |
2 |
MIG 2g.12gb+me |
A30-1-6C |
6 |
4 |
1 |
1 |
MIG 1g.6gb |
A30-1-6CME [3] |
6 |
1 |
1 |
1 |
MIG 1g.6gb+me |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
A30-24C |
24 |
1 |
1 |
3840x2400 |
1 |
A30-12C |
12 |
2 |
2 |
3840x2400 |
1 |
A30-8C |
8 |
3 |
2 |
3840x2400 |
1 |
A30-6C |
6 |
4 |
4 |
3840x2400 |
1 |
A30-4C |
4 |
6 |
4 |
3840x2400 |
1 |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU |
Slices per vGPU |
Compute Instances per vGPU |
Corresponding GPU Instance Profile |
---|---|---|---|---|---|
A30-4-24C |
24 |
1 |
4 |
4 |
MIG 4g.24gb |
A30-2-12C |
12 |
2 |
2 |
2 |
MIG 2g.12gb |
A30-2-12CME [3] |
12 |
1 |
2 |
2 |
MIG 2g.12gb+me |
A30-1-6C |
6 |
4 |
1 |
1 |
MIG 1g.6gb |
A30-1-6CME [3] |
6 |
1 |
1 |
1 |
MIG 1g.6gb+me |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU in Equal-Size Mode |
Maximum vGPUs per GPU in Mixed-Size Mode |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|
A30-24C |
24 |
1 |
1 |
3840x2400 |
1 |
A30-12C |
12 |
2 |
2 |
3840x2400 |
1 |
A30-8C |
8 |
3 |
2 |
3840x2400 |
1 |
A30-6C |
6 |
4 |
4 |
3840x2400 |
1 |
A30-4C |
4 |
6 |
4 |
3840x2400 |
1 |
NVIDIA Turing GPU Architecture#
Physical GPUs per board: 1
The maximum number of vGPUs per board is the product of the maximum number of vGPUs per GPU and the number of physical GPUs per board.
These GPUs do not support mixed-size mode.
Intended use cases:
vGPUs with more than 40 GB of framebuffer: Training Workloads
vGPUs with 40 GB of framebuffer: Inference Workloads
Required license edition: NVIDIA AI Enterprise
These vGPU types support a single display with a fixed maximum resolution.
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|
RTX6000P-24C |
24 |
1 |
3840x2400 |
1 |
RTX6000P-12C |
12 |
2 |
3840x2400 |
1 |
RTX6000P-8C |
8 |
3 |
3840x2400 |
1 |
RTX6000P-6C |
6 |
4 |
3840x2400 |
1 |
RTX6000P-4C |
4 |
6 |
3840x2400 |
1 |
Virtual GPU Type |
Framebuffer (GB) |
Maximum vGPUs per GPU |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|
RTX8000P-48C |
49 |
1 |
3840x2400 |
1 |
RTX8000P-24C |
24 |
2 |
3840x2400 |
1 |
RTX8000P-16C |
16 |
3 |
3840x2400 |
1 |
RTX8000P-12C |
12 |
4 |
3840x2400 |
1 |
RTX8000P-8C |
8 |
6 |
3840x2400 |
1 |
RTX8000P-6C |
6 |
8 |
3840x2400 |
1 |
RTX8000P-4C |
4 |
8 |
3840x2400 |
1 |
NVIDIA Volta GPU Architecture#
Physical GPUs per board: 1
The maximum number of vGPUs per board is the product of the maximum number of vGPUs per GPU and the number of physical GPUs per board.
These GPUs do not support mixed-size mode.
Intended use cases:
vGPUs with more than 40 GB of framebuffer: Training Workloads
vGPUs with 40 GB of framebuffer: Inference Workloads
Required license edition: NVIDIA AI Enterprise
These vGPU types support a single display with a fixed maximum resolution.
Virtual GPU Type |
Intended Use Case |
Framebuffer (GB) |
Maximum vGPUs per GPU |
Maximum vGPUs per Board |
Maximum Display Resolution [4] |
Virtual Displays per vGPU |
---|---|---|---|---|---|---|
V100L-16C |
Training Workloads |
16 |
1 |
1 |
3840x2400 |
1 |
V100L-8C |
Training Workloads |
8 |
2 |
2 |
3840x2400 |
1 |
V100L-4C |
Inference Workloads |
4 |
4 |
4 |
3840x2400 |
1 |
vGPU for Compute FAQs#
Q. What are the differences between NVIDIA vGPU for Compute and GPU passthrough?
NVIDIA vGPU for Compute and GPU passthrough are two different approaches to deploying NVIDIA GPUs in a virtualized environment supported by NVIDIA AI Enterprise. NVIDIA vGPU for Compute enables multiple VMs to share a single physical GPU concurrently. This approach is highly cost-effective and scalable because GPU resources are efficiently distributed among various workloads. It also delivers excellent compute performance while utilizing NVIDIA drivers. vGPU deployments offer live migration and suspend/resume capabilities, providing greater flexibility in VM management. In contrast, GPU passthrough dedicates an entire physical GPU to a single VM. While this provides maximum performance as the VM has exclusive access to the GPU, it does not support live migration or suspend/resume features. Since the GPU cannot be shared with other VMs, passthrough is less scalable and is typically more suitable for workloads that demand dedicated GPU power.
Q. Where do I download the NVIDIA vGPU for Compute from?
NVIDIA vGPU for Compute is available to download from the NVIDIA AI Enterprise Infra Collection, which you can access by logging in to the NVIDIA NGC Catalog. If you have not already purchased NVIDIA AI Enterprise and want to try it, you can obtain a NVIDIA AI Enterprise 90 Day Trial License.
Q. What is the difference between vGPU and MIG?
The fundamental distinction between vGPU and MIG lies in their approach to GPU resource partitioning.
MIG (Multi-Instance GPU) employs spatial partitioning, dividing a single GPU into several independent, isolated instances. Each MIG instance possesses its own dedicated compute cores, memory, and resources, operating simultaneously and independently. This architecture guarantees predictable performance by eliminating resource contention. While an entire MIG-enabled GPU can be passed through to a single VM, individual MIG instances cannot be directly assigned to multiple VMs without the integration of vGPU. For multi-tenancy across VMs utilizing MIG, vGPU is essential. It empowers the hypervisor to manage and allocate distinct MIG-backed vGPUs to different virtual machines. Once assigned, each MIG instance functions as a separate, isolated GPU, delivering strict resource isolation and consistent performance for workloads. For more information on using vGPU with MIG, refer to the technical brief.
vGPU (Virtual GPU) utilizes temporal partitioning. This method allows multiple virtual machines to share GPU resources by alternating access through a time-slicing mechanism. The GPU scheduler dynamically assigns time slices to each VM, effectively balancing workload demands. While this approach offers greater flexibility and higher GPU utilization, performance can vary based on the specific demands of the concurrent workloads. To enable multi-tenancy, where multiple VMs share a single physical GPU, vGPU is a prerequisite. Without vGPU, a GPU can only be assigned to one VM at a time, thereby limiting scalability and overall resource efficiency.
Q. What is the difference between time-sliced vGPUs and MIG-backed vGPUs?
Time-sliced vGPUs and MIG-backed vGPUs are two different approaches to sharing GPU resources in virtualized environments. Here are the key differences:
Differences Between Time-Sliced and MIG-Backed vGPUs#
Time-Sliced vGPUs |
MIG-Backed vGPUs |
---|---|
Share the entire GPU among multiple VMs. |
Partition the GPU into smaller, dedicated instances. |
Each vGPU gets full access to all streaming multiprocessors (SMs) and engines, but only for a specific time slice. |
Each vGPU gets exclusive access to a portion of the GPU’s memory and compute resources. |
Processes run in series, with each vGPU waiting while others use the GPU. |
Processes run in parallel on dedicated hardware slices. |
The number of VMs per GPU is limited only by framebuffer size. |
Depending on the number of MIG instances supported on a GPU, this can range from 4 to 7 VMs per GPU. |
Better for workloads that require occasional bursts of full GPU power. |
Provides better performance isolation and more consistent latency. |
Q. Where can I find more information on the NVIDIA License System (NLS), which is the licensing solution for vGPU for Compute?
You can refer to the NVIDIA License System documentation and the NLS FAQ.
Footnotes
NVIDIA HGX A100 4-GPU baseboard with four fully connected GPUs
NVIDIA HGX A100 8-GPU baseboards with eight fully connected GPUs
Fully connected means that each GPU is connected to every other GPU on the baseboard.
These vGPU types are supported on ESXi, starting with vSphere 8.0 update 3.
NVIDIA vGPU for Compute is optimized for compute-intensive workloads. As a result, these vGPU types support only a single display head and do not provide Quadro graphics acceleration.
The SXM GPU Boards listed are supported only on VMware vSphere 7.x and 8.x.
Refers to Linux with KVM hypervisors listed in the NVIDIA AI Enterprise Infrastructure Support Matrix.
NVSwitch for B200 HGX is currently supported only on Linux with KVM hypervisors.