NVIDIA vGPU Types Reference

This reference provides complete vGPU type specifications for all supported NVIDIA GPU architectures.

Quick Navigation by GPU Architecture

Hopper Architecture

High-performance AI training and inference

  • NVIDIA H100, H200, H800, H20

See: Hopper Architecture vGPU Types

Ada Lovelace Architecture

Advanced ray tracing and AI

  • NVIDIA L4, L20, L40, L40S

  • NVIDIA RTX 6000 Ada

See: Ada Lovelace Architecture vGPU Types

Ampere Architecture

Proven AI and HPC performance

  • NVIDIA A100, A30, A40, A10, A16

  • NVIDIA RTX A-series

See: Ampere Architecture vGPU Types

Turing Architecture

First-generation ray tracing

  • NVIDIA T4

  • NVIDIA Quadro RTX

See: Turing Architecture vGPU Types

Volta Architecture

First Tensor Core generation

  • NVIDIA V100

See: Volta Architecture vGPU Types

Understanding vGPU Types

vGPU types define the GPU resources allocated to virtual machines. Each type specifies:

  • Framebuffer Size - Amount of GPU memory allocated to each vGPU

  • Maximum vGPUs - Number of vGPUs supported per physical GPU

  • Compute Resources - Streaming multiprocessors (SMs), encoders, and decoders

  • License Edition - Required NVIDIA AI Enterprise license
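
The sketch below models these attributes in Python to make the shape of a vGPU type concrete. The type name and values shown are hypothetical illustrations, not entries from an official type table; consult the architecture reference pages for actual specifications.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VGpuType:
    """Illustrative model of the attributes a vGPU type defines."""
    name: str               # hypothetical example: "A100-10C"
    framebuffer_gb: int     # GPU memory allocated to each vGPU
    max_vgpus_per_gpu: int  # vGPUs of this type per physical GPU
    compute: str            # SMs, encoders, decoders available to the vGPU
    license_edition: str    # required NVIDIA AI Enterprise license

# Hypothetical instance for illustration only.
example = VGpuType(
    name="A100-10C",
    framebuffer_gb=10,
    max_vgpus_per_gpu=4,
    compute="time-sliced access to all SMs",
    license_edition="NVIDIA AI Enterprise",
)
print(example)
```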

vGPU Configuration Modes

Table 138 vGPU Configuration Comparison

Mode        | Isolation          | Use Case                             | Supported Architectures
Time-Sliced | Temporal           | General-purpose, cost-effective      | All architectures
MIG-Backed  | Spatial (hardware) | Multi-tenant, guaranteed performance | Ampere, Hopper

For detailed configuration guidance, refer to vGPU Configuration.
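
On a Linux KVM hypervisor with the NVIDIA vGPU manager installed, the available vGPU types are typically exposed through the standard mediated device (mdev) sysfs interface. The following is a minimal sketch of enumerating them under that assumption; paths and availability vary by hypervisor and driver release, and other hypervisors (such as VMware vSphere) expose types through their own tooling.

```python
from pathlib import Path

# Sketch: list vGPU types exposed via the Linux vfio-mdev sysfs layout.
# Assumes a KVM host with the NVIDIA vGPU manager installed.
def list_vgpu_types(mdev_bus: str = "/sys/class/mdev_bus") -> None:
    for type_dir in sorted(Path(mdev_bus).glob("*/mdev_supported_types/*")):
        name = (type_dir / "name").read_text().strip()
        available = (type_dir / "available_instances").read_text().strip()
        print(f"{type_dir.name}: {name} (available instances: {available})")

if __name__ == "__main__":
    list_vgpu_types()
```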


Frequently Asked Questions

Q. What are the differences between NVIDIA vGPU for Compute and GPU passthrough?

A. NVIDIA vGPU for Compute and GPU passthrough are two methods for deploying NVIDIA GPUs in a virtualized environment with NVIDIA AI Enterprise. NVIDIA vGPU for Compute enables multiple VMs to share a single physical GPU concurrently. This method is cost-effective and scalable because GPU resources are efficiently distributed among workloads, and it delivers excellent compute performance while using NVIDIA drivers. vGPU deployments also support live migration and suspend/resume, providing greater flexibility in VM management.

In contrast, GPU passthrough dedicates an entire physical GPU to a single VM. While this provides maximum performance, because the VM has exclusive access to the GPU, it does not support live migration or suspend/resume. Since the GPU cannot be shared with other VMs, passthrough is less scalable and is typically better suited to workloads that demand dedicated GPU power.
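
From inside a guest VM, the NVML virtualization-mode query can help distinguish these two deployment methods. Below is a hedged sketch using the nvidia-ml-py bindings, assuming the guest driver and bindings expose this query on your release:

```python
import pynvml  # pip install nvidia-ml-py

# Sketch: report how GPU 0 is virtualized, assuming the NVML bindings
# expose the virtualization-mode query on this driver release.
MODES = {
    pynvml.NVML_GPU_VIRTUALIZATION_MODE_NONE: "bare metal (no virtualization)",
    pynvml.NVML_GPU_VIRTUALIZATION_MODE_PASSTHROUGH: "GPU passthrough",
    pynvml.NVML_GPU_VIRTUALIZATION_MODE_VGPU: "NVIDIA vGPU",
    pynvml.NVML_GPU_VIRTUALIZATION_MODE_HOST_VGPU: "hypervisor host with vGPU manager",
    pynvml.NVML_GPU_VIRTUALIZATION_MODE_HOST_VSGA: "hypervisor host (vSGA)",
}

pynvml.nvmlInit()
try:
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    mode = pynvml.nvmlDeviceGetVirtualizationMode(handle)
    print(MODES.get(mode, f"unknown mode {mode}"))
finally:
    pynvml.nvmlShutdown()
```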

Q. Where do I download the NVIDIA vGPU for Compute from?

A. NVIDIA vGPU for Compute is available for download from the NVIDIA AI Enterprise Infra Collection, which you can access by logging in to the NVIDIA NGC Catalog. If you have not already purchased NVIDIA AI Enterprise and want to try it, you can obtain an NVIDIA AI Enterprise 90-day trial license.

Q. What is the difference between vGPU and MIG?

A. The fundamental distinction between vGPU and MIG lies in their approach to GPU resource partitioning.

MIG (Multi-Instance GPU) employs spatial partitioning, dividing a single GPU into several independent, isolated instances. Each MIG instance possesses its own dedicated compute cores, memory, and resources, operating simultaneously and independently. This architecture guarantees predictable performance by eliminating resource contention. While an entire MIG-enabled GPU can be passed through to a single VM, individual MIG instances can’t be directly assigned to multiple VMs without the integration of vGPU. For multi-tenancy across VMs utilizing MIG, vGPU is essential. It empowers the hypervisor to manage and allocate distinct MIG-backed vGPUs to different virtual machines. Once assigned, each MIG instance functions as a separate, isolated GPU, delivering strict resource isolation and consistent performance for workloads. For more information on using vGPU with MIG, refer to the technical brief.

vGPU (Virtual GPU) utilizes temporal partitioning. This method allows multiple virtual machines to share GPU resources by alternating access through a time-slicing mechanism. The GPU scheduler dynamically assigns time slices to each VM, effectively balancing workload demands. While this approach offers greater flexibility and higher GPU utilization, performance can vary based on the specific demands of the concurrent workloads. To enable multi-tenancy, where multiple VMs share a single physical GPU, vGPU is a prerequisite. Without vGPU, a GPU can only be assigned to one VM at a time, thereby limiting scalability and overall resource efficiency.
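
To illustrate the spatial side of this distinction, the sketch below uses the NVML Python bindings to check whether MIG mode is enabled on each GPU. This is a hedged example: GPUs without MIG support report the query as unsupported, and the bindings must match a MIG-capable driver.

```python
import pynvml  # pip install nvidia-ml-py

# Sketch: report the MIG (spatial partitioning) state of each GPU.
pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        try:
            current, _pending = pynvml.nvmlDeviceGetMigMode(handle)
            state = "enabled" if current == pynvml.NVML_DEVICE_MIG_ENABLE else "disabled"
        except pynvml.NVMLError_NotSupported:
            state = "not supported on this GPU"
        print(f"GPU {i} ({name}): MIG {state}")
finally:
    pynvml.nvmlShutdown()
```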

Q. What is the difference between time-sliced vGPUs and MIG-backed vGPUs?

A. Time-sliced vGPUs and MIG-backed vGPUs are two methods for sharing GPU resources in virtualized environments. The key differences are:

Table 139 Differences Between Time-Sliced and MIG-Backed vGPUs

Time-sliced vGPUs | MIG-backed vGPUs
Share the entire GPU among multiple VMs. | Partition the GPU into smaller, dedicated instances.
Each vGPU gets full access to all streaming multiprocessors (SMs) and engines, but only for a specific time slice. | Each vGPU gets exclusive access to a portion of the GPU’s memory and compute resources.
Processes run in series, with each vGPU waiting while others use the GPU. | Processes run in parallel on dedicated hardware slices.
The number of VMs per GPU is limited only by framebuffer size. | Depending on the number of MIG instances supported on a GPU, this can range from 4 to 7 VMs per GPU.
Better for workloads that require occasional bursts of full GPU power. | Provides better performance isolation and more consistent latency.
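
As a worked example of the framebuffer-bound sizing rule for time-sliced vGPUs, here is a minimal sketch. The integer-division model is a simplification (drivers reserve some GPU memory), and the 24 GB GPU and 4 GB type are hypothetical; the published type tables are authoritative.

```python
def max_time_sliced_vgpus(total_framebuffer_gb: int, per_vgpu_framebuffer_gb: int) -> int:
    # Simplified sizing rule: the count of time-sliced vGPUs per physical GPU
    # is bounded by how many framebuffers of the chosen size fit in total
    # GPU memory.
    return total_framebuffer_gb // per_vgpu_framebuffer_gb

# Hypothetical example: a 24 GB GPU with a 4 GB-per-vGPU type
print(max_time_sliced_vgpus(24, 4))  # -> 6
```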

Q. Where can I find more information on the NVIDIA License System (NLS), the licensing solution for vGPU for Compute?

A. You can refer to the NVIDIA License System documentation and the NLS FAQ.


Reference Pages by Architecture