NVIDIA vGPU Types Reference

This reference provides complete vGPU type specifications for all supported NVIDIA GPU architectures.

Quick Navigation by GPU Architecture

🆕 Blackwell Architecture

Latest generation GPUs

  • NVIDIA B200 HGX

  • NVIDIA RTX PRO 6000 Blackwell SE

Blackwell Architecture vGPU Types
Hopper Architecture

High-performance AI training and inference

  • NVIDIA H100, H200, H800, H20

Hopper Architecture vGPU Types
Ada Lovelace Architecture

Advanced ray tracing and AI

  • NVIDIA L4, L20, L40, L40S

  • NVIDIA RTX 6000 Ada

Ada Lovelace Architecture vGPU Types
Ampere Architecture

Proven AI and HPC performance

  • NVIDIA A100, A30, A40, A10, A16

  • NVIDIA RTX A-series

Ampere Architecture vGPU Types
Turing Architecture

First-generation ray tracing

  • NVIDIA T4

  • NVIDIA Quadro RTX

Turing Architecture vGPU Types
Volta Architecture

First Tensor Core generation

  • NVIDIA V100

Volta Architecture vGPU Types

Understanding vGPU Types

vGPU types define the GPU resources allocated to virtual machines. Each type specifies:

  • Framebuffer Size - Amount of GPU memory

  • Maximum vGPUs - Number of vGPUs supported per physical GPU

  • Compute Resources - SMs, encoders, decoders

  • License Edition - Required NVIDIA AI Enterprise license
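The relationship between framebuffer size and vGPU density can be sketched as a toy model. The profile name, sizes, and license string below are hypothetical illustrations, not values from this reference:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VGPUType:
    """Toy model of a vGPU type; fields mirror the attributes listed above."""
    name: str
    framebuffer_gb: int   # GPU memory allocated to each vGPU
    license_edition: str  # required NVIDIA AI Enterprise license

def max_vgpus(gpu_memory_gb: int, vgpu_type: VGPUType) -> int:
    """Max vGPUs per physical GPU if framebuffer were the only limit."""
    return gpu_memory_gb // vgpu_type.framebuffer_gb

# Hypothetical 8 GB profile on a GPU with 48 GB of total memory:
profile = VGPUType(name="example-8q", framebuffer_gb=8,
                   license_edition="NVIDIA AI Enterprise")
print(max_vgpus(48, profile))  # 48 // 8 = 6
```

In practice the published maximum for each type also reflects compute and scheduling limits, so consult the per-architecture tables linked above for real figures.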

vGPU Configuration Modes

For the canonical mode comparison (description, partitioning, isolation, use cases, and supported GPUs per mode), refer to Supported vGPU Modes on the overview page. For detailed configuration guidance, refer to vGPU Configuration.


Frequently Asked Questions

Q. What are the differences between NVIDIA vGPU for Compute and GPU passthrough?

  A. NVIDIA vGPU for Compute and GPU passthrough are two methods for deploying NVIDIA GPUs in a virtualized environment supported by NVIDIA AI Enterprise.

NVIDIA vGPU for Compute enables multiple VMs to share a single physical GPU concurrently. This method is cost-effective and scalable because GPU resources are distributed efficiently among workloads, and it delivers excellent compute performance using NVIDIA drivers. vGPU deployments also support live migration and suspend/resume, providing greater flexibility in VM management.

In contrast, GPU passthrough dedicates an entire physical GPU to a single VM. This provides maximum performance, because the VM has exclusive access to the GPU, but it does not support live migration or suspend/resume. Since the GPU cannot be shared with other VMs, passthrough is less scalable and is typically better suited to workloads that demand dedicated GPU power.

Q. Where do I download the NVIDIA vGPU for Compute from?

  A. NVIDIA vGPU for Compute is available for download from the NVIDIA AI Enterprise Infra Collection, which you can access by logging in to the NVIDIA NGC Catalog. If you have not already purchased NVIDIA AI Enterprise and want to try it, you can obtain an NVIDIA AI Enterprise 90-Day Trial License.

Q. What is the difference between vGPU and MIG?

  A. Time-sliced vGPUs and MIG-backed vGPUs are two methods for sharing GPU resources in virtualized environments. Time-sliced vGPUs share an entire GPU among multiple VMs through temporal scheduling: each vGPU gets full access to the streaming multiprocessors and engines for a specific time slice. MIG-backed vGPUs partition the GPU into smaller dedicated instances with spatial isolation, giving each VM exclusive access to its own portion of memory and compute resources. Time-sliced mode suits workloads with occasional bursts of full GPU power; MIG-backed mode delivers stronger performance isolation and more consistent latency. For the full mode comparison, including supported GPUs, refer to Supported vGPU Modes.

Q. What is the difference between time-sliced vGPUs and MIG-backed vGPUs?

  A. Time-sliced vGPUs and MIG-backed vGPUs are two methods for sharing GPU resources in virtualized environments. The key differences are:

| Time-sliced vGPUs | MIG-backed vGPUs |
|---|---|
| Share the entire GPU among multiple VMs. | Partition the GPU into smaller, dedicated instances. |
| Each vGPU gets full access to all streaming multiprocessors (SMs) and engines, but only for a specific time slice. | Each vGPU gets exclusive access to a portion of the GPU's memory and compute resources. |
| Processes run in series, with each vGPU waiting while others use the GPU. | Processes run in parallel on dedicated hardware slices. |
| The number of VMs per GPU is limited only by framebuffer size. | The number of VMs per GPU depends on how many MIG instances the GPU supports, typically 4 to 7. |
| Better for workloads that require occasional bursts of full GPU power. | Provides better performance isolation and more consistent latency. |
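The density difference between the two modes can be made concrete with a small sketch. The GPU memory, profile size, and instance cap below are hypothetical examples for illustration only:

```python
def time_sliced_vm_count(gpu_memory_gb: int, profile_gb: int) -> int:
    """Time-sliced mode: the VM count is limited only by framebuffer size."""
    return gpu_memory_gb // profile_gb

def mig_vm_count(requested: int, max_instances: int = 7) -> int:
    """MIG-backed mode: the VM count is capped by the GPU's MIG instance limit."""
    return min(requested, max_instances)

# Hypothetical 80 GB GPU carved into 10 GB profiles:
print(time_sliced_vm_count(80, 10))  # 8 VMs, each taking turns on the whole GPU
print(mig_vm_count(8))               # 7 — capped by the MIG instance limit
```

The trade-off mirrors the table: time-sliced mode can pack in as many VMs as the framebuffer allows but serializes their execution, while MIG-backed mode caps the VM count at the instance limit in exchange for dedicated, parallel hardware slices.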

Q. Where can I find more information on the NVIDIA License System (NLS), the licensing solution for vGPU for Compute?

  A. Refer to the NVIDIA License System documentation and the NLS FAQ.


Reference Pages by Architecture