NVIDIA vGPU Types Reference#

This reference provides complete vGPU type specifications for all supported NVIDIA GPU architectures.

Quick Navigation by GPU Architecture

🆕 Blackwell Architecture

Latest generation GPUs

  • NVIDIA B300 HGX

  • NVIDIA B200 HGX

  • NVIDIA RTX Pro 4500 Blackwell SE

  • NVIDIA RTX Pro 6000 Blackwell SE

Blackwell Architecture vGPU Types
Hopper Architecture

High-performance AI training and inference

  • NVIDIA H100, H200, H800, H20

Hopper Architecture vGPU Types
Ada Lovelace Architecture

Advanced ray tracing and AI

  • NVIDIA L4, L20, L40, L40S

  • NVIDIA RTX 6000 Ada

Ada Lovelace Architecture vGPU Types
Ampere Architecture

Proven AI and HPC performance

  • NVIDIA A100, A30, A40, A10, A16

  • NVIDIA RTX A-series

Ampere Architecture vGPU Types
Turing Architecture

First-generation ray tracing

  • NVIDIA T4

  • NVIDIA Quadro RTX

Turing Architecture vGPU Types
Volta Architecture

First Tensor Core generation

  • NVIDIA V100

Volta Architecture vGPU Types

Understanding vGPU Types

vGPU types define the GPU resources allocated to virtual machines. Each type specifies:

  • Framebuffer Size - Amount of GPU memory

  • Maximum vGPUs - Number of vGPUs supported per physical GPU

  • Compute Resources - SMs, encoders, decoders

  • License Edition - Required NVIDIA AI Enterprise license
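As a concrete illustration of how these attributes relate, a vGPU type can be modeled as a small record whose maximum vGPU count follows from how many framebuffer allocations fit in physical GPU memory. This is a minimal sketch; the profile name and memory sizes below are illustrative, not values from NVIDIA's published tables.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VgpuType:
    """Minimal model of a vGPU type (illustrative fields only)."""
    name: str                 # profile name, e.g. "L4-6Q" (hypothetical here)
    framebuffer_gb: int       # GPU memory allocated to each vGPU
    physical_memory_gb: int   # total memory of the physical GPU

    @property
    def max_vgpus_per_gpu(self) -> int:
        # The vGPU count per physical GPU is bounded by how many
        # framebuffer allocations fit in physical GPU memory.
        return self.physical_memory_gb // self.framebuffer_gb

# A hypothetical 6 GB profile on a 24 GB GPU supports 4 vGPUs per GPU.
profile = VgpuType("L4-6Q", framebuffer_gb=6, physical_memory_gb=24)
print(profile.max_vgpus_per_gpu)  # 4
```

Consult the architecture-specific tables linked above for the actual profile names and limits of each GPU.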

vGPU Configuration Modes

Table 50 vGPU Configuration Comparison#

| Mode | Isolation | Use Case | Supported Architectures |
|---|---|---|---|
| Time-Sliced | Temporal | General-purpose, cost-effective | All architectures |
| MIG-Backed | Spatial (hardware) | Multi-tenant, guaranteed performance | Ampere, Hopper, Blackwell |
| Time-Sliced MIG-Backed | Spatial + Temporal | Maximum density with isolation | Blackwell (RTX Pro 4500, RTX Pro 6000) |
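The temporal-versus-spatial distinction in the table can be sketched as a toy model: time-slicing rotates the whole GPU among VMs round-robin, while MIG hands each VM a fixed hardware slice for its whole lifetime. The VM names, quantum length, and memory sizes are made up for illustration; this is not how the real scheduler is implemented.

```python
from itertools import islice

def time_sliced(vms, quantum_ms=2):
    """Temporal partitioning: the whole GPU rotates among VMs
    round-robin, each receiving a scheduling quantum in turn."""
    while True:
        for vm in vms:
            yield vm, quantum_ms

def mig_backed(vms, total_mem_gb=80, num_slices=4):
    """Spatial partitioning: each VM receives a dedicated hardware
    slice (memory and compute) up front, held for its lifetime."""
    per_slice_gb = total_mem_gb // num_slices
    return {vm: per_slice_gb for vm in vms[:num_slices]}

# Time-sliced: vm0 and vm1 alternate on the full GPU.
schedule = [vm for vm, _ in islice(time_sliced(["vm0", "vm1"]), 4)]
# MIG-backed: four VMs each hold a fixed 20 GB slice in parallel.
partitions = mig_backed(["vm0", "vm1", "vm2", "vm3"])
```

The Time-Sliced MIG-Backed mode in the last table row combines both: MIG carves the spatial slices, and each slice is then time-multiplexed among multiple vGPUs.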

For detailed configuration guidance, refer to vGPU Configuration.


Frequently Asked Questions#

Q. What are the differences between NVIDIA vGPU for Compute and GPU passthrough?

A. Both are supported ways to use NVIDIA GPUs with NVIDIA AI Enterprise in a virtualized environment.

Table 51 vGPU for Compute vs GPU Passthrough#

| Feature | vGPU for Compute | GPU Passthrough |
|---|---|---|
| GPU sharing | One physical GPU shared by multiple VMs | One physical GPU dedicated to one VM |
| Memory isolation | Time-sliced vGPU: strong hardware-based memory isolation enforced by the IOMMU, configured through the hypervisor. MIG-backed vGPU: dedicated L2 cache, memory controllers, and DRAM address buses per MIG instance provide strong hardware-level spatial isolation | Complete; a single VM has exclusive GPU access |
| Fault isolation | Faults in one VM do not propagate to others | Complete; a GPU fault or VM crash affects only that VM and its dedicated GPU |
| Live migration | Supported on compatible hypervisors and vGPU types | Not supported for the GPU device |
| Suspend/resume | Supported on compatible hypervisors and vGPU types | Not supported for the GPU device |
| Framebuffer per VM | Fraction of physical GPU memory (configured per vGPU profile) | Full physical GPU memory |
| Scheduling | Hypervisor-managed (Best Effort, Equal Share, or Fixed Share) | N/A (VM has exclusive GPU access) |
| Heterogeneous profiles | Supported (mixed framebuffer sizes on one GPU) | N/A (single VM per GPU) |
| Best for | Multi-tenant, shared infrastructure, density optimization | Single-VM workloads requiring full GPU performance |
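The three scheduling policies in the table differ in how idle capacity is treated, which an idealized model makes concrete: Fixed Share reserves each configured vGPU's fraction even while it idles, whereas Equal Share and Best Effort divide time among the vGPUs that currently have work (Best Effort additionally makes no latency guarantee). This is a toy approximation, not NVIDIA's scheduler implementation, and the VM names are hypothetical.

```python
def gpu_time_shares(policy, configured, active):
    """Idealized per-vGPU share of GPU time under each scheduling policy.
    `configured` = vGPUs defined on the GPU; `active` = those with work."""
    if policy == "fixed":
        # Fixed Share: each configured vGPU owns 1/N of GPU time even
        # while idle, so an idle vGPU's share simply goes unused.
        return {v: 1 / len(configured) for v in configured}
    if policy in ("equal", "best_effort"):
        # Equal Share and Best Effort both divide time among active
        # vGPUs; Best Effort additionally offers no latency guarantee.
        return {v: (1 / len(active) if v in active else 0.0) for v in configured}
    raise ValueError(f"unknown policy: {policy}")

vms = ["vm0", "vm1", "vm2", "vm3"]
busy = {"vm0", "vm1"}
fixed = gpu_time_shares("fixed", vms, busy)  # 0.25 each, idle shares wasted
equal = gpu_time_shares("equal", vms, busy)  # 0.5 for busy VMs, 0.0 for idle
```

In this model, Fixed Share trades utilization for predictability, which is why it suits workloads needing consistent performance, while Best Effort maximizes utilization on shared infrastructure.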

Q. Where do I download the NVIDIA vGPU for Compute from?

A. Download from the NVIDIA AI Enterprise Infra 8 collection after signing in to the NVIDIA NGC Catalog. For evaluation without a purchase, use the NVIDIA AI Enterprise 90 Day Trial License.

Q. What is the difference between vGPU and MIG?

A. They partition the GPU differently: MIG uses fixed hardware slices (spatial partitioning); vGPU time-multiplexes access (temporal partitioning).

MIG (Multi-Instance GPU) splits one GPU into isolated instances, each with dedicated memory and compute. Instances run in parallel with minimal cross-tenant interference. You can pass through a whole MIG-capable GPU to one VM; assigning distinct MIG instances to different VMs in a multi-tenant setup uses MIG-backed vGPU so the hypervisor can hand each VM its slice. Refer to the technical brief.

vGPU (Virtual GPU) schedules time slices so multiple VMs share one physical GPU. Utilization is often high, but latency and throughput depend on what else is running. Sharing one GPU across several VMs in this way is what vGPU software is for; without it, a single GPU is typically bound to one VM at a time.

Q. What is the difference between time-sliced vGPUs and MIG-backed vGPUs?

A. Both share GPU capacity across VMs. Time-sliced vGPUs multiplex the full GPU in time (processes run in series), while MIG-backed vGPUs assign dedicated hardware partitions for true parallel execution with stronger isolation. For a detailed comparison, refer to NVIDIA vGPU for Compute Features.

Q. Where can I find more information on the NVIDIA License System (NLS), the licensing solution for vGPU for Compute?

A. Refer to the NVIDIA License System and the NLS FAQ.


Reference Pages by Architecture