NVIDIA vGPU for Compute Overview#

NVIDIA vGPU (Virtual GPU) for Compute lets multiple VMs share one physical GPU while each VM sees what appears to be a dedicated device. This page explains core terms, how the components fit together, the three vGPU modes (Time-Sliced vGPU, MIG-Backed vGPU, and Time-Sliced MIG-Backed vGPU), and the headline features to understand before you install or size a deployment. For platform-specific known issues and host/guest constraints, see NVIDIA vGPU for Compute Limitations.

Key Terms#

For definitions of vGPU Manager, vGPU Guest Driver, NVIDIA Licensing System, and NVIDIA AI Enterprise Infra Collection, see the Glossary.

NVIDIA vGPU Architecture Overview#

Under the NVIDIA Virtual GPU Manager on the hypervisor, one physical GPU can expose multiple vGPUs, each attachable to a guest VM as its own GPU device.

Download the latest vGPU for Compute drivers from the NVIDIA AI Enterprise Infra 8 collection.

[Figure: NVIDIA vGPU for Compute architecture diagram showing the hypervisor, guest VMs, and GPU partitioning]

Each vGPU behaves like a GPU with a fixed framebuffer carved from the physical GPU at creation time; that memory stays reserved for that vGPU until the vGPU is destroyed.
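You can see this reservation behavior from the hypervisor host. The sketch below, a minimal example assuming a host with the vGPU Manager installed and `nvidia-smi` on the PATH, lists the vGPU types the GPU supports and the types that can still be created; exact output fields vary by driver branch:

```python
import subprocess

def list_vgpu_types(flag: str) -> str:
    """Run `nvidia-smi vgpu` with the given listing flag and return its text output."""
    result = subprocess.run(
        ["nvidia-smi", "vgpu", flag],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

if __name__ == "__main__":
    # -s lists the vGPU types the installed vGPU Manager supports;
    # -c lists the types that can still be created right now.
    print(list_vgpu_types("-s"))
    # Because framebuffer is reserved when a vGPU is created, the creatable
    # list shrinks as vGPUs are added and recovers only when they are destroyed.
    print(list_vgpu_types("-c"))
```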

NVIDIA vGPU for Compute Configurations#

Supported modes depend on the physical GPU:

  • Time-sliced vGPUs are available on all NVIDIA AI Enterprise supported GPUs.

  • On GPUs with Multi-Instance GPU (MIG), these MIG-backed variants are supported:

    • MIG-backed vGPUs that use a whole GPU instance

    • Time-sliced vGPUs within a MIG instance (time-sliced, MIG-backed)
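Whether the MIG-backed variants are available depends on the GPU's MIG mode. A minimal sketch of checking it from Python with the NVML bindings (the `nvidia-ml-py` package, imported as `pynvml`); it assumes at least one NVIDIA GPU and a driver new enough to expose the MIG API:

```python
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        try:
            # The current mode is in effect now; a pending mode takes
            # effect after the next GPU reset.
            current, pending = pynvml.nvmlDeviceGetMigMode(handle)
            state = "enabled" if current == pynvml.NVML_DEVICE_MIG_ENABLE else "disabled"
            print(f"GPU {i} ({name}): MIG {state}")
        except pynvml.NVMLError_NotSupported:
            print(f"GPU {i} ({name}): no MIG support (time-sliced vGPU only)")
finally:
    pynvml.nvmlShutdown()
```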

Table 18 Supported vGPU Modes#

Time-Sliced vGPU

  • Description: Multiple Compute VMs time-share the same physical GPU: SMs and engines are used by one vGPU at a time in scheduled slices.

  • GPU Partitioning: Temporal

  • Isolation: Strong hardware memory and fault isolation; round-robin scheduling yields solid throughput when strict per-VM compute isolation is not required.

  • Use Cases: Workloads that tolerate shared compute time, or platforms without MIG-backed vGPU. Typical fits: lighter inference, preprocessing, and model dev/test before large training.

  • Supported GPUs: See the Infra Support Matrix.

MIG-Backed vGPU

  • Description: Built from one or more MIG slices on a MIG-capable GPU; each VM owns the SMs and engines of its GPU instance and runs in parallel with VMs on other instances on the same card. See Virtual GPU Types for Supported GPUs for profiles and sizing.

  • GPU Partitioning: Spatial

  • Isolation: Strong hardware memory and fault isolation; dedicated cache and memory bandwidth per instance, with lower scheduling jitter than pure time-slicing of the full GPU.

  • Use Cases: Multi-tenant or SLA-sensitive setups: consistent inference, fine-tuning, or smaller training jobs that need predictable isolation on one physical GPU.

  • Supported GPUs: See MIG-Backed vGPU.

Time-Sliced, MIG-Backed vGPU

  • Description: Uses part of a MIG instance; VMs on that instance time-share its SMs and engines. Introduced with the RTX PRO 6000 and RTX PRO 4500 Blackwell Server Edition GPUs. Profile details: NVIDIA vGPU Types Reference.

  • GPU Partitioning: Spatial between MIG instances; temporal within each instance.

  • Isolation: Hardware isolation between instances; within an instance, time-sharing carries tradeoffs similar to full-GPU time-slicing, scoped to the MIG partition.

  • Use Cases: Dense multi-tenant layouts that need hard boundaries between tenant groups but more than one light workload per MIG slice, for example:

    • Several low-to-moderate QPS inference services per slice

    • Separate inference VMs per tenant on one physical GPU

    • Batch jobs that can safely share a MIG partition

  • Supported GPUs: RTX PRO 6000 Blackwell Server Edition, RTX PRO 4500 Blackwell Server Edition
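Whichever mode backs it, the device a guest VM sees is addressed like an ordinary GPU. A minimal sketch, run inside a guest with the vGPU guest driver and the `nvidia-ml-py` NVML bindings installed, that prints the vGPU profile name and the fixed framebuffer it was allocated:

```python
import pynvml

pynvml.nvmlInit()
try:
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    # On a vGPU, the reported name is the vGPU profile assigned to the VM,
    # not the full physical GPU.
    print("Device:", pynvml.nvmlDeviceGetName(handle))
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    # mem.total is the framebuffer reserved for this vGPU at creation time,
    # not the physical GPU's total memory.
    print(f"Framebuffer: {mem.total / 2**30:.1f} GiB total, "
          f"{mem.used / 2**30:.1f} GiB in use")
finally:
    pynvml.nvmlShutdown()
```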

NVIDIA vGPU for Compute Key Features#

Beyond basic partitioning, vGPU for Compute adds networking, migration, scheduling, and memory features aimed at AI, ML, and HPC in virtualized clusters.

Table 19 Key Capabilities#

  • MIG-Backed vGPU: Hardware-level GPU partitioning with spatial isolation for multi-tenant workloads.

  • Device Groups: Automated detection and provisioning of physically connected devices for optimal topology.

  • GPUDirect RDMA and Storage: Direct memory access and storage I/O bypass that reduces CPU overhead and latency for GPU-to-GPU and GPU-to-storage transfers.

  • Heterogeneous vGPU: Mixed vGPU profiles on a single GPU for diverse workload requirements.

  • Live Migration: VM migration with minimal downtime (seconds of stun time) for maintenance and load balancing on supported hypervisors.

  • Multi-vGPU and P2P: Multiple vGPUs per VM with peer-to-peer communication.

  • NVIDIA NVSwitch: High-bandwidth GPU-to-GPU interconnect fabric through NVLink.

  • NVLink Multicast: Efficient one-to-many data distribution for distributed training.

  • Scheduling Policies: Workload-specific GPU scheduling algorithms.

  • Suspend-Resume: VM state preservation for flexible resource management.

  • Unified Virtual Memory: Single memory address space across CPU and GPU.
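As an example of Multi-vGPU and P2P, a VM that has been assigned several vGPUs sees each one as a distinct CUDA device and can query peer access between them. A minimal sketch using PyTorch inside such a guest; it assumes the vGPU types and hypervisor support P2P, and otherwise the query simply reports no:

```python
import torch

# Each vGPU assigned to this VM appears as an ordinary CUDA device.
n = torch.cuda.device_count()
for src in range(n):
    for dst in range(n):
        if src != dst:
            ok = torch.cuda.can_device_access_peer(src, dst)
            print(f"cuda:{src} -> cuda:{dst}: peer access {'yes' if ok else 'no'}")
```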

For platform support, configuration requirements, and compatibility tables per feature, see NVIDIA vGPU for Compute Features.
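As one more concrete example from the table, Unified Virtual Memory lets host and device code share a single address space. A minimal sketch using CuPy's managed-memory allocator; it assumes a guest where UVM is supported for the vGPU type in use (see the features page for the compatibility tables):

```python
import cupy as cp

# Route CuPy's allocations through cudaMallocManaged so buffers live in a
# single address space that both the CPU and the GPU can touch.
cp.cuda.set_allocator(cp.cuda.malloc_managed)

x = cp.arange(1 << 20, dtype=cp.float32)   # managed allocation
y = x * 2.0                                # kernel runs against managed memory
cp.cuda.Stream.null.synchronize()
print(float(y[:4].sum()))                  # pages migrate on demand when read
```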

Platform Limitations and Known Issues#

For host monitoring caveats, hypervisor-specific known issues, MMIO requirements for large-memory VMs, and Microsoft Windows Server constraints, see NVIDIA vGPU for Compute Limitations.