# Multi-vGPU and P2P
## Multi vGPU
Multi-vGPU attaches several vGPU devices to one VM. Devices may be time-sliced or MIG-backed and can sit on different physical GPUs—you are not limited to slicing one physical GPU across many VMs.
That layout suits training and inference workloads that need multiple GPUs inside one guest: each vGPU is dedicated to that VM, so workloads in the VM do not compete with other VMs for those devices on the same physical GPU (for example, a VM can own two A100-class vGPUs rather than one).
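Inside the guest, each assigned vGPU enumerates as an ordinary CUDA device, so standard multi-GPU code runs unchanged. The following is a minimal sketch, not a prescribed workflow; error checking is omitted, and the device count depends on what is assigned to the VM:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel, used only to show work dispatched to each vGPU independently.
__global__ void touch(float *x) { x[threadIdx.x] += 1.0f; }

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);  // One entry per vGPU assigned to this VM.
    printf("CUDA devices visible in this VM: %d\n", count);

    for (int dev = 0; dev < count; ++dev) {
        cudaSetDevice(dev);
        float *buf = nullptr;
        cudaMalloc(&buf, 256 * sizeof(float));
        touch<<<1, 256>>>(buf);  // Runs on this vGPU only; other VMs cannot contend for it.
        cudaDeviceSynchronize();
        cudaFree(buf);
    }
    return 0;
}
```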
### vGPU Support for Multi-vGPU
You can assign multiple vGPUs with differing amounts of frame buffer to a single VM, provided the board type and the series of all the vGPUs are the same. For example, you can assign an A40-48C vGPU and an A40-16C time-sliced vGPU to the same VM. You can also assign an A100-4-20C vGPU and an A100-2-10C vGPU to a VM, both backed by MIG instances on an A100 board. However, you cannot assign an A30-8C vGPU and an A16-8C vGPU to the same VM.
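A quick way to confirm a mixed-frame-buffer assignment from inside the guest is to query each device's properties: with the A40-48C plus A40-16C example above, the two devices report the same board name but different memory sizes. A minimal sketch, with error checking omitted:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop{};
        cudaGetDeviceProperties(&prop, dev);
        // Same series, differing frame buffer: roughly 48 GiB and 16 GiB for
        // an A40-48C and an A40-16C assigned to the same VM.
        printf("device %d: %s, %.1f GiB\n", dev, prop.name,
               prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
    }
    return 0;
}
```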
| Board | vGPU [1] |
|---|---|
| NVIDIA HGX B300 279 GB | Generic Linux with KVM hypervisors [3], Red Hat Enterprise Linux KVM, and Ubuntu: All NVIDIA vGPU for Compute |
| NVIDIA HGX B200 180 GB | Generic Linux with KVM hypervisors [3], Red Hat Enterprise Linux KVM, and Ubuntu: All NVIDIA vGPU for Compute |
| NVIDIA RTX Pro 6000 Blackwell Server Edition 96 GB | |
| NVIDIA RTX Pro 4500 Blackwell Server Edition 32 GB | |
| Board | vGPU [1] |
|---|---|
| NVIDIA H800 PCIe 94 GB (H800 NVL) | All NVIDIA vGPU for Compute |
| NVIDIA H800 PCIe 80 GB | All NVIDIA vGPU for Compute |
| NVIDIA H800 SXM5 80 GB | NVIDIA vGPU for Compute |
| NVIDIA H200 PCIe 141 GB (H200 NVL) | All NVIDIA vGPU for Compute |
| NVIDIA H200 SXM5 141 GB | NVIDIA vGPU for Compute |
| NVIDIA H100 PCIe 94 GB (H100 NVL) | All NVIDIA vGPU for Compute |
| NVIDIA H100 SXM5 94 GB | NVIDIA vGPU for Compute |
| NVIDIA H100 PCIe 80 GB | All NVIDIA vGPU for Compute |
| NVIDIA H100 SXM5 80 GB | NVIDIA vGPU for Compute |
| NVIDIA H100 SXM5 64 GB | NVIDIA vGPU for Compute |
| NVIDIA H20 SXM5 141 GB | NVIDIA vGPU for Compute |
| NVIDIA H20 SXM5 96 GB | NVIDIA vGPU for Compute |
| Board | vGPU |
|---|---|
| NVIDIA L40 | |
| NVIDIA L40S | |
| NVIDIA L20 | |
| NVIDIA L4 | |
| NVIDIA L2 | |
| NVIDIA RTX 6000 Ada | |
| NVIDIA RTX 5880 Ada | |
| NVIDIA RTX 5000 Ada | |
| Board | vGPU [1] |
|---|---|
| NVIDIA A800 PCIe 80 GB | |
| NVIDIA A800 PCIe 40 GB active-cooled | |
| NVIDIA A800 HGX 80 GB | |
| NVIDIA A100 PCIe 80 GB | |
| NVIDIA A100 HGX 80 GB | |
| NVIDIA A100 PCIe 40 GB | |
| NVIDIA A100 HGX 40 GB | |
| NVIDIA A40 | |
| NVIDIA A30 | |
| NVIDIA A16 | |
| NVIDIA A10 | |
| NVIDIA RTX A6000 | |
| NVIDIA RTX A5500 | |
| NVIDIA RTX A5000 | |
| Board | vGPU |
|---|---|
| Tesla T4 | |
| Quadro RTX 6000 passive | |
| Quadro RTX 8000 passive | |
| Board | vGPU |
|---|---|
| Tesla V100 SXM2 | |
| Tesla V100 SXM2 32 GB | |
| Tesla V100 PCIe | |
| Tesla V100 PCIe 32 GB | |
| Tesla V100S PCIe 32 GB | |
| Tesla V100 FHHL | |
## Peer-to-Peer (P2P) CUDA Transfers
Peer-to-peer (P2P) CUDA transfers allow device memory on one vGPU to be accessed from within CUDA kernels running on another vGPU, where the vGPUs sit on different physical GPUs but are assigned to the same VM. NVLink is a high-bandwidth interconnect that enables fast communication between such vGPUs.
P2P CUDA transfers over NVLink are supported only on a subset of vGPUs, hypervisor releases, and guest OS releases.
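The runtime calls involved are the standard CUDA peer-access APIs. Below is a minimal sketch, assuming a guest where devices 0 and 1 are NVLink-connected vGPUs in a supported configuration; the device indices are illustrative and error checking is omitted:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel running on one device that writes directly into a peer device's memory.
__global__ void fill_peer(float *peer_buf, float v) { peer_buf[threadIdx.x] = v; }

int main() {
    int can01 = 0, can10 = 0;
    cudaDeviceCanAccessPeer(&can01, 0, 1);  // Can device 0 map device 1's memory?
    cudaDeviceCanAccessPeer(&can10, 1, 0);
    if (!can01 || !can10) {
        printf("P2P is not available between devices 0 and 1\n");
        return 1;
    }

    // Map each device's memory into the other's address space.
    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);
    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);

    // Allocate on device 1, then write to it from a kernel launched on device 0.
    float *buf1 = nullptr;
    cudaSetDevice(1);
    cudaMalloc(&buf1, 256 * sizeof(float));
    cudaSetDevice(0);
    fill_peer<<<1, 256>>>(buf1, 3.0f);  // In-kernel access to peer memory.
    cudaDeviceSynchronize();

    // Bulk copies between the two devices also take the direct path.
    float *buf0 = nullptr;
    cudaMalloc(&buf0, 256 * sizeof(float));  // On device 0 (the current device).
    cudaMemcpyPeer(buf0, 0, buf1, 1, 256 * sizeof(float));

    cudaFree(buf0);
    cudaSetDevice(1);
    cudaFree(buf1);
    return 0;
}
```

On supported NVLink configurations, both the in-kernel access and `cudaMemcpyPeer` travel over NVLink rather than bouncing through host memory.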
### Peer-to-Peer CUDA Transfers Known Issues and Limitations
- Only time-sliced vGPUs are supported. MIG-backed vGPUs are not supported.
- P2P transfers over PCIe are not supported.
### vGPU Support for P2P
Only NVIDIA vGPU for Compute time-sliced vGPUs that are allocated all of the physical GPU's frame buffer, on physical GPUs that support NVLink, are supported.
| Board | vGPU |
|---|---|
| NVIDIA HGX B300 279 GB | NVIDIA B300X-279C |
| NVIDIA HGX B200 180 GB | NVIDIA B200X-180C |
| Board | vGPU |
|---|---|
| NVIDIA H800 PCIe 94 GB (H800 NVL) | H800L-94C |
| NVIDIA H800 PCIe 80 GB | H800-80C |
| NVIDIA H200 PCIe 141 GB (H200 NVL) | H200-141C |
| NVIDIA H200 SXM5 141 GB | H200X-141C |
| NVIDIA H100 PCIe 94 GB (H100 NVL) | H100L-94C |
| NVIDIA H100 SXM5 94 GB | H100XL-94C |
| NVIDIA H100 PCIe 80 GB | H100-80C |
| NVIDIA H100 SXM5 80 GB | H100XM-80C |
| NVIDIA H100 SXM5 64 GB | H100XS-64C |
| NVIDIA H20 SXM5 141 GB | H20X-141C |
| NVIDIA H20 SXM5 96 GB | H20-96C |
| Board | vGPU |
|---|---|
| NVIDIA A800 PCIe 80 GB | A800D-80C |
| NVIDIA A800 PCIe 40 GB active-cooled | A800-40C |
| NVIDIA A800 HGX 80 GB | A800DX-80C [2] |
| NVIDIA A100 PCIe 80 GB | A100D-80C |
| NVIDIA A100 HGX 80 GB | A100DX-80C [2] |
| NVIDIA A100 PCIe 40 GB | A100-40C |
| NVIDIA A100 HGX 40 GB | A100X-40C [2] |
| NVIDIA A40 | A40-48C |
| NVIDIA A30 | A30-24C |
| NVIDIA A16 | A16-16C |
| NVIDIA A10 | A10-24C |
| NVIDIA RTX A6000 | A6000-48C |
| NVIDIA RTX A5500 | A5500-24C |
| NVIDIA RTX A5000 | A5000-24C |
| Board | vGPU |
|---|---|
| Quadro RTX 8000 passive | RTX8000P-48C |
| Quadro RTX 6000 passive | RTX6000P-24C |
| Board | vGPU |
|---|---|
| Tesla V100 SXM2 | V100X-16C |
| Tesla V100 SXM2 32 GB | V100DX-32C |
## Hypervisor Platform Support for Multi-vGPU and P2P
| Hypervisor Platform | NVIDIA AI Enterprise Infra Release | Supported vGPU Types | Documentation |
|---|---|---|---|
| Red Hat Enterprise Linux with KVM | All active NVIDIA AI Enterprise Infra Releases | All NVIDIA vGPU for Compute with PCIe GPUs; on supported GPUs, both time-sliced and MIG-backed vGPUs are supported. | |
| Ubuntu with KVM | All active NVIDIA AI Enterprise Infra Releases | All NVIDIA vGPU for Compute with PCIe GPUs; on supported GPUs, both time-sliced and MIG-backed vGPUs are supported. | |
| VMware vSphere | All active NVIDIA AI Enterprise Infra Releases | All NVIDIA vGPU for Compute; on supported GPUs, both time-sliced and MIG-backed vGPUs are supported. | |
**Note:** P2P CUDA transfers are not supported on Windows. Only the Linux distributions outlined in the NVIDIA AI Enterprise Infrastructure Support Matrix are supported.
Footnotes