Multi-vGPU and P2P

Multi-vGPU and Peer-to-Peer (P2P) CUDA transfers are two related capabilities for multi-GPU VM workloads. Each capability is documented on its own page, and the hypervisor support table on this page applies to both.

Table 22 Multi-vGPU and P2P Subpages

| Subpage | Use when you need |
| --- | --- |
| Multi-vGPU | To attach several vGPU devices to one VM (time-sliced or MIG-backed, possibly across different physical GPUs) for training and inference workloads that need multiple GPUs inside a single guest. |
| Peer-to-Peer (P2P) CUDA Transfers | To allow CUDA kernels to access device memory between vGPUs in the same VM over NVLink, including the supported board/vGPU profile combinations and the known A100 and UVM caveat. |
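As a minimal sketch of what P2P access between two vGPUs in the same VM looks like from the CUDA runtime API (device ordinals 0 and 1 are assumptions; the actual indices depend on how the vGPUs were attached to the guest):

```cuda
// Sketch: check and enable P2P between the first two vGPUs in the guest.
// Device ordinals 0 and 1 are assumptions; adjust to your VM's topology.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    if (count < 2) {
        printf("This VM exposes %d vGPU(s); P2P needs at least two.\n", count);
        return 0;
    }

    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);  // can device 0 map device 1's memory?
    if (canAccess) {
        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);  // after this, kernels running on device 0
                                           // may dereference pointers into device 1 memory
    }
    printf("P2P 0 -> 1: %s\n", canAccess ? "supported" : "not supported");
    return 0;
}
```

Whether `cudaDeviceCanAccessPeer` reports support inside a VM depends on the board, the vGPU profile combination, and the NVLink topology, per the subpage above.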

Hypervisor Platform Support for Multi-vGPU and P2P

The hypervisor support below applies to both Multi-vGPU and P2P features.

Table 23 Hypervisor Platform Support for Multi-vGPU and P2P

| Hypervisor Platform | NVIDIA AI Enterprise Infra Release | Supported vGPU Types | Documentation |
| --- | --- | --- | --- |
| Red Hat Enterprise Linux with KVM | All active NVIDIA AI Enterprise Infra Releases | All NVIDIA vGPU for Compute with PCIe GPUs; on supported GPUs, both time-sliced and MIG-backed vGPUs are supported. | Setting up Multi-vGPU VMs on RHEL KVM |
| Ubuntu with KVM | All active NVIDIA AI Enterprise Infra Releases | All NVIDIA vGPU for Compute with PCIe GPUs; on supported GPUs, both time-sliced and MIG-backed vGPUs are supported. | Setting up Multi-vGPU VMs on Ubuntu KVM |
| VMware vSphere | All active NVIDIA AI Enterprise Infra Releases | All NVIDIA vGPU for Compute; on supported GPUs, both time-sliced and MIG-backed vGPUs are supported. | Setting up Multi-vGPU on VMware vSphere 8 |
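Once the vGPUs are attached per the platform-specific guide above, a quick guest-side check with `nvidia-smi` confirms that all devices are visible and shows whether each is running in MIG mode (the exact device list depends on the profiles you attached):

```shell
# Inside the guest: list every vGPU device the VM can see.
nvidia-smi -L

# Show each device's current MIG mode (Enabled = MIG-backed, Disabled = time-sliced).
nvidia-smi --query-gpu=index,name,mig.mode.current --format=csv
```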

Note

P2P CUDA transfers are not supported on Windows; only the Linux distributions listed in the NVIDIA AI Enterprise Infrastructure Support Matrix are supported.