Requirements#

Summary of Minimum Requirements#

At least one NVIDIA A100 or T4 GPU in a single NVIDIA-Certified System with an NVIDIA ConnectX-6 Dx NIC
Servers hosting the VMs must connect to an NVIDIA Mellanox Spectrum switch
Each GPU and NIC pair must be on the same PCIe root complex
VMware vSphere 7.0 Update 2
NVIDIA AI Enterprise Software Suite
- NVIDIA AI Enterprise Host Software
- NVIDIA Guest Driver
NVIDIA Virtual GPU license server
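The root complex requirement above can be checked roughly on a Linux system, for example when the server is booted into a bare-metal Linux environment before ESXi is installed (an assumption; ESXi itself does not expose sysfs). The sketch below treats the host-bridge component of a device's resolved sysfs path (such as `pci0000:3a`) as a proxy for its root complex. The PCI addresses and paths are hypothetical placeholders.

```python
import os
import re

# Host-bridge component of a resolved sysfs path, e.g. "pci0000:3a".
_HOST_BRIDGE = re.compile(r"^pci[0-9a-fA-F]{4}:[0-9a-fA-F]{2}$")


def host_bridge(sysfs_path: str) -> str:
    """Return the host-bridge component of a resolved sysfs device path."""
    for part in sysfs_path.split("/"):
        if _HOST_BRIDGE.match(part):
            return part
    raise ValueError(f"no PCI host bridge found in {sysfs_path!r}")


def same_root_complex(gpu_path: str, nic_path: str) -> bool:
    """True when both devices hang off the same host bridge (used here as a proxy)."""
    return host_bridge(gpu_path) == host_bridge(nic_path)


def resolve(bdf: str) -> str:
    """Resolve a PCI address such as '0000:3b:00.0' to its real sysfs path (Linux only)."""
    return os.path.realpath(f"/sys/bus/pci/devices/{bdf}")


if __name__ == "__main__":
    # Hypothetical resolved paths; on a real system, pass resolve("<gpu bdf>") and resolve("<nic bdf>").
    gpu = "/sys/devices/pci0000:3a/0000:3a:00.0/0000:3b:00.0"
    nic = "/sys/devices/pci0000:3a/0000:3a:02.0/0000:3c:00.0"
    print(same_root_complex(gpu, nic))
```

If the two devices report different host bridges, moving the GPU or the NIC to another slot is the usual remedy. The server vendor's slot-to-CPU mapping remains the authoritative source.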

Note

Installing VMware ESXi and the NVIDIA vGPU Host and Guest Driver Software is outside the scope of this document; refer to the NVIDIA AI Enterprise Deployment Guide for detailed instructions. To set up AI-ready VMs on VMware, a vGPU profile must be added to each VM. This requires installing the vGPU Host Manager on ESXi, attaching a vGPU profile, installing a vGPU guest driver on the VM, and licensing the VM. The following sections of the guide are helpful for reference:

Server Configuration#

The following server configuration details are considered best practices:

Hyper-threading – Enabled
Power Setting or System Profile – High Performance
CPU Performance (if applicable) – Enterprise or High Throughput
Memory Mapped I/O above 4 GB – Enabled (if applicable)
Single Root I/O Virtualization (SR-IOV) – Enabled

VM Requirements#

Before proceeding with the guide, create at least two VMs. Each VM has the following hardware requirements:

One ConnectX-6 Dx NIC per VM
One GPU per VM

Because C-Series vGPUs have large BAR memory settings, the following configuration is required:

The guest OS must be a 64-bit OS.
64-bit MMIO and EFI boot must be enabled for the VM.
The guest OS must be able to be installed in EFI boot mode.
The VM hardware version must be 19.
The VM’s MMIO space must be increased to 128 GB, as explained in VMware Knowledge Base Article: VMware vSphere VMDirectPath I/O: Requirements for Platforms and Devices (2142307).
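As an illustration of the EFI and MMIO requirements above, the following sketch renders and checks the corresponding VM advanced configuration entries. The key names `firmware`, `pciPassthru.use64bitMMIO`, and `pciPassthru.64bitMMIOSizeGB` are the ones commonly used for this purpose in VMware advanced settings. The helper functions are hypothetical, and the knowledge base article remains the authoritative reference.

```python
# Advanced settings matching the requirements listed above (128 GB MMIO, EFI boot).
REQUIRED_VMX = {
    "firmware": "efi",
    "pciPassthru.use64bitMMIO": "TRUE",
    "pciPassthru.64bitMMIOSizeGB": "128",
}


def render_vmx(settings: dict) -> str:
    """Render settings as .vmx-style lines: key = "value"."""
    return "\n".join(f'{key} = "{value}"' for key, value in settings.items())


def missing_settings(current: dict, required: dict = REQUIRED_VMX) -> dict:
    """Return the required entries that are absent or different in `current`."""
    return {
        key: value
        for key, value in required.items()
        if str(current.get(key, "")).lower() != value.lower()
    }


if __name__ == "__main__":
    print(render_vmx(REQUIRED_VMX))
    # A VM still booting in BIOS mode with no MMIO settings needs all three entries.
    print(missing_settings({"firmware": "bios"}))
```

The same values can be entered by hand in the VM's advanced configuration parameters; the sketch only makes the expected end state explicit.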

VM Configuration#

16 vCPUs (all cores assigned to a single socket)
64 GB RAM
500 GB disk
VMXNet3 NIC connected to a network
NVIDIA full vGPU non-MIG profile attached (A100-40C)
NVIDIA ConnectX-6 Dx NIC connected in passthrough
Ubuntu Server 20.04 HWE 64-bit
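When several VMs are created, it can help to compare each one against the reference configuration listed above. The sketch below is one hypothetical way to do so; the `VMSpec` type and its field names are illustrative only and do not correspond to any VMware API.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class VMSpec:
    """Illustrative subset of the VM configuration listed above."""
    vcpus: int
    sockets: int
    ram_gb: int
    disk_gb: int
    vgpu_profile: str


# Values taken from the VM Configuration list.
REFERENCE = VMSpec(vcpus=16, sockets=1, ram_gb=64, disk_gb=500, vgpu_profile="A100-40C")


def differences(vm: VMSpec, reference: VMSpec = REFERENCE) -> dict:
    """Map each differing field to a (found, expected) pair."""
    expected = asdict(reference)
    return {
        name: (value, expected[name])
        for name, value in asdict(vm).items()
        if value != expected[name]
    }


if __name__ == "__main__":
    # Example: a VM whose 16 vCPUs were split across two sockets.
    candidate = VMSpec(vcpus=16, sockets=2, ram_gb=64, disk_gb=500, vgpu_profile="A100-40C")
    print(differences(candidate))
```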

Additional VM Configuration#

Install the following within the VMs using this sequence:

vGPU 12.0 or later VM drivers
Docker; refer to the Docker installation guide.
NVIDIA Container Toolkit, which requires a supported version of Docker to be installed first.
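After the three components are installed, a quick check inside each VM can confirm that they work in the order listed. The following is a minimal sketch: it runs `nvidia-smi`, `docker --version`, and a GPU-enabled container in sequence and reports the first step that fails. The container image tag is an assumption and should be replaced with any CUDA base image available in your environment.

```python
import subprocess

# Assumed test image; substitute any CUDA base image you can pull.
TEST_IMAGE = "nvidia/cuda:11.0.3-base-ubuntu20.04"

# Checks mirror the installation sequence: guest driver, Docker, NVIDIA Container Toolkit.
CHECKS = [
    ("guest driver", ["nvidia-smi"]),
    ("docker", ["docker", "--version"]),
    ("container toolkit", ["docker", "run", "--rm", "--gpus", "all", TEST_IMAGE, "nvidia-smi"]),
]


def first_failure(checks=CHECKS, run=subprocess.run):
    """Run each check in order; return the name of the first failing step, or None."""
    for name, cmd in checks:
        try:
            ok = run(cmd, capture_output=True, text=True).returncode == 0
        except FileNotFoundError:
            # The executable is not installed or not on PATH.
            ok = False
        if not ok:
            return name
    return None


if __name__ == "__main__":
    failed = first_failure()
    print("all checks passed" if failed is None else f"failed at: {failed}")
```

The `run` parameter exists only so the function can be exercised without a GPU; in normal use the default `subprocess.run` is sufficient.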

Note

You do not need to install the CUDA Toolkit on the host, but the driver needs to be installed.

Optional: Setting up the A100 vGPU VM with MIG#

Follow the Enabling the NVIDIA vGPU section of the NVIDIA AI Enterprise Deployment Guide.
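Before attaching a MIG-backed vGPU profile, it can be useful to confirm which GPUs currently report MIG mode as enabled. The sketch below parses the output of `nvidia-smi --query-gpu=index,mig.mode.current --format=csv,noheader`. It assumes the command is run where the NVIDIA driver can see the physical GPU, and it treats any value other than `Enabled` (including `[N/A]` on GPUs without MIG support, such as the T4) as disabled.

```python
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=index,mig.mode.current", "--format=csv,noheader"]


def parse_mig_modes(csv_text: str) -> dict:
    """Map GPU index to True when MIG mode is reported as Enabled."""
    modes = {}
    for line in csv_text.strip().splitlines():
        index, mode = (field.strip() for field in line.split(",", 1))
        modes[int(index)] = mode == "Enabled"
    return modes


def query_mig_modes() -> dict:
    """Run nvidia-smi and parse its output (requires the NVIDIA driver)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_mig_modes(out)


if __name__ == "__main__":
    # Sample output for illustration; call query_mig_modes() on a real system.
    print(parse_mig_modes("0, Enabled\n1, Disabled\n"))
```

MIG mode itself is typically switched on with `nvidia-smi -i <gpu index> -mig 1`; the Deployment Guide section referenced above covers the full procedure.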