Requirements

  • At least one NVIDIA A100 or T4 GPU in a single NVIDIA-Certified System with an NVIDIA ConnectX-6 Dx NIC

  • Servers hosting the VMs connect to an NVIDIA Mellanox Spectrum switch

  • GPU and NIC pairs need to be on the same PCIe root complex (a quick way to verify this is sketched after this list)

  • VMware vSphere 7.0 Update 2

  • NVIDIA AI Enterprise Software Suite

    • NVIDIA AI Enterprise Host Software

    • NVIDIA Guest Driver

  • NVIDIA Virtual GPU license server
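
The root complex requirement can be checked by walking the PCIe topology in sysfs on a Linux system where both devices are visible. The following is a minimal, illustrative sketch rather than official tooling; the PCI addresses are placeholders to be replaced with the GPU and ConnectX-6 Dx addresses reported by lspci, and sharing a host bridge is used here as a rough proxy for sharing a root complex.

```python
#!/usr/bin/env python3
"""Minimal sketch: check whether a GPU and a NIC sit under the same
PCI host bridge (root complex) by walking the PCIe topology in sysfs.

Assumes a Linux system where both devices are visible. The PCI
addresses below are placeholders; replace them with the addresses
reported by `lspci` for the GPU and the ConnectX-6 Dx NIC.
"""
import os


def host_bridge(bdf: str) -> str:
    """Return the PCI host bridge a device hangs off, e.g. 'pci0000:3a'."""
    path = os.path.realpath(f"/sys/bus/pci/devices/{bdf}")
    # Resolved path looks like /sys/devices/pci0000:3a/0000:3a:02.0/0000:3b:00.0
    for part in path.split("/"):
        if part.startswith("pci"):
            return part
    raise ValueError(f"No PCI host bridge found for {bdf}")


gpu_bdf = "0000:3b:00.0"  # placeholder: GPU address from lspci
nic_bdf = "0000:3c:00.0"  # placeholder: ConnectX-6 Dx address from lspci

if host_bridge(gpu_bdf) == host_bridge(nic_bdf):
    print("GPU and NIC share the same root complex")
else:
    print("GPU and NIC are on different root complexes")
```

On a system where the NVIDIA driver is already installed, nvidia-smi topo -m reports the same locality information as a connectivity matrix.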

Note

The installation of VMware ESXi and the NVIDIA vGPU Host and Guest Driver Software is outside the scope of this document; refer to the NVIDIA AI Enterprise Deployment Guide for detailed instructions. To set up AI-ready VMs on VMware, a vGPU profile must be added to each VM. This requires installing the vGPU Host Manager on ESXi, attaching a vGPU profile, installing the vGPU guest driver in the VM, and licensing the VM. The corresponding sections of the Deployment Guide are helpful for reference.

The following server configuration details are considered best practices:

  • Hyper-threading – Enabled

  • Power Setting or System Profile – High Performance

  • CPU Performance (if applicable) – Enterprise or High Throughput

  • Memory Mapped I/O above 4 GB – Enabled (if applicable)

  • Single Root I/O Virtualization (SR-IOV) – Enabled

Before proceeding with the guide, at least two VMs should be created. The following are the hardware requirements for each VM (an in-guest verification sketch follows the full list):

  • One ConnectX-6 Dx NIC per VM

  • One GPU per VM

Because C-Series vGPUs have large BAR memory settings, the following configuration requirements apply:

  • The guest OS must be a 64-bit OS.

  • 64-bit MMIO and EFI boot must be enabled for the VM.

  • The guest OS must be able to be installed in EFI boot mode.

  • VM hardware version 19. The VM’s MMIO space must be increased to 128 GB, as explained in the VMware Knowledge Base article VMware vSphere VMDirectPath I/O: Requirements for Platforms and Devices (2142307).

  • 16 vCPUs (all cores assigned to a single socket)

  • 64 GB RAM

  • 500 GB disk

  • VMXNet3 NIC connected to a network

  • NVIDIA full vGPU non-MIG profile attached (A100-40C)

  • NVIDIA ConnectX-6 Dx NIC connected in passthrough

  • Ubuntu Server 20.04 HWE 64-bit
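
Once a VM has been created with this configuration, the in-guest view can be sanity-checked from the Ubuntu guest. The sketch below is illustrative only and assumes standard utilities (lspci) plus the NVIDIA guest driver (nvidia-smi) are present; the expected values simply mirror the list above.

```python
#!/usr/bin/env python3
"""Minimal in-guest sanity check for the VM configuration listed above.

Assumes an Ubuntu 20.04 guest with lspci available and, for the GPU
check, the NVIDIA guest driver installed. Illustrative sketch only.
"""
import os
import shutil
import subprocess


def check(label: str, ok: bool) -> None:
    print(f"[{'OK' if ok else 'FAIL'}] {label}")


# EFI boot mode: the kernel exposes /sys/firmware/efi only when booted via EFI.
check("EFI boot mode", os.path.isdir("/sys/firmware/efi"))

# vCPU count (expected: 16, per the hardware list above).
check("16 vCPUs", os.cpu_count() == 16)

# Memory (expected: 64 GB; MemTotal in /proc/meminfo is reported in kB).
with open("/proc/meminfo") as f:
    mem_kb = int(f.readline().split()[1])
check("~64 GB RAM", mem_kb >= 60 * 1024 * 1024)

# vGPU visible through the guest driver.
if shutil.which("nvidia-smi"):
    gpus = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True, text=True,
    ).stdout.strip()
    check(f"vGPU visible ({gpus or 'none'})", bool(gpus))
else:
    check("nvidia-smi installed", False)

# Passthrough ConnectX-6 Dx NIC visible on the guest PCI bus.
lspci = subprocess.run(["lspci"], capture_output=True, text=True).stdout
check("Mellanox ConnectX NIC present", "Mellanox" in lspci)
```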

Install the following within the VMs, in this sequence (a post-install check is sketched after the list):

  • vGPU 12.0 or later VM drivers

  • Docker; refer to the Docker installation guide.

  • NVIDIA Container Toolkit; this includes the required version of Docker.
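
After these components are installed, a quick check can confirm that the guest driver, Docker, and the NVIDIA Container Toolkit work together. The sketch below is illustrative only; the CUDA base image tag is an example and should be replaced with a tag that matches the installed driver branch.

```python
#!/usr/bin/env python3
"""Illustrative post-install check for the guest software stack.

Runs nvidia-smi in the guest and then inside a GPU-enabled container
to confirm the driver, Docker, and the NVIDIA Container Toolkit work
together. The CUDA image tag below is an example (assumption); use a
tag that matches the installed driver branch.
"""
import subprocess
import sys


def run(cmd):
    print(f"$ {' '.join(cmd)}")
    return subprocess.run(cmd).returncode == 0


steps = [
    ["nvidia-smi"],                             # guest vGPU driver
    ["docker", "--version"],                    # Docker engine
    ["docker", "run", "--rm", "--gpus", "all",  # Container Toolkit + GPU access
     "nvidia/cuda:11.4.3-base-ubuntu20.04",     # example image tag (assumption)
     "nvidia-smi"],
]

if all(run(step) for step in steps):
    print("Guest stack looks functional.")
else:
    sys.exit("One or more checks failed; review the output above.")
```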

Note

You do not need to install the CUDA Toolkit on the host, but the driver needs to be installed.
