Creating Your First NVIDIA AI Enterprise VM#

Because C-Series vGPUs have large BAR memory settings, using these vGPUs has some restrictions on VMware ESXi.

The guest OS must be a 64-bit OS.
64-bit MMIO and EFI boot must be enabled for the VM.
The guest OS must be able to be installed in EFI boot mode.
The VM’s MMIO space must be increased according to the GPU model.
For GPUDirect RDMA, P2P must be enabled.

Creating a Virtual Machine#

These instructions create a VM from scratch that supports NVIDIA vGPU. Later, the VM will be used as a gold master image. Use the following procedure to configure a vGPU for a single guest desktop:

Browse to the host or cluster using the vSphere Web Client.
Right-click the desired host or cluster and select New Virtual Machine.
Select Create a new virtual machine and click Next.
Enter a name for the virtual machine. Next, choose the location to host the virtual machine using the Select a location for the virtual machine section. Click Next to continue.
Select a compute resource to run the VM. Click Next to continue.

Note

This compute resource should have an NVIDIA vGPU-enabled card installed and correctly configured.
Select the datastore to host the virtual machine. Click Next to continue.
Next, select compatibility for the virtual machine. This should reflect the ESXi version for your NVIDIA-Certified Systems. Click Next to continue.
Select the appropriate Ubuntu Linux OS from the Guest OS Family and Guest OS Version pull-down menus. Click Next to continue.
Next, we will set up the hardware for the virtual machine. The following table summarizes the settings which we will set up within the upcoming steps.

Virtual Machine Configuration
CPU	16 vCPU on a single socket
RAM	64 GB
Storage	150 GB thin provisioned disk

Expand the CPU options by clicking the greater than sign. Set the CPU to 16 and the Cores per Socket to 16.
Next, set the Memory to 64 GB.
Next, expand the New Hard disk option by clicking the greater than sign. Set the storage to 150 GB and the Disk Provisioning to Thin Provision.
Review the New Virtual Machine configuration before completion. Click Finish when ready.
The new virtual machine container is created.
Configure the VM boot options for EFI. Right-click on the new VM and select Edit Settings.
Click on the VM Options tab, expand Boot Options, and change the Firmware from BIOS to EFI.
Expand Advanced and select Edit Configuration.

Adjust the Memory Mapped I/O (MMIO) settings for the VM

Click Add Configuration Params and add the parameter from the table below. Replace xxx with the value in the MMIO Space Required column for your GPU model.

Name	Value
pciPassthru.64bitMMIOSizeGB	xxx

GPU	MMIO Space Required (GB)
NVIDIA A10	64
NVIDIA A30	64
NVIDIA A40	128
NVIDIA A100 40GB (all variants)	128
NVIDIA A100 80GB (all variants)	256
NVIDIA RTX A5000	64
NVIDIA RTX A5500	64
NVIDIA RTX A6000	128
Tesla P100 (all variants)	64

Note

When NVLink is enabled, adjust the MMIO space for each GPU used accordingly.

Click Add Configuration Params again and add the parameters from the table.

Name	Value
pciPassthru.use64bitMMIO	TRUE

Note

For GPUDirect RDMA, P2P must be enabled.

Name	Value
pciPassthru.use64bitMMIO	TRUE

Click OK to close the advanced configuration window, then click OK to complete the VM configuration.
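
As a quick cross-check of the values entered above, the following shell sketch maps a GPU model to its MMIO size from the table and prints the two name/value pairs. The function names `mmio_size_gb` and `print_mmio_params` are hypothetical helpers written for this illustration; they are not part of any NVIDIA or VMware tool, and the output is only a reminder of what to type into the Add Configuration Params dialog.

```shell
#!/bin/sh
# Hypothetical helper: look up the MMIO size (GB) for a GPU model,
# using the values from the MMIO Space Required table in this guide.
mmio_size_gb() {
    case "$1" in
        "NVIDIA A10"|"NVIDIA A30"|"NVIDIA RTX A5000"|"NVIDIA RTX A5500"|"Tesla P100"*)
            echo 64 ;;
        "NVIDIA A40"|"NVIDIA A100 40GB"*|"NVIDIA RTX A6000")
            echo 128 ;;
        "NVIDIA A100 80GB"*)
            echo 256 ;;
        *)
            echo "unknown GPU model: $1" >&2
            return 1 ;;
    esac
}

# Hypothetical helper: print the name/value pairs to enter for a GPU model.
print_mmio_params() {
    size=$(mmio_size_gb "$1") || return 1
    printf 'pciPassthru.use64bitMMIO = TRUE\n'
    printf 'pciPassthru.64bitMMIOSizeGB = %s\n' "$size"
}

print_mmio_params "NVIDIA A100 80GB"
```

A model that is not in the table makes the lookup fail rather than guess a size, which is the safer behavior when the value is copied into a VM configuration.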

Important

NVIDIA AI Enterprise supports both Ubuntu 20.04 and Red Hat Enterprise Linux 8.4 (RHEL support was added in version 1.1). You can find both installation guides below.

Installing Ubuntu Server LTS#

NVIDIA AI Enterprise is supported on Ubuntu LTS operating systems. It is important to note there are two Ubuntu ISO types: Desktop and Live Server. The Desktop version includes a graphical user interface (GUI), while the Live Server version only operates via a command line. This document uses the Live Server version 20.04 (amd64 architecture) of Ubuntu, though it is worth noting a GUI may be installed later if needed.

Upload the ISO to the datastore of your VM. Right-click the VM container in vSphere Client and select Edit Settings. Mount the ISO to your VM by clicking Browse, and make sure to check Connect At Power On. Click OK to finish.
Power on the VM and wait for the installation screen to appear.
Select your preferred language and press the enter key.
Continue without updating as this guide is built around 20.04.
Configure the keyboard layout and press the enter key.
On this screen, select your network connection type and modify it to fit your internal requirements. This guide uses DHCP for the configuration.
If you have a proxy address, input it in this screen and press Done.
If you have an alternative mirror address for Ubuntu, input it here. Otherwise, if there is a default address, use it and press Done.
Choose to use the entire disk, then select the disk to install to.
Review the file system summary and select Done if satisfactory. Select Continue in the pop-up window.
Configure the VM with a user account, name, and password.
Select Install OpenSSH server and select Done.
Select any server snaps that may be required for internal use in your environment and select Done. Wait for the system to finish installing.
Select Reboot Now on the Ubuntu OS screen.
When the reboot is complete, return to vCenter. Right-click the VM, select Power, and click Power Off.
Click on the VM in the Navigator window. Right-click the VM and select Edit Settings. Clear the Connect check box for CD/DVD drive 1.

Installing Red Hat Enterprise Linux#

Added in version 1.1.

NVIDIA AI Enterprise is supported on the Red Hat Enterprise Linux operating system.

Before the installation can begin, you will need to disable Secure Boot on the VM. Right-click the VM and select Edit Settings.
Next, select VM Options at the top of the window. Locate Boot Options, make sure Secure Boot is unchecked, and click OK.

Important

Make sure you have added the listed Prerequisites and the PCI configuration parameters listed in Step #18 of Creating a Virtual Machine.
Upload the ISO to the datastore of your VM. Right-click the VM container in vSphere Client and select Edit Settings. Mount the ISO to your VM by clicking Browse, and make sure to check Connect At Power On. Click OK to finish.
Power on the VM and wait for the installation screen to appear.
Select your preferred language and Continue.
Next, select Time & Date under the Localization column. Set the time and date as required and click Done.
Next, select Software Packages under the Software column. Select Server and click Done.
Next, select Installation Destination under the System Menu. Select the VMware Virtual disk and click Done.
Next, select Network & Host Name under the System column. If your system is connected to a network, it will try to obtain an IP address from a DHCP server; otherwise, the address can be configured manually. Click Done when finished.
Select Root Password under the User Settings Column. Create a password and click Done.
Click Begin Installation to start the install.
The installation begins.
Once the installation is complete, reboot the VM by clicking Reboot System.
When the reboot is complete, return to vCenter. Right-click the VM, select Power, and click Power Off.
Click on the VM in the Navigator window. Right-click the VM and select Edit Settings. Clear the Connect check box for CD/DVD drive 1.

Enabling the NVIDIA vGPU#

Use the following procedure to enable vGPU support for your virtual machine. You must edit the virtual machine settings.

Power down the virtual machine.
Click on the VM in the Navigator window. Right-click the VM and select Edit Settings.
Click on the New Device bar and select PCI device.
Select the desired GPU Profile underneath the New PCI device drop-down.

Note

NVIDIA AI Enterprise requires a C-series profile.
Click OK and power on the VM.

Note

A single VM may have multiple GPUs (PCI devices) attached; however, this requires that each GPU be configured with its maximum memory allocation.

Installing the NVIDIA Driver in the Virtual Machine#

Now that you have created a Linux VM, boot the VM and install the NVIDIA AI Enterprise guest driver in it to fully enable GPU operation.

Important

For a VM with vGPU, continue with the sections below for the vGPU guest driver installation steps. For a VM with GPU passthrough, either a vGPU driver or the Data Center Driver can be used. Instructions for installing the Data Center Driver can be found here.

Important

Before proceeding with the NVIDIA Driver installation, please confirm that Nouveau is disabled. Instructions to confirm this are different for Ubuntu and for RHEL.
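
One common way to confirm that Nouveau is not loaded, on either distribution, is to look for the `nouveau` kernel module in the output of `lsmod`. The sketch below is a minimal illustration of that check and is not the distribution-specific procedure referred to above; `nouveau_absent` is a hypothetical helper name introduced here.

```shell
#!/bin/sh
# Hypothetical helper: reads `lsmod` output on stdin and succeeds (exit 0)
# only if no line starts with the module name "nouveau".
nouveau_absent() {
    ! grep -q '^nouveau[[:space:]]'
}

# Run the check only where lsmod is available.
if command -v lsmod >/dev/null 2>&1; then
    if lsmod | nouveau_absent; then
        echo "Nouveau is not loaded"
    else
        echo "Nouveau is loaded; disable it before installing the NVIDIA driver"
    fi
fi
```

This only reports whether the module is currently loaded. It does not show whether Nouveau is blacklisted for the next boot.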

Downloading the NVIDIA AI Enterprise Software Driver Using NGC#

Important

Before you begin, you will need to generate an API key or use an existing one.

From a browser, go to https://ngc.nvidia.com/signin/email and then enter your email and password.
In the top right corner, click your user account icon and select Setup.
Click Get API Key to open the Setup > API Key page.

Note

The API Key is the mechanism used to authenticate your access to the NGC container registry.
Click Generate API Key to generate your API key.

Note

A warning message appears to let you know that your old API key will become invalid if you create a new key.
Click Confirm to generate the key.
Your API key appears.

Important

You only need to generate an API Key once. NGC does not save your key, so store it in a secure place. (You can copy your API Key to the clipboard by clicking the copy icon to the right of the API key.) Should you lose your API Key, you can generate a new one from the NGC website. When you generate a new API Key, the old one is invalidated.
Now you will log into the VM using the VM Console link on the left pane of this page.

Follow the commands on the CLI Install page to install the NGC CLI for either AMD64 or ARM64.

You must configure NGC CLI for your use so that you can run the commands. Enter the following command, including your API key when prompted.
ngc config set

Enter API key [no-apikey]. Choices: [<VALID_APIKEY>, 'no-apikey']:

Enter CLI output format type [ascii]. Choices: [ascii, csv, json]: ascii

Enter org [no-org]. Choices: ['no-org']:

Enter team [no-team]. Choices: ['no-team']:

Enter ace [no-ace]. Choices: ['no-ace']:

Successfully saved NGC configuration to /home/$username/.ngc/config
Tip

For more information on configuring the NGC CLI, see the Getting Started with the NGC CLI documentation.
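
To confirm that the configuration step succeeded before downloading anything, you can check that the file named in the "Successfully saved" message exists and is not empty. The sketch below assumes the default location `$HOME/.ngc/config`; `ngc_config_ready` is a hypothetical helper name, and tightening the file permissions is a suggestion made here because the file stores your API key, not a step from the NGC documentation.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the NGC config file exists and is non-empty.
# With no argument it checks the default path from the message above.
ngc_config_ready() {
    [ -s "${1:-$HOME/.ngc/config}" ]
}

if ngc_config_ready; then
    # The file holds the API key, so restrict access to the owner.
    chmod 600 "$HOME/.ngc/config"
    echo "NGC CLI configuration found"
else
    echo "NGC CLI configuration missing; run: ngc config set"
fi
```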

Important

Follow the driver installation based on the operating system installed in the previous steps.

Installing the NVIDIA Driver using the .run file with Ubuntu
Installing the NVIDIA Driver using the .run file with RHEL

Installing the NVIDIA Driver using the .run file with Ubuntu#

Installation of the NVIDIA AI Enterprise software driver for Linux requires:

Compiler toolchain
Kernel headers
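
The steps in this section install the compiler toolchain through `build-essential` but do not show a command for the kernel headers. On Ubuntu, the headers for the running kernel are normally packaged as `linux-headers-<kernel release>`. The sketch below builds that package name; `headers_package` is a hypothetical helper, and the install command is left as a comment because it needs root and network access.

```shell
#!/bin/sh
# Hypothetical helper: build the Ubuntu kernel headers package name
# for a given kernel release string (as printed by `uname -r`).
headers_package() {
    printf 'linux-headers-%s\n' "$1"
}

pkg=$(headers_package "$(uname -r)")
echo "Kernel headers package for the running kernel: $pkg"

# To install it inside the VM (requires root):
#   sudo apt-get install -y "$pkg"
```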

Log in to the VM and check for updates.
sudo apt-get update
Install the gcc compiler and the make tool in the terminal.
sudo apt-get install build-essential
To find the latest NVIDIA AI Enterprise vGPU Software Driver, navigate to NGC Resources while signed into the NGC Catalog.
In the left pane, select NVIDIA AI Enterprise Essentials, then locate the NVIDIA vGPU Guest Driver Resource.
Select File Browser, and the latest version (or desired version).

Important

For the purposes of this guide, we will be using vGPU version 5.2. The commands that follow may be different within your environment based upon versioning.
Select Download, then CLI to copy the NGC CLI download command to your clipboard.

Paste this command into your terminal. It should look similar to:

ngc registry resource download-version "nvidia/vgpu/vgpu-guest-driver-5:5.2"

Navigate to the directory containing the NVIDIA Driver .run file. Then, add the Executable permission to the NVIDIA Driver file using the chmod command.
cd vgpu-guest-driver-5_v5.2/
sudo chmod +x NVIDIA-Linux-x86_64-550.127.05-grid.run
From a console shell, run the driver installer as the root user, and accept defaults.
sudo sh ./NVIDIA-Linux-x86_64-550.127.05-grid.run
Reboot the system.
sudo reboot
After the system has rebooted, confirm that you can see your NVIDIA vGPU device in the output from nvidia-smi.
nvidia-smi

After installing the NVIDIA vGPU compute driver, you can license any NVIDIA AI Enterprise Software licensed products you are using.
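
Beyond looking at the `nvidia-smi` output by eye, you can compare the driver version it reports with the version embedded in the installer file name. The following is a minimal sketch of that comparison, assuming the `.run` file name pattern used in this guide; `runfile_version` is a hypothetical helper. The comparison only runs where `nvidia-smi` is available.

```shell
#!/bin/sh
# Hypothetical helper: extract the driver version embedded in a
# NVIDIA-Linux-x86_64-<version>-grid.run installer file name.
runfile_version() {
    basename "$1" | sed -n 's/^NVIDIA-Linux-x86_64-\([0-9.]*\)-grid\.run$/\1/p'
}

expected=$(runfile_version NVIDIA-Linux-x86_64-550.127.05-grid.run)
echo "Installer version: $expected"

# Compare against the loaded driver when nvidia-smi is present.
if command -v nvidia-smi >/dev/null 2>&1; then
    actual=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
    if [ "$actual" = "$expected" ]; then
        echo "Driver $actual is loaded"
    else
        echo "Driver mismatch: expected $expected, found '$actual'"
    fi
fi
```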

Installing the NVIDIA Driver using the .run file with RHEL#

Important

Before starting the driver installation, Secure Boot must be disabled as shown in Installing Red Hat Enterprise Linux.

Register the machine with Red Hat using subscription-manager with the command below.
subscription-manager register
Satisfy the external dependency for EPEL for Dynamic Kernel Module System (DKMS).
dnf install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
dnf install dkms
Note

Please refer to the Getting Started with EPEL documentation for more information.
For RHEL 8, ensure that the system has the correct Linux kernel sources from the Red Hat repositories.
dnf install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r)
Note

The NVIDIA driver requires that the kernel headers and development packages for the running version of the kernel be installed at the time of the driver installation, as well as whenever the driver is rebuilt. For example, if your system is running kernel version 4.4.0, the 4.4.0 kernel headers and development packages must also be installed.

Install additional dependencies for NVIDIA drivers.

dnf install elfutils-libelf-devel.x86_64
dnf install -y tar bzip2 make automake gcc gcc-c++ pciutils libglvnd-devel

Update the running kernel:

dnf install -y kernel kernel-core kernel-modules

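After the update, confirm that the system has the correct Linux kernel sources from the Red Hat repositories. If the update installed a newer kernel, reboot first so that uname -r reports the kernel the driver will be built against.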
dnf install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r)

Download the NVIDIA AI Enterprise Software Driver.

ngc registry resource download-version "nvidia/vgpu/vgpu-guest-driver-5:5.2"

Navigate to the directory containing the NVIDIA Driver .run file. Then, add the Executable permission to the NVIDIA Driver file using the chmod command.
cd vgpu-guest-driver-5_v5.2/
sudo chmod +x NVIDIA-Linux-x86_64-550.127.05-grid.run
From the console shell, run the driver installer and accept defaults.
sudo sh ./NVIDIA-Linux-x86_64-550.127.05-grid.run
Note

Accept any warnings and ignore the CC version check.
Reboot the system.
sudo reboot
After the system has rebooted, confirm that you can see your NVIDIA vGPU device in the output from nvidia-smi.
nvidia-smi

After installing the NVIDIA vGPU compute driver, you can license any NVIDIA AI Enterprise Software licensed products you are using.
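
The RHEL steps above depend on `kernel-devel` matching the running kernel. The sketch below is one way to check that before running the installer. It assumes the usual RHEL convention that `uname -r` equals the package's version, release, and architecture joined as `VERSION-RELEASE.ARCH`; `devel_matches_kernel` is a hypothetical helper, and the check only runs where `rpm` is available.

```shell
#!/bin/sh
# Hypothetical helper: reads installed kernel-devel versions (one per line)
# on stdin and succeeds if one equals the kernel release passed as $1.
devel_matches_kernel() {
    grep -Fxq "$1"
}

if command -v rpm >/dev/null 2>&1; then
    if rpm -q --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n' kernel-devel 2>/dev/null \
            | devel_matches_kernel "$(uname -r)"; then
        echo "kernel-devel matches $(uname -r)"
    else
        echo "kernel-devel for $(uname -r) is missing"
    fi
fi
```

If the check reports a mismatch right after a kernel update, the system is most likely still running the old kernel and needs a reboot before the headers are installed again.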

Licensing the VM#

To use an NVIDIA vGPU software licensed product, each client system to which a physical or virtual GPU is assigned must be able to obtain a license from the NVIDIA License System. A client system can be a VM that is configured with NVIDIA vGPU, a VM that is configured for GPU passthrough, or a physical host to which a physical GPU is assigned in a bare-metal deployment.
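
Once a client has been set up to obtain a license, the detailed `nvidia-smi -q` report normally includes a "License Status" line for the licensed product. The sketch below extracts that line. It is an illustration under that assumption and does not cover the licensing setup itself; `license_status` is a hypothetical helper, and the exact wording of the status text may differ between driver releases.

```shell
#!/bin/sh
# Hypothetical helper: reads `nvidia-smi -q` output on stdin and prints
# the text after the first "License Status :" field, if any.
license_status() {
    sed -n 's/^[[:space:]]*License Status[[:space:]]*:[[:space:]]*//p' | head -n 1
}

# Query the driver only when nvidia-smi is present.
if command -v nvidia-smi >/dev/null 2>&1; then
    status=$(nvidia-smi -q | license_status)
    echo "License status: ${status:-not reported}"
fi
```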
