Creating Your First NVIDIA AI Enterprise VM#
Because C-Series vGPUs have large BAR memory settings, using these vGPUs has some restrictions on VMware ESXi.
The guest OS must be a 64-bit OS.
64-bit MMIO and EFI boot must be enabled for the VM.
The guest OS must be able to be installed in EFI boot mode.
The VM’s MMIO space must be increased according to the GPU model.
For GPUDirect RDMA, P2P must be enabled.
Creating a Virtual Machine#
These instructions are to assist in making a VM from scratch that will support NVIDIA vGPU. Later, the VM will be used as a gold master image. Use the following procedure to configure a vGPU for a single guest desktop:
Browse to the host or cluster using the vSphere Web Client.
Right-click the desired host or cluster and select New Virtual Machine.
Select Create a new virtual machine and click Next.
Enter a name for the virtual machine. Next, choose the location to host the virtual machine using the Select a location for the virtual machine section. Click Next to continue.
Select a compute resource to run the VM. Click Next to continue.
Note
This compute resource must have an NVIDIA vGPU-enabled card installed and correctly configured.
Select the datastore to host the virtual machine. Click Next to continue.
Next, select compatibility for the virtual machine. This should reflect the ESXi version for your NVIDIA-Certified Systems. Click Next to continue.
Select the appropriate Ubuntu Linux OS from the Guest OS Family and Guest OS Version pull-down menus. Click Next to continue.
Next, we will set up the hardware for the virtual machine. The following table summarizes the settings which we will set up within the upcoming steps.
Virtual Machine Configuration
CPU        16 vCPU on a single socket
RAM        64 GB
Storage    150 GB thin provisioned disk
Expand the CPU options by clicking the greater than sign. Set the CPU to 16 and the Cores per Socket to 16.
Next set the Memory to 64 GB.
Next expand the New Hard disk option by clicking on the greater than sign. Set the storage to 150 GB and the Disk Provisioning to Thin Provision.
Review the New Virtual Machine configuration before completion. Click Finish when ready.
The new virtual machine container is created.
Configure the VM boot options for EFI. Right-click on the new VM and select Edit Settings.
Click the VM Options tab, expand Boot Options, and change the Firmware from BIOS to EFI.
Expand Advanced and select Edit Configuration.
Adjust the Memory Mapped I/O (MMIO) settings for the VM.
Click Add Configuration Params and add the parameter from the following table, replacing xxx with the value in the MMIO Space Required column for your GPU model.
Name                           Value
pciPassthru.64bitMMIOSizeGB    xxx
GPU                               MMIO Space Required (GB)
NVIDIA A10                        64
NVIDIA A30                        64
NVIDIA A40                        128
NVIDIA A100 40GB (all variants)   128
NVIDIA A100 80GB (all variants)   256
NVIDIA RTX A5000                  64
NVIDIA RTX A5500                  64
NVIDIA RTX A6000                  128
Tesla P100 (all variants)         64
Note
When NVLink is enabled, adjust the MMIO space for each GPU used accordingly.
Click Add Configuration Params again and add the parameter from the following table.
Name                        Value
pciPassthru.use64bitMMIO    TRUE
Note
For GPUDirect RDMA, P2P must be enabled.
Name                        Value
pciPassthru.use64bitMMIO    TRUE
Click OK to close the advanced configuration window, then click OK to complete the VM configuration.
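As a rough sketch of how the two advanced parameters fit together, the shell helpers below look up the MMIO size from the table above and print both settings in the key = "value" form that a VM's .vmx file typically uses. The function names mmio_size_gb and print_mmio_params are hypothetical and exist only for illustration; the sizes are taken from the table.

```shell
# Hypothetical helper: map a GPU model to the MMIO size (GB) from the table.
mmio_size_gb() {
    case "$1" in
        "NVIDIA A10"|"NVIDIA A30"|"NVIDIA RTX A5000"|"NVIDIA RTX A5500"|"Tesla P100")
            echo 64 ;;
        "NVIDIA A40"|"NVIDIA A100 40GB"|"NVIDIA RTX A6000")
            echo 128 ;;
        "NVIDIA A100 80GB")
            echo 256 ;;
        *)
            return 1 ;;
    esac
}

# Hypothetical helper: print both advanced parameters for one GPU model.
print_mmio_params() {
    local size
    size=$(mmio_size_gb "$1") || { echo "unknown GPU model: $1" >&2; return 1; }
    printf 'pciPassthru.use64bitMMIO = "TRUE"\n'
    printf 'pciPassthru.64bitMMIOSizeGB = "%s"\n' "$size"
}

print_mmio_params "NVIDIA A100 80GB"
```

For an A100 80GB this prints the use64bitMMIO line followed by a 64bitMMIOSizeGB value of 256. The same values can be entered by hand through Add Configuration Params as described above.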
Important
NVIDIA AI Enterprise supports both Ubuntu 20.04 and Red Hat Enterprise Linux 8.4 (added in version 1.1). You can find both installation guides below.
Installing Ubuntu Server LTS#
NVIDIA AI Enterprise is supported on Ubuntu LTS operating systems. It is important to note there are two Ubuntu ISO types: Desktop and Live Server. The Desktop version includes a graphical user interface (GUI), while the Live Server version only operates via a command line. This document uses the Live Server version 20.04 (amd64 architecture) of Ubuntu, though it is worth noting a GUI may be installed later if needed.
Upload the ISO to the datastore of your VM. Right-click on the VM container in vSphere Client and select Edit Settings. Mount the ISO to your VM by clicking Browse and make sure to check Connect At Power On. Click OK to finish.
Power on the VM and wait for the installation screen to appear.
Select your preferred language and press the enter key.
Continue without updating as this guide is built around 20.04.
Configure the keyboard layout and press the enter key.
On this screen, select your network connection type and modify it to fit your internal requirements. This guide uses DHCP for the configuration.
If you have a proxy address, input it in this screen and press Done.
If you have an alternative mirror address for Ubuntu, input it here. Otherwise, if there is a default address, use it and press Done.
Format the entire disk. Then, select a disk to install.
Review the file system summary and select Done if satisfactory. Select Continue in the pop-up window.
Configure the VM with a user account, name, and password.
Select Install OpenSSH server and select Done.
Select any server snaps that may be required for internal use in your environment and select Done. Wait for the system to finish installing.
Select Reboot Now on the Ubuntu OS screen.
When the reboot is complete, return to vCenter. Right-click the VM, select Power, and click Power Off.
Click on the VM in the Navigator window. Right-click the VM and select Edit Settings. Uncheck the Connect check box for CD/DVD drive 1.
Installing Red Hat Enterprise Linux#
Added in version 1.1.
NVIDIA AI Enterprise is supported on the Red Hat Enterprise Linux operating system.
Before the installation can begin, you will need to disable Secure Boot on the VM. Right-click the VM and select Edit Settings.
Next, select VM Options at the top of the window. Locate Boot Options, make sure Secure Boot is unchecked, and click OK.
Important
Make sure you have met the listed prerequisites and added the PCI configuration parameters listed in Step #18 of Creating a Virtual Machine.
Upload the ISO to the datastore of your VM. Right-click on the VM container in vSphere Client and select Edit Settings. Mount the ISO to your VM by clicking Browse and make sure to check Connect At Power On. Click OK to finish.
Power on the VM and wait for the installation screen to appear.
Select your preferred language and Continue.
Next, select Time & Date under the Localization column. Set the time and date as required and click Done.
Next, select Software Packages under the Software column. Select Server and click Done.
Next, select Installation Destination under the System Menu. Select the VMware Virtual disk and click Done.
Next, select Network & Host Name under the System column. If your system is connected to a network, it tries to obtain an IP address from a DHCP server; otherwise, the network can be configured manually. Click Done when finished.
Select Root Password under the User Settings Column. Create a password and click Done.
Click Begin Installation to start the install.
The installation begins.
Once the installation is complete, reboot the VM by clicking Reboot System.
When the reboot is complete, return to vCenter. Right-click the VM, select Power, and click Power Off.
Click on the VM in the Navigator window. Right-click the VM and select Edit Settings. Uncheck the Connect check box for CD/DVD drive 1.
Enabling the NVIDIA vGPU#
Use the following procedure to enable vGPU support for your virtual machine. You must edit the virtual machine settings.
Power down the virtual machine.
Click on the VM in the Navigator window. Right-click the VM and select Edit Settings.
Click on the New Device bar and select PCI device.
Select the desired GPU Profile underneath the New PCI device drop-down.
Note
NVIDIA AI Enterprise requires a C-series profile.
Click OK and power on the VM.
Note
A single VM may have multiple GPUs (PCI devices) attached; however, this requires that each GPU be configured with the maximum memory allocation.
Installing the NVIDIA Driver in the Virtual Machine#
Now that you have created a Linux VM, boot the VM and install the NVIDIA AI Enterprise guest driver in the VM to fully enable GPU operation.
Important
Before proceeding with the NVIDIA Driver installation, please confirm that Nouveau is disabled. Instructions to confirm this are located here for Ubuntu and here for RHEL.
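One quick way to check is to look for the nouveau module in the output of lsmod. The following is a minimal sketch rather than the linked procedure; the helper name nouveau_listed is hypothetical.

```shell
# Succeeds (exit 0) when lsmod-style text on stdin lists the nouveau module.
nouveau_listed() {
    grep -q '^nouveau[[:space:]]'
}

# Only query the live system when lsmod is available.
if command -v lsmod >/dev/null 2>&1; then
    if lsmod | nouveau_listed; then
        echo "nouveau is loaded; disable it before installing the driver"
    else
        echo "nouveau is not loaded"
    fi
fi
```

The pattern only matches lines that begin with the module name, so modules that merely list nouveau as a dependent are not counted.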
Downloading the NVIDIA AI Enterprise Software Driver Using NGC#
Important
Before you begin you will need to generate or use an existing API key.
From a browser, go to https://ngc.nvidia.com/signin/email and then enter your email and password.
In the top right corner, click your user account icon and select Setup.
Click Get API Key to open the Setup > API Key page.
Note
The API Key is the mechanism used to authenticate your access to the NGC container registry.
Click Generate API Key to generate your API key.
Note
A warning message appears to let you know that your old API key will become invalid if you create a new key.
Click Confirm to generate the key.
Your API key appears.
Important
You only need to generate an API Key once. NGC does not save your key, so store it in a secure place. (You can copy your API Key to the clipboard by clicking the copy icon to the right of the API key.) Should you lose your API Key, you can generate a new one from the NGC website. When you generate a new API Key, the old one is invalidated.
Now you will log into the VM using the VM Console link on the left pane of this page.
Run the following commands to install the NGC CLI for either AMD64 or ARM64.
AMD64 Linux Install: The NGC CLI binary for Linux is supported on Ubuntu 16.04 and later distributions.
Download, unzip, and install from the command line by moving to a directory where you have execute permissions and then running the following command:
wget --content-disposition https://ngc.nvidia.com/downloads/ngccli_linux.zip && unzip ngccli_linux.zip && chmod u+x ngc-cli/ngc
ARM64 Linux Install: The NGC CLI binary for ARM64 is supported on Ubuntu 18.04 and later distributions.
Download, unzip, and install from the command line by moving to a directory where you have execute permissions and then running the following command:
wget --content-disposition https://ngc.nvidia.com/downloads/ngccli_arm64.zip && unzip ngccli_arm64.zip && chmod u+x ngc-cli/ngc
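If you are unsure which of the two archives applies, the machine type reported by uname -m can be mapped to the archive name. This is a small sketch under that assumption; the helper name ngc_cli_zip_for_arch is hypothetical, and the archive names are the two used in the commands above.

```shell
# Hypothetical helper: choose the NGC CLI archive name for a machine type
# as reported by `uname -m`.
ngc_cli_zip_for_arch() {
    case "$1" in
        x86_64|amd64)  echo "ngccli_linux.zip" ;;
        aarch64|arm64) echo "ngccli_arm64.zip" ;;
        *)             return 1 ;;
    esac
}

if zip_name=$(ngc_cli_zip_for_arch "$(uname -m)"); then
    echo "Download: https://ngc.nvidia.com/downloads/${zip_name}"
else
    echo "No NGC CLI archive listed here for $(uname -m)" >&2
fi
```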
Note
NGC CLI installers for Windows, Arm64 macOS, and Intel macOS can be found here.
Important
The remaining installation steps in the sections below are the same for AMD64 and ARM64.
Check the binary’s MD5 hash to ensure the file wasn’t corrupted during download.
find ngc-cli/ -type f -exec md5sum {} + | LC_ALL=C sort | md5sum -c ngc-cli.md5
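The command above hashes every file under ngc-cli/, sorts the list so that the order is stable, and passes the result to md5sum -c, which appears to compare it against the value stored in ngc-cli.md5. The sketch below illustrates the same idea on a throwaway directory; the helper name dir_md5 is hypothetical and is not part of the NGC CLI.

```shell
# Print a single MD5 digest summarizing every file under a directory:
# hash each file, sort the list for a stable order, then hash the list.
dir_md5() {
    find "$1" -type f -exec md5sum {} + | LC_ALL=C sort | md5sum | cut -d' ' -f1
}

# Demonstration on a temporary directory that is removed afterwards.
demo=$(mktemp -d)
mkdir "$demo/pkg"
echo "hello" > "$demo/pkg/file.txt"
before=$(cd "$demo" && dir_md5 pkg/)
echo "changed" >> "$demo/pkg/file.txt"
after=$(cd "$demo" && dir_md5 pkg/)
[ "$before" != "$after" ] && echo "digest changed after the file was modified"
rm -rf "$demo"
```

Because the per-file list includes relative paths, the digest depends on the directory from which the command is run, which is why the check is run from the directory that contains ngc-cli/.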
Add the ngc-cli directory to your PATH.
echo "export PATH=\"\$PATH:$(pwd)/ngc-cli\"" >> ~/.bash_profile && source ~/.bash_profile
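Running the line above more than once appends another export to ~/.bash_profile each time. The following sketch shows one way to test whether a directory is already on PATH before adding it; path_contains and path_add_once are hypothetical helper names that affect only the current shell.

```shell
# Succeeds when the given directory is one of the entries in PATH.
path_contains() {
    case ":${PATH}:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Append a directory to PATH for the current shell only if it is missing.
path_add_once() {
    path_contains "$1" || PATH="${PATH}:$1"
}
```

For example, path_contains followed by the directory you added reports through its exit status whether the earlier export took effect in the current shell.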
You must configure NGC CLI for your use so that you can run the commands. Enter the following command, including your API key when prompted.
ngc config set
Enter API key [no-apikey]. Choices: [<VALID_APIKEY>, 'no-apikey']:
Enter CLI output format type [ascii]. Choices: [ascii, csv, json]: ascii
Enter org [no-org]. Choices: ['no-org']:
Enter team [no-team]. Choices: ['no-team']:
Enter ace [no-ace]. Choices: ['no-ace']:
Successfully saved NGC configuration to /home/$username/.ngc/config
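The final message names the file where the configuration is saved. As a small sketch, the presence of that file can be checked before running further ngc commands; the helper name ngc_config_saved is hypothetical, and the default path is taken from the message above.

```shell
# Succeeds when an NGC configuration file exists and is not empty.
# The default path mirrors the "Successfully saved" message above.
ngc_config_saved() {
    local config="${1:-$HOME/.ngc/config}"
    [ -s "$config" ]
}

if ngc_config_saved; then
    echo "NGC configuration found"
else
    echo "NGC configuration not found; run: ngc config set"
fi
```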
Important
Follow the driver installation based on the operating system installed in the previous steps.
Installing the NVIDIA Driver using the .run file with Ubuntu#
Installation of the NVIDIA AI Enterprise software driver for Linux requires:
Compiler toolchain
Kernel headers
Log in to the VM and check for updates.
sudo apt-get update
Install the gcc compiler and the make tool in the terminal.
sudo apt-get install build-essential
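Before running the installer, it can help to confirm that both requirements are in place. The following is a minimal sketch; missing_commands is a hypothetical helper, and the headers path assumes the usual Ubuntu layout in which /lib/modules/<kernel version>/build points at the installed headers.

```shell
# Print the name of every listed command that is not found on PATH.
missing_commands() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Kernel headers for the running kernel normally appear under this path on Ubuntu.
headers_dir="/lib/modules/$(uname -r)/build"

missing=$(missing_commands gcc make)
[ -z "$missing" ] || echo "missing tools: $missing"
[ -d "$headers_dir" ] || echo "kernel headers not found at $headers_dir"
```

If the script prints nothing, the compiler, make, and the headers for the running kernel were all found.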
Download the NVIDIA AI Enterprise Software Driver.
ngc registry resource download-version "nvaie/vgpu_guest_driver_2_1:510.73.08"
Navigate to the directory containing the NVIDIA Driver .run file. Then, add the Executable permission to the NVIDIA Driver file using the chmod command.
cd vgpu_guest_driver_2_1:510.73.08
sudo chmod +x NVIDIA-Linux-x86_64-510.73.08-grid.run
From a console shell, run the driver installer as the root user, and accept defaults.
sudo sh ./NVIDIA-Linux-x86_64-510.73.08-grid.run
Reboot the system.
sudo reboot
After the system has rebooted, confirm that you can see your NVIDIA vGPU device in the output from nvidia-smi.
nvidia-smi
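For a scripted check, nvidia-smi can print just the GPU name and driver version in CSV form. The sketch below compares that output with the driver version installed in this guide; the helper name driver_version_matches is hypothetical.

```shell
# Succeeds when every "name, driver_version" line on stdin reports the
# expected driver version and at least one GPU line is present.
driver_version_matches() {
    awk -F', ' -v want="$1" '$2 != want { bad = 1 } END { exit (NR == 0 || bad) }'
}

# Only query the driver when nvidia-smi is installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    if nvidia-smi --query-gpu=name,driver_version --format=csv,noheader \
            | driver_version_matches "510.73.08"; then
        echo "vGPU visible with the expected driver version"
    else
        echo "no vGPU found, or driver version differs"
    fi
fi
```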
After installing the NVIDIA vGPU compute driver, you can license any NVIDIA AI Enterprise Software licensed products you are using.
Installing the NVIDIA Driver using the .run file with RHEL#
Important
Before starting the driver installation, Secure Boot must be disabled, as shown in Installing Red Hat Enterprise Linux 8.4.
Register the machine with Red Hat using subscription-manager with the command below.
subscription-manager register
Satisfy the external dependency on EPEL for Dynamic Kernel Module Support (DKMS).
dnf install https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm
dnf install dkms
For RHEL 8, ensure that the system has the correct Linux kernel sources from the Red Hat repositories.
dnf install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r)
Note
The NVIDIA driver requires that the kernel headers and development packages for the running version of the kernel be installed at the time of the driver installation, as well as whenever the driver is rebuilt. For example, if your system is running kernel version 4.4.0, the 4.4.0 kernel headers and development packages must also be installed.
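As a sketch of how this requirement can be checked, the helper below looks for a kernel-devel tree that matches a given kernel version. It assumes the kernel-devel package installs its tree under /usr/src/kernels, and the name kernel_devel_present is hypothetical.

```shell
# Succeeds when a kernel-devel tree exists for the given kernel version.
# Arguments: [kernel version] [base directory]; the defaults target the
# running kernel and the assumed kernel-devel location on RHEL.
kernel_devel_present() {
    local version="${1:-$(uname -r)}"
    local base="${2:-/usr/src/kernels}"
    [ -d "$base/$version" ]
}

if kernel_devel_present; then
    echo "kernel-devel matches the running kernel $(uname -r)"
else
    echo "no kernel-devel tree for $(uname -r); install kernel-devel-$(uname -r)"
fi
```

Running the check again after a kernel update and reboot shows whether the development packages still match the running kernel.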
Install additional dependencies for NVIDIA drivers.
dnf install elfutils-libelf-devel.x86_64
dnf install -y tar bzip2 make automake gcc gcc-c++ pciutils libglvnd-devel
Update the running kernel:
dnf install -y kernel kernel-core kernel-modules
Confirm the system has the correct Linux kernel sources from the Red Hat repositories after update.
dnf install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r)
Download the NVIDIA AI Enterprise Software Driver.
ngc registry resource download-version "nvaie/vgpu_guest_driver_2_1:510.73.08"
Navigate to the directory containing the NVIDIA Driver .run file. Then, add the Executable permission to the NVIDIA Driver file using the chmod command.
sudo chmod +x NVIDIA-Linux-x86_64-510.73.08-grid.run
From the console shell, run the driver installer and accept defaults.
sudo sh ./NVIDIA-Linux-x86_64-510.73.08-grid.run
Note
Accept any warnings and ignore the CC version check.
Reboot the system.
sudo reboot
After the system has rebooted, confirm that you can see your NVIDIA vGPU device in the output from nvidia-smi.
nvidia-smi
After installing the NVIDIA vGPU compute driver, you can license any NVIDIA AI Enterprise Software licensed products you are using.
Licensing the VM#
To use an NVIDIA vGPU software licensed product, each client system to which a physical or virtual GPU is assigned must be able to obtain a license from the NVIDIA License System. A client system can be a VM that is configured with NVIDIA vGPU, a VM that is configured for GPU pass through, or a physical host to which a physical GPU is assigned in a bare-metal deployment.