Because C-Series vGPUs have large BAR memory settings, using these vGPUs has some restrictions on VMware ESXi.
The guest OS must be a 64-bit OS.
64-bit MMIO and EFI boot must be enabled for the VM.
The guest OS must be able to be installed in EFI boot mode.
The VM’s MMIO space must be increased according to the GPU model.
For GPUDirect RDMA, P2P must be enabled.
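Once a guest OS is installed, the 64-bit and EFI requirements can be checked from inside the guest. The following is a minimal sketch, assuming a Linux guest; the function names and the optional sysfs-root argument are hypothetical and exist only for illustration and testing.

```shell
#!/bin/sh
# Report the firmware mode the guest booted in.
# Linux exposes /sys/firmware/efi only when the kernel was started by EFI firmware.
# The optional argument (a sysfs root) is a hypothetical hook used for testing.
boot_mode() {
    sysfs_root="${1:-/sys}"
    if [ -d "$sysfs_root/firmware/efi" ]; then
        echo "EFI"
    else
        echo "BIOS"
    fi
}

# Report whether the userland is 32-bit or 64-bit.
os_bits() {
    getconf LONG_BIT
}

echo "Boot mode: $(boot_mode)"
echo "OS word size: $(os_bits)-bit"
```

A guest that meets the restrictions above should report EFI and 64-bit.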
These instructions are to assist in making a VM from scratch that will support NVIDIA vGPU. Later, the VM will be used as a gold master image. Use the following procedure to configure a vGPU for a single guest desktop:
Browse to the host or cluster using the vSphere Web Client.
Right-click the desired host or cluster and select New Virtual Machine.
Select Create a new virtual machine and click Next.
Enter a name for the virtual machine. Next, choose the location to host the virtual machine using the Select a location for the virtual machine section. Click Next to continue.
Select a compute resource to run the VM. Click Next to continue.
Note: This compute resource should have an NVIDIA vGPU-enabled card installed and be correctly configured.
Select the datastore to host the virtual machine. Click Next to continue.
Next, select compatibility for the virtual machine. This should reflect the ESXi version for your NVIDIA-Certified Systems. Click Next to continue.
Select the appropriate Ubuntu Linux OS from the Guest OS Family and Guest OS Version pull-down menus. Click Next to continue.
Next, we will set up the hardware for the virtual machine. The following table summarizes the settings which we will set up within the upcoming steps.
Virtual Machine Configuration

| Component | Setting |
|---|---|
| CPU | 16 vCPU on a single socket |
| RAM | 64 GB |
| Storage | 150 GB thin-provisioned disk |
Expand the CPU options by clicking the greater than sign. Set the CPU to 16 and the Cores per Socket to 16.
Next set the Memory to 64 GB.
Next expand the New Hard disk option by clicking on the greater than sign. Set the storage to 150 GB and the Disk Provisioning to Thin Provision.
Review the New Virtual Machine configuration before completion. Click Finish when ready.
The new virtual machine container is created.
Configure the VM boot options for EFI. Right-click on the new VM and select Edit Settings.
Click on the VM Options tab, expand Boot Options, and change the Firmware from BIOS to EFI.
Expand Advanced and select Edit Configuration.
Adjust the Memory Mapped I/O (MMIO) settings for the VM
Click Add Configuration Params and add the parameter from the following table, replacing xxx with the value in the MMIO Space Required column for your GPU model.

| Name | Value |
|---|---|
| pciPassthru.64bitMMIOSizeGB | xxx |
| GPU | MMIO Space Required (GB) |
|---|---|
| NVIDIA A10 | 64 |
| NVIDIA A30 | 64 |
| NVIDIA A40 | 128 |
| NVIDIA A100 40GB (all variants) | 128 |
| NVIDIA A100 80GB (all variants) | 256 |
| NVIDIA RTX A5000 | 64 |
| NVIDIA RTX A5500 | 64 |
| NVIDIA RTX A6000 | 128 |
| Tesla P100 (all variants) | 64 |
Note: When NVLink is enabled, adjust the MMIO space for each GPU used accordingly.
Click Add Configuration Params again and add the parameter from the following table.

| Name | Value |
|---|---|
| pciPassthru.use64bitMMIO | TRUE |
Note: For GPUDirect RDMA, P2P must be enabled.

| Name | Value |
|---|---|
| pciPassthru.use64bitMMIO | TRUE |
Click OK to close the advanced configuration window, then click OK to complete the VM configuration.
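The two advanced parameters can also be expressed as text, which may help when reviewing or scripting VM settings. The sketch below looks up the MMIO size from the GPU table above and prints both parameters in the `key = "value"` form used by .vmx files. The function names and the exact model strings used as lookup keys are assumptions made for illustration; this is not an official tool.

```shell
#!/bin/sh
# Look up the 64-bit MMIO size (in GB) for a GPU model, using the table in this guide.
mmio_size_gb() {
    case "$1" in
        "NVIDIA A10"|"NVIDIA A30"|"NVIDIA RTX A5000"|"NVIDIA RTX A5500"|"Tesla P100") echo 64 ;;
        "NVIDIA A40"|"NVIDIA A100 40GB"|"NVIDIA RTX A6000") echo 128 ;;
        "NVIDIA A100 80GB") echo 256 ;;
        *) echo "unknown GPU model: $1" >&2; return 1 ;;
    esac
}

# Print the two advanced parameters in .vmx "key = value" form.
vmx_mmio_params() {
    size=$(mmio_size_gb "$1") || return 1
    printf 'pciPassthru.use64bitMMIO = "TRUE"\n'
    printf 'pciPassthru.64bitMMIOSizeGB = "%s"\n' "$size"
}

vmx_mmio_params "NVIDIA A40"
```

For an NVIDIA A40 this prints the `use64bitMMIO` line followed by `pciPassthru.64bitMMIOSizeGB = "128"`.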
NVIDIA AI Enterprise supports Ubuntu 20.04 and, with NVIDIA AI Enterprise 1.1 or later, Red Hat Enterprise Linux 8.4. Installation guides for both follow.
Installing Ubuntu Server LTS
NVIDIA AI Enterprise is supported on Ubuntu LTS operating systems. It is important to note there are two Ubuntu ISO types: Desktop and Live Server. The Desktop version includes a graphical user interface (GUI), while the Live Server version only operates via a command line. This document uses the Live Server version 20.04 (amd64 architecture) of Ubuntu, though it is worth noting a GUI may be installed later if needed.
Upload the ISO to the datastore of your VM. Right-click the VM container in vSphere Client and select Edit Settings. Mount the ISO to your VM by clicking Browse, and make sure to check Connect At Power On. Click OK to finish.
Power on the VM and wait for the installation screen to appear.
Select your preferred language and press the enter key.
Continue without updating as this guide is built around 20.04.
Configure the keyboard layout and press the enter key.
On this screen, select your network connection type and modify it to fit your internal requirements. This guide uses DHCP for the configuration.
If you have a proxy address, input it in this screen and press Done.
If you have an alternative mirror address for Ubuntu, input it here. Otherwise, if there is a default address, use it and press Done.
Choose to use the entire disk, then select the disk to install to.
Review the file system summary and select Done if satisfactory. Select Continue in the pop-up window.
Configure the VM with a user account, name, and password.
Select Install OpenSSH server and select Done.
Select any server snaps that may be required for internal use in your environment and select Done. Wait for the system to finish installing.
Select Reboot Now on the Ubuntu OS screen.
When the reboot is complete, return to vCenter. Right-click the VM, select Power, and click Power Off.
Click on the VM in the Navigator window. Right-click the VM and select Edit Settings. Uncheck the Connect check box for CD/DVD drive 1.
Installing Red Hat Enterprise Linux
NVIDIA AI Enterprise 1.1 or later
NVIDIA AI Enterprise is supported on the Red Hat Enterprise Linux operating system.
Before the installation can begin, you must disable Secure Boot on the VM. Right-click the VM and select Edit Settings.
Next, select VM Options at the top of the window. Locate Boot Options, make sure Secure Boot is unchecked, and click OK.
Important: Make sure you have added the listed prerequisites and the PCI configuration parameters listed in Step #18 of Creating a Virtual Machine.
Upload the ISO to the datastore of your VM. Right-click the VM container in vSphere Client and select Edit Settings. Mount the ISO to your VM by clicking Browse, and make sure to check Connect At Power On. Click OK to finish.
Power on the VM and wait for the installation screen to appear.
Select your preferred language and Continue.
Next, select Time & Date under the Localization column. Set the time and date as required and click Done.
Next, select Software Packages under the Software column. Select Server and click Done.
Next, select Installation Destination under the System Menu. Select the VMware Virtual disk and click Done.
Next, select Network & Host Name under the System column. If the system is connected to a network, it tries to obtain an IP address from a DHCP server; otherwise, configure the address manually. Click Done when finished.
Select Root Password under the User Settings Column. Create a password and click Done.
Click Begin Installation to start the install.
The installation begins.
Once the installation is complete, reboot the VM by clicking Reboot System.
When the reboot is complete, return to vCenter. Right-click the VM, select Power, and click Power Off.
Click on the VM in the Navigator window. Right-click the VM and select Edit Settings. Uncheck the Connect check box for CD/DVD drive 1.
Use the following procedure to enable vGPU support for your virtual machine. You must edit the virtual machine settings.
Power down the virtual machine.
Click on the VM in the Navigator window. Right-click the VM and select Edit Settings.
Click on the New Device bar and select PCI device.
Select the desired GPU Profile underneath the New PCI device drop-down.
Click OK and power on the VM.
A single VM may have multiple GPUs (PCI devices) attached; however, this requires that each GPU be configured with its maximum memory allocation.
Now that you have created a Linux VM, boot it and install the NVIDIA AI Enterprise guest driver to fully enable GPU operation.
Downloading the NVIDIA AI Enterprise Software Driver Using NGC
Before you begin you will need to generate or use an existing API key.
From a browser, go to https://ngc.nvidia.com/signin/email and then enter your email and password.
In the top right corner, click your user account icon and select Setup.
Click Get API Key to open the Setup > API Key page.
Note: The API Key is the mechanism used to authenticate your access to the NGC container registry.
Click Generate API Key to generate your API key.
Note: A warning message appears to let you know that your old API key will become invalid if you create a new key.
Click Confirm to generate the key.
Your API key appears.
Important: You only need to generate an API Key once. NGC does not save your key, so store it in a secure place. (You can copy your API Key to the clipboard by clicking the copy icon to the right of the API key.) Should you lose your API Key, you can generate a new one from the NGC website. When you generate a new API Key, the old one is invalidated.
Now you will log into the VM using the VM Console link on the left pane of this page.
Run the following commands to install the NGC CLI for either AMD64 or ARM64.
AMD64 Linux Install: The NGC CLI binary for Linux is supported on Ubuntu 16.04 and later distributions.
Download, unzip, and install from the command line by moving to a directory where you have execute permissions and then running the following command:
wget --content-disposition https://ngc.nvidia.com/downloads/ngccli_linux.zip && unzip ngccli_linux.zip && chmod u+x ngc-cli/ngc
ARM64 Linux Install: The NGC CLI binary for ARM64 is supported on Ubuntu 18.04 and later distributions.
Download, unzip, and install from the command line by moving to a directory where you have execute permissions and then running the following command:
wget --content-disposition https://ngc.nvidia.com/downloads/ngccli_arm64.zip && unzip ngccli_arm64.zip && chmod u+x ngc-cli/ngc
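If you are unsure which of the two archives applies, the machine architecture reported by `uname -m` can be used to pick one. The following is a minimal sketch; the function name `ngc_cli_url` is hypothetical, and the URLs are the two given above.

```shell
#!/bin/sh
# Map a machine architecture (as printed by `uname -m`) to the NGC CLI archive URL.
ngc_cli_url() {
    case "$1" in
        x86_64|amd64)  echo "https://ngc.nvidia.com/downloads/ngccli_linux.zip" ;;
        aarch64|arm64) echo "https://ngc.nvidia.com/downloads/ngccli_arm64.zip" ;;
        *) echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Show which archive matches this machine; pass the URL to wget as in the commands above.
ngc_cli_url "$(uname -m)" || true
```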
Note: NGC CLI installers for Windows, Arm64 macOS, and Intel macOS are also available from NGC.
Important: The remaining installation steps in the sections below are the same for AMD64 and ARM64.
Check the binary’s MD5 hash to ensure the file wasn’t corrupted during download.
$ find ngc-cli/ -type f -exec md5sum {} + | LC_ALL=C sort | md5sum -c ngc-cli.md5
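To see what `md5sum -c` does with a manifest file, it may help to try it on throwaway files first. The sketch below builds a small manifest the same way (hash every file, sort, save), verifies it, then corrupts one file and verifies again. The directory, file names, and the `verify_manifest` helper are hypothetical; nothing here touches the real NGC CLI files.

```shell
#!/bin/sh
# Verify files against a manifest; paths in the manifest are relative to the given directory.
verify_manifest() {
    ( cd "$1" && md5sum -c "$2" >/dev/null 2>&1 )
}

# Create two throwaway files and a manifest with one "hash  path" line per file.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/pkg"
printf 'hello\n' > "$demo_dir/pkg/a.txt"
printf 'world\n' > "$demo_dir/pkg/b.txt"
( cd "$demo_dir" && find pkg/ -type f -exec md5sum {} + | LC_ALL=C sort > pkg.md5 )

# md5sum -c re-hashes every listed file and compares it with the recorded hash.
verify_manifest "$demo_dir" pkg.md5 && echo "intact: OK"

# After one file changes, verification exits with a non-zero status.
printf 'tampered\n' > "$demo_dir/pkg/b.txt"
verify_manifest "$demo_dir" pkg.md5 || echo "tampered: FAILED"
rm -rf "$demo_dir"
```

A non-zero exit status from the real check suggests the download should be repeated before continuing.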
Add your current directory to path.
$ echo "export PATH=\"\$PATH:$(pwd)\"" >> ~/.bash_profile && source ~/.bash_profile
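Running the command above more than once appends a duplicate line to ~/.bash_profile each time. For the current shell session only, a guarded variant could look like the following sketch; `path_with` is a hypothetical helper, not part of the NGC CLI.

```shell
#!/bin/sh
# Print a PATH value with the given directory appended, unless it is already listed.
path_with() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;
        *)        printf '%s\n' "$2:$1" ;;
    esac
}

# Add the current directory to PATH for this session, without creating duplicates.
PATH=$(path_with "$(pwd)" "$PATH")
export PATH
```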
You must configure NGC CLI for your use so that you can run the commands. Enter the following command, including your API key when prompted.
$ ngc config set
Enter API key [no-apikey]. Choices: [<VALID_APIKEY>, 'no-apikey']:
Enter CLI output format type [ascii]. Choices: [ascii, csv, json]: ascii
Enter org [no-org]. Choices: ['no-org']:
Enter team [no-team]. Choices: ['no-team']:
Enter ace [no-ace]. Choices: ['no-ace']:
Successfully saved NGC configuration to /home/$username/.ngc/config
Follow the driver installation based on the operating system installed in the previous steps.
Installing the NVIDIA Driver using the .run file with Ubuntu
Installation of the NVIDIA AI Enterprise software driver for Linux requires:
Compiler toolchain
Kernel headers
Log in to the VM and check for updates.
$ sudo apt-get update
Install the gcc compiler and the make tool in the terminal.
$ sudo apt-get install build-essential
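Before running the driver installer, it can be useful to confirm that both prerequisites are actually present. The following is a minimal sketch, assuming an Ubuntu guest where the headers for the running kernel are reachable through /lib/modules/$(uname -r)/build; the `missing_commands` helper is hypothetical.

```shell
#!/bin/sh
# Print each named command that is not found on PATH, one per line.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Kernel headers for the running kernel normally appear under this path on Ubuntu.
headers_dir="/lib/modules/$(uname -r)/build"

missing=$(missing_commands gcc make)
[ -z "$missing" ] || echo "Missing tools: $missing"
[ -e "$headers_dir" ] || echo "Kernel headers not found at $headers_dir"
```

No output means the compiler toolchain and the kernel headers were both found.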
Download the NVIDIA AI Enterprise Software Driver.
$ ngc registry resource download-version "nvaie/vgpu_guest_driver_2_1:510.73.08"
Navigate to the directory containing the NVIDIA Driver .run file. Then, add the Executable permission to the NVIDIA Driver file using the chmod command.
$ cd vgpu_guest_driver_2_1:510.73.08
$ sudo chmod +x NVIDIA-Linux-x86_64-510.73.08-grid.run
From a console shell, run the driver installer as the root user, and accept defaults.
$ sudo sh ./NVIDIA-Linux-x86_64-510.73.08-grid.run
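The version number appears in both the NGC resource tag and the installer file name, so scripts that target a different driver release can derive one from the other. The sketch below does this with shell parameter expansion. The naming pattern is inferred from the single example in this guide and may not hold for every release; both function names are hypothetical.

```shell
#!/bin/sh
# Extract the version part of an NGC resource tag of the form "org/name:version".
resource_version() {
    printf '%s\n' "${1##*:}"
}

# Build the installer file name used in this guide from the driver version.
run_file_for() {
    printf 'NVIDIA-Linux-x86_64-%s-grid.run\n' "$(resource_version "$1")"
}

run_file_for "nvaie/vgpu_guest_driver_2_1:510.73.08"
# prints: NVIDIA-Linux-x86_64-510.73.08-grid.run
```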
Reboot the system.
$ sudo reboot
After the system has rebooted, confirm that you can see your NVIDIA vGPU device in the output from nvidia-smi.
$ nvidia-smi
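For an automated check, for example when preparing a gold master image, `nvidia-smi` can print selected fields as CSV with `--query-gpu` and `--format=csv,noheader`. The sketch below reads such output and succeeds only if every listed device reports the expected driver version. The helper name and the sample line are illustrative, not captured output.

```shell
#!/bin/sh
# Read "name, driver_version" CSV lines on stdin and succeed only if at least one
# line is present and every line reports the expected driver version.
driver_version_ok() {
    awk -F', ' -v want="$1" '
        { seen = 1; if ($2 != want) bad = 1 }
        END { exit (seen && !bad) ? 0 : 1 }
    '
}

# On the VM, feed it real output:
#   nvidia-smi --query-gpu=name,driver_version --format=csv,noheader | driver_version_ok 510.73.08
# Here a sample line (illustrative only) stands in for nvidia-smi.
printf 'GRID A100-40C, 510.73.08\n' | driver_version_ok 510.73.08 && echo "driver version matches"
```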
After installing the NVIDIA vGPU compute driver, you can license any NVIDIA AI Enterprise Software licensed products you are using.
Installing the NVIDIA Driver using the .run file with RHEL
Before starting the driver installation, Secure Boot must be disabled, as shown in Installing Red Hat Enterprise Linux 8.4.
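The Secure Boot state can also be confirmed from inside the guest. The following is a minimal sketch, assuming the `mokutil` utility is installed in the guest; the helper name is hypothetical and the sample text stands in for real `mokutil --sb-state` output.

```shell
#!/bin/sh
# Read `mokutil --sb-state` output on stdin; succeed only if Secure Boot is reported disabled.
secure_boot_disabled() {
    grep -qi 'SecureBoot disabled'
}

# On the VM:  mokutil --sb-state | secure_boot_disabled && echo "ok to install"
# The sample text below is illustrative, standing in for real mokutil output.
if printf 'SecureBoot disabled\n' | secure_boot_disabled; then
    echo "Secure Boot is off; the driver installer can proceed"
fi
```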
Register the machine with Red Hat using subscription-manager with the command below.
$ subscription-manager register
Satisfy the external dependency for EPEL for DKMS.
$ dnf install elfutils-libelf-devel "kernel-devel-uname-r == $(uname -r)"
For RHEL 8, ensure that the system has the correct Linux kernel sources from the Red Hat repositories.
$ dnf install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r)
Note: The NVIDIA driver requires that the kernel headers and development packages for the running version of the kernel be installed at the time of the driver installation, as well as whenever the driver is rebuilt. For example, if your system is running kernel version 4.4.0, the 4.4.0 kernel headers and development packages must also be installed.
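Whether the installed development package matches the running kernel can be checked by comparing `rpm -q kernel-devel` with `uname -r`. The following is a minimal sketch; the helper name is hypothetical and the package name in the example is illustrative.

```shell
#!/bin/sh
# Succeed if the installed kernel-devel packages (one per line on stdin, as printed
# by `rpm -q kernel-devel`) include one for the given kernel release.
kernel_devel_matches() {
    grep -qxF "kernel-devel-$1"
}

# On the VM:  rpm -q kernel-devel | kernel_devel_matches "$(uname -r)"
# The illustrative package name below stands in for real rpm output.
printf 'kernel-devel-4.18.0-305.el8.x86_64\n' |
    kernel_devel_matches "4.18.0-305.el8.x86_64" && echo "headers match running kernel"
```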
Install additional dependencies for NVIDIA drivers.
$ dnf install elfutils-libelf-devel.x86_64
$ dnf install -y tar bzip2 make automake gcc gcc-c++ pciutils libglvnd-devel
Update the running kernel:
$ dnf install -y kernel kernel-core kernel-modules
Confirm the system has the correct Linux kernel sources from the Red Hat repositories after update.
$ dnf install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r)
Download the NVIDIA AI Enterprise Software Driver.
$ ngc registry resource download-version "nvaie/vgpu_guest_driver_2_1:510.73.08"
Navigate to the directory containing the NVIDIA Driver .run file. Then, add the Executable permission to the NVIDIA Driver file using the chmod command.
$ cd vgpu_guest_driver_2_1:510.73.08
$ sudo chmod +x NVIDIA-Linux-x86_64-510.73.08-grid.run
From the console shell, run the driver installer and accept defaults.
$ sudo sh ./NVIDIA-Linux-x86_64-510.73.08-grid.run
Note: Accept any warnings and ignore the CC version check.
Reboot the system.
$ sudo reboot
After the system has rebooted, confirm that you can see your NVIDIA vGPU device in the output from nvidia-smi.
$ nvidia-smi
After installing the NVIDIA vGPU compute driver, you can license any NVIDIA AI Enterprise Software licensed products you are using.
To use an NVIDIA vGPU software licensed product, each client system to which a physical or virtual GPU is assigned must be able to obtain a license from the NVIDIA License System. A client system can be a VM that is configured with NVIDIA vGPU, a VM that is configured for GPU pass through, or a physical host to which a physical GPU is assigned in a bare-metal deployment.