Creating Your First NVIDIA AI Enterprise VM#

Because C-Series vGPUs have large BAR memory settings, using these vGPUs has some restrictions on VMware ESXi.

  • The guest OS must be a 64-bit OS.

  • 64-bit MMIO and EFI boot must be enabled for the VM.

  • The guest OS must be able to be installed in EFI boot mode.

  • The VM’s MMIO space must be increased according to the GPU model.

  • For GPUDirect RDMA, P2P must be enabled.
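
Once a guest OS is installed, the 64-bit and EFI boot restrictions above can be spot-checked from inside a Linux guest. The following is a minimal sketch rather than part of the NVIDIA procedure. It assumes the Linux kernel exposes /sys/firmware/efi only when booted through EFI; the helper names boot_mode and is_64bit_arch are hypothetical.

```shell
#!/bin/sh
# Sketch: report whether a Linux guest booted via EFI and runs a 64-bit OS.
# Assumption: /sys/firmware/efi exists only on EFI boots.

# boot_mode [ROOT] prints "EFI" or "BIOS". ROOT defaults to "/" and exists
# only so the check can be pointed at a test directory.
boot_mode() {
    root="${1:-/}"
    if [ -d "${root%/}/sys/firmware/efi" ]; then
        echo "EFI"
    else
        echo "BIOS"
    fi
}

# is_64bit_arch ARCH succeeds for the 64-bit machine names that `uname -m`
# reports on the AMD64 and ARM64 platforms mentioned in this guide.
is_64bit_arch() {
    case "$1" in
        x86_64|aarch64) return 0 ;;
        *) return 1 ;;
    esac
}

echo "Boot mode: $(boot_mode)"
if is_64bit_arch "$(uname -m)"; then
    echo "Guest OS: 64-bit"
else
    echo "Guest OS: not recognized as 64-bit ($(uname -m))"
fi
```

A result of BIOS means the firmware setting of the VM should be revisited before a vGPU is attached.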

Creating a Virtual Machine#

These instructions walk through creating a VM from scratch that supports NVIDIA vGPU. The VM will later be used as a gold master image. Use the following procedure to create and configure the VM:

  1. Browse to the host or cluster using the vSphere Web Client.

    _images/dg-first-vm-01.png
  2. Right-click the desired host or cluster and select New Virtual Machine.

    _images/dg-first-vm-02.png
  3. Select Create a new virtual machine and click Next.

    _images/dg-first-vm-03.png
  4. Enter a name for the virtual machine. Next, choose the location to host the virtual machine using the Select a location for the virtual machine section. Click Next to continue.

    _images/dg-first-vm-04.png
  5. Select a compute resource to run the VM. Click Next to continue.

    Note

    This compute resource must have an NVIDIA vGPU-capable GPU installed and correctly configured.

    _images/dg-first-vm-05.png
  6. Select the datastore to host the virtual machine. Click Next to continue.

    _images/dg-first-vm-06.png
  7. Next, select compatibility for the virtual machine. This should reflect the ESXi version for your NVIDIA-Certified Systems. Click Next to continue.

    _images/dg-first-vm-07.png
  8. Select the appropriate Ubuntu Linux OS from the Guest OS Family and Guest OS Version pull-down menus. Click Next to continue.

    _images/dg-first-vm-08.png
  9. Next, we will set up the hardware for the virtual machine. The following table summarizes the settings which we will set up within the upcoming steps.

    Virtual Machine Configuration

    CPU        16 vCPU on a single socket
    RAM        64 GB
    Storage    150 GB thin provisioned disk

  10. Expand the CPU options by clicking the greater than sign. Set the CPU to 16 and the Cores per Socket to 16.

    _images/dg-first-vm-39.png
  11. Next set the Memory to 64 GB.

    _images/dg-first-vm-40.png
  12. Next expand the New Hard disk option by clicking on the greater than sign. Set the storage to 150 GB and the Disk Provisioning to Thin Provision.

    _images/dg-first-vm-41.png
  13. Review the New Virtual Machine configuration before completion. Click Finish when ready.

    _images/dg-first-vm-10.png
  14. The new virtual machine container is created.

  15. Configure the VM boot options for EFI. Right-click on the new VM and select Edit Settings.

    _images/dg-first-vm-11.png
  16. Click on the VM Options tab, expand Boot Options, change the Firmware from BIOS to EFI.

    _images/dg-first-vm-12.png
  17. Expand Advanced and select Edit Configuration.

    _images/dg-first-vm-35.png
  18. Adjust the Memory Mapped I/O (MMIO) settings for the VM.

    • Click Add Configuration Params and add the parameter from the first table. Replace xxx with the MMIO Space Required value for your GPU model from the second table.

    Name                           Value
    pciPassthru.64bitMMIOSizeGB    xxx

    GPU                                MMIO Space Required (GB)
    NVIDIA A10                         64
    NVIDIA A30                         64
    NVIDIA A40                         128
    NVIDIA A100 40GB (all variants)    128
    NVIDIA A100 80GB (all variants)    256
    NVIDIA RTX A5000                   64
    NVIDIA RTX A5500                   64
    NVIDIA RTX A6000                   128
    Tesla P100 (all variants)          64

    _images/dg-first-vm-37.png

    Note

    When NVLink is enabled, adjust the MMIO space for each GPU used accordingly.

    • Click Add Configuration Params again and add the parameters from the table.

    Name                        Value
    pciPassthru.use64bitMMIO    TRUE

    _images/dg-first-vm-38.png

    Note

    For GPUDirect RDMA, P2P must be enabled.

    Name                        Value
    pciPassthru.use64bitMMIO    TRUE

  19. Click OK to close the advanced configuration window, then click OK to complete the VM configuration.
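
The lookup in step 18 can be sketched as a small shell helper that maps a GPU model to its MMIO size and prints the two advanced parameters. This is an illustrative sketch only: the function names mmio_size_gb and print_mmio_params are hypothetical, and the key = "value" output style is an assumption about how you might record the settings. The guide itself enters them through the vSphere UI.

```shell
#!/bin/sh
# Sketch: look up the MMIO space (GB) for a GPU model, using the values
# from the table in step 18, and print the matching VM parameters.

# mmio_size_gb MODEL prints the MMIO size in GB, or fails for unknown models.
mmio_size_gb() {
    case "$1" in
        "NVIDIA A10"|"NVIDIA A30") echo 64 ;;
        "NVIDIA RTX A5000"|"NVIDIA RTX A5500") echo 64 ;;
        "Tesla P100"*) echo 64 ;;
        "NVIDIA A40"|"NVIDIA RTX A6000") echo 128 ;;
        "NVIDIA A100 40GB"*) echo 128 ;;
        "NVIDIA A100 80GB"*) echo 256 ;;
        *) echo "unknown GPU model: $1" >&2; return 1 ;;
    esac
}

# print_mmio_params MODEL prints both parameters added in step 18.
# The output format is illustrative, not an official export format.
print_mmio_params() {
    size=$(mmio_size_gb "$1") || return 1
    printf 'pciPassthru.use64bitMMIO = "TRUE"\n'
    printf 'pciPassthru.64bitMMIOSizeGB = "%s"\n' "$size"
}

print_mmio_params "NVIDIA A100 80GB"
```

Models that are not in the table make the helper fail rather than guess a size.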

Important

NVIDIA AI Enterprise supports both Ubuntu 20.04 and Red Hat Enterprise Linux 8.4; Red Hat Enterprise Linux support was added in version 1.1. You can find both installation guides below.

Installing Ubuntu Server LTS#

NVIDIA AI Enterprise is supported on Ubuntu LTS operating systems. It is important to note there are two Ubuntu ISO types: Desktop and Live Server. The Desktop version includes a graphical user interface (GUI), while the Live Server version only operates via a command line. This document uses the Live Server version 20.04 (amd64 architecture) of Ubuntu, though it is worth noting a GUI may be installed later if needed.

  1. Upload the ISO to the datastore of your VM. Right-click the VM container in vSphere Client and select Edit Settings. Mount the ISO to your VM by clicking Browse, and make sure Connect At Power On is checked. Click OK to finish.

    _images/dg-first-vm-13.png
  2. Power on the VM and wait for the installation screen to appear.

  3. Select your preferred language and press the enter key.

    _images/dg-first-vm-14.png
  4. Continue without updating as this guide is built around 20.04.

    _images/dg-first-vm-15.png
  5. Configure the keyboard layout and press the enter key.

    _images/dg-first-vm-16.png
  6. On this screen, select your network connection type and modify it to fit your internal requirements. This guide uses DHCP for the configuration.

    _images/dg-first-vm-17.png
  7. If you have a proxy address, input it in this screen and press Done.

    _images/dg-first-vm-18.png
  8. If you have an alternative mirror address for Ubuntu, input it here. Otherwise, keep the default address and press Done.

    _images/dg-first-vm-19.png
  9. Choose to use the entire disk, then select the disk to install to.

    _images/dg-first-vm-20.png
  10. Review the file system summary and select Done if satisfactory. Select Continue in the pop-up window.

    _images/dg-first-vm-21.png
  11. Configure the VM with a user account, name, and password.

    _images/dg-first-vm-22.png
  12. Select Install OpenSSH server and select Done.

    _images/dg-first-vm-23.png
  13. Select any server snaps that may be required for internal use in your environment and select Done. Wait for the system to finish installing.

    _images/dg-first-vm-24.png
  14. Select Reboot Now on the Ubuntu OS screen.

    _images/dg-first-vm-25.png
  15. When the reboot is complete, return to vCenter. Right-click the VM, select Power, and click Power Off.

    _images/dg-first-vm-50.png
  16. Click on the VM in the Navigator window. Right-click the VM and select Edit Settings. Uncheck the Connect check box for CD/DVD drive 1.

    _images/dg-first-vm-51.png

Installing Red Hat Enterprise Linux#

Added in version 1.1.

NVIDIA AI Enterprise is supported on the Red Hat Enterprise Linux operating system.

  1. Before the installation can begin, disable Secure Boot on the VM. Right-click the VM and select Edit Settings.

    _images/rhel-edit-settings.png
  2. Next, select VM Options at the top of the window. Locate Boot Options, make sure Secure Boot is unchecked, and click OK.

    _images/rhel-uncheck-secure-boot.png

    Important

    Make sure you have met the listed prerequisites and added the PCI configuration parameters listed in step 18 of Creating a Virtual Machine.

  3. Upload the ISO to the datastore of your VM. Right-click the VM container in vSphere Client and select Edit Settings. Mount the ISO to your VM by clicking Browse, and make sure Connect At Power On is checked. Click OK to finish.

    _images/dg-first-vm-13.png
  4. Power on the VM and wait for the installation screen to appear.

    _images/rhel-boot.png
  5. Select your preferred language and Continue.

    _images/rhel-keyboard.png
  6. Next, select Time & Date under the Localization column. Set the time and date as required and click Done.

    _images/rhel-date-and-time.png
  7. Next, select Software Packages under the Software column. Select Server and click Done.

    _images/rhel-software-selection.png
  8. Next, select Installation Destination under the System Menu. Select the VMware Virtual disk and click Done.

    _images/rhel-installation-destination.png
  9. Next, select Network & Host Name under the System column. If the system is connected to a network, it tries to obtain an IP address from a DHCP server; otherwise, configure the network manually. Click Done when finished.

    _images/rhel-network-host-name.png
  10. Select Root Password under the User Settings Column. Create a password and click Done.

    _images/rhel-root-password.png
  11. Click Begin Installation to start the install.

    _images/rhel-begin-install.png
  12. The installation will begin as shown below.

    _images/rhel-install.png
  13. Once the installation is complete, reboot the VM by clicking Reboot System.

    _images/rhel-reboot.png
  14. When the reboot is complete, return to vCenter. Right-click the VM, select Power, and click Power Off.

  15. Click on the VM in the Navigator window. Right-click the VM and select Edit Settings. Uncheck the Connect check box for CD/DVD drive 1.

    _images/dg-first-vm-51.png

Enabling the NVIDIA vGPU#

Use the following procedure to enable vGPU support for your virtual machine. You must edit the virtual machine settings.

  1. Power down the virtual machine.

    _images/dg-first-vm-26.png
  2. Click on the VM in the Navigator window. Right-click the VM and select Edit Settings.

    _images/dg-first-vm-27.png
  3. Click on the New Device bar and select PCI device.

    _images/dg-first-vm-28.png
  4. Select the desired GPU Profile underneath the New PCI device drop-down.

    _images/dg-first-vm-30.png

    Note

    NVIDIA AI Enterprise requires a C-series profile.

  5. Click OK and power on the VM.

Note

A single VM may have multiple GPUs (PCI devices) attached; however, each GPU must then be configured with its maximum memory allocation.

Installing the NVIDIA Driver in the Virtual Machine#

Now that you have created a Linux VM, boot the VM and install the NVIDIA AI Enterprise guest driver in it to fully enable GPU operation.

Important

Before proceeding with the NVIDIA driver installation, confirm that Nouveau is disabled. Instructions to confirm this are located here for Ubuntu and here for Red Hat Enterprise Linux.
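
As a quick check from inside the guest, the loaded kernel modules can be inspected for nouveau. This is a minimal sketch that complements, rather than replaces, the linked instructions. The helper name nouveau_loaded is hypothetical, and the sketch assumes /proc/modules lists one loaded module per line with the module name first.

```shell
#!/bin/sh
# Sketch: detect whether the nouveau kernel module is currently loaded.

# nouveau_loaded [MODULES_FILE] succeeds if a module named exactly "nouveau"
# is listed. MODULES_FILE defaults to /proc/modules; the argument exists so
# the check can be run against a saved copy. An unreadable file is treated
# as "not loaded", so verify that the file exists on unusual systems.
nouveau_loaded() {
    modules_file="${1:-/proc/modules}"
    [ -r "$modules_file" ] || return 1
    grep -q '^nouveau ' "$modules_file"
}

if nouveau_loaded; then
    echo "nouveau is loaded; disable it before installing the NVIDIA driver"
else
    echo "nouveau is not loaded"
fi
```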

Downloading the NVIDIA AI Enterprise Software Driver Using NGC#

Important

Before you begin, generate an API key or have an existing one ready.

  1. From a browser, go to https://ngc.nvidia.com/signin/email and then enter your email and password.

  2. In the top right corner, click your user account icon and select Setup.

  3. Click Get API Key to open the Setup > API Key page.

    Note

    The API Key is the mechanism used to authenticate your access to the NGC container registry.

  4. Click Generate API Key to generate your API key.

    Note

    A warning message appears to let you know that your old API key will become invalid if you create a new key.

  5. Click Confirm to generate the key.

  6. Your API key appears.

    Important

    You only need to generate an API Key once. NGC does not save your key, so store it in a secure place. (You can copy your API Key to the clipboard by clicking the copy icon to the right of the API key.) Should you lose your API Key, you can generate a new one from the NGC website. When you generate a new API Key, the old one is invalidated.

  7. Now you will log into the VM using the VM Console link on the left pane of this page.

  8. Run the following commands to install the NGC CLI for either AMD64 or ARM64.

    AMD64 Linux Install: The NGC CLI binary for Linux is supported on Ubuntu 16.04 and later distributions.

    • Download, unzip, and install from the command line by moving to a directory where you have execute permissions and then running the following command:

    wget --content-disposition https://ngc.nvidia.com/downloads/ngccli_linux.zip && unzip ngccli_linux.zip && chmod u+x ngc-cli/ngc
    

    ARM64 Linux Install: The NGC CLI binary for ARM64 is supported on Ubuntu 18.04 and later distributions.

    • Download, unzip, and install from the command line by moving to a directory where you have execute permissions and then running the following command:

    wget --content-disposition https://ngc.nvidia.com/downloads/ngccli_arm64.zip && unzip ngccli_arm64.zip && chmod u+x ngc-cli/ngc
    

    Note

    NGC CLI installers for Windows, Arm64 macOS, and Intel macOS can be found here.

    Important

    The remaining installation steps below are the same for AMD64 and ARM64.

    • Check the binary’s MD5 hash to ensure the file wasn’t corrupted during download.

    find ngc-cli/ -type f -exec md5sum {} + | LC_ALL=C sort | md5sum -c ngc-cli.md5
    
    • Add your current directory to path.

    echo "export PATH=\"\$PATH:$(pwd)\"" >> ~/.bash_profile && source ~/.bash_profile
    
    • You must configure NGC CLI for your use so that you can run the commands. Enter the following command, including your API key when prompted.

    ngc config set

    Enter API key [no-apikey]. Choices: [<VALID_APIKEY>, 'no-apikey']:

    Enter CLI output format type [ascii]. Choices: [ascii, csv, json]: ascii

    Enter org [no-org]. Choices: ['no-org']:

    Enter team [no-team]. Choices: ['no-team']:

    Enter ace [no-ace]. Choices: ['no-ace']:

    Successfully saved NGC configuration to /home/$username/.ngc/config
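
The choice between the AMD64 and ARM64 archives in step 8 can be derived from the machine architecture. The following is a sketch only: the helper name ngc_cli_url is hypothetical, and the mapping of `uname -m` values to the two download URLs from step 8 is an assumption about typical Linux systems.

```shell
#!/bin/sh
# Sketch: pick the NGC CLI archive from step 8 based on the CPU architecture.

# ngc_cli_url ARCH prints the download URL for a machine name as reported
# by `uname -m`, or fails for architectures the guide does not cover.
ngc_cli_url() {
    case "$1" in
        x86_64)
            echo "https://ngc.nvidia.com/downloads/ngccli_linux.zip" ;;
        aarch64|arm64)
            echo "https://ngc.nvidia.com/downloads/ngccli_arm64.zip" ;;
        *)
            echo "unsupported architecture: $1" >&2
            return 1 ;;
    esac
}

if url=$(ngc_cli_url "$(uname -m)"); then
    echo "NGC CLI archive for this machine: $url"
else
    echo "No NGC CLI archive listed in this guide for $(uname -m)"
fi
```

The printed URL can then be passed to the wget command shown in step 8.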
    

Important

Follow the driver installation section that matches the operating system installed in the previous steps.

Installing the NVIDIA Driver using the .run file with Ubuntu#

Installation of the NVIDIA AI Enterprise software driver for Linux requires:

  • Compiler toolchain

  • Kernel headers

  1. Log in to the VM and check for updates.

    sudo apt-get update
    
  2. Install the gcc compiler and the make tool in the terminal.

    sudo apt-get install build-essential
    
  3. Download the NVIDIA AI Enterprise Software Driver.

    ngc registry resource download-version "nvaie/vgpu_guest_driver_2_1:510.73.08"
    
  4. Navigate to the directory containing the NVIDIA Driver .run file. Then, add the Executable permission to the NVIDIA Driver file using the chmod command.

    cd vgpu_guest_driver_2_1:510.73.08
    sudo chmod +x NVIDIA-Linux-x86_64-510.73.08-grid.run
    
  5. From a console shell, run the driver installer as the root user, and accept defaults.

    sudo sh ./NVIDIA-Linux-x86_64-510.73.08-grid.run
    
  6. Reboot the system.

    sudo reboot
    
  7. After the system has rebooted, confirm that you can see your NVIDIA vGPU device in the output from nvidia-smi.

    nvidia-smi
    

After installing the NVIDIA vGPU compute driver, you can license any NVIDIA AI Enterprise Software licensed products you are using.
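
Before running the installer, the compiler toolchain requirement listed at the top of this section can be verified from the shell. This is a minimal sketch: the helper name missing_tools is hypothetical, and checking only for gcc and make is an assumption about what the installer needs from build-essential.

```shell
#!/bin/sh
# Sketch: report which build tools are not available on PATH.

# missing_tools TOOL... prints each named command that cannot be found.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools gcc make)
if [ -n "$missing" ]; then
    # Left unquoted on purpose so the names print on one line.
    echo "Missing build tools:" $missing
else
    echo "gcc and make are available"
fi
```

If anything is reported missing, repeat step 2 before starting the driver installer.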

Installing the NVIDIA Driver using the .run file with RHEL#

Important

Before starting the driver installation, Secure Boot must be disabled as shown in Installing Red Hat Enterprise Linux.

  1. Register the machine with Red Hat using subscription-manager with the command below.

    subscription-manager register
    
  2. Satisfy the external dependency for EPEL for Dynamic Kernel Module System (DKMS).

    dnf install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
    dnf install dkms
    
  3. For RHEL 8, ensure that the system has the correct Linux kernel sources from the Red Hat repositories.

    dnf install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r)
    

    Note

    The NVIDIA driver requires that the kernel headers and development packages for the running version of the kernel be installed at the time of the driver installation, as well as whenever the driver is rebuilt. For example, if your system is running kernel version 4.4.0, the 4.4.0 kernel headers and development packages must also be installed.

  4. Install additional dependencies for NVIDIA drivers.

    dnf install elfutils-libelf-devel.x86_64
    dnf install -y tar bzip2 make automake gcc gcc-c++ pciutils libglvnd-devel
    
  5. Update the running kernel:

    dnf install -y kernel kernel-core kernel-modules
    
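  6. Confirm the system has the correct Linux kernel sources from the Red Hat repositories after the update. Because uname -r reports the running kernel, reboot first if step 5 installed a newer kernel so that the packages match the kernel the driver will be built against.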

    dnf install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r)
    
  7. Download the NVIDIA AI Enterprise Software Driver.

    ngc registry resource download-version "nvaie/vgpu_guest_driver_2_1:510.73.08"
    
  8. Navigate to the directory containing the NVIDIA Driver .run file. Then, add the Executable permission to the NVIDIA Driver file using the chmod command.

    sudo chmod +x NVIDIA-Linux-x86_64-510.73.08-grid.run
    
  9. From the console shell, run the driver installer and accept defaults.

    sudo sh ./NVIDIA-Linux-x86_64-510.73.08-grid.run
    

    Note

    Accept any warnings and ignore the CC version check.

  10. Reboot the system.

    sudo reboot
    
  11. After the system has rebooted, confirm that you can see your NVIDIA vGPU device in the output from nvidia-smi.

    nvidia-smi
    

After installing the NVIDIA vGPU compute driver, you can license any NVIDIA AI Enterprise Software licensed products you are using.
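
The note in step 3 about matching kernel headers can be checked before running the installer. The following is a sketch under stated assumptions: the helper name kernel_build_dir_present is hypothetical, and it relies on the common convention that /lib/modules/<release>/build resolves to a directory only when the development package for that kernel release is installed.

```shell
#!/bin/sh
# Sketch: check that a kernel build tree exists for a given kernel release.

# kernel_build_dir_present RELEASE [ROOT] succeeds if
# ROOT/lib/modules/RELEASE/build resolves to a directory. ROOT defaults
# to "/" and exists only so the check can be run against a test tree.
kernel_build_dir_present() {
    release="$1"
    root="${2:-/}"
    [ -d "${root%/}/lib/modules/${release}/build" ]
}

running=$(uname -r)
if kernel_build_dir_present "$running"; then
    echo "Kernel build tree found for $running"
else
    echo "No kernel build tree for $running; install the matching kernel-devel and kernel-headers packages"
fi
```

Because the check uses the release reported by uname -r, it reflects the kernel that is currently running, not a newer kernel that has been installed but not yet booted.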

Licensing the VM#

To use an NVIDIA vGPU software licensed product, each client system to which a physical or virtual GPU is assigned must be able to obtain a license from the NVIDIA License System. A client system can be a VM that is configured with NVIDIA vGPU, a VM that is configured for GPU pass through, or a physical host to which a physical GPU is assigned in a bare-metal deployment.