Creating Your First NVIDIA AI Enterprise VM

Because C-Series vGPUs have large BAR memory settings, the following restrictions apply when using them on VMware ESXi:

  • The guest OS must be a 64-bit OS.

  • 64-bit MMIO and EFI boot must be enabled for the VM.

  • The guest OS must be able to be installed in EFI boot mode.

  • The VM’s MMIO space must be increased according to the GPU model.

  • For GPUDirect RDMA, P2P must be enabled.

These instructions walk through creating a VM from scratch that supports NVIDIA vGPU. The VM is later used as a gold master image. Use the following procedure to create the VM:

  1. Browse to the host or cluster using the vSphere Web Client.

    dg-first-vm-01.png


  2. Right-click the desired host or cluster and select New Virtual Machine.

    dg-first-vm-02.png


  3. Select Create a new virtual machine and click Next.

    dg-first-vm-03.png


  4. Enter a name for the virtual machine. Next, choose the location to host the virtual machine using the Select a location for the virtual machine section. Click Next to continue.

    dg-first-vm-04.png


  5. Select a compute resource to run the VM. Click Next to continue.

    Note

    This compute resource should have an NVIDIA vGPU-enabled card installed and correctly configured.

    dg-first-vm-05.png


  6. Select the datastore to host the virtual machine. Click Next to continue.

    dg-first-vm-06.png


  7. Next, select compatibility for the virtual machine. This should reflect the ESXi version for your NVIDIA-Certified Systems. Click Next to continue.

    dg-first-vm-07.png


  8. Select the appropriate Ubuntu Linux OS from the Guest OS Family and Guest OS Version pull-down menus. Click Next to continue.

    dg-first-vm-08.png


  9. Next, set up the hardware for the virtual machine. The following table summarizes the settings configured in the upcoming steps.

    Virtual Machine Configuration

    CPU        16 vCPU on a single socket
    RAM        64 GB
    Storage    150 GB thin-provisioned disk

  10. Expand the CPU options by clicking the greater than sign. Set the CPU to 16 and the Cores per Socket to 16.

    dg-first-vm-39.png


  11. Next, set the Memory to 64 GB.

    dg-first-vm-40.png


  12. Next, expand the New Hard disk option by clicking the greater than sign. Set the storage to 150 GB and the Disk Provisioning to Thin Provision.

    dg-first-vm-41.png


  13. Review the New Virtual Machine configuration before completion. Click Finish when ready.

    dg-first-vm-10.png


  14. The new virtual machine container is created.

  15. Configure the VM boot options for EFI. Right-click on the new VM and select Edit Settings.

    dg-first-vm-11.png


  16. Click on the VM Options tab, expand Boot Options, change the Firmware from BIOS to EFI.

    dg-first-vm-12.png


  17. Expand Advanced and select Edit Configuration.

    dg-first-vm-35.png


  18. Adjust the Memory Mapped I/O (MMIO) settings for the VM.

    • Click Add Configuration Params and add the parameter from the following table. Replace xxx with the value from the MMIO Space Required column for your GPU model.

    Name                           Value
    pciPassthru.64bitMMIOSizeGB    xxx

    GPU                                MMIO Space Required (GB)
    NVIDIA A10                         64
    NVIDIA A30                         64
    NVIDIA A40                         128
    NVIDIA A100 40GB (all variants)    128
    NVIDIA A100 80GB (all variants)    256
    NVIDIA RTX A5000                   64
    NVIDIA RTX A5500                   64
    NVIDIA RTX A6000                   128
    Tesla P100 (all variants)          64

    dg-first-vm-37.png

    Note

    When NVLink is enabled, adjust the MMIO space for each GPU used accordingly.

    • Click Add Configuration Params again and add the parameter from the following table.

    Name                        Value
    pciPassthru.use64bitMMIO    TRUE
    dg-first-vm-38.png

    Note

    For GPUDirect RDMA, P2P must be enabled.

    Name                        Value
    pciPassthru.use64bitMMIO    TRUE


  19. Click OK to close the advanced configuration window, then click OK to complete the VM configuration.
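
The lookup in step 18 can also be scripted when preparing several VMs. The following is a minimal sketch, not an official tool: the helper names mmio_size_gb and print_mmio_params are hypothetical, the sizes are copied from the table in step 18, and the output assumes the name = "value" form used in .vmx files. The vSphere Client procedure above remains the documented path.

```shell
#!/bin/sh
# Sketch: map a GPU model to the MMIO size (in GB) from the table in step 18
# and print the two advanced configuration parameters.
# mmio_size_gb and print_mmio_params are hypothetical helper names.

mmio_size_gb() {
    # Model names are written without the "(all variants)" suffix.
    case "$1" in
        "NVIDIA A10"|"NVIDIA A30"|"NVIDIA RTX A5000"|"NVIDIA RTX A5500"|"Tesla P100")
            echo 64 ;;
        "NVIDIA A40"|"NVIDIA A100 40GB"|"NVIDIA RTX A6000")
            echo 128 ;;
        "NVIDIA A100 80GB")
            echo 256 ;;
        *)
            echo "unknown GPU model: $1" >&2
            return 1 ;;
    esac
}

print_mmio_params() {
    # Fail without printing anything if the model is not in the table.
    size=$(mmio_size_gb "$1") || return 1
    printf 'pciPassthru.use64bitMMIO = "TRUE"\n'
    printf 'pciPassthru.64bitMMIOSizeGB = "%s"\n' "$size"
}

print_mmio_params "NVIDIA A40"
```

An unknown model name returns a non-zero status instead of guessing a size, so a wrapper script can stop before a VM is misconfigured.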

Important

NVIDIA AI Enterprise supports both Ubuntu 20.04 and Red Hat Enterprise Linux 8.4 with NVIDIA AI Enterprise 1.1 or later. You can find both installation guides below.

Installing Ubuntu Server LTS

NVIDIA AI Enterprise is supported on Ubuntu LTS operating systems. It is important to note there are two Ubuntu ISO types: Desktop and Live Server. The Desktop version includes a graphical user interface (GUI), while the Live Server version only operates via a command line. This document uses the Live Server version 20.04 (amd64 architecture) of Ubuntu, though it is worth noting a GUI may be installed later if needed.

  1. Upload the ISO to the datastore of your VM. Right-click on the VM container in vSphere Client and select Edit Settings. Mount the ISO to your VM by clicking Browse and make sure to check Connect At Power On. Click OK to finish.

    dg-first-vm-13.png


  2. Power on the VM and wait for the installation screen to appear.

  3. Select your preferred language and press the enter key.

    dg-first-vm-14.png


  4. Continue without updating as this guide is built around 20.04.

    dg-first-vm-15.png


  5. Configure the keyboard layout and press the enter key.

    dg-first-vm-16.png


  6. On this screen, select your network connection type and modify it to fit your internal requirements. This guide uses DHCP for the configuration.

    dg-first-vm-17.png


  7. If you have a proxy address, input it in this screen and press Done.

    dg-first-vm-18.png


  8. If you have an alternative mirror address for Ubuntu, input it here. Otherwise, if there is a default address, use it and press Done.

    dg-first-vm-19.png


  9. Choose to format the entire disk, then select the disk to install to.

    dg-first-vm-20.png


  10. Review the file system summary and select Done if satisfactory. Select Continue in the pop-up window.

    dg-first-vm-21.png


  11. Configure the VM with a user account, name, and password.

    dg-first-vm-22.png


  12. Select Install OpenSSH server and select Done.

    dg-first-vm-23.png


  13. Select any server snaps that may be required for internal use in your environment and select Done. Wait for the system to finish installing.

    dg-first-vm-24.png


  14. Select Reboot Now on the Ubuntu OS screen.

    dg-first-vm-25.png


  15. When the reboot is complete, return to vCenter. Right-click the VM, select Power, and click Power Off.

    dg-first-vm-50.png


  16. Click on the VM in the Navigator window. Right-click the VM and select Edit Settings. Uncheck the Connect check box for CD/DVD drive 1.

    dg-first-vm-51.png
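
Because the VM must boot in EFI mode, it can be worth confirming the firmware mode from inside the guest the next time it is powered on. The sketch below relies on the Linux kernel exposing /sys/firmware/efi only on EFI boots. The helper name boot_mode is hypothetical, and its optional argument (an alternate root directory) exists only to make the function easy to exercise.

```shell
#!/bin/sh
# Sketch: report whether a Linux guest booted through EFI or legacy BIOS.
# The kernel creates /sys/firmware/efi only when booted in EFI mode.
# boot_mode is a hypothetical helper name; $1 is an optional alternate root.

boot_mode() {
    root="${1:-}"
    if [ -d "$root/sys/firmware/efi" ]; then
        echo "EFI"
    else
        echo "BIOS"
    fi
}

boot_mode
```

If this prints BIOS, revisit the Boot Options firmware setting for the VM before installing the driver.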


Installing Red Hat Enterprise Linux

NVIDIA AI Enterprise 1.1 or later

NVIDIA AI Enterprise is supported on the Red Hat Enterprise Linux operating system.

  1. Before the installation can begin, you will need to disable Secure Boot on the VM. Right-click the VM and select Edit Settings.

    rhel-edit-settings.png


  2. Next, select VM Options at the top of the window. Locate Boot Options, make sure Secure Boot is unchecked, and click OK.

    rhel-uncheck-secure-boot.png

    Important

    Make sure you have met the listed prerequisites and added the PCI configuration parameters listed in Step 18 of Creating a Virtual Machine.


  3. Upload the ISO to the datastore of your VM. Right-click on the VM container in vSphere Client and select Edit Settings. Mount the ISO to your VM by clicking Browse and make sure to check Connect At Power On. Click OK to finish.

    dg-first-vm-13.png


  4. Power on the VM and wait for the installation screen to appear.

    rhel-boot.png


  5. Select your preferred language and click Continue.

    rhel-keyboard.png


  6. Next, select Time & Date under the Localization column. Set the time and date as required and click Done.

    rhel-date-and-time.png


  7. Next, select Software Packages under the Software column. Select Server and click Done.

    rhel-software-selection.png


  8. Next, select Installation Destination under the System Menu. Select the VMware Virtual disk and click Done.

    rhel-installation-destination.png


  9. Next, select Network & Host Name under the System column. If your system is connected to a network, it tries to obtain an IP address from a DHCP server; otherwise, configure the network manually. Click Done when finished.

    rhel-network-host-name.png


  10. Select Root Password under the User Settings Column. Create a password and click Done.

    rhel-root-password.png


  11. Click Begin Installation to start the install.

    rhel-begin-install.png


  12. The installation will begin as shown below.

    rhel-install.png


  13. Once the installation is complete, reboot the VM by clicking Reboot System.

    rhel-reboot.png


  14. When the reboot is complete, return to vCenter. Right-click the VM, select Power, and click Power Off.

  15. Click on the VM in the Navigator window. Right-click the VM and select Edit Settings. Uncheck the Connect check box for CD/DVD drive 1.

    dg-first-vm-51.png


Use the following procedure to enable vGPU support for your virtual machine. You must edit the virtual machine settings.

  1. Power down the virtual machine.

    dg-first-vm-26.png


  2. Click on the VM in the Navigator window. Right-click the VM and select Edit Settings.

    dg-first-vm-27.png


  3. Click on the New Device bar and select PCI device.

    dg-first-vm-28.png


  4. Select the desired GPU Profile underneath the New PCI device drop-down.

    dg-first-vm-30.png

    Note

    NVIDIA AI Enterprise requires a C-series profile.


  5. Click OK and power on the VM.

Note

A single VM may have multiple GPUs (PCI devices) attached; however, this requires that each GPU be configured with the maximum memory allocation.

Now that you have created a Linux VM, boot the VM and install the NVIDIA AI Enterprise guest driver in the VM to fully enable GPU operation.

Important

Before proceeding with the NVIDIA Driver installation, please confirm that Nouveau is disabled. Instructions to confirm this are located here for Ubuntu and here for RHEL.
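
As a quick complement to the linked instructions, the following sketch checks whether the nouveau kernel module is currently loaded in the guest. It reads /proc/modules, the same data that lsmod reports. The helper name nouveau_loaded is hypothetical, and the check only reports state; it does not disable the module.

```shell
#!/bin/sh
# Sketch: detect whether the nouveau kernel module is loaded.
# nouveau_loaded is a hypothetical helper name. It reads a module list in
# /proc/modules format, where the module name is the first field.

nouveau_loaded() {
    modules_file="${1:-/proc/modules}"
    [ -r "$modules_file" ] || return 1
    # Match "nouveau" only as a module name, not as a dependency of another module.
    awk '$1 == "nouveau" { found = 1 } END { exit !found }' "$modules_file"
}

if nouveau_loaded; then
    echo "Nouveau is loaded; disable it before installing the NVIDIA driver."
else
    echo "Nouveau is not loaded."
fi
```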

Downloading the NVIDIA AI Enterprise Software Driver Using NGC

Important

Before you begin, generate an API key or have an existing one ready.

  1. From a browser, go to https://ngc.nvidia.com/signin/email and then enter your email and password.

  2. In the top right corner, click your user account icon and select Setup.

  3. Click Get API Key to open the Setup > API Key page.

    Note

    The API Key is the mechanism used to authenticate your access to the NGC container registry.


  4. Click Generate API Key to generate your API key.

    Note

    A warning message appears to let you know that your old API key will become invalid if you create a new key.


  5. Click Confirm to generate the key.

  6. Your API key appears.

    Important

    You only need to generate an API Key once. NGC does not save your key, so store it in a secure place. (You can copy your API Key to the clipboard by clicking the copy icon to the right of the API key.) Should you lose your API Key, you can generate a new one from the NGC website. When you generate a new API Key, the old one is invalidated.


  7. Log in to the VM using the VM Console link in the left pane of this page.

  8. Run the following commands to install the NGC CLI for either AMD64 or ARM64.

    AMD64 Linux Install: The NGC CLI binary for Linux is supported on Ubuntu 16.04 and later distributions.

    • Download, unzip, and install from the command line by moving to a directory where you have execute permissions and then running the following command:

    wget --content-disposition https://ngc.nvidia.com/downloads/ngccli_linux.zip && unzip ngccli_linux.zip && chmod u+x ngc-cli/ngc

    ARM64 Linux Install: The NGC CLI binary for ARM64 is supported on Ubuntu 18.04 and later distributions.

    • Download, unzip, and install from the command line by moving to a directory where you have execute permissions and then running the following command:

    wget --content-disposition https://ngc.nvidia.com/downloads/ngccli_arm64.zip && unzip ngccli_arm64.zip && chmod u+x ngc-cli/ngc

    Note

    NGC CLI installers for Windows, Arm64 macOS, and Intel macOS can be found here.

    Important

    The remaining installation steps below are the same for AMD64 and ARM64.

    • Check the binary’s MD5 hash to ensure the file wasn’t corrupted during download.

    find ngc-cli/ -type f -exec md5sum {} + | LC_ALL=C sort | md5sum -c ngc-cli.md5

    • Add the ngc-cli directory, which contains the ngc binary, to your PATH.

    echo "export PATH=\"\$PATH:$(pwd)/ngc-cli\"" >> ~/.bash_profile && source ~/.bash_profile

    • You must configure NGC CLI for your use so that you can run the commands. Enter the following command, including your API key when prompted.

    ngc config set
    Enter API key [no-apikey]. Choices: [<VALID_APIKEY>, 'no-apikey']:
    Enter CLI output format type [ascii]. Choices: [ascii, csv, json]: ascii
    Enter org [no-org]. Choices: ['no-org']:
    Enter team [no-team]. Choices: ['no-team']:
    Enter ace [no-ace]. Choices: ['no-ace']:
    Successfully saved NGC configuration to /home/$username/.ngc/config
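
The choice between the AMD64 and ARM64 downloads in step 8 can be made from the output of uname -m. The following is a minimal sketch under that assumption: the helper name ngc_cli_url is hypothetical, and the two URLs are the ones given in step 8. It only prints the URL, so nothing is downloaded until you pass it to wget yourself.

```shell
#!/bin/sh
# Sketch: pick the NGC CLI download URL from step 8 based on CPU architecture.
# ngc_cli_url is a hypothetical helper name; $1 optionally overrides uname -m.

ngc_cli_url() {
    arch="${1:-$(uname -m)}"
    case "$arch" in
        x86_64|amd64)
            echo "https://ngc.nvidia.com/downloads/ngccli_linux.zip" ;;
        aarch64|arm64)
            echo "https://ngc.nvidia.com/downloads/ngccli_arm64.zip" ;;
        *)
            echo "unsupported architecture: $arch" >&2
            return 1 ;;
    esac
}

# Print the URL for this machine; nothing is downloaded here.
url=$(ngc_cli_url) && echo "Download from: $url"
```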


Important

Follow the driver installation based on the operating system installed in the previous steps.

Installing the NVIDIA Driver using the .run file with Ubuntu

Installation of the NVIDIA AI Enterprise software driver for Linux requires:

  • Compiler toolchain

  • Kernel headers
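
Before running the installer, both requirements can be checked from the shell. The sketch below uses hypothetical helper names (missing_tools and have_kernel_headers) and assumes the kernel headers are reachable through the conventional /lib/modules/<release>/build path. On a fresh guest it is expected to report gcc and make as missing until the build-essential step below has been completed.

```shell
#!/bin/sh
# Sketch: check for the compiler toolchain and kernel headers.
# missing_tools and have_kernel_headers are hypothetical helper names.

missing_tools() {
    # Print each named command that cannot be found on PATH.
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

have_kernel_headers() {
    # Assumes headers are linked at /lib/modules/<release>/build.
    [ -d "/lib/modules/${1:-$(uname -r)}/build" ]
}

missing=$(missing_tools gcc make)
[ -z "$missing" ] || echo "Missing build tools:" $missing
have_kernel_headers || echo "Kernel headers for $(uname -r) not found"
```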

  1. Log in to the VM and check for updates.

    sudo apt-get update


  2. Install the gcc compiler and the make tool in the terminal.

    sudo apt-get install build-essential


  3. Download the NVIDIA AI Enterprise Software Driver.

    ngc registry resource download-version "nvaie/vgpu_guest_driver_2_1:510.73.08"


  4. Navigate to the directory containing the NVIDIA Driver .run file. Then, add the Executable permission to the NVIDIA Driver file using the chmod command.

    cd vgpu_guest_driver_2_1:510.73.08
    sudo chmod +x NVIDIA-Linux-x86_64-510.73.08-grid.run


  5. From a console shell, run the driver installer as the root user, and accept defaults.

    sudo sh ./NVIDIA-Linux-x86_64-510.73.08-grid.run


  6. Reboot the system.

    sudo reboot


  7. After the system has rebooted, confirm that you can see your NVIDIA vGPU device in the output from nvidia-smi.

    nvidia-smi
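
For unattended checks, for example when validating a gold master image, the final verification step can be scripted. The sketch below assumes that nvidia-smi -L prints one line per device beginning with "GPU" followed by an index. The helper name gpu_visible is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed only when nvidia-smi is installed and lists at least one GPU.
# gpu_visible is a hypothetical helper name.

gpu_visible() {
    command -v nvidia-smi >/dev/null 2>&1 || return 1
    # nvidia-smi -L lists devices one per line, e.g. "GPU 0: <name> (UUID: ...)".
    nvidia-smi -L 2>/dev/null | grep -q '^GPU [0-9]'
}

if gpu_visible; then
    echo "NVIDIA vGPU device detected."
else
    echo "No NVIDIA GPU visible; check the driver installation."
fi
```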


After installing the NVIDIA vGPU compute driver, you can license any NVIDIA AI Enterprise Software licensed products you are using.

Installing the NVIDIA Driver using the .run file with RHEL

Important

Before starting the driver installation, disable Secure Boot as shown in Installing Red Hat Enterprise Linux.

  1. Register the machine with Red Hat using subscription-manager with the command below.

    subscription-manager register


  2. Satisfy the external dependency for EPEL for DKMS.

    dnf install elfutils-libelf-devel "kernel-devel-uname-r ==$(uname -r)"


  3. For RHEL 8, ensure that the system has the correct Linux kernel sources from the Red Hat repositories.

    dnf install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r)

    Note

    The NVIDIA driver requires that the kernel headers and development packages for the running version of the kernel be installed at the time of the driver installation, as well as whenever the driver is rebuilt. For example, if your system is running kernel version 4.4.0, the 4.4.0 kernel headers and development packages must also be installed.


  4. Install additional dependencies for NVIDIA drivers.

    dnf install elfutils-libelf-devel.x86_64
    dnf install -y tar bzip2 make automake gcc gcc-c++ pciutils libglvnd-devel


  5. Update the running kernel:

    dnf install -y kernel kernel-core kernel-modules


  6. Confirm the system has the correct Linux kernel sources from the Red Hat repositories after update.

    dnf install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r)


  7. Download the NVIDIA AI Enterprise Software Driver.

    ngc registry resource download-version "nvaie/vgpu_guest_driver_2_1:510.73.08"


  8. Navigate to the directory containing the NVIDIA Driver .run file. Then, add the Executable permission to the NVIDIA Driver file using the chmod command.

    sudo chmod +x NVIDIA-Linux-x86_64-510.73.08-grid.run


  9. From the console shell, run the driver installer and accept defaults.

    sudo sh ./NVIDIA-Linux-x86_64-510.73.08-grid.run

    Note

    Accept any warnings and ignore the CC version check.


  10. Reboot the system.

    sudo reboot


  11. After the system has rebooted, confirm that you can see your NVIDIA vGPU device in the output from nvidia-smi.

    nvidia-smi
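
If the installer fails to build the kernel module on RHEL, a common cause is a kernel-devel package that does not match the running kernel, which is the requirement described in the note under step 3. The sketch below is one way to check for that, assuming rpm -q kernel-devel prints package names in the form kernel-devel-<release>, where <release> matches the output of uname -r. The helper name kernel_devel_installed is hypothetical.

```shell
#!/bin/sh
# Sketch: check that kernel-devel for a given kernel release is installed.
# kernel_devel_installed is a hypothetical helper name. It reads installed
# package names on standard input, one per line.

kernel_devel_installed() {
    release="$1"
    # -x matches the whole line, -F treats the pattern as a fixed string.
    grep -qxF "kernel-devel-$release"
}

# Run the check only where rpm is available (for example, on the RHEL guest).
if command -v rpm >/dev/null 2>&1; then
    if rpm -q kernel-devel | kernel_devel_installed "$(uname -r)"; then
        echo "kernel-devel matches the running kernel."
    else
        echo "kernel-devel for $(uname -r) is not installed."
    fi
fi
```

If the check reports a mismatch right after a kernel update, rebooting into the new kernel and repeating the kernel-devel installation step is a reasonable first thing to try.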


After installing the NVIDIA vGPU compute driver, you can license any NVIDIA AI Enterprise Software licensed products you are using.

To use an NVIDIA vGPU software licensed product, each client system to which a physical or virtual GPU is assigned must be able to obtain a license from the NVIDIA License System. A client system can be a VM that is configured with NVIDIA vGPU, a VM that is configured for GPU pass through, or a physical host to which a physical GPU is assigned in a bare-metal deployment.
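
Once licensing has been configured, the license state can be read from the guest. The sketch below assumes that the nvidia-smi -q output of the vGPU guest driver contains a "License Status" field; the exact wording of its value may differ between driver releases. The helper name license_status is hypothetical.

```shell
#!/bin/sh
# Sketch: extract the first "License Status" value from nvidia-smi -q output.
# license_status is a hypothetical helper name; it reads standard input and
# returns non-zero when no such field is present.

license_status() {
    awk '/License Status/ {
             sub(/^[^:]*: */, "")   # drop the field name and separator
             print
             found = 1
             exit
         }
         END { exit !found }'
}

# Query the driver only where nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -q | license_status || echo "No license status reported."
fi
```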

© Copyright 2024, NVIDIA. Last updated on Apr 2, 2024.