Installing and Configuring NVIDIA AI Enterprise Host Software

This section covers installing and configuring the NVIDIA AI Enterprise Host Software:

  • Preparing the VIB file for Install

  • Uploading VIB in vSphere Client

  • Installing NVIDIA AI Enterprise Host Software with the VIB

  • Updating the VIB

  • Verifying the Installation of the VIB

  • Uninstalling the VIB

  • Changing the Default Graphics Type in VMware vSphere

Before you begin, download the archive that contains the VIB file and extract its contents to a folder. The file with the .vib extension is the one that you must upload to the host data store for installation. For demonstration purposes, these steps use the VMware vSphere web interface to upload the VIB file to the host.
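If you extract the archive on a workstation that has a POSIX shell, a quick way to locate the file to upload is to search the extracted folder for the .vib extension. The following is a minimal sketch, not part of the NVIDIA tooling; the function name find_vib_files and the folder name are hypothetical.

```shell
# Hypothetical helper: list every file under the given folder whose name
# ends in .vib (case-insensitive), one path per line.
# Assumes a find implementation that supports -iname (GNU, BSD, BusyBox).
find_vib_files() {
    find "$1" -type f -iname '*.vib'
}

# Example (hypothetical folder name):
#   find_vib_files ./nvidia-aie-extracted
```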

To upload the VIB file to the data store using vSphere Web Client:

  1. Select the host server and select the Datastores tab.

  2. Right-click the data store and then select Browse Files. The Datastore Browser window displays.

    dg-vgpu-01.png


  3. Click the New Folder icon. The Create a new folder window displays.

  4. Name the new folder VIB and then click OK.

    dg-vgpu-02.png


  5. Select the VIB folder in the Datastore Browser window.

  6. Click the Upload Files button and navigate to the VIB file. Double-click the file to upload it. A progress bar is displayed while the upload runs. If the operation fails, click Details and follow the instructions to bypass the certificate manually.

    dg-vgpu-03.png


The .VIB file is uploaded to the data store on the host.

Note

If you do not click Allow before the timer runs out, further attempts to upload a file will silently fail. If this happens, exit and restart vSphere Web Client. Repeat this procedure and be sure to click Allow before the timer runs out.


The NVIDIA AI Enterprise Host Software runs on the ESXi host. It is provided in the following formats:

  • As a VIB file, which must be copied to the ESXi host and then installed

  • As an offline bundle that you can import manually as explained in Import Patches Manually

Note

To install the NVIDIA AI Enterprise Host Software (VIB), you need to access the ESXi host via the ESXi Shell or SSH. Refer to VMware’s documentation on how to Enable Access to ESXi Shell or SSH.

Note

Before proceeding with the NVIDIA AI Enterprise Host Software installation, ensure that all VMs are powered off and that the ESXi host is placed in maintenance mode. Refer to VMware’s documentation on how to Place an ESXi Host in Maintenance Mode.
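From the ESXi Shell, one way to confirm the maintenance-mode precondition is to query the current state before installing. The sketch below assumes that esxcli system maintenanceMode get prints a single word, Enabled or Disabled; the function name in_maintenance_mode is hypothetical.

```shell
# Hypothetical helper: succeeds when the text on stdin is exactly "Enabled".
# Assumption: "esxcli system maintenanceMode get" prints Enabled or Disabled.
in_maintenance_mode() {
    [ "$(cat)" = "Enabled" ]
}

# Example use on the host:
#   if esxcli system maintenanceMode get | in_maintenance_mode; then
#       echo "Host is in maintenance mode"
#   fi
```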

  1. Place the host into Maintenance mode by right-clicking it and then selecting Maintenance Mode - Enter Maintenance Mode.

    dg-vgpu-04.png

    Note

    Alternatively, you can place the host into Maintenance mode using the command prompt by entering:

    esxcli system maintenanceMode set --enable=true

    This command returns no output. Changes made from the command prompt do not refresh the vSphere Web Client UI, so click the Refresh icon in the upper right corner of the vSphere Web Client window to see the new state.

    Important

    Placing the host into maintenance mode disables any vCenter appliance running on this host until you exit maintenance mode and restart that vCenter appliance.


  2. Click OK to confirm your selection.

  3. Use the esxcli command to install the NVIDIA AI Enterprise Host Software package:

    [root@esxi:~] esxcli software vib install -v directory/NVIDIA-AIE_ESXi_6.7.0_Driver_470.105-1OEM.670.0.0.8169922.vib
    Installation Result
       Message: Operation finished successfully.
       Reboot Required: false
       VIBs Installed: NVIDIA-AIE_ESXi_6.7.0_Driver_470.105-1OEM.670.0.0.8169922
       VIBs Removed:
       VIBs Skipped:

    Here, directory is the absolute path to the directory that contains the VIB file. You must specify the absolute path even if the VIB file is in the current working directory. Do not include the ds:/// prefix in the path; start the path with /vmfs/volumes/ instead.

  4. From the vSphere Web Client, exit Maintenance Mode by right-clicking the host and selecting Exit Maintenance Mode.

    Note

    Although the output reports Reboot Required: false, you must reboot the host for the VIB to load and Xorg to start.

    Note

    Alternatively, you may exit from Maintenance mode via the command prompt by entering:

    esxcli system maintenanceMode set --enable=false

    This command returns no output. Changes made from the command prompt do not refresh the vSphere Web Client UI, so click the Refresh icon in the upper right corner of the vSphere Web Client window to see the new state.


  5. Reboot the host from the vSphere Web Client by right-clicking the host and then selecting Reboot.

    Note

    Alternatively, you can reboot the host by entering the following at the command prompt:

    reboot

    This command returns no output. When you reboot from the vSphere Web Client instead, the Reboot Host window displays.


  6. When rebooting from the vSphere Web Client, enter a descriptive reason for the reboot in the Log a reason for this reboot operation field, and then click OK to proceed.
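The path rule in step 3 is a common source of failed installs, so it can help to check the path before passing it to esxcli software vib install -v. The following sketch is illustrative only: the function name normalize_vib_path and the datastore name in the example are hypothetical, and the check simply enforces the two rules stated above (no ds:/// prefix, path starts with /vmfs/volumes/).

```shell
# Hypothetical helper: convert a datastore-style path to the absolute form
# described in step 3, and reject anything not under /vmfs/volumes/.
normalize_vib_path() {
    vib_path=$1
    case $vib_path in
        ds:///*) vib_path=/${vib_path#ds:///} ;;   # drop ds:///, keep a leading /
    esac
    case $vib_path in
        /vmfs/volumes/*)
            printf '%s\n' "$vib_path" ;;
        *)
            echo "error: expected an absolute path under /vmfs/volumes/: $vib_path" >&2
            return 1 ;;
    esac
}

# Example (hypothetical datastore and file names):
#   esxcli software vib install -v "$(normalize_vib_path ds:///vmfs/volumes/datastore1/VIB/example.vib)"
```

A relative path such as VIB/example.vib is rejected rather than guessed at, because the helper cannot know which data store was intended.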

Update the NVIDIA AI Enterprise Host Software package if you want to install a new version of NVIDIA AI Enterprise Host Software on a system where an existing version is already installed.

  • To update the NVIDIA AI Enterprise Host Software (VIB), you need to access the ESXi host via the ESXi Shell or SSH. Refer to VMware’s documentation on how to enable ESXi Shell or SSH for an ESXi host.

  • The driver versions shown in this document are for demonstration purposes. The file names and version strings in your environment will be similar but may differ.

    Note

    Before proceeding with the NVIDIA AI Enterprise Host Software update, ensure that all VMs are powered off, and the ESXi host is placed in maintenance mode. Refer to VMware’s documentation on how to place an ESXi host in maintenance mode.


  1. Use the esxcli command to update the NVIDIA AI Enterprise Host Software package:

    [root@esxi:~] esxcli software vib update -v directory/NVIDIA-AIE_ESXi_6.7.0_Driver_470.105-1OEM.670.0.0.8169922.vib
    Installation Result
       Message: Operation finished successfully.
       Reboot Required: false
       VIBs Installed: NVIDIA-AIE_ESXi_6.7.0_Driver_470.105-1OEM.670.0.0.8169922
       VIBs Removed: NVIDIA-vGPU-VMware_ESXi_6.0_Host_Driver_390.57-1OEM.600.0.0.2159203
       VIBs Skipped:


  2. Reboot the ESXi host and remove it from maintenance mode.
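Whether to run install or update depends on whether an NVIDIA VIB is already present on the host. As a rough sketch, the choice can be made from the output of esxcli software vib list. The function name choose_vib_action is hypothetical, and the check assumes that any listed line containing "nvidia" (in any letter case) indicates an existing host driver, which may not hold if other NVIDIA packages are installed.

```shell
# Hypothetical helper: read "esxcli software vib list" output on stdin and
# print "update" if any line mentions NVIDIA, otherwise print "install".
choose_vib_action() {
    if grep -qi nvidia; then
        echo update
    else
        echo install
    fi
}

# Example use on the host:
#   action=$(esxcli software vib list | choose_vib_action)
#   esxcli software vib "$action" -v /vmfs/volumes/...   # path elided as in the text
```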

After the ESXi host has rebooted, verify the installation of the NVIDIA vGPU software package. You can also view the version of the driver with the steps below.

  1. Verify that the NVIDIA vGPU software package is installed and loaded correctly by checking for the NVIDIA kernel driver in the list of loaded kernel modules.

    [root@esxi:~] vmkload_mod -l | grep nvidia
    nvidia                   5    8420


  2. If the NVIDIA driver is not listed in the output, check dmesg for any load-time errors reported by the driver.

  3. Verify that the NVIDIA kernel driver can successfully communicate with the NVIDIA physical GPUs in your system by running the nvidia-smi command.

    Running the nvidia-smi command should produce a listing of the GPUs in your platform.

    [root@esxi:~] nvidia-smi
    Wed January 19 10:10:15 2022
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 470.105      Driver Version: 470.105      CUDA Version: N/A      |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  Tesla T4            On   | 00000000:1A:00.0 Off |                    0 |
    | N/A   38C    P8    17W /  70W |     83MiB / 15359MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   1  Tesla T4            On   | 00000000:3B:00.0 Off |                    0 |
    | N/A   37C    P8    16W /  70W |     75MiB / 15359MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   2  Tesla T4            On   | 00000000:87:00.0 Off |                    0 |
    | N/A   34C    P8    16W /  70W |     75MiB / 15359MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   3  Tesla T4            On   | 00000000:AF:00.0 Off |                    0 |
    | N/A   38C    P8    16W /  70W |     75MiB / 15359MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   4  Tesla T4            On   | 00000000:D8:00.0 Off |                    0 |
    | N/A   36C    P8    16W /  70W |     75MiB / 15359MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+


If nvidia-smi fails to report the expected output for all the NVIDIA GPUs in your system, see the NVIDIA AI Enterprise User Guide for troubleshooting steps.
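The module check in step 1 can also be scripted so that a missing driver is reported explicitly instead of as empty grep output. The sketch below is one possible approach, not an NVIDIA-provided tool; the function name nvidia_module_loaded is hypothetical, and it assumes that the module name appears as the first field of a vmkload_mod -l line, as in the sample output above.

```shell
# Hypothetical helper: read "vmkload_mod -l" output on stdin and succeed
# only if some line has "nvidia" as its first whitespace-separated field.
nvidia_module_loaded() {
    awk '$1 == "nvidia" { found = 1 } END { exit !found }'
}

# Example use on the host:
#   if vmkload_mod -l | nvidia_module_loaded; then
#       echo "NVIDIA kernel module is loaded"
#   else
#       echo "NVIDIA kernel module not found; check dmesg" >&2
#   fi
```

Matching on the first field rather than on any substring avoids a false positive from an unrelated line that merely contains the word nvidia.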

The NVIDIA System Management Interface nvidia-smi also allows GPU monitoring using the following command:

nvidia-smi -l


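This command switch adds a loop, automatically refreshing the display. If you do not specify an interval, the display refreshes every 5 seconds; to set a different interval, append the number of seconds, for example nvidia-smi -l 1.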

To uninstall NVIDIA AI Enterprise Host Software:

  1. Run esxcli to determine the name of the vGPU driver bundle.

    esxcli software vib list | grep -i nvidia
    NVIDIA-AIE_ESXi_7.0.2_Driver_470.63-1OEM.702.0.0.17630552  NVIDIA  VMwareAccepted  2022-01-19


  2. Run the following command to uninstall the driver package:

    esxcli software vib remove -n NVIDIA-AIE_ESXi_7.0.2_Driver_470.63-1OEM.702.0.0.17630552 --maintenance-mode


The following message displays if the uninstall process is successful:

Removal Result
   Message: Operation finished successfully.
   Reboot Required: false
   VIBs Installed:
   VIBs Removed: NVIDIA-AIE_ESXi_7.0.2_Driver_470.63-1OEM.702.0.0.17630552
   VIBs Skipped:


Reboot the host to complete the uninstall of the NVIDIA AI Enterprise Host Software.
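To avoid copying the driver bundle name by hand between the two uninstall steps, the name can be taken from the first column of the listing. This is a minimal sketch under the assumption that the first whitespace-separated field of each matching line is the value to pass to esxcli software vib remove -n; the function name nvidia_vib_names is hypothetical.

```shell
# Hypothetical helper: read "esxcli software vib list" output on stdin and
# print the first column of every line that mentions NVIDIA (any letter case).
nvidia_vib_names() {
    awk 'tolower($0) ~ /nvidia/ { print $1 }'
}

# Example use on the host (review the names before removing anything):
#   esxcli software vib list | nvidia_vib_names
```

Printing the names for review, rather than piping them straight into the remove command, keeps the destructive step under manual control.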

The NVIDIA AI Enterprise Host Software (VIB) for VMware vSphere provides Virtual Shared Graphics Acceleration (vSGA) and vGPU functionality in a single VIB. After this VIB is installed, the default graphics type is Shared, which provides vSGA functionality. To enable vGPU support for VMs in VMware vSphere, you must change the default graphics type to Shared Direct. If you do not modify the default graphics type, VMs to which a vGPU is assigned fail to start, and the following error message is displayed:

The amount of graphics resources available in the parent resource pool is insufficient for the operation.

Change the default graphics type before configuring vGPU. Output from the VM console in the VMware vSphere Web Client is not available for VMs that are running vGPU. Before changing the default graphics type, ensure that the ESXi host is running and that all VMs on the host are powered off.

  1. Log in to vCenter Server by using the vSphere Web Client.

  2. In the navigation tree, select your ESXi host and click the Configure tab.

  3. From the menu, choose Graphics and then click the Host Graphics tab.

  4. On the Host Graphics tab, click Edit.

    dg-vgpu-05.png


  5. In the Edit Host Graphics Settings dialog box that opens, select Shared Direct and click OK.

    dg-vgpu-06.png

    Note

    This dialog box also lets you change the allocation scheme for vGPU-enabled VMs. For more information, see Modifying GPU Allocation Policy on VMware vSphere.


  6. After you click OK, the default graphics type changes to Shared Direct.

  7. Either restart the ESXi host, or stop and restart the Xorg service and nv-hostengine on the ESXi host. To stop and restart the Xorg service and nv-hostengine, perform these steps:

    • Stop the Xorg service.

      [root@esxi:~] /etc/init.d/xorg stop


    • Stop nv-hostengine.

      [root@esxi:~] nv-hostengine -t


    • Wait for 1 second to allow nv-hostengine to stop.

    • Start nv-hostengine.

      [root@esxi:~] nv-hostengine -d


    • Start the Xorg service.

      [root@esxi:~] /etc/init.d/xorg start
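The stop-and-restart sequence above can be collected into a single function so that the order and the 1-second pause are not missed. The sketch below uses only the commands listed in the steps; the names run, restart_graphics_services, and the DRY_RUN switch are hypothetical additions for illustration.

```shell
# Hypothetical wrapper: with DRY_RUN=1, print each command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop Xorg and nv-hostengine, pause, then start them again in reverse order.
# The chain stops at the first command that fails.
restart_graphics_services() {
    run /etc/init.d/xorg stop &&
    run nv-hostengine -t &&
    run sleep 1 &&            # allow nv-hostengine to stop
    run nv-hostengine -d &&
    run /etc/init.d/xorg start
}

# Example: preview the sequence without changing anything on the host.
#   DRY_RUN=1; restart_graphics_services
```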


After changing the default graphics type, configure vGPU as described in Configuring a vSphere VM with Virtual GPU.

See also the following topics in VMware vSphere documentation:

© Copyright 2024, NVIDIA. Last updated on Apr 2, 2024.