To enhance the AI capabilities of your virtual machines (VMs) on a RHEL host, you can create multiple vGPU profiles from a physical GPU and assign these mediated devices to multiple guests. This is supported on select NVIDIA GPUs with RHEL KVM virtualization, and each mediated device can be assigned to only one guest at a time.
NVIDIA AI Enterprise supports up to a maximum of 16 vGPUs per VM on Red Hat Enterprise Linux with KVM.
Red Hat Enterprise Linux guest OS support is limited to running containers by using Docker without Kubernetes. NVIDIA AI Enterprise features that depend on Kubernetes, for example, the use of GPU Operator, are not supported on Red Hat Enterprise Linux.
NVIDIA vGPU technology makes it possible to divide a physical NVIDIA GPU device into multiple virtual devices. These mediated devices can then be assigned to multiple VMs as virtual GPUs. As a result, these VMs can share the performance of a single physical GPU.
Assigning a physical GPU to VMs, with or without using vGPU devices, makes it impossible for the host to use the GPU.
To set up the NVIDIA vGPU feature, download the NVIDIA vGPU drivers for your GPU device, create mediated devices, and assign them to the intended virtual machines. For detailed instructions, see below.
For a list of GPUs supported for NVIDIA vGPU with RHEL KVM, please refer here.
If you do not know which GPU your host is using, install the lshw package and use the lshw -C display command. The following example shows that the system is using an NVIDIA A100 GPU, which is compatible with vGPU.
# lshw -C display
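An illustrative, truncated output is shown below; the product string and bus information are examples and will vary by system:
  *-display
       description: 3D controller
       product: GA100 [A100 PCIe 40GB]
       vendor: NVIDIA Corporation
       bus info: pci@0000:65:00.0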

Now that we have verified the presence of an NVIDIA GPU on the host, we will install the NVIDIA AI Enterprise Guest driver within the VM to fully enable GPU operation.
NVIDIA Driver
The NVIDIA driver is the software driver that is installed on the OS and is responsible for communicating with the NVIDIA GPU.
NVIDIA AI Enterprise drivers are available from the NVIDIA Enterprise Licensing Portal, from the NVIDIA Download Drivers web page, or by pulling them from the NGC Catalog. Please review the NVIDIA AI Enterprise Quick Start Guide for more details regarding licensing entitlement certificates.
Installing the NVIDIA Driver using CLS
This section will cover the steps required to properly install, configure, and license the NVIDIA driver for CLS users.
Now that you have installed Linux, installing the NVIDIA AI Enterprise driver will fully enable GPU operation. Before proceeding with the NVIDIA driver installation, please confirm that Nouveau is disabled. Instructions to confirm this are located in the RHEL section.
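As a quick, optional check in addition to those instructions, the following command should produce no output when Nouveau is disabled:
$ lsmod | grep nouveau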
Downloading the NVIDIA AI Enterprise Software Driver Using NGC
Before you begin, you will need to generate an API key or use an existing one.
From a browser, go to https://ngc.nvidia.com/signin/email and then enter your email and password.
In the top right corner, click your user account icon and select Setup.
Click Get API Key to open the Setup > API Key page.
Note: The API Key is the mechanism used to authenticate your access to the NGC container registry.
Click Generate API Key to generate your API key.
Note: A warning message appears to let you know that your old API key will become invalid if you create a new key.
Click Confirm to generate the key.
Your API key appears.
Important: You only need to generate an API Key once. NGC does not save your key, so store it in a secure place. (You can copy your API Key to the clipboard by clicking the copy icon to the right of the API key.) Should you lose your API Key, you can generate a new one from the NGC website. When you generate a new API Key, the old one is invalidated.
Run the following commands to install the NGC CLI for AMD64.
AMD64 Linux Install: The NGC CLI binary for Linux is supported on Ubuntu 16.04 and later distributions.
Download, unzip, and install from the command line by moving to a directory where you have execute permissions and then running the following command:
wget --content-disposition https://ngc.nvidia.com/downloads/ngccli_linux.zip && unzip ngccli_linux.zip && chmod u+x ngc-cli/ngc
Note: The NGC CLI installers for Windows, Arm64 macOS, and Intel macOS can be found here.
Check the binary’s MD5 hash to ensure the file wasn’t corrupted during download.
$ md5sum -c ngc.md5
Add your current directory to your PATH.
$ echo "export PATH=\"\$PATH:$(pwd)\"" >> ~/.bash_profile && source ~/.bash_profile
You must configure the NGC CLI for your use so that you can run the commands. Enter the following command and provide your API key when prompted.
$ ngc config set
Enter API key [no-apikey]. Choices: [<VALID_APIKEY>, 'no-apikey']:
Enter CLI output format type [ascii]. Choices: [ascii, csv, json]: ascii
Enter org [no-org]. Choices: ['no-org']:
Enter team [no-team]. Choices: ['no-team']:
Enter ace [no-ace]. Choices: ['no-ace']:
Successfully saved NGC configuration to /home/$username/.ngc/config
Download the NVIDIA AI Enterprise Software Driver.
Installing the NVIDIA Driver using the .run file with RHEL
Before starting the driver installation, Secure Boot must be disabled, as shown in the Installing Red Hat Enterprise Linux 8.4 section.
Register the machine with RHEL using the subscription-manager command below.
$ subscription-manager register
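If needed, you can verify that the registration succeeded with:
$ subscription-manager status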
Satisfy the external EPEL dependency required for DKMS.
$ dnf install https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm
For RHEL 8, ensure that the system has the correct Linux kernel sources from the Red Hat repositories.
$ dnf install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r)
Note: The NVIDIA driver requires that the kernel headers and development packages for the running version of the kernel be installed at the time of the driver installation, as well as whenever the driver is rebuilt. For example, if your system is running kernel version 4.4.0, the 4.4.0 kernel headers and development packages must also be installed.
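To verify that the development packages match the running kernel, you can compare the kernel version with the installed package; for example:
$ uname -r
$ rpm -q kernel-devel-$(uname -r)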
Install additional dependencies for NVIDIA drivers.
$ dnf install elfutils-libelf-devel.x86_64
$ dnf install -y tar bzip2 make automake gcc gcc-c++ pciutils libglvnd-devel
Update the running kernel:
$ dnf install -y kernel kernel-core kernel-modules
Confirm the system has the correct Linux kernel sources from the Red Hat repositories after update.
$ dnf install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r)
Download the NVIDIA AI Enterprise Software Driver.
$ ngc registry resource download-version "nvaie/vgpu_guest_driver_x_x:xxx.xx.xx"
Note: Where x_x:xxx.xx.xx is the current driver version from the NGC Enterprise Catalog.
Navigate to the directory containing the NVIDIA Driver .run file. Then, add the Executable permission to the NVIDIA Driver file using the chmod command.
$ sudo chmod +x NVIDIA-Linux-x86_64-xxx.xx.xx-grid.run
Note: Where xxx.xx.xx is the current driver version from the NGC Enterprise Catalog.
From the console shell, run the driver installer and accept defaults.
$ sudo sh ./NVIDIA-Linux-x86_64-xxx.xx.xx-grid.run
Note: Where xxx.xx.xx is the current driver version from the NGC Enterprise Catalog.
Note: Accept any warnings and ignore the CC version check.
Reboot the system.
$ sudo reboot
After the system has rebooted, confirm that you can see your NVIDIA vGPU device in the output from nvidia-smi.
$ nvidia-smi
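For a compact check, a query such as the following also confirms that the vGPU is visible; the output shown is illustrative, and the profile name, driver version, and memory size will vary on your system:
$ nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv
name, driver_version, memory.total [MiB]
GRID T4-16C, xxx.xx.xx, 16384 MiB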
After installing the NVIDIA vGPU compute driver, you can license any NVIDIA AI Enterprise Software licensed products you are using.
Once completed, check that the kernel has loaded the nvidia_vgpu_vfio module and that the nvidia-vgpu-mgr.service service is running.
# lsmod | grep nvidia_vgpu_vfio
nvidia_vgpu_vfio 45011 0
nvidia 14333621 10 nvidia_vgpu_vfio
mdev 20414 2 vfio_mdev,nvidia_vgpu_vfio
vfio 32695 3 vfio_mdev,nvidia_vgpu_vfio,vfio_iommu_type1
# systemctl status nvidia-vgpu-mgr.service
nvidia-vgpu-mgr.service - NVIDIA vGPU Manager Daemon
Loaded: loaded (/usr/lib/systemd/system/nvidia-vgpu-mgr.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2018-03-16 10:17:36 CET; 5h 8min ago
Main PID: 1553 (nvidia-vgpu-mgr)
[...]
If creating a vGPU based on an NVIDIA Ampere GPU device, ensure that virtual functions are enabled for the physical GPU. For instructions, please refer to Creating an NVIDIA vGPU that supports SR-IOV on Linux with KVM Hypervisor.
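As a hedged example of that procedure, the virtual functions are typically enabled with the sriov-manage script installed by the vGPU Manager, passing the GPU's PCI address (or ALL for every supported GPU in the host):
# /usr/lib/nvidia/sriov-manage -e 0000:65:00.0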
Generate a device UUID.
# uuidgen
Example result.
30820a6f-b1a5-4503-91ca-0c10ba58692a
Prepare an XML file with a configuration of the mediated device, based on the detected GPU hardware. For example, the following configures a mediated device of the nvidia-321 vGPU type on an NVIDIA T4 card that runs on the 0000:65:00.0 PCI bus and uses the UUID generated in the previous step.
<device>
<parent>pci_0000_65_00_0</parent>
<capability type="mdev">
<type id="nvidia-321"/>
<uuid>30820a6f-b1a5-4503-91ca-0c10ba58692a</uuid>
</capability>
</device>

To find your vGPU profile name and description, navigate to mdev_supported_types and list the description and name. An example is shown below using the NVIDIA T4 with profile name nvidia-321, which corresponds to the T4-16C NVIDIA vGPU profile.
cd /sys/bus/pci/devices/0000:65:00.0/mdev_supported_types/nvidia-321
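For example, the name and description files in that directory report the profile name and its characteristics (output is illustrative and will vary by GPU and driver version):
# cat /sys/bus/pci/devices/0000:65:00.0/mdev_supported_types/nvidia-321/name
GRID T4-16C
# cat /sys/bus/pci/devices/0000:65:00.0/mdev_supported_types/nvidia-321/description
num_heads=1, frl_config=60, framebuffer=16384M, max_resolution=4096x2160, max_instance=1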

For more information on how to locate the correct profile name for various NVIDIA vGPU profiles, please refer here.
Define a vGPU mediated device based on the XML file you prepared. For example:
# virsh nodedev-define vgpu.xml
Verify that the mediated device is listed as inactive.
# virsh nodedev-list --cap mdev --inactive
mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
Start the vGPU mediated device you created.
# virsh nodedev-start mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
Device mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0 started
Ensure that the mediated device is listed as active.
# virsh nodedev-list --cap mdev
mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
Set the vGPU device to start automatically after the host reboots.
# virsh nodedev-autostart mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
Device mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0 marked as autostarted
Attach the mediated device to a VM with which you want to share the vGPU resources. To do so, add the following lines, along with the previously generated UUID, to the <devices/> section in the XML configuration of the VM.
First, navigate to the directory where the VM XML configurations are located:
cd /etc/libvirt/qemu/
Then open the VM's XML configuration with nano, locate the <devices/> section, and add the following:
<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='off'>
<source>
<address uuid='30820a6f-b1a5-4503-91ca-0c10ba58692a'/>
</source>
</hostdev>

Each UUID can only be assigned to one VM at a time. In addition, if the VM does not have QEMU video devices, such as virtio-vga, also add the ramfb='on' parameter on the <hostdev> line.
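For such a VM, the <hostdev> entry might look like the following sketch (the display and ramfb values are examples; keep the UUID of your own device):
<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='on' ramfb='on'>
<source>
<address uuid='30820a6f-b1a5-4503-91ca-0c10ba58692a'/>
</source>
</hostdev>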
Now we will verify the capabilities of the vGPU created, and ensure it is listed as active and persistent.
# virsh nodedev-info mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
Name: mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
Parent: pci_0000_01_00_0
Active: yes
Persistent: yes
Autostart: yes
Start the VM and verify that the guest operating system detects the mediated device as an NVIDIA GPU. For example:
# lspci -d 10de: -k
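Illustrative output for a T4-based vGPU is shown below; the device name, PCI address, and loaded modules will differ on your system:
06:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
        Kernel driver in use: nvidia
        Kernel modules: nvidia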

After installing the NVIDIA vGPU compute driver, you can license any NVIDIA AI Enterprise Software licensed products you are using.
For additional information on how to manage your NVIDIA vGPU within the KVM hypervisor, please refer to the NVIDIA vGPU software documentation, as well as the man virsh command for managing guests with virsh.
Please refer to the NVIDIA vGPU release notes for instructions on how to change the vGPU scheduling policy for time-sliced vGPUs.
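As one hedged example of that procedure on a Linux KVM host, the scheduling policy is typically selected through the RmPVMRL registry key of the vGPU Manager module (for instance, 0x00 for best effort, 0x01 for equal share, 0x11 for fixed share) and takes effect after the driver is reloaded or the host is rebooted:
# echo 'options nvidia NVreg_RegistryDwords="RmPVMRL=0x01"' > /etc/modprobe.d/nvidia.conf
# reboot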
To use an NVIDIA vGPU software licensed product, each client system to which a physical or virtual GPU is assigned must be able to obtain a license from the NVIDIA License System. A client system can be a VM that is configured with NVIDIA vGPU, a VM that is configured for GPU pass through, or a physical host to which a physical GPU is assigned in a bare-metal deployment.
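As a minimal sketch of the typical CLS client configuration, assuming a client configuration token has already been generated from the NVIDIA Licensing Portal (the exact token filename and any gridd.conf settings depend on your deployment), the token is copied into /etc/nvidia/ClientConfigToken on the client, the nvidia-gridd service is restarted, and the license status then appears in nvidia-smi:
$ sudo cp client_configuration_token_*.tok /etc/nvidia/ClientConfigToken/
$ sudo systemctl restart nvidia-gridd
$ nvidia-smi -q | grep -i license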