Virtual GPU Software User Guide
Documentation for administrators that explains how to install and configure NVIDIA Virtual GPU Manager, configure virtual GPU software in pass-through mode, and install drivers on guest operating systems.
NVIDIA vGPU software is a graphics virtualization platform that provides virtual machines (VMs) access to NVIDIA GPU technology.
1.1. How NVIDIA vGPU Software Is Used
NVIDIA vGPU software can be used in several ways.
1.1.1. NVIDIA vGPU
NVIDIA Virtual GPU (vGPU) enables multiple virtual machines (VMs) to have simultaneous, direct access to a single physical GPU, using the same NVIDIA graphics drivers that are deployed on non-virtualized operating systems. By doing this, NVIDIA vGPU provides VMs with unparalleled graphics performance, compute performance, and application compatibility, together with the cost-effectiveness and scalability brought about by sharing a GPU among multiple workloads.
For more information, see Installing and Configuring NVIDIA Virtual GPU Manager.
1.1.2. GPU Pass-Through
In GPU pass-through mode, an entire physical GPU is directly assigned to one VM, bypassing the NVIDIA Virtual GPU Manager. In this mode of operation, the GPU is accessed exclusively by the NVIDIA driver running in the VM to which it is assigned. The GPU is not shared among VMs.
For more information, see Using GPU Pass-Through.
1.1.3. Bare-Metal Deployment
In a bare-metal deployment, you can use NVIDIA vGPU software graphics drivers with vWS and vApps licenses to deliver remote virtual desktops and applications. If you intend to use Tesla boards without a hypervisor for this purpose, use NVIDIA vGPU software graphics drivers, not other NVIDIA drivers.
To use NVIDIA vGPU software drivers for a bare-metal deployment, complete these tasks:
- Install the driver on the physical host.
For instructions, see Installing the NVIDIA vGPU Software Graphics Driver.
- License any NVIDIA vGPU software that you are using.
For instructions, see Virtual GPU Client Licensing User Guide.
- Configure the platform for remote access.
To use graphics features with Tesla GPUs, you must use a supported remoting solution, for example, RemoteFX, Citrix Virtual Apps and Desktops, VNC, or similar technology.
- Use the display settings feature of the host OS to configure the Tesla GPU as the primary display.
NVIDIA Tesla generally operates as a secondary device on bare-metal platforms.
- If the system has multiple display adapters, disable display devices connected through adapters that are not from NVIDIA.
You can use the display settings feature of the host OS or the remoting solution for this purpose. On NVIDIA GPUs, including Tesla GPUs, a default display device is enabled.
Users can launch applications that require NVIDIA GPU technology for enhanced user experience only after displays that are driven by NVIDIA adapters are enabled.
1.2. Primary Display Adapter Requirements for NVIDIA vGPU Software Deployments
The GPU that is set as the primary display adapter cannot be used for NVIDIA vGPU deployments or GPU pass-through deployments. The primary display adapter is the boot display of the hypervisor host, which displays SBIOS console messages and then the boot of the OS or hypervisor.
Any GPU that is being used for NVIDIA vGPU deployments or GPU pass-through deployments must be set as a secondary display adapter.
XenServer provides a specific setting to allow the primary display adapter to be used for GPU pass-through deployments.
Only the following GPUs are supported as the primary display adapter:
- Tesla M6
- Quadro RTX 6000
- Quadro RTX 8000
All other GPUs that support NVIDIA vGPU software cannot function as the primary display adapter because they are 3D controllers, not VGA devices.
If the hypervisor host does not have an extra graphics adapter, consider installing a low-end display adapter to be used as the primary display adapter. If necessary, ensure that the primary display adapter is set correctly in the BIOS options of the hypervisor host.
1.3. NVIDIA vGPU Software Features
NVIDIA vGPU software includes vWS, vPC, and vApps.
1.3.1. API Support on NVIDIA vGPU
NVIDIA vGPU includes support for the following APIs:
- Open Computing Language (OpenCL™ software) 3.0
- OpenGL® 4.6
- Vulkan® 1.3
- DirectX 11
- DirectX 12 (Windows 10)
- Direct2D
- DirectX Video Acceleration (DXVA)
- NVIDIA® CUDA® 12.4
- NVIDIA vGPU software SDK (remote graphics acceleration)
- NVIDIA RTX (on GPUs based on the NVIDIA Volta graphic architecture and later architectures)
These APIs are backwards compatible, so older versions of each API are also supported.
1.3.2. NVIDIA CUDA Toolkit and OpenCL Support on NVIDIA vGPU Software
NVIDIA CUDA Toolkit and OpenCL are supported with NVIDIA vGPU only on a subset of vGPU types and supported GPUs.
For more information about NVIDIA CUDA Toolkit, see CUDA Toolkit Documentation 12.4.
If you are using NVIDIA vGPU software with CUDA on Linux, avoid conflicting installation methods by installing CUDA from a distribution-independent runfile package. Do not install CUDA from a distribution-specific RPM or Deb package.
To ensure that the NVIDIA vGPU software graphics driver is not overwritten when CUDA is installed, deselect the CUDA driver when selecting the CUDA components to install.
For more information, see NVIDIA CUDA Installation Guide for Linux.
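The runfile-based installation described above can be sketched as follows. This is a dry run that only prints the command; the runfile name is an example for CUDA 12.4 (use the file you downloaded), and the `--toolkit` flag is the runfile option that installs the toolkit components without the bundled driver:

```shell
# Dry run: print the toolkit-only install command rather than executing it.
# The runfile name is an example; substitute the file you downloaded.
installer="cuda_12.4.0_550.54.14_linux.run"
# --toolkit installs only the CUDA toolkit, leaving the existing NVIDIA vGPU
# software graphics driver untouched; --silent skips the interactive menu.
cmd="sudo sh $installer --silent --toolkit"
echo "$cmd"
```

Remove the `echo` indirection to perform the actual installation on a host where the runfile is present.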
OpenCL and CUDA Application Support
OpenCL and CUDA applications are supported on the following NVIDIA vGPU types:
- The 8Q vGPU type on the Tesla M10 GPU
- All Q-series vGPU types on the following GPUs:
- NVIDIA L2
- NVIDIA L4
- NVIDIA L20
- NVIDIA L40
- NVIDIA L40S
- NVIDIA RTX 5000 Ada
- NVIDIA RTX 6000 Ada
- NVIDIA A2
- NVIDIA A10
- NVIDIA A16
- NVIDIA A40
- NVIDIA RTX A5000
- NVIDIA RTX A5500
- NVIDIA RTX A6000
- Tesla V100 SXM2
- Tesla V100 SXM2 32GB
- Tesla V100 PCIe
- Tesla V100 PCIe 32GB
- Tesla V100S PCIe 32GB
- Tesla V100 FHHL
- Tesla T4
- Quadro RTX 6000
- Quadro RTX 6000 passive
- Quadro RTX 8000
- Quadro RTX 8000 passive
NVIDIA CUDA Toolkit Development Tool Support
NVIDIA vGPU supports the following NVIDIA CUDA Toolkit development tools on some GPUs:
- Debuggers:
- CUDA-GDB
- Compute Sanitizer
- Profilers:
- The Activity, Callback, and Profiling APIs of the CUDA Profiling Tools Interface (CUPTI)
Other CUPTI APIs, such as the Event and Metric APIs, are not supported.
- NVIDIA Nsight™ Compute
- NVIDIA Nsight Systems
- NVIDIA Nsight plugin
- NVIDIA Nsight Visual Studio plugin
Other CUDA profilers, such as nvprof and NVIDIA Visual Profiler, are not supported.
These tools are supported only in Linux guest VMs.
NVIDIA CUDA Toolkit profilers are supported and can be enabled on a VM for which unified memory is enabled.
By default, NVIDIA CUDA Toolkit development tools are disabled on NVIDIA vGPU. To use them, you must enable NVIDIA CUDA Toolkit development tools individually for each VM that requires them by setting vGPU plugin parameters. For instructions, see Enabling NVIDIA CUDA Toolkit Development Tools for NVIDIA vGPU.
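The exact plugin-parameter mechanism varies by hypervisor. As an illustrative sketch for a Linux KVM host (the UUID is a placeholder, and the sysfs path follows the documented mdev layout; confirm both against the instructions for your hypervisor and release):

```shell
# Parameter string that enables the CUDA debuggers and profilers for one vGPU.
params="enable_debugging=1, enable_profiling=1"
# Placeholder mdev device UUID; substitute the UUID of your vGPU.
uuid="aa618089-8b16-4d01-a136-25a0f3c73123"
# On a real KVM host, the string is written to the vGPU's vgpu_params file:
#   echo "$params" > /sys/bus/mdev/devices/$uuid/nvidia/vgpu_params
echo "$params"
```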
The following table lists the GPUs on which NVIDIA vGPU supports these debuggers and profilers.
GPU | vGPU Mode | Debuggers | Profilers |
---|---|---|---|
NVIDIA L2 | Time-sliced | ✓ | ✓ |
NVIDIA L4 | Time-sliced | ✓ | ✓ |
NVIDIA L20 | Time-sliced | ✓ | ✓ |
NVIDIA L40 | Time-sliced | ✓ | ✓ |
NVIDIA L40S | Time-sliced | ✓ | ✓ |
NVIDIA RTX 5000 Ada | Time-sliced | ✓ | ✓ |
NVIDIA RTX 6000 Ada | Time-sliced | ✓ | ✓ |
NVIDIA A2 | Time-sliced | ✓ | ✓ |
NVIDIA A10 | Time-sliced | ✓ | ✓ |
NVIDIA A16 | Time-sliced | ✓ | ✓ |
NVIDIA A40 | Time-sliced | ✓ | ✓ |
NVIDIA RTX A5000 | Time-sliced | ✓ | ✓ |
NVIDIA RTX A5500 | Time-sliced | ✓ | ✓ |
NVIDIA RTX A6000 | Time-sliced | ✓ | ✓ |
Tesla T4 | Time-sliced | ✓ | ✓ |
Quadro RTX 6000 | Time-sliced | ✓ | ✓ |
Quadro RTX 6000 passive | Time-sliced | ✓ | ✓ |
Quadro RTX 8000 | Time-sliced | ✓ | ✓ |
Quadro RTX 8000 passive | Time-sliced | ✓ | ✓ |
Tesla V100 SXM2 | Time-sliced | ✓ | ✓ |
Tesla V100 SXM2 32GB | Time-sliced | ✓ | ✓ |
Tesla V100 PCIe | Time-sliced | ✓ | ✓ |
Tesla V100 PCIe 32GB | Time-sliced | ✓ | ✓ |
Tesla V100S PCIe 32GB | Time-sliced | ✓ | ✓ |
Tesla V100 FHHL | Time-sliced | ✓ | ✓ |
✓ Feature is supported
- Feature is not supported
Supported NVIDIA CUDA Toolkit Features
NVIDIA vGPU supports the following NVIDIA CUDA Toolkit features if the vGPU type, physical GPU, and the hypervisor software version support the feature:
- Error-correcting code (ECC) memory
- Peer-to-peer CUDA transfers over NVLink
Note:
To determine the NVLink topology between physical GPUs in a host or vGPUs assigned to a VM, run the following command from the host or VM:
$ nvidia-smi topo -m
- Unified Memory
Note:
Unified memory is disabled by default. To use it, you must enable unified memory individually for each vGPU that requires it by setting a vGPU plugin parameter. For instructions, see Enabling Unified Memory for a vGPU.
- NVIDIA Nsight Systems GPU context switch trace
Dynamic page retirement is supported for all vGPU types on physical GPUs that support ECC memory, even if ECC memory is disabled on the physical GPU.
NVIDIA CUDA Toolkit Features Not Supported by NVIDIA vGPU
NVIDIA vGPU does not support the NVIDIA Nsight Graphics feature of NVIDIA CUDA Toolkit.
The NVIDIA Nsight Graphics feature is supported in GPU pass-through mode and in bare-metal deployments.
1.3.3. Additional vWS Features
In addition to the features of vPC and vApps, vWS provides the following features:
- Workstation-specific graphics features and accelerations
- Certified drivers for professional applications
- GPU pass through for workstation or professional 3D graphics
In pass-through mode, vWS supports multiple virtual display heads at resolutions up to 8K and flexible virtual display resolutions based on the number of available pixels. For details, see Display Resolutions for Physical GPUs.
- 10-bit color for Windows users. (HDR/10-bit color is not currently supported on Linux; NvFBC capture is supported but deprecated.)
1.3.4. NVIDIA GPU Cloud (NGC) Containers Support on NVIDIA vGPU Software
NVIDIA vGPU software supports NGC containers in NVIDIA vGPU and GPU pass-through deployments on all supported hypervisors.
In NVIDIA vGPU deployments, Q-series vGPU types are supported only on GPUs based on NVIDIA GPU architectures after the Maxwell architecture.
In GPU pass-through deployments, all GPUs based on NVIDIA GPU architectures after the NVIDIA Maxwell™ architecture that support NVIDIA vGPU software are supported.
NVIDIA vGPU software supports NGC containers on any guest operating system listed in Supported Platforms - NVIDIA Container Toolkit that is also supported by NVIDIA vGPU software.
For more information about setting up NVIDIA vGPU software for use with NGC containers, see Using NGC with NVIDIA Virtual GPU Software Setup Guide.
1.3.5. NVIDIA GPU Operator Support
NVIDIA GPU Operator simplifies the deployment of NVIDIA vGPU software on software container platforms that are managed by the Kubernetes container orchestration engine. It automates the installation and update of NVIDIA vGPU software graphics drivers for container platforms running in guest VMs that are configured with NVIDIA vGPU.
Any drivers to be installed by NVIDIA GPU Operator must be downloaded from the NVIDIA Licensing Portal to a local computer. Automated access to the NVIDIA Licensing Portal by NVIDIA GPU Operator is not supported.
NVIDIA GPU Operator supports automated configuration of NVIDIA vGPU software and provides telemetry support through DCGM Exporter running in a guest VM.
NVIDIA GPU Operator is supported only on specific combinations of hypervisor software release, container platform, vGPU type, and guest OS release. To determine if your configuration supports NVIDIA GPU Operator with NVIDIA vGPU deployments, consult the release notes for your chosen hypervisor at NVIDIA Virtual GPU Software Documentation.
For more information, see NVIDIA GPU Operator Overview on the NVIDIA documentation portal.
1.4. How this Guide Is Organized
Virtual GPU Software User Guide is organized as follows:
- This chapter introduces the capabilities and features of NVIDIA vGPU software.
- Installing and Configuring NVIDIA Virtual GPU Manager provides a step-by-step guide to installing and configuring vGPU on supported hypervisors.
- Using GPU Pass-Through explains how to configure a GPU for pass-through on supported hypervisors.
- Installing the NVIDIA vGPU Software Graphics Driver explains how to install NVIDIA vGPU software graphics driver on Windows and Linux operating systems.
- Licensing an NVIDIA vGPU explains how to license NVIDIA vGPU licensed products on Windows and Linux operating systems.
- Modifying a VM's NVIDIA vGPU Configuration explains how to remove a VM’s vGPU configuration and modify GPU assignments for vGPU-enabled VMs.
- Monitoring GPU Performance covers performance monitoring of physical GPUs and virtual GPUs from the hypervisor and from within individual guest VMs.
- Changing Scheduling Behavior for Time-Sliced vGPUs describes the scheduling behavior of NVIDIA vGPUs and how to change it.
- Troubleshooting provides guidance on troubleshooting.
- Virtual GPU Types Reference provides details of each vGPU available from each supported GPU and provides examples of mixed virtual display configurations for B-series and Q-series vGPUs.
- Configuring x11vnc for Checking the GPU in a Linux Server explains how to use x11vnc to confirm that the NVIDIA GPU in a Linux server to which no display devices are directly connected is working as expected.
- Disabling NVIDIA Notification Icon for Citrix Published Application User Sessions explains how to ensure that the NVIDIA Notification Icon application does not prevent the Citrix Published Application user session from being logged off even after the user has quit all other applications.
- XenServer Basics explains how to perform basic operations on XenServer to install and configure NVIDIA vGPU software and optimize XenServer operation with vGPU.
- XenServer vGPU Management covers vGPU management on XenServer.
- XenServer Performance Tuning covers vGPU performance optimization on XenServer.
The process for installing and configuring NVIDIA Virtual GPU Manager depends on the hypervisor that you are using. After you complete this process, you can install the display drivers for your guest OS and license any NVIDIA vGPU software licensed products that you are using.
2.1. About NVIDIA Virtual GPUs
2.1.1. NVIDIA vGPU Architecture
The high-level architecture of NVIDIA vGPU is illustrated in Figure 1. Under the control of the NVIDIA Virtual GPU Manager running under the hypervisor, NVIDIA physical GPUs are capable of supporting multiple virtual GPU devices (vGPUs) that can be assigned directly to guest VMs.
Guest VMs use NVIDIA vGPUs in the same manner as a physical GPU that has been passed through by the hypervisor: an NVIDIA driver loaded in the guest VM provides direct access to the GPU for performance-critical fast paths, and a paravirtualized interface to the NVIDIA Virtual GPU Manager is used for non-performance-critical management operations.
Figure 1. NVIDIA vGPU System Architecture
Each NVIDIA vGPU is analogous to a conventional GPU, having a fixed amount of GPU framebuffer, and one or more virtual display outputs or "heads". The vGPU’s framebuffer is allocated out of the physical GPU’s framebuffer at the time the vGPU is created, and the vGPU retains exclusive use of that framebuffer until it is destroyed. Depending on the physical GPU and the GPU virtualization software, NVIDIA Virtual GPU Manager supports different types of vGPU on a physical GPU:
- On all GPUs that support NVIDIA vGPU software, time-sliced vGPUs can be created.
- Additionally, on GPUs that support the Multi-Instance GPU (MIG) feature and NVIDIA AI Enterprise, MIG-backed vGPUs are supported. The MIG feature is introduced on GPUs that are based on the NVIDIA Ampere GPU architecture.
Note:
Although earlier releases of NVIDIA vGPU software supported GPUs that support the MIG feature, such GPUs are not supported on this release of NVIDIA vGPU software. GPUs that support the MIG feature are supported only on NVIDIA AI Enterprise.
2.1.1.1. Time-Sliced NVIDIA vGPU Internal Architecture
A time-sliced vGPU is a vGPU that resides on a physical GPU that is not partitioned into multiple GPU instances. All time-sliced vGPUs resident on a GPU share access to the GPU’s engines including the graphics (3D), video decode, and video encode engines.
In a time-sliced vGPU, processes that run on the vGPU are scheduled to run in series. Each vGPU waits while other processes run on other vGPUs. While processes are running on a vGPU, the vGPU has exclusive use of the GPU's engines. You can change the default scheduling behavior as explained in Changing Scheduling Behavior for Time-Sliced vGPUs.
Figure 2. Time-Sliced NVIDIA vGPU Internal Architecture
2.1.2. About Virtual GPU Types
The number of physical GPUs that a board has depends on the board. Each physical GPU can support several different types of virtual GPU (vGPU). vGPU types have a fixed amount of frame buffer, number of supported display heads, and maximum resolutions. They are grouped into different series according to the different classes of workload for which they are optimized. Each series is identified by the last letter of the vGPU type name.
Series | Optimal Workload |
---|---|
Q-series | Virtual workstations for creative and technical professionals who require the performance and features of Quadro technology |
B-series | Virtual desktops for business professionals and knowledge workers |
A-series | App streaming or session-based solutions for virtual applications users |
The number after the board type in the vGPU type name denotes the amount of frame buffer that is allocated to a vGPU of that type. For example, a vGPU of type A16-4Q is allocated 4096 Mbytes of frame buffer on an NVIDIA A16 board.
Due to their differing resource requirements, the maximum number of vGPUs that can be created simultaneously on a physical GPU varies according to the vGPU type. For example, an NVIDIA A16 board can support up to 4 A16-4Q vGPUs on each of its four physical GPUs, for a total of 16 vGPUs, but only 2 A16-8Q vGPUs on each physical GPU, for a total of 8 vGPUs. When enabled, the frame-rate limiter (FRL) limits the maximum frame rate in frames per second (FPS) for a vGPU as follows:
- For B-series vGPUs, the maximum frame rate is 45 FPS.
- For Q-series and A-series vGPUs, the maximum frame rate is 60 FPS.
By default, the FRL is enabled for all GPUs. The FRL is disabled when the vGPU scheduling behavior is changed from the default best-effort scheduler on GPUs that support alternative vGPU schedulers. For details, see Changing Scheduling Behavior for Time-Sliced vGPUs. On vGPUs that use the best-effort scheduler, the FRL can be disabled as explained in the release notes for your chosen hypervisor at NVIDIA Virtual GPU Software Documentation.
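The vGPU-count arithmetic earlier in this section follows directly from frame buffer sizes. A quick sanity check, assuming 16384 MB of frame buffer per A16 physical GPU and four physical GPUs per board:

```shell
# Derive per-GPU and per-board vGPU counts from frame buffer sizes (NVIDIA A16 figures).
fb_per_gpu=16384     # MB of frame buffer per physical GPU (assumed A16 figure)
gpus_per_board=4     # physical GPUs on an A16 board
for fb_per_vgpu in 4096 8192; do
  per_gpu=$((fb_per_gpu / fb_per_vgpu))          # vGPUs per physical GPU
  per_board=$((gpus_per_board * per_gpu))        # vGPUs per board
  echo "A16-$((fb_per_vgpu / 1024))Q: $per_gpu per GPU, $per_board per board"
done
```

The loop prints 4 per GPU (16 per board) for the 4096 MB type and 2 per GPU (8 per board) for the 8192 MB type, matching the A16 example above.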
NVIDIA vGPU is a licensed product on all supported GPU boards. A software license is required to enable all vGPU features within the guest VM. The type of license required depends on the vGPU type.
- Q-series vGPU types require a vWS license.
- B-series vGPU types require a vPC license but can also be used with a vWS license.
- A-series vGPU types require a vApps license.
For details of the virtual GPU types available from each supported GPU, see Virtual GPU Types for Supported GPUs.
2.1.3. Virtual Display Resolutions for Q-series and B-series vGPUs
Instead of a fixed maximum resolution per display, Q-series and B-series vGPUs support a maximum combined resolution based on the number of available pixels, which is determined by their frame buffer size. You can choose between using a small number of high resolution displays or a larger number of lower resolution displays with these vGPUs.
The number of virtual displays that you can use depends on a combination of the following factors:
- Virtual GPU series
- GPU architecture
- vGPU frame buffer size
- Display resolution
You cannot use more than the maximum number of displays that a vGPU supports even if the combined resolution of the displays is less than the number of available pixels from the vGPU. For example, because -0Q and -0B vGPUs support a maximum of only two displays, you cannot use four 1280×1024 displays with these vGPUs even though the combined resolution of the displays (5242880) is less than the number of available pixels from these vGPUs (8192000).
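The pixel arithmetic in the example can be reproduced directly:

```shell
# Combined resolution of four 1280x1024 displays versus the -0Q/-0B pixel budget.
displays=4; width=1280; height=1024
combined=$((displays * width * height))
available=8192000   # pixels available from a -0Q or -0B vGPU
echo "combined=$combined available=$available"
# The combined pixel count fits in the budget, but the two-display
# limit of -0Q and -0B vGPUs still rules this configuration out.
```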
Various factors affect the consumption of the GPU frame buffer, which can impact the user experience. These factors include, but are not limited to, the number of displays, display resolution, workload and applications deployed, remoting solution, and guest OS. The ability of a vGPU to drive a certain combination of displays does not guarantee that enough frame buffer remains free for all applications to run. If applications run out of frame buffer, consider changing your setup in one of the following ways:
- Switching to a vGPU type with more frame buffer
- Using fewer displays
- Using lower resolution displays
The maximum number of displays per vGPU listed in Virtual GPU Types for Supported GPUs is based on a configuration in which all displays have the same resolution. For examples of configurations with a mixture of display resolutions, see Mixed Display Configurations for B-Series and Q-Series vGPUs.
2.1.4. Valid Time-Sliced Virtual GPU Configurations on a Single GPU
NVIDIA vGPU software supports a mixture of different types of time-sliced vGPUs on the same physical GPU. Any combination of A-series, B-series, and Q-series vGPUs with any amount of frame buffer can reside on the same physical GPU simultaneously. The total amount of frame buffer allocated to the vGPUs on a physical GPU must not exceed the amount of frame buffer that the physical GPU has.
For example, the following combinations of vGPUs can reside on the same physical GPU simultaneously:
- A40-2B and A40-2Q
- A40-2Q and A40-4Q
- A40-2B and A40-4Q
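A necessary (though not sufficient) condition for such a combination is that the frame buffer totals fit on the physical GPU. A minimal check, assuming an NVIDIA A40 with 49152 MB (48 GB) of frame buffer:

```shell
# Verify that a mix of time-sliced vGPUs fits within the physical GPU's frame buffer.
gpu_fb=49152              # MB on an NVIDIA A40 (48 GB, assumed figure)
mix="2048 2048 4096"      # e.g. A40-2B + A40-2Q + A40-4Q, in MB
total=0
for fb in $mix; do
  total=$((total + fb))   # sum the frame buffer of each vGPU in the mix
done
if [ "$total" -le "$gpu_fb" ]; then
  echo "ok: ${total} MB of ${gpu_fb} MB allocated"
else
  echo "invalid: ${total} MB exceeds ${gpu_fb} MB"
fi
```

Remember that hosting differently sized vGPUs also requires the GPU to be in mixed-size mode, which can further reduce the maximum counts.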
By default, a GPU supports only vGPUs with the same amount of frame buffer and, therefore, is in equal-size mode. To support vGPUs with different amounts of frame buffer, the GPU must be put into mixed-size mode. When a GPU is in mixed-size mode, the maximum number of some types of vGPU allowed on a GPU is less than when the GPU is in equal-size mode.
Not all hypervisors and GPUs support a mixture of different types of time-sliced vGPUs on the same physical GPU. To determine if your chosen hypervisor supports this feature with your chosen GPU, consult the release notes for your hypervisor at NVIDIA Virtual GPU Software Documentation.
2.1.5. Guest VM Support
NVIDIA vGPU supports Windows and Linux guest VM operating systems. The supported vGPU types depend on the guest VM OS.
For details of the supported releases of Windows and Linux, and for further information on supported configurations, see the driver release notes for your hypervisor at NVIDIA Virtual GPU Software Documentation.
2.1.5.1. Windows Guest VM Support
Windows guest VMs are supported on all NVIDIA vGPU types, namely: Q-series, B-series, and A-series NVIDIA vGPU types.
2.1.5.2. Linux Guest VM support
Linux guest VMs are supported on all NVIDIA vGPU types, namely: Q-series, B-series, and A-series NVIDIA vGPU types.
2.2. Prerequisites for Using NVIDIA vGPU
Before proceeding, ensure that these prerequisites are met:
- You have a server platform that is capable of hosting your chosen hypervisor and NVIDIA GPUs that support NVIDIA vGPU software.
- One or more NVIDIA GPUs that support NVIDIA vGPU software are installed in your server platform.
- If you are using GPUs based on the NVIDIA Ampere architecture or later architectures, the following BIOS settings are enabled on your server platform:
- VT-d/IOMMU
- SR-IOV
- Alternative Routing ID Interpretation (ARI)
- You have downloaded the NVIDIA vGPU software package for your chosen hypervisor, which consists of the following software:
- NVIDIA Virtual GPU Manager for your hypervisor
- NVIDIA vGPU software graphics drivers for supported guest operating systems
- The following software is installed according to the instructions in the software vendor's documentation:
- Your chosen hypervisor, for example, XenServer, Red Hat Enterprise Linux KVM, or VMware vSphere Hypervisor (ESXi)
- The software for managing your chosen hypervisor, for example, Citrix XenCenter management GUI, or VMware vCenter Server
- The virtual desktop software that you will use with virtual machines (VMs) running NVIDIA Virtual GPU, for example, Citrix Virtual Apps and Desktops, or Omnissa Horizon
Note: If you are using VMware vSphere Hypervisor (ESXi), ensure that the ESXi host on which you will configure a VM with NVIDIA vGPU is not a member of a fully automated VMware Distributed Resource Scheduler (DRS) cluster. For more information, see Installing and Configuring the NVIDIA Virtual GPU Manager for VMware vSphere.
- A VM to be enabled with one or more virtual GPUs is created.
Note:
If the VM uses UEFI boot and you plan to install a Linux guest OS in the VM, ensure that secure boot is disabled.
- Your chosen guest OS is installed in the VM.
For information about supported hardware and software, and any known issues for this release of NVIDIA vGPU software, refer to the Release Notes for your chosen hypervisor:
- Virtual GPU Software for XenServer Release Notes
- Virtual GPU Software for Microsoft Azure Stack HCI Release Notes
- Virtual GPU Software for Red Hat Enterprise Linux with KVM Release Notes
- Virtual GPU Software for Ubuntu Release Notes
- Virtual GPU Software for VMware vSphere Release Notes
2.3. Switching the Mode of a GPU that Supports Multiple Display Modes
Some GPUs support display-off and display-enabled modes but must be used in NVIDIA vGPU software deployments in display-off mode.
The GPUs listed in the following table support multiple display modes. As shown in the table, some GPUs are supplied from the factory in display-off mode, but other GPUs are supplied in a display-enabled mode.
GPU | Mode as Supplied from the Factory |
---|---|
NVIDIA A40 | Display-off |
NVIDIA L40 | Display-off |
NVIDIA L40S | Display-off |
NVIDIA L20 | Display-off |
NVIDIA L20 liquid cooled | Display-off |
NVIDIA RTX 5000 Ada | Display-enabled |
NVIDIA RTX 6000 Ada | Display-enabled |
NVIDIA RTX A5000 | Display-enabled |
NVIDIA RTX A5500 | Display-enabled |
NVIDIA RTX A6000 | Display-enabled |
A GPU that is supplied from the factory in display-off mode, such as the NVIDIA A40 GPU, might be in a display-enabled mode if its mode has previously been changed.
To change the mode of a GPU that supports multiple display modes, use the displaymodeselector tool, which you can request from the NVIDIA Display Mode Selector Tool page on the NVIDIA Developer website.
Only the GPUs listed in the table support the displaymodeselector tool. Other GPUs that support NVIDIA vGPU software do not support the displaymodeselector tool and, unless otherwise stated, do not require display mode switching.
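As a hedged sketch of switching a GPU to display-off mode (the `--gpumode` option and mode name follow the tool's published usage, but confirm them against the version you obtain from NVIDIA), the block below only prints the command:

```shell
# Dry run: print the displaymodeselector invocation rather than executing it.
# Option and mode names are assumptions; verify against your copy of the tool.
mode="physical_display_disabled"   # display-off mode, as used for vGPU deployments
echo "displaymodeselector --gpumode $mode"
```

A reboot of the host is typically required for a display mode change to take effect.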
2.4. Installing and Configuring the NVIDIA Virtual GPU Manager for XenServer
The following topics step you through the process of setting up a single XenServer VM to use NVIDIA vGPU. After the process is complete, you can install the graphics driver for your guest OS and license any NVIDIA vGPU software licensed products that you are using.
These setup steps assume familiarity with the XenServer skills covered in XenServer Basics.
2.4.1. Installing and Updating the NVIDIA Virtual GPU Manager for XenServer
The NVIDIA Virtual GPU Manager runs in the XenServer dom0 domain. The NVIDIA Virtual GPU Manager for XenServer is supplied as an RPM file and as a Supplemental Pack.
NVIDIA Virtual GPU Manager and guest VM drivers must be compatible. If you update vGPU Manager to a release that is incompatible with the guest VM drivers, guest VMs will boot with vGPU disabled until their guest vGPU driver is updated to a compatible version. Consult Virtual GPU Software for XenServer Release Notes for further details.
2.4.1.1. Installing the RPM package for XenServer
The RPM file must be copied to the XenServer dom0 domain prior to installation (see Copying files to dom0).
- Use the rpm command to install the package:
[root@xenserver ~]# rpm -iv NVIDIA-vGPU-NVIDIA-vGPU-CitrixHypervisor-8.2-550.127.06.x86_64.rpm
Preparing packages for installation...
NVIDIA-vGPU-NVIDIA-vGPU-CitrixHypervisor-8.2-550.127.06
[root@xenserver ~]#
- Reboot the XenServer platform:
[root@xenserver ~]# shutdown -r now

Broadcast message from root (pts/1) (Fri Oct 25 14:24:11 2024):

The system is going down for reboot NOW!
[root@xenserver ~]#
2.4.1.2. Updating the RPM Package for XenServer
If an existing NVIDIA Virtual GPU Manager is already installed on the system and you want to upgrade, follow these steps:
- Shut down any VMs that are using NVIDIA vGPU.
- Install the new package using the -U option to the rpm command, to upgrade from the previously installed package:
[root@xenserver ~]# rpm -Uv NVIDIA-vGPU-NVIDIA-vGPU-CitrixHypervisor-8.2-550.127.06.x86_64.rpm
Preparing packages for installation...
NVIDIA-vGPU-NVIDIA-vGPU-CitrixHypervisor-8.2-550.127.06
[root@xenserver ~]#
Note: You can query the version of the current NVIDIA Virtual GPU Manager package using the rpm -q command:

[root@xenserver ~]# rpm -q NVIDIA-vGPU-NVIDIA-vGPU-CitrixHypervisor-8.2-550.127.06
[root@xenserver ~]#

If an existing NVIDIA GRID package is already installed and you don't select the upgrade (-U) option when installing a newer GRID package, the rpm command will return many conflict errors.

Preparing packages for installation...
file /usr/bin/nvidia-smi from install of NVIDIA-vGPU-NVIDIA-vGPU-CitrixHypervisor-8.2-550.127.06.x86_64 conflicts with file from package NVIDIA-vGPU-xenserver-8.2-550.54.16.x86_64
file /usr/lib/libnvidia-ml.so from install of NVIDIA-vGPU-NVIDIA-vGPU-CitrixHypervisor-8.2-550.127.06.x86_64 conflicts with file from package NVIDIA-vGPU-xenserver-8.2-550.54.16.x86_64
...
- Reboot the XenServer platform:
[root@xenserver ~]# shutdown -r now

Broadcast message from root (pts/1) (Fri Oct 25 14:24:11 2024):

The system is going down for reboot NOW!
[root@xenserver ~]#
2.4.1.3. Installing or Updating the Supplemental Pack for XenServer
XenCenter can be used to install or update Supplemental Packs on XenServer hosts. The NVIDIA Virtual GPU Manager supplemental pack is provided as an ISO.
- Select Install Update from the Tools menu.
- Click Next after going through the instructions on the Before You Start section.
- Click Select update or supplemental pack from disk on the Select Update section and open NVIDIA’s XenServer Supplemental Pack ISO.
Figure 3. NVIDIA vGPU Manager supplemental pack selected in XenCenter
- Click Next on the Select Update section.
- In the Select Servers section, select all the XenServer hosts on which the Supplemental Pack should be installed and click Next.
- Click Next on the Upload section once the Supplemental Pack has been uploaded to all the XenServer hosts.
- Click Next on the Prechecks section.
- Click Install Update on the Update Mode section.
- Click Finish on the Install Update section.
Figure 4. Successful installation of NVIDIA vGPU Manager supplemental pack
2.4.1.4. Verifying the Installation of the NVIDIA vGPU Software for XenServer Package
After the XenServer platform has rebooted, verify the installation of the NVIDIA vGPU software package for XenServer.
- Verify that the NVIDIA vGPU software package is installed and loaded correctly by checking for the NVIDIA kernel driver in the list of kernel loaded modules.
[root@xenserver ~]# lsmod | grep nvidia
nvidia 9522927 0
i2c_core 20294 2 nvidia,i2c_i801
[root@xenserver ~]#
- Verify that the NVIDIA kernel driver can successfully communicate with the NVIDIA physical GPUs in your system by running the nvidia-smi command. The nvidia-smi command is described in more detail in NVIDIA System Management Interface nvidia-smi.
Running the nvidia-smi command should produce a listing of the GPUs in your platform.
[root@xenserver ~]# nvidia-smi
Fri Oct 25 18:46:50 2024
+------------------------------------------------------+
| NVIDIA-SMI 550.127.06 Driver Version: 550.127.06 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla M60 On | 00000000:05:00.0 Off | Off |
| N/A 25C P8 24W / 150W | 13MiB / 8191MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla M60 On | 00000000:06:00.0 Off | Off |
| N/A 24C P8 24W / 150W | 13MiB / 8191MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla M60 On | 00000000:86:00.0 Off | Off |
| N/A 25C P8 25W / 150W | 13MiB / 8191MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla M60 On | 00000000:87:00.0 Off | Off |
| N/A 28C P8 24W / 150W | 13MiB / 8191MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
[root@xenserver ~]#
If nvidia-smi fails to run or doesn’t produce the expected output for all the NVIDIA GPUs in your system, see Troubleshooting for troubleshooting steps.
2.4.2. Configuring a XenServer VM with Virtual GPU
To support applications and workloads that are compute or graphics intensive, you can add multiple vGPUs to a single VM.
For details about which XenServer versions and NVIDIA vGPUs support the assignment of multiple vGPUs to a VM, see Virtual GPU Software for XenServer Release Notes.
XenServer supports configuration and management of virtual GPUs using XenCenter, or the xe command line tool that is run in a XenServer dom0 shell. Basic configuration using XenCenter is described in the following sections. Command line management using xe is described in XenServer vGPU Management.
If you are using Citrix Hypervisor 8.1 or later and need to assign plugin configuration parameters, create vGPUs using the xe command as explained in Creating a vGPU Using xe.
- Ensure the VM is powered off.
- Right-click the VM in XenCenter, select Properties to open the VM’s properties, and select the GPU property. The available GPU types are listed in the GPU type drop-down list:
Figure 5. Using Citrix XenCenter to configure a VM with a vGPU
After you have configured a XenServer VM with a vGPU, start the VM, either from XenCenter or by using xe vm-start in a dom0 shell. You can view the VM’s console in XenCenter.
After the VM has booted, install the NVIDIA vGPU software graphics driver as explained in Installing the NVIDIA vGPU Software Graphics Driver.
2.4.3. Setting vGPU Plugin Parameters on XenServer
Plugin parameters for a vGPU control the behavior of the vGPU, such as the frame rate limiter (FRL) configuration in frames per second or whether console virtual network computing (VNC) for the vGPU is enabled. The VM to which the vGPU is assigned is started with these parameters. If parameters are set for multiple vGPUs assigned to the same VM, the VM is started with the parameters assigned to each vGPU.
For each vGPU for which you want to set plugin parameters, perform this task in a command shell in the XenServer dom0 domain.
- Get the UUIDs of all VMs on the hypervisor host and use the output from the command to identify the VM to which the vGPU is assigned.
[root@xenserver ~]# xe vm-list
...
uuid ( RO)          : 7f6c855d-5635-2d57-9fbc-b1200172162f
     name-label ( RW): RHEL8.3
    power-state ( RO): running
...
- Get the UUIDs of all vGPUs on the hypervisor host and from the UUID of the VM to which the vGPU is assigned, determine the UUID of the vGPU.
[root@xenserver ~]# xe vgpu-list
...
uuid ( RO)              : d15083f8-5c59-7474-d0cb-fbc3f7284f1b
         vm-uuid ( RO)  : 7f6c855d-5635-2d57-9fbc-b1200172162f
          device ( RO)  : 0
  gpu-group-uuid ( RO)  : 3a2fbc36-827d-a078-0b2f-9e869ae6fd93
...
- Use the xe command to set each vGPU plugin parameter that you want to set.
[root@xenserver ~]# xe vgpu-param-set uuid=vgpu-uuid extra_args='parameter=value'
- vgpu-uuid
- The UUID of the vGPU, which you obtained in the previous step.
- parameter
- The name of the vGPU plugin parameter that you want to set.
- value
- The value to which you want to set the vGPU plugin parameter.
This example sets the enable_uvm vGPU plugin parameter to 1 for the vGPU that has the UUID d15083f8-5c59-7474-d0cb-fbc3f7284f1b. This parameter setting enables unified memory for the vGPU.
[root@xenserver ~]# xe vgpu-param-set uuid=d15083f8-5c59-7474-d0cb-fbc3f7284f1b extra_args='enable_uvm=1'
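The UUID lookup in the first two steps can be scripted. The following is a minimal sketch, not part of the product, that assumes the `xe vgpu-list` field layout shown in the example output above; it reads the listing on stdin and prints the UUID of every vGPU assigned to the given VM.

```shell
# Sketch: print the UUID of every vGPU assigned to a given VM, pairing
# the "uuid" and "vm-uuid" fields of `xe vgpu-list` output read on
# stdin. Assumes the field layout shown in the example output above.
vgpus_for_vm() {
  awk -v vm="$1" '
    /^uuid/   { split($0, f, ": *"); vgpu = f[2] }      # a vGPU record starts
    /vm-uuid/ { split($0, f, ": *"); if (f[2] == vm) print vgpu }
  '
}
```

Typical use on the hypervisor host would be `xe vgpu-list | vgpus_for_vm 7f6c855d-5635-2d57-9fbc-b1200172162f`.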
2.5. Installing the Virtual GPU Manager Package for Linux KVM
NVIDIA vGPU software for Linux Kernel-based Virtual Machine (KVM) (Linux KVM) is intended only for use with supported versions of Linux KVM hypervisors. For details about which Linux KVM hypervisor versions are supported, see Virtual GPU Software for Generic Linux with KVM Release Notes.
If you are using Red Hat Enterprise Linux KVM, follow the instructions in Installing and Configuring the NVIDIA Virtual GPU Manager for Red Hat Enterprise Linux KVM.
Before installing the Virtual GPU Manager package for Linux KVM, ensure that the following prerequisites are met:
- The following packages are installed on the Linux KVM server:
  - The x86_64 build of the GNU Compiler Collection (GCC)
  - Linux kernel headers
- The package file is copied to a directory in the file system of the Linux KVM server.
If the Nouveau driver for NVIDIA graphics cards is present, disable it before installing the package.
- Change to the directory on the Linux KVM server that contains the package file.
# cd package-file-directory
- package-file-directory
- The path to the directory that contains the package file.
- Make the package file executable.
# chmod +x package-file-name
- package-file-name
- The name of the file that contains the Virtual GPU Manager package for Linux KVM, for example NVIDIA-Linux-x86_64-390.42-vgpu-kvm.run.
- Run the package file as the root user.
# sudo sh ./package-file-name
- Accept the license agreement to continue with the installation.
- When installation has completed, select OK to exit the installer.
- Reboot the Linux KVM server.
# systemctl reboot
2.6. Installing and Configuring the NVIDIA Virtual GPU Manager for Microsoft Azure Stack HCI
Before you begin, ensure that the prerequisites in Prerequisites for Using NVIDIA vGPU are met and the Microsoft Azure Stack HCI host is configured as follows:
- The Microsoft Azure Stack HCI OS is installed as explained in Deploy the Azure Stack HCI operating system on the Microsoft documentation site.
- The following BIOS settings are enabled:
- Virtualization support, for example, Intel Virtualization Technology (VT-D) or AMD Virtualization (AMD-V)
- SR-IOV
- Above 4G Decoding
- For Supermicro servers: ASPM Support
- For servers that have an AMD CPU:
- Alternative Routing ID Interpretation (ARI)
- Access Control Service (ACS)
- Advanced Error Reporting (AER)
Follow this sequence of instructions to set up a single Microsoft Azure Stack HCI VM to use NVIDIA vGPU.
- Installing the NVIDIA Virtual GPU Manager for Microsoft Azure Stack HCI
- Setting the vGPU Series Allowed on a GPU
- Adding a vGPU to a Microsoft Azure Stack HCI VM
These instructions assume familiarity with the Microsoft Windows PowerShell commands covered in Manage VMs on Azure Stack HCI using Windows PowerShell on the Microsoft documentation site.
After the set up is complete, you can install the graphics driver for your guest OS and license any NVIDIA vGPU software licensed products that you are using.
2.6.1. Installing the NVIDIA Virtual GPU Manager for Microsoft Azure Stack HCI
The driver package for the Virtual GPU Manager is distributed as an archive file. You must extract the contents of this archive file to enable the package to be added to the driver store from a setup information file.
Perform this task in a Windows PowerShell window as the Administrator user.
- Download the archive file in which the driver package for the Virtual GPU Manager is distributed.
- Extract the contents of the archive file to a directory that is accessible from the Microsoft Azure Stack HCI host.
- Change to the GridSW-Azure-Stack-HCI directory that you extracted from the archive file.
- Use the PnPUtil tool to add the driver package for the Virtual GPU Manager to the driver store from the nvgridswhci.inf setup information file. In the command for adding the driver package, also set the options to traverse subdirectories for driver packages and reboot the Microsoft Azure Stack HCI host if necessary to complete the operation.
PS C:> pnputil /add-driver nvgridswhci.inf /subdirs /install /reboot
- After the host has rebooted, verify that the NVIDIA Virtual GPU Manager can successfully communicate with the NVIDIA physical GPUs in your system. Run the nvidia-smi command with no arguments for this purpose. Running the nvidia-smi command should produce a listing of the GPUs in your platform.
- Confirm that the Microsoft Azure Stack HCI host has GPU adapters that can be partitioned by listing the GPUs that support GPU-P.
PS C:> Get-VMHostPartitionableGpu
- For each GPU, set the number of partitions that the GPU should support to the maximum number of vGPUs that can be added to the GPU.
PS C:> Set-VMHostPartitionableGpu -Name "gpu-name" -PartitionCount partitions
- gpu-name
- The unique name for referencing the GPU that you obtained in the previous step.
- partitions
- The maximum number of vGPUs that can be added to the GPU. This number depends on the virtual GPU type. For example, the maximum number of each type of vGPU that can be added to the NVIDIA A16 GPU is as follows:
Virtual GPU Type          Maximum vGPUs per GPU
A16-16Q, A16-16A          1
A16-8Q, A16-8A            2
A16-4Q, A16-4A            4
A16-2Q, A16-2B, A16-2A    8
A16-1Q, A16-1B, A16-1A    16
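The maximums in this table follow a simple pattern: each physical GPU on the NVIDIA A16 board has 16 GB of framebuffer, so the maximum vGPU count for a profile is 16 divided by the profile's framebuffer size in GB (the number embedded in the profile name). A quick sketch of that relationship, for illustration only:

```shell
# Sketch: maximum vGPUs per physical A16 GPU for a profile with the
# given framebuffer size in GB. Illustrates the pattern in the table
# above; assumes the 16 GB of framebuffer per GPU on the NVIDIA A16.
max_vgpus_per_a16_gpu() {
  echo $(( 16 / $1 ))
}
```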
2.6.2. Setting the vGPU Series Allowed on a GPU
The Virtual GPU Manager allows virtual GPUs (vGPUs) to be created on a GPU from only one vGPU series. By default, only Q-series vGPUs may be created on a GPU. You can change the vGPU series allowed on a GPU by setting the GridGpupProfileType value for the GPU in the Windows registry.
This task requires administrator user privileges.
- Use Windows PowerShell to get the driver key of the GPU on which you want to set the allowed vGPU series. You will need this information in the next step to identify the Windows registry key in which information about the GPU is stored.
- Get the InstanceID property of the GPU on which you want to set the allowed vGPU series.
PS C:\> Get-PnpDevice -PresentOnly |
>> Where-Object {$_.InstanceId -like "PCI\VEN_10DE*" } |
>> Select-Object -Property FriendlyName,InstanceId |
>> Format-List
FriendlyName : NVIDIA A100
InstanceId   : PCI\VEN_10DE&DEV_2236&SUBSYS_148210DE&REV_A1\6&17F903&0&00400000
- Get the DEVPKEY_Device_Driver property of the GPU from the InstanceID property that you got in the previous step.
PS C:\> Get-PnpDeviceProperty -InstanceId "instance-id" |
>> where {$_.KeyName -eq "DEVPKEY_Device_Driver"} |
>> Select-Object -Property Data
Data
----
{4d36e968-e325-11ce-bfc1-08002be10318}\0001
- instance-id
- The InstanceID property of the GPU that you got in the previous step, for example, PCI\VEN_10DE&DEV_2236&SUBSYS_148210DE&REV_A1\6&17F903&0&00400000.
- Set the GridGpupProfileType DWord (REG_DWORD) registry value in the Windows registry key HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Control\Class\driver-key.
- driver-key
- The driver key for the GPU that you got in the previous step, for example, {4d36e968-e325-11ce-bfc1-08002be10318}\0001.
The value to set depends on the vGPU series that you want to be allowed on the GPU.
vGPU Series    Value
Q-series       1
A-series       2
B-series       3
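For scripted setups, the mapping in this table is small enough to encode directly. A sketch, assuming only the three series and values listed above:

```shell
# Sketch: map a vGPU series letter to the GridGpupProfileType registry
# value from the table above (Q=1, A=2, B=3).
grid_gpup_profile_type() {
  case "$1" in
    Q) echo 1 ;;
    A) echo 2 ;;
    B) echo 3 ;;
    *) echo "unknown vGPU series: $1" >&2; return 1 ;;
  esac
}
```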
2.6.3. Adding a vGPU to a Microsoft Azure Stack HCI VM
You add a vGPU to a Microsoft Azure Stack HCI VM by adding a GPU-P adapter to a VM.
Perform this task in a Windows PowerShell window as the Administrator user.
- Set the variable $vm to the name of the virtual machine to which you are adding a vGPU.
PS C:> $vm = "vm-name"
- vm-name
- The name of the virtual machine to which you are adding a vGPU.
- Allow the VM to control cache types for MMIO access.
PS C:> Set-VM -GuestControlledCacheTypes $true -VMName $vm
- Set the lower MMIO space to 1 GB to allow sufficient MMIO space to be mapped.
PS C:> Set-VM -LowMemoryMappedIoSpace 1Gb -VMName $vm
This amount is twice the amount that the device must allow for alignment. Lower MMIO space is the address space below 4 GB and is required for any device that has 32-bit BAR memory.
- Set the upper MMIO space to 32 GB to allow sufficient MMIO space to be mapped.
PS C:> Set-VM -HighMemoryMappedIoSpace 32GB -VMName $vm
This amount is twice the amount that the device must allow for alignment. Upper MMIO space is the address space above approximately 64 GB.
- Confirm that the Microsoft Azure Stack HCI host has a GPU that supports the GPU-P adapter that you want to create.
PS C:> Get-VMHostPartitionableGpu
- Add a GPU-P adapter to the VM.
PS C:> Add-VMGpuPartitionAdapter -VMName $vm `
 -MinPartitionVRAM min-ram `
 -MaxPartitionVRAM max-ram `
 -OptimalPartitionVRAM opt-ram `
 -MinPartitionEncode min-enc `
 -MaxPartitionEncode max-enc `
 -OptimalPartitionEncode opt-enc `
 -MinPartitionDecode min-dec `
 -MaxPartitionDecode max-dec `
 -OptimalPartitionDecode opt-dec `
 -MinPartitionCompute min-compute `
 -MaxPartitionCompute max-compute `
 -OptimalPartitionCompute opt-compute
Note: Because partitions are resolved only when the VM is started, this command cannot validate that the Microsoft Azure Stack HCI host has a GPU that supports the GPU-P adapter that you want to create. The values that you specify must be within the minimum and maximum values that were listed in the previous step.
- List the adapters assigned to the VM to confirm that the GPU-P adapter has been added to the VM.
PS C:> Get-VMGpuPartitionAdapter -VMName $vm
This command also returns the adapter ID to use for reconfiguring or deleting a GPU partition.
- Connect to and start the VM.
2.6.4. Uninstalling the NVIDIA Virtual GPU Manager for Microsoft Azure Stack HCI
If you no longer require the Virtual GPU Manager on your Microsoft Azure Stack HCI server, you can uninstall the driver package for the Virtual GPU Manager.
Perform this task in a Windows PowerShell window as the Administrator user.
- Determine the published name of the driver package for the Virtual GPU Manager by enumerating all third-party driver packages in the driver store.
PS C:> pnputil /enum-drivers
Microsoft PnP Utility
...
Published name : oem5.inf
Driver package provider : NVIDIA
Class : Display adapters
Driver date and version : 01/01/2023 31.0.15.2807
Signer name : Microsoft Windows Hardware Compatibility Publisher
...
- Delete and uninstall the driver package for the Virtual GPU Manager.
PS C:> pnputil /delete-driver vgpu-manager-package-published-name /uninstall /reboot
- vgpu-manager-package-published-name
- The published name of the driver package for the Virtual GPU Manager that you obtained in the previous step, for example, oem5.inf.
This example deletes and uninstalls the driver package for which the published name is oem5.inf.
PS C:> pnputil.exe /delete-driver oem5.inf /uninstall /reboot
Microsoft PnP Utility

Driver package uninstalled.
Driver package deleted successfully.
If necessary, the Microsoft Azure Stack HCI server is rebooted.
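When several third-party packages are present, picking the right published name out of the `pnputil /enum-drivers` listing by eye is error prone. Scripting the lookup can be sketched as follows; the parsing assumes the record layout shown in the example output above (a `Published name` line followed later by a `Driver package provider` line in each record).

```shell
# Sketch: print the published name (for example, oem5.inf) of each
# driver package whose provider is NVIDIA, parsing `pnputil
# /enum-drivers` output read on stdin. Assumes the record layout shown
# in the example output above.
nvidia_published_names() {
  awk '
    /^Published name/          { split($0, f, ": *"); name = f[2] }
    /^Driver package provider/ { split($0, f, ": *"); if (f[2] == "NVIDIA") print name }
  '
}
```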
2.7. Installing and Configuring the NVIDIA Virtual GPU Manager for Red Hat Enterprise Linux KVM
The following topics step you through the process of setting up a single Red Hat Enterprise Linux Kernel-based Virtual Machine (KVM) VM to use NVIDIA vGPU.
Output from the VM console is not available for VMs that are running vGPU. Make sure that you have installed an alternate means of accessing the VM (such as a VNC server) before you configure vGPU.
Follow this sequence of instructions:
- Installing the Virtual GPU Manager Package for Red Hat Enterprise Linux KVM
- Verifying the Installation of the NVIDIA vGPU Software for Red Hat Enterprise Linux KVM
- vGPUs that support SR-IOV only: Preparing the Virtual Function for an NVIDIA vGPU that Supports SR-IOV on a Linux with KVM Hypervisor
- Optional: Putting a GPU Into Mixed-Size Mode
- Getting the BDF and Domain of a GPU on a Linux with KVM Hypervisor
- Creating an NVIDIA vGPU on a Linux with KVM Hypervisor
- Adding One or More vGPUs to a Linux with KVM Hypervisor VM
- Optional: Placing a vGPU on a Physical GPU in Mixed-Size Mode on a Linux with KVM Hypervisor
- Setting vGPU Plugin Parameters on a Linux with KVM Hypervisor
After the process is complete, you can install the graphics driver for your guest OS and license any NVIDIA vGPU software licensed products that you are using.
If you are using a generic Linux KVM hypervisor, follow the instructions in Installing the Virtual GPU Manager Package for Linux KVM.
2.7.1. Installing the Virtual GPU Manager Package for Red Hat Enterprise Linux KVM
The NVIDIA Virtual GPU Manager for Red Hat Enterprise Linux KVM is provided as a .rpm file.
NVIDIA Virtual GPU Manager and guest VM drivers must be compatible. If you update vGPU Manager to a release that is incompatible with the guest VM drivers, guest VMs will boot with vGPU disabled until their guest vGPU driver is updated to a compatible version. Consult Virtual GPU Software for Red Hat Enterprise Linux with KVM Release Notes for further details.
Before installing the RPM package for Red Hat Enterprise Linux KVM, ensure that the sshd service on the Red Hat Enterprise Linux KVM server is configured to permit root login. If the Nouveau driver for NVIDIA graphics cards is present, disable it before installing the package. For instructions, see How to disable the Nouveau driver and install the Nvidia driver in RHEL 7 (Red Hat subscription required).
Some versions of Red Hat Enterprise Linux KVM have z-stream updates that break Kernel Application Binary Interface (kABI) compatibility with the previous kernel or the GA kernel. For these versions of Red Hat Enterprise Linux KVM, the following Virtual GPU Manager RPM packages are supplied:
- A package for the GA Linux KVM kernel
- A package for the updated z-stream kernel
To differentiate these packages, the name of each RPM package includes the kernel version. Ensure that you install the RPM package that is compatible with your Linux KVM kernel version.
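A small helper can keep you from installing the wrong z-stream package. This sketch picks, from a list of candidate RPM file names, the one that contains the running kernel version; the file names used in the example are hypothetical, so check the logic against the actual package names you downloaded.

```shell
# Sketch: pick the RPM file whose name contains the given kernel
# version string (compare with `uname -r` on the host). The candidate
# file names you pass in are whatever you downloaded; the names used
# in the test are hypothetical, not real package names.
pick_rpm_for_kernel() {
  kernel="$1"; shift
  for rpm in "$@"; do
    case "$rpm" in
      *"$kernel"*) echo "$rpm"; return 0 ;;
    esac
  done
  echo "no package matches kernel $kernel" >&2
  return 1
}
```

On the server itself, this could be invoked as `pick_rpm_for_kernel "$(uname -r)" *.rpm`.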
- Securely copy the RPM file from the system where you downloaded the file to the Red Hat Enterprise Linux KVM server.
- From a Windows system, use a secure copy client such as WinSCP.
- From a Linux system, use the scp command.
- Use secure shell (SSH) to log in as root to the Red Hat Enterprise Linux KVM server.
# ssh root@kvm-server
- kvm-server
- The host name or IP address of the Red Hat Enterprise Linux KVM server.
- Change to the directory on the Red Hat Enterprise Linux KVM server to which you copied the RPM file.
# cd rpm-file-directory
- rpm-file-directory
- The path to the directory to which you copied the RPM file.
- Use the rpm command to install the package.
# rpm -iv NVIDIA-vGPU-rhel-8.9-550.127.06.x86_64.rpm
Preparing packages for installation...
NVIDIA-vGPU-rhel-8.9-550.127.06
#
- Reboot the Red Hat Enterprise Linux KVM server.
# systemctl reboot
2.7.2. Verifying the Installation of the NVIDIA vGPU Software for Red Hat Enterprise Linux KVM
After the Red Hat Enterprise Linux KVM server has rebooted, verify the installation of the NVIDIA vGPU software package for Red Hat Enterprise Linux KVM.
- Verify that the NVIDIA vGPU software package is installed and loaded correctly by checking for the VFIO drivers in the list of kernel loaded modules.
# lsmod | grep vfio
nvidia_vgpu_vfio       27099  0
nvidia              12316924  1 nvidia_vgpu_vfio
vfio_mdev              12841  0
mdev                   20414  2 vfio_mdev,nvidia_vgpu_vfio
vfio_iommu_type1       22342  0
vfio                   32331  3 vfio_mdev,nvidia_vgpu_vfio,vfio_iommu_type1
#
- Verify that the libvirtd service is active and running.
# service libvirtd status
- Verify that the NVIDIA kernel driver can successfully communicate with the NVIDIA physical GPUs in your system by running the nvidia-smi command. The nvidia-smi command is described in more detail in NVIDIA System Management Interface nvidia-smi.
Running the nvidia-smi command should produce a listing of the GPUs in your platform.
# nvidia-smi
Fri Oct 25 18:46:50 2024
+------------------------------------------------------+
| NVIDIA-SMI 550.127.06 Driver Version: 550.127.06 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla M60 On | 0000:85:00.0 Off | Off |
| N/A 23C P8 23W / 150W | 13MiB / 8191MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla M60 On | 0000:86:00.0 Off | Off |
| N/A 29C P8 23W / 150W | 13MiB / 8191MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla P40 On | 0000:87:00.0 Off | Off |
| N/A 21C P8 18W / 250W | 53MiB / 24575MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
#
If nvidia-smi fails to run or doesn’t produce the expected output for all the NVIDIA GPUs in your system, see Troubleshooting for troubleshooting steps.
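The module check in step 1 can be automated. This is a sketch that reads `lsmod` output on stdin and reports any expected module that is missing; the module names checked are taken from the example output above, and the exact set can vary between vGPU software releases, so treat the list as an assumption.

```shell
# Sketch: report missing kernel modules, given `lsmod` output on stdin.
# The module names checked here match the example output above; the
# exact set can vary between vGPU software releases.
check_vgpu_modules() {
  loaded=" $(awk '{print $1}' | tr '\n' ' ') "
  status=0
  for mod in nvidia nvidia_vgpu_vfio vfio mdev; do
    case "$loaded" in
      *" $mod "*) ;;                       # module is loaded
      *) echo "missing: $mod"; status=1 ;;
    esac
  done
  return $status
}
```

A run such as `lsmod | check_vgpu_modules` prints nothing and exits zero when all of the listed modules are loaded.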
2.8. Installing and Configuring the NVIDIA Virtual GPU Manager for Ubuntu
Follow this sequence of instructions to set up a single Ubuntu VM to use NVIDIA vGPU.
- Installing the NVIDIA Virtual GPU Manager for Ubuntu
- Getting the BDF and Domain of a GPU on a Linux with KVM Hypervisor
- vGPUs that support SR-IOV only: Preparing the Virtual Function for an NVIDIA vGPU that Supports SR-IOV on a Linux with KVM Hypervisor
- Optional: Putting a GPU Into Mixed-Size Mode
- Creating an NVIDIA vGPU on a Linux with KVM Hypervisor
- Adding One or More vGPUs to a Linux with KVM Hypervisor VM
- Optional: Placing a vGPU on a Physical GPU in Mixed-Size Mode on a Linux with KVM Hypervisor
- Setting vGPU Plugin Parameters on a Linux with KVM Hypervisor
Output from the VM console is not available for VMs that are running vGPU. Make sure that you have installed an alternate means of accessing the VM (such as a VNC server) before you configure vGPU.
After the process is complete, you can install the graphics driver for your guest OS and license any NVIDIA vGPU software licensed products that you are using.
2.8.1. Installing the NVIDIA Virtual GPU Manager for Ubuntu
The NVIDIA Virtual GPU Manager for Ubuntu is provided as a Debian package (.deb) file.
NVIDIA Virtual GPU Manager and guest VM drivers must be compatible. If you update vGPU Manager to a release that is incompatible with the guest VM drivers, guest VMs will boot with vGPU disabled until their guest vGPU driver is updated to a compatible version. Consult Virtual GPU Software for Ubuntu Release Notes for further details.
2.8.1.1. Installing the Virtual GPU Manager Package for Ubuntu
Before installing the Debian package for Ubuntu, ensure that the sshd service on the Ubuntu server is configured to permit root login. If the Nouveau driver for NVIDIA graphics cards is present, disable it before installing the package.
- Securely copy the Debian package file from the system where you downloaded the file to the Ubuntu server.
- From a Windows system, use a secure copy client such as WinSCP.
- From a Linux system, use the scp command.
- Use secure shell (SSH) to log in as root to the Ubuntu server.
# ssh root@ubuntu-server
- ubuntu-server
- The host name or IP address of the Ubuntu server.
- Change to the directory on the Ubuntu server to which you copied the Debian package file.
# cd deb-file-directory
- deb-file-directory
- The path to the directory to which you copied the Debian package file.
- Use the apt command to install the package.
# apt install ./nvidia-vgpu-ubuntu-550.127.06_amd64.deb
- Reboot the Ubuntu server.
# systemctl reboot
2.8.1.2. Verifying the Installation of the NVIDIA vGPU Software for Ubuntu
After the Ubuntu server has rebooted, verify the installation of the NVIDIA vGPU software package for Ubuntu.
- Verify that the NVIDIA vGPU software package is installed and loaded correctly by checking for the VFIO drivers in the list of kernel loaded modules.
# lsmod | grep vfio
nvidia_vgpu_vfio       27099  0
nvidia              12316924  1 nvidia_vgpu_vfio
vfio_mdev              12841  0
mdev                   20414  2 vfio_mdev,nvidia_vgpu_vfio
vfio_iommu_type1       22342  0
vfio                   32331  3 vfio_mdev,nvidia_vgpu_vfio,vfio_iommu_type1
#
- Verify that the libvirtd service is active and running.
# service libvirtd status
- Verify that the NVIDIA kernel driver can successfully communicate with the NVIDIA physical GPUs in your system by running the nvidia-smi command. The nvidia-smi command is described in more detail in NVIDIA System Management Interface nvidia-smi.
Running the nvidia-smi command should produce a listing of the GPUs in your platform.
# nvidia-smi
Fri Oct 25 18:46:50 2024
+------------------------------------------------------+
| NVIDIA-SMI 550.127.06 Driver Version: 550.127.06 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla M60 On | 0000:85:00.0 Off | Off |
| N/A 23C P8 23W / 150W | 13MiB / 8191MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla M60 On | 0000:86:00.0 Off | Off |
| N/A 29C P8 23W / 150W | 13MiB / 8191MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla P40 On | 0000:87:00.0 Off | Off |
| N/A 21C P8 18W / 250W | 53MiB / 24575MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
#
If nvidia-smi fails to run or doesn’t produce the expected output for all the NVIDIA GPUs in your system, see Troubleshooting for troubleshooting steps.
2.9. Installing and Configuring the NVIDIA Virtual GPU Manager for VMware vSphere
You can use the NVIDIA Virtual GPU Manager for VMware vSphere to set up a VMware vSphere VM to use NVIDIA vGPU.
Some servers, for example, the Dell R740, do not configure SR-IOV capability if the SR-IOV SBIOS setting is disabled on the server. If you are using the Tesla T4 GPU with VMware vSphere on such a server, you must ensure that the SR-IOV SBIOS setting is enabled on the server.
However, with any server hardware, do not enable SR-IOV in VMware vCenter Server for the Tesla T4 GPU. If SR-IOV is enabled in VMware vCenter Server for T4, VMware vCenter Server lists the status of the GPU as needing a reboot. You can ignore this status message.
NVIDIA vGPU Instructions
The Xorg service is not required for graphics devices in NVIDIA vGPU mode. For more information, see Installing and Updating the NVIDIA Virtual GPU Manager for VMware vSphere.
To set up a VMware vSphere VM to use NVIDIA vGPU, follow this sequence of instructions:
- Installing and Updating the NVIDIA Virtual GPU Manager for VMware vSphere
- Configuring VMware vMotion with vGPU for VMware vSphere
- Changing the Default Graphics Type in VMware vSphere
- Configuring a vSphere VM with NVIDIA vGPU
- Optional: Setting vGPU Plugin Parameters on VMware vSphere
After configuring a vSphere VM to use NVIDIA vGPU, you can install the NVIDIA vGPU software graphics driver for your guest OS and license any NVIDIA vGPU software licensed products that you are using.
Requirements for Configuring NVIDIA vGPU in a DRS Cluster
You can configure a VM with NVIDIA vGPU on an ESXi host in a VMware Distributed Resource Scheduler (DRS) cluster. However, to ensure that the automation level of the cluster supports VMs configured with NVIDIA vGPU, you must set the automation level to Partially Automated or Manual.
For more information about these settings, see Edit Cluster Settings in the VMware documentation.
2.9.1. Installing and Updating the NVIDIA Virtual GPU Manager for VMware vSphere
The NVIDIA Virtual GPU Manager runs on the ESXi host. It is distributed as a number of software components in a ZIP archive.
The NVIDIA Virtual GPU Manager software components are as follows:
- A software component for the NVIDIA vGPU hypervisor host driver
- A software component for the NVIDIA GPU Management daemon
You can install these software components in one of the following ways:
- By copying the software components to the ESXi host and then installing them as explained in Installing the NVIDIA Virtual GPU Manager on VMware vSphere
- By importing the software components manually as explained in Import Patches Manually in the VMware vSphere documentation
NVIDIA Virtual GPU Manager and guest VM drivers must be compatible. If you update vGPU Manager to a release that is incompatible with the guest VM drivers, guest VMs will boot with vGPU disabled until their guest vGPU driver is updated to a compatible version. Consult Virtual GPU Software for VMware vSphere Release Notes for further details.
2.9.1.1. Installing the NVIDIA Virtual GPU Manager on VMware vSphere
To install the NVIDIA Virtual GPU Manager, you need to access the ESXi host via the ESXi Shell or SSH. Refer to VMware’s documentation on how to enable ESXi Shell or SSH for an ESXi host.
Before you begin, ensure that the following prerequisites are met:
- The ZIP archive that contains NVIDIA vGPU software has been downloaded from the NVIDIA Licensing Portal.
- The software components for the NVIDIA Virtual GPU Manager have been extracted from the downloaded ZIP archive.
- Copy the NVIDIA Virtual GPU Manager component files to the ESXi host.
- Put the ESXi host into maintenance mode.
$ esxcli system maintenanceMode set --enable true
- Install the NVIDIA vGPU hypervisor host driver and the NVIDIA GPU Management daemon from their software component files.
- Run the esxcli command to install the NVIDIA vGPU hypervisor host driver from its software component file.
$ esxcli software vib install -d /vmfs/volumes/datastore/host-driver-component.zip
- Run the esxcli command to install the NVIDIA GPU Management daemon from its software component file.
$ esxcli software vib install -d /vmfs/volumes/datastore/gpu-management-daemon-component.zip
- datastore
- The name of the VMFS datastore to which you copied the software components.
- host-driver-component
- The name of the file that contains the NVIDIA vGPU hypervisor host driver in the form of a software component. Ensure that you specify the file that was extracted from the downloaded ZIP archive. For example, for VMware vSphere 7.0.2, host-driver-component is NVD-VMware-x86_64-550.127.06-1OEM.702.0.0.17630552-bundle-build-number.
- gpu-management-daemon-component
- The name of the file that contains the NVIDIA GPU Management daemon in the form of a software component. Ensure that you specify the file that was extracted from the downloaded ZIP archive. For example, for VMware vSphere 7.0.2, gpu-management-daemon-component is VMW-esx-7.0.2-nvd-gpu-mgmt-daemon-1.0-0.0.0001.
- Exit maintenance mode.
$ esxcli system maintenanceMode set --enable false
- Reboot the ESXi host.
$ reboot
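The installation sequence above can be previewed end to end before anything is run on the host. The following sketch only prints the commands so they can be reviewed first; the datastore and component filenames are placeholders that you must replace with the names extracted from your downloaded ZIP archive.

```shell
# Sketch: emit the esxcli sequence for installing the NVIDIA vGPU host driver
# and the NVIDIA GPU Management daemon. Prints the commands instead of running
# them so the sequence can be reviewed first; all three arguments are
# placeholders for your own datastore and component filenames.
install_cmds() {
  datastore="$1"; driver="$2"; daemon="$3"
  printf '%s\n' \
    "esxcli system maintenanceMode set --enable true" \
    "esxcli software vib install -d /vmfs/volumes/${datastore}/${driver}" \
    "esxcli software vib install -d /vmfs/volumes/${datastore}/${daemon}" \
    "esxcli system maintenanceMode set --enable false" \
    "reboot"
}

install_cmds datastore1 host-driver-component.zip gpu-management-daemon-component.zip
```

Once the printed commands look correct for your environment, run them on the ESXi host in that order.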
2.9.1.2. Updating the NVIDIA Virtual GPU Manager for VMware vSphere
Update the NVIDIA Virtual GPU Manager if you want to install a new version of NVIDIA Virtual GPU Manager on a system where an existing version is already installed.
To update the vGPU Manager VIB, you need to access the ESXi host via the ESXi Shell or SSH. Refer to VMware’s documentation on how to enable ESXi Shell or SSH for an ESXi host.
Before proceeding with the vGPU Manager update, make sure that all VMs are powered off and the ESXi host is placed in maintenance mode. Refer to VMware’s documentation on how to place an ESXi host in maintenance mode.
- Stop the NVIDIA GPU Management Daemon.
$ /etc/init.d/nvdGpuMgmtDaemon stop
- Update the NVIDIA vGPU hypervisor host driver and the NVIDIA GPU Management daemon.
- Run the esxcli command to update the NVIDIA vGPU hypervisor host driver from its software component file.
$ esxcli software vib update -d /vmfs/volumes/datastore/host-driver-component.zip
- Run the esxcli command to update the NVIDIA GPU Management daemon from its software component file.
$ esxcli software vib update -d /vmfs/volumes/datastore/gpu-management-daemon-component.zip
- datastore
- The name of the VMFS datastore to which you copied the software components.
- host-driver-component
- The name of the file that contains the NVIDIA vGPU hypervisor host driver in the form of a software component. Ensure that you specify the file that was extracted from the downloaded ZIP archive. For example, for VMware vSphere 7.0.2, host-driver-component is NVD-VMware-x86_64-550.127.06-1OEM.702.0.0.17630552-bundle-build-number.
- gpu-management-daemon-component
- The name of the file that contains the NVIDIA GPU Management daemon in the form of a software component. Ensure that you specify the file that was extracted from the downloaded ZIP archive. For example, for VMware vSphere 7.0.2, gpu-management-daemon-component is VMW-esx-7.0.2-nvd-gpu-mgmt-daemon-1.0-0.0.0001.
- Reboot the ESXi host and remove it from maintenance mode.
2.9.1.3. Verifying the Installation of the NVIDIA vGPU Software Package for vSphere
After the ESXi host has rebooted, verify the installation of the NVIDIA vGPU software package for vSphere.
- Verify that the NVIDIA vGPU software package installed and loaded correctly by checking for the NVIDIA kernel driver in the list of kernel loaded modules.
[root@esxi:~] vmkload_mod -l | grep nvidia
nvidia                   5        8420
- If the NVIDIA driver is not listed in the output, check dmesg for any load-time errors reported by the driver.
- Verify that the NVIDIA GPU Management daemon has started.
$ /etc/init.d/nvdGpuMgmtDaemon status
- Verify that the NVIDIA kernel driver can successfully communicate with the NVIDIA physical GPUs in your system by running the nvidia-smi command. The nvidia-smi command is described in more detail in NVIDIA System Management Interface nvidia-smi.
Running the nvidia-smi command should produce a listing of the GPUs in your platform.
[root@esxi:~] nvidia-smi
Fri Oct 25 17:56:22 2024
+------------------------------------------------------+
| NVIDIA-SMI 550.127.06 Driver Version: 550.127.06 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla M60 On | 00000000:05:00.0 Off | Off |
| N/A 25C P8 24W / 150W | 13MiB / 8191MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla M60 On | 00000000:06:00.0 Off | Off |
| N/A 24C P8 24W / 150W | 13MiB / 8191MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla M60 On | 00000000:86:00.0 Off | Off |
| N/A 25C P8 25W / 150W | 13MiB / 8191MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla M60 On | 00000000:87:00.0 Off | Off |
| N/A 28C P8 24W / 150W | 13MiB / 8191MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
If nvidia-smi fails to report the expected output for all the NVIDIA GPUs in your system, see Troubleshooting for troubleshooting steps.
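The kernel-module check in step 1 can be scripted. The sketch below isolates the parsing of vmkload_mod output in a function (check_module_loaded is an illustrative name, not an NVIDIA tool), so the logic can be exercised with captured output; the sample text mirrors the example output shown above.

```shell
# Sketch: confirm that the nvidia kernel module appears in `vmkload_mod -l`
# output. The parsing is isolated in a function; the sample input mirrors
# the example output shown in this section.
check_module_loaded() {
  echo "$1" | awk '{print $1}' | grep -qx nvidia
}

sample="Name     Used  Size
vmkernel 100   12000
nvidia   5     8420"

if check_module_loaded "$sample"; then
  echo "nvidia module loaded"
fi
# On the ESXi host: check_module_loaded "$(vmkload_mod -l)" || dmesg | tail
```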
2.9.1.4. Managing the NVIDIA GPU Management Daemon for VMware vSphere
The NVIDIA GPU Management Daemon for VMware vSphere is a service that is controlled through scripts in the /etc/init.d directory. You can use these scripts to start the daemon, stop the daemon, and get its status.
- To start the NVIDIA GPU Management Daemon, enter the following command:
$ /etc/init.d/nvdGpuMgmtDaemon start
- To stop the NVIDIA GPU Management Daemon, enter the following command:
$ /etc/init.d/nvdGpuMgmtDaemon stop
- To get the status of the NVIDIA GPU Management Daemon, enter the following command:
$ /etc/init.d/nvdGpuMgmtDaemon status
2.9.2. Configuring VMware vMotion with vGPU for VMware vSphere
NVIDIA vGPU software supports vGPU migration, which includes VMware vMotion and suspend-resume, for VMs that are configured with vGPU. To enable VMware vMotion with vGPU, an advanced vCenter Server setting must be enabled. However, suspend-resume for VMs that are configured with vGPU is enabled by default.
For details about which VMware vSphere versions, NVIDIA GPUs, and guest OS releases support vGPU migration, see Virtual GPU Software for VMware vSphere Release Notes.
Before configuring VMware vMotion with vGPU for an ESXi host, ensure that the current NVIDIA Virtual GPU Manager for VMware vSphere package is installed on the host.
- Log in to vCenter Server by using the vSphere Web Client.
- In the Hosts and Clusters view, select the vCenter Server instance.
Note:
Ensure that you select the vCenter Server instance, not the vCenter Server VM.
- Click the Configure tab.
- In the Settings section, select Advanced Settings and click Edit.
- In the Edit Advanced vCenter Server Settings window that opens, type vGPU in the search field.
- When the vgpu.hotmigrate.enabled setting appears, set the Enabled option and click OK.
2.9.3. Changing the Default Graphics Type in VMware vSphere
After the vGPU Manager VIB for VMware vSphere is installed, the default graphics type is Shared. To enable vGPU support for VMs in VMware vSphere, you must change the default graphics type to Shared Direct.
If you do not change the default graphics type, VMs to which a vGPU is assigned fail to start and the following error message is displayed:
The amount of graphics resource available in the parent resource pool is insufficient for the operation.
Change the default graphics type before configuring vGPU. Output from the VM console in the VMware vSphere Web Client is not available for VMs that are running vGPU.
Before changing the default graphics type, ensure that the ESXi host is running and that all VMs on the host are powered off.
- Log in to vCenter Server by using the vSphere Web Client.
- In the navigation tree, select your ESXi host and click the Configure tab.
- From the menu, choose Graphics and then click the Host Graphics tab.
- On the Host Graphics tab, click Edit.
Figure 6. Shared default graphics type
- In the Edit Host Graphics Settings dialog box that opens, select Shared Direct and click OK.
Figure 7. Host graphics settings for vGPU
Note:In this dialog box, you can also change the allocation scheme for vGPU-enabled VMs. For more information, see Modifying GPU Allocation Policy on VMware vSphere.
After you click OK, the default graphics type changes to Shared Direct.
- Click the Graphics Devices tab to verify the configured type of each physical GPU on which you want to configure vGPU. The configured type of each physical GPU must be Shared Direct. For any physical GPU for which the configured type is Shared, change the configured type as follows:
- On the Graphics Devices tab, select the physical GPU and click the Edit icon.
Figure 8. Shared graphics type
- In the Edit Graphics Device Settings dialog box that opens, select Shared Direct and click OK.
Figure 9. Graphics device settings for a physical GPU
- Restart the ESXi host, or stop and restart nv-hostengine and, if necessary, the Xorg service on the ESXi host.
To stop and restart the Xorg service and nv-hostengine, perform these steps:
- VMware vSphere releases before 7.0 Update 1 only: Stop the Xorg service.
The Xorg service is not required for graphics devices in NVIDIA vGPU mode.
- Stop nv-hostengine.
[root@esxi:~] nv-hostengine -t
- Wait for 1 second to allow nv-hostengine to stop.
- Start nv-hostengine.
[root@esxi:~] nv-hostengine -d
- VMware vSphere releases before 7.0 Update 1 only: Start the Xorg service.
The Xorg service is not required for graphics devices in NVIDIA vGPU mode.
[root@esxi:~] /etc/init.d/xorg start
- In the Graphics Devices tab of the VMware vCenter Web UI, confirm that the active type and the configured type of each physical GPU are Shared Direct.
Figure 10. Shared direct graphics type
After changing the default graphics type, configure vGPU as explained in Configuring a vSphere VM with NVIDIA vGPU.
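If you prefer the command line, recent ESXi releases also expose host graphics settings through esxcli, where SharedPassthru is the esxcli name for the Shared Direct type. The sketch below only prints each command by default (DRY_RUN=1) so the sequence can be reviewed first; verify that `esxcli graphics host set` is available on your ESXi version before relying on it.

```shell
# Sketch: change the default graphics type from the ESXi command line.
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 to run
# them on the host. The availability of `esxcli graphics host set` and the
# SharedPassthru type name should be verified on your ESXi version.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

run esxcli graphics host set --default-type SharedPassthru
run nv-hostengine -t          # stop nv-hostengine
sleep 1                       # allow nv-hostengine to stop
run nv-hostengine -d          # start nv-hostengine
run esxcli graphics host get  # confirm the configured default type
```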
See also the following topics in the VMware vSphere documentation:
2.9.4. Configuring a vSphere VM with NVIDIA vGPU
To support applications and workloads that are compute or graphics intensive, you can add multiple vGPUs to a single VM.
For details about which VMware vSphere versions and NVIDIA vGPUs support the assignment of multiple vGPUs to a VM, see Virtual GPU Software for VMware vSphere Release Notes.
Output from the VM console in the VMware vSphere Web Client is not available for VMs that are running vGPU. Make sure that you have installed an alternate means of accessing the VM (such as Omnissa Horizon or a VNC server) before you configure vGPU.
The VM console in the vSphere Web Client becomes active again after the vGPU parameters are removed from the VM’s configuration.
How to configure a vSphere VM with a vGPU depends on your VMware vSphere version as explained in the following topics:
After you have configured a vSphere VM with a vGPU, start the VM. VM console in vSphere Web Client is not supported in this vGPU release. Therefore, use Omnissa Horizon or VNC to access the VM’s desktop.
After the VM has booted, install the NVIDIA vGPU software graphics driver as explained in Installing the NVIDIA vGPU Software Graphics Driver.
2.9.4.1. Configuring a vSphere 8 VM with NVIDIA vGPU
- Open the vCenter Web UI.
- In the vCenter Web UI, right-click the VM and choose Edit Settings.
- In the Edit Settings window that opens, configure the vGPUs that you want to add to the VM. Add each vGPU that you want to add to the VM as follows:
- From the ADD NEW DEVICE menu, choose PCI Device.
Figure 11. Command for Adding a PCI Device
- In the Device Selection window that opens, select the type of vGPU you want to configure and click SELECT.
Note:
NVIDIA vGPU software does not support vCS on VMware vSphere. Therefore, C-series vGPU types are not available for selection in the Device Selection window.
Figure 12. VM Device Selections for vGPU
- Back in the Edit Settings window, click OK.
2.9.4.2. Configuring a vSphere 7 VM with NVIDIA vGPU
If you are adding multiple vGPUs to a single VM, perform this task for each vGPU that you want to add to the VM.
- Open the vCenter Web UI.
- In the vCenter Web UI, right-click the VM and choose Edit Settings.
- Click the Virtual Hardware tab.
- In the New device list, select Shared PCI Device and click Add. The PCI device field should be auto-populated with NVIDIA GRID vGPU.
Figure 13. VM settings for vGPU
- From the GPU Profile drop-down menu, choose the type of vGPU you want to configure and click OK.
Note:
NVIDIA vGPU software does not support vCS on VMware vSphere. Therefore, C-series vGPU types are not available for selection from the GPU Profile drop-down menu.
- Ensure that VMs running vGPU have all their memory reserved:
- Select Edit virtual machine settings from the vCenter Web UI.
- Expand the Memory section and click Reserve all guest memory (All locked).
2.9.5. Setting vGPU Plugin Parameters on VMware vSphere
Plugin parameters for a vGPU control the behavior of the vGPU, such as the frame rate limiter (FRL) configuration in frames per second or whether console virtual network computing (VNC) for the vGPU is enabled. The VM to which the vGPU is assigned is started with these parameters. If parameters are set for multiple vGPUs assigned to the same VM, the VM is started with the parameters assigned to each vGPU.
Ensure that the VM to which the vGPU is assigned is powered off.
For each vGPU for which you want to set plugin parameters, perform this task in the vSphere Client. vGPU plugin parameters are PCI pass through configuration parameters in advanced VM attributes.
- In the vSphere Client, browse to the VM to which the vGPU is assigned.
- Context-click the VM and choose Edit Settings.
- In the Edit Settings window, click the VM Options tab.
- From the Advanced drop-down list, select Edit Configuration.
- In the Configuration Parameters dialog box, click Add Row.
- In the Name field, type the parameter name pciPassthruvgpu-id.cfg.parameter, in the Value field type the parameter value, and click OK.
- vgpu-id
-
A positive integer that identifies the vGPU assigned to a VM. For the first vGPU assigned to a VM, vgpu-id is 0. For example, if two vGPUs are assigned to a VM and you are setting a plugin parameter for both vGPUs, set the following parameters:
- pciPassthru0.cfg.parameter
- pciPassthru1.cfg.parameter
- parameter
- The name of the vGPU plugin parameter that you want to set. For example, the name of the vGPU plugin parameter for enabling unified memory is enable_uvm.
To enable unified memory for two vGPUs that are assigned to a VM, set pciPassthru0.cfg.enable_uvm and pciPassthru1.cfg.enable_uvm to 1.
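The advanced attributes set in the vSphere Client are stored in the VM’s .vmx file, so the unified-memory example above corresponds to entries of the following form (a sketch of the resulting configuration, not something you normally edit by hand while the VM is registered):

```
pciPassthru0.cfg.enable_uvm = "1"
pciPassthru1.cfg.enable_uvm = "1"
```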
2.10. Configuring the vGPU Manager for a Linux with KVM Hypervisor
NVIDIA vGPU software supports the following Linux with KVM hypervisors: Red Hat Enterprise Linux with KVM and Ubuntu.
If you are configuring an NVIDIA vGPU that requires a large BAR address space on a UEFI VM, refer to NVIDIA vGPU software graphics driver fails to load on KVM-based hypervisors for a workaround to ensure that BAR resources are mapped into the VM.
This workaround involves setting an experimental QEMU parameter.
2.10.1. Getting the BDF and Domain of a GPU on a Linux with KVM Hypervisor
Sometimes when configuring a physical GPU for use with NVIDIA vGPU software, you must find out which directory in the sysfs file system represents the GPU. This directory is identified by the domain, bus, slot, and function of the GPU.
For more information about the directory in the sysfs file system that represents a physical GPU, see NVIDIA vGPU Information in the sysfs File System.
- Obtain the PCI device bus/device/function (BDF) of the physical GPU.
# lspci | grep NVIDIA
The NVIDIA GPUs listed in this example have the PCI device BDFs 06:00.0 and 07:00.0.
# lspci | grep NVIDIA
06:00.0 VGA compatible controller: NVIDIA Corporation GM204GL [Tesla M10] (rev a1)
07:00.0 VGA compatible controller: NVIDIA Corporation GM204GL [Tesla M10] (rev a1)
- Obtain the full identifier of the GPU from its PCI device BDF.
# virsh nodedev-list --cap pci | grep transformed-bdf
- transformed-bdf
- The PCI device BDF of the GPU with the colon and the period replaced with underscores, for example, 06_00_0.
This example obtains the full identifier of the GPU with the PCI device BDF 06:00.0.
# virsh nodedev-list --cap pci | grep 06_00_0
pci_0000_06_00_0
- Obtain the domain, bus, slot, and function of the GPU from the full identifier of the GPU.
# virsh nodedev-dumpxml full-identifier | egrep 'domain|bus|slot|function'
- full-identifier
- The full identifier of the GPU that you obtained in the previous step, for example, pci_0000_06_00_0.
This example obtains the domain, bus, slot, and function of the GPU with the PCI device BDF 06:00.0.
# virsh nodedev-dumpxml pci_0000_06_00_0 | egrep 'domain|bus|slot|function'
  <domain>0x0000</domain>
  <bus>0x06</bus>
  <slot>0x00</slot>
  <function>0x0</function>
    <address domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
2.10.2. Preparing the Virtual Function for an NVIDIA vGPU that Supports SR-IOV on a Linux with KVM Hypervisor
An NVIDIA vGPU that supports SR-IOV resides on a physical GPU that supports SR-IOV, such as a GPU based on the NVIDIA Ampere architecture. Before creating an NVIDIA vGPU on a GPU that supports SR-IOV, you must enable the virtual functions of the GPU and obtain the domain, bus, slot, and function of the specific virtual function on which you want to create the vGPU.
Before performing this task, ensure that the GPU is not being used by any other processes, such as CUDA applications, monitoring applications, or the nvidia-smi command.
- Enable the virtual functions for the physical GPU in the sysfs file system.
Note:
The virtual functions for the physical GPU in the sysfs file system are disabled after the hypervisor host is rebooted or if the driver is reloaded or upgraded.
Use only the custom script sriov-manage provided by NVIDIA vGPU software for this purpose. Do not try to enable the virtual function for the GPU by any other means.
# /usr/lib/nvidia/sriov-manage -e domain:bus:slot.function
- domain
- bus
- slot
- function
- The domain, bus, slot, and function of the GPU, without the 0x prefix.
Note: Only one mdev device file can be created on a virtual function.
This example enables the virtual functions for the GPU with the domain 0000, bus 41, slot 00, and function 0.
# /usr/lib/nvidia/sriov-manage -e 0000:41:00.0
- Obtain the domain, bus, slot, and function of the available virtual functions on the GPU.
# ls -l /sys/bus/pci/devices/domain\:bus\:slot.function/ | grep virtfn
- domain
- bus
- slot
- function
- The domain, bus, slot, and function of the GPU, without the 0x prefix.
This example shows the output of this command for a physical GPU with the domain 0000, bus 41, slot 00, and function 0.
# ls -l /sys/bus/pci/devices/0000:41:00.0/ | grep virtfn
lrwxrwxrwx. 1 root root 0 Jul 16 04:42 virtfn0 -> ../0000:41:00.4
lrwxrwxrwx. 1 root root 0 Jul 16 04:42 virtfn1 -> ../0000:41:00.5
lrwxrwxrwx. 1 root root 0 Jul 16 04:42 virtfn10 -> ../0000:41:01.6
lrwxrwxrwx. 1 root root 0 Jul 16 04:42 virtfn11 -> ../0000:41:01.7
lrwxrwxrwx. 1 root root 0 Jul 16 04:42 virtfn12 -> ../0000:41:02.0
lrwxrwxrwx. 1 root root 0 Jul 16 04:42 virtfn13 -> ../0000:41:02.1
lrwxrwxrwx. 1 root root 0 Jul 16 04:42 virtfn14 -> ../0000:41:02.2
lrwxrwxrwx. 1 root root 0 Jul 16 04:42 virtfn15 -> ../0000:41:02.3
lrwxrwxrwx. 1 root root 0 Jul 16 04:42 virtfn16 -> ../0000:41:02.4
lrwxrwxrwx. 1 root root 0 Jul 16 04:42 virtfn17 -> ../0000:41:02.5
lrwxrwxrwx. 1 root root 0 Jul 16 04:42 virtfn18 -> ../0000:41:02.6
lrwxrwxrwx. 1 root root 0 Jul 16 04:42 virtfn19 -> ../0000:41:02.7
lrwxrwxrwx. 1 root root 0 Jul 16 04:42 virtfn2 -> ../0000:41:00.6
lrwxrwxrwx. 1 root root 0 Jul 16 04:42 virtfn20 -> ../0000:41:03.0
lrwxrwxrwx. 1 root root 0 Jul 16 04:42 virtfn21 -> ../0000:41:03.1
lrwxrwxrwx. 1 root root 0 Jul 16 04:42 virtfn22 -> ../0000:41:03.2
lrwxrwxrwx. 1 root root 0 Jul 16 04:42 virtfn23 -> ../0000:41:03.3
lrwxrwxrwx. 1 root root 0 Jul 16 04:42 virtfn24 -> ../0000:41:03.4
lrwxrwxrwx. 1 root root 0 Jul 16 04:42 virtfn25 -> ../0000:41:03.5
lrwxrwxrwx. 1 root root 0 Jul 16 04:42 virtfn26 -> ../0000:41:03.6
lrwxrwxrwx. 1 root root 0 Jul 16 04:42 virtfn27 -> ../0000:41:03.7
lrwxrwxrwx. 1 root root 0 Jul 16 04:42 virtfn28 -> ../0000:41:04.0
lrwxrwxrwx. 1 root root 0 Jul 16 04:42 virtfn29 -> ../0000:41:04.1
lrwxrwxrwx. 1 root root 0 Jul 16 04:42 virtfn3 -> ../0000:41:00.7
lrwxrwxrwx. 1 root root 0 Jul 16 04:42 virtfn30 -> ../0000:41:04.2
lrwxrwxrwx. 1 root root 0 Jul 16 04:42 virtfn31 -> ../0000:41:04.3
lrwxrwxrwx. 1 root root 0 Jul 16 04:42 virtfn4 -> ../0000:41:01.0
lrwxrwxrwx. 1 root root 0 Jul 16 04:42 virtfn5 -> ../0000:41:01.1
lrwxrwxrwx. 1 root root 0 Jul 16 04:42 virtfn6 -> ../0000:41:01.2
lrwxrwxrwx. 1 root root 0 Jul 16 04:42 virtfn7 -> ../0000:41:01.3
lrwxrwxrwx. 1 root root 0 Jul 16 04:42 virtfn8 -> ../0000:41:01.4
lrwxrwxrwx. 1 root root 0 Jul 16 04:42 virtfn9 -> ../0000:41:01.5
- Choose the available virtual function on which you want to create the vGPU and note its domain, bus, slot, and function.
Since 17.4: On systems configured with NVLink, the sriov-manage script might not be able to enable the virtual functions for the physical GPU because the initialization of the Virtual GPU Manager has not completed. In this situation, the sriov-manage script writes the following error message to the log file on the hypervisor host:
NVRM: Timeout occurred in event processing by vgpu_mgr daemon
If this error message appears in the log file, try again later to run the sriov-manage script to enable the virtual functions for the physical GPU.
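The virtfn listing in step 2 can be condensed into a small helper that prints each virtual function together with its BDF. The helper name list_vf_bdfs is an illustration; the sketch below exercises it against a mock directory so the logic can be run anywhere, and on a hypervisor host you would pass a real path such as /sys/bus/pci/devices/0000:41:00.0.

```shell
# Sketch: list each virtfn link under a GPU's sysfs directory together with
# the BDF of the virtual function it points to. Demonstrated against a mock
# directory; on a hypervisor host, pass the GPU's real sysfs path.
list_vf_bdfs() {
  for link in "$1"/virtfn*; do
    [ -L "$link" ] || continue
    printf '%s %s\n' "$(basename "$link")" "$(basename "$(readlink "$link")")"
  done | sort -V
}

GPU_DIR=$(mktemp -d)
ln -s ../0000:41:00.4 "$GPU_DIR/virtfn0"
ln -s ../0000:41:00.5 "$GPU_DIR/virtfn1"
list_vf_bdfs "$GPU_DIR"
```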
2.10.3. Creating an NVIDIA vGPU on a Linux with KVM Hypervisor
For each vGPU that you want to create, perform this task in a Linux command shell on the Linux with KVM hypervisor host.
Before you begin, ensure that you have the domain, bus, slot, and function of the GPU on which you are creating the vGPU. For instructions, see Getting the BDF and Domain of a GPU on a Linux with KVM Hypervisor.
How to create an NVIDIA vGPU on a Linux with KVM hypervisor depends on the following factors:
- Whether the NVIDIA vGPU supports single root I/O virtualization (SR-IOV)
- Whether the hypervisor uses a vendor-specific Virtual Function I/O (VFIO) framework for an NVIDIA vGPU that supports SR-IOV
Note:
A hypervisor that uses a vendor-specific VFIO framework uses it only for an NVIDIA vGPU that supports SR-IOV. The hypervisor still uses the mediated VFIO mdev driver framework for a legacy NVIDIA vGPU. A vendor-specific VFIO framework does not support the mediated VFIO mdev driver framework.
For GPUs that support SR-IOV, use of a vendor-specific VFIO framework is introduced in Ubuntu release 24.04.
To determine which instructions to follow for the NVIDIA vGPU that you are creating, refer to the following table.
| NVIDIA vGPU Type | VFIO Framework | Instructions |
|---|---|---|
| Legacy: SR-IOV not supported | mdev | Creating a Legacy NVIDIA vGPU on a Linux with KVM Hypervisor |
| SR-IOV supported | mdev | Creating an NVIDIA vGPU that Supports SR-IOV on a Linux with KVM Hypervisor |
| SR-IOV supported | Vendor specific | Creating an NVIDIA vGPU on a Linux with KVM Hypervisor that Uses a Vendor-Specific VFIO Framework |
Since 17.4: On systems configured with NVLink, the vGPU might not be created because the initialization of the Virtual GPU Manager has not completed. In this situation, the following error message is written to the log file on the hypervisor host:
NVRM: kvgpumgrCreateRequestVgpu: GPU is not initialized yet
If this error message appears in the log file, try again later to create the vGPU.
2.10.3.1. Creating a Legacy NVIDIA vGPU on a Linux with KVM Hypervisor
A legacy NVIDIA vGPU does not support SR-IOV.
- Change to the mdev_supported_types directory for the physical GPU.
# cd /sys/class/mdev_bus/domain\:bus\:slot.function/mdev_supported_types/
- domain
- bus
- slot
- function
- The domain, bus, slot, and function of the GPU, without the 0x prefix.
This example changes to the mdev_supported_types directory for the GPU with the domain 0000 and PCI device BDF 06:00.0.
# cd /sys/bus/pci/devices/0000\:06\:00.0/mdev_supported_types/
- Find out which subdirectory of mdev_supported_types contains registration information for the vGPU type that you want to create.
# grep -l "vgpu-type" nvidia-*/name
- vgpu-type
- The vGPU type, for example, M10-2Q.
This example shows that the registration information for the M10-2Q vGPU type is contained in the nvidia-41 subdirectory of mdev_supported_types.
# grep -l "M10-2Q" nvidia-*/name
nvidia-41/name
- Confirm that you can create an instance of the vGPU type on the physical GPU.
# cat subdirectory/available_instances
- subdirectory
- The subdirectory that you found in the previous step, for example, nvidia-41.
The number of available instances must be at least 1. If the number is 0, either an instance of another vGPU type already exists on the physical GPU, or the maximum number of allowed instances has already been created.
This example shows that four more instances of the M10-2Q vGPU type can be created on the physical GPU.
# cat nvidia-41/available_instances
4
- Generate a correctly formatted universally unique identifier (UUID) for the vGPU.
# uuidgen
aa618089-8b16-4d01-a136-25a0f3c73123
- Write the UUID that you obtained in the previous step to the create file in the registration information directory for the vGPU type that you want to create.
# echo "uuid" > subdirectory/create
- uuid
- The UUID that you generated in the previous step, which will become the UUID of the vGPU that you want to create.
- subdirectory
- The registration information directory for the vGPU type that you want to create, for example, nvidia-41.
This example creates an instance of the M10-2Q vGPU type with the UUID aa618089-8b16-4d01-a136-25a0f3c73123.
# echo "aa618089-8b16-4d01-a136-25a0f3c73123" > nvidia-41/create
An mdev device file for the vGPU is added to the parent physical device directory of the vGPU. The vGPU is identified by its UUID. The /sys/bus/mdev/devices/ directory contains a symbolic link to the mdev device file.
- Make the mdev device file that you created to represent the vGPU persistent.
# mdevctl define --auto --uuid uuid
- uuid
- The UUID that you specified in the previous step for the vGPU that you are creating.
Note:Not all Linux with KVM hypervisor releases include the mdevctl command. If your release does not include the mdevctl command, you can use standard features of the operating system to automate the re-creation of this device file when the host is booted. For example, you can write a custom script that is executed when the host is rebooted.
- Confirm that the vGPU was created.
- Confirm that the /sys/bus/mdev/devices/ directory contains the mdev device file for the vGPU.
# ls -l /sys/bus/mdev/devices/
total 0
lrwxrwxrwx. 1 root root 0 Nov 24 13:33 aa618089-8b16-4d01-a136-25a0f3c73123 -> ../../../devices/pci0000:00/0000:00:03.0/0000:03:00.0/0000:04:09.0/0000:06:00.0/aa618089-8b16-4d01-a136-25a0f3c73123
- If your release includes the mdevctl command, list the active mediated devices on the hypervisor host.
# mdevctl list
aa618089-8b16-4d01-a136-25a0f3c73123 0000:06:00.0 nvidia-41
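Steps 2 and 3 of the procedure above amount to a lookup in the mdev_supported_types tree. The sketch below exercises that lookup against a mock directory tree so it can be run anywhere; on a hypervisor host you would run the same grep and cat inside the real mdev_supported_types directory. The directory names and type strings follow this section's M10-2Q example.

```shell
# Sketch: find the registration directory for a vGPU type and check its
# available instances, using a mock mdev_supported_types tree. On a real
# host, cd into the GPU's mdev_supported_types directory instead.
ROOT=$(mktemp -d)
mkdir -p "$ROOT/nvidia-41" "$ROOT/nvidia-42"
echo "GRID M10-2Q" > "$ROOT/nvidia-41/name"
echo "GRID M10-4Q" > "$ROOT/nvidia-42/name"
echo 4 > "$ROOT/nvidia-41/available_instances"

cd "$ROOT"
subdir=$(dirname "$(grep -l "M10-2Q" nvidia-*/name)")
echo "type directory: $subdir"
echo "available instances: $(cat "$subdir/available_instances")"
```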
2.10.3.2. Creating an NVIDIA vGPU that Supports SR-IOV on a Linux with KVM Hypervisor
An NVIDIA vGPU that supports SR-IOV resides on a physical GPU that supports SR-IOV, such as a GPU based on the NVIDIA Ampere architecture.
Before performing this task, ensure that the virtual function on which you want to create the vGPU has been prepared as explained in Preparing the Virtual Function for an NVIDIA vGPU that Supports SR-IOV on a Linux with KVM Hypervisor.
If you want to support vGPUs with different amounts of frame buffer, also ensure that the GPU has been put into mixed-size mode as explained in Preparing the Virtual Function for an NVIDIA vGPU that Supports SR-IOV on a Linux with KVM Hypervisor.
- Change to the mdev_supported_types directory for the virtual function on which you want to create the vGPU.
# cd /sys/class/mdev_bus/domain\:bus\:vf-slot.v-function/mdev_supported_types/
- domain
- bus
- The domain and bus of the GPU, without the 0x prefix.
- vf-slot
- v-function
- The slot and function of the virtual function that you noted in Preparing the Virtual Function for an NVIDIA vGPU that Supports SR-IOV on a Linux with KVM Hypervisor.
This example changes to the mdev_supported_types directory for the first virtual function (virtfn0) for the GPU with the domain 0000 and bus 41. The first virtual function (virtfn0) has slot 00 and function 4.
# cd /sys/class/mdev_bus/0000\:41\:00.4/mdev_supported_types
- Find out which subdirectory of mdev_supported_types contains registration information for the vGPU type that you want to create.
# grep -l "vgpu-type" nvidia-*/name
- vgpu-type
- The vGPU type, for example, A40-2Q.
This example shows that the registration information for the A40-2Q vGPU type is contained in the nvidia-558 subdirectory of mdev_supported_types.
# grep -l "A40-2Q" nvidia-*/name
nvidia-558/name
- Confirm that you can create an instance of the vGPU type on the virtual function.
# cat subdirectory/available_instances
- subdirectory
- The subdirectory that you found in the previous step, for example, nvidia-558.
The number of available instances must be 1. If the number is 0, a vGPU has already been created on the virtual function. Only one instance of any vGPU type can be created on a virtual function.
This example shows that an instance of the A40-2Q vGPU type can be created on the virtual function.
# cat nvidia-558/available_instances
1
- Generate a correctly formatted universally unique identifier (UUID) for the vGPU.
# uuidgen
aa618089-8b16-4d01-a136-25a0f3c73123
- Write the UUID that you obtained in the previous step to the create file in the registration information directory for the vGPU type that you want to create.
# echo "uuid" > subdirectory/create
- uuid
- The UUID that you generated in the previous step, which will become the UUID of the vGPU that you want to create.
- subdirectory
- The registration information directory for the vGPU type that you want to create, for example, nvidia-558.
This example creates an instance of the A40-2Q vGPU type with the UUID aa618089-8b16-4d01-a136-25a0f3c73123.
# echo "aa618089-8b16-4d01-a136-25a0f3c73123" > nvidia-558/create
An mdev device file for the vGPU is added to the parent virtual function directory of the vGPU. The vGPU is identified by its UUID.
- Time-sliced vGPUs only: Make the mdev device file that you created to represent the vGPU persistent.
# mdevctl define --auto --uuid uuid
- uuid
- The UUID that you specified in the previous step for the vGPU that you are creating.
Note:
- If you are using a GPU that supports SR-IOV, the mdev device file persists after a host reboot only if you enable the virtual functions for the GPU as explained in Preparing the Virtual Function for an NVIDIA vGPU that Supports SR-IOV on a Linux with KVM Hypervisor before rebooting any VM that is configured with a vGPU on the GPU.
- You cannot use the mdevctl command to make the mdev device file for a MIG-backed vGPU persistent. The mdev device file for a MIG-backed vGPU is not retained after the host is rebooted because MIG instances are no longer available.
- Not all Linux with KVM hypervisor releases include the mdevctl command. If your release does not include the mdevctl command, you can use standard features of the operating system to automate the re-creation of this device file when the host is booted. For example, you can write a custom script that is executed when the host is rebooted.
- Confirm that the vGPU was created.
- Confirm that the /sys/bus/mdev/devices/ directory contains a symbolic link to the mdev device file.
# ls -l /sys/bus/mdev/devices/
total 0
lrwxrwxrwx. 1 root root 0 Jul 16 05:57 aa618089-8b16-4d01-a136-25a0f3c73123 -> ../../../devices/pci0000:40/0000:40:01.1/0000:41:00.4/aa618089-8b16-4d01-a136-25a0f3c73123
- If your release includes the mdevctl command, list the active mediated devices on the hypervisor host.
# mdevctl list
aa618089-8b16-4d01-a136-25a0f3c73123 0000:06:00.0 nvidia-558
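The UUID-generation and create-file steps above can be wrapped in one helper. This is a minimal sketch under stated assumptions: the function name is illustrative, and the vGPU type directory (for example .../mdev_supported_types/nvidia-558) is passed in so the helper can be exercised against a mock tree as well as a real sysfs.

```shell
# Minimal sketch: create a time-sliced vGPU by writing a fresh UUID to the
# "create" file of the given vGPU type directory.
create_vgpu() {
  typedir="$1"                                  # e.g. .../mdev_supported_types/nvidia-558
  uuid="$(cat /proc/sys/kernel/random/uuid 2>/dev/null || uuidgen)"
  echo "$uuid" > "$typedir/create" || return 1  # registers the vGPU
  echo "$uuid"                                  # report the UUID to the caller
}
```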
2.10.3.3. Creating an NVIDIA vGPU on a Linux with KVM Hypervisor that Uses a Vendor-Specific VFIO Framework
A hypervisor uses a vendor-specific VFIO framework only for an NVIDIA vGPU that supports SR-IOV. For a legacy NVIDIA vGPU, the hypervisor uses the standard VFIO framework. A vendor-specific VFIO framework does not support the mediated VFIO mdev
driver framework.
For GPUs that support SR-IOV, use of a vendor-specific VFIO framework is introduced in Ubuntu release 24.04.
Before performing this task, ensure that the virtual function on which you want to create the vGPU has been prepared as explained in Preparing the Virtual Function for an NVIDIA vGPU that Supports SR-IOV on a Linux with KVM Hypervisor.
If you want to support vGPUs with different amounts of frame buffer, also ensure that the GPU has been put into mixed-size mode as explained in Preparing the Virtual Function for an NVIDIA vGPU that Supports SR-IOV on a Linux with KVM Hypervisor.
- Change to the directory in the sysfs file system that contains the files for vGPU management on the virtual function on which you want to create the vGPU.
# cd /sys/bus/pci/devices/domain\:bus\:vf-slot.v-function/nvidia
- domain
- bus
-
The domain and bus of the GPU, without the
0x
prefix. - vf-slot
- v-function
- The slot and function of the virtual function that you noted in Preparing the Virtual Function for an NVIDIA vGPU that Supports SR-IOV on a Linux with KVM Hypervisor.
This example changes to the nvidia directory for the first virtual function (virtfn0) for the GPU with the domain 0000 and bus 3d. The first virtual function (virtfn0) has slot 00 and function 4.
# cd /sys/bus/pci/devices/0000\:3d\:00.4/nvidia
- Confirm that the directory contains the files for vGPU management on the virtual function, namely creatable_vgpu_types and current_vgpu_type.
# ll
-r--r--r-- 1 root root 4096 Aug 3 00:39 creatable_vgpu_types
-rw-r--r-- 1 root root 4096 Aug 3 00:39 current_vgpu_type
...
- Confirm that a vGPU has not already been created on the virtual function.
# cat current_vgpu_type
0
If the current vGPU type is 0, a vGPU has not already been created on the virtual function.
Note:If the current vGPU type is not 0, a vGPU cannot be created on the virtual function because a vGPU has already been created on it and only one vGPU can be created on a virtual function.
- Determine the NVIDIA vGPU types that can be created on the virtual function and the integer ID that represents each vGPU type in the sysfs file system.
# cat creatable_vgpu_types
NVIDIA A40-1Q 557
NVIDIA A40-2Q 558
NVIDIA A40-3Q 559
NVIDIA A40-4Q 560
NVIDIA A40-6Q 561
- Write the ID that represents the type of the NVIDIA vGPU that you want to create to the current_vgpu_type file.
# echo vgpu-type-id > current_vgpu_type
- vgpu-type-id
- The ID that represents the type of the NVIDIA vGPU that you want to create in the sysfs file system.
Note: You must specify a valid ID. If you specify an invalid ID, the write operation fails and the current vGPU type is set to 0.
This example creates an instance of the A40-4Q vGPU type.
# echo 560 > current_vgpu_type
- Confirm that the current vGPU type on the virtual function matches the type of the vGPU that you created in the previous step.
# cat current_vgpu_type
560
- Confirm that the creatable_vgpu_types file is empty, signifying that no vGPUs can be created on the virtual function.
# cat creatable_vgpu_types
To reconfigure the vGPU on a virtual function, the existing vGPU must first be deleted as explained in Deleting a vGPU on a Linux with KVM Hypervisor that Uses a Vendor-Specific VFIO Framework.
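The creation steps above (check that current_vgpu_type is 0, write the type ID, read it back) can be sketched as a single guarded function. The function name is illustrative, and the nvidia directory for the virtual function is parameterized so the logic can be checked against a mock tree. Note one assumption: on a real sysfs, writing an invalid ID fails and leaves the type at 0, which a plain mock file cannot reproduce.

```shell
# Sketch: create a vGPU on a virtual function that uses the vendor-specific
# VFIO framework. vfdir is the .../<vf-bdf>/nvidia directory.
create_vf_vgpu() {
  vfdir="$1" type_id="$2"
  # Only one vGPU can exist on a virtual function; refuse if one is present.
  [ "$(cat "$vfdir/current_vgpu_type")" = "0" ] || return 1
  echo "$type_id" > "$vfdir/current_vgpu_type"
  # Confirm that the write took effect.
  [ "$(cat "$vfdir/current_vgpu_type")" = "$type_id" ]
}
```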
2.10.4. Adding One or More vGPUs to a Linux with KVM Hypervisor VM
To support applications and workloads that are compute or graphics intensive, you can add multiple vGPUs to a single VM.
For details about which hypervisor versions and NVIDIA vGPUs support the assignment of multiple vGPUs to a VM, see Virtual GPU Software for Red Hat Enterprise Linux with KVM Release Notes and Virtual GPU Software for Ubuntu Release Notes.
Ensure that the following prerequisites are met:
- The VM to which you want to add the vGPUs is shut down.
- The vGPUs that you want to add have been created as explained in Creating an NVIDIA vGPU on a Linux with KVM Hypervisor.
You can add vGPUs to a Linux with KVM hypervisor VM by using any of the following tools:
- The virsh command
- The QEMU command line
After adding vGPUs to a Linux with KVM hypervisor VM, start the VM.
# virsh start vm-name
- vm-name
- The name of the VM that you added the vGPUs to.
After the VM has booted, install the NVIDIA vGPU software graphics driver as explained in Installing the NVIDIA vGPU Software Graphics Driver.
2.10.4.1. Adding One or More vGPUs to a Linux with KVM Hypervisor VM by Using virsh
- In virsh, open for editing the XML file of the VM that you want to add the vGPU to.
# virsh edit vm-name
- vm-name
- The name of the VM that you want to add the vGPUs to.
- For each vGPU that you want to add to the VM, add a device entry in the form of an address element inside the source element to add the vGPU to the guest VM. The content of the device entry depends on whether the hypervisor uses a vendor-specific VFIO framework for an NVIDIA vGPU that supports SR-IOV.
For GPUs that support SR-IOV, use of a vendor-specific VFIO framework is introduced in Ubuntu release 24.04.
-
For a hypervisor that uses the mdev VFIO framework, add a device entry that identifies the vGPU through its UUID as follows:
<device>
...
<hostdev mode='subsystem' type='mdev' model='vfio-pci'>
  <source>
    <address uuid='uuid'/>
  </source>
</hostdev>
</device>
- uuid
- The UUID that was assigned to the vGPU when the vGPU was created.
This example adds a device entry for the vGPU with the UUID aa618089-8b16-4d01-a136-25a0f3c73123.
<device>
...
<hostdev mode='subsystem' type='mdev' model='vfio-pci'>
  <source>
    <address uuid='aa618089-8b16-4d01-a136-25a0f3c73123'/>
  </source>
</hostdev>
</device>
This example adds device entries for two vGPUs with the following UUIDs:
-
c73f1fa6-489e-4834-9476-d70dabd98c40
-
3b356d38-854e-48be-b376-00c72c7d119c
<device>
...
<hostdev mode='subsystem' type='mdev' model='vfio-pci'>
  <source>
    <address uuid='c73f1fa6-489e-4834-9476-d70dabd98c40'/>
  </source>
</hostdev>
<hostdev mode='subsystem' type='mdev' model='vfio-pci'>
  <source>
    <address uuid='3b356d38-854e-48be-b376-00c72c7d119c'/>
  </source>
</hostdev>
</device>
-
For a hypervisor that uses a vendor-specific VFIO framework, add a device entry that identifies the vGPU through the virtual function on which the vGPU is created as follows:
<hostdev mode='subsystem' type='pci' managed='no'>
  <source>
    <address domain='domain' bus='bus' slot='vf-slot' function='v-function'/>
  </source>
</hostdev>
- domain
- bus
-
The domain and bus of the GPU, including the
0x
prefix. - vf-slot
- v-function
- The slot and function of the virtual function that you noted in Preparing the Virtual Function for an NVIDIA vGPU that Supports SR-IOV on a Linux with KVM Hypervisor.
Note: A vGPU is supported only in unmanaged libvirt mode. Therefore, ensure that in the hostdev element, the managed attribute is set to no.
This example adds a device entry for the vGPU that is created on the virtual function 0000:3d:00.4.
<device>
...
<hostdev mode='subsystem' type='pci' managed='no'>
  <source>
    <address domain='0x0000' bus='0x3d' slot='0x00' function='0x4'/>
  </source>
</hostdev>
</device>
This example adds device entries for two vGPUs that are created on the following virtual functions:
-
0000:3d:00.4
-
0000:3d:00.5
<device>
...
<hostdev mode='subsystem' type='pci' managed='no'>
  <source>
    <address domain='0x0000' bus='0x3d' slot='0x00' function='0x4'/>
  </source>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='no'>
  <source>
    <address domain='0x0000' bus='0x3d' slot='0x00' function='0x5'/>
  </source>
</hostdev>
</device>
-
- Optional: Add a video element that contains a model element in which the type attribute is set to none.
<video>
  <model type='none'/>
</video>
Adding this video element prevents the default video device that libvirt adds from being loaded into the VM. If you don't add this video element, you must configure the Xorg server or your remoting solution to load only the vGPU devices you added and not the default video device.
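When several vGPUs are added by UUID, generating the hostdev elements mechanically avoids copy-paste errors. The following is a minimal sketch; the function name is invented for illustration, and the output is meant to be pasted into the VM's libvirt XML.

```shell
# Sketch: emit one mdev hostdev element per vGPU UUID given as an argument.
mdev_hostdev_xml() {
  for uuid in "$@"; do
    printf "<hostdev mode='subsystem' type='mdev' model='vfio-pci'>\n"
    printf "  <source>\n    <address uuid='%s'/>\n  </source>\n" "$uuid"
    printf "</hostdev>\n"
  done
}
```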
2.10.4.2. Adding One or More vGPUs to a Linux with KVM Hypervisor VM by Using the QEMU Command Line
This task involves adding options to the QEMU command line that identify the vGPUs that you want to add and the VM to which you want to add them.
- For each vGPU that you want to add to the VM, add one -device option that identifies the vGPU.
The format of each -device option depends on whether the hypervisor uses a vendor-specific VFIO framework for an NVIDIA vGPU that supports SR-IOV.
For GPUs that support SR-IOV, use of a vendor-specific VFIO framework is introduced in Ubuntu release 24.04.
-
For each vGPU on a hypervisor that uses the mdev VFIO framework, add a -device option that identifies the vGPU through its UUID.
-device vfio-pci,sysfsdev=/sys/bus/mdev/devices/vgpu-uuid
- vgpu-uuid
- The UUID that was assigned to the vGPU when the vGPU was created.
-
For each vGPU on a hypervisor that uses a vendor-specific VFIO framework, add a -device option that identifies the vGPU through the virtual function on which the vGPU is created.
-device vfio-pci,sysfsdev=/sys/bus/pci/devices/domain\:bus\:vf-slot.v-function/
- domain
- bus
-
The domain and bus of the GPU, without the
0x
prefix. - vf-slot
- v-function
- The slot and function of the virtual function that you noted in Preparing the Virtual Function for an NVIDIA vGPU that Supports SR-IOV on a Linux with KVM Hypervisor.
-
- Add a -uuid option to specify the VM to which you want to add the vGPUs.
-uuid vm-uuid
- vm-uuid
- The UUID that was assigned to the VM when the VM was created.
Adding One vGPU to a VM on a Hypervisor that Uses the mdev
VFIO Framework
This example adds the vGPU with the UUID aa618089-8b16-4d01-a136-25a0f3c73123
to the VM with the UUID ebb10a6e-7ac9-49aa-af92-f56bb8c65893
.
-device vfio-pci,sysfsdev=/sys/bus/mdev/devices/aa618089-8b16-4d01-a136-25a0f3c73123 \
-uuid ebb10a6e-7ac9-49aa-af92-f56bb8c65893
Adding Two vGPUs to a VM on a Hypervisor that Uses the mdev
VFIO Framework
This example adds device entries for two vGPUs with the following UUIDs:
-
676428a0-2445-499f-9bfd-65cd4a9bd18f
-
6c5954b8-5bc1-4769-b820-8099fe50aaba
The entries are added to the VM with the UUID ec5e8ee0-657c-4db6-8775-da70e332c67e
.
-device vfio-pci,sysfsdev=/sys/bus/mdev/devices/676428a0-2445-499f-9bfd-65cd4a9bd18f \
-device vfio-pci,sysfsdev=/sys/bus/mdev/devices/6c5954b8-5bc1-4769-b820-8099fe50aaba \
-uuid ec5e8ee0-657c-4db6-8775-da70e332c67e
Adding One vGPU to a VM on a Hypervisor that Uses a Vendor-Specific VFIO Framework
This example adds the vGPU that is created on the virtual function 0000:3d:00.4
to the VM with the UUID ebb10a6e-7ac9-49aa-af92-f56bb8c65893
.
-device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000\:3d\:00.4 \
-uuid ebb10a6e-7ac9-49aa-af92-f56bb8c65893
Adding Two vGPUs to a VM on a Hypervisor that Uses a Vendor-Specific VFIO Framework
This example adds device entries for two vGPUs that are created on the following virtual functions:
-
0000:3d:00.4
-
0000:3d:00.5
The entries are added to the VM with the UUID ec5e8ee0-657c-4db6-8775-da70e332c67e
.
-device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000\:3d\:00.4 \
-device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000\:3d\:00.5 \
-uuid ec5e8ee0-657c-4db6-8775-da70e332c67e
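The -device and -uuid options above follow a fixed pattern, so they can be assembled from a VM UUID and a list of vGPU UUIDs. This is a minimal sketch with an illustrative function name; its output is intended to be appended to an existing QEMU invocation.

```shell
# Sketch: build the QEMU options for a set of mdev vGPU UUIDs and a VM UUID.
qemu_vgpu_args() {
  vm_uuid="$1"; shift
  for u in "$@"; do
    # One -device option per vGPU, identified by its mdev device file.
    printf -- '-device vfio-pci,sysfsdev=/sys/bus/mdev/devices/%s ' "$u"
  done
  printf -- '-uuid %s\n' "$vm_uuid"
}
```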
2.10.5. Setting vGPU Plugin Parameters on a Linux with KVM Hypervisor
Plugin parameters for a vGPU control the behavior of the vGPU, such as the frame rate limiter (FRL) configuration in frames per second or whether console virtual network computing (VNC) for the vGPU is enabled. The VM to which the vGPU is assigned is started with these parameters. If parameters are set for multiple vGPUs assigned to the same VM, the VM is started with the parameters assigned to each vGPU.
For each vGPU for which you want to set plugin parameters, perform this task in a Linux command shell on the Linux with KVM hypervisor host.
- Change to the directory in the sysfs file system that contains the vgpu_params file for the vGPU for which you want to set vGPU plugin parameters.
The directory depends on whether the hypervisor uses a vendor-specific VFIO framework for an NVIDIA vGPU that supports SR-IOV.
For GPUs that support SR-IOV, use of a vendor-specific VFIO framework is introduced in Ubuntu release 24.04.
-
For a hypervisor that uses the mdev VFIO framework, change to the nvidia subdirectory of the mdev device directory that represents the vGPU.
# cd /sys/bus/mdev/devices/uuid/nvidia
- uuid
-
The UUID of the vGPU, for example,
aa618089-8b16-4d01-a136-25a0f3c73123
.
-
For a hypervisor that uses a vendor-specific VFIO framework, change to the directory in the sysfs file system that contains the files for vGPU management on the virtual function on which the vGPU was created.
# cd /sys/bus/pci/devices/domain\:bus\:vf-slot.v-function/nvidia
- domain
- bus
-
The domain and bus of the GPU, without the
0x
prefix. - vf-slot
- v-function
- The slot and function of the virtual function that you noted in Preparing the Virtual Function for an NVIDIA vGPU that Supports SR-IOV on a Linux with KVM Hypervisor.
This example changes to the nvidia directory for the first virtual function (virtfn0) for the GPU with the domain 0000 and bus 3d. The first virtual function (virtfn0) has slot 00 and function 4.
# cd /sys/bus/pci/devices/0000\:3d\:00.4/nvidia
-
- Write the plugin parameters that you want to set to the vgpu_params file in the directory that you changed to in the previous step.
# echo "plugin-config-params" > vgpu_params
- plugin-config-params
- A comma-separated list of parameter-value pairs, where each pair is of the form parameter-name=value.
This example disables frame rate limiting and console VNC for a vGPU.
# echo "frame_rate_limiter=0, disable_vnc=1" > vgpu_params
This example enables unified memory for a vGPU.
# echo "enable_uvm=1" > vgpu_params
This example enables NVIDIA CUDA Toolkit debuggers for a vGPU.
# echo "enable_debugging=1" > vgpu_params
This example enables NVIDIA CUDA Toolkit profilers for a vGPU.
# echo "enable_profiling=1" > vgpu_params
To clear any vGPU plugin parameters that were set previously, write a space to the vgpu_params file for the vGPU.
# echo " " > vgpu_params
2.10.6. Deleting a vGPU on a Linux with KVM Hypervisor
How to delete a vGPU on a Linux with KVM hypervisor depends on whether the hypervisor uses a vendor-specific VFIO framework for an NVIDIA vGPU that supports SR-IOV.
A hypervisor that uses a vendor-specific VFIO framework uses it only for an NVIDIA vGPU that supports SR-IOV. The hypervisor still uses the mediated VFIO mdev
driver framework for a legacy NVIDIA vGPU.
For GPUs that support SR-IOV, use of a vendor-specific VFIO framework is introduced in Ubuntu release 24.04.
To determine which instructions to follow for the NVIDIA vGPU that you are deleting, refer to the following table.
NVIDIA vGPU Type | VFIO Framework | Instructions
---|---|---
Legacy: SR-IOV not supported | mdev | Deleting a vGPU on a Linux with KVM Hypervisor that Uses the mdev VFIO Framework
SR-IOV supported | mdev | Deleting a vGPU on a Linux with KVM Hypervisor that Uses the mdev VFIO Framework
SR-IOV supported | Vendor specific | Deleting a vGPU on a Linux with KVM Hypervisor that Uses a Vendor-Specific VFIO Framework
2.10.6.1. Deleting a vGPU on a Linux with KVM Hypervisor that Uses the mdev
VFIO Framework
For each vGPU that you want to delete, perform this task in a Linux command shell on the Linux with KVM hypervisor host.
Before you begin, ensure that the following prerequisites are met:
- You have the domain, bus, slot, and function of the GPU where the vGPU that you want to delete resides. For instructions, see Getting the BDF and Domain of a GPU on a Linux with KVM Hypervisor.
- The VM to which the vGPU is assigned is shut down.
- Change to the mdev_supported_types directory for the physical GPU.
# cd /sys/class/mdev_bus/domain\:bus\:slot.function/mdev_supported_types/
- domain
- bus
- slot
- function
-
The domain, bus, slot, and function of the GPU, without the
0x
prefix.
This example changes to the mdev_supported_types directory for the GPU with the PCI device BDF 06:00.0.
# cd /sys/bus/pci/devices/0000\:06\:00.0/mdev_supported_types/
- Change to the subdirectory of mdev_supported_types that contains registration information for the vGPU.
# cd `find . -type d -name uuid`
- uuid
-
The UUID of the vGPU, for example,
aa618089-8b16-4d01-a136-25a0f3c73123
.
- Write the value 1 to the remove file in the registration information directory for the vGPU that you want to delete.
# echo "1" > remove
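The find-then-remove steps above can be combined into one helper. This is a minimal sketch with an illustrative name; the GPU's mdev_supported_types directory is parameterized so the logic can be checked against a mock tree.

```shell
# Sketch: delete an mdev-backed vGPU by UUID. Locates the vGPU's registration
# directory by name under the GPU's mdev_supported_types directory and writes
# "1" to its remove file.
remove_mdev_vgpu() {
  gpu_types_dir="$1" uuid="$2"
  dir="$(find "$gpu_types_dir" -type d -name "$uuid")"
  [ -n "$dir" ] || return 1              # no vGPU with this UUID on the GPU
  echo "1" > "$dir/remove"
}
```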
2.10.6.2. Deleting a vGPU on a Linux with KVM Hypervisor that Uses a Vendor-Specific VFIO Framework
A hypervisor uses a vendor-specific VFIO framework only for an NVIDIA vGPU that supports SR-IOV. For a legacy NVIDIA vGPU, the hypervisor uses the mdev
VFIO framework. A vendor-specific VFIO framework does not support the mediated VFIO mdev
driver framework.
For GPUs that support SR-IOV, use of a vendor-specific VFIO framework is introduced in Ubuntu release 24.04.
Before you begin, ensure that the following prerequisites are met:
- You have the following information:
- The domain and bus of the GPU where the vGPU that you want to delete resides. For instructions, see Getting the BDF and Domain of a GPU on a Linux with KVM Hypervisor.
- The slot and function of the virtual function on which the vGPU that you want to delete was created.
- The VM to which the vGPU is assigned is shut down.
- Change to the directory in the sysfs file system that contains the files for vGPU management on the virtual function on which the vGPU was created.
# cd /sys/bus/pci/devices/domain\:bus\:vf-slot.v-function/nvidia
- domain
- bus
-
The domain and bus of the GPU, without the
0x
prefix. - vf-slot
- v-function
- The slot and function of the virtual function that you noted in Preparing the Virtual Function for an NVIDIA vGPU that Supports SR-IOV on a Linux with KVM Hypervisor.
This example changes to the nvidia directory for the first virtual function (virtfn0) for the GPU with the domain 0000 and bus 3d. The first virtual function (virtfn0) has slot 00 and function 4.
# cd /sys/bus/pci/devices/0000\:3d\:00.4/nvidia
- Confirm that the directory contains the files for vGPU management on the virtual function, namely creatable_vgpu_types and current_vgpu_type.
# ll
-r--r--r-- 1 root root 4096 Aug 3 00:39 creatable_vgpu_types
-rw-r--r-- 1 root root 4096 Aug 3 00:39 current_vgpu_type
...
- Confirm that the current vGPU type on the virtual function is the ID that represents the type of the vGPU that you want to delete.
# cat current_vgpu_type
560
- Write 0 to the current_vgpu_type file.
# echo 0 > current_vgpu_type
- Confirm that the current vGPU type on the virtual function is 0, signifying that the vGPU has been deleted.
# cat current_vgpu_type
0
- Confirm that the creatable_vgpu_types file is no longer empty, signifying that the vGPU has been deleted and that a vGPU can again be created on the virtual function.
# cat creatable_vgpu_types
NVIDIA A40-1Q 557
NVIDIA A40-2Q 558
NVIDIA A40-3Q 559
NVIDIA A40-4Q 560
NVIDIA A40-6Q 561
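The deletion steps above reduce to writing 0 and confirming the readback. This is a minimal sketch with an illustrative function name, parameterized on the virtual function's nvidia directory so it can be exercised against a mock tree.

```shell
# Sketch: delete the vGPU on a virtual function that uses the
# vendor-specific VFIO framework by writing 0 to current_vgpu_type.
delete_vf_vgpu() {
  vfdir="$1"                                     # .../<vf-bdf>/nvidia
  echo 0 > "$vfdir/current_vgpu_type"
  [ "$(cat "$vfdir/current_vgpu_type")" = "0" ]  # confirm the deletion
}
```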
2.10.7. Preparing a GPU Configured for Pass-Through for Use with vGPU
The mode in which a physical GPU is being used determines the Linux kernel module to which the GPU is bound. If you want to switch the mode in which a GPU is being used, you must unbind the GPU from its current kernel module and bind it to the kernel module for the new mode. After binding the GPU to the correct kernel module, you can then configure it for vGPU.
A physical GPU that is passed through to a VM is bound to the vfio-pci
kernel module. A physical GPU that is bound to the vfio-pci
kernel module can be used only for pass-through. To enable the GPU to be used for vGPU, the GPU must be unbound from the vfio-pci
kernel module and bound to the nvidia
kernel module.
Before you begin, ensure that you have the domain, bus, slot, and function of the GPU that you are preparing for use with vGPU. For instructions, see Getting the BDF and Domain of a GPU on a Linux with KVM Hypervisor.
- Determine the kernel module to which the GPU is bound by running the lspci command with the -k option on the NVIDIA GPUs on your host.
# lspci -d 10de: -k
The
Kernel driver in use:
field indicates the kernel module to which the GPU is bound. The following example shows that the NVIDIA Tesla M60 GPU with BDF 06:00.0 is bound to the vfio-pci kernel module and is being used for GPU pass-through.
06:00.0 VGA compatible controller: NVIDIA Corporation GM204GL [Tesla M60] (rev a1)
        Subsystem: NVIDIA Corporation Device 115e
        Kernel driver in use: vfio-pci
- Unbind the GPU from the vfio-pci kernel module.
- Change to the sysfs directory that represents the vfio-pci kernel module.
# cd /sys/bus/pci/drivers/vfio-pci
- Write the domain, bus, slot, and function of the GPU to the unbind file in this directory.
# echo domain:bus:slot.function > unbind
- domain
- bus
- slot
- function
-
The domain, bus, slot, and function of the GPU, without a
0x
prefix.
This example writes the domain, bus, slot, and function of the GPU with the domain 0000 and PCI device BDF 06:00.0.
# echo 0000:06:00.0 > unbind
- Bind the GPU to the nvidia kernel module.
- Change to the sysfs directory that contains the PCI device information for the physical GPU.
# cd /sys/bus/pci/devices/domain\:bus\:slot.function
# cd /sys/bus/pci/devices/domain\:bus\:slot.function
- domain
- bus
- slot
- function
-
The domain, bus, slot, and function of the GPU, without a
0x
prefix.
This example changes to the sysfs directory that contains the PCI device information for the GPU with the domain 0000 and PCI device BDF 06:00.0.
# cd /sys/bus/pci/devices/0000\:06\:00.0
- Write the kernel module name nvidia to the driver_override file in this directory.
# echo nvidia > driver_override
- Change to the sysfs directory that represents the nvidia kernel module.
# cd /sys/bus/pci/drivers/nvidia
- Write the domain, bus, slot, and function of the GPU to the bind file in this directory.
# echo domain:bus:slot.function > bind
- domain
- bus
- slot
- function
-
The domain, bus, slot, and function of the GPU, without a
0x
prefix.
This example writes the domain, bus, slot, and function of the GPU with the domain 0000 and PCI device BDF 06:00.0.
# echo 0000:06:00.0 > bind
You can now configure the GPU with vGPU as explained in Installing and Configuring the NVIDIA Virtual GPU Manager for Red Hat Enterprise Linux KVM.
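The unbind, driver_override, and bind writes above always occur in the same order, so they can be scripted. This is a minimal sketch with an illustrative name; the sysfs root is parameterized so the sequence can be dry-run against a mock tree instead of /sys on a live host.

```shell
# Sketch: rebind a GPU from the vfio-pci kernel module to the nvidia kernel
# module by replaying the unbind/override/bind sequence.
rebind_to_nvidia() {
  sysfs="$1" bdf="$2"   # bdf, e.g. 0000:06:00.0 (no 0x prefix)
  echo "$bdf" > "$sysfs/bus/pci/drivers/vfio-pci/unbind"
  echo nvidia > "$sysfs/bus/pci/devices/$bdf/driver_override"
  echo "$bdf" > "$sysfs/bus/pci/drivers/nvidia/bind"
}
```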
2.10.8. NVIDIA vGPU Information in the sysfs File System
Information about the NVIDIA vGPU types supported by each physical GPU in a Linux with KVM hypervisor host is stored in the sysfs file system.
How NVIDIA vGPU information is stored in the sysfs
file system depends on whether the hypervisor uses a vendor-specific VFIO framework for an NVIDIA vGPU that supports SR-IOV.
A hypervisor that uses a vendor-specific VFIO framework for an NVIDIA vGPU that supports SR-IOV uses the mdev
VFIO framework for a legacy NVIDIA vGPU.
For GPUs that support SR-IOV, use of a vendor-specific VFIO framework is introduced in Ubuntu release 24.04.
For more detailed information about how NVIDIA vGPU information is stored in the sysfs
file system, refer to the following topics:
- NVIDIA vGPU Information in the sysfs File System for Hypervisors that Use the mdev VFIO Framework
- NVIDIA vGPU Information in the sysfs File System for Hypervisors that Use a Vendor-Specific VFIO Framework
2.10.8.1. NVIDIA vGPU Information in the sysfs File System for Hypervisors that Use the mdev
VFIO Framework
All physical GPUs on the host are registered with the mdev
kernel module. Information about the physical GPUs and the vGPU types that can be created on each physical GPU is stored in directories and files under the /sys/class/mdev_bus/ directory.
The sysfs directory for each physical GPU is at the following locations:
- /sys/bus/pci/devices/
- /sys/class/mdev_bus/
Both directories are symbolic links to the real directory for PCI devices in the sysfs file system.
The organization of the sysfs directory for each physical GPU is as follows:
/sys/class/mdev_bus/
|-parent-physical-device
|-mdev_supported_types
|-nvidia-vgputype-id
|-available_instances
|-create
|-description
|-device_api
|-devices
|-name
- parent-physical-device
-
Each physical GPU on the host is represented by a subdirectory of the /sys/class/mdev_bus/ directory.
The name of each subdirectory is as follows:
domain\:bus\:slot.function
domain, bus, slot, function are the domain, bus, slot, and function of the GPU, for example,
0000\:06\:00.0
.Each directory is a symbolic link to the real directory for PCI devices in the sysfs file system. For example:
# ll /sys/class/mdev_bus/
total 0
lrwxrwxrwx. 1 root root 0 Dec 12 03:20 0000:05:00.0 -> ../../devices/pci0000:00/0000:00:03.0/0000:03:00.0/0000:04:08.0/0000:05:00.0
lrwxrwxrwx. 1 root root 0 Dec 12 03:20 0000:06:00.0 -> ../../devices/pci0000:00/0000:00:03.0/0000:03:00.0/0000:04:09.0/0000:06:00.0
lrwxrwxrwx. 1 root root 0 Dec 12 03:20 0000:07:00.0 -> ../../devices/pci0000:00/0000:00:03.0/0000:03:00.0/0000:04:10.0/0000:07:00.0
lrwxrwxrwx. 1 root root 0 Dec 12 03:20 0000:08:00.0 -> ../../devices/pci0000:00/0000:00:03.0/0000:03:00.0/0000:04:11.0/0000:08:00.0
- mdev_supported_types
-
A directory named mdev_supported_types is required under the sysfs directory for each physical GPU that will be configured with NVIDIA vGPU. How this directory is created for a GPU depends on whether the GPU supports SR-IOV.
- For a GPU that does not support SR-IOV, this directory is created automatically after the Virtual GPU Manager is installed on the host and the host has been rebooted.
- For a GPU that supports SR-IOV, such as a GPU based on the NVIDIA Ampere architecture, you must create this directory by enabling the virtual function for the GPU as explained in Creating an NVIDIA vGPU on a Linux with KVM Hypervisor. The mdev_supported_types directory itself is never visible on the physical function.
The mdev_supported_types directory contains a subdirectory for each vGPU type that the physical GPU supports. The name of each subdirectory is nvidia-vgputype-id, where vgputype-id is an unsigned integer serial number. For example:
# ll mdev_supported_types/
total 0
drwxr-xr-x 3 root root 0 Dec 6 01:37 nvidia-35
drwxr-xr-x 3 root root 0 Dec 5 10:43 nvidia-36
drwxr-xr-x 3 root root 0 Dec 5 10:43 nvidia-37
drwxr-xr-x 3 root root 0 Dec 5 10:43 nvidia-38
drwxr-xr-x 3 root root 0 Dec 5 10:43 nvidia-39
drwxr-xr-x 3 root root 0 Dec 5 10:43 nvidia-40
drwxr-xr-x 3 root root 0 Dec 5 10:43 nvidia-41
drwxr-xr-x 3 root root 0 Dec 5 10:43 nvidia-42
drwxr-xr-x 3 root root 0 Dec 5 10:43 nvidia-43
drwxr-xr-x 3 root root 0 Dec 5 10:43 nvidia-44
drwxr-xr-x 3 root root 0 Dec 5 10:43 nvidia-45
- nvidia-vgputype-id
-
Each directory represents an individual vGPU type and contains the following files and directories:
- available_instances
-
This file contains the number of instances of this vGPU type that can still be created. This file is updated any time a vGPU of this type is created on or removed from the physical GPU.
Note:
When a time-sliced vGPU is created, the content of the available_instances file for all other time-sliced vGPU types on the physical GPU is set to 0. This behavior enforces the requirement that all time-sliced vGPUs on a physical GPU must be of the same type. However, this requirement does not apply to MIG-backed vGPUs. Therefore, when a MIG-backed vGPU is created, available_instances for all other MIG-backed vGPU types on the physical GPU is not set to 0.
- create
- This file is used for creating a vGPU instance. A vGPU instance is created by writing the UUID of the vGPU to this file. The file is write only.
- description
-
This file contains the following details of the vGPU type:
- The maximum number of virtual display heads that the vGPU type supports
- The frame rate limiter (FRL) configuration in frames per second
- The frame buffer size in Mbytes
- The maximum resolution per display head
- The maximum number of vGPU instances per physical GPU
For example:
# cat description
num_heads=4, frl_config=60, framebuffer=2048M, max_resolution=4096x2160, max_instance=4
- device_api
-
This file contains the string
vfio_pci
to indicate that a vGPU is a PCI device. - devices
-
This directory contains all the mdev devices that are created for the vGPU type. For example:
# ll devices
total 0
lrwxrwxrwx 1 root root 0 Dec 6 01:52 aa618089-8b16-4d01-a136-25a0f3c73123 -> ../../../aa618089-8b16-4d01-a136-25a0f3c73123
- name
-
This file contains the name of the vGPU type. For example:
# cat name
GRID M10-2Q
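Scripts that consume the description file shown above usually want its comma-separated pairs split out into one key=value entry per line. This is a minimal sketch with an illustrative function name:

```shell
# Sketch: split a vGPU type's description file ("k1=v1, k2=v2, ...") into
# one key=value pair per line, stripping the leading spaces.
parse_vgpu_description() {
  tr ',' '\n' < "$1" | sed 's/^ *//'
}
```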
2.10.8.2. NVIDIA vGPU Information in the sysfs File System for Hypervisors that Use a Vendor-Specific VFIO Framework
A vendor-specific VFIO framework does not support the mediated VFIO mdev
driver framework. Information about the physical GPUs and the vGPU types that can be created on each physical GPU is stored in directories and files under the /sys/bus/pci/devices/ directory.
The organization of the sysfs directory for each virtual function on a physical GPU is as follows:
/sys/bus/pci/devices/
|-virtual-function
|-nvidia
|-creatable_vgpu_types
|-current_vgpu_type
|-vgpu_params
- virtual-function
-
Each virtual function on each physical GPU on the host is represented by a subdirectory of the /sys/bus/pci/devices/ directory.
The name of each subdirectory is as follows:
domain\:bus\:vf-slot.v-function
domain and bus are the domain and bus of the GPU. vf-slot and v-function are the slot and function of the virtual function. For example:
0000\:3d\:00.4.
You must create this directory by enabling the virtual function for the GPU as explained in Creating an NVIDIA vGPU on a Linux with KVM Hypervisor. This directory is not created automatically.
- nvidia
-
The nvidia directory contains the files for vGPU management on the virtual function. These files are as follows:
- creatable_vgpu_types
-
This file contains the NVIDIA vGPU types that can be created on the virtual function and the integer ID that represents each vGPU type in the sysfs file system. For example:
# cat creatable_vgpu_types
NVIDIA A40-1Q 557
NVIDIA A40-2Q 558
NVIDIA A40-3Q 559
NVIDIA A40-4Q 560
NVIDIA A40-6Q 561
- If a vGPU has been created on this virtual function, this file is empty.
- If a vGPU has been created on another virtual function on the same GPU, this file contains only the vGPU types that can reside on the same GPU as the existing vGPU.
- If the maximum number of vGPUs that the GPU supports has been created on other virtual functions for the GPU, this file is empty.
Note:When a time-sliced vGPU is created on a GPU in equal-size mode, the content of the creatable_vgpu_types for all virtual functions on the physical GPU is set to only the vGPU types with the same amount of frame buffer as the vGPU that was created. This behavior enforces the requirement that all time-sliced vGPUs on the physical GPU must have the same amount of frame buffer. However, this requirement does not apply to time-sliced vGPUs created on a GPU in mixed-size mode or to MIG-backed vGPUs.
- current_vgpu_type
-
This file contains the integer ID that represents, in the sysfs file system, the type of the vGPU that is created on this virtual function. For example, if an NVIDIA A40-4Q vGPU has been created on this virtual function, this file contains the integer 560:
# cat current_vgpu_type
560
If no vGPU is created on the virtual function, this file contains the integer 0. When this file is created, its contents are set to the default value of 0.
This file is used for creating and deleting a vGPU on the virtual function.
- A vGPU is created by writing the integer ID that represents the vGPU type in the sysfs file system to this file.
- A vGPU is deleted by writing 0 to this file.
- vgpu_params
- This file is used for setting plugin parameters for the vGPU on the virtual function to control its behavior. Plugin parameters are set by writing a list of parameter-value pairs to this file. For more information, refer to Setting vGPU Plugin Parameters on a Linux with KVM Hypervisor.
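Taken together, these files support a simple create, configure, delete workflow. The sketch below exercises that read/write logic against a mock directory standing in for the virtual function's nvidia subdirectory, so it runs without GPU hardware; the type ID 560 matches the NVIDIA A40-4Q example above, and frame_rate_limiter is shown only as an illustrative plugin parameter.

```shell
# Mock stand-in for the virtual function's nvidia sysfs subdirectory,
# e.g. /sys/bus/pci/devices/0000:3d:00.4/nvidia on a real host.
VF_DIR=$(mktemp -d)

# Default state: no vGPU on this virtual function.
echo 0 > "$VF_DIR/current_vgpu_type"

# Create an NVIDIA A40-4Q vGPU by writing its integer type ID.
echo 560 > "$VF_DIR/current_vgpu_type"

# Set plugin parameters as parameter-value pairs (name is illustrative).
echo "frame_rate_limiter=0" > "$VF_DIR/vgpu_params"

cat "$VF_DIR/current_vgpu_type"   # 560 while the vGPU exists

# Delete the vGPU by writing 0 back.
echo 0 > "$VF_DIR/current_vgpu_type"
cat "$VF_DIR/current_vgpu_type"   # 0 again
```

On a real host the same writes are performed as root against the virtual function's sysfs directory, subject to the creatable_vgpu_types constraints described above.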
2.11. Putting a GPU Into Mixed-Size Mode
By default, a GPU supports only vGPUs with the same amount of frame buffer and, therefore, is in equal-size mode. To support vGPUs with different amounts of frame buffer, the GPU must be put into mixed-size mode. When a GPU is in mixed-size mode, the maximum number of some types of vGPU allowed on a GPU is less than when the GPU is in equal-size mode.
- How a GPU in mixed-size mode behaves if the hypervisor host is rebooted, the NVIDIA Virtual GPU Manager is reloaded, or the GPU is reset depends on the hypervisor:
- On a Linux with KVM hypervisor, a GPU in mixed-size mode reverts to its default mode.
- On VMware vSphere, a GPU in mixed-size mode remains in mixed-size mode. The GPU does not revert to its default mode.
- When a GPU is in mixed-size mode, only the best effort and equal share schedulers are supported. The fixed share scheduler is not supported.
Before performing this task, ensure that no vGPUs are running on the GPU and that the GPU is not being used by any other processes, such as CUDA applications, monitoring applications, or the nvidia-smi command.
How to put a GPU into mixed-size mode depends on the hypervisor that you are using as explained in the following topics:
- Putting a GPU Into Mixed-Size Mode on a Linux with KVM Hypervisor
- Putting a GPU Into Mixed-Size Mode on VMware vSphere
2.11.1. Putting a GPU Into Mixed-Size Mode on a Linux with KVM Hypervisor
If you are using a GPU that supports SR-IOV, ensure that the virtual functions for the physical GPU in the sysfs file system are enabled as explained in Preparing the Virtual Function for an NVIDIA vGPU that Supports SR-IOV on a Linux with KVM Hypervisor.
- Use nvidia-smi to list the status of all physical GPUs, and check that heterogeneous time-sliced vGPU sizes are noted as supported.
# nvidia-smi -q
...
Attached GPUs : 1
GPU 00000000:41:00.0
...
Heterogeneous Time-Slice Sizes : Supported
...
- Put each GPU that you want to support vGPUs with different amounts of frame buffer into mixed-size mode.
# nvidia-smi vgpu -i id -shm 1
- id
- The index of the GPU as reported by nvidia-smi.

This example puts the GPU with index 0 (PCI bus ID 00000000:41:00.0) into mixed-size mode.

# nvidia-smi vgpu -i 0 -shm 1
Enabled vGPU heterogeneous mode for GPU 00000000:41:00.0
- Confirm that the GPU is now in mixed-size mode by using nvidia-smi to check that vGPU heterogeneous mode is enabled.
# nvidia-smi -q
...
vGPU Heterogeneous Mode : Enabled
...
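The confirmation step can be scripted. The following sketch extracts the vGPU Heterogeneous Mode field from nvidia-smi -q style output; a canned sample line stands in for real nvidia-smi output so the parsing logic runs without a GPU.

```shell
# Extract the "vGPU Heterogeneous Mode" value from `nvidia-smi -q` output.
heterogeneous_mode() {
  awk -F' : ' '/vGPU Heterogeneous Mode/ { gsub(/^[ \t]+/, "", $2); print $2 }'
}

# Canned sample standing in for `nvidia-smi -q` on a real host.
sample='    vGPU Heterogeneous Mode           : Enabled'
printf '%s\n' "$sample" | heterogeneous_mode   # Enabled
```

On a real host, pipe the live output in instead: `nvidia-smi -q | heterogeneous_mode`.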
2.11.2. Putting a GPU Into Mixed-Size Mode on VMware vSphere
On VMware vSphere, you can put a GPU into mixed-size mode either by using vCenter Server or by using the esxcli command.
For instructions, refer to the following topics:
- Putting a GPU Into Mixed-Size Mode on VMware vSphere by Using vCenter Server
- Putting a GPU Into Mixed-Size Mode on VMware vSphere by Using the esxcli Command
2.11.2.1. Putting a GPU Into Mixed-Size Mode on VMware vSphere by Using vCenter Server
- Log in to vCenter Server by using the vSphere Web Client.
- In the navigation tree, select your ESXi host and click the Configure tab.
- From the navigation tree in the Configure tab, select Hardware > Graphics.
- Select the GPU and click Edit.
- In the pop-up window that opens, set vGPU Mode to Mixed Size.
Note:
Do not set the option to restart the Xorg server. This option is required only when the device type is changed, not when the vGPU mode of a GPU is changed.
2.11.2.2. Putting a GPU Into Mixed-Size Mode on VMware vSphere by Using the esxcli Command
Perform this task from ESXi Shell on the hypervisor host.
- Run the esxcli command to change the vGPU mode of the GPU to mixed size.
$ esxcli graphics device set --device-id=domain:bus:slot.function --type SharedPassthru --vgpu-mode=MixedSize

- domain
- bus
- slot
- function
- The domain, bus, slot, and function of the GPU, without the 0x prefix.

Note: To put a GPU into equal-size mode, run this command with the --vgpu-mode=SameSize option.
- Refresh the host and rebuild the assignable hardware tree.
$ esxcli graphics host refresh
- Confirm that the vGPU mode of the GPU has been changed.
$ esxcli graphics device list
2.12. Placing a vGPU on a Physical GPU in Mixed-Size Mode on a Linux with KVM Hypervisor
By default, the Virtual GPU Manager determines where a vGPU is placed on a GPU. On a Linux with KVM hypervisor, you can control the placement of vGPUs on a GPU in mixed-size mode to fit as many vGPUs as possible on the GPU. By controlling the placement of vGPUs on the GPU, you can ensure that no gaps that can be occupied by a vGPU are left in the placement region on the GPU.
The vGPU placements that a GPU in mixed-size mode supports depend on the total amount of frame buffer that the GPU has. For details, refer to vGPU Placements for GPUs in Mixed-Size Mode.
- On a Linux with KVM hypervisor, this task is optional. If you want the Virtual GPU Manager to determine where a vGPU is placed on a GPU, omit this task.
- On VMware vSphere, you cannot control the placement of vGPUs on a GPU in mixed-size mode. The VMware vSphere software determines where a vGPU is placed on a GPU.
Before performing this task, ensure that the following prerequisites are met:
- The GPU has been put into mixed-size mode as explained in Putting a GPU Into Mixed-Size Mode.
- The vGPU that you want to place on the physical GPU has been created as explained in Creating an NVIDIA vGPU on a Linux with KVM Hypervisor.
Perform this task in a command shell on the hypervisor host.
- Use nvidia-smi to list the placement size and available placement IDs for the type of the vGPU.
# nvidia-smi vgpu -c -v
...
vGPU Type ID : 0x392
Name : NVIDIA L4-6Q
...
Placement Size : 6
Creatable Placement IDs : 6 18
...

Note: Some supported placement IDs for the vGPU type might be unavailable because they are already in use by another vGPU. To list the placement size and all supported placement IDs for the type of the vGPU, run the following command:

# nvidia-smi vgpu -s -v
...
vGPU Type ID : 0x392
Name : NVIDIA L4-6Q
...
Placement Size : 6
Supported Placement IDs : 0 6 12 18
...
The number of supported placement IDs is the maximum number of vGPUs of the type that are allowed on the GPU in mixed-size mode.
- Set the vgpu-placement-id vGPU plugin parameter for the vGPU to the placement ID that you want.
For a Linux with KVM hypervisor, write the parameter to the vgpu_params file in the nvidia subdirectory of the mdev device directory that represents the vGPU.
# echo "vgpu-placement-id=placement-id" > /sys/bus/mdev/devices/uuid/nvidia/vgpu_params
- placement-id
- The placement ID that you want to set for the vGPU.
- uuid
- The UUID of the vGPU, for example, aa618089-8b16-4d01-a136-25a0f3c73123.

This example sets the placement ID to 6 for the vGPU that has the UUID aa618089-8b16-4d01-a136-25a0f3c73123.

# echo "vgpu-placement-id=6" > \
/sys/bus/mdev/devices/aa618089-8b16-4d01-a136-25a0f3c73123/nvidia/vgpu_params
When the VM to which the vGPU is assigned is rebooted, the Virtual GPU Manager validates the placement ID that you assigned to the vGPU. If the placement ID is invalid or unavailable, the VM fails to boot.
After the VM to which the vGPU is assigned has been rebooted, you can confirm that the vGPU has been assigned the correct placement ID.
# nvidia-smi vgpu -q
GPU 00000000:41:00.0
Active vGPUs : 1
vGPU ID : 3251719533
VM ID : 2150987
...
Placement ID : 6
...
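In the example output above, the supported placement IDs for the 6-GB NVIDIA L4-6Q type (0, 6, 12, 18) are spaced at the type's placement size across a 24-unit placement region. Assuming that spacing (an inference from the example output, not a documented formula), a small sketch reproduces the list:

```shell
# List candidate placement IDs spaced at the placement size across a region.
placement_ids() {  # $1 = placement size, $2 = placement region size
  i=0
  ids=""
  while [ $((i + $1)) -le "$2" ]; do
    ids="$ids$i "
    i=$((i + $1))
  done
  printf '%s\n' "${ids% }"
}

placement_ids 6 24   # 0 6 12 18, matching the L4-6Q example above
```

On a real host, always take the authoritative list from `nvidia-smi vgpu -s -v`, and the currently available subset from `nvidia-smi vgpu -c -v`.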
2.13. Disabling and Enabling ECC Memory
Some GPUs that support NVIDIA vGPU software support error-correcting code (ECC) memory with NVIDIA vGPU. ECC memory improves data integrity by detecting and handling double-bit errors. However, not all GPUs, vGPU types, and hypervisor software versions support ECC memory with NVIDIA vGPU.
On GPUs that support ECC memory with NVIDIA vGPU, ECC memory is supported with C-series and Q-series vGPUs, but not with A-series and B-series vGPUs. Although A-series and B-series vGPUs start on physical GPUs on which ECC memory is enabled, enabling ECC with vGPUs that do not support it might incur some costs.
On physical GPUs that do not have HBM2 memory, the amount of frame buffer that is usable by vGPUs is reduced. All types of vGPU are affected, not just vGPUs that support ECC memory.
The effects of enabling ECC memory on a physical GPU are as follows:
- ECC memory is exposed as a feature on all supported vGPUs on the physical GPU.
- In VMs that support ECC memory, ECC memory is enabled, with the option to disable ECC in the VM.
- ECC memory can be enabled or disabled for individual VMs. Enabling or disabling ECC memory in a VM does not affect the amount of frame buffer that is usable by vGPUs.
GPUs based on the Pascal GPU architecture and later GPU architectures support ECC memory with NVIDIA vGPU. To determine whether ECC memory is enabled for a GPU, run nvidia-smi -q for the GPU.
Tesla M60 and M6 GPUs support ECC memory when used without GPU virtualization, but NVIDIA vGPU does not support ECC memory with these GPUs. In graphics mode, these GPUs are supplied with ECC memory disabled by default.
Some hypervisor software versions do not support ECC memory with NVIDIA vGPU.
If you are using a hypervisor software version or GPU that does not support ECC memory with NVIDIA vGPU and ECC memory is enabled, NVIDIA vGPU fails to start. In this situation, you must ensure that ECC memory is disabled on all GPUs if you are using NVIDIA vGPU.
2.13.1. Disabling ECC Memory
If ECC memory is unsuitable for your workloads but is enabled on your GPUs, disable it. You must also ensure that ECC memory is disabled on all GPUs if you are using NVIDIA vGPU with a hypervisor software version or a GPU that does not support ECC memory with NVIDIA vGPU. If your hypervisor software version or GPU does not support ECC memory and ECC memory is enabled, NVIDIA vGPU fails to start.
Where to perform this task depends on whether you are changing ECC memory settings for a physical GPU or a vGPU.
- For a physical GPU, perform this task from the hypervisor host.
- For a vGPU, perform this task from the VM to which the vGPU is assigned.
Note:
ECC memory must be enabled on the physical GPU on which the vGPUs reside.
Before you begin, ensure that NVIDIA Virtual GPU Manager is installed on your hypervisor. If you are changing ECC memory settings for a vGPU, also ensure that the NVIDIA vGPU software graphics driver is installed in the VM to which the vGPU is assigned.
- Use nvidia-smi to list the status of all physical GPUs or vGPUs, and check for ECC noted as enabled.
# nvidia-smi -q
==============NVSMI LOG==============
Timestamp : Mon Oct 28 18:36:45 2024
Driver Version : 550.127.06
Attached GPUs : 1
GPU 0000:02:00.0
[...]
Ecc Mode
    Current : Enabled
    Pending : Enabled
[...]
- Change the ECC status to off for each GPU for which ECC is enabled.
- If you want to change the ECC status to off for all GPUs on your host machine or vGPUs assigned to the VM, run this command:
# nvidia-smi -e 0
- If you want to change the ECC status to off for a specific GPU or vGPU, run this command:
# nvidia-smi -i id -e 0
id is the index of the GPU or vGPU as reported by nvidia-smi.
This example disables ECC for the GPU with PCI bus ID 0000:02:00.0.

# nvidia-smi -i 0000:02:00.0 -e 0
- Reboot the host or restart the VM.
- Confirm that ECC is now disabled for the GPU or vGPU.
# nvidia-smi -q
==============NVSMI LOG==============
Timestamp : Mon Oct 28 18:37:53 2024
Driver Version : 550.127.06
Attached GPUs : 1
GPU 0000:02:00.0
[...]
Ecc Mode
    Current : Disabled
    Pending : Disabled
[...]
If you later need to enable ECC on your GPUs or vGPUs, follow the instructions in Enabling ECC Memory.
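On hosts with several GPUs, you may want to disable ECC only where it is currently enabled. The sketch below parses CSV lines of the form produced by `nvidia-smi --query-gpu=index,ecc.mode.current --format=csv,noheader` and prints the commands it would run; a canned sample stands in for real output so the logic runs without a GPU.

```shell
# Print the nvidia-smi commands needed to disable ECC on GPUs where it is
# currently enabled, given "index, ecc-mode" CSV lines on stdin.
ecc_disable_cmds() {
  while IFS=',' read -r idx mode; do
    mode=${mode# }    # trim the space after the comma
    if [ "$mode" = "Enabled" ]; then
      echo "nvidia-smi -i $idx -e 0"
    fi
  done
}

# Canned sample standing in for:
#   nvidia-smi --query-gpu=index,ecc.mode.current --format=csv,noheader
printf '0, Enabled\n1, Disabled\n' | ecc_disable_cmds   # nvidia-smi -i 0 -e 0
```

The sketch only prints the commands; review them before executing, and remember that a host reboot or VM restart is still required for the change to take effect.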
2.13.2. Enabling ECC Memory
If ECC memory is suitable for your workloads and is supported by your hypervisor software and GPUs, but is disabled on your GPUs or vGPUs, enable it.
Where to perform this task depends on whether you are changing ECC memory settings for a physical GPU or a vGPU.
- For a physical GPU, perform this task from the hypervisor host.
- For a vGPU, perform this task from the VM to which the vGPU is assigned.
Note:
ECC memory must be enabled on the physical GPU on which the vGPUs reside.
Before you begin, ensure that NVIDIA Virtual GPU Manager is installed on your hypervisor. If you are changing ECC memory settings for a vGPU, also ensure that the NVIDIA vGPU software graphics driver is installed in the VM to which the vGPU is assigned.
- Use nvidia-smi to list the status of all physical GPUs or vGPUs, and check for ECC noted as disabled.
# nvidia-smi -q
==============NVSMI LOG==============
Timestamp : Mon Oct 28 18:36:45 2024
Driver Version : 550.127.06
Attached GPUs : 1
GPU 0000:02:00.0
[...]
Ecc Mode
    Current : Disabled
    Pending : Disabled
[...]
- Change the ECC status to on for each GPU or vGPU for which ECC is disabled.
- If you want to change the ECC status to on for all GPUs on your host machine or vGPUs assigned to the VM, run this command:
# nvidia-smi -e 1
- If you want to change the ECC status to on for a specific GPU or vGPU, run this command:
# nvidia-smi -i id -e 1
id is the index of the GPU or vGPU as reported by nvidia-smi.
This example enables ECC for the GPU with PCI bus ID 0000:02:00.0.

# nvidia-smi -i 0000:02:00.0 -e 1
- Reboot the host or restart the VM.
- Confirm that ECC is now enabled for the GPU or vGPU.
# nvidia-smi -q
==============NVSMI LOG==============
Timestamp : Mon Oct 28 18:37:53 2024
Driver Version : 550.127.06
Attached GPUs : 1
GPU 0000:02:00.0
[...]
Ecc Mode
    Current : Enabled
    Pending : Enabled
[...]
If you later need to disable ECC on your GPUs or vGPUs, follow the instructions in Disabling ECC Memory.
2.14. Configuring a vGPU VM for Use with NVIDIA GPUDirect Storage Technology
To use NVIDIA® GPUDirect Storage® technology with NVIDIA vGPU, you must install all the required software in the VM that is configured with NVIDIA vGPU.
Ensure that the prerequisites in Prerequisites for Using NVIDIA vGPU are met.
- Install and configure the NVIDIA Virtual GPU Manager as explained in Installing and Configuring the NVIDIA Virtual GPU Manager for Red Hat Enterprise Linux KVM.
- As root, log in to the VM that you configured with NVIDIA vGPU in the previous step.
- Install the Mellanox OpenFabrics Enterprise Distribution for Linux (MLNX_OFED) in the VM as explained in Installation Procedure in Installing Mellanox OFED.
In the command to run the installation script, specify the following options:
- --with-nvmf
- --with-nfsrdma
- --enable-gds
- --add-kernel-support
- Install the NVIDIA vGPU software graphics driver for Linux in the VM from a distribution-specific package.
Note:
GPUDirect Storage technology does not support installation of the NVIDIA vGPU software graphics driver for Linux from a .run file.
Follow the instructions for the Linux distribution that is installed in the VM:
- Install NVIDIA CUDA Toolkit from a .run file, deselecting the CUDA driver when selecting the CUDA components to install.
Note:
To avoid overwriting the NVIDIA vGPU software graphics driver that you installed in the previous step, do not install NVIDIA CUDA Toolkit from a distribution-specific package.
For instructions, refer to Runfile Installation in NVIDIA CUDA Installation Guide for Linux.
- Use the package manager of the Linux distribution that is installed in the VM to install the GPUDirect Storage technology packages, omitting the installation of the NVIDIA CUDA Toolkit packages.
Follow the instructions in NVIDIA CUDA Installation Guide for Linux for the Linux distribution that is installed in the VM:
- Red Hat Enterprise Linux: In the step to install CUDA, execute only the command to include all GPUDirect Storage technology packages:

  sudo dnf install nvidia-gds

- Ubuntu: In the step to install CUDA, execute only the command to include all GPUDirect Storage technology packages:

  sudo apt-get install nvidia-gds
After you configure a vGPU VM for use with NVIDIA GPUDirect Storage technology, you can license the NVIDIA vGPU software licensed products that you are using. For instructions, refer to Virtual GPU Client Licensing User Guide.
GPU pass-through is used to directly assign an entire physical GPU to one VM, bypassing the NVIDIA Virtual GPU Manager. In this mode of operation, the GPU is accessed exclusively by the NVIDIA driver running in the VM to which it is assigned; the GPU is not shared among VMs.
In pass-through mode, GPUs based on NVIDIA GPU architectures after the Maxwell architecture support error-correcting code (ECC).
GPU pass-through can be used in a server platform alongside NVIDIA vGPU, with some restrictions:
- A physical GPU can host NVIDIA vGPUs, or can be used for pass-through, but cannot do both at the same time. Some hypervisors, for example VMware vSphere ESXi, require a host reboot to change a GPU from pass-through mode to vGPU mode.
- A single VM cannot be configured for both vGPU and GPU pass-through at the same time.
- The performance of a physical GPU passed through to a VM can be monitored only from within the VM itself. Such a GPU cannot be monitored by tools that operate through the hypervisor, such as XenCenter or nvidia-smi (see Monitoring GPU Performance).
- The following BIOS settings must be enabled on your server platform:
- VT-D/IOMMU
- SR-IOV in Advanced Options
- All GPUs directly connected to each other through NVLink must be assigned to the same VM.
You can assign multiple physical GPUs to one VM. The maximum number of physical GPUs that you can assign to a VM depends on the maximum number of PCIe pass-through devices per VM that your chosen hypervisor can support. For more information, refer to the documentation for your hypervisor, for example:
- XenServer: Configuration limits
- Red Hat Enterprise Linux:
- Red Hat Enterprise Linux 9 releases: Assigning a GPU to a virtual machine, Known Issues
- Red Hat Enterprise Linux 8 releases: Assigning a GPU to a virtual machine, Known Issues
- Red Hat Enterprise Linux 7 releases: GPU PCI Device Assignment
- VMware vSphere: vSphere 7.0 Configuration Limits
If you intend to configure all GPUs in your server platform for pass-through, you do not need to install the NVIDIA Virtual GPU Manager.
3.1. Display Resolutions for Physical GPUs
The display resolutions supported by a physical GPU depend on the NVIDIA GPU architecture and the NVIDIA vGPU software license that is applied to the GPU.
vWS Physical GPU Resolutions
GPUs that are licensed with a vWS license support a maximum combined resolution based on the number of available pixels, which is determined by the NVIDIA GPU architecture. You can choose between using a small number of high resolution displays or a larger number of lower resolution displays with these GPUs.
The following table lists the maximum number of displays per GPU at each supported display resolution for configurations in which all displays have the same resolution.
NVIDIA GPU Architecture | Available Pixels | Display Resolution | Displays per GPU
---|---|---|---
Pascal and later | 66355200 | 7680×4320 | 2
Pascal and later | 66355200 | 5120×2880 or lower | 4
Maxwell | 35389440 | 5120×2880 | 2
Maxwell | 35389440 | 4096×2160 or lower | 4
The following table provides examples of configurations with a mixture of display resolutions.
NVIDIA GPU Architecture | Available Pixels | Available Pixel Basis | Maximum Displays | Sample Mixed Display Configurations
---|---|---|---|---
Pascal and later | 66355200 | 2 7680×4320 displays | 4 | 1 7680×4320 display plus 2 5120×2880 displays
Pascal and later | 66355200 | 2 7680×4320 displays | 4 | 1 7680×4320 display plus 3 4096×2160 displays
Maxwell | 35389440 | 4 4096×2160 displays | 4 | 1 5120×2880 display plus 2 4096×2160 displays
You cannot use more than four displays even if the combined resolution of the displays is less than the number of available pixels from the GPU. For example, you cannot use five 4096×2160 displays with a GPU based on the NVIDIA Pascal architecture even though the combined resolution of the displays (44236800) is less than the number of available pixels from the GPU (66355200).
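The pixel-budget rule can be checked mechanically. The following illustrative helper (not an NVIDIA tool) totals the pixels of a proposed display configuration and applies both the available-pixel budget and the four-display cap described above:

```shell
# Report whether a display configuration fits a GPU's pixel budget and
# display cap. Usage: fits <available-pixels> <max-displays> WxH [WxH ...]
fits() {
  avail=$1; maxd=$2; shift 2
  total=0; count=0
  for res in "$@"; do
    w=${res%x*}; h=${res#*x}
    total=$((total + w * h)); count=$((count + 1))
  done
  if [ "$count" -le "$maxd" ] && [ "$total" -le "$avail" ]; then
    echo yes
  else
    echo no
  fi
}

# Five 4096x2160 displays are under the Pascal pixel budget but over the
# four-display cap, so the configuration is rejected:
fits 66355200 4 4096x2160 4096x2160 4096x2160 4096x2160 4096x2160   # no

# One 7680x4320 plus two 5120x2880 displays fit (see the table above):
fits 66355200 4 7680x4320 5120x2880 5120x2880                       # yes
```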
vApps or vCS Physical GPU Resolutions
GPUs that are licensed with a vApps or a vCS license support a single display with a fixed maximum resolution. The maximum resolution depends on the following factors:
- NVIDIA GPU architecture
- The NVIDIA vGPU software license that is applied to the GPU
- The operating system that is running on the system to which the GPU is assigned
License | NVIDIA GPU Architecture | Operating System | Maximum Display Resolution | Displays per GPU
---|---|---|---|---
vApps | Pascal or later | Linux | 2560×1600 | 1
vApps | Pascal or later | Windows | 1280×1024 | 1
vApps | Maxwell | Windows and Linux | 2560×1600 | 1
3.2. Using GPU Pass-Through on XenServer
You can configure a GPU for pass-through on XenServer by using XenCenter or by using the xe command.
The following additional restrictions apply when GPU pass-through is used in a server platform alongside NVIDIA vGPU:
- The performance of a physical GPU passed through to a VM cannot be monitored through XenCenter.
- nvidia-smi in dom0 no longer has access to the GPU.
- Pass-through GPUs do not provide console output through XenCenter’s VM Console tab. Use a remote graphics connection directly into the VM to access the VM’s OS.
3.2.1. Configuring a VM for GPU Pass Through by Using XenCenter
Select the Pass-through whole GPU option as the GPU type in the VM’s Properties:
Figure 14. Using XenCenter to configure a pass-through GPU
After configuring a XenServer VM for GPU pass through, install the NVIDIA graphics driver in the guest OS on the VM as explained in Installing the NVIDIA vGPU Software Graphics Driver.
3.2.2. Configuring a VM for GPU Pass Through by Using xe
Create a vgpu object with the passthrough vGPU type:
[root@xenserver ~]# xe vgpu-type-list model-name="passthrough"
uuid ( RO) : fa50b0f0-9705-6c59-689e-ea62a3d35237
vendor-name ( RO):
model-name ( RO): passthrough
framebuffer-size ( RO): 0
[root@xenserver ~]# xe vgpu-create vm-uuid=753e77a9-e10d-7679-f674-65c078abb2eb vgpu-type-uuid=fa50b0f0-9705-6c59-689e-ea62a3d35237 gpu-group-uuid=585877ef-5a6c-66af-fc56-7bd525bdc2f6
6aa530ec-8f27-86bd-b8e4-fe4fde8f08f9
[root@xenserver ~]#
Do not assign pass-through GPUs using the legacy other-config:pci
parameter setting. This mechanism is not supported alongside the XenCenter UI and xe vgpu mechanisms, and attempts to use it may lead to undefined results.
After configuring a XenServer VM for GPU pass through, install the NVIDIA graphics driver in the guest OS on the VM as explained in Installing the NVIDIA vGPU Software Graphics Driver.
3.3. Using GPU Pass-Through on a Linux with KVM Hypervisor
NVIDIA vGPU software supports the following Linux with KVM hypervisors: Red Hat Enterprise Linux with KVM and Ubuntu.
You can configure a GPU for pass-through on a Linux with KVM hypervisor by using any of the following tools:
- The Virtual Machine Manager (virt-manager) graphical tool
- The virsh command
- The QEMU command line
Before configuring a GPU for pass-through on Red Hat Enterprise Linux KVM or Ubuntu, ensure that the following prerequisites are met:
- Red Hat Enterprise Linux KVM or Ubuntu is installed.
- A virtual disk has been created.
Note:
Do not create any virtual disks in /root.
- A virtual machine has been created.
If you are configuring a pass-through GPU that requires a large BAR address space on a UEFI VM, refer to NVIDIA vGPU software graphics driver fails to load on KVM-based hypervisors for a workaround to ensure that BAR resources are mapped into the VM.
This workaround involves setting an experimental QEMU parameter.
3.3.1. Configuring a VM for GPU Pass-Through by Using Virtual Machine Manager (virt-manager)
For more information about using Virtual Machine Manager, see the following topics in the documentation for Red Hat Enterprise Linux 7:
- Managing Guests with the Virtual Machine Manager (virt-manager)
- Starting virt-manager
- Assigning a PCI Device with virt-manager
- Start virt-manager.
- In the virt-manager main window, select the VM that you want to configure for pass-through.
- From the Edit menu, choose Virtual Machine Details.
- In the virtual machine hardware information window that opens, click Add Hardware.
- In the Add New Virtual Hardware dialog box that opens, in the hardware list on the left, select PCI Host Device.
- From the Host Device list that appears, select the GPU that you want to assign to the VM and click Finish.
If you want to remove a GPU from the VM to which it is assigned, in the virtual machine hardware information window, select the GPU and click Remove.
After configuring the VM for GPU pass through, install the NVIDIA graphics driver in the guest OS on the VM as explained in Installing the NVIDIA vGPU Software Graphics Driver.
3.3.2. Configuring a VM for GPU Pass-Through by Using virsh
For more information about using virsh, see the documentation for Red Hat Enterprise Linux 7.
- Verify that the vfio-pci module is loaded.

# lsmod | grep vfio-pci
- Obtain the PCI device bus/device/function (BDF) of the GPU that you want to assign in pass-through mode to a VM.
# lspci | grep NVIDIA
The NVIDIA GPUs listed in this example have the PCI device BDFs 85:00.0 and 86:00.0.

# lspci | grep NVIDIA
85:00.0 VGA compatible controller: NVIDIA Corporation GM204GL [Tesla M60] (rev a1)
86:00.0 VGA compatible controller: NVIDIA Corporation GM204GL [Tesla M60] (rev a1)
- Obtain the full identifier of the GPU from its PCI device BDF.
# virsh nodedev-list --cap pci| grep transformed-bdf
- transformed-bdf
- The PCI device BDF of the GPU with the colon and the period replaced with underscores, for example, 85_00_0.

This example obtains the full identifier of the GPU with the PCI device BDF 85:00.0.

# virsh nodedev-list --cap pci | grep 85_00_0
pci_0000_85_00_0
- Obtain the domain, bus, slot, and function of the GPU.
# virsh nodedev-dumpxml full-identifier | egrep 'domain|bus|slot|function'
- full-identifier
- The full identifier of the GPU that you obtained in the previous step, for example, pci_0000_85_00_0.

This example obtains the domain, bus, slot, and function of the GPU with the PCI device BDF 85:00.0.

# virsh nodedev-dumpxml pci_0000_85_00_0 | egrep 'domain|bus|slot|function'
<domain>0x0000</domain>
<bus>0x85</bus>
<slot>0x00</slot>
<function>0x0</function>
<address domain='0x0000' bus='0x85' slot='0x00' function='0x0'/>
- In virsh, open for editing the XML file of the VM that you want to assign the GPU to.
# virsh edit vm-name
- vm-name
- The name of the VM to which you want to assign the GPU.
- Add a device entry in the form of an address element inside the source element to assign the GPU to the guest VM. You can optionally add a second address element after the source element to set a fixed PCI device BDF for the GPU in the guest operating system.

<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='domain' bus='bus' slot='slot' function='function'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</hostdev>
- domain
- bus
- slot
- function
- The domain, bus, slot, and function of the GPU, which you obtained in the previous step.
This example adds a device entry for the GPU with the PCI device BDF 85:00.0 and fixes the BDF for the GPU in the guest operating system.

<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x85' slot='0x00' function='0x0'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</hostdev>
- Start the VM that you assigned the GPU to.
# virsh start vm-name
- vm-name
- The name of the VM that you assigned the GPU to.
After configuring the VM for GPU pass through, install the NVIDIA graphics driver in the guest OS on the VM as explained in Installing the NVIDIA vGPU Software Graphics Driver.
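Steps 3 and 4 of this procedure are mechanical transformations of the GPU's BDF, and the device entry in step 6 is a templated XML fragment, so both can be sketched as pure string handling. The helpers below assume the PCI domain is 0000, as in the example above:

```shell
# Derive the virsh node-device name from a PCI BDF (domain 0000 assumed):
# ':' and '.' become '_' and the result is prefixed with pci_0000_.
bdf_to_nodedev() {
  printf 'pci_0000_%s\n' "$(printf '%s' "$1" | tr ':.' '__')"
}

# Emit the <hostdev> source fragment for a given domain/bus/slot/function.
hostdev_xml() {  # $1=domain $2=bus $3=slot $4=function (0x-prefixed)
  printf "<hostdev mode='subsystem' type='pci' managed='yes'>\n"
  printf "  <source>\n"
  printf "    <address domain='%s' bus='%s' slot='%s' function='%s'/>\n" "$1" "$2" "$3" "$4"
  printf "  </source>\n"
  printf "</hostdev>\n"
}

bdf_to_nodedev 85:00.0                # pci_0000_85_00_0
hostdev_xml 0x0000 0x85 0x00 0x0
```

These helpers only generate text; use virsh edit, as described above, to apply the fragment to the VM definition.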
3.3.3. Configuring a VM for GPU Pass-Through by Using the QEMU Command Line
- Obtain the PCI device bus/device/function (BDF) of the GPU that you want to assign in pass-through mode to a VM.
# lspci | grep NVIDIA
The NVIDIA GPUs listed in this example have the PCI device BDFs 85:00.0 and 86:00.0.

# lspci | grep NVIDIA
85:00.0 VGA compatible controller: NVIDIA Corporation GM204GL [Tesla M60] (rev a1)
86:00.0 VGA compatible controller: NVIDIA Corporation GM204GL [Tesla M60] (rev a1)
- Add the following option to the QEMU command line:
-device vfio-pci,host=bdf
- bdf
-
The PCI device BDF of the GPU that you want to assign in pass-through mode to a VM, for example,
85:00.0
.
This example assigns the GPU with the PCI device BDF
85:00.0
in pass-through mode to a VM.-device vfio-pci,host=85:00.0
After configuring the VM for GPU pass through, install the NVIDIA graphics driver in the guest OS on the VM as explained in Installing the NVIDIA vGPU Software Graphics Driver.
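For reference, the option can be assembled into a variable and dropped into a larger command line. The sketch below only builds and prints the strings; the machine type, memory size, and disk path are placeholders, not recommendations:

```shell
# Build the vfio-pci pass-through option for a given GPU BDF. Nothing is
# executed here; the command line is only printed.
GPU_BDF=85:00.0
PASSTHROUGH_OPT="-device vfio-pci,host=${GPU_BDF}"

# Illustrative placement within a QEMU invocation (placeholder options):
echo qemu-system-x86_64 -machine q35 -enable-kvm -m 8192 \
  "$PASSTHROUGH_OPT" \
  -drive file=/path/to/disk.qcow2,if=virtio
```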
3.3.4. Preparing a GPU Configured for vGPU for Use in Pass-Through Mode
The mode in which a physical GPU is being used determines the Linux kernel module to which the GPU is bound. If you want to switch the mode in which a GPU is being used, you must unbind the GPU from its current kernel module and bind it to the kernel module for the new mode. After binding the GPU to the correct kernel module, you can then configure it for pass-through.
When the Virtual GPU Manager is installed on a Red Hat Enterprise Linux KVM or Ubuntu host, the physical GPUs on the host are bound to the nvidia kernel module. A physical GPU that is bound to the nvidia kernel module can be used only for vGPU. To enable the GPU to be passed through to a VM, the GPU must be unbound from the nvidia kernel module and bound to the vfio-pci kernel module.
Before you begin, ensure that you have the domain, bus, slot, and function of the GPU that you are preparing for use in pass-through mode. For instructions, see Getting the BDF and Domain of a GPU on a Linux with KVM Hypervisor.
- If you are using a GPU that supports SR-IOV, such as a GPU based on the NVIDIA Ampere architecture, disable the virtual function for the GPU in the sysfs file system.
If your GPU does not support SR-IOV, omit this step.
Note: Before performing this step, ensure that the GPU is not being used by any other processes, such as CUDA applications, monitoring applications, or the nvidia-smi command.
Use the custom script sriov-manage provided by NVIDIA vGPU software for this purpose.
# /usr/lib/nvidia/sriov-manage -d domain:bus:slot.function
- domain
- bus
- slot
- function
- The domain, bus, slot, and function of the GPU, without the 0x prefix.

This example disables the virtual function for the GPU with the domain 00, bus 06, slot 0000, and function 0.

# /usr/lib/nvidia/sriov-manage -d 00:06:0000.0
- Determine the kernel module to which the GPU is bound by running the lspci command with the -k option on the NVIDIA GPUs on your host.
# lspci -d 10de: -k
The Kernel driver in use: field indicates the kernel module to which the GPU is bound.

The following example shows that the NVIDIA Tesla M60 GPU with BDF 06:00.0 is bound to the nvidia kernel module and is being used for vGPU.

06:00.0 VGA compatible controller: NVIDIA Corporation GM204GL [Tesla M60] (rev a1)
Subsystem: NVIDIA Corporation Device 115e
Kernel driver in use: nvidia
- To ensure that no clients are using the GPU, acquire the unbind lock of the GPU.
- Ensure that no VM is running to which a vGPU on the physical GPU is assigned and that no process running on the host is using that GPU. Processes on the host that use the GPU include the nvidia-smi command and all processes based on the NVIDIA Management Library (NVML).
- Change to the directory in the proc file system that represents the GPU.
# cd /proc/driver/nvidia/gpus/domain\:bus\:slot.function
- domain, bus, slot, function: The domain, bus, slot, and function of the GPU, without a 0x prefix.
This example changes to the directory in the proc file system that represents the GPU with the domain 0000 and PCI device BDF 06:00.0.
# cd /proc/driver/nvidia/gpus/0000\:06\:00.0
- Write the value 1 to the unbindLock file in this directory.
# echo 1 > unbindLock
- Confirm that the unbindLock file now contains the value 1.
# cat unbindLock
1
If the unbindLock file contains the value 0, the unbind lock could not be acquired because a process or client is using the GPU.
- Unbind the GPU from the nvidia kernel module.
- Change to the sysfs directory that represents the nvidia kernel module.
# cd /sys/bus/pci/drivers/nvidia
- Write the domain, bus, slot, and function of the GPU to the unbind file in this directory.
# echo domain:bus:slot.function > unbind
- domain, bus, slot, function: The domain, bus, slot, and function of the GPU, without a 0x prefix.
This example writes the domain, bus, slot, and function of the GPU with the domain 0000 and PCI device BDF 06:00.0.
# echo 0000:06:00.0 > unbind
- Bind the GPU to the vfio-pci kernel module.
- Change to the sysfs directory that contains the PCI device information for the physical GPU.
# cd /sys/bus/pci/devices/domain\:bus\:slot.function
- domain, bus, slot, function: The domain, bus, slot, and function of the GPU, without a 0x prefix.
This example changes to the sysfs directory that contains the PCI device information for the GPU with the domain 0000 and PCI device BDF 06:00.0.
# cd /sys/bus/pci/devices/0000\:06\:00.0
- Write the kernel module name vfio-pci to the driver_override file in this directory.
# echo vfio-pci > driver_override
- Change to the sysfs directory that represents the vfio-pci kernel module.
# cd /sys/bus/pci/drivers/vfio-pci
- Write the domain, bus, slot, and function of the GPU to the bind file in this directory.
# echo domain:bus:slot.function > bind
- domain, bus, slot, function: The domain, bus, slot, and function of the GPU, without a 0x prefix.
This example writes the domain, bus, slot, and function of the GPU with the domain 0000 and PCI device BDF 06:00.0.
# echo 0000:06:00.0 > bind
- Change back to the sysfs directory that contains the PCI device information for the physical GPU.
# cd /sys/bus/pci/devices/domain\:bus\:slot.function
- Clear the content of the driver_override file in this directory.
# echo > driver_override
You can now configure the GPU for use in pass-through mode as explained in Using GPU Pass-Through on a Linux with KVM Hypervisor.
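The unbind and bind steps above can be collected into a small helper. This is an unofficial sketch, not an NVIDIA tool: the function only prints the commands for a given GPU so that they can be reviewed before being run as root, after the unbind lock has been acquired.

```shell
#!/bin/sh
# Unofficial sketch: emit the unbind/bind command sequence from the procedure
# above for a given GPU (domain:bus:slot.function). Nothing here touches
# sysfs or procfs; the commands are only printed for review.
gen_passthrough_steps() {
    bdf="$1"    # e.g. 0000:06:00.0
    printf 'echo 1 > /proc/driver/nvidia/gpus/%s/unbindLock\n' "$bdf"
    printf 'echo %s > /sys/bus/pci/drivers/nvidia/unbind\n' "$bdf"
    printf 'echo vfio-pci > /sys/bus/pci/devices/%s/driver_override\n' "$bdf"
    printf 'echo %s > /sys/bus/pci/drivers/vfio-pci/bind\n' "$bdf"
    printf 'echo > /sys/bus/pci/devices/%s/driver_override\n' "$bdf"
}

# Review the generated commands before running them as root:
gen_passthrough_steps 0000:06:00.0
```

Printing the commands first makes it easy to confirm that the BDF is correct before any driver binding is changed.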
3.4. Using GPU Pass-Through on Microsoft Windows Server
On supported versions of Microsoft Windows Server with the Hyper-V role, you can use Discrete Device Assignment (DDA) to enable a VM to access a GPU directly.
3.4.1. Assigning a GPU to a VM on Microsoft Windows Server with Hyper-V
Perform this task in Windows PowerShell. If you do not know the location path of the GPU that you want to assign to a VM, use Device Manager to obtain it.
If you are using an actively cooled NVIDIA Quadro graphics card such as the RTX 8000 or RTX 6000, you must also pass through the audio device on the graphics card.
Ensure that the following prerequisites are met:
-
Windows Server with Desktop Experience and the Hyper-V role are installed and configured on your server platform, and a VM is created.
For instructions, refer to the relevant articles on the Microsoft technical documentation site.
- The guest OS is installed in the VM.
- The VM is powered off.
- Obtain the location path of the GPU that you want to assign to a VM.
- In the device manager, context-click the GPU and from the menu that pops up, choose Properties.
- In the Properties window that opens, click the Details tab and in the Properties drop-down list, select Location paths.
An example location path is as follows:
PCIROOT(80)#PCI(0200)#PCI(0000)#PCI(1000)#PCI(0000)
- If you are using an actively cooled NVIDIA Quadro graphics card, obtain the location path of the audio device on the graphics card and disable the device.
- In the device manager, from the View menu, choose Devices by connection.
- Navigate to ACPI x64-based PC > Microsoft ACPI-Compliant System > PCI Express Root Complex > PCI-to-PCI Bridge.
- Context-click High Definition Audio Controller and from the menu that pops up, choose Properties.
- In the Properties window that opens, click the Details tab and in the Properties drop-down list, select Location paths.
- Context-click High Definition Audio Controller again and from the menu that pops up, choose Disable device.
- Dismount the GPU and, if present, the audio device from the host so that they are unavailable to the host and can be used solely by the VM.
For each device that you are dismounting, type the following command:
Dismount-VMHostAssignableDevice -LocationPath gpu-device-location -force
- gpu-device-location: The location path of the GPU or the audio device that you obtained previously.
This example dismounts the GPU at the location path PCIROOT(80)#PCI(0200)#PCI(0000)#PCI(1000)#PCI(0000).
Dismount-VMHostAssignableDevice -LocationPath "PCIROOT(80)#PCI(0200)#PCI(0000)#PCI(1000)#PCI(0000)" -force
- Assign the GPU and, if present, the audio device that you dismounted in the previous step to the VM.
For each device that you are assigning, type the following command:
Add-VMAssignableDevice -LocationPath gpu-device-location -VMName vm-name
- gpu-device-location: The location path of the GPU or the audio device that you dismounted in the previous step.
- vm-name: The name of the VM to which you are attaching the GPU or the audio device.
Note: You can assign a pass-through GPU and, if present, its audio device to only one virtual machine at a time.
This example assigns the GPU at the location path PCIROOT(80)#PCI(0200)#PCI(0000)#PCI(1000)#PCI(0000) to the VM VM1.
Add-VMAssignableDevice -LocationPath "PCIROOT(80)#PCI(0200)#PCI(0000)#PCI(1000)#PCI(0000)" -VMName VM1
- Power on the VM. The guest OS should now be able to use the GPU and, if present, the audio device.
After assigning a GPU to a VM, install the NVIDIA graphics driver in the guest OS on the VM as explained in Installing the NVIDIA vGPU Software Graphics Driver.
3.4.2. Returning a GPU to the Host OS from a VM on Windows Server with Hyper-V
Perform this task in Windows PowerShell.
If you are using an actively cooled NVIDIA Quadro graphics card such as the RTX 8000 or RTX 6000, you must also return the audio device on the graphics card.
- List the GPUs and, if present, the audio devices that are currently assigned to the virtual machine (VM).
Get-VMAssignableDevice -VMName vm-name
- vm-name: The name of the VM whose assigned GPUs and audio devices you want to list.
- Shut down the VM to which the GPU and any audio devices are assigned.
- Remove the GPU and, if present, the audio device from the VM to which they are assigned.
For each device that you are removing, type the following command:
Remove-VMAssignableDevice -LocationPath gpu-device-location -VMName vm-name
- gpu-device-location: The location path of the GPU or the audio device that you are removing, which you obtained previously.
- vm-name: The name of the VM from which you are removing the GPU or the audio device.
This example removes the GPU at the location path PCIROOT(80)#PCI(0200)#PCI(0000)#PCI(1000)#PCI(0000) from the VM VM1.
Remove-VMAssignableDevice -LocationPath "PCIROOT(80)#PCI(0200)#PCI(0000)#PCI(1000)#PCI(0000)" -VMName VM1
After the GPU and, if present, its audio device are removed from the VM, they are unavailable to the host operating system (OS) until you remount them on the host OS.
- Remount the GPU and, if present, its audio device on the host OS.
For each device that you are remounting, type the following command:
Mount-VMHostAssignableDevice -LocationPath gpu-device-location
- gpu-device-location: The location path of the GPU or the audio device that you are remounting, which you specified in the previous step to remove the GPU or the audio device from the VM.
This example remounts the GPU at the location path PCIROOT(80)#PCI(0200)#PCI(0000)#PCI(1000)#PCI(0000) on the host OS.
Mount-VMHostAssignableDevice -LocationPath "PCIROOT(80)#PCI(0200)#PCI(0000)#PCI(1000)#PCI(0000)"
The host OS should now be able to use the GPU and, if present, its audio device.
3.5. Using GPU Pass-Through on VMware vSphere
On VMware vSphere, you can use Virtual Dedicated Graphics Acceleration (vDGA) to enable a VM to access a GPU directly. vDGA is a feature of VMware vSphere that dedicates a single physical GPU on an ESXi host to a single virtual machine.
Before configuring a vSphere VM with vDGA, ensure that these prerequisites are met:
- The VM and the ESXi host are configured as explained in Preparing for vDGA Capabilities in the VMware Horizon documentation.
- The VM is powered off.
- Open the vCenter Web UI.
- In the vCenter Web UI, right-click the ESXi host and choose Configure.
- From the Hardware menu, choose PCI Devices.
- On the PCI Devices page that opens, click ALL PCI DEVICES and in the table of devices, select the GPU.
Note:
When selecting the GPU to pass through, you must select only the physical device. To list only NVIDIA physical devices, set the filter on the Vendor Name field to NVIDIA and filter out any virtual function devices of the GPU by setting the filter on the ID field to 00.0.
- Click TOGGLE PASSTHROUGH.
- Reboot the ESXi host.
- After the ESXi host has booted, right-click the VM and choose Edit Settings.
- From the New Device menu, choose PCI Device and click Add.
- On the page that opens, from the New Device drop-down list, select the GPU.
- Click Reserve all memory and click OK.
- Start the VM.
For more information about vDGA, see the VMware Horizon documentation.
After configuring a vSphere VM with vDGA, install the NVIDIA graphics driver in the guest OS on the VM as explained in Installing the NVIDIA vGPU Software Graphics Driver.
The process for installing the NVIDIA vGPU software graphics driver depends on the OS that you are using. However, for any OS, the process for installing the driver is the same in a VM configured with vGPU, in a VM that is running pass-through GPU, or on a physical host in a bare-metal deployment.
After you install the NVIDIA vGPU software graphics driver, you can license any NVIDIA vGPU software licensed products that you are using.
4.1. Installing the NVIDIA vGPU Software Graphics Driver and NVIDIA Control Panel on Windows
To fully enable GPU operation in a VM or on a bare-metal host, the NVIDIA vGPU software graphics driver must be installed. If the NVIDIA Control Panel app is not installed when the graphics driver is installed, you can install it separately from the graphics driver.
4.1.1. Installing the NVIDIA vGPU Software Graphics Driver on Windows
Installation in a VM: After you create a Windows VM on the hypervisor and boot the VM, the VM should boot to a standard Windows desktop in VGA mode at 800×600 resolution. You can use the Windows screen resolution control panel to increase the resolution to other standard resolutions, but to fully enable GPU operation, the NVIDIA vGPU software graphics driver must be installed. Windows guest VMs are supported on all NVIDIA vGPU types, namely: Q-series, B-series, and A-series NVIDIA vGPU types.
Installation on bare metal: When the physical host is booted before the NVIDIA vGPU software graphics driver is installed, boot and the primary display are handled by an on-board graphics adapter. To install the NVIDIA vGPU software graphics driver, access the Windows desktop on the host by using a display connected through the on-board graphics adapter.
The procedure for installing the driver is the same in a VM and on bare metal.
- Copy the NVIDIA Windows driver package to the guest VM or physical host where you are installing the driver.
- Execute the package to unpack and run the driver installer.
Figure 15. NVIDIA driver installation
- Click through the license agreement.
- Select Express Installation and click NEXT. After the driver installation is complete, the installer may prompt you to restart the platform.
- If prompted to restart the platform, do one of the following:
- Select Restart Now to reboot the VM or physical host.
- Exit the installer and reboot the VM or physical host when you are ready.
After the VM or physical host restarts, it boots to a Windows desktop.
- Verify that the NVIDIA driver is running.
- Right-click on the desktop.
- From the menu that opens, choose NVIDIA Control Panel.
- In the NVIDIA Control Panel, from the Help menu, choose System Information.
NVIDIA Control Panel reports the vGPU or physical GPU that is being used, its capabilities, and the NVIDIA driver version that is loaded.
Figure 16. Verifying NVIDIA driver operation using NVIDIA Control Panel
Installation in a VM: After you install the NVIDIA vGPU software graphics driver, you can license any NVIDIA vGPU software licensed products that you are using. For instructions, refer to Virtual GPU Client Licensing User Guide.
The graphics driver for Windows in this release of NVIDIA vGPU software is distributed in a DCH-compliant package. A DCH-compliant package differs from a driver package that is not DCH compliant in the following ways:
- The Windows registry key for license settings for a DCH-compliant package is different than the key for a driver package that is not DCH compliant. If you are upgrading from a driver package that is not DCH compliant in a VM that was previously licensed, you must reconfigure the license settings for the VM. Existing license settings are not propagated to the new Windows registry key for a DCH-compliant package.
- NVIDIA System Management Interface, nvidia-smi, is installed in a folder that is in the default executable path.
- The NVWMI binary files are installed in the Windows Driver Store under %SystemDrive%:\Windows\System32\DriverStore\FileRepository\.
- NVWMI help information in Windows Help format is not installed with graphics driver for Windows guest OSes.
Installation on bare metal: After you install the NVIDIA vGPU software graphics driver, complete the bare-metal deployment as explained in Bare-Metal Deployment.
4.1.2. Installing the Standalone NVIDIA Control Panel App
The NVIDIA Control Panel app is now distributed through the Microsoft Store. If your system does not allow the installation of apps from the Microsoft Store, the NVIDIA Control Panel app is not installed when the NVIDIA vGPU software graphics driver for Windows is installed.
Your system might not allow the installation of apps from the Microsoft Store for any of the following reasons:
- The Microsoft Store app is disabled.
- Your system is not connected to the Internet.
- Installation of apps from the Microsoft Store is blocked by your system settings.
You can install the NVIDIA Control Panel app separately from the graphics driver by downloading and running the standalone NVIDIA Control Panel installer that is available from NVIDIA Licensing Portal.
- Download and extract the standalone NVIDIA Control Panel installer from NVIDIA Licensing Portal.
- Copy the extracted standalone NVIDIA Control Panel installer to the guest VM or physical host where you are installing the NVIDIA Control Panel app.
- Double-click the installer executable file to start the installer.
- When asked if you want to allow the installer app to make changes to your device, click Yes.
- Accept the NVIDIA software license agreement.
- Select the Express installation option and click NEXT.
- When the installation is complete, click CLOSE to close the installer. The NVIDIA Control Panel app opens.
4.2. Installing the NVIDIA vGPU Software Graphics Driver on Linux
The NVIDIA vGPU software graphics driver for Linux is distributed as a .run file that can be installed on all supported Linux distributions. The driver is also distributed as a Debian package for Ubuntu distributions and as an RPM package for Red Hat distributions.
Installation in a VM: After you create a Linux VM on the hypervisor and boot the VM, install the NVIDIA vGPU software graphics driver in the VM to fully enable GPU operation. Linux guest VMs are supported on all NVIDIA vGPU types, namely: Q-series, B-series, and A-series NVIDIA vGPU types.
Installation on bare metal: When the physical host is booted before the NVIDIA vGPU software graphics driver is installed, the vesa Xorg driver starts the X server. If a primary display device is connected to the host, use the device to access the desktop. Otherwise, use secure shell (SSH) to log in to the host from a remote host.

In addition to the proprietary release of the NVIDIA vGPU software graphics driver for Linux, a release that is based on NVIDIA Linux open GPU kernel modules is also available. The release that is based on NVIDIA Linux open GPU kernel modules is compatible with the following NVIDIA vGPU software deployments:
- NVIDIA vGPU deployments on GPUs that are based on the NVIDIA Ada Lovelace GPU architecture or later architectures
- Bare-metal deployments on GPUs that are based on the NVIDIA Turing GPU architecture or later architectures
The release that is based on NVIDIA Linux open GPU kernel modules can be installed only from the .run file, not from a Debian package or RPM package.
The procedure for installing the driver is the same in a VM and on bare metal.
Before installing the NVIDIA vGPU software graphics driver, ensure that the following prerequisites are met:
- OpenSSL is installed in the VM. If OpenSSL is not installed, the VM will not be able to obtain NVIDIA vGPU software licenses.
- NVIDIA Direct Rendering Manager Kernel Modesetting (DRM KMS) is disabled. By default, DRM KMS is disabled. However, if it has been enabled, remove nvidia-drm.modeset=1 from the kernel command-line options.
- If the VM uses UEFI boot, ensure that secure boot is disabled.
- If the Nouveau driver for NVIDIA graphics cards is present, disable it as explained in Disabling the Nouveau Driver for NVIDIA Graphics Cards.
- If you are using a Linux OS for which the Wayland display server protocol is enabled by default, disable it as explained in Disabling the Wayland Display Server Protocol for Red Hat Enterprise Linux.
How to install the NVIDIA vGPU software graphics driver on Linux depends on the distribution format from which you are installing the driver. For detailed instructions, refer to:
- Installing the NVIDIA vGPU Software Graphics Driver on Linux from a .run File
- Installing the NVIDIA vGPU Software Graphics Driver on Ubuntu from a Debian Package
- Installing the NVIDIA vGPU Software Graphics Driver on Red Hat Distributions from an RPM Package
Installation in a VM: After you install the NVIDIA vGPU software graphics driver, you can license any NVIDIA vGPU software licensed products that you are using. For instructions, refer to Virtual GPU Client Licensing User Guide.
Installation on bare metal: After you install the NVIDIA vGPU software graphics driver, complete the bare-metal deployment as explained in Bare-Metal Deployment.
4.2.1. Installing the NVIDIA vGPU Software Graphics Driver on Linux from a .run File
You can use the .run file to install the NVIDIA vGPU software graphics driver on any supported Linux distribution.
Installation of the NVIDIA vGPU software graphics driver for Linux from a .run file requires:
- Compiler toolchain
- Kernel headers
If a driver has previously been installed on the guest VM or physical host from a Debian package or RPM package, uninstall that driver before installing the driver from a .run file.
If Dynamic Kernel Module Support (DKMS) is enabled, ensure that the dkms package is installed.
- Copy the NVIDIA vGPU software Linux driver package, for example NVIDIA-Linux_x86_64-550.127.05-grid.run, to the guest VM or physical host where you are installing the driver.
- Before attempting to run the driver installer, exit the X server and terminate all OpenGL applications.
- On Red Hat Enterprise Linux and CentOS systems, exit the X server by transitioning to runlevel 3:
[nvidia@localhost ~]$ sudo init 3
- On Ubuntu platforms, do the following:
- Switch to a console login prompt.
- If you have access to the terminal's function keys, press CTRL-ALT-F1.
- If you are accessing the guest VM or physical host through VNC or a web browser and do not have access to the terminal's function keys, run the chvt command of the OS as root.
[nvidia@localhost ~]$ sudo chvt 3
- Log in and shut down the display manager:
- For Ubuntu 18 and later releases, stop the gdm service.
[nvidia@localhost ~]$ sudo service gdm stop
- For releases earlier than Ubuntu 18, stop the lightdm service.
[nvidia@localhost ~]$ sudo service lightdm stop
- From a console shell, run the driver installer as the root user.
- To install the proprietary release of the driver, run the driver installer without any additional options.
sudo sh ./NVIDIA-Linux_x86_64-550.127.05-grid.run
- To install the release that is based on NVIDIA Linux open GPU kernel modules, run the driver installer with the -m=kernel-open option.
sudo sh ./NVIDIA-Linux_x86_64-550.127.05-grid.run -m=kernel-open
If DKMS is enabled, set the -dkms option. This option requires the dkms package to be installed.
sudo sh ./NVIDIA-Linux_x86_64-550.127.05-grid.run -dkms
In some instances, the installer may fail to detect the installed kernel headers and sources. In this situation, rerun the installer, specifying the kernel source path with the --kernel-source-path option.
sudo sh ./NVIDIA-Linux_x86_64-550.127.05-grid.run \
  --kernel-source-path=/usr/src/kernels/3.10.0-229.11.1.el7.x86_64
- To install the proprietary release of the driver, run the driver installer without any additional options.
- When prompted, accept the option to update the X configuration file (xorg.conf).
Figure 17. Update xorg.conf settings
- After the installation is complete, select OK to exit the installer.
- Verify that the NVIDIA driver is operational.
- Reboot the system and log in.
- Run nvidia-settings.
[nvidia@localhost ~]$ nvidia-settings
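The installer invocations above differ only in their options. As an illustrative sketch, a helper can compose the command line from the documented choices; the .run filename is the example used in this guide, and -m=kernel-open and -dkms are the options described above. Nothing is executed; the command is only printed.

```shell
# Illustrative sketch: compose the driver-installer command line from the
# documented options. The command is printed, not executed.
installer_cmd() {
    run_file="$1"   # e.g. NVIDIA-Linux_x86_64-550.127.05-grid.run
    flavor="$2"     # "open" for the release based on NVIDIA Linux open GPU kernel modules
    dkms="$3"       # "dkms" if DKMS is enabled and the dkms package is installed
    cmd="sudo sh ./$run_file"
    if [ "$flavor" = "open" ]; then cmd="$cmd -m=kernel-open"; fi
    if [ "$dkms" = "dkms" ]; then cmd="$cmd -dkms"; fi
    echo "$cmd"
}

installer_cmd NVIDIA-Linux_x86_64-550.127.05-grid.run open dkms
# prints: sudo sh ./NVIDIA-Linux_x86_64-550.127.05-grid.run -m=kernel-open -dkms
```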
4.2.2. Installing the NVIDIA vGPU Software Graphics Driver on Ubuntu from a Debian Package
The NVIDIA vGPU software graphics driver for Ubuntu is distributed as a Debian package file.
This task requires sudo privileges.
- Copy the NVIDIA vGPU software Linux driver package, for example nvidia-linux-grid-550_550.127.05_amd64.deb, to the guest VM where you are installing the driver.
- Log in to the guest VM as a user with sudo privileges.
- Open a command shell and change to the directory that contains the NVIDIA vGPU software Linux driver package.
- From the command shell, run the command to install the package.
$ sudo apt-get install ./nvidia-linux-grid-550_550.127.05_amd64.deb
- Verify that the NVIDIA driver is operational.
- Reboot the system and log in.
- After the system has rebooted, confirm that you can see your NVIDIA vGPU device in the output from the nvidia-smi command.
$ nvidia-smi
4.2.3. Installing the NVIDIA vGPU Software Graphics Driver on Red Hat Distributions from an RPM Package
The NVIDIA vGPU software graphics driver for Red Hat Distributions is distributed as an RPM package file.
This task requires root user privileges.
- Copy the NVIDIA vGPU software Linux driver package, for example nvidia-linux-grid-550_550.127.05_amd64.rpm, to the guest VM where you are installing the driver.
- Log in to the guest VM as a user with root user privileges.
- From the command shell, run the command to install the package.
$ rpm -iv ./nvidia-linux-grid-550_550.127.05_amd64.rpm
- Verify that the NVIDIA driver is operational.
- Reboot the system and log in.
- After the system has rebooted, confirm that you can see your NVIDIA vGPU device in the output from the nvidia-smi command.
$ nvidia-smi
4.2.4. Disabling the Nouveau Driver for NVIDIA Graphics Cards
If the Nouveau driver for NVIDIA graphics cards is present, disable it before installing the NVIDIA vGPU software graphics driver.
If you are using SUSE Linux Enterprise Server, you can skip this task because the Nouveau driver is not present in SUSE Linux Enterprise Server.
Run the following command and if the command prints any output, the Nouveau driver is present and must be disabled.
$ lsmod | grep nouveau
- Create the file /etc/modprobe.d/blacklist-nouveau.conf with the following contents:
blacklist nouveau
options nouveau modeset=0
- Regenerate the kernel initial RAM file system (initramfs). The command to run to regenerate the kernel initramfs depends on the Linux distribution that you are using.
| Linux Distribution | Command |
|---|---|
| CentOS | $ sudo dracut --force |
| Debian | $ sudo update-initramfs -u |
| Red Hat Enterprise Linux | $ sudo dracut --force |
| Ubuntu | $ sudo update-initramfs -u |
- Reboot the host or guest VM.
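The distribution-to-command mapping in the table above can be expressed as a small shell function. This is an unofficial sketch, assuming the distribution is identified by the ID field of /etc/os-release:

```shell
# Unofficial sketch: map a distribution ID (as reported by the ID field of
# /etc/os-release) to the initramfs regeneration command from the table above.
initramfs_cmd() {
    case "$1" in
        centos|rhel)   echo "sudo dracut --force" ;;
        debian|ubuntu) echo "sudo update-initramfs -u" ;;
        *)             echo "unknown distribution: $1" >&2; return 1 ;;
    esac
}

initramfs_cmd ubuntu
# prints: sudo update-initramfs -u
```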
4.2.5. Disabling the Wayland Display Server Protocol for Red Hat Enterprise Linux
Starting with Red Hat Enterprise Linux Desktop 8.0, the Wayland display server protocol is used by default on supported GPU and graphics driver configurations. However, the NVIDIA vGPU software graphics driver for Linux requires the X Window System. Before installing the driver, you must disable the Wayland display server protocol to revert to the X Window System.
Perform this task from the host or guest VM that is running Red Hat Enterprise Linux Desktop.
This task requires administrative access.
- In a plain text editor, edit the file /etc/gdm/custom.conf and remove the comment from the option WaylandEnable=false.
- Save your changes to /etc/gdm/custom.conf.
- Reboot the host or guest VM.
4.2.6. Disabling GSP Firmware
Some GPUs include a GPU System Processor (GSP), which may be used to offload GPU initialization and management tasks. In GPU pass-through and bare-metal deployments on Linux, GSP is supported only for vCS. If you are using any other product in a GPU pass-through or bare-metal deployment on Linux, you must disable the GSP firmware.
For NVIDIA vGPU deployments on Linux and all NVIDIA vGPU software deployments on Windows, omit this task.
GSP firmware is supported with NVIDIA vGPU deployments on GPUs based on the NVIDIA Ada Lovelace GPU architecture. In NVIDIA vGPU deployments on Linux and all NVIDIA vGPU software deployments on Windows on GPUs based on earlier GPU architectures, GSP is also not supported, but GSP firmware is already disabled.
For each NVIDIA vGPU software product, the following table lists whether GSP is supported in deployments in which GSP firmware can be enabled. The table also summarizes the behavior of NVIDIA vGPU software if a VM or host requests a license when GSP firmware is enabled. The deployments in which GSP firmware can be enabled are GPU pass through and bare-metal deployments on Linux.
| Product | GSP | License Request | Error Message |
|---|---|---|---|
| vCS | Supported | Allowed | Not applicable |
| vApps | Not supported | Blocked | Printed |
| vWS | Not supported | Blocked | Printed |
When a license request is blocked, the following error message is written to the licensing event log file at the location given in Virtual GPU Client Licensing User Guide:
Invalid feature requested for the underlying GSP firmware configuration.
Disable GSP firmware to use this feature.
Perform this task on the VM to which the GPU is passed through or on the bare-metal host.
Ensure that the NVIDIA vGPU software graphics driver for Linux is installed on the VM or bare-metal host.
- Log in to the VM or bare-metal host and open a command shell.
- Determine whether GSP firmware is enabled.
$ nvidia-smi -q
- If GSP firmware is enabled, the command displays the GSP firmware version, for example:
GSP Firmware Version : 550.127.05
- Otherwise, the command displays N/A as the GSP firmware version.
- If GSP firmware is enabled, disable it by setting the NVIDIA module parameter NVreg_EnableGpuFirmware to 0.
Set this parameter by adding the following entry to the /etc/modprobe.d/nvidia.conf file:
options nvidia NVreg_EnableGpuFirmware=0
If the /etc/modprobe.d/nvidia.conf file does not already exist, create it.
- Reboot the VM or bare-metal host.
If you later need to enable GSP firmware, set the NVIDIA module parameter NVreg_EnableGpuFirmware to 1.
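The check for GSP firmware described above can be scripted. A hedged sketch: the function scans nvidia-smi -q output supplied on stdin for the GSP Firmware Version field, and the sample strings below mimic the two output formats described in the procedure.

```shell
# Unofficial sketch: succeed (exit 0) if `nvidia-smi -q` output shows a real
# GSP firmware version; fail if the field reads N/A or is absent.
gsp_enabled() {
    grep 'GSP Firmware Version' | grep -qv 'N/A'
}

# Typical use on a live system (not run here):
#   nvidia-smi -q | gsp_enabled && echo "GSP firmware is enabled; disable it for vWS or vApps"
```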
NVIDIA vGPU is a licensed product. When booted on a supported GPU, a vGPU initially operates at full capability but its performance is degraded over time if the VM fails to obtain a license. If the performance of a vGPU has been degraded, the full capability of the vGPU is restored when a license is acquired. For information about how the performance of an unlicensed vGPU is degraded, see Virtual GPU Client Licensing User Guide.
After you license NVIDIA vGPU, the VM that is set up to use NVIDIA vGPU is capable of running the full range of DirectX and OpenGL graphics applications.
If licensing is configured, the virtual machine (VM) obtains a license from the license server when a vGPU is booted on these GPUs. The VM retains the license until it is shut down. It then releases the license back to the license server. Licensing settings persist across reboots and need only be modified if the license server address changes, or the VM is switched to running GPU pass-through.
For complete information about configuring and using NVIDIA vGPU software licensed features, including vGPU, refer to Virtual GPU Client Licensing User Guide.
5.1. Prerequisites for Configuring a Licensed Client of NVIDIA License System
A client with a network connection obtains a license by leasing it from an NVIDIA License System service instance. The service instance serves the license to the client over the network from a pool of floating licenses obtained from the NVIDIA Licensing Portal. The license is returned to the service instance when the licensed client no longer requires the license.
Before configuring a licensed client, ensure that the following prerequisites are met:
- The NVIDIA vGPU software graphics driver is installed on the client.
- The client configuration token that you want to deploy on the client has been created from the NVIDIA Licensing Portal or the DLS as explained in Generating a Client Configuration Token in NVIDIA License System User Guide.
- Ports 443 and 80 in your firewall or proxy must be open to allow HTTPS traffic between a service instance and its licensed clients. These ports must be open for both CLS instances and DLS instances.
Note:
For DLS releases before DLS 1.1, ports 8081 and 8082 were also required to be open to allow HTTPS traffic between a DLS instance and its licensed clients. Although these ports are no longer required, they remain supported for backward compatibility.
The graphics driver creates a default location in which to store the client configuration token on the client.
The process for configuring a licensed client is the same for CLS and DLS instances but depends on the OS that is running on the client.
5.2. Configuring a Licensed Client on Windows with Default Settings
Perform this task from the client.
- Copy the client configuration token to the %SystemDrive%:\Program Files\NVIDIA Corporation\vGPU Licensing\ClientConfigToken folder.
- Restart the NvDisplayContainer service.
The NVIDIA service on the client should now automatically obtain a license from the CLS or DLS instance.
5.3. Configuring a Licensed Client on Linux with Default Settings
Perform this task from the client.
- As root, open the file /etc/nvidia/gridd.conf in a plain-text editor, such as vi.
$ sudo vi /etc/nvidia/gridd.conf
Note: You can create the /etc/nvidia/gridd.conf file by copying the supplied template file /etc/nvidia/gridd.conf.template.
- Add the FeatureType configuration parameter to the file /etc/nvidia/gridd.conf on a new line as FeatureType="value".
  value depends on the type of the GPU assigned to the licensed client that you are configuring:
  - NVIDIA vGPU: 1. NVIDIA vGPU software automatically selects the correct type of license based on the vGPU type.
  - Physical GPU: the feature type of a GPU in pass-through mode or a bare-metal deployment:
    - 0: NVIDIA Virtual Applications
    - 2: NVIDIA RTX Virtual Workstation
Note: You can also perform this step from NVIDIA X Server Settings. Before using NVIDIA X Server Settings to perform this step, ensure that this option has been enabled as explained in Virtual GPU Client Licensing User Guide.
This example shows how to configure a licensed Linux client for NVIDIA RTX Virtual Workstation.
# /etc/nvidia/gridd.conf.template - Configuration file for NVIDIA Grid Daemon
…
# Description: Set Feature to be enabled
# Data type: integer
# Possible values:
# 0 => for unlicensed state
# 1 => for NVIDIA vGPU
# 2 => for NVIDIA RTX Virtual Workstation
# 4 => for NVIDIA Virtual Compute Server
FeatureType=2
...
- Copy the client configuration token to the /etc/nvidia/ClientConfigToken directory.
- Ensure that the file access modes of the client configuration token allow the owner to read, write, and execute the token, and the group and others only to read the token.
  - Determine the current file access modes of the client configuration token.
    # ls -l client-configuration-token-directory
  - If necessary, change the mode of the client configuration token to 744.
    # chmod 744 client-configuration-token-directory/client_configuration_token_*.tok
  - client-configuration-token-directory
    The directory to which you copied the client configuration token in the previous step.
- Save your changes to the /etc/nvidia/gridd.conf file and close the file.
- Restart the nvidia-gridd service.
The NVIDIA service on the client should now automatically obtain a license from the CLS or DLS instance.
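The steps above can be collected into a short script. The sketch below is illustrative: it operates on a scratch directory created with mktemp so it can be tried without root, and the token file name is a placeholder. On a real client, the configuration file is /etc/nvidia/gridd.conf, the token directory is /etc/nvidia/ClientConfigToken, and you finish by restarting the nvidia-gridd service.

```shell
#!/bin/sh
# Sketch of the Linux licensing steps, run against a scratch directory.
set -eu
WORK="$(mktemp -d)"
CONF="$WORK/gridd.conf"               # real path: /etc/nvidia/gridd.conf
TOKEN_DIR="$WORK/ClientConfigToken"   # real path: /etc/nvidia/ClientConfigToken
mkdir -p "$TOKEN_DIR"

# Step 2: set the feature type (2 = NVIDIA RTX Virtual Workstation).
echo 'FeatureType=2' >> "$CONF"

# Steps 3 and 4: deploy the client configuration token and set mode 744
# (owner read/write/execute; group and others read only).
echo 'placeholder-token-contents' > "$TOKEN_DIR/client_configuration_token_example.tok"
chmod 744 "$TOKEN_DIR"/client_configuration_token_*.tok

# Final step on a real client (requires root): systemctl restart nvidia-gridd
```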
5.4. Verifying the NVIDIA vGPU Software License Status of a Licensed Client
After configuring a client with an NVIDIA vGPU software license, verify the license status by displaying the licensed product name and status.
To verify the license status of a licensed client, run nvidia-smi with the -q or --query option from the licensed client, not from the hypervisor host. If the product is licensed, the expiration date is shown in the license status.
nvidia-smi -q
==============NVSMI LOG==============
Timestamp : Wed Nov 23 10:52:59 2022
Driver Version : 525.60.06
CUDA Version : 12.0
Attached GPUs : 2
GPU 00000000:02:03.0
Product Name : NVIDIA A2-8Q
Product Brand : NVIDIA RTX Virtual Workstation
Product Architecture : Ampere
Display Mode : Enabled
Display Active : Disabled
Persistence Mode : Enabled
MIG Mode
Current : Disabled
Pending : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : N/A
GPU UUID : GPU-ba5b1e9b-1dd3-11b2-be4f-98ef552f4216
Minor Number : 0
VBIOS Version : 00.00.00.00.00
MultiGPU Board : No
Board ID : 0x203
Board Part Number : N/A
GPU Part Number : 25B6-890-A1
Module ID : N/A
Inforom Version
Image Version : N/A
OEM Object : N/A
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : N/A
GPU Virtualization Mode
Virtualization Mode : VGPU
Host VGPU Mode : N/A
    vGPU Software Licensed Product
        Product Name              : NVIDIA RTX Virtual Workstation
        License Status            : Licensed (Expiry: 2022-11-23 10:41:16 GMT)
…
…
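To spot-check the license state without reading the full report, you can filter the nvidia-smi -q output for the license status line. The sketch below runs against a captured sample of the output above rather than a live GPU; on a licensed client you would run nvidia-smi -q | grep 'License Status' instead.

```shell
# Extract the license status line from (sample) nvidia-smi -q output.
sample='    License Status                    : Licensed (Expiry: 2022-11-23 10:41:16 GMT)'
status="$(printf '%s\n' "$sample" | grep 'License Status' | sed 's/^ *//')"
echo "$status"
```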
You can modify a VM's NVIDIA vGPU configuration by removing the NVIDIA vGPU configuration from a VM or by modifying GPU allocation policy.
6.1. Removing a VM’s NVIDIA vGPU Configuration
Remove a VM’s NVIDIA vGPU configuration when you no longer require the VM to use a virtual GPU.
6.1.1. Removing a Citrix Virtual Apps and Desktops VM’s vGPU configuration
You can remove a virtual GPU assignment from a VM, such that it no longer uses a virtual GPU, by using either XenCenter or the xe command.
The VM must be in the powered-off state in order for its vGPU configuration to be modified or removed.
6.1.1.1. Removing a VM’s vGPU configuration by using XenCenter
- Set the GPU type to None in the VM’s GPU Properties, as shown in Figure 18.
Figure 18. Using XenCenter to remove a vGPU configuration from a VM
- Click OK.
6.1.1.2. Removing a VM’s vGPU configuration by using xe
- Use vgpu-list to discover the vGPU object UUID associated with a given VM:
[root@xenserver ~]# xe vgpu-list vm-uuid=e71afda4-53f4-3a1b-6c92-a364a7f619c2
uuid ( RO)          : c1c7c43d-4c99-af76-5051-119f1c2b4188
vm-uuid ( RO)       : e71afda4-53f4-3a1b-6c92-a364a7f619c2
gpu-group-uuid ( RO): d53526a9-3656-5c88-890b-5b24144c3d96
- Use vgpu-destroy to delete the virtual GPU object associated with the VM:
[root@xenserver ~]# xe vgpu-destroy uuid=c1c7c43d-4c99-af76-5051-119f1c2b4188
[root@xenserver ~]#
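The two xe steps can be chained so that the vGPU object UUID does not have to be copied by hand. Because xe is available only in the XenServer dom0 shell, the sketch below stubs it with a shell function that returns the sample UUID from the listing above; on a real host, remove the stub. (xe vgpu-list also accepts params=uuid --minimal to print the bare UUID directly.)

```shell
# Stub of the xe CLI for illustration only; delete this function in dom0.
xe() {
  case "$1" in
    vgpu-list)    echo 'uuid ( RO) : c1c7c43d-4c99-af76-5051-119f1c2b4188' ;;
    vgpu-destroy) echo "destroyed ${2#uuid=}" ;;
  esac
}

VM_UUID=e71afda4-53f4-3a1b-6c92-a364a7f619c2
# Look up the vGPU object UUID for the VM, then destroy it.
VGPU_UUID="$(xe vgpu-list vm-uuid="$VM_UUID" | awk -F': ' '/^uuid/ {print $2}')"
xe vgpu-destroy uuid="$VGPU_UUID"
```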
6.1.2. Removing a vSphere VM’s vGPU Configuration
To remove a vSphere vGPU configuration from a VM:
- Right-click the VM in the vCenter Web UI and select Edit settings.
- Select the Virtual Hardware tab.
- Mouse over the PCI Device entry showing NVIDIA GRID vGPU and click on the (X) icon to mark the device for removal.
- Click OK to remove the device and update the VM settings.
6.2. Modifying GPU Allocation Policy
XenServer and VMware vSphere both support the breadth-first and depth-first GPU allocation policies for vGPU-enabled VMs.
- breadth-first
- The breadth-first allocation policy attempts to minimize the number of vGPUs running on each physical GPU. Newly created vGPUs are placed on the physical GPU that can support the new vGPU and that has the fewest vGPUs already resident on it. This policy generally leads to higher performance because it attempts to minimize sharing of physical GPUs, but it may artificially limit the total number of vGPUs that can run.
- depth-first
- The depth-first allocation policy attempts to maximize the number of vGPUs running on each physical GPU. Newly created vGPUs are placed on the physical GPU that can support the new vGPU and that has the most vGPUs already resident on it. This policy generally leads to higher density of vGPUs, particularly when different types of vGPUs are being run, but may result in lower performance because it attempts to maximize sharing of physical GPUs.
Each hypervisor uses a different GPU allocation policy by default.
- XenServer uses the depth-first allocation policy.
- VMware vSphere ESXi uses the breadth-first allocation policy.
If the default GPU allocation policy does not meet your requirements for performance or density of vGPUs, you can change it.
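The difference between the two policies can be sketched with a hypothetical set of per-GPU resident vGPU counts. Breadth-first picks the candidate physical GPU with the fewest resident vGPUs, depth-first the one with the most; in either case the hypervisor first filters out GPUs that cannot support the new vGPU, which this sketch omits.

```shell
# Hypothetical per-GPU counts of resident vGPUs.
counts='gpu0=2
gpu1=0
gpu2=1'

# Breadth-first: fewest resident vGPUs wins; depth-first: most resident wins.
breadth_first="$(printf '%s\n' "$counts" | sort -t= -k2 -n  | head -1 | cut -d= -f1)"
depth_first="$(printf '%s\n' "$counts" | sort -t= -k2 -rn | head -1 | cut -d= -f1)"
echo "breadth-first places on $breadth_first; depth-first places on $depth_first"
```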
6.2.1. Modifying GPU Allocation Policy on XenServer
You can modify GPU allocation policy on XenServer by using XenCenter or the xe command.
6.2.1.1. Modifying GPU Allocation Policy by Using xe
The allocation policy of a GPU group is stored in the allocation-algorithm parameter of the gpu-group object.
To change the allocation policy of a GPU group, use gpu-group-param-set:
[root@xenserver ~]# xe gpu-group-param-get uuid=be825ba2-01d7-8d51-9780-f82cfaa64924 param-name=allocation-algorithm
depth-first
[root@xenserver ~]# xe gpu-group-param-set uuid=be825ba2-01d7-8d51-9780-f82cfaa64924 allocation-algorithm=breadth-first
[root@xenserver ~]#
6.2.1.2. Modifying GPU Allocation Policy by Using XenCenter
You can modify GPU allocation policy from the GPU tab in XenCenter.
Figure 19. Modifying GPU placement policy in XenCenter
6.2.2. Modifying GPU Allocation Policy on VMware vSphere
Before using the vSphere Web Client to change the allocation scheme, ensure that the ESXi host is running and that all VMs on the host are powered off.
- Log in to vCenter Server by using the vSphere Web Client.
- In the navigation tree, select your ESXi host and click the Configure tab.
- From the menu, choose Graphics and then click the Host Graphics tab.
- On the Host Graphics tab, click Edit.
Figure 20. Breadth-first allocation scheme setting for vGPU-enabled VMs
- In the Edit Host Graphics Settings dialog box that opens, select these options and click OK.
- If not already selected, select Shared Direct.
- Select Group VMs on GPU until full.
Figure 21. Host graphics settings for vGPU
After you click OK, the default graphics type changes to Shared Direct and the allocation scheme for vGPU-enabled VMs is breadth-first.
Figure 22. Depth-first allocation scheme setting for vGPU-enabled VMs
- Restart the ESXi host or the Xorg service on the host.
See also the related topics in the VMware vSphere documentation.
6.3. Migrating a VM Configured with vGPU
On some hypervisors, NVIDIA vGPU software supports migration of VMs that are configured with vGPU.
Before migrating a VM configured with vGPU, ensure that the following prerequisites are met:
- The VM is configured with vGPU.
- The VM is running.
- The VM obtained a suitable vGPU license when it was booted.
- The destination host has a physical GPU of the same type as the GPU where the vGPU currently resides.
- ECC memory configuration (enabled or disabled) on both the source and destination hosts must be identical.
- The GPU topologies (including NVLink widths) on both the source and destination hosts must be identical.
How to migrate a VM configured with vGPU depends on the hypervisor that you are using.
After migration, the vGPU type of the vGPU remains unchanged.
The time required for migration depends on the amount of frame buffer that the vGPU has. Migration for a vGPU with a large amount of frame buffer is slower than for a vGPU with a small amount of frame buffer.
6.3.1. Migrating a VM Configured with vGPU on XenServer
NVIDIA vGPU software supports XenMotion for VMs that are configured with vGPU. XenMotion enables you to move a running virtual machine from one physical host machine to another host with very little disruption or downtime. For a VM that is configured with vGPU, the vGPU is migrated with the VM to an NVIDIA GPU on the other host. The NVIDIA GPUs on both host machines must be of the same type.
For details about which XenServer versions, NVIDIA GPUs, and guest OS releases support XenMotion with vGPU, see Virtual GPU Software for XenServer Release Notes.
For best performance, the physical hosts should be configured to use the following:
- Shared storage, such as NFS, iSCSI, or Fibre Channel
If shared storage is not used, migration can take a very long time because the vDisk must also be migrated.
- 10 GB networking.
- In Citrix XenCenter, context-click the VM and from the menu that opens, choose Migrate.
- From the list of available hosts, select the destination host to which you want to migrate the VM. The destination host must have a physical GPU of the same type as the GPU where the vGPU currently resides. Furthermore, the physical GPU must be capable of hosting the vGPU. If these requirements are not met, no available hosts are listed.
6.3.2. Since 17.2: Migrating a VM Configured with vGPU on a Linux with KVM Hypervisor
NVIDIA vGPU software supports vGPU Migration for VMs that are configured with vGPU. vGPU Migration enables you to move a running virtual machine from one physical host machine to another host with very little disruption or downtime. For a VM that is configured with vGPU, the vGPU is migrated with the VM to an NVIDIA GPU on the other host. The NVIDIA GPUs on both host machines must be of the same type.
NVIDIA vGPU software supports the following Linux with KVM hypervisors: Red Hat Enterprise Linux with KVM and Ubuntu. For details about which Linux with KVM hypervisor versions, NVIDIA GPUs, and guest OS releases support vGPU Migration, refer to the following documentation:
- Virtual GPU Software for Red Hat Enterprise Linux with KVM Release Notes
- Virtual GPU Software for Ubuntu Release Notes
Perform this task in a Linux command shell on the Linux with KVM hypervisor host on which the VM to be migrated is running.
Before migrating a VM configured with vGPU on a Linux with KVM hypervisor, ensure that the prerequisites listed for all supported hypervisors in Migrating a VM Configured with vGPU are met.
- Set the maximum downtime of the VM to a length of time that is greater than the time required to complete the migration.
If the VM is heavily loaded, migration might not be completed within the default maximum downtime. To ensure that the migration completes, set the maximum downtime to a value that exceeds the time required to complete the migration.
# virsh migrate-setmaxdowntime --domain vm-name --downtime length
- vm-name
- The name of the VM on the local host that you want to migrate.
- length
- The maximum downtime of the VM in milliseconds.
This example sets the maximum downtime of the VM named guestvm on the local host to 10 seconds (10,000 ms).
# virsh migrate-setmaxdowntime --domain guestvm --downtime 10000
- Run the following virsh migrate command:
# virsh migrate --live vm-name destination-url --verbose
- vm-name
- The name of the VM on the local host that you want to migrate.
- destination-url
- The URL of the connection to the remote host to which you want to migrate the VM. For example, to migrate the VM to the system connection of the remote host at IPv4 address 192.0.2.12 by using an SSH tunnel, specify destination-url as qemu+ssh://root@192.0.2.12/system.
This example uses an SSH tunnel to migrate the VM named guestvm on the local host to the system connection of the remote host at IPv4 address 192.0.2.12.
# virsh migrate --live guestvm qemu+ssh://root@192.0.2.12/system --verbose
For more information, refer to Migrating virtual machines in the product documentation for Red Hat Enterprise Linux 9.
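The two virsh steps can be wrapped in a small script. The DRY_RUN guard below makes the sketch print each virsh invocation instead of executing it, so it can be tried away from a hypervisor host; the VM name, downtime, and destination URL are the example values used above.

```shell
# Print (DRY_RUN=1) or execute (DRY_RUN=0) the migration commands.
DRY_RUN=1
run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "$@"; else "$@"; fi; }

VM=guestvm
DOWNTIME_MS=10000
DEST=qemu+ssh://root@192.0.2.12/system

# Step 1: allow enough downtime; step 2: live-migrate the VM.
run virsh migrate-setmaxdowntime --domain "$VM" --downtime "$DOWNTIME_MS"
run virsh migrate --live "$VM" "$DEST" --verbose
```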
6.3.3. Since 17.2: Suspending and Resuming a VM Configured with vGPU on a Linux with KVM Hypervisor
NVIDIA vGPU software supports suspend and resume for VMs that are configured with vGPU.
NVIDIA vGPU software supports the following Linux with KVM hypervisors: Red Hat Enterprise Linux with KVM and Ubuntu. For details about which Linux with KVM hypervisor versions, NVIDIA GPUs, and guest OS releases support suspend and resume, refer to the following documentation:
- Virtual GPU Software for Red Hat Enterprise Linux with KVM Release Notes
- Virtual GPU Software for Ubuntu Release Notes
Perform this task in a Linux command shell on the Linux with KVM hypervisor host on which the VM to be suspended is running or on which the VM to be resumed will run.
- To suspend a VM, use the virsh save command to save the state of the VM to a file.
# virsh save vm-name vm-state-file
- vm-name
- The name of the VM on the local host that you want to suspend.
- vm-state-file
- The name of the file to which you want to save the state of the VM.
This example suspends the VM named guestvm on the local host by saving its state to the file guestvm-state.save.
# virsh save guestvm guestvm-state.save
- To resume a VM, use the virsh restore command to restore the VM from a file to which the state of the VM has previously been saved.
# virsh restore vm-state-file
- vm-state-file
- The name of the file to which the state of the VM has previously been saved.
This example resumes the VM named guestvm on the local host by restoring its state from the file guestvm-state.save.
# virsh restore guestvm-state.save
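The save and restore commands pair naturally as suspend and resume helpers. As elsewhere in this sketch, the DRY_RUN guard prints the virsh invocation instead of executing it so the functions can be tried anywhere; the VM name and state file are the example values used above.

```shell
# Suspend/resume helpers around virsh save and virsh restore.
DRY_RUN=1
virsh_cmd() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "virsh $*"; else virsh "$@"; fi; }

suspend_vm() { virsh_cmd save "$1" "$2"; }   # args: vm-name vm-state-file
resume_vm()  { virsh_cmd restore "$1"; }     # args: vm-state-file

suspend_vm guestvm guestvm-state.save
resume_vm  guestvm-state.save
```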
6.3.4. Migrating a VM Configured with vGPU on VMware vSphere
NVIDIA vGPU software supports VMware vMotion for VMs that are configured with vGPU. VMware vMotion enables you to move a running virtual machine from one physical host machine to another host with very little disruption or downtime. For a VM that is configured with vGPU, the vGPU is migrated with the VM to an NVIDIA GPU on the other host. The NVIDIA GPUs on both host machines must be of the same type.
For details about which VMware vSphere versions, NVIDIA GPUs, and guest OS releases support VMware vMotion with vGPU, see Virtual GPU Software for VMware vSphere Release Notes.
Perform this task in the VMware vSphere web client by using the Migration wizard.
Before migrating a VM configured with vGPU on VMware vSphere, ensure that the following prerequisites are met:
- Your hosts are correctly configured for VMware vMotion. See Host Configuration for vMotion in the VMware documentation.
- The prerequisites listed for all supported hypervisors in Migrating a VM Configured with vGPU are met.
- NVIDIA vGPU migration is configured. See Configuring VMware vMotion with vGPU for VMware vSphere.
- Context-click the VM and from the menu that opens, choose Migrate.
- For the type of migration, select Change compute resource only and click Next. If you select Change both compute resource and storage, the time required for the migration increases.
- Select the destination host and click Next. The destination host must have a physical GPU of the same type as the GPU where the vGPU currently resides. Furthermore, the physical GPU must be capable of hosting the vGPU. If these requirements are not met, no available hosts are listed.
- Select the destination network and click Next.
- Select the migration priority level and click Next.
- Review your selections and click Finish.
For more information, see the related topics in the VMware documentation.
If NVIDIA vGPU migration is not configured, any attempt to migrate a VM with an NVIDIA vGPU fails and a window containing the following error message is displayed:
Compatibility Issue/Host
Migration was temporarily disabled due to another
migration activity.
vGPU hot migration is not enabled.
If you see this error, configure NVIDIA vGPU migration as explained in Configuring VMware vMotion with vGPU for VMware vSphere.
If your version of VMware vSphere ESXi does not support vMotion for VMs configured with NVIDIA vGPU, any attempt to migrate a VM with an NVIDIA vGPU fails and a window containing the following error message is displayed:
Compatibility Issues
...
A required migration feature is not supported on the "Source" host 'host-name'.
A warning or error occurred when migrating the virtual machine.
Virtual machine relocation, or power on after relocation or cloning can fail if
vGPU resources are not available on the destination host.
6.3.5. Suspending and Resuming a VM Configured with vGPU on VMware vSphere
NVIDIA vGPU software supports suspend and resume for VMs that are configured with vGPU.
For details about which VMware vSphere versions, NVIDIA GPUs, and guest OS releases support suspend and resume, see Virtual GPU Software for VMware vSphere Release Notes.
Perform this task in the VMware vSphere web client.
- To suspend a VM, context-click the VM that you want to suspend, and from the context menu that pops up, choose Power > Suspend.
- To resume a VM, context-click the VM that you want to resume, and from the context menu that pops up, choose Power > Power On.
6.4. Enabling Unified Memory for a vGPU
Unified memory is disabled by default. If you want to use unified memory, you must enable it individually for each vGPU that requires it by setting a vGPU plugin parameter. How to enable unified memory for a vGPU depends on the hypervisor that you are using.
6.4.1. Enabling Unified Memory for a vGPU on XenServer
On XenServer, enable unified memory by setting the enable_uvm vGPU plugin parameter.
Perform this task for each vGPU that requires unified memory by using the xe command.
Set the enable_uvm vGPU plugin parameter for the vGPU to 1 as explained in Setting vGPU Plugin Parameters on XenServer.
This example enables unified memory for the vGPU that has the UUID d15083f8-5c59-7474-d0cb-fbc3f7284f1b.
[root@xenserver ~] xe vgpu-param-set uuid=d15083f8-5c59-7474-d0cb-fbc3f7284f1b extra_args='enable_uvm=1'
6.4.2. Enabling Unified Memory for a vGPU on Red Hat Enterprise Linux KVM
On Red Hat Enterprise Linux KVM, enable unified memory by setting the enable_uvm vGPU plugin parameter.
Ensure that the mdev device file that represents the vGPU has been created as explained in Creating an NVIDIA vGPU on a Linux with KVM Hypervisor.
Perform this task for each vGPU that requires unified memory.
Set the enable_uvm vGPU plugin parameter for the mdev device file that represents the vGPU to 1 as explained in Setting vGPU Plugin Parameters on a Linux with KVM Hypervisor.
6.4.3. Enabling Unified Memory for a vGPU on VMware vSphere
On VMware vSphere, enable unified memory by setting the pciPassthruvgpu-id.cfg.enable_uvm configuration parameter in advanced VM attributes.
Ensure that the VM to which the vGPU is assigned is powered off.
Perform this task in the vSphere Client for each vGPU that requires unified memory.
In advanced VM attributes, set the pciPassthruvgpu-id.cfg.enable_uvm vGPU plugin parameter for the vGPU to 1 as explained in Setting vGPU Plugin Parameters on VMware vSphere.
- vgpu-id
- A positive integer that identifies the vGPU assigned to a VM. For the first vGPU assigned to a VM, vgpu-id is 0. For example, if two vGPUs are assigned to a VM and you are enabling unified memory for both vGPUs, set pciPassthru0.cfg.enable_uvm and pciPassthru1.cfg.enable_uvm to 1.
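Advanced VM attributes are stored as key-value entries in the VM's configuration. As an illustration, for a VM with two assigned vGPUs, enabling unified memory for both would correspond to entries like the following; set them through the vSphere Client as explained in Setting vGPU Plugin Parameters on VMware vSphere rather than by editing configuration files directly.

```
pciPassthru0.cfg.enable_uvm = "1"
pciPassthru1.cfg.enable_uvm = "1"
```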
6.5. Enabling NVIDIA CUDA Toolkit Development Tools for NVIDIA vGPU
By default, NVIDIA CUDA Toolkit development tools are disabled on NVIDIA vGPU. If you want to use them, you must enable NVIDIA CUDA Toolkit development tools individually for each VM that requires them by setting vGPU plugin parameters. One parameter enables NVIDIA CUDA Toolkit debuggers, and a different parameter enables NVIDIA CUDA Toolkit profilers.
6.5.1. Enabling NVIDIA CUDA Toolkit Debuggers for NVIDIA vGPU
By default, NVIDIA CUDA Toolkit debuggers are disabled. If used, you must enable them for each vGPU VM that requires them by setting a vGPU plugin parameter. How to set the parameter to enable NVIDIA CUDA Toolkit debuggers for a vGPU VM depends on the hypervisor that you are using.
You can enable NVIDIA CUDA Toolkit debuggers for any number of VMs configured with vGPUs on the same GPU. When NVIDIA CUDA Toolkit debuggers are enabled for a VM, the VM cannot be migrated.
Perform this task for each VM for which you want to enable NVIDIA CUDA Toolkit debuggers.
Enabling NVIDIA CUDA Toolkit Debuggers for NVIDIA vGPU on XenServer
Set the enable_debugging vGPU plugin parameter for the vGPU that is assigned to the VM to 1 as explained in Setting vGPU Plugin Parameters on XenServer.
This example enables NVIDIA CUDA Toolkit debuggers for the vGPU that has the UUID d15083f8-5c59-7474-d0cb-fbc3f7284f1b.
[root@xenserver ~] xe vgpu-param-set uuid=d15083f8-5c59-7474-d0cb-fbc3f7284f1b extra_args='enable_debugging=1'
The setting of this parameter is preserved after a guest VM is restarted and after the hypervisor host is restarted.
Enabling NVIDIA CUDA Toolkit Debuggers for NVIDIA vGPU on Red Hat Enterprise Linux KVM
Set the enable_debugging vGPU plugin parameter for the mdev device file that represents the vGPU that is assigned to the VM to 1 as explained in Setting vGPU Plugin Parameters on a Linux with KVM Hypervisor.
The setting of this parameter is preserved after a guest VM is restarted. However, this parameter is reset to its default value after the hypervisor host is restarted.
Enabling NVIDIA CUDA Toolkit Debuggers for NVIDIA vGPU on VMware vSphere
Ensure that the VM for which you want to enable NVIDIA CUDA Toolkit debuggers is powered off.
In advanced VM attributes, set the pciPassthruvgpu-id.cfg.enable_debugging vGPU plugin parameter for the vGPU that is assigned to the VM to 1 as explained in Setting vGPU Plugin Parameters on VMware vSphere.
- vgpu-id
- A positive integer that identifies the vGPU assigned to the VM. For the first vGPU assigned to a VM, vgpu-id is 0. For example, if two vGPUs are assigned to a VM and you are enabling debuggers for both vGPUs, set pciPassthru0.cfg.enable_debugging and pciPassthru1.cfg.enable_debugging to 1.
The setting of this parameter is preserved after a guest VM is restarted. However, this parameter is reset to its default value after the hypervisor host is restarted.
6.5.2. Enabling NVIDIA CUDA Toolkit Profilers for NVIDIA vGPU
By default, only GPU workload trace is enabled. If you want to use all NVIDIA CUDA Toolkit profiler features that NVIDIA vGPU supports, you must enable them for each vGPU VM that requires them.
Enabling profiling for a VM gives the VM access to the GPU’s global performance counters, which may include activity from other VMs executing on the same GPU. Enabling profiling for a VM also allows the VM to lock clocks on the GPU, which impacts all other VMs executing on the same GPU.
6.5.2.1. Supported NVIDIA CUDA Toolkit Profiler Features
You can enable the following NVIDIA CUDA Toolkit profiler features for a vGPU VM:
- NVIDIA Nsight™ Compute
- NVIDIA Nsight Systems
- CUDA Profiling Tools Interface (CUPTI)
6.5.2.2. Clock Management for a vGPU VM for Which NVIDIA CUDA Toolkit Profilers Are Enabled
Clocks are not locked for periodic sampling use cases such as NVIDIA Nsight Systems profiling.
Clocks are locked for multipass profiling such as:
- NVIDIA Nsight Compute kernel profiling
- CUPTI range profiling
Clocks are locked automatically when profiling starts and are unlocked automatically when profiling ends.
6.5.2.3. Limitations on the Use of NVIDIA CUDA Toolkit Profilers with NVIDIA vGPU
The following limitations apply when NVIDIA CUDA Toolkit profilers are enabled for NVIDIA vGPU:
- NVIDIA CUDA Toolkit profilers can be used on only one VM at a time.
- Multiple CUDA contexts cannot be profiled simultaneously.
- Profiling data is collected separately for each context.
- A VM for which NVIDIA CUDA Toolkit profilers are enabled cannot be migrated.
Because NVIDIA CUDA Toolkit profilers can be used on only one VM at a time, you should enable them for only one VM assigned a vGPU on a GPU. However, NVIDIA vGPU software cannot enforce this requirement. If NVIDIA CUDA Toolkit profilers are enabled on more than one VM assigned a vGPU on a GPU, profiling data is collected only for the first VM to start the profiler.
6.5.2.4. Enabling NVIDIA CUDA Toolkit Profilers for a vGPU VM
You enable NVIDIA CUDA Toolkit profilers for a vGPU VM by setting a vGPU plugin parameter. How to set the parameter to enable NVIDIA CUDA Toolkit profilers for a vGPU VM depends on the hypervisor that you are using.
Perform this task for the VM for which you want to enable NVIDIA CUDA Toolkit profilers.
Enabling NVIDIA CUDA Toolkit Profilers for NVIDIA vGPU on XenServer
Set the enable_profiling vGPU plugin parameter for the vGPU that is assigned to the VM to 1 as explained in Setting vGPU Plugin Parameters on XenServer.
This example enables NVIDIA CUDA Toolkit profilers for the vGPU that has the UUID d15083f8-5c59-7474-d0cb-fbc3f7284f1b.
[root@xenserver ~] xe vgpu-param-set uuid=d15083f8-5c59-7474-d0cb-fbc3f7284f1b extra_args='enable_profiling=1'
The setting of this parameter is preserved after a guest VM is restarted and after the hypervisor host is restarted.
Enabling NVIDIA CUDA Toolkit Profilers for NVIDIA vGPU on Red Hat Enterprise Linux KVM
Set the enable_profiling vGPU plugin parameter for the mdev device file that represents the vGPU that is assigned to the VM to 1 as explained in Setting vGPU Plugin Parameters on a Linux with KVM Hypervisor.
The setting of this parameter is preserved after a guest VM is restarted. However, this parameter is reset to its default value after the hypervisor host is restarted.
Enabling NVIDIA CUDA Toolkit Profilers for NVIDIA vGPU on VMware vSphere
Ensure that the VM for which you want to enable NVIDIA CUDA Toolkit profilers is powered off.
In advanced VM attributes, set the pciPassthruvgpu-id.cfg.enable_profiling vGPU plugin parameter for the vGPU that is assigned to the VM to 1 as explained in Setting vGPU Plugin Parameters on VMware vSphere.
- vgpu-id
- A positive integer that identifies the vGPU assigned to the VM. For the first vGPU assigned to a VM, vgpu-id is 0. For example, if two vGPUs are assigned to a VM and you are enabling profilers for the second vGPU, set pciPassthru1.cfg.enable_profiling to 1.
The setting of this parameter is preserved after a guest VM is restarted. However, this parameter is reset to its default value after the hypervisor host is restarted.
6.6. Enabling the TCC Driver Model for a vGPU
The Tesla Compute Cluster (TCC) driver model supports CUDA C/C++ applications. This model is optimized for compute applications and reduces kernel launch times on Windows. By default, the driver model of a vGPU that is assigned to a Windows VM is Windows Display Driver Model (WDDM). If you want to use the TCC driver model, you must enable it explicitly.
This task requires administrator privileges.
Perform this task from the VM to which the vGPU is assigned.
Only Q-series vGPUs support the TCC driver model.
- Log on to the VM to which the vGPU is assigned.
- Set the driver model of the vGPU to the TCC driver model.
nvidia-smi -g vgpu-id -dm 1
- vgpu-id
- The ID of the vGPU for which you want to enable the TCC driver model. If the -g option is omitted, the TCC driver model is enabled for all vGPUs that are assigned to the VM.
- Reboot the VM.
NVIDIA vGPU software enables you to monitor the performance of physical GPUs and virtual GPUs from the hypervisor and from within individual guest VMs. You can use several tools for monitoring GPU performance:
- From any supported hypervisor, and from a guest VM that is running a 64-bit edition of Windows or Linux, you can use NVIDIA System Management Interface, nvidia-smi.
- From XenServer, you can use Citrix XenCenter.
- From a Windows guest VM, you can use these tools:
- Windows Performance Monitor
- Windows Management Instrumentation (WMI)
7.1. NVIDIA System Management Interface nvidia-smi
NVIDIA System Management Interface, nvidia-smi, is a command-line tool that reports management information for NVIDIA GPUs.
The nvidia-smi tool is included in the following packages:
- NVIDIA Virtual GPU Manager package for each supported hypervisor
- NVIDIA driver package for each supported guest OS
The scope of the reported management information depends on where you run nvidia-smi from:
-
From a hypervisor command shell, such as the XenServer dom0 shell or VMware ESXi host shell, nvidia-smi reports management information for NVIDIA physical GPUs and virtual GPUs present in the system.
Note:When run from a hypervisor command shell, nvidia-smi will not list any GPU that is currently allocated for GPU pass-through.
-
From a guest VM, nvidia-smi retrieves usage statistics for vGPUs or pass-through GPUs that are assigned to the VM.
In a Windows guest VM, nvidia-smi is installed in a folder that is in the default executable path. Therefore, you can run nvidia-smi from a command prompt from any folder by running the nvidia-smi.exe command.
7.2. Monitoring GPU Performance from a Hypervisor
You can monitor GPU performance from any supported hypervisor by using the NVIDIA System Management Interface nvidia-smi command-line utility. On XenServer platforms, you can also use Citrix XenCenter to monitor GPU performance.
You cannot monitor from the hypervisor the performance of GPUs that are being used for GPU pass-through. You can monitor the performance of pass-through GPUs only from within the guest VM that is using them.
7.2.1. Using nvidia-smi to Monitor GPU Performance from a Hypervisor
You can get management information for the NVIDIA physical GPUs and virtual GPUs present in the system by running nvidia-smi from a hypervisor command shell such as the XenServer dom0 shell or the VMware ESXi host shell.
Without a subcommand, nvidia-smi provides management information for physical GPUs. To examine virtual GPUs in more detail, use nvidia-smi with the vgpu subcommand.
From the command line, you can get help information about the nvidia-smi tool and the vgpu subcommand.
| Help Information | Command |
|---|---|
| A list of subcommands supported by the nvidia-smi tool. Note that not all subcommands apply to GPUs that support NVIDIA vGPU software. | nvidia-smi -h |
| A list of all options supported by the vgpu subcommand. | nvidia-smi vgpu -h |
7.2.1.1. Getting a Summary of all Physical GPUs in the System
To get a summary of all physical GPUs in the system, along with PCI bus IDs, power state, temperature, current memory usage, and so on, run nvidia-smi without additional arguments.
Each vGPU instance is reported in the Processes section, together with its physical GPU index and the amount of frame-buffer memory assigned to it.
In the example that follows, three vGPUs are running in the system: one vGPU is running on each of the physical GPUs 0, 1, and 2.
[root@vgpu ~]# nvidia-smi
Fri Oct 25 09:26:18 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.06 Driver Version: 550.127.06 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla M60 On | 0000:83:00.0 Off | Off |
| N/A 31C P8 23W / 150W | 1889MiB / 8191MiB | 7% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla M60 On | 0000:84:00.0 Off | Off |
| N/A 26C P8 23W / 150W | 926MiB / 8191MiB | 9% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla M10 On | 0000:8A:00.0 Off | N/A |
| N/A 23C P8 10W / 53W | 1882MiB / 8191MiB | 12% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla M10 On | 0000:8B:00.0 Off | N/A |
| N/A 26C P8 10W / 53W | 10MiB / 8191MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla M10 On | 0000:8C:00.0 Off | N/A |
| N/A 34C P8 10W / 53W | 10MiB / 8191MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla M10 On | 0000:8D:00.0 Off | N/A |
| N/A 32C P8 10W / 53W | 10MiB / 8191MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 11924 C+G /usr/lib64/xen/bin/vgpu 1856MiB |
| 1 11903 C+G /usr/lib64/xen/bin/vgpu 896MiB |
| 2 11908 C+G /usr/lib64/xen/bin/vgpu 1856MiB |
+-----------------------------------------------------------------------------+
[root@vgpu ~]#
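If you script against this summary, the per-vGPU rows of the Processes section can be extracted with a few lines of Python. The sketch below is illustrative only: the parse_vgpu_processes helper is hypothetical, and the row layout it assumes is taken from the example output above, which may differ between driver versions.

```python
import re

# Matches one Processes row from the summary output shown above:
# | 0 11924 C+G /usr/lib64/xen/bin/vgpu 1856MiB |
PROC_ROW = re.compile(r"\|\s+(\d+)\s+(\d+)\s+\S+\s+(\S+)\s+(\d+)MiB\s+\|")

def parse_vgpu_processes(text):
    """Return a list of (gpu_index, pid, process_name, fb_mib) tuples."""
    rows = []
    for line in text.splitlines():
        m = PROC_ROW.match(line.strip())
        if m:
            gpu, pid, name, mib = m.groups()
            rows.append((int(gpu), int(pid), name, int(mib)))
    return rows

sample = """\
|    0     11924    C+G   /usr/lib64/xen/bin/vgpu           1856MiB |
|    1     11903    C+G   /usr/lib64/xen/bin/vgpu            896MiB |
"""
print(parse_vgpu_processes(sample))
```

Running this against the Processes table in the example yields one tuple per vGPU instance, which you can then feed into your own reporting.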
7.2.1.2. Getting a Summary of all vGPUs in the System
To get a summary of the vGPUs that are currently running on each physical GPU in the system, run nvidia-smi vgpu without additional arguments.
[root@vgpu ~]# nvidia-smi vgpu
Fri Oct 25 09:27:06 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.06 Driver Version: 550.127.06 |
|-------------------------------+--------------------------------+------------+
| GPU Name | Bus-Id | GPU-Util |
| vGPU ID Name | VM ID VM Name | vGPU-Util |
|===============================+================================+============|
| 0 Tesla M60 | 0000:83:00.0 | 7% |
| 11924 GRID M60-2Q | 3 Win7-64 GRID test 2 | 6% |
+-------------------------------+--------------------------------+------------+
| 1 Tesla M60 | 0000:84:00.0 | 9% |
| 11903 GRID M60-1B | 1 Win8.1-64 GRID test 3 | 8% |
+-------------------------------+--------------------------------+------------+
| 2 Tesla M10 | 0000:8A:00.0 | 12% |
| 11908 GRID M10-2Q | 2 Win7-64 GRID test 1 | 10% |
+-------------------------------+--------------------------------+------------+
| 3 Tesla M10 | 0000:8B:00.0 | 0% |
+-------------------------------+--------------------------------+------------+
| 4 Tesla M10 | 0000:8C:00.0 | 0% |
+-------------------------------+--------------------------------+------------+
| 5 Tesla M10 | 0000:8D:00.0 | 0% |
+-------------------------------+--------------------------------+------------+
[root@vgpu ~]#
7.2.1.3. Getting Physical GPU Details
To get detailed information about all the physical GPUs on the platform, run nvidia-smi with the -q or --query option.
[root@vgpu ~]# nvidia-smi -q
==============NVSMI LOG==============
Timestamp : Tue Nov 22 10:33:26 2022
Driver Version : 525.60.06
CUDA Version : Not Found
vGPU Driver Capability
Heterogenous Multi-vGPU : Supported
Attached GPUs : 3
GPU 00000000:C1:00.0
Product Name : Tesla T4
Product Brand : NVIDIA
Product Architecture : Turing
Display Mode : Enabled
Display Active : Disabled
Persistence Mode : Enabled
vGPU Device Capability
Fractional Multi-vGPU : Supported
Heterogeneous Time-Slice Profiles : Supported
Heterogeneous Time-Slice Sizes : Not Supported
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Enabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : 1321120031291
GPU UUID : GPU-9084c1b2-624f-2267-4b66-345583fbd981
Minor Number : 1
VBIOS Version : 90.04.38.00.03
MultiGPU Board : No
Board ID : 0xc100
Board Part Number : 900-2G183-0000-001
GPU Part Number : 1EB8-895-A1
Module ID : 0
Inforom Version
Image Version : G183.0200.00.02
OEM Object : 1.1
ECC Object : 5.0
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : N/A
GPU Virtualization Mode
Virtualization Mode : Host VGPU
Host VGPU Mode : Non SR-IOV
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0xC1
Device : 0x00
Domain : 0x0000
Device Id : 0x1EB810DE
Bus Id : 00000000:C1:00.0
Sub System Id : 0x12A210DE
GPU Link Info
PCIe Generation
Max : 3
Current : 1
Device Current : 1
Device Max : 3
Host Max : N/A
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Atomic Caps Inbound : N/A
Atomic Caps Outbound : N/A
Fan Speed : N/A
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 15360 MiB
Reserved : 0 MiB
Used : 3859 MiB
Free : 11500 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 17 MiB
Free : 239 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
SRAM Correctable : 0
SRAM Uncorrectable : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
Aggregate
SRAM Correctable : 0
SRAM Uncorrectable : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
Retired Pages
Single Bit ECC : 0
Double Bit ECC : 0
Pending Page Blacklist : No
Remapped Rows : N/A
Temperature
GPU Current Temp : 35 C
GPU Shutdown Temp : 96 C
GPU Slowdown Temp : 93 C
GPU Max Operating Temp : 85 C
GPU Target Temperature : N/A
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : Supported
Power Draw : 16.57 W
Power Limit : 70.00 W
Default Power Limit : 70.00 W
Enforced Power Limit : 70.00 W
Min Power Limit : 60.00 W
Max Power Limit : 70.00 W
Clocks
Graphics : 300 MHz
SM : 300 MHz
Memory : 405 MHz
Video : 540 MHz
Applications Clocks
Graphics : 585 MHz
Memory : 5001 MHz
Default Applications Clocks
Graphics : 585 MHz
Memory : 5001 MHz
Deferred Clocks
Memory : N/A
Max Clocks
Graphics : 1590 MHz
SM : 1590 MHz
Memory : 5001 MHz
Video : 1470 MHz
Max Customer Boost Clocks
Graphics : 1590 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : N/A
Fabric
State : N/A
Status : N/A
Processes
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 2103065
Type : C+G
Name : Win11SV2_View87
Used GPU Memory : 3810 MiB
[root@vgpu ~]#
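Because this report prints one `Key : Value` pair per line, it lends itself to simple post-processing. The following Python sketch (a hypothetical helper, not part of the NVIDIA tools) flattens such output into a dictionary; nested sections share one key space, so repeated keys such as Current overwrite earlier occurrences, which is acceptable for quick lookups but not for a full parser.

```python
def parse_smi_query(text):
    """Flatten 'Key : Value' lines from nvidia-smi -q style output
    into a dict. Indentation (section nesting) is deliberately
    ignored; later duplicate keys overwrite earlier ones."""
    info = {}
    for line in text.splitlines():
        if " : " in line:
            key, _, value = line.partition(" : ")
            info[key.strip()] = value.strip()
    return info

sample = """\
    Product Name                          : Tesla T4
    Product Architecture                  : Turing
    Persistence Mode                      : Enabled
"""
d = parse_smi_query(sample)
print(d["Product Name"], d["Persistence Mode"])
```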
7.2.1.4. Getting vGPU Details
To get detailed information about all the vGPUs on the platform, run nvidia-smi vgpu with the -q or --query option.
To limit the information retrieved to a subset of the GPUs on the platform, use the -i or --id option to select one or more GPUs.
[root@vgpu ~]# nvidia-smi vgpu -q -i 1
GPU 00000000:C1:00.0
Active vGPUs : 1
vGPU ID : 3251634327
VM ID : 2103066
VM Name : Win11SV2_View87
vGPU Name : GRID T4-4Q
vGPU Type : 232
vGPU UUID : afdcf724-1dd2-11b2-8534-624f22674b66
Guest Driver Version : 527.15
License Status : Licensed (Expiry: 2022-11-23 5:2:12 GMT)
GPU Instance ID : N/A
Accounting Mode : Disabled
ECC Mode : Enabled
Accounting Buffer Size : 4000
Frame Rate Limit : 60 FPS
PCI
Bus Id : 00000000:02:04.0
FB Memory Usage
Total : 4096 MiB
Used : 641 MiB
Free : 3455 MiB
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
[root@vgpu ~]#
7.2.1.5. Monitoring vGPU engine usage
To monitor vGPU engine usage across multiple vGPUs, run nvidia-smi vgpu with the -u or --utilization option.
For each vGPU, the usage statistics in the following table are reported once every second. The table also shows the name of the column in the command output under which each statistic is reported.
| Statistic | Column |
|---|---|
| 3D/Compute | sm |
| Memory controller bandwidth | mem |
| Video encoder | enc |
| Video decoder | dec |
Each reported percentage is the share of the physical GPU’s capacity that a vGPU is using. For example, a vGPU that uses 20% of the GPU’s graphics engine capacity reports 20%.
To modify the reporting frequency, use the -l or --loop option.
To limit monitoring to a subset of the GPUs on the platform, use the -i or --id option to select one or more GPUs.
[root@vgpu ~]# nvidia-smi vgpu -u
# gpu vgpu sm mem enc dec
# Idx Id % % % %
0 11924 6 3 0 0
1 11903 8 3 0 0
2 11908 10 4 0 0
3 - - - - -
4 - - - - -
5 - - - - -
0 11924 6 3 0 0
1 11903 9 3 0 0
2 11908 10 4 0 0
3 - - - - -
4 - - - - -
5 - - - - -
0 11924 6 3 0 0
1 11903 8 3 0 0
2 11908 10 4 0 0
3 - - - - -
4 - - - - -
5 - - - - -
^C[root@vgpu ~]#
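The once-per-second samples above are easy to aggregate in a script. The following Python sketch is illustrative only: the column order (gpu, vgpu, sm, mem, enc, dec) is assumed from the example output, and the `-` placeholders printed for GPUs with no running vGPUs are skipped.

```python
from collections import defaultdict

def average_sm_by_vgpu(text):
    """Average the sm column per vGPU ID across samples of
    'nvidia-smi vgpu -u' output; header lines and idle-GPU
    placeholder rows are ignored."""
    samples = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#") or fields[1] == "-":
            continue
        vgpu_id, sm = fields[1], int(fields[2])
        samples[vgpu_id].append(sm)
    return {v: sum(s) / len(s) for v, s in samples.items()}

sample = """\
# gpu vgpu sm mem enc dec
0 11924 6 3 0 0
1 11903 8 3 0 0
3 - - - - -
0 11924 6 3 0 0
1 11903 9 3 0 0
"""
print(average_sm_by_vgpu(sample))
```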
7.2.1.6. Monitoring vGPU engine usage by applications
To monitor vGPU engine usage by applications across multiple vGPUs, run nvidia-smi vgpu with the -p option.
For each application on each vGPU, the usage statistics in the following table are reported once every second. Each application is identified by its process ID and process name. The table also shows the name of the column in the command output under which each statistic is reported.
| Statistic | Column |
|---|---|
| 3D/Compute | sm |
| Memory controller bandwidth | mem |
| Video encoder | enc |
| Video decoder | dec |
Each reported percentage is the share of the physical GPU’s capacity used by an application running on a vGPU that resides on the physical GPU. For example, an application that uses 20% of the GPU’s graphics engine capacity reports 20%.
To modify the reporting frequency, use the -l or --loop option.
To limit monitoring to a subset of the GPUs on the platform, use the -i or --id option to select one or more GPUs.
[root@vgpu ~]# nvidia-smi vgpu -p
# GPU vGPU process process sm mem enc dec
# Idx Id Id name % % % %
0 38127 1528 dwm.exe 0 0 0 0
1 37408 4232 DolphinVS.exe 32 25 0 0
1 257869 4432 FurMark.exe 16 12 0 0
1 257969 4552 FurMark.exe 48 37 0 0
0 38127 1528 dwm.exe 0 0 0 0
1 37408 4232 DolphinVS.exe 16 12 0 0
1 257911 656 DolphinVS.exe 32 24 0 0
1 257969 4552 FurMark.exe 48 37 0 0
0 38127 1528 dwm.exe 0 0 0 0
1 257869 4432 FurMark.exe 38 30 0 0
1 257911 656 DolphinVS.exe 19 14 0 0
1 257969 4552 FurMark.exe 38 30 0 0
0 38127 1528 dwm.exe 0 0 0 0
1 257848 3220 Balls64.exe 16 12 0 0
1 257869 4432 FurMark.exe 16 12 0 0
1 257911 656 DolphinVS.exe 16 12 0 0
1 257969 4552 FurMark.exe 48 37 0 0
0 38127 1528 dwm.exe 0 0 0 0
1 257911 656 DolphinVS.exe 32 25 0 0
1 257969 4552 FurMark.exe 64 50 0 0
0 38127 1528 dwm.exe 0 0 0 0
1 37408 4232 DolphinVS.exe 16 12 0 0
1 257911 656 DolphinVS.exe 16 12 0 0
1 257969 4552 FurMark.exe 64 49 0 0
0 38127 1528 dwm.exe 0 0 0 0
1 37408 4232 DolphinVS.exe 16 12 0 0
1 257869 4432 FurMark.exe 16 12 0 0
1 257969 4552 FurMark.exe 64 49 0 0
[root@vgpu ~]#
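Per-process samples such as these are often used to find which application is loading a vGPU. The following Python sketch (illustrative only; the eight-column layout is assumed from the example output above) tracks the peak sm percentage each process name reached across all samples.

```python
from collections import defaultdict

def peak_sm_by_process(text):
    """Track the peak sm percentage per process name across samples
    of 'nvidia-smi vgpu -p' output. Assumed column order:
    gpu, vgpu, pid, name, sm, mem, enc, dec."""
    peaks = defaultdict(int)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 8 or fields[0].startswith("#"):
            continue
        name, sm = fields[3], int(fields[4])
        peaks[name] = max(peaks[name], sm)
    return dict(peaks)

sample = """\
# GPU vGPU process process sm mem enc dec
# Idx Id Id name % % % %
0 38127 1528 dwm.exe 0 0 0 0
1 37408 4232 DolphinVS.exe 32 25 0 0
1 257969 4552 FurMark.exe 48 37 0 0
1 257969 4552 FurMark.exe 64 50 0 0
"""
print(peak_sm_by_process(sample))
```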
7.2.1.7. Monitoring Encoder Sessions
Encoder sessions can be monitored only for vGPUs assigned to Windows VMs. No encoder session statistics are reported for vGPUs assigned to Linux VMs.
To monitor the encoder sessions for processes running on multiple vGPUs, run nvidia-smi vgpu with the -es or --encodersessions option.
For each encoder session, the following statistics are reported once every second:
- GPU ID
- vGPU ID
- Encoder session ID
- PID of the process in the VM that created the encoder session
- Codec type, for example, H.264 or H.265
- Encode horizontal resolution
- Encode vertical resolution
- One-second trailing average encoded FPS
- One-second trailing average encode latency in microseconds
To modify the reporting frequency, use the -l or --loop option.
To limit monitoring to a subset of the GPUs on the platform, use the -i or --id option to select one or more GPUs.
[root@vgpu ~]# nvidia-smi vgpu -es
# GPU vGPU Session Process Codec H V Average Average
# Idx Id Id Id Type Res Res FPS Latency(us)
1 21211 2 2308 H.264 1920 1080 424 1977
1 21206 3 2424 H.264 1920 1080 0 0
1 22011 1 3676 H.264 1920 1080 374 1589
1 21211 2 2308 H.264 1920 1080 360 807
1 21206 3 2424 H.264 1920 1080 325 1474
1 22011 1 3676 H.264 1920 1080 313 1005
1 21211 2 2308 H.264 1920 1080 329 1732
1 21206 3 2424 H.264 1920 1080 352 1415
1 22011 1 3676 H.264 1920 1080 434 1894
1 21211 2 2308 H.264 1920 1080 362 1818
1 21206 3 2424 H.264 1920 1080 296 1072
1 22011 1 3676 H.264 1920 1080 416 1994
1 21211 2 2308 H.264 1920 1080 444 1912
1 21206 3 2424 H.264 1920 1080 330 1261
1 22011 1 3676 H.264 1920 1080 436 1644
1 21211 2 2308 H.264 1920 1080 344 1500
1 21206 3 2424 H.264 1920 1080 393 1727
1 22011 1 3676 H.264 1920 1080 364 1945
1 21211 2 2308 H.264 1920 1080 555 1653
1 21206 3 2424 H.264 1920 1080 295 925
1 22011 1 3676 H.264 1920 1080 372 1869
1 21211 2 2308 H.264 1920 1080 326 2206
1 21206 3 2424 H.264 1920 1080 318 1366
1 22011 1 3676 H.264 1920 1080 464 2015
1 21211 2 2308 H.264 1920 1080 305 1167
1 21206 3 2424 H.264 1920 1080 445 1892
1 22011 1 3676 H.264 1920 1080 361 906
1 21211 2 2308 H.264 1920 1080 353 1436
1 21206 3 2424 H.264 1920 1080 354 1798
1 22011 1 3676 H.264 1920 1080 373 1310
^C[root@vgpu ~]#
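Because each line is one encoder-session sample, the stream can be summarized per session. The following Python sketch (a hypothetical helper; the nine-column layout is assumed from the example output above) computes the mean of the one-second-average FPS values per encoder session ID.

```python
from collections import defaultdict

def mean_fps_by_session(text):
    """Mean encoded FPS per encoder session ID across samples of
    'nvidia-smi vgpu -es' output. Assumed column order: gpu, vgpu,
    session, pid, codec, h-res, v-res, fps, latency."""
    fps = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 9 or fields[0] == "#":
            continue
        fps[fields[2]].append(int(fields[7]))
    return {s: sum(v) / len(v) for s, v in fps.items()}

sample = """\
1 21211 2 2308 H.264 1920 1080 424 1977
1 21206 3 2424 H.264 1920 1080 0 0
1 21211 2 2308 H.264 1920 1080 360 807
"""
print(mean_fps_by_session(sample))
```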
7.2.1.8. Monitoring Frame Buffer Capture (FBC) Sessions
To monitor the FBC sessions for processes running on multiple vGPUs, run nvidia-smi vgpu with the -fs or --fbcsessions option.
For each FBC session, the following statistics are reported once every second:
- GPU ID
- vGPU ID
- FBC session ID
- PID of the process in the VM that created the FBC session
- Display ordinal associated with the FBC session
- FBC session type
- FBC session flags
- Capture mode
- Maximum horizontal resolution supported by the session
- Maximum vertical resolution supported by the session
- Horizontal resolution requested by the caller in the capture call
- Vertical resolution requested by the caller in the capture call
- Moving average of new frames captured per second by the session
- Moving average new frame capture latency in microseconds for the session
To modify the reporting frequency, use the -l or --loop option.
To limit monitoring to a subset of the GPUs on the platform, use the -i or --id option to select one or more GPUs.
[root@vgpu ~]# nvidia-smi vgpu -fs
# GPU vGPU Session Process Display Session Diff. Map Class. Map Capture Max H Max V H V Average Average
# Idx Id Id Id Ordinal Type State State Mode Res Res Res Res FPS Latency(us)
0 - - - - - - - - - - - - - -
1 3251634178 - - - - - - - - - - - - -
2 - - - - - - - - - - - - - -
0 - - - - - - - - - - - - - -
1 3251634178 - - - - - - - - - - - - -
2 - - - - - - - - - - - - - -
0 - - - - - - - - - - - - - -
1 3251634178 - - - - - - - - - - - - -
2 - - - - - - - - - - - - - -
0 - - - - - - - - - - - - - -
1 3251634178 - - - - - - - - - - - - -
2 - - - - - - - - - - - - - -
0 - - - - - - - - - - - - - -
1 3251634178 - - - - - - - - - - - - -
2 - - - - - - - - - - - - - -
0 - - - - - - - - - - - - - -
1 3251634178 1 3984 0 ToSys Disabled Disabled Unknown 4096 2160 0 0 0 0
2 - - - - - - - - - - - - - -
0 - - - - - - - - - - - - - -
1 3251634178 1 3984 0 ToSys Disabled Disabled Unknown 4096 2160 0 0 0 0
2 - - - - - - - - - - - - - -
# GPU vGPU Session Process Display Session Diff. Map Class. Map Capture Max H Max V H V Average Average
# Idx Id Id Id Ordinal Type State State Mode Res Res Res Res FPS Latency(us)
0 - - - - - - - - - - - - - -
1 3251634178 1 3984 0 ToSys Disabled Disabled Unknown 4096 2160 0 0 0 0
2 - - - - - - - - - - - - - -
0 - - - - - - - - - - - - - -
1 3251634178 1 3984 0 ToSys Disabled Disabled Unknown 4096 2160 0 0 0 0
2 - - - - - - - - - - - - - -
0 - - - - - - - - - - - - - -
1 3251634178 1 3984 0 ToSys Disabled Disabled Unknown 4096 2160 0 0 0 0
2 - - - - - - - - - - - - - -
0 - - - - - - - - - - - - - -
1 3251634178 1 3984 0 ToSys Disabled Disabled Unknown 4096 2160 0 0 0 0
2 - - - - - - - - - - - - - -
0 - - - - - - - - - - - - - -
1 3251634178 1 3984 0 ToSys Disabled Disabled Unknown 4096 2160 0 0 0 0
2 - - - - - - - - - - - - - -
0 - - - - - - - - - - - - - -
1 3251634178 1 3984 0 ToSys Disabled Disabled Unknown 4096 2160 0 0 0 0
2 - - - - - - - - - - - - - -
0 - - - - - - - - - - - - - -
1 3251634178 1 3984 0 ToSys Disabled Disabled Unknown 4096 2160 0 0 0 0
2 - - - - - - - - - - - - - -
0 - - - - - - - - - - - - - -
1 3251634178 1 3984 0 ToSys Disabled Disabled Blocking 4096 2160 1600 900 25 39964
2 - - - - - - - - - - - - - -
0 - - - - - - - - - - - - - -
1 3251634178 1 3984 0 ToSys Disabled Disabled Blocking 4096 2160 1600 900 25 39964
2 - - - - - - - - - - - - - -
0 - - - - - - - - - - - - - -
1 3251634178 1 3984 0 ToSys Disabled Disabled Blocking 4096 2160 0 0 0 0
2 - - - - - - - - - - - - - -
# GPU vGPU Session Process Display Session Diff. Map Class. Map Capture Max H Max V H V Average Average
# Idx Id Id Id Ordinal Type State State Mode Res Res Res Res FPS Latency(us)
0 - - - - - - - - - - - - - -
1 3251634178 1 3984 0 ToSys Disabled Disabled Blocking 4096 2160 0 0 0 0
2 - - - - - - - - - - - - - -
0 - - - - - - - - - - - - - -
1 3251634178 1 3984 0 ToSys Disabled Disabled Blocking 4096 2160 0 0 0 0
2 - - - - - - - - - - - - - -
0 - - - - - - - - - - - - - -
1 3251634178 1 3984 0 ToSys Disabled Disabled Blocking 4096 2160 0 0 0 0
2 - - - - - - - - - - - - - -
0 - - - - - - - - - - - - - -
1 3251634178 1 3984 0 ToSys Disabled Disabled Blocking 4096 2160 0 0 0 0
2 - - - - - - - - - - - - - -
0 - - - - - - - - - - - - - -
1 3251634178 1 3984 0 ToSys Disabled Disabled Blocking 4096 2160 0 0 0 0
2 - - - - - - - - - - - - - -
0 - - - - - - - - - - - - - -
1 3251634178 1 3984 0 ToSys Disabled Disabled Blocking 4096 2160 1600 900 135 7400
2 - - - - - - - - - - - - - -
0 - - - - - - - - - - - - - -
1 3251634178 1 3984 0 ToSys Disabled Disabled Blocking 4096 2160 1600 900 227 4403
2 - - - - - - - - - - - - - -
0 - - - - - - - - - - - - - -
1 3251634178 1 3984 0 ToSys Disabled Disabled Blocking 4096 2160 1600 900 227 4403
2 - - - - - - - - - - - - - -
0 - - - - - - - - - - - - - -
1 3251634178 1 3984 0 ToSys Disabled Disabled Blocking 4096 2160 0 0 0 0
2 - - - - - - - - - - - - - -
0 - - - - - - - - - - - - - -
1 3251634178 1 3984 0 ToSys Disabled Disabled Blocking 4096 2160 0 0 0 0
2 - - - - - - - - - - - - - -
# GPU vGPU Session Process Display Session Diff. Map Class. Map Capture Max H Max V H V Average Average
# Idx Id Id Id Ordinal Type State State Mode Res Res Res Res FPS Latency(us)
0 - - - - - - - - - - - - - -
1 3251634178 1 3984 0 ToSys Disabled Disabled Blocking 4096 2160 0 0 0 0
2 - - - - - - - - - - - - - -
0 - - - - - - - - - - - - - -
1 3251634178 1 3984 0 ToSys Disabled Disabled Blocking 4096 2160 0 0 0 0
2 - - - - - - - - - - - - - -
0 - - - - - - - - - - - - - -
1 3251634178 1 3984 0 ToSys Disabled Disabled Blocking 4096 2160 0 0 0 0
2 - - - - - - - - - - - - - -
0 - - - - - - - - - - - - - -
1 3251634178 1 3984 0 ToSys Disabled Disabled Blocking 4096 2160 0 0 0 0
2 - - - - - - - - - - - - - -
0 - - - - - - - - - - - - - -
1 3251634178 1 3984 0 ToSys Disabled Disabled Blocking 4096 2160 0 0 0 0
2 - - - - - - - - - - - - - -
0 - - - - - - - - - - - - - -
1 3251634178 1 3984 0 ToSys Disabled Disabled Blocking 4096 2160 0 0 0 0
2 - - - - - - - - - - - - - -
0 - - - - - - - - - - - - - -
1 3251634178 1 3984 0 ToSys Disabled Disabled Blocking 4096 2160 0 0 0 0
2 - - - - - - - - - - - - - -
0 - - - - - - - - - - - - - -
1 3251634178 1 3984 0 ToSys Disabled Disabled Blocking 4096 2160 0 0 0 0
2 - - - - - - - - - - - - - -
0 - - - - - - - - - - - - - -
1 3251634178 1 3984 0 ToSys Disabled Disabled Blocking 4096 2160 0 0 0 0
2 - - - - - - - - - - - - - -
0 - - - - - - - - - - - - - -
1 3251634178 1 3984 0 ToSys Disabled Disabled Blocking 4096 2160 0 0 0 0
2 - - - - - - - - - - - - - -
^C[root@vgpu ~]#
7.2.1.9. Listing Supported vGPU Types
To list the virtual GPU types that the GPUs in the system support, run nvidia-smi vgpu with the -s or --supported option.
To limit the retrieved information to a subset of the GPUs on the platform, use the -i or --id option to select one or more GPUs.
[root@vgpu ~]# nvidia-smi vgpu -s -i 0
GPU 0000:83:00.0
GRID M60-0B
GRID M60-0Q
GRID M60-1A
GRID M60-1B
GRID M60-1Q
GRID M60-2A
GRID M60-2Q
GRID M60-4A
GRID M60-4Q
GRID M60-8A
GRID M60-8Q
[root@vgpu ~]#
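This listing groups profile names under a `GPU <bus-id>` heading, which a script can turn into a mapping from GPU to supported profiles. The following Python sketch is illustrative only; the output shape is assumed from the example above.

```python
def parse_supported_types(text):
    """Group profile names printed by 'nvidia-smi vgpu -s' under the
    'GPU <bus-id>' heading that precedes them."""
    types, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("GPU "):
            # Start a new group keyed by the PCI bus ID.
            current = line.split(None, 1)[1]
            types[current] = []
        elif line and current is not None:
            types[current].append(line)
    return types

sample = """\
GPU 0000:83:00.0
    GRID M60-0B
    GRID M60-0Q
"""
print(parse_supported_types(sample))
```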
To view detailed information about the supported vGPU types, add the -v or --verbose option:
[root@vgpu ~]# nvidia-smi vgpu -s -i 0 -v | less
GPU 00000000:40:00.0
vGPU Type ID : 0xc
Name : GRID M60-0Q
Class : Quadro
GPU Instance Profile ID : N/A
Max Instances : 16
Max Instances Per VM : 1
Multi vGPU Exclusive : False
vGPU Exclusive Type : False
vGPU Exclusive Size : False
Device ID : 0x13f210de
Sub System ID : 0x13f2114c
FB Memory : 512 MiB
Display Heads : 2
Maximum X Resolution : 2560
Maximum Y Resolution : 1600
Frame Rate Limit : 60 FPS
GRID License : Quadro-Virtual-DWS,5.0;GRID-Virtual-WS,2.0;GRID-Virtual-WS-Ext,2.0
vGPU Type ID : 0xf
Name : GRID M60-1Q
Class : Quadro
GPU Instance Profile ID : N/A
Max Instances : 8
Max Instances Per VM : 1
Multi vGPU Exclusive : False
vGPU Exclusive Type : False
vGPU Exclusive Size : False
Device ID : 0x13f210de
Sub System ID : 0x13f2114d
FB Memory : 1024 MiB
Display Heads : 4
Maximum X Resolution : 5120
Maximum Y Resolution : 2880
Frame Rate Limit : 60 FPS
GRID License : Quadro-Virtual-DWS,5.0;GRID-Virtual-WS,2.0;GRID-Virtual-WS-Ext,2.0
vGPU Type ID : 0x12
Name : GRID M60-2Q
Class : Quadro
GPU Instance Profile ID : N/A
Max Instances : 4
Max Instances Per VM : 1
Multi vGPU Exclusive : False
vGPU Exclusive Type : False
vGPU Exclusive Size : False
…
[root@vgpu ~]#
7.2.1.10. Listing the vGPU Types that Can Currently Be Created
To list the virtual GPU types that can currently be created on the GPUs in the system, run nvidia-smi vgpu with the -c or --creatable option. This property is dynamic: it reflects the number and type of vGPUs that are already running on the GPU.
- If no vGPUs are running on the GPU, all vGPU types that the GPU supports are listed.
- If one or more vGPUs are running on the GPU, but the GPU is not fully loaded, only the type of the vGPUs that are already running is listed.
- If the GPU is fully loaded, no vGPU types are listed.
To limit the retrieved information to a subset of the GPUs on the platform, use the -i or --id option to select one or more GPUs.
[root@vgpu ~]# nvidia-smi vgpu -c -i 0
GPU 0000:83:00.0
GRID M60-2Q
[root@vgpu ~]#
To view detailed information about the vGPU types that can currently be created, add the -v or --verbose option.
7.2.2. Using Citrix XenCenter to monitor GPU performance
If you are using XenServer as your hypervisor, you can monitor GPU performance in XenCenter.
- Click on a server’s Performance tab.
- Right-click on the graph window, then select Actions and New Graph.
- Provide a name for the graph.
- In the list of available counter resources, select one or more GPU counters.
Counters are listed for each physical GPU not currently being used for GPU pass-through.
Figure 23. Using Citrix XenCenter to monitor GPU performance
7.3. Monitoring GPU Performance from a Guest VM
You can use monitoring tools within an individual guest VM to monitor the performance of vGPUs or pass-through GPUs that are assigned to the VM. The scope of these tools is limited to the guest VM within which you use them. You cannot use monitoring tools within an individual guest VM to monitor any other GPUs in the platform.
For a vGPU, only these metrics are reported in a guest VM:
- 3D/Compute
- Memory controller
- Video encoder
- Video decoder
- Frame buffer usage
Other metrics that are normally reported for a physical GPU do not apply to a vGPU and are reported as zero or N/A, depending on the tool that you are using.
7.3.1. Using nvidia-smi to Monitor GPU Performance from a Guest VM
In guest VMs, you can use the nvidia-smi command to retrieve statistics for the following resources, both for total usage by all applications running in the VM and for usage by individual applications:
- GPU
- Video encoder
- Video decoder
- Frame buffer
To use nvidia-smi to retrieve statistics for the total resource usage by all applications running in the VM, run the following command:
nvidia-smi dmon
The following example shows the result of running nvidia-smi dmon from within a Windows guest VM.
Figure 24. Using nvidia-smi from a Windows guest VM to get total resource usage by all applications
To use nvidia-smi to retrieve statistics for resource usage by individual applications running in the VM, run the following command:
nvidia-smi pmon
Figure 25. Using nvidia-smi from a Windows guest VM to get resource usage by individual applications
7.3.2. Using Windows Performance Counters to monitor GPU performance
In Windows VMs, GPU metrics are available as Windows Performance Counters through the NVIDIA GPU object.
Any application that is enabled to read performance counters can access these metrics. You can access these metrics directly through the Windows Performance Monitor application that is included with the Windows OS.
The following example shows GPU metrics in the Performance Monitor application.
Figure 26. Using Windows Performance Monitor to monitor GPU performance
On vGPUs, the following GPU performance counters read as 0 because they are not applicable to vGPUs:
- % Bus Usage
- % Cooler rate
- Core Clock MHz
- Fan Speed
- Memory Clock MHz
- PCI-E current speed to GPU Mbps
- PCI-E current width to GPU
- PCI-E downstream width to GPU
- Power Consumption mW
- Temperature C
7.3.3. Using NVWMI to monitor GPU performance
In Windows VMs, Windows Management Instrumentation (WMI) exposes GPU metrics in the ROOT\CIMV2\NV namespace through NVWMI. NVWMI is included with the NVIDIA driver package. The NVWMI API Reference in Windows Help format is available for download from the NVIDIA website.
Any WMI-enabled application can access these metrics. The following example shows GPU metrics in the third-party application WMI Explorer, which is available for download from the CodePlex WMI Explorer page.
Figure 27. Using WMI Explorer to monitor GPU performance
Some instance properties of the following classes do not apply to vGPUs:
- Gpu
- PcieLink
Gpu instance properties that do not apply to vGPUs
| Gpu Instance Property | Value reported on vGPU |
|---|---|
| gpuCoreClockCurrent | -1 |
| memoryClockCurrent | -1 |
| pciDownstreamWidth | 0 |
| pcieGpu.curGen | 0 |
| pcieGpu.curSpeed | 0 |
| pcieGpu.curWidth | 0 |
| pcieGpu.maxGen | 1 |
| pcieGpu.maxSpeed | 2500 |
| pcieGpu.maxWidth | 0 |
| power | -1 |
| powerSampleCount | -1 |
| powerSamplingPeriod | -1 |
| verVBIOS.orderedValue | 0 |
| verVBIOS.strValue | - |
| verVBIOS.value | 0 |
PcieLink instance properties that do not apply to vGPUs
No instances of PcieLink are reported for vGPU.
NVIDIA GPUs based on the NVIDIA Maxwell™ graphics architecture implement a best effort vGPU scheduler that aims to balance performance across vGPUs. The best effort scheduler allows a vGPU to use GPU processing cycles that are not being used by other vGPUs. Under some circumstances, a VM running a graphics-intensive application may adversely affect the performance of graphics-light applications running in other VMs.
GPUs based on NVIDIA GPU architectures after the Maxwell architecture additionally support equal share and fixed share vGPU schedulers. These schedulers impose a limit on GPU processing cycles used by a vGPU, which prevents graphics-intensive applications running in one VM from affecting the performance of graphics-light applications running in other VMs. On GPUs that support multiple vGPU schedulers, you can select the vGPU scheduler to use. You can also set the length of the time slice for the equal share and fixed share vGPU schedulers.
If you use the equal share or fixed share vGPU scheduler, the frame-rate limiter (FRL) is disabled.
The best effort scheduler is the default scheduler for all supported GPU architectures.
If you are unsure of the NVIDIA GPU architecture of your GPU, consult the release notes for your hypervisor at NVIDIA Virtual GPU Software Documentation.
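The difference between the policies can be pictured as a simple share calculation. The following Python sketch is a simplified illustration, not the scheduler implementation: it assumes that equal share divides GPU cycles among the vGPUs currently running, while fixed share divides them among the maximum number of vGPUs that the profile allows, so a fixed share stays constant as vGPUs start and stop.

```python
def scheduler_share(scheduler, running_vgpus, max_instances):
    """Fraction of GPU processing cycles one vGPU may use under each
    policy. Equal share grows as other vGPUs shut down; fixed share
    is constant. Best effort imposes no fixed per-vGPU limit."""
    if scheduler == "equal":
        return 1.0 / running_vgpus
    if scheduler == "fixed":
        return 1.0 / max_instances
    raise ValueError("best-effort has no fixed per-vGPU share")

# Two vGPUs running on a GPU whose profile allows four instances:
print(scheduler_share("equal", 2, 4))  # each may use half the GPU
print(scheduler_share("fixed", 2, 4))  # each is capped at a quarter
```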