Sizing Methodology
Before deploying NVIDIA virtual GPU (vGPU) technology, conducting a proof of concept (POC) is highly recommended. This initial phase lets you gain insight into user workflows, assess GPU resource requirements, and gather feedback to tune configuration settings for performance and scalability. The benchmarking examples in later sections of this guide provide reference data for sizing deployments.
User behavior varies significantly and plays a pivotal role in determining the appropriate GPU and vGPU profile sizes. Typically, recommendations are categorized into three user types: light, medium, and heavy, based on their workflow demands and data/model sizes. For instance, heavy users handle advanced graphics and larger datasets, while light and medium users require less intensive graphics and work with smaller models.
The following sections delve into methodologies and considerations for sizing deployments, ensuring alignment with user requirements and performance expectations.
vGPU Profiles
NVIDIA vGPU software enables the partitioning or fractionalization of an NVIDIA data center GPU. These virtual GPU resources are allocated to virtual machines (VMs) via vGPU profiles in the hypervisor management console.
vGPU profiles determine the allocation of GPU frame buffer to VMs, significantly impacting total cost of ownership, scalability, stability, and performance in VDI environments.
Each vGPU profile has a specific frame buffer size, a maximum number of display heads, and a maximum resolution. Profiles are grouped into series, each optimized for a different class of workload. A profile name combines a vGPU series (such as A, B, or Q) with a vGPU size (the amount of GPU memory in gigabytes); for example, L40S-8Q is a Q-series profile with 8 GB of frame buffer on an L40S GPU. The table below summarizes the vGPU profile series available across all license levels.
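Because sizing calculations start from the profile name, it can help to split a name into its GPU model, size, and series. The sketch below assumes the `<GPU>-<size in GB><series>` pattern seen in this guide's examples (L40S-8Q, L4-2B); the regular expression, `VgpuProfile`, and `parse_profile` are hypothetical helpers, and names that deviate from this pattern are not handled.

```python
import re
from typing import NamedTuple

# Hypothetical pattern inferred from names used in this guide (L40S-8Q, L4-2B):
# <GPU model>-<frame buffer in GB><series letter>
PROFILE_PATTERN = re.compile(r"^(?P<gpu>[A-Za-z0-9]+)-(?P<size>\d+)(?P<series>[ABQ])$")


class VgpuProfile(NamedTuple):
    gpu: str      # physical GPU model, e.g. "L40S"
    size_gb: int  # frame buffer allocated to the vGPU, in GB
    series: str   # workload series: "A", "B", or "Q"


def parse_profile(name: str) -> VgpuProfile:
    """Split a profile name such as 'L40S-8Q' into its parts."""
    match = PROFILE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized vGPU profile name: {name!r}")
    return VgpuProfile(match["gpu"], int(match["size"]), match["series"])


print(parse_profile("L40S-8Q"))  # VgpuProfile(gpu='L40S', size_gb=8, series='Q')
```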
| vGPU Profile | Optimal Workload |
|---|---|
| Q-profile [1] | Virtual workstations for creative and technical professionals who require the performance and features of Quadro technology |
| B-profile | Virtual desktops for business professionals and knowledge workers |
| A-profile | App streaming or session-based solutions for virtual application users |
Note
Avoid using 1A, 2A, and 4A vGPU profiles for vApps, as they are not suitable and may lead to misconfigurations.
For more information regarding vGPU profiles, please refer to the vGPU software user guide.
Choosing the appropriate vGPU profile for deployment is crucial as it dictates the number of vGPU-backed VMs that can be deployed.
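As a first approximation, the number of same-size vGPUs a GPU can host is its frame buffer divided by the profile size. The sketch below assumes frame buffer is the only limit; `max_vms_by_frame_buffer` is a hypothetical helper, and NVIDIA's published per-profile maximums can be lower and take precedence.

```python
def max_vms_by_frame_buffer(gpu_memory_gb: int, profile_size_gb: int) -> int:
    """Upper bound on same-size vGPUs per physical GPU, from frame buffer alone.

    Assumption: only frame buffer is considered; NVIDIA's published
    per-profile maximums can be lower and take precedence.
    """
    if profile_size_gb <= 0:
        raise ValueError("profile size must be positive")
    return gpu_memory_gb // profile_size_gb


# 48 GB L40S carved into 8 GB vGPUs (L40S-8Q)
print(max_vms_by_frame_buffer(48, 8))  # 6
```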
Two types of deployment configurations are supported:
Homogeneous vGPU: A configuration where a physical GPU is fractionalized into vGPUs that have the same amount of frame buffer. All vGPUs hosted on a physical GPU must have the same profile size (same frame buffer size), but they can belong to different vGPU series (for example, 2Q and 2B vGPUs can be hosted on the same physical GPU). Figure 5 illustrates some valid configurations for homogeneous vGPU on an L40S GPU.
Heterogeneous vGPU: A configuration that allows a physical GPU to support vGPUs with different vGPU profile sizes (different amounts of frame buffer) simultaneously. This configuration allows for more flexible and efficient use of GPU resources, as different VMs can have different GPU requirements. Figure 6 illustrates some valid configurations for heterogeneous vGPU on an L40S GPU. This feature was introduced in vGPU 17.0.
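To make the distinction concrete, the sketch below checks which configuration type a planned set of vGPUs on one GPU requires, based only on their sizes. `classify_configuration` is a hypothetical helper; it verifies that the total fits in frame buffer but does not model placement rules or the per-size instance limits that apply with heterogeneous vGPU.

```python
from typing import List


def classify_configuration(sizes_gb: List[int], gpu_memory_gb: int) -> str:
    """Label the vGPUs planned for one physical GPU by their frame buffer sizes.

    Checks only that the total fits in frame buffer; placement rules and
    heterogeneous per-size limits are not modeled here.
    """
    if not sizes_gb:
        raise ValueError("no vGPUs given")
    if sum(sizes_gb) > gpu_memory_gb:
        raise ValueError("requested vGPUs exceed the GPU's frame buffer")
    # One size everywhere -> homogeneous; series letters (A/B/Q) may still differ.
    return "homogeneous" if len(set(sizes_gb)) == 1 else "heterogeneous"


print(classify_configuration([2, 2], 48))  # 2Q + 2B on an L40S -> homogeneous
print(classify_configuration([8, 2], 24))  # 8Q + 2B on an L4 -> heterogeneous
```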
Figure 5 Example Homogeneous vGPU Configurations for NVIDIA L40S
Figure 6 Example Heterogeneous vGPU Configurations for NVIDIA L40S
Heterogeneous vGPU allows different vGPU series (A, B, and Q) as well as different vGPU sizes on the same physical GPU. For example, an L4 GPU with heterogeneous vGPU can host L4-8Q and L4-2B vGPU instances simultaneously. However, the maximum number of vGPU instances of a given size is the largest power of 2 that does not exceed the number of instances supported with homogeneous vGPU.
For example, an L40S GPU with 48 GB of GPU memory can support:
6 instances of the L40S-8Q profile with homogeneous vGPU
4 instances of the L40S-8Q profile with heterogeneous vGPU
| vGPU Profile | GPU Memory (MB) | Maximum vGPUs per GPU with Homogeneous vGPU | Maximum vGPUs per GPU with Heterogeneous vGPU |
|---|---|---|---|
| L40S-8Q | 8192 | 6 | 4 |
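The power-of-2 rule can be expressed in a few lines, taking the homogeneous maximum as input. `max_heterogeneous_instances` is a hypothetical helper that encodes the rule as stated in this guide; NVIDIA's published valid-configuration tables remain the authoritative source for actual limits.

```python
def max_heterogeneous_instances(homogeneous_max: int) -> int:
    """Largest power of two that does not exceed the homogeneous maximum.

    Encodes the rule stated in this guide; NVIDIA's published
    valid-configuration tables are authoritative.
    """
    if homogeneous_max < 1:
        raise ValueError("homogeneous maximum must be at least 1")
    # bit_length() - 1 is the index of the highest set bit, i.e. floor(log2(n)).
    return 1 << (homogeneous_max.bit_length() - 1)


# L40S-8Q: 6 instances with homogeneous vGPU (from the table above)
print(max_heterogeneous_instances(6))  # 4
```

Using the highest set bit avoids floating-point `log2` rounding and makes the "does not exceed" direction explicit.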
For more information, refer to Valid Time-Sliced Virtual GPU Configurations on a Single GPU.
The following diagram shows the supported heterogeneous vGPU placements for each vGPU size on a GPU based on the Ada Lovelace architecture with 48 GB of frame buffer:
Figure 7 vGPU Placements for Ada Lovelace GPUs with 48 GB Frame Buffer
For more details, refer to vGPU Placements for GPUs.
Note
Multi-session desktops require careful consideration of GPU memory, because all user sessions on the VM share the frame buffer of a single vGPU. We suggest selecting a large vGPU profile size and confirming it with POC testing. POCs are crucial for identifying the appropriate vGPU profile size, addressing potential bottlenecks, and ensuring that the deployed solution meets the desired performance criteria.