Sizing Methodology

Before deploying NVIDIA virtual GPU (vGPU) technology, conducting a proof of concept (POC) is highly recommended. A POC lets you observe user workflows, measure GPU resource requirements, and gather feedback for tuning configuration settings for performance and scalability. The benchmarking examples in later sections of this guide provide reference points for sizing deployments.

User behavior varies significantly and plays a pivotal role in determining the appropriate GPU and vGPU profile sizes. Typically, recommendations are categorized into three user types: light, medium, and heavy, based on their workflow demands and data/model sizes. For instance, heavy users handle advanced graphics and larger datasets, while light and medium users require less intensive graphics and work with smaller models.

The following sections delve into methodologies and considerations for sizing deployments, ensuring alignment with user requirements and performance expectations.

vGPU Profiles

NVIDIA vGPU software enables the partitioning or fractionalization of an NVIDIA data center GPU. These virtual GPU resources are allocated to virtual machines (VMs) via vGPU profiles in the hypervisor management console.

vGPU profiles determine the allocation of GPU frame buffer to VMs, significantly impacting total cost of ownership, scalability, stability, and performance in VDI environments.

Each vGPU profile has a specific frame buffer size, a number of supported display heads, and a maximum resolution. Profiles are grouped into series, each optimized for a different class of workload. A profile name combines a vGPU series (such as A, B, or Q) with a vGPU size (the amount of GPU memory in gigabytes). The table below lists the vGPU profile series available across all license levels and the workload each one is best suited to.

Table 5 NVIDIA vGPU Profiles

| vGPU Profiles | Optimal Workload |
| --- | --- |
| Q-profile [1] | Virtual workstations for creative and technical professionals who require the performance and features of Quadro technology |
| B-profile | Virtual desktops for business professionals and knowledge workers |
| A-profile | App streaming or session-based solutions for virtual applications users |

Note

Avoid using 1A, 2A, and 4A vGPU profiles for vApps, as they are not suitable and may lead to misconfigurations.

For more information regarding vGPU profiles, please refer to the vGPU software user guide.
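As a rough illustration of how a profile name combines a series and a size, the sketch below splits names such as L40S-8Q into their parts and looks up the workload from Table 5. The naming pattern `<board>-<size><series>` and the helper `parse_profile` are assumptions made for this example, not an NVIDIA API, and the pattern only covers the A, B, and Q series discussed here.

```python
import re

# Series letters and workloads as listed in Table 5.
SERIES_WORKLOAD = {
    "Q": "virtual workstations for creative and technical professionals",
    "B": "virtual desktops for business professionals and knowledge workers",
    "A": "app streaming or session-based virtual applications",
}

# Assumed naming pattern: <board>-<size in GB><series letter>, e.g. "L40S-8Q".
PROFILE_RE = re.compile(r"^(?P<board>.+)-(?P<size_gb>\d+)(?P<series>[ABQ])$")


def parse_profile(name):
    """Split a vGPU profile name into (board, frame buffer size in GB, series)."""
    match = PROFILE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized vGPU profile name: {name!r}")
    return match["board"], int(match["size_gb"]), match["series"]


for name in ("L40S-8Q", "L4-2B", "DC-24Q"):
    board, size_gb, series = parse_profile(name)
    print(f"{name}: {size_gb} GB, {series}-series, {SERIES_WORKLOAD[series]}")
```

A helper like this can be handy when tallying the frame buffer requested by a list of VMs during a POC.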

Choosing the appropriate vGPU profile for deployment is crucial as it dictates the number of vGPU-backed VMs that can be deployed.

Two types of deployment configurations are supported:

  • Homogeneous vGPU: A configuration in which a physical GPU is fractionalized into vGPUs that all have the same amount of frame buffer. When MIG is disabled, all vGPUs hosted on a physical GPU must have the same profile size (same frame buffer size), but they may belong to different vGPU series (for example, 2Q and 2B can be hosted on the same physical GPU). Example Homogeneous vGPU Configuration for NVIDIA RTX PRO 6000 Blackwell Server Edition illustrates some valid configurations for homogeneous vGPU on an RTX PRO 6000 Blackwell Server Edition GPU.

  • Heterogeneous vGPU: A configuration that allows a physical GPU to support vGPUs with different vGPU profile sizes (different amounts of frame buffer) simultaneously. This configuration allows for more flexible and efficient use of GPU resources, as different VMs can have different GPU requirements. Example Heterogeneous vGPU Configuration for NVIDIA RTX PRO 6000 Blackwell Server Edition illustrates some valid configurations for heterogeneous vGPU on an RTX PRO 6000 Blackwell Server Edition GPU. This feature was introduced in vGPU 17.0.

When MIG is enabled on supported Blackwell GPUs, each MIG slice can be configured independently in homogeneous or heterogeneous mode. See Figure 7 for an example configuration.

Example Homogeneous vGPU Configuration for NVIDIA RTX PRO 6000 Blackwell Server Edition

_images/vgpu-007.png

Figure 5 Example Homogeneous vGPU Configurations for NVIDIA RTX PRO 6000 Blackwell Server Edition

Figure 5 shows an example configuration in which MIG is not enabled, so the entire NVIDIA RTX PRO 6000 Blackwell Server Edition GPU is configured in homogeneous mode. In this configuration, the GPU hosts four DC-24Q vGPU profiles.
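The arithmetic behind a homogeneous layout can be sketched as an integer division of the GPU frame buffer by the profile size. This is a minimal sketch, not an official sizing tool: the function name is hypothetical, the 96 GB frame buffer for the RTX PRO 6000 Blackwell Server Edition is an assumption not stated in this section, and the result is only an upper bound because the vGPU software may cap the number of vGPUs per GPU for small profiles.

```python
def max_homogeneous_vgpus(gpu_frame_buffer_gb, profile_size_gb):
    """Upper bound on equal-sized vGPUs that fit in a GPU's frame buffer."""
    if gpu_frame_buffer_gb <= 0 or profile_size_gb <= 0:
        raise ValueError("frame buffer sizes must be positive")
    # Any leftover frame buffer smaller than one profile is unused.
    return gpu_frame_buffer_gb // profile_size_gb


# Assumption: 96 GB of frame buffer on the RTX PRO 6000 Blackwell Server Edition.
RTX_PRO_6000_FB_GB = 96

print(max_homogeneous_vgpus(RTX_PRO_6000_FB_GB, 24))  # four DC-24Q vGPUs, as in Figure 5
```

Always confirm the result against the supported configurations in the vGPU software user guide before committing to a profile size.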

Example Heterogeneous vGPU Configuration for NVIDIA RTX PRO 6000 Blackwell Server Edition

_images/vgpu-008.png

Figure 6 Example Heterogeneous vGPU Configurations for NVIDIA RTX PRO 6000 Blackwell Server Edition

Figure 6 shows an example heterogeneous configuration in which MIG is not enabled, so the entire NVIDIA RTX PRO 6000 Blackwell Server Edition GPU is configured in heterogeneous mode. In this configuration, a single GPU hosts four DC-12Q vGPU profiles and two DC-24Q vGPU profiles.
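A quick way to sanity-check a mixed layout like this one is to total the frame buffer it requests and compare that with the GPU's capacity. The sketch below does only that; the function names and the 96 GB capacity are assumptions for illustration. Passing this check is necessary but not sufficient, because heterogeneous vGPU also imposes placement and per-size limits that are not modeled here.

```python
def frame_buffer_used_gb(vgpu_counts):
    """Total frame buffer for a mix given as {profile_size_gb: vgpu_count}."""
    return sum(size_gb * count for size_gb, count in vgpu_counts.items())


def fits_frame_buffer(vgpu_counts, gpu_frame_buffer_gb):
    """True if the requested mix does not exceed the GPU's frame buffer."""
    return frame_buffer_used_gb(vgpu_counts) <= gpu_frame_buffer_gb


# The mix described for Figure 6: four DC-12Q and two DC-24Q vGPUs.
figure_6_mix = {12: 4, 24: 2}

# Assumption: the GPU exposes 96 GB of frame buffer.
print(frame_buffer_used_gb(figure_6_mix))   # 96
print(fits_frame_buffer(figure_6_mix, 96))  # True
```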

Example MIG-Backed vGPU Configuration Showing Homogeneous and Heterogeneous Modes per MIG Slice on NVIDIA RTX PRO 6000 Blackwell Server Edition

_images/vgpu-019.png

Figure 7 Example MIG-Backed vGPU Configuration Showing Homogeneous and Heterogeneous Modes per MIG Slice on NVIDIA RTX PRO 6000 Blackwell Server Edition

Figure 7 shows a mixed MIG-backed vGPU configuration in which four MIG 1g.24gb+gfx slices on a single NVIDIA RTX PRO 6000 Blackwell Server Edition GPU host different profile combinations.

  • MIG Slice 0 is configured in homogeneous mode and hosts four DC-1-6Q profiles.

  • MIG Slice 1 is configured in homogeneous mode and hosts two DC-1-12Q profiles.

  • MIG Slice 2 is configured in heterogeneous mode and hosts one DC-1-12Q profile and one DC-1-8Q profile.

  • MIG Slice 3 is configured in heterogeneous mode and hosts one DC-1-12Q profile and three DC-1-4Q profiles.
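The per-slice layout above can be tabulated with a short script that reports, for each MIG slice, whether its profile sizes are uniform and how much frame buffer remains. This is a sketch under two assumptions: that the 24gb in the slice name 1g.24gb+gfx means 24 GB of frame buffer per slice, and that the mode can be inferred from the profile sizes alone. The helper `slice_summary` is hypothetical.

```python
# Assumption: each 1g.24gb+gfx MIG slice provides 24 GB of frame buffer.
MIG_SLICE_FB_GB = 24

# Profile sizes (GB) hosted on each slice, as listed for Figure 7.
slices = {
    "MIG Slice 0": [6, 6, 6, 6],
    "MIG Slice 1": [12, 12],
    "MIG Slice 2": [12, 8],
    "MIG Slice 3": [12, 4, 4, 4],
}


def slice_summary(profile_sizes_gb, slice_fb_gb=MIG_SLICE_FB_GB):
    """Return (mode, used GB, free GB) for the vGPUs placed on one MIG slice."""
    used = sum(profile_sizes_gb)
    if used > slice_fb_gb:
        raise ValueError(f"{used} GB requested exceeds the {slice_fb_gb} GB slice")
    # Uniform sizes correspond to homogeneous mode, mixed sizes to heterogeneous.
    mode = "homogeneous" if len(set(profile_sizes_gb)) == 1 else "heterogeneous"
    return mode, used, slice_fb_gb - used


for name, sizes in slices.items():
    mode, used, free = slice_summary(sizes)
    print(f"{name}: {mode}, {used} GB used, {free} GB free")
```

Under these assumptions, MIG Slice 2 is the only slice with unallocated frame buffer (4 GB).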

Heterogeneous vGPU supports different vGPU series (A, B, and Q) as well as different vGPU sizes on the same physical GPU. For example, an L4 GPU with heterogeneous vGPU can host both L4-8Q and L4-2B vGPU instances. However, the maximum number of vGPU instances of a given size that can be supported is the closest power of 2 to the number of instances supported with homogeneous vGPU, which can be lower than the homogeneous maximum.

In the example below, an L40S GPU with 48 GB of GPU memory can support:

  • 6 instances of the L40S-8Q profile with homogeneous vGPU

  • 4 instances of the L40S-8Q profile with heterogeneous vGPU

Table 6 L40S-8Q vGPU Profile

| vGPU Profile | GPU Memory (MB) | Maximum vGPUs per GPU with Homogeneous vGPU | Maximum vGPUs per GPU with Heterogeneous vGPU |
| --- | --- | --- | --- |
| L40S-8Q | 8192 | 6 | 4 |

For more information, refer to Valid Time-Sliced Virtual GPU Configurations on a Single GPU.
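To see how the per-GPU maximum affects a deployment, the sketch below estimates how many L40S GPUs a group of users would need in each mode, using the L40S-8Q limits from Table 6. The user count is hypothetical, and the calculation assumes one vGPU per user VM with every VM using the same L40S-8Q profile. In heterogeneous mode the remaining frame buffer could still host vGPUs of other sizes, which this sketch ignores.

```python
import math

# Maximum L40S-8Q vGPUs per GPU, taken from Table 6.
MAX_VGPUS_PER_GPU = {"homogeneous": 6, "heterogeneous": 4}


def gpus_required(user_count, vgpus_per_gpu):
    """Number of physical GPUs needed when each user VM gets one vGPU."""
    if user_count < 0 or vgpus_per_gpu <= 0:
        raise ValueError("user_count must be >= 0 and vgpus_per_gpu must be > 0")
    return math.ceil(user_count / vgpus_per_gpu)


users = 24  # hypothetical number of L40S-8Q users
for mode, per_gpu in MAX_VGPUS_PER_GPU.items():
    print(f"{mode}: {gpus_required(users, per_gpu)} GPUs for {users} users")
```

For 24 such users this gives 4 GPUs in homogeneous mode and 6 in heterogeneous mode, which shows why the mode choice matters for total cost of ownership.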

The following diagram shows the supported placements for each size of vGPU on a GPU based on the Ada Lovelace GPU architecture with a total of 48 GB of frame buffer with heterogeneous vGPU:

_images/vgpu-009.png

Figure 8 vGPU Placements for Ada Lovelace GPUs with 48 GB Frame Buffer

For more details, refer to vGPU Placements for GPUs.

Note

Multi-session desktops require careful consideration of GPU memory. We suggest selecting a large vGPU profile size based on the results of POC testing. Conducting POCs is crucial for identifying the appropriate vGPU profile size, addressing potential bottlenecks, and ensuring that the deployed solution meets the desired performance criteria.