NVIDIA RTX vWS: Sizing and GPU Selection Guide for Virtualized Workloads

Example VDI Deployment Configurations

Application-specific sizing uses benchmark results and typical user configurations to answer three common questions:

  • Which NVIDIA Data Center GPU should I use for my business needs?

  • How do I select the correct profile(s) for my user types?

  • How many users can be supported per server (user density)?

User behavior varies widely and strongly influences which GPU and profile size are best. Recommendations are provided for three user types, each at two quality of service (QoS) levels: Dedicated Performance and Typical Customer Deployment.

User Types:

  • Light Users: Low graphics requirements and smaller model sizes.

  • Medium Users: Moderate graphics requirements and medium model sizes.

  • Heavy Users: High graphics requirements and large data sets.

Recommendations for each user type at each QoS level, along with server configurations, are provided below. Treat these as guidelines: the most successful deployments start with a proof of concept (POC) and are tuned continuously.

If performance is the primary concern, it is recommended to use the fixed share scheduler and larger profile sizes, resulting in fewer users per server. Most deployments, however, use the best effort GPU scheduler policy for better GPU utilization, supporting more users per server and improving TCO per user. Keep the scheduling policy in mind when comparing options.
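To make the trade-off concrete, here is a minimal sketch (purely illustrative, not NVIDIA's scheduler implementation) of the per-user share of GPU cycles under each policy; the profile and user counts are hypothetical examples.

```python
# Illustrative model of the per-user share of GPU cycles under the two
# vGPU scheduling policies. Purely a sketch; real scheduler behavior also
# depends on time-slice length and workload.

def fixed_share(max_vgpus_per_gpu: int) -> float:
    """Fixed share: every vGPU is guaranteed 1/N of the GPU's cycles, where
    N is the maximum vGPUs the profile allows per GPU -- even when idle."""
    return 1.0 / max_vgpus_per_gpu

def best_effort(active_vgpus: int) -> float:
    """Best effort: only vGPUs with pending work share the GPU, so idle
    users' cycles are reclaimed by busy ones."""
    return 1.0 / max(active_vgpus, 1)

# Example: an L40S-8Q profile allows 6 vGPUs on a 48 GB L40S (48 // 8 = 6).
print(fixed_share(6))    # 0.167 -- guaranteed QoS, regardless of other users
print(best_effort(2))    # 0.5   -- only 2 of 6 users busy right now
print(best_effort(6))    # 0.167 -- fully loaded, converges to the fixed share
```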

Note

For a detailed explanation of vGPU scheduling policies, please refer to the Understanding the GPU Scheduler section.

The table below summarizes findings for Typical Customer Deployment:

Table 17 - Typical Customer Deployment

Light User
  • 16-24 users per server
  • vGPU profile: L40S-4Q
  • 4 vCPUs
  • 8-16 GB RAM
  • 1U or 2U server config

Medium User
  • 9-18 users per server
  • vGPU profile: L40S-4Q, L40S-6Q, or L40S-8Q
  • 8 vCPUs
  • 16-32 GB RAM
  • 1U or 2U server config

Heavy User
  • 6-12 users per server
  • vGPU profile: L40S-12Q, L40S-16Q, or L40S-24Q
  • 12+ vCPUs
  • 32-64 GB RAM
  • 1U or 2U server config
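The per-server counts in Table 17 are bounded first by frame buffer: users per GPU cannot exceed the GPU's memory divided by the profile size (the number after the dash, in GB). A minimal sketch of that arithmetic follows, assuming a 48 GB L40S and hypothetical 2- or 3-GPU server shapes; real density is also capped by vCPUs, host RAM, and the scheduling policy.

```python
# Frame-buffer bound on user density. A Q profile's number is its frame
# buffer in GB, so an L40S-4Q packs 48 // 4 = 12 vGPUs onto one 48 GB L40S.
# GPU counts per server below are hypothetical examples.

L40S_MEMORY_GB = 48

def max_users_per_server(profile_gb: int, gpus_per_server: int) -> int:
    users_per_gpu = L40S_MEMORY_GB // profile_gb
    return users_per_gpu * gpus_per_server

print(max_users_per_server(4, 2))   # 24 -- L40S-4Q, light-user upper bound
print(max_users_per_server(12, 3))  # 12 -- L40S-12Q, heavy-user upper bound
print(max_users_per_server(24, 3))  # 6  -- L40S-24Q, heavy-user lower bound
```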

While NVIDIA recommends the L40S for RTX vWS deployments, the A16 is suitable for lightweight, entry-level virtual workstation use cases. For best performance, use a minimum 8 GB profile when deploying virtual workstations on the NVIDIA A16. This configuration supports up to 2 users per GPU and up to 8 users per A16 board.

The A16 provides additional flexibility with its 4 GPUs per board, each equipped with 16 GB of memory (64 GB total). This allows IT to deploy multiple profile sizes and vGPU software licenses on a single board. For example, one GPU on the A16 could support 2B profiles for vPC users, while another GPU on the same board could support 8Q profiles for RTX vWS users. It is highly recommended to conduct a POC to determine if the A16 meets the density and performance needs of your organization. For OEM considerations, please consult vGPU Certified Servers for more information.
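As a rough illustration of this mixed-profile flexibility (a hypothetical helper, not an NVIDIA tool), the sketch below tallies users on one A16 board when each of its four GPUs is assigned a different profile size:

```python
# A16 board: 4 GPUs x 16 GB. Each physical GPU hosts a single profile size
# at a time, but different GPUs on the same board may use different sizes.

A16_GPU_MEMORY_GB = 16

def users_per_a16_board(profile_gb_per_gpu: list[int]) -> int:
    """profile_gb_per_gpu: the profile size (GB) assigned to each of the
    board's 4 GPUs, e.g. 2 for a 2B vPC profile, 8 for an 8Q vWS profile."""
    assert len(profile_gb_per_gpu) == 4, "an A16 board has 4 GPUs"
    return sum(A16_GPU_MEMORY_GB // gb for gb in profile_gb_per_gpu)

# Example from the text: one GPU serving 2B vPC users (16 // 2 = 8 users),
# the other three GPUs serving 8Q RTX vWS users (2 users each).
print(users_per_a16_board([2, 8, 8, 8]))  # 8 + 2 + 2 + 2 = 14 users
```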

Selecting the Best Effort GPU scheduler policy allows the GPU compute engine to be oversubscribed, maximizing GPU usage during idle or low-utilization periods. In many customer deployments, it is unlikely that all 12 users will be rendering simultaneously, or to the extent replicated in dedicated performance testing. Selecting the Best Effort scheduler therefore often permits a 2-3x oversubscription of the GPU compute engine, effectively supporting 2-3 times the number of users. How much higher scalability is achievable depends on users' typical daily activities, such as the number of meetings, the length of breaks, and the amount of multitasking. It is recommended to test and validate the appropriate GPU scheduling policy to meet your users' needs; a back-of-envelope estimate is sketched below.
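This sketch simply scales a dedicated-performance user count by the 2-3x oversubscription range mentioned above; it is an estimate to seed a POC, not a sizing rule.

```python
# Rough best-effort density estimate: dedicated-performance users scaled by
# an oversubscription factor. The 2-3x range comes from the text above and
# must be validated in a POC against real user concurrency.

def estimated_best_effort_users(dedicated_users: int, factor: float) -> int:
    return int(dedicated_users * factor)

dedicated = 6  # e.g., heavy users per server at dedicated performance
for factor in (2.0, 2.5, 3.0):
    print(f"{factor}x -> {estimated_best_effort_users(dedicated, factor)} users")
# 2.0x -> 12 users, 2.5x -> 15 users, 3.0x -> 18 users
```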

The recommended vGPU profiles for Dedicated Performance were derived by first benchmarking the graphics performance of a physical workstation GPU (e.g., RTX 4000 Ada), then matching those benchmark scores to the scores achieved by each virtual GPU profile. The following table summarizes these findings:

Table 18 - Reference Server Lab Builds For Dedicated Performance

Light User
  • Equivalent performance: RTX A1000
  • Users per server: 18
  • vCPUs: 4
  • vGPU profile: L40S-8Q
  • vMemory: 8 GB
  • CPUs: Intel Xeon 6746E
  • GPUs: 3 x L40S
  • Memory: 512 GB
  • Storage type: flash-based
  • Detailed server specifications: 112 cores, 2 GHz (Turbo 2.7 GHz), 128-512 GB RAM, 10 GbE network (min)

Medium User
  • Equivalent performance: RTX 2000 Ada
  • Users per server: 12
  • vCPUs: 8
  • vGPU profile: L40S-12Q
  • vMemory: 16 GB
  • CPUs: Intel Xeon 6731E
  • GPUs: 3 x L40S
  • Memory: 512 GB
  • Storage type: flash-based
  • Detailed server specifications: 96 cores, 2.2 GHz (Turbo 3.1 GHz), 512-768+ GB RAM, 10 GbE network (min)

Heavy User
  • Equivalent performance: RTX 4000 Ada
  • Users per server: 6
  • vCPUs: 12
  • vGPU profile: L40S-24Q
  • vMemory: 32 GB
  • CPUs: Intel Xeon 6740E
  • GPUs: 3 x L40S
  • Memory: 512 GB
  • Storage type: flash-based
  • Detailed server specifications: 96 cores, 2.4 GHz (Turbo 3.2 GHz), 512-768+ GB RAM, 10 GbE network (min)

The following example illustrates how different Quality of Service (QoS) thresholds can impact the number of users per server through the application of various GPU scheduling policies. By selecting the Fixed Share Scheduler, a specific QoS is always guaranteed. For instance, six users on an L40S will consistently experience performance similar to a workstation with an NVIDIA RTX A1000 GPU.

In contrast, the Best Effort Scheduler, the most common option for enterprises, does not guarantee the same level of QoS but can accommodate more users experiencing NVIDIA RTX A1000-level performance. However, user performance will fluctuate based on the load from other users on the same L40S at any given time. For example, a single user on an L40S will have performance similar to an NVIDIA RTX A2000. As the user density increases to 3-8 users per GPU, the performance can be comparable to a workstation with a Quadro P620 card.

This example assumes sufficient frame buffer at all scales to demonstrate how GPU scheduling policies can impact user density.

Table 19 - Impact of GPU Scheduling Policies on User Density

Users per server host (3 x NVIDIA L40S):
  • Dedicated Performance (Fixed Share Scheduler): 18 users (6 users per GPU, with RTX A1000-class performance at all times)
  • Typical Customer Configuration (Best Effort Scheduler): 16-24 users (3-8 users per GPU, with performance ranging from P620-class to A2000-class)
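Encoding the figures above as a lookup (the thresholds are taken directly from this example, not a general model) makes the fluctuation under best effort explicit:

```python
# Equivalent-workstation class for a best-effort L40S user, as a function of
# how many users on the same GPU are concurrently active, per the example.

def best_effort_equivalent(active_users_on_gpu: int) -> str:
    if active_users_on_gpu <= 1:
        return "~NVIDIA RTX A2000"  # GPU effectively dedicated to one user
    if active_users_on_gpu <= 8:
        return "~Quadro P620 to RTX A2000 (fluctuates with load)"
    return "beyond the tested range -- validate in a POC"

print(best_effort_equivalent(1))  # ~NVIDIA RTX A2000
print(best_effort_equivalent(6))  # ~Quadro P620 to RTX A2000 (fluctuates with load)
```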

For more on the GPU scheduling options and how to configure the server, refer to NVIDIA’s VMware or Citrix Hypervisor vGPU Deployment Guide.

The NVIDIA-specific and third-party industry tools mentioned in this guide were used to capture VM and server-level metrics to validate optimal performance and scalability based on benchmark data. It is highly recommended that you conduct a proof of concept (POC) for each deployment type. This will allow you to validate performance using objective measurements and gather subjective feedback from your end-users to ensure the deployment meets their needs effectively.
