Example VDI Deployment Configurations

Application-specific sizing uses benchmark results and typical user configurations to answer three common questions:

Which NVIDIA Data Center GPU should I use for my business needs?

How do I select the correct profile(s) for my user types?

How many users can be supported per server (user density)?

User behavior varies, and it strongly affects which GPU and profile size are the best fit. Recommendations are provided for three user types, each at two quality of service (QoS) levels: Dedicated Performance and Typical Customer Deployment.

User Types:

Light Users: Low graphics requirements and smaller model sizes.

Medium Users: Moderate graphics requirements and medium model sizes.

Heavy Users: High graphics requirements and large data sets.

Recommendations for each user type at each QoS level, along with server configurations, are provided below. These are guidelines only. The most successful deployments start with a proof of concept (POC) and are tuned continuously.

If performance is the primary concern, it is recommended to use the fixed share scheduler and larger profile sizes, resulting in fewer users per server. Most deployments, however, use the best effort GPU scheduler policy for better GPU utilization, supporting more users per server and improving TCO per user. Keep the scheduling policy in mind when comparing options.

Note: For a detailed explanation of vGPU scheduling policies, see vGPU Schedulers.

The table below summarizes findings for Typical Customer Deployment:

Table 9 Typical Customer Deployment

| | Light User | Medium User | Heavy User |
|---|---|---|---|
| Users per Server | 16-24 | 9-18 | 6-12 |
| User VM Config: vGPU Profile | L40S-4Q | L40S-4Q, L40S-6Q, L40S-8Q | L40S-12Q, L40S-16Q, L40S-24Q |
| User VM Config: vCPU | 4 vCPU | 8 vCPU | 12+ vCPU |
| User VM Config: RAM | 8-16 GB | 16-32 GB | 32-64 GB |
| Server Config | 1U or 2U | 1U or 2U | 1U or 2U |
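Table 9 gives per-VM sizing, so the guest resources a host must provide follow from multiplying by the planned user count. The sketch below uses a hypothetical helper, `guest_totals`, that does only this arithmetic on the Table 9 ranges. It assumes 12 vCPUs for heavy users (the table says "12+") and ignores hypervisor overhead, vCPU overcommit policy, and memory reservations, so treat the result as a starting point for a POC rather than a sizing rule.

```python
# Per-user VM sizing ranges from Table 9 (Typical Customer Deployment).
# Each entry: (min users, max users, vCPUs per VM, min RAM GB, max RAM GB).
# Heavy users are listed as "12+ vCPU"; 12 is used here as the lower bound.
TABLE_9 = {
    "light": (16, 24, 4, 8, 16),
    "medium": (9, 18, 8, 16, 32),
    "heavy": (6, 12, 12, 32, 64),
}


def guest_totals(user_type: str, users: int, ram_gb_per_vm: int) -> tuple:
    """Return (total vCPUs, total guest RAM in GB) for `users` identical VMs.

    Hypothetical helper: sums only what the guest VMs are configured with and
    ignores hypervisor overhead, CPU overcommit, and memory reservations.
    """
    lo_users, hi_users, vcpus, lo_ram, hi_ram = TABLE_9[user_type]
    if not lo_users <= users <= hi_users:
        raise ValueError(f"{users} users is outside the Table 9 range for {user_type}")
    if not lo_ram <= ram_gb_per_vm <= hi_ram:
        raise ValueError(f"{ram_gb_per_vm} GB is outside the Table 9 RAM range for {user_type}")
    return users * vcpus, users * ram_gb_per_vm


if __name__ == "__main__":
    # Upper end of the light-user range with the largest suggested VM RAM.
    print(guest_totals("light", 24, 16))  # (96, 384)
```

For example, 24 light users at 16 GB each need 96 vCPUs and 384 GB of guest RAM, which can then be compared against the host's core count and installed memory.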

Note: Information regarding the NVIDIA RTX PRO 6000 Blackwell Server Edition will be updated soon.

The NVIDIA A16 is suitable for lightweight, entry-level virtual workstation use cases. For best performance, a minimum 8 GB profile is recommended when deploying virtual workstations. Because each A16 GPU has 16 GB of frame buffer, this configuration supports up to 2 users per GPU and up to 8 users per A16 board.

The A16 provides additional flexibility with its 4 GPUs per board, each equipped with 16 GB of memory (64 GB total). This enables a VDI administrator to deploy multiple profile sizes and vGPU software licenses on a single board. For example, one GPU on the A16 could support 2B profiles for vPC users, while another GPU on the same board could support 8Q profiles for RTX vWS users. It is highly recommended to conduct a POC to determine if the A16 meets the density and performance needs of your organization. For OEM considerations, please consult vGPU Certified Servers for more information.
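Frame buffer sets the upper bound on how many vGPUs fit on each A16 GPU, which is where the 2-users-per-GPU and 8-users-per-board figures come from. The sketch below uses hypothetical helpers (`vgpus_per_gpu`, `a16_board_users`), not an NVIDIA tool, to count users on one board. It assumes each physical GPU hosts a single profile size, as in the mixed 2B/8Q example above, and it ignores compute contention entirely.

```python
A16_GPUS_PER_BOARD = 4
A16_FB_GB_PER_GPU = 16  # frame buffer per A16 GPU (64 GB per board)


def vgpus_per_gpu(profile_fb_gb: int, gpu_fb_gb: int = A16_FB_GB_PER_GPU) -> int:
    """Number of vGPUs of one profile size that fit in a GPU's frame buffer."""
    if profile_fb_gb <= 0 or gpu_fb_gb % profile_fb_gb:
        raise ValueError("profile size must evenly divide the GPU frame buffer")
    return gpu_fb_gb // profile_fb_gb


def a16_board_users(profiles_per_gpu: list) -> int:
    """Total users on one A16 board, given one profile size (in GB) per GPU.

    Assumption: each physical GPU hosts a single profile size.
    """
    if len(profiles_per_gpu) != A16_GPUS_PER_BOARD:
        raise ValueError("an A16 board has exactly 4 GPUs")
    return sum(vgpus_per_gpu(size) for size in profiles_per_gpu)
```

With 8 GB profiles on all four GPUs, `a16_board_users([8, 8, 8, 8])` returns 8. Dedicating one GPU to 2B vPC profiles and the other three to 8Q profiles, `a16_board_users([2, 8, 8, 8])`, gives 8 vPC users plus 6 RTX vWS users, or 14 in total.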

Selecting the best effort GPU scheduler policy allows the GPU compute engine to be oversubscribed, maximizing GPU usage during idle or low-utilization periods. In many customer deployments, it is unlikely that all users will render simultaneously, or as intensively as in dedicated performance testing. As a result, the best effort scheduler often allows 2-3x oversubscription of the GPU compute engine, effectively supporting 2-3 times as many users. How much additional scalability is achieved depends on users' typical daily activities, such as the number of meetings, the length of breaks, and multitasking. Test and validate the GPU scheduling policy to confirm that it meets your users' needs.
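The 2-3x figure can be reasoned about with a back-of-the-envelope model, which is an assumption here and not an NVIDIA sizing formula: if a GPU serves N users at dedicated performance when all of them render at once, and only a given percentage of users render at the same moment, then roughly N × 100 / percent users can share it. Concurrency of about 33-50% corresponds to the 2-3x range described above. Frame buffer is allocated statically per vGPU, so it stays a hard ceiling. The function name `best_effort_estimate` and its parameters are hypothetical.

```python
def best_effort_estimate(dedicated_users: int, concurrent_percent: int,
                         framebuffer_cap: int) -> int:
    """Rough users-per-GPU estimate under the best effort scheduler (hypothetical model).

    dedicated_users:    users served at dedicated performance when all render at once
    concurrent_percent: assumed share of users rendering at the same moment (1-100)
    framebuffer_cap:    how many vGPUs of the chosen profile fit in frame buffer
    """
    if not 1 <= concurrent_percent <= 100:
        raise ValueError("concurrent_percent must be between 1 and 100")
    # Compute-bound estimate: fewer simultaneous renderers leave room for more users.
    compute_bound = dedicated_users * 100 // concurrent_percent
    # Frame buffer is not oversubscribed, so it caps the result.
    return min(compute_bound, framebuffer_cap)


if __name__ == "__main__":
    # 6 users per L40S at dedicated performance; an L40S-4Q profile fits 48 // 4 = 12 times.
    print(best_effort_estimate(6, 50, 48 // 4))  # 12
```

The model deliberately ignores peaks: when more users than assumed render at once, each gets less than dedicated performance, which is exactly the QoS trade-off of the best effort scheduler. Measured concurrency from a POC should replace the assumed percentage.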

The recommended vGPU profiles for Dedicated Performance, meaning performance equivalent to a bare-metal workstation, were determined by first measuring the graphics performance of a physical workstation GPU. The benchmark scores of the physical workstation card were then matched to the scores achieved by the virtual GPU. The following table summarizes these findings:
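One way to express this matching step is "pick the smallest profile whose benchmark score meets the workstation baseline." That selection rule is an assumption about how the alignment was done, and `smallest_matching_profile` is a hypothetical helper; supply your own measured scores, since none are reproduced here.

```python
from typing import Dict, Optional, Tuple


def smallest_matching_profile(workstation_score: float,
                              vgpu_scores: Dict[str, Tuple[int, float]]) -> Optional[str]:
    """Return the smallest-frame-buffer profile whose score meets the baseline.

    vgpu_scores maps a profile name to (frame buffer in GB, measured benchmark score).
    Assumes higher scores are better. Returns None if no profile matches.
    """
    candidates = [
        (fb_gb, name)
        for name, (fb_gb, score) in vgpu_scores.items()
        if score >= workstation_score
    ]
    # min() on (frame buffer, name) tuples picks the smallest qualifying profile.
    return min(candidates)[1] if candidates else None
```

Preferring the smallest qualifying profile keeps the most frame buffer free for additional users, which is why density drops as the equivalent workstation class rises.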

Table 10 Reference Server Lab Builds for Dedicated Performance

| User Type | Equivalent Performance | Users per Server | vCPUs | vGPU Profile | vMemory | CPUs | GPUs | Memory | Storage Type | Detailed Server Specifications |
|---|---|---|---|---|---|---|---|---|---|---|
| Light | RTX A1000 | 18 | 4 | L40S-8Q | 8 GB | Intel Xeon 6746E | 3 x L40S | 512 GB | Flash-Based | 112 cores, 2 GHz (Turbo 2.7 GHz), 128-512 GB RAM, 10 GbE network (min) |
| Medium | RTX 2000 Ada | 12 | 8 | L40S-12Q | 16 GB | Intel Xeon 6731E | 3 x L40S | 512 GB | Flash-Based | 96 cores, 2.2 GHz (Turbo 3.1 GHz), 512-768+ GB RAM, 10 GbE network (min) |
| Heavy | RTX 4000 Ada | 6 | 12 | L40S-24Q | 32 GB | Intel Xeon 6740E | 3 x L40S | 512 GB | Flash-Based | 96 cores, 2.4 GHz (Turbo 3.2 GHz), 512-768+ GB RAM, 10 GbE network (min) |
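The users-per-server column in Table 10 lines up with frame buffer: a 48 GB L40S fits six L40S-8Q, four L40S-12Q, or two L40S-24Q vGPUs, and three GPUs give 18, 12, and 6 users. The sketch below reproduces that arithmetic and also totals vCPUs against the core counts in the Detailed Server Specifications column. The helper name is hypothetical, and the vCPU comparison ignores hyper-threading and hypervisor overhead.

```python
L40S_FB_GB = 48  # frame buffer per NVIDIA L40S GPU


def dedicated_users_per_server(profile_fb_gb: int, gpus: int = 3) -> int:
    """vGPUs of one Q-profile size that fit across `gpus` L40S GPUs."""
    return (L40S_FB_GB // profile_fb_gb) * gpus


# (vGPU profile frame buffer in GB, vCPUs per VM, host physical cores) from Table 10.
TABLE_10 = {
    "light": (8, 4, 112),
    "medium": (12, 8, 96),
    "heavy": (24, 12, 96),
}

if __name__ == "__main__":
    for user_type, (fb_gb, vcpus, cores) in TABLE_10.items():
        users = dedicated_users_per_server(fb_gb)
        print(f"{user_type}: {users} users, {users * vcpus} vCPUs on {cores} cores")
```

The light, medium, and heavy builds come out at 72, 96, and 72 vCPUs respectively, so in each reference build the total guest vCPUs stay at or below the physical core count.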

Note: Information regarding the NVIDIA RTX PRO 6000 Blackwell Server Edition will be updated soon.

The following example illustrates how the QoS threshold set by each GPU scheduling policy affects the number of users per server.

When using the fixed share scheduler, a specific QoS is always guaranteed. For instance, six users sharing an L40S GPU will consistently experience performance comparable to a workstation equipped with an NVIDIA RTX A1000 GPU.

In contrast, the best effort scheduler, the most commonly used option in enterprise environments, does not guarantee the same level of QoS but can support a higher user density at NVIDIA RTX A1000-class performance. However, user performance will fluctuate depending on the load from other users on the same L40S at any given time. For example, a single user on an L40S may experience performance similar to an NVIDIA RTX A2000, whereas at 3–8 users per GPU, the performance is typically comparable to a workstation with an RTX A1000 GPU.

This example assumes sufficient frame buffer capacity at all scales and is intended to demonstrate how GPU scheduling policies affect achievable user density.
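The difference between the two schedulers can be sketched as per-vGPU shares of GPU time. Under fixed share, this sketch assumes each vGPU gets 1 divided by the maximum number of vGPUs of its profile that the GPU can host, regardless of how many are running. Best effort is approximated as idle vGPUs yielding their time so that the active ones split the GPU evenly; real time-slicing is more complex. Both function names are hypothetical.

```python
L40S_FB_GB = 48  # frame buffer per NVIDIA L40S GPU


def fixed_share(profile_fb_gb: int) -> float:
    """Share of GPU time per vGPU under fixed share (assumed 1 / max vGPUs of this profile)."""
    return 1 / (L40S_FB_GB // profile_fb_gb)


def best_effort_share(active_vgpus: int) -> float:
    """Approximate long-run share per actively rendering vGPU under best effort.

    Simplification: idle vGPUs yield their time slices, so active vGPUs split evenly.
    """
    if active_vgpus < 1:
        raise ValueError("need at least one active vGPU")
    return 1 / active_vgpus
```

With L40S-8Q, `fixed_share(8)` is 1/6 whether one user or six are working, which is why fixed share delivers the same RTX A1000-class result at all times. Under best effort, a lone active user gets the whole GPU (`best_effort_share(1)` is 1.0), which is why a single user can see RTX A2000-class performance, while three active users each get about a third.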

Table 11 Impact of GPU Scheduling Policies on User Density

| | Dedicated Performance (Fixed Share Scheduler) | Typical Customer Configuration (Best Effort Scheduler) |
|---|---|---|
| Users/Server (host with 3x NVIDIA L40S) | 18 (6 users per GPU with the performance of an RTX A1000 at all times) | 16-24 (3-8 users per GPU with the performance of an RTX A1000 to RTX A2000) |

The NVIDIA-specific and third-party industry tools mentioned in this guide were used to capture VM and server-level metrics to validate optimal performance and scalability based on benchmark data. It is highly recommended that you conduct a proof of concept (POC) for each deployment type. This will allow you to validate performance using objective measurements and gather subjective feedback from your end-users to ensure the deployment meets their needs effectively.