Example VDI Deployment Configurations

Application-specific sizing uses benchmark results and typical user configurations to answer three common questions:

  • Which NVIDIA Data Center GPU should I use for my business needs?

  • How do I select the correct profile(s) for my user types?

  • How many users can be supported per server (user density)?

User behavior varies widely and strongly influences the best choice of GPU and profile size. Recommendations are provided for three user types, each at two quality of service (QoS) levels: Dedicated Performance and Typical Customer Deployment.

User Types:

  • Light Users: Low graphics requirements and smaller model sizes.

  • Medium Users: Moderate graphics requirements and medium model sizes.

  • Heavy Users: High graphics requirements and large data sets.

Recommendations and server configurations are provided for each user type at each QoS level. These are guidelines: the most successful deployments start with a proof of concept (POC) and are tuned continuously.

If performance is the primary concern, it is recommended to use the fixed share scheduler and larger profile sizes, resulting in fewer users per server. Most deployments, however, use the best effort GPU scheduler policy for better GPU utilization, supporting more users per server and improving TCO per user. Keep the scheduling policy in mind when comparing options.

Note

For a detailed explanation of vGPU scheduling policies, see vGPU Schedulers.
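As a hedged sketch of how the scheduling policy is selected in practice: on a Linux KVM hypervisor, the policy is set through the RmPVMRL registry key of the NVIDIA vGPU manager (0x00 best effort, the default; 0x01 equal share; 0x11 fixed share). The exact file path and procedure vary by hypervisor, so treat this as illustrative and confirm against the vGPU documentation linked above.

```shell
# Illustrative config fragment (Linux KVM hypervisor): select the vGPU
# scheduling policy via the RmPVMRL registry key.
#   0x00 = best effort (default), 0x01 = equal share, 0x11 = fixed share
echo 'options nvidia NVreg_RegistryDwords="RmPVMRL=0x11"' | \
    sudo tee /etc/modprobe.d/nvidia-vgpu-sched.conf

# Reload the NVIDIA driver (or reboot), then confirm the active policy:
dmesg | grep -i RmPVMRL
```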

The table below summarizes findings for Typical Customer Deployment:

Table 7 Typical Customer Deployment

Light User: 6-8 Users Per GPU

Typical User VM Config:

  • RTX PRO 4500 Blackwell DC-4Q

  • 4 vCPUs

  • 8-12 GB RAM

Medium User: 3-6 Users Per GPU

Typical User VM Config:

  • RTX PRO 4500 Blackwell DC-4Q or DC-8Q, or RTX PRO 6000 Blackwell DC-4Q or DC-8Q

  • 8 vCPUs

  • 16-24 GB RAM

Heavy User: 2-4 Users Per GPU

Typical User VM Config:

  • RTX PRO 6000 Blackwell DC-12Q, DC-16Q, or DC-24Q

  • 12+ vCPUs

  • 48-72 GB RAM
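As a rough framework for translating per-GPU densities like those above into per-server counts, the sketch below models how Q-series vGPU profiles partition a GPU's framebuffer. The GPU framebuffer size and server configuration in the example are hypothetical, not validated figures; size real deployments through a POC.

```python
# Hedged sketch: estimate vGPU user density from framebuffer partitioning.
# All GPU and server figures below are illustrative assumptions, not
# NVIDIA-validated numbers.

def users_per_server(gpus_per_server: int, fb_per_gpu_gb: int,
                     profile_gb: int) -> int:
    """vGPU Q-profiles carve a GPU's framebuffer into fixed slices
    (a -4Q profile gets 4 GB), so density per GPU is fb // profile."""
    return gpus_per_server * (fb_per_gpu_gb // profile_gb)

# Hypothetical server with two 24 GB GPUs serving 4Q light users:
print(users_per_server(gpus_per_server=2, fb_per_gpu_gb=24, profile_gb=4))
# -> 12
```

Note that framebuffer is always hard-partitioned per profile; only the compute engine is shared, which is why the scheduling policy (not this arithmetic) governs per-user performance.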

The NVIDIA RTX PRO 4500 Blackwell Server Edition is designed for entry-level vWS use cases and supports mixed workloads. It allows VDI administrators to deploy multiple vGPU profile sizes and software license types on a single GPU. For example, part of the GPU can be allocated to 3B profiles for vPC users, while another portion can simultaneously run 8Q profiles for RTX vWS workloads.

Because density and performance requirements vary across environments, running a proof of concept (POC) is strongly recommended to validate that the RTX PRO 4500 Blackwell Server Edition meets your organization’s needs. For OEM guidance and supported platforms, refer to the list of vGPU Certified Servers.

With the best effort GPU scheduler policy, the GPU compute engine can be oversubscribed, reclaiming capacity during idle or low-utilization periods. In most customer deployments, it is unlikely that all users render simultaneously, or to the extent replicated in dedicated performance testing. Selecting the best effort scheduler therefore often allows a 2-3x oversubscription of the GPU compute engine, effectively supporting 2-3 times the number of users. The achievable scalability depends on users' typical daily activity: the number of meetings, the length of breaks, multitasking, and so on. Test and validate the GPU scheduling policy against your users' actual needs.
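The 2-3x figure follows from how often users actually render concurrently. A toy model of that arithmetic is sketched below; the activity fraction is an assumption for illustration, not a measured value, and real concurrency should be observed during a POC.

```python
# Hedged sketch: with the best effort scheduler, users who are idle or in
# meetings free the compute engine for active users. If only a fraction of
# users render at any instant, dedicated-performance capacity stretches
# roughly by the inverse of that fraction.

def effective_users(dedicated_capacity_users: int,
                    active_fraction: float) -> int:
    """Users supportable when only `active_fraction` render concurrently."""
    return round(dedicated_capacity_users / active_fraction)

# Assumed example: 4 dedicated-performance users, ~40% concurrent activity:
print(effective_users(4, 0.4))
# -> 10
```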

The chosen GPU scheduling policy determines the Quality of Service (QoS) each user receives and, in turn, the number of users a server can support.

When using the fixed share scheduler, a specific QoS is always guaranteed. For instance, six users sharing an L40S GPU will consistently experience performance comparable to a workstation equipped with an NVIDIA RTX A1000 GPU.

In contrast, the best effort scheduler, the most commonly used option in enterprise environments, does not guarantee the same level of QoS but can support a higher user density at NVIDIA RTX A1000-class performance. However, user performance will fluctuate depending on the load from other users on the same L40S at any given time. For example, a single user on an L40S may experience performance similar to an NVIDIA RTX A2000, whereas at 3–8 users per GPU, the performance is typically comparable to a workstation with an RTX A1000 GPU.

Note

The NVIDIA-specific and third-party industry tools mentioned in this guide were used to capture VM and server-level metrics to validate optimal performance and scalability based on benchmark data. It is highly recommended that you conduct a proof of concept (POC) for each deployment type. This will allow you to validate performance using objective measurements and gather subjective feedback from your end-users to ensure the deployment meets their needs effectively.