Example VDI Deployment Configurations

Application-specific sizing uses benchmark results and typical user configurations to answer three common questions:

  • Which NVIDIA Data Center GPU should I use for my business needs?

  • How do I select the correct profile(s) for my user types?

  • How many users can be supported per server (user density)?

User behavior varies widely and strongly influences the best choice of GPU and profile size. Recommendations are provided for three user types, each at two quality of service (QoS) levels: Dedicated Performance and Typical Customer Deployment.

User Types:

  • Light Users: Low graphics requirements and smaller model sizes.

  • Medium Users: Moderate graphics requirements and medium model sizes.

  • Heavy Users: High graphics requirements and large data sets.

Recommendations and server configurations are provided for each user type at each QoS level. These are guidelines: the most successful deployments start with a proof of concept (POC) and are tuned continuously.

If performance is the primary concern, it is recommended to use the fixed share scheduler and larger profile sizes, resulting in fewer users per server. Most deployments, however, use the best effort GPU scheduler policy for better GPU utilization, supporting more users per server and improving TCO per user. Keep the scheduling policy in mind when comparing options.

Note

For a detailed explanation of vGPU scheduling policies, see vGPU Schedulers.
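As a hedged sketch of how the scheduling policy is selected in practice: on a Linux KVM hypervisor, the policy is set through the RmPVMRL registry key of the NVIDIA vGPU manager (0x00 best effort, the default; 0x01 equal share; 0x11 fixed share). The exact file path and procedure vary by hypervisor, so treat this as illustrative and confirm against the vGPU documentation linked above.

```shell
# Illustrative config fragment (Linux KVM hypervisor): select the vGPU
# scheduling policy via the RmPVMRL registry key.
#   0x00 = best effort (default), 0x01 = equal share, 0x11 = fixed share
echo 'options nvidia NVreg_RegistryDwords="RmPVMRL=0x11"' | \
    sudo tee /etc/modprobe.d/nvidia-vgpu-sched.conf

# Reload the NVIDIA driver (or reboot), then confirm the active policy:
dmesg | grep -i RmPVMRL
```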

The table below summarizes findings for Typical Customer Deployment:

Table 7 Typical Customer Deployment

Light User: 6-8 Users Per GPU

Typical User VM Config:

  • RTX PRO 4500 Blackwell DC-4Q

  • 4 vCPUs

  • 8-12 GB RAM

Medium User: 3-6 Users Per GPU

Typical User VM Config:

  • RTX PRO 4500 Blackwell DC-4Q or DC-8Q, or RTX PRO 6000 Blackwell DC-4Q or DC-8Q

  • 8 vCPUs

  • 16-24 GB RAM

Heavy User: 2-4 Users Per GPU

Typical User VM Config:

  • RTX PRO 6000 Blackwell DC-12Q, DC-16Q, or DC-24Q

  • 12+ vCPUs

  • 48-72 GB RAM
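As a rough framework for translating per-GPU densities like those above into per-server counts, the sketch below models how Q-series vGPU profiles partition a GPU's framebuffer. The GPU framebuffer size and server configuration in the example are hypothetical, not validated figures; size real deployments through a POC.

```python
# Hedged sketch: estimate vGPU user density from framebuffer partitioning.
# All GPU and server figures below are illustrative assumptions, not
# NVIDIA-validated numbers.

def users_per_server(gpus_per_server: int, fb_per_gpu_gb: int,
                     profile_gb: int) -> int:
    """vGPU Q-profiles carve a GPU's framebuffer into fixed slices
    (a -4Q profile gets 4 GB), so density per GPU is fb // profile."""
    return gpus_per_server * (fb_per_gpu_gb // profile_gb)

# Hypothetical server with two 24 GB GPUs serving 4Q light users:
print(users_per_server(gpus_per_server=2, fb_per_gpu_gb=24, profile_gb=4))
# -> 12
```

Note that framebuffer is always hard-partitioned per profile; only the compute engine is shared, which is why the scheduling policy (not this arithmetic) governs per-user performance.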

The NVIDIA RTX PRO 4500 Blackwell Server Edition is designed for entry-level vWS use cases and supports mixed workloads. It allows VDI administrators to deploy multiple vGPU profile sizes and software license types on a single GPU. For example, part of the GPU can be allocated to 3B profiles for vPC users, while another portion can simultaneously run 8Q profiles for RTX vWS workloads.

Because density and performance requirements vary across environments, running a proof of concept (POC) is strongly recommended to validate that the RTX PRO 4500 Blackwell Server Edition meets your organization’s needs. For OEM guidance and supported platforms, refer to the list of vGPU Certified Servers.

With the best effort GPU scheduler policy, the GPU compute engine can be oversubscribed, reclaiming capacity during idle or low-utilization periods. In most customer deployments, it is unlikely that all users render simultaneously, or to the extent replicated in dedicated performance testing. Selecting the best effort scheduler therefore often allows a 2-3x oversubscription of the GPU compute engine, effectively supporting 2-3 times the number of users. The achievable scalability depends on users' typical daily activity: the number of meetings, the length of breaks, multitasking, and so on. Test and validate the GPU scheduling policy against your users' actual needs.
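The 2-3x figure follows from how often users actually render concurrently. A toy model of that arithmetic is sketched below; the activity fraction is an assumption for illustration, not a measured value, and real concurrency should be observed during a POC.

```python
# Hedged sketch: with the best effort scheduler, users who are idle or in
# meetings free the compute engine for active users. If only a fraction of
# users render at any instant, dedicated-performance capacity stretches
# roughly by the inverse of that fraction.

def effective_users(dedicated_capacity_users: int,
                    active_fraction: float) -> int:
    """Users supportable when only `active_fraction` render concurrently."""
    return round(dedicated_capacity_users / active_fraction)

# Assumed example: 4 dedicated-performance users, ~40% concurrent activity:
print(effective_users(4, 0.4))
# -> 10
```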

The chosen GPU scheduling policy determines the Quality of Service (QoS) each user receives and, in turn, the number of users a server can support.

When using the fixed share scheduler, a specific QoS is always guaranteed. For instance, six users sharing an L40S GPU will consistently experience performance comparable to a workstation equipped with an NVIDIA RTX A1000 GPU.

In contrast, the best effort scheduler, the most commonly used option in enterprise environments, does not guarantee the same level of QoS but can support a higher user density at NVIDIA RTX A1000-class performance. However, user performance will fluctuate depending on the load from other users on the same L40S at any given time. For example, a single user on an L40S may experience performance similar to an NVIDIA RTX A2000, whereas at 3–8 users per GPU, the performance is typically comparable to a workstation with an RTX A1000 GPU.

Note

The NVIDIA-specific and third-party industry tools mentioned in this guide were used to capture VM and server-level metrics to validate optimal performance and scalability based on benchmark data. It is highly recommended that you conduct a proof of concept (POC) for each deployment type. This will allow you to validate performance using objective measurements and gather subjective feedback from your end-users to ensure the deployment meets their needs effectively.