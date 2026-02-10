Deployment Best Practices#

Understand Your Environment#

IT infrastructure is highly complex, involving multiple server types with varying CPU, memory, storage, and networking resources. Deployments often involve a geographically dispersed user base, multiple data centers, and a blend of cloud-based compute and storage resources. It is crucial to define the scope of your deployment around these variables and to conduct a proof of concept (POC) for each deployment type. Other factors to consider include the NVIDIA vGPU certified OEM server you have selected, the NVIDIA GPUs supported on that platform, and any power and cooling constraints in your data center. For installation and server configuration steps, refer to the NVIDIA vGPU deployment guides.

Run a Proof of Concept#

The most successful deployments balance user density (scalability) with quality user experience. This balance is achieved by using NVIDIA RTX vWS virtual machines in production while gathering objective measurements and subjective feedback from end users.

Table 12 Metrics for Balancing User Density and User Experience#

Objective Measurements             Subjective Feedback
Loading time of application        Overall user experience
Loading time of dataset            Application performance
Utilization (CPU, GPU, network)    Zooming and panning experience
Frames Per Second (FPS)            Video streaming
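The objective measurements in Table 12 are easier to compare across POC runs when each series is reduced to a few summary statistics. The following is a minimal sketch, not an NVIDIA tool: the function name `summarize` and the sample values are hypothetical, and it assumes you have already collected numeric samples (for example, GPU utilization percentages or FPS readings) by whatever means your environment provides.

```python
from statistics import mean, quantiles


def summarize(samples):
    """Return the mean, 95th percentile, and maximum of a list of numeric samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a percentile")
    # quantiles(n=20) yields 19 cut points; the last one is the 95th percentile.
    # method="inclusive" keeps the result within the observed min/max range.
    p95 = quantiles(samples, n=20, method="inclusive")[-1]
    return {"mean": mean(samples), "p95": p95, "max": max(samples)}


# Hypothetical GPU utilization samples (percent), invented for illustration only.
gpu_utilization = list(range(0, 101, 5))
summary = summarize(gpu_utilization)
print(summary)
```

Reporting a high percentile alongside the mean helps separate sustained load from short spikes, which is useful when pairing these numbers with the subjective feedback collected from users.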

Understand Your Users & Applications#

Another benefit of performing a POC prior to deployment is that it enables more accurate categorization of user behavior and GPU requirements for each virtual application. Customers often segment their end users into user types for each application and bundle similar user types on a host. Light users can be supported on a smaller vGPU profile size, while heavy users require more GPU resources and a larger profile size, such as those available on the RTX PRO 6000 Blackwell Server Edition. Note that while the NVIDIA A16 board has a total framebuffer size of 64GB, each of its four GPUs has 16GB, so the largest profile size supported on an A16 is 16Q. By contrast, the L40S has a single GPU per board and supports up to a 48Q profile size. Work with your application ISV and NVIDIA representative to determine the correct licenses and NVIDIA GPUs for your deployment needs.
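The A16 versus L40S comparison comes down to simple arithmetic: the per-GPU framebuffer, not the board total, caps the largest profile. The sketch below illustrates this under stated assumptions. The `Board` class and its methods are hypothetical names, the board figures are the ones quoted above, and the count of vGPUs per board assumes a single profile size that divides the per-GPU framebuffer evenly. Actual supported profiles and per-GPU limits should be taken from the NVIDIA Virtual GPU Software Documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Board:
    """Hypothetical model of a GPU board for profile-size arithmetic."""
    name: str
    gpus_per_board: int
    framebuffer_per_gpu_gb: int

    @property
    def total_framebuffer_gb(self) -> int:
        return self.gpus_per_board * self.framebuffer_per_gpu_gb

    def largest_q_profile(self) -> str:
        # A vGPU is carved from one physical GPU, so the per-GPU framebuffer
        # (not the board total) bounds the largest profile size.
        return f"{self.framebuffer_per_gpu_gb}Q"

    def max_vgpus(self, profile_gb: int) -> int:
        """Upper bound on vGPUs per board for one profile size (assumption:
        framebuffer is the only limit and all vGPUs use the same profile)."""
        if profile_gb <= 0:
            raise ValueError("profile size must be positive")
        if profile_gb > self.framebuffer_per_gpu_gb:
            return 0  # profile does not fit on a single GPU
        return self.gpus_per_board * (self.framebuffer_per_gpu_gb // profile_gb)


# Figures as stated in the text above.
A16 = Board("A16", gpus_per_board=4, framebuffer_per_gpu_gb=16)
L40S = Board("L40S", gpus_per_board=1, framebuffer_per_gpu_gb=48)

for board in (A16, L40S):
    print(board.name, board.total_framebuffer_gb, board.largest_q_profile())
```

Under these assumptions, a 48GB profile does not fit on an A16 even though the board carries 64GB in total, which is why heavy users are matched to boards with a larger per-GPU framebuffer.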

Use Benchmark Testing#

Benchmark tools like SPECviewperf are valuable for sizing deployments but have limitations. These benchmarks simulate peak workloads, representing periods of highest GPU demand across all virtual machines. They do not account for times when the system is underutilized, nor for hypervisor features like best-effort scheduling, which can enhance user density while maintaining consistent performance. The graph below illustrates that user workflows are often interactive, characterized by frequent short idle periods when users require fewer hypervisor and NVIDIA vGPU resources. The extent to which scalability is increased depends on typical user activities such as meetings, breaks, multitasking, and other factors.

Figure 10 Comparison of benchmarking versus typical end user#

Note

For accurate benchmarking, it is recommended to disable the Frame Rate Limiter (FRL). For detailed instructions on how to disable the FRL, refer to the release notes for your chosen hypervisor in the NVIDIA Virtual GPU Software Documentation.
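The gap between a benchmark and an interactive workflow can be made concrete by looking at how much of a utilization trace is idle. The following is a rough sketch only: the function names, the 10 percent idle threshold, and both traces are assumptions made for illustration, and the resulting numbers describe a trace, not a prediction of achievable user density.

```python
def idle_fraction(utilization, idle_threshold=10.0):
    """Fraction of samples at or below idle_threshold (percent utilization)."""
    if not utilization:
        raise ValueError("utilization trace is empty")
    idle = sum(1 for u in utilization if u <= idle_threshold)
    return idle / len(utilization)


def mean_to_peak_ratio(utilization):
    """Average utilization divided by peak utilization (0.0 for an all-zero trace)."""
    if not utilization:
        raise ValueError("utilization trace is empty")
    peak = max(utilization)
    if peak == 0:
        return 0.0
    return (sum(utilization) / len(utilization)) / peak


# Invented traces: a benchmark holds peak load, an interactive user does not.
benchmark_like = [100] * 10
interactive = [100, 0, 0, 100, 0, 0, 0, 100, 0, 0]

for label, trace in (("benchmark", benchmark_like), ("interactive", interactive)):
    print(label, idle_fraction(trace), mean_to_peak_ratio(trace))
```

A trace with a high idle fraction and a low mean-to-peak ratio is the kind of workload where a benchmark-based sizing is likely to be conservative, which is worth confirming in a POC with real users.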

Understanding the GPU Scheduler#

NVIDIA RTX vWS provides three GPU scheduling options (best effort, equal share, and fixed share) to accommodate customers' varying quality of service (QoS) requirements. Additional information regarding GPU scheduling can be found in the NVIDIA Virtual GPU Software Documentation.