GPU Performance and Sharing Characteristics#

GPU Performance Benchmark Tests#

The GPU performance benchmark tests measure the performance of GPUs running virtualized workloads on NVIDIA GPU virtualization software. To measure the performance of a GPU running a specific virtualized workload, a representative benchmark test for that workload is run on the GPU.

In many cases, cost rather than raw performance is the principal factor in selecting the right virtual GPU solution for a specific workload.

Unless otherwise stated, the tests are run with vGPU profiles that are allocated all of the physical GPU’s memory. This vGPU profile size was chosen because the impact of scaling does not vary across GPUs [1].

Table 3 summarizes the benchmark results and identifies the GPU that provides the best raw performance.

Note

When choosing GPUs based on raw performance, use these results for general guidance only. All results are based on the workloads listed in Table 3, which could differ from the applications being used in production.

Table 3 GPU Performance Benchmark Tests and Results#

Workload              | Benchmark                   | Best Raw Performance GPU
Professional graphics | SPECviewperf 15 (3840x2160) | NVIDIA RTX PRO 6000 Blackwell Server Edition

GPU performance for professional graphics workloads was measured by using the SPECviewperf 15 (3840x2160) benchmark test. SPECviewperf 15 is a widely recognized benchmark for measuring the 3D graphics performance of systems using real-world professional workloads. It evaluates performance under the OpenGL, DirectX, and Vulkan graphics APIs using application-derived “viewsets,” providing a standardized comparison of graphics hardware in professional visualization scenarios. For time-sliced vGPU configurations in these tests, the best effort scheduler was used.
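As a rough illustration of how an overall result can be aggregated from per-viewset composites, the following Python sketch computes a geometric mean over a set of scores. The viewset names and values are hypothetical placeholders, not measured SPECviewperf 15 results.

    import math

    # Hypothetical per-viewset composite scores; placeholders only,
    # not measured SPECviewperf 15 results.
    viewset_scores = {
        "catia": 120.0,
        "creo": 95.0,
        "maya": 210.0,
        "snx": 310.0,
        "solidworks": 150.0,
    }

    def geometric_mean(values):
        """Geometric mean, a common way to aggregate benchmark composites."""
        values = list(values)
        return math.exp(sum(math.log(v) for v in values) / len(values))

    overall = geometric_mean(viewset_scores.values())
    print(f"Overall composite (geometric mean): {overall:.1f}")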

Test Results#

Performance Comparison#

For professional graphics workloads, the principal factor in determining cost effectiveness is performance per dollar. Figure 2 compares the performance of all GPUs recommended for NVIDIA RTX vWS and shows that the NVIDIA RTX PRO 6000 Blackwell Server Edition provides the highest raw performance.
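Performance per dollar is a simple ratio of benchmark score to price. The sketch below ranks GPUs by that ratio; the GPU names, scores, and prices are hypothetical placeholders used only to show the calculation.

    # Hypothetical scores and prices; placeholders for illustration only.
    gpus = {
        "GPU A": {"score": 100.0, "price_usd": 3000.0},
        "GPU B": {"score": 180.0, "price_usd": 8000.0},
    }

    # Performance per dollar = benchmark composite score / purchase price.
    ranked = sorted(
        gpus.items(),
        key=lambda item: item[1]["score"] / item[1]["price_usd"],
        reverse=True,
    )

    for name, info in ranked:
        print(f"{name}: {info['score'] / info['price_usd']:.4f} score per USD")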

For organizations currently using NVIDIA Ampere (A-series) or Ada Lovelace (L-series) architectures, moving to the NVIDIA RTX PRO 4500 Blackwell Server Edition represents a substantial performance improvement rather than an incremental upgrade for today’s AI-augmented applications. Specifically, based on SPECviewperf 15 results, the NVIDIA RTX PRO 4500 Blackwell Server Edition delivers 90% higher performance than the NVIDIA L4 and 80% higher performance than the NVIDIA A10.


Figure 2 RTX vWS SPECviewperf15 Performance#

MIG-Backed vGPU Performance#

Based on SPECviewperf 15 results, Figure 3 shows that, at the 48 GB profile size, MIG-backed vGPU delivered 20% higher performance than time-sliced vGPU. This improvement is due to MIG’s hardware partitioning of GPU resources, which provides stronger resource isolation, more predictable performance, improved resource utilization, and better quality of service (QoS) than time-sliced vGPU.


Figure 3 RTX vWS SPECviewperf15 MIG-Backed vGPU vs Time-Sliced vGPU#

Impact of GPU Sharing#

NVIDIA vGPU software enables multiple virtual machines (VMs) to share a single physical GPU. This improves overall GPU utilization, but the way resources are shared depends on the underlying virtualization technology: either time-sliced vGPU or MIG-backed vGPU.

Time-Sliced vGPU Sharing#

With time-sliced vGPU, multiple VMs share GPU access over time. NVIDIA vGPU software uses the best effort scheduler by default, which aims to balance performance across vGPUs.

Scheduling Options for GPU Sharing#

To accommodate a variety of Quality of Service (QoS) levels for sharing a GPU, NVIDIA vGPU software provides multiple GPU scheduling options. For more information about these GPU scheduling options, refer to vGPU Schedulers.
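As a sketch of how a scheduling policy might be applied on a Linux hypervisor host, the snippet below generates a modprobe configuration that sets the RmPVMRL registry key for the NVIDIA kernel module. The values shown (0x00 best effort, 0x01 equal share, 0x11 fixed share) are the commonly documented defaults, but the exact key, values, and reload procedure for your hypervisor and vGPU software release should be confirmed against the vGPU Schedulers documentation.

    from pathlib import Path

    # Commonly documented RmPVMRL values for the vGPU scheduler; confirm them
    # against the vGPU Schedulers documentation for your release before use.
    SCHEDULER_POLICIES = {
        "best_effort": "0x00",
        "equal_share": "0x01",
        "fixed_share": "0x11",
    }

    def write_scheduler_config(policy: str, path: str) -> str:
        """Write a modprobe option line that selects the vGPU scheduling policy.

        The NVIDIA kernel module must be reloaded (or the host rebooted) for
        the change to take effect; this sketch only writes the config file.
        """
        line = f'options nvidia NVreg_RegistryDwords="RmPVMRL={SCHEDULER_POLICIES[policy]}"\n'
        Path(path).write_text(line)
        return line

    # Example: select the equal share scheduler (written to a local file here;
    # on a real host this would typically be /etc/modprobe.d/nvidia-vgpu.conf).
    print(write_scheduler_config("equal_share", "nvidia-vgpu.conf"), end="")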

MIG-Backed vGPU Sharing#

With MIG (Multi-Instance GPU), a single physical GPU is partitioned at the hardware level into multiple fully isolated GPU instances. This provides guaranteed performance isolation between VMs.

Performance Allocation#

Unlike standard time-sliced vGPU, MIG partitions a physical GPU into hardware-isolated GPU instances. Each MIG instance is assigned a dedicated slice of GPU resources with its own Streaming Multiprocessors (SMs) and memory subsystem, which provides stronger performance isolation than standard time-sliced vGPU.
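To make this partitioning concrete, the following sketch uses the NVML Python bindings (the nvidia-ml-py package) to check whether MIG mode is enabled on the first GPU and to list each MIG device with its dedicated frame buffer. It is a minimal, read-only sketch that assumes a MIG-capable GPU; it inspects existing instances rather than creating them.

    import pynvml

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first physical GPU
        current_mode, _pending_mode = pynvml.nvmlDeviceGetMigMode(handle)

        if current_mode == pynvml.NVML_DEVICE_MIG_ENABLE:
            max_count = pynvml.nvmlDeviceGetMaxMigDeviceCount(handle)
            for i in range(max_count):
                try:
                    mig = pynvml.nvmlDeviceGetMigDeviceHandleByIndex(handle, i)
                except pynvml.NVMLError:
                    continue  # this slot has no MIG device
                mem = pynvml.nvmlDeviceGetMemoryInfo(mig)
                print(f"MIG device {i}: {mem.total / 1024**3:.1f} GiB dedicated frame buffer")
        else:
            print("MIG mode is not enabled on this GPU")
    finally:
        pynvml.nvmlShutdown()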

Multiple vGPUs can also be created within a single MIG slice. This capability is referred to as MIG-backed time-sliced vGPU. In this case, the vGPUs remain bound by the hardware-isolated resources of the MIG slice to which they are assigned. For example:

  • When MIG instances are created, each instance delivers consistent and isolated performance to its assigned VM.

  • Within each MIG slice, multiple vGPUs can be created and time-sliced. These vGPUs can be assigned to separate VMs, which continue to benefit from MIG’s hardware-level isolation boundaries.

Effect of GPU Sharing on Overall Throughput#

To measure the effect of GPU sharing on overall throughput, the SPECviewperf 15 benchmark test was run against a GPU that was allocated to a single VM and then shared among two and four VMs. For the time-sliced vGPU configurations in these tests, the best effort scheduler was used.

  • With two virtual machines, throughput is increased by 34%.

  • With four virtual machines, throughput is increased by 52%.


Figure 4 Effect of GPU Sharing on Overall Throughput#

This increase in throughput is typical of multi-VM testing scenarios. When scaling from a single VM to multiple VMs, the combined throughput of the VMs should exceed the geometric mean throughput of the single VM. As additional CPU resources are allocated, throughput improves, peaking at a certain point before stabilizing at around 1x throughput. For multi-VM deployments that require more consistent performance and stronger isolation between workloads, MIG-backed vGPU is the recommended approach.
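The aggregate figures reported above also imply the average share of throughput seen by each VM, as shown in the short sketch below. The per-VM numbers are simply derived from the stated aggregate increases (1.34x for two VMs, 1.52x for four VMs) rather than measured separately.

    # Aggregate throughput relative to a single VM (1.00x), from the results above.
    aggregate_throughput = {1: 1.00, 2: 1.34, 4: 1.52}

    for vms, aggregate in aggregate_throughput.items():
        per_vm = aggregate / vms  # average throughput share per VM
        print(f"{vms} VM(s): aggregate {aggregate:.2f}x, ~{per_vm:.2f}x per VM")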

The server configuration for measuring the effect of GPU sharing on overall throughput is listed in Table 10.