Selecting the Right NVIDIA GPU for Virtualization
The GPU that best meets the needs of your workloads depends on how you prioritize factors such as raw performance, time-to-solution, performance per dollar, performance per watt, form factor, and any power or cooling constraints.
Table 2 summarizes the features of the NVIDIA GPUs for virtualization workloads based on the NVIDIA Blackwell, Ada Lovelace, and Ampere GPU architectures.
GPUs for graphics workloads based on the NVIDIA Blackwell, Ada Lovelace, and Ampere GPU architectures feature fourth-, third-, and second-generation RT Cores, respectively. RT Cores are accelerator units dedicated to performing ray tracing operations with extraordinary efficiency.
The GPUs in Table 2 are tested and supported with NVIDIA software for virtualizing GPUs, specifically NVIDIA virtual GPU software. For the full product support matrices, refer to Virtual GPU Software Supported Products.
Table 2. NVIDIA GPUs for Virtualization Workloads
| Specification | RTX PRO 6000 Blackwell Server Edition | L40S | L40 | L4 | A40 | A10 | A16 |
|---|---|---|---|---|---|---|---|
| GPUs/Board | 1 | 1 | 1 | 1 | 1 | 1 | 4 |
| Architecture | Blackwell | Ada Lovelace | Ada Lovelace | Ada Lovelace | Ampere | Ampere | Ampere |
| RTX Technology | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| Memory Size and Type | 96GB GDDR7 | 48GB GDDR6 | 48GB GDDR6 | 24GB GDDR6 | 48GB GDDR6 | 24GB GDDR6 | 64GB (16GB per GPU) GDDR6 |
| vGPU Profile Sizes (GB) | 2, 3, 4, 6, 8, 12, 16, 24, 32, 48, 96 | 1, 2, 3, 4, 6, 8, 12, 16, 24, 48 | 1, 2, 3, 4, 6, 8, 12, 16, 24, 48 | 1, 2, 3, 4, 6, 8, 12, 24 | 1, 2, 3, 4, 6, 8, 12, 16, 24, 48 | 1, 2, 3, 4, 6, 8, 12, 24 | 1, 2, 3 (since vGPU 19.1), 4, 8, 16 |
| MIG Support | Yes | No | No | No | No | No | No |
| NVLink Support | No | No | No | No | Yes | No | No |
| Form Factor | Dual-slot FHFL | Dual-slot FHFL | Dual-slot FHFL | Single-slot low-profile | Dual-slot FHFL | Single-slot FHFL | Dual-slot FHFL |
| Power (W) | 600 | 350 | 300 | 72 | 300 | 150 | 250 |
| Cooling | Passive | Passive | Passive | Passive | Passive | Passive | Passive |
| Optimized For¹ | Performance and Density | Performance | Performance | Performance | Performance | Performance | Density and Cost per User |
| Target Workloads | High-end 3D visualization applications, AI training and inference workloads, as well as vPC (VDI) deployments that benefit from high user density and excellent graphics performance | Deep learning and machine learning training and inference, video transcoding, AI audio and video effects, rendering, data analytics, virtual workstations, and virtual desktops | High-end virtual workstations or mixed virtual workstations and compute (AI inference, data science) | VDI, mid-level to high-end virtual workstations and compute (AI inference, video) | High-end virtual workstations or mixed virtual workstations and compute (AI, data science) | Entry-level to mid-level virtual workstations | Knowledge worker virtual desktops |
Supported vGPU deployments require a certified server platform. Customers should refer to the NVIDIA Qualified Systems Catalog to verify that their host system is certified.
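Before committing to a profile plan, it can be useful to confirm what each host actually reports. The short Python sketch below (an illustrative helper, not part of NVIDIA's tooling) assumes the nvidia-smi utility is available on the host and lists each GPU's name, frame buffer, and power limit so they can be checked against Table 2:

```python
import csv
import io
import subprocess

# Query the installed GPUs; assumes nvidia-smi is on the host's PATH.
# Fields: product name, total frame buffer (MiB), and board power limit (W).
result = subprocess.run(
    ["nvidia-smi", "--query-gpu=name,memory.total,power.limit",
     "--format=csv,noheader,nounits"],
    capture_output=True, text=True, check=True,
)

for name, memory_mib, power_w in csv.reader(io.StringIO(result.stdout)):
    print(f"{name.strip()}: {int(memory_mib)} MiB frame buffer, "
          f"{float(power_w):.0f} W power limit")
```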
NVIDIA RTX PRO 6000 Blackwell Server Edition
The NVIDIA RTX PRO 6000 Blackwell Server Edition is designed to deliver top-tier AI and graphics performance for enterprise data centers. It features 96 GB of high-speed GDDR7 ECC memory, 24,064 CUDA cores, 752 fifth-generation Tensor Cores, and 188 fourth-generation RT Cores. This combination makes it ideal for a wide range of workloads, including AI inference, simulation, high-quality rendering, and advanced computing tasks.
With Universal MIG, the RTX PRO 6000 Blackwell Server Edition becomes the first data center GPU capable of supporting both compute and graphics workloads within MIG instances. This enables flexible, mixed-use environments that combine AI/ML compute with professional visualization or VDI workloads while maintaining MIG’s strict resource isolation and predictable performance.
The RTX PRO 6000 Blackwell Server Edition is a powerful solution for both vWS and vPC, offering scalability and user density, supporting up to 48 concurrent vGPUs per GPU. Its exceptional performance, flexibility, and efficiency make it an ideal solution for organizations consolidating professional visualization, AI, and VDI workloads.
MIG-Backed vGPU Support
The NVIDIA RTX PRO 6000 Blackwell Server Edition supports MIG-backed vGPU, enabling virtualized GPUs to be created from individual MIG slices and assigned to virtual machines. This model combines MIG's hardware-level spatial partitioning with the temporal partitioning capabilities of vGPU, offering flexibility in how GPU resources are shared across workloads. More information on MIG-backed vGPU is available in the NVIDIA virtual GPU software documentation.
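To illustrate the hardware partitioning that sits beneath MIG-backed vGPU, the sketch below shows how MIG mode can be enabled and the available GPU instance profiles listed with nvidia-smi on a host. This is only an illustration under stated assumptions: the profiles reported depend on the GPU and driver, and the MIG-backed vGPUs themselves are created and assigned through the hypervisor's vGPU manager, not through these commands.

```python
import subprocess

GPU_INDEX = "0"  # index of the physical GPU to partition; adjust for your host

def run_nvidia_smi(args):
    """Run an nvidia-smi command and print whatever it reports."""
    out = subprocess.run(["nvidia-smi", *args], capture_output=True, text=True)
    print(out.stdout or out.stderr)

# Enable MIG mode on the selected GPU (a GPU reset may be required to take effect).
run_nvidia_smi(["-i", GPU_INDEX, "-mig", "1"])

# List the GPU instance profiles the driver exposes on this GPU.
run_nvidia_smi(["mig", "-i", GPU_INDEX, "-lgip"])
```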
NVIDIA L40S
The NVIDIA® L40S is the highest-performance Ada GPU for AI inference, AI training, and compute-intensive workloads, while also delivering excellent visual computing performance. Based on the NVIDIA Ada Lovelace GPU architecture, it provides exceptional performance for both advanced visual computing and AI workloads in data center and edge deployments. Featuring 142 third-generation RT Cores and 568 fourth-generation Tensor Cores with FP8 support, it accelerates real-time ray tracing, deep learning training and inference, generative AI workloads, and simulation workflows. With 48GB of graphics memory, the L40S delivers outstanding performance across compute-intensive tasks, batch and real-time rendering, virtual workstations, and cloud gaming. When combined with NVIDIA RTX™ Virtual Workstation (vWS) software, it enables powerful, secure virtual workstations that can be accessed from any device.
NVIDIA L40
The NVIDIA® L40, built on the NVIDIA Ada Lovelace GPU architecture, delivers unprecedented visual computing performance and provides revolutionary neural graphics, rendering, and AI capabilities for the most demanding graphics-driven workloads. It features 142 third-generation RT Cores for enhanced real-time ray tracing and 568 fourth-generation Tensor Cores with FP8 support, paired with the latest CUDA Cores and 48GB of graphics memory. The L40 excels at high-performance virtual workstations, large-scale digital twins in NVIDIA Omniverse, and advanced visualization workloads, delivering up to twice the performance of the previous generation at the same power. When combined with NVIDIA RTX™ Virtual Workstation (vWS) software, the L40 supports immersive, high-fidelity virtual workstations accessible from the data center or cloud.
NVIDIA L4
The NVIDIA L4 Tensor Core GPU, based on the Ada Lovelace architecture, delivers universal acceleration and energy efficiency for video, AI, virtual workstation, and graphics applications in the enterprise, in the cloud, and at the edge. With NVIDIA's AI platform and full-stack approach, the L4 is optimized for video and inference at scale across a broad range of AI applications, delivering highly personalized experiences. As the most efficient NVIDIA accelerator for mainstream use, servers equipped with the L4 deliver up to 120X higher AI video performance than CPU-only solutions and 2.5X more generative AI performance, as well as over 4X more graphics performance than the previous GPU generation. The L4's versatile, energy-efficient, single-slot, low-profile form factor makes it ideal for edge, cloud, and enterprise deployments.
NVIDIA A40
Built on the RTX platform, the NVIDIA A40 GPU is uniquely positioned to power high-end virtual workstations running professional visualization applications, accelerating the most demanding graphics workloads. The second-generation RT Cores of the NVIDIA A40 enable it to deliver massive speedups for workloads such as photorealistic rendering of movie content, architectural design evaluations, and virtual prototyping of product designs. The NVIDIA A40 features 48 GB of frame buffer and, with the NVIDIA® NVLink® GPU interconnect, can provide up to 96 GB of frame buffer to power virtual workstations that work with very large animations, files, or models. Although the NVIDIA A40 has 48 GB of frame buffer, the per-GPU context-switching limit caps the maximum number of supported users at 32.
The NVIDIA A40 is also suitable for running VDI workloads and compute workloads on the same infrastructure. Resource utilization can be increased by using the same virtualized, GPU-accelerated server resources to run virtual desktops and workstations while users are logged on, and compute workloads after they have logged off. Learn more from the NVIDIA whitepaper Using NVIDIA Virtual GPUs to Power Mixed Workloads.
NVIDIA A16
The NVIDIA A16 is designed to provide the most cost-effective graphics performance for knowledge worker VDI workloads. For these workloads, where users access office productivity applications, web browsers, and streaming video, the most important considerations are performance per dollar and user density per server. With four GPUs on each board, the NVIDIA A16 delivers the best performance per dollar and a high number of users per board for these workloads.
NVIDIA A10
The NVIDIA A10 is designed to provide cost-effective graphics performance for accelerating and optimizing the performance of mixed workloads. When combined with NVIDIA RTX vWS software, it accelerates graphics and video processing with AI on mainstream enterprise servers. Its second-generation RT Cores make the NVIDIA A10 ideal for mainstream professional visualization applications running on high-performance mid-range virtual workstations.
For knowledge worker VDI workloads, the principal factor in determining cost effectiveness is the combination of performance per dollar and user density.
As more knowledge worker users are added to a server, the server consumes more CPU resources. Adding an NVIDIA GPU for this workload conserves CPU resources by offloading graphics rendering tasks to the GPU. As a result, user experience and performance are improved for end users.
Table 3. Maximum VDI User Density per GPU Board and per 2U Server
| GPU | Maximum Users per GPU Board | Maximum Boards per 2U Server | Maximum Users per 2U Server |
|---|---|---|---|
| RTX PRO 6000 Blackwell Server Edition (with 2 GB Profile Size) | 48² | 4 | 192³ |
| L40S (with 1 GB Profile Size) | 32 | 8 | 256 |
| L4 (with 1 GB Profile Size) | 24 | 16 | 384 |
| A40 (with 1 GB Profile Size) | 32 | 8 | 256 |
| A10 (with 1 GB Profile Size) | 24 | 16 | 384 |
| A16 (with 1 GB Profile Size) | 64 (16 x 4) | 4 | 256 |
Table 3 assumes that each user requires a vGPU profile with 1 or 2GB of frame buffer. However, to determine the profile sizes that provide the best user experience for the users in your environment, you must conduct a proof of concept (POC).
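The per-board figures in Table 3 follow from simple arithmetic: the frame buffer per GPU divided by the profile size, capped at 32 vGPUs per physical GPU by the context-switching limit, then multiplied by the number of GPUs on the board (the RTX PRO 6000 Blackwell Server Edition figure additionally depends on MIG-backed time-sliced vGPU, per footnote 2). The following is a minimal sketch of that calculation, using the board data from Tables 2 and 3:

```python
# Maximum vGPUs a single physical GPU can host, regardless of frame buffer.
MAX_VGPUS_PER_GPU = 32

# Board name -> (frame buffer per GPU in GB, GPUs per board, profile size in GB)
boards = {
    "L40S": (48, 1, 1),
    "L4":   (24, 1, 1),
    "A40":  (48, 1, 1),
    "A10":  (24, 1, 1),
    "A16":  (16, 4, 1),
}

for name, (fb_gb, gpus_per_board, profile_gb) in boards.items():
    users_per_gpu = min(fb_gb // profile_gb, MAX_VGPUS_PER_GPU)
    print(f"{name}: {users_per_gpu * gpus_per_board} users per board")
```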
Figure 3. NVIDIA vPC VDI Cost per User
Calculations in Figure 3 include the GPU price plus the cost of NVIDIA vPC software with a four-year subscription, divided by the number of users.
Information regarding the NVIDIA RTX PRO 6000 Blackwell Server Edition will be updated soon.
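The per-user figures behind Figure 3 can be reproduced for any GPU once a street price and the four-year NVIDIA vPC subscription cost are known. The sketch below is purely illustrative; the prices shown are placeholder assumptions, not NVIDIA list prices:

```python
def cost_per_user(gpu_price_usd, vpc_per_user_per_year_usd, users, years=4):
    """GPU hardware cost plus a per-user vPC subscription, amortized per user."""
    software_total = vpc_per_user_per_year_usd * years * users
    return (gpu_price_usd + software_total) / users

# Placeholder example: a hypothetical $3,000 board hosting 64 vPC users.
print(f"${cost_per_user(3000, vpc_per_user_per_year_usd=100, users=64):.2f} per user")
```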
Footnotes
1. Performance-optimized GPUs are designed to maximize raw performance for a specific class of virtualized workload. They are typically recommended for the following classes of virtualized workload:
   - High-end virtual workstations running professional visualization applications.
   - Compute-intensive workloads such as artificial intelligence, deep learning, or data science workloads.
   Density-optimized GPUs are designed to maximize the number of VDI users supported in a server. They are typically recommended for knowledge worker virtual desktop infrastructure (VDI) running office productivity applications, streaming video, and the Windows OS.
2. 48 users per board is achieved with MIG-backed time-sliced vGPU. Without MIG-backed time-sliced vGPU, the maximum is 32.
3. 192 users per 2U server is achieved with MIG-backed time-sliced vGPU. Without MIG-backed time-sliced vGPU, the maximum is 128.