Selecting the Right NVIDIA GPU for Virtualization
The GPU that best meets the needs of your workloads depends on how you prioritize factors such as raw performance, time-to-solution, performance per dollar, performance per watt, form factor, and any power or cooling constraints.
Table 2 summarizes the features of the NVIDIA GPUs for virtualization workloads based on the NVIDIA Blackwell, Ada Lovelace, and Ampere GPU architectures.
GPUs for graphics workloads based on the NVIDIA Blackwell, Ada Lovelace, and Ampere GPU architectures feature fourth-, third-, and second-generation RT Cores, respectively. RT Cores are accelerator units dedicated to performing ray tracing operations with extraordinary efficiency.
The GPUs in Table 2 are tested and supported with NVIDIA software for virtualizing GPUs, specifically NVIDIA virtual GPU software. For the full product support matrices, refer to Virtual GPU Software Supported Products.
Table 2. NVIDIA GPUs for Virtualization Workloads
| Specification | RTX PRO 6000 Blackwell Server Edition | L40S | L40 | L4 | A40 | A10 | A16 |
|---|---|---|---|---|---|---|---|
| GPUs/Board | 1 | 1 | 1 | 1 | 1 | 1 | 4 |
| Architecture | Blackwell | Ada Lovelace | Ada Lovelace | Ada Lovelace | Ampere | Ampere | Ampere |
| RTX Technology | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| Memory Size and Type | 96GB GDDR7 | 48GB GDDR6 | 48GB GDDR6 | 24GB GDDR6 | 48GB GDDR6 | 24GB GDDR6 | 64GB (16GB per GPU) GDDR6 |
| vGPU Profile Sizes (GB) | 2, 3, 4, 6, 8, 12, 16, 24, 32, 48, 96 | 1, 2, 3, 4, 6, 8, 12, 16, 24, 48 | 1, 2, 3, 4, 6, 8, 12, 16, 24, 48 | 1, 2, 3, 4, 6, 8, 12, 24 | 1, 2, 3, 4, 6, 8, 12, 16, 24, 48 | 1, 2, 3, 4, 6, 8, 12, 24 | 1, 2, 3 (since vGPU 19.1), 4, 8, 16 |
| MIG Support | Yes | No | No | No | No | No | No |
| NVLink Support | No | No | No | No | Yes | No | No |
| Form Factor | Dual-slot FHFL | Dual-slot FHFL | Dual-slot FHFL | Single-slot low-profile | Dual-slot FHFL | Single-slot FHFL | Dual-slot FHFL |
| Power (W) | 600 | 350 | 300 | 72 | 300 | 150 | 250 |
| Cooling | Passive | Passive | Passive | Passive | Passive | Passive | Passive |
| Optimized For¹ | Performance and Density | Performance | Performance | Performance | Performance | Performance | Density and Cost per User |
| Target Workloads | High-end 3D visualization applications, AI training and inference workloads, as well as vPC (VDI) deployments that benefit from high user density and excellent graphics performance | Deep learning and machine learning training and inference, video transcoding, AI audio and video effects, rendering, data analytics, virtual workstations, and virtual desktops | High-end virtual workstations or mixed virtual workstations and compute (AI inference, data science) | VDI, mid-level to high-end virtual workstations and compute (AI inference, video) | High-end virtual workstations or mixed virtual workstations and compute (AI, data science) | Entry-level to mid-level virtual workstations | Knowledge worker virtual desktops |
Supported vGPU deployments require a certified server platform. Customers should refer to the NVIDIA Qualified Systems Catalog to verify that their host system is certified.
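Before committing to a profile plan, it can be useful to confirm what each host actually reports. The short Python sketch below (an illustrative helper, not part of NVIDIA's tooling) assumes the nvidia-smi utility is available on the host and lists each GPU's name, frame buffer, and power limit so they can be checked against Table 2:

```python
import csv
import io
import subprocess

# Query the installed GPUs; assumes nvidia-smi is on the host's PATH.
# Fields: product name, total frame buffer (MiB), and board power limit (W).
result = subprocess.run(
    ["nvidia-smi", "--query-gpu=name,memory.total,power.limit",
     "--format=csv,noheader,nounits"],
    capture_output=True, text=True, check=True,
)

for name, memory_mib, power_w in csv.reader(io.StringIO(result.stdout)):
    print(f"{name.strip()}: {int(memory_mib)} MiB frame buffer, "
          f"{float(power_w):.0f} W power limit")
```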
NVIDIA RTX PRO 6000 Blackwell Server Edition
The NVIDIA RTX PRO 6000 Blackwell Server Edition is designed to deliver top-tier AI and graphics performance for enterprise data centers. It features 96 GB of high-speed GDDR7 ECC memory, 24,064 CUDA cores, 752 fifth-generation Tensor Cores, and 188 fourth-generation RT Cores. This combination makes it ideal for a wide range of workloads, including AI inference, simulation, high-quality rendering, and advanced computing tasks.
With Universal MIG, the RTX PRO 6000 Blackwell Server Edition becomes the first data center GPU capable of supporting both compute and graphics workloads within MIG instances. This enables flexible, mixed-use environments that combine AI/ML compute with professional visualization or VDI workloads while maintaining MIG’s strict resource isolation and predictable performance.
The RTX PRO 6000 Blackwell Server Edition is a powerful solution for both vWS and vPC, offering scalability and user density, supporting up to 48 concurrent vGPUs per GPU. Its exceptional performance, flexibility, and efficiency make it an ideal solution for organizations consolidating professional visualization, AI, and VDI workloads.
MIG-Backed vGPU Support
The NVIDIA RTX PRO 6000 Blackwell Server Edition supports MIG-backed vGPU, enabling virtualized GPUs to be created from individual MIG slices and assigned to virtual machines. This model combines MIG's hardware-level spatial partitioning with the temporal partitioning capabilities of vGPU, offering flexibility in how GPU resources are shared across workloads. More information on MIG-backed vGPU is available in the NVIDIA virtual GPU software documentation.
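To illustrate the hardware partitioning that sits beneath MIG-backed vGPU, the sketch below shows how MIG mode can be enabled and the available GPU instance profiles listed with nvidia-smi on a host. This is only an illustration under stated assumptions: the profiles reported depend on the GPU and driver, and the MIG-backed vGPUs themselves are created and assigned through the hypervisor's vGPU manager, not through these commands.

```python
import subprocess

GPU_INDEX = "0"  # index of the physical GPU to partition; adjust for your host

def run_nvidia_smi(args):
    """Run an nvidia-smi command and print whatever it reports."""
    out = subprocess.run(["nvidia-smi", *args], capture_output=True, text=True)
    print(out.stdout or out.stderr)

# Enable MIG mode on the selected GPU (a GPU reset may be required to take effect).
run_nvidia_smi(["-i", GPU_INDEX, "-mig", "1"])

# List the GPU instance profiles the driver exposes on this GPU.
run_nvidia_smi(["mig", "-i", GPU_INDEX, "-lgip"])
```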
NVIDIA L40S
The NVIDIA® L40S is the highest-performance Ada GPU for AI inference, AI training, and compute-intensive workloads, while also delivering excellent visual computing performance. Based on the NVIDIA Ada Lovelace GPU architecture, it provides exceptional performance for both advanced visual computing and AI workloads in data center and edge deployments. Featuring 142 third-generation RT Cores and 568 fourth-generation Tensor Cores with FP8 support, it accelerates real-time ray tracing, deep learning training and inference, generative AI workloads, and simulation workflows. With 48GB of graphics memory, the L40S delivers outstanding performance across compute-intensive tasks, batch and real-time rendering, virtual workstations, and cloud gaming. When combined with NVIDIA RTX™ Virtual Workstation (vWS) software, it enables powerful, secure virtual workstations that can be accessed from any device.
NVIDIA L40
The NVIDIA® L40, built on the NVIDIA Ada Lovelace GPU architecture, delivers unprecedented visual computing performance and provides revolutionary neural graphics, rendering, and AI capabilities for the most demanding graphics-driven workloads. It features 142 third-generation RT Cores for enhanced real-time ray tracing and 568 fourth-generation Tensor Cores with FP8 support, paired with the latest CUDA Cores and 48GB of graphics memory. The L40 excels at high-performance virtual workstations, large-scale digital twins in NVIDIA Omniverse, and advanced visualization workloads, delivering up to twice the performance of the previous generation at the same power. When combined with NVIDIA RTX™ Virtual Workstation (vWS) software, the L40 supports immersive, high-fidelity virtual workstations accessible from the data center or cloud.
NVIDIA L4
The NVIDIA L4 Tensor Core GPU, based on the Ada Lovelace architecture, delivers universal acceleration and energy efficiency for video, AI, virtual workstation, and graphics applications in the enterprise, in the cloud, and at the edge. With NVIDIA's AI platform and full-stack approach, the L4 is optimized for video and inference at scale across a broad range of AI applications, delivering highly personalized experiences. As the most efficient NVIDIA accelerator for mainstream use, servers equipped with the L4 deliver up to 120X higher AI video performance than CPU-only solutions and 2.5X more generative AI performance, as well as over 4X more graphics performance than the previous GPU generation. The L4's versatile, energy-efficient, single-slot, low-profile form factor makes it ideal for edge, cloud, and enterprise deployments.
NVIDIA A40
Built on the RTX platform, the NVIDIA A40 GPU is uniquely positioned to power high-end virtual workstations running professional visualization applications, accelerating the most demanding graphics workloads. The second-generation RT Cores of the NVIDIA A40 enable it to deliver massive speedups for workloads such as photorealistic rendering of movie content, architectural design evaluations, and virtual prototyping of product designs. The NVIDIA A40 features 48 GB of frame buffer and, with the NVIDIA® NVLink® GPU interconnect, can provide up to 96 GB of frame buffer to power virtual workstations that work with very large animations, files, or models. Although the NVIDIA A40 has 48 GB of frame buffer, the per-GPU context-switching limit caps the maximum number of supported users at 32.
The NVIDIA A40 is also suitable for running VDI workloads and compute workloads on the same infrastructure. Resource utilization can be increased by using the same virtualized, GPU-accelerated server resources to run virtual desktops and workstations while users are logged on, and compute workloads after they have logged off. Learn more from the NVIDIA whitepaper Using NVIDIA Virtual GPUs to Power Mixed Workloads.
NVIDIA A16
The NVIDIA A16 is designed to provide the most cost-effective graphics performance for knowledge worker VDI workloads. For these workloads, where users access office productivity applications, web browsers, and streaming video, the most important considerations are performance per dollar and user density per server. With four GPUs on each board, the NVIDIA A16 delivers the best performance per dollar and a high number of users per board for these workloads.
NVIDIA A10
The NVIDIA A10 is designed to provide cost-effective graphics performance for accelerating and optimizing the performance of mixed workloads. When combined with NVIDIA RTX vWS software, it accelerates graphics and video processing with AI on mainstream enterprise servers. Its second-generation RT Cores make the NVIDIA A10 ideal for mainstream professional visualization applications running on high-performance mid-range virtual workstations.
For knowledge worker VDI workloads, the principal factor in determining cost effectiveness is the combination of performance per dollar and user density.
As more knowledge worker users are added to a server, the server consumes more CPU resources. Adding an NVIDIA GPU for this workload conserves CPU resources by offloading graphics rendering tasks to the GPU. As a result, user experience and performance are improved for end users.
Table 3. Maximum VDI User Density per GPU Board and per 2U Server
| GPU | Maximum Users per GPU Board | Maximum Boards per 2U Server | Maximum Users per 2U Server |
|---|---|---|---|
| RTX PRO 6000 Blackwell Server Edition (with 2 GB Profile Size) | 48² | 4 | 192³ |
| L40S (with 1 GB Profile Size) | 32 | 8 | 256 |
| L4 (with 1 GB Profile Size) | 24 | 16 | 384 |
| A40 (with 1 GB Profile Size) | 32 | 8 | 256 |
| A10 (with 1 GB Profile Size) | 24 | 16 | 384 |
| A16 (with 1 GB Profile Size) | 64 (16 x 4) | 4 | 256 |
Table 3 assumes that each user requires a vGPU profile with 1 or 2GB of frame buffer. However, to determine the profile sizes that provide the best user experience for the users in your environment, you must conduct a proof of concept (POC).
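The per-board figures in Table 3 follow from simple arithmetic: the frame buffer per GPU divided by the profile size, capped at 32 vGPUs per physical GPU by the context-switching limit, then multiplied by the number of GPUs on the board (the RTX PRO 6000 Blackwell Server Edition figure additionally depends on MIG-backed time-sliced vGPU, per footnote 2). The following is a minimal sketch of that calculation, using the board data from Tables 2 and 3:

```python
# Maximum vGPUs a single physical GPU can host, regardless of frame buffer.
MAX_VGPUS_PER_GPU = 32

# Board name -> (frame buffer per GPU in GB, GPUs per board, profile size in GB)
boards = {
    "L40S": (48, 1, 1),
    "L4":   (24, 1, 1),
    "A40":  (48, 1, 1),
    "A10":  (24, 1, 1),
    "A16":  (16, 4, 1),
}

for name, (fb_gb, gpus_per_board, profile_gb) in boards.items():
    users_per_gpu = min(fb_gb // profile_gb, MAX_VGPUS_PER_GPU)
    print(f"{name}: {users_per_gpu * gpus_per_board} users per board")
```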
Figure 3. NVIDIA vPC VDI Cost per User
Calculations in Figure 3 include the GPU price plus the cost of NVIDIA vPC software with a four-year subscription, divided by the number of users.
Information regarding the NVIDIA RTX PRO 6000 Blackwell Server Edition will be updated soon.
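The per-user figures behind Figure 3 can be reproduced for any GPU once a street price and the four-year NVIDIA vPC subscription cost are known. The sketch below is purely illustrative; the prices shown are placeholder assumptions, not NVIDIA list prices:

```python
def cost_per_user(gpu_price_usd, vpc_per_user_per_year_usd, users, years=4):
    """GPU hardware cost plus a per-user vPC subscription, amortized per user."""
    software_total = vpc_per_user_per_year_usd * years * users
    return (gpu_price_usd + software_total) / users

# Placeholder example: a hypothetical $3,000 board hosting 64 vPC users.
print(f"${cost_per_user(3000, vpc_per_user_per_year_usd=100, users=64):.2f} per user")
```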
Footnotes
1. Performance-optimized GPUs are designed to maximize raw performance for a specific class of virtualized workload. They are typically recommended for the following classes of virtualized workload:
   - High-end virtual workstations running professional visualization applications.
   - Compute-intensive workloads such as artificial intelligence, deep learning, or data science workloads.
   Density-optimized GPUs are designed to maximize the number of VDI users supported in a server. They are typically recommended for knowledge worker virtual desktop infrastructure (VDI) running office productivity applications, streaming video, and the Windows OS.
2. 48 users per board is achieved with MIG-backed time-sliced vGPU. Without MIG-backed time-sliced vGPU, the maximum is 32.
3. 192 users per 2U server is achieved with MIG-backed time-sliced vGPU. Without MIG-backed time-sliced vGPU, the maximum is 128.