NVIDIA RTX vWS: Sizing and GPU Selection Guide for Virtualized Workloads

Tools

Several NVIDIA-specific and third-party industry tools can help validate your proof of concept (POC) and optimize user density and performance. The tools covered in this section include:

  • GPU Profiler

  • NVIDIA-SMI

  • esxtop

  • Aria Operations

These tools let you analyze the utilization of all physical and virtual resources so that you can tune the configuration to meet user performance requirements and achieve the best scale. They are particularly useful during your POC to ensure your test environment accurately represents a live production environment. Keep using them after deployment to maintain system health, stability, and scalability, because deployment needs are likely to change over time.

GPU Profiler, available on GitHub, is a widely used tool for quickly capturing resource utilization during workload execution on a virtual machine. It is typically used during a POC to help size the virtual environment and ensure acceptable user performance. GPU Profiler can be run on a single VM with various vGPU profiles. The following metrics can be captured:

  • Framebuffer %

  • GPU Utilization

  • vCPU %

  • RAM %

  • Video Encode

  • Video Decode

Figure 10 - GPU Profiler

For more detailed information and to download the tool, please visit the release page for GPUProfiler on GitHub.

The nvidia-smi utility, built into the NVIDIA vGPU Manager, provides extensive monitoring features that help IT understand how the various NVIDIA vGPU engines are used. This command-line tool is accessible on the hypervisor or within the virtual machine, and it lets you monitor and log the utilization of the compute engine, frame buffer, encoder, and decoder.

To identify physical GPU bottlenecks for RTX vWS VMs, run the following nvidia-smi commands on the hypervisor in a shell session over SSH:

  • Virtual Machine Frame Buffer Utilization:

nvidia-smi vgpu -q -l 5 | grep -e "VM ID" -e "VM Name" -e "Total" -e "Used" -e "Free"

  • Virtual Machine GPU, Encoder and Decoder Utilization:

nvidia-smi vgpu -q -l 5 | grep -e "VM ID" -e "VM Name" -e "Utilization" -e "Gpu" -e "Encoder" -e "Decoder"

  • Physical GPU, Encoder and Decoder Utilization:

nvidia-smi -q -d UTILIZATION -l 5 | grep -v -e "Duration" -e "Number" -e "Max" -e "Min" -e "Avg" -e "Memory" -e "ENC" -e "DEC" -e "Samples"
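When the frame buffer query above runs for a long session, it can help to reduce each sample to one percentage per VM. The following is a minimal sketch, assuming the filtered output consists of `Key : Value` lines in which `VM Name` precedes the `Total` and `Used` lines of the same vGPU, and that sizes are reported in MiB. The function name `fb_percent` is hypothetical and not part of nvidia-smi; verify the field layout against your own driver's output before relying on it.

```shell
# fb_percent: read filtered `nvidia-smi vgpu -q` output on stdin and print
# one "VM name: used/total MiB (pct%)" line per vGPU sample.
# Assumption: each line looks like "Key : Value", and "VM Name" appears
# before the "Total" and "Used" lines that belong to the same vGPU.
fb_percent() {
    awk -F' *: *' '
        { key = $1; sub(/^[ \t]+/, "", key) }   # strip leading indentation
        key == "VM Name" { vm = $2 }
        key == "Total"   { total = $2 + 0 }     # "8192 MiB" becomes 8192
        key == "Used" && total > 0 {
            used = $2 + 0
            printf "%s: %d/%d MiB (%.1f%%)\n", vm, used, total, 100 * used / total
        }
    '
}
```

To use it, append `| fb_percent` to the Virtual Machine Frame Buffer Utilization command. A VM whose percentage stays close to 100 is a candidate for a larger vGPU profile.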

For more information on nvidia-smi, refer to the official documentation. Note the option -f FILE or --filename=FILE, which redirects query output to a file (for example, a .csv file).
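For longer captures, CSV output is easier to post-process than the `-q` text. The sketch below assumes a log produced with `nvidia-smi --query-gpu=timestamp,index,utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits -l 5 -f gpu_log.csv`. The file name `gpu_log.csv` and the function name `gpu_peaks` are illustrative choices, not anything the guide prescribes.

```shell
# gpu_peaks: summarize a CSV log read from stdin with the columns
#   timestamp, index, utilization.gpu, memory.used, memory.total
# and print the peak GPU utilization and peak frame buffer use per GPU.
# Rows whose last column is not a positive number (such as a header row)
# are skipped.
gpu_peaks() {
    awk -F', *' '
        NF >= 5 && $5 + 0 > 0 {
            gpu = $2
            if ($3 + 0 > util[gpu]) util[gpu] = $3 + 0
            fb = 100 * $4 / $5
            if (fb > mem[gpu]) mem[gpu] = fb
            seen[gpu] = 1
        }
        END {
            for (gpu in seen)
                printf "GPU %s: peak util %d%%, peak FB %.1f%%\n", gpu, util[gpu], mem[gpu]
        }
    ' | sort
}
```

Running `gpu_peaks < gpu_log.csv` after a test session prints one summary line per physical GPU, which gives a quick view of how much headroom remains at the chosen user density.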

esxtop is a VMware tool that captures host-level performance metrics in real-time, displaying information on each processor, memory utilization, disk usage, and network usage. It also captures VM-level metrics.

To efficiently capture esxtop data while minimizing disk space usage, you can pipe the output directly into a compressed file. Below is an example command to capture a one-hour data sample:

esxtop -b -a -d 15 -n 240 | gzip -9c > esxtopoutput.csv.gz

  • -b: Run esxtop in batch mode, suitable for long-term data collection.

  • -a: Capture all available performance metrics.

  • -d 15: Set a delay of 15 seconds between each data collection.

  • -n 240: Perform 240 iterations, resulting in a capture window of 3600 seconds (one hour).
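The capture window is the delay multiplied by the number of iterations, so the `-n` value for any other window follows from a division. As a small sketch (the helper name `esxtop_iterations` is hypothetical), the calculation can be wrapped in a shell function that rounds up to a whole sample:

```shell
# esxtop_iterations: print the -n value needed to cover DURATION seconds
# when sampling every DELAY seconds, rounded up to a whole sample.
# Usage: esxtop_iterations DURATION DELAY
esxtop_iterations() {
    duration=$1
    delay=$2
    echo $(( (duration + delay - 1) / delay ))
}
```

For example, `esxtop_iterations 3600 15` reproduces the 240 iterations used above, and an eight-hour capture at the same interval could be started with `esxtop -b -a -d 15 -n "$(esxtop_iterations 28800 15)" | gzip -9c > esxtopoutput.csv.gz`.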

For additional information on esxtop, refer to the VMware documentation.

The NVIDIA Virtual GPU Management Pack for VMware Aria Operations enables robust monitoring of NVIDIA physical GPUs and virtual GPUs in a VMware Aria Operations cluster.

VMware Aria Operations Features

  • Integrated Management: Combines performance, capacity, and configuration management for VMware vSphere, physical, and hybrid cloud environments.

  • Customizable Platform: Supports third-party management packs for extended functionality.

For additional information, see the VMware Aria Operations documentation.

NVIDIA Virtual GPU Management Pack Capabilities

  • Comprehensive Monitoring: Tracks and analyzes performance metrics from NVIDIA vGPU software.

  • Seamless Integration: Sends metrics to VMware Aria Operations for real-time analysis and visualization.

  • Enhanced Visibility: Displays metrics in custom NVIDIA dashboards within VMware Aria Operations.

For additional information on the NVIDIA Virtual GPU Management Pack for VMware Aria Operations, refer to the NVIDIA documentation.

© Copyright 2024, NVIDIA Corporation. Last updated on Oct 3, 2024.