Performance Metrics
The previous chapter introduced essential tools for capturing critical performance metrics, which are detailed in the following sections. Gathering these metrics during your Proof of Concept (POC), and regularly in production, is crucial for optimizing VDI delivery.
In a VDI environment, performance metrics are categorized into two tiers: server-level and VM-level. Each tier has distinct metrics that must be validated to ensure optimal performance and scalability.
As discussed in the previous chapter, the GPU Profiler and VMware Aria Operations are invaluable tools for monitoring resource usage within VMs. The following sections detail the VM-level metrics that are essential when conducting a POC or monitoring an existing deployment to identify and address potential performance bottlenecks.
Framebuffer Usage
In a virtualized environment, the frame buffer is the portion of vGPU memory available to the guest operating system. A common guideline is that a VM’s frame buffer usage should not frequently exceed 90% or average above 70%. When utilization is consistently high, a vGPU-backed VM is more prone to suboptimal user experiences, including performance degradation and potential crashes. Because user interactions and workflows vary widely across software applications, conduct a POC with your specific workload to determine appropriate frame buffer thresholds for your environment.
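As a concrete check against these thresholds, the sketch below reads framebuffer usage with the pynvml NVML bindings (installable as nvidia-ml-py). This is a minimal illustration, assuming the NVIDIA vGPU driver is installed in the guest; the 90% figure mirrors the peak guideline above, and tracking the 70% average would require sampling this reading over time.

```python
# Minimal framebuffer-usage check inside a vGPU-backed VM using the
# pynvml bindings (pip install nvidia-ml-py). The 90% figure mirrors the
# peak guideline above; the 70% average requires sampling over time.
from pynvml import (
    nvmlInit, nvmlShutdown, nvmlDeviceGetCount,
    nvmlDeviceGetHandleByIndex, nvmlDeviceGetMemoryInfo,
)

PEAK_THRESHOLD = 0.90  # flag samples above 90% framebuffer usage

nvmlInit()
try:
    for i in range(nvmlDeviceGetCount()):
        handle = nvmlDeviceGetHandleByIndex(i)
        mem = nvmlDeviceGetMemoryInfo(handle)  # total/used/free in bytes
        usage = mem.used / mem.total
        flag = "  <-- above 90% peak guideline" if usage > PEAK_THRESHOLD else ""
        print(f"GPU {i}: {usage:.1%} of {mem.total / 2**30:.1f} GiB used{flag}")
finally:
    nvmlShutdown()
```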
vCPU Usage
When deploying NVIDIA RTX vWS, monitoring vCPU usage is as critical as monitoring vGPU frame buffer utilization. All workloads depend on CPU resources, so ensuring that vCPU usage does not become a bottleneck is essential for maintaining optimal performance. Even when processes are accelerated by the vGPU, vCPU resources remain integral to their operation; balancing and monitoring both vGPU and vCPU resources is key to optimizing system performance.
Many legacy CAD and CAE applications remain predominantly single-threaded. For these workloads, choosing CPUs with high clock speeds (generally above 3 GHz) is as important as vCPU allocation in avoiding performance bottlenecks.
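Inside the guest, per-vCPU sampling makes a single-threaded bottleneck easy to spot. The following sketch uses the cross-platform psutil package; the 90%/50% figures are illustrative, not a prescribed threshold.

```python
# Sample per-vCPU utilization inside the guest with psutil
# (pip install psutil). A single vCPU pinned near 100% while the others
# idle is the classic signature of a single-threaded CAD/CAE bottleneck.
import psutil

per_vcpu = psutil.cpu_percent(interval=5, percpu=True)  # 5-second sample
for idx, pct in enumerate(per_vcpu):
    print(f"vCPU {idx}: {pct:.0f}%")

avg = sum(per_vcpu) / len(per_vcpu)
if max(per_vcpu) > 90 and avg < 50:  # illustrative thresholds only
    print("One vCPU saturated while average load is low: "
          "likely a single-threaded workload")
```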
Video Encode/Decode
NVIDIA GPUs feature hardware-based encoders and decoders, specifically:
NVENC (NVIDIA Video Encoder): This hardware-accelerated encoder offloads computationally intensive video encoding tasks from the CPU to the GPU, significantly improving performance and efficiency.
NVDEC (NVIDIA Video Decoder): This hardware-accelerated decoder provides fast real-time decoding for various video codecs, enhancing video playback performance by reducing CPU load.
Encoder and decoder usage metrics can be captured whenever these NVIDIA hardware components are actively utilized. The Video Encoder Usage metric measures how intensively the GPU’s encoder is being used by the remoting protocol or application, which is crucial for monitoring performance in virtualized environments.
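The sketch below reads NVENC and NVDEC utilization through NVML’s pynvml bindings; both calls return a utilization percentage together with the sampling period. This assumes the driver exposes encoder and decoder counters for the GPU or vGPU in question.

```python
# Read NVENC/NVDEC utilization via pynvml (pip install nvidia-ml-py).
# Both calls return (utilization %, sampling period in microseconds).
from pynvml import (
    nvmlInit, nvmlShutdown, nvmlDeviceGetHandleByIndex,
    nvmlDeviceGetEncoderUtilization, nvmlDeviceGetDecoderUtilization,
)

nvmlInit()
try:
    handle = nvmlDeviceGetHandleByIndex(0)  # first GPU
    enc_util, enc_period = nvmlDeviceGetEncoderUtilization(handle)
    dec_util, dec_period = nvmlDeviceGetDecoderUtilization(handle)
    print(f"NVENC: {enc_util}% (sampled over {enc_period} us)")
    print(f"NVDEC: {dec_util}% (sampled over {dec_period} us)")
finally:
    nvmlShutdown()
```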
In the previous chapter, we introduced the NVIDIA System Management Interface (nvidia-smi) and VMware esxtop as valuable tools for monitoring resource usage on a physical host. The following sections cover the host-level metrics that are essential when conducting a POC or maintaining an operational deployment to identify and address performance bottlenecks.
CPU Core Utilization
VMware’s esxtop utility monitors essential physical host state information for each CPU. The % Total CPU Core Utilization metric is crucial for analyzing and maintaining optimal VM performance. As previously noted, every process within a VM runs on a vCPU, which executes on the host’s physical cores. When host threads are fully utilized, processes in a VM may become bottlenecked, leading to considerable performance degradation.
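One practical way to review this metric offline is to capture esxtop in batch mode (for example, esxtop -b -d 5 -n 60 > esxtop.csv) and scan the export. The sketch below assumes PerfMon-style column names containing "Core Util Time"; the exact counter naming can vary by ESXi version, so treat the match string as an assumption to verify against your own export.

```python
# Sketch: scan an esxtop batch-mode export for physical-CPU core
# utilization samples above 90%. Capture the export on the host with,
# for example: esxtop -b -d 5 -n 60 > esxtop.csv
# Column names follow PerfMon style, e.g.
# "\\host\Physical Cpu(_Total)\% Core Util Time"; exact naming may vary
# by ESXi version, so verify the match string against your own export.
import csv

with open("esxtop.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    core_cols = [i for i, name in enumerate(header) if "Core Util Time" in name]
    for row in reader:
        for i in core_cols:
            if i < len(row) and row[i] and float(row[i]) > 90.0:
                print(f"{header[i]} = {row[i]}%")  # near-saturated sample
```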
GPU Utilization
The NVIDIA System Management Interface (nvidia-smi) monitors GPU utilization rates, indicating the workload each GPU handles over time. It provides insights into how extensively vGPU-backed VMs utilize NVIDIA GPUs on the host server.
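A simple way to capture this over time is to poll nvidia-smi’s query interface from a script, as in the minimal sketch below. On vGPU hosts, nvidia-smi vgpu additionally reports per-vGPU utilization.

```python
# Poll host-level GPU utilization via nvidia-smi's query interface.
# utilization.gpu reports the percentage of time during the sampling
# window in which one or more kernels were executing on the GPU.
import subprocess

result = subprocess.run(
    ["nvidia-smi",
     "--query-gpu=index,name,utilization.gpu,utilization.memory",
     "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
)
for line in result.stdout.strip().splitlines():
    print(line)  # e.g. "0, NVIDIA L40S, 42 %, 17 %"
```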
For environments running AI Virtual Workstation (AI vWS) workloads, such as Retrieval-Augmented Generation (RAG) pipelines or large language model (LLM) inference, additional compute-focused GPU metrics become relevant. AI inference workloads rely heavily on tensor operations, and performance indicators such as inference latency, time-to-first-token, and throughput (tokens per second) provide valuable insight during a POC. Running a representative workflow from an AI vWS Toolkit, which is primarily intended for early AI development, can help exercise the underlying model inference, retrieval, and embedding steps. Monitoring these AI-specific metrics alongside traditional vWS measurements provides a more complete understanding of system behavior when supporting both graphics and AI-driven applications.
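The following framework-agnostic sketch shows how time-to-first-token and tokens-per-second throughput can be measured around a streaming inference call. Here, stream_tokens is a hypothetical placeholder, not a real API; substitute the streaming generation call your serving stack exposes.

```python
# Framework-agnostic sketch for measuring time-to-first-token (TTFT) and
# throughput. stream_tokens() is a hypothetical placeholder for whatever
# streaming inference call your serving stack exposes.
import time
from typing import Iterable

def stream_tokens(prompt: str) -> Iterable[str]:
    # Placeholder: replace with your model's streaming generation call.
    yield from ["Hello", ",", " world", "!"]

def measure(prompt: str) -> None:
    start = time.perf_counter()
    first_token_at = None
    count = 0
    for _ in stream_tokens(prompt):
        if first_token_at is None:
            first_token_at = time.perf_counter()  # first token arrived
        count += 1
    end = time.perf_counter()
    ttft = first_token_at - start
    tps = count / (end - start)
    print(f"TTFT: {ttft * 1000:.1f} ms, throughput: {tps:.1f} tokens/s")

measure("Summarize the vGPU sizing guidelines.")
```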