Testing Methodology
The first phase of testing explored the impact of higher resolutions and multi-monitor scenarios. The following tests were executed:
| Resolution | Monitors |
|---|---|
| High Definition (HD) 1920x1080 | 1 |
| High Definition (HD) 1920x1080 | 2 |
| Quad High Definition (QHD) 2560x1440 | 2 |
| Quad High Definition (QHD) 2560x1440 | 3 |
| 4K 4096x2160 | 1 |
| 4K 4096x2160 | 2 |
| 5K 5120x2880 | 1 |
This table reflects the configurations that were tested, NOT our recommendations.
Tests were executed on a single VM using 1B and 2B vGPU profiles to determine the optimal vGPU profile based on the nVector knowledge worker (KW) workload.
The nVector knowledge worker workload is designed to simulate peak usage scenarios using typical productivity applications, with all concurrent users actively using system resources simultaneously. These results are meant to give administrators an outline with which to plan POC deployments. Workloads within your environment might be less resource intensive than the nVector knowledge worker workload.
Test Environment
Single-VM testing leveraged two physical servers: the first hosting the target vPC VMs and the second hosting the virtual clients. Both hosts ran VMware vSphere ESXi 7.0.2 and NVIDIA Virtual GPU Software. The target VM acts as a standard vPC virtual desktop that an end user would connect to, and the virtual client serves as an example of an endpoint that the end user would use to connect to the target VM.
| Host Configuration | VM Configuration | Virtual Client |
|---|---|---|
| 2U NVIDIA-Certified Server | vCPU: 4 | vCPU: 2 |
| Intel® Xeon® Gold 6248 @ 3.00 GHz | vRAM: 6144 MB | vRAM: 4096 MB |
| VMware ESXi 7.0.2 (build 17630552) | NIC: 1 (vmxnet3) | NIC: 1 (vmxnet3) |
| Number of CPUs: 40 (2 x 20) | Hard disk: 48 GB | Hard disk: 48 GB |
| Memory: 766 GB | Virtual Hardware: vmx-13 | Virtual Hardware: vmx-13 |
| Storage: Local Flash | VMware Horizon 8.12 | VMware Horizon 8.12 |
| Power Setting: High Performance | Blast Extreme (4:4:4) | Blast Extreme (4:4:4) |
| GPU: A16 | vGPU Software: 13 (Windows Driver 471.68) | vGPU Software: 13 (Windows Driver 471.68) |
| Scheduling Policy: 0x00 (Best Effort) | Guest OS: Windows 10 Pro 20H2 | Guest OS: Windows 10 Pro 20H2 |
Test Metrics - Framebuffer Usage
Frame buffer utilization is based upon many factors, including application load, monitor configuration, and screen resolution. Since our test focuses on the impact of higher resolutions and multi-monitor scenarios, frame buffer utilization is a critical test metric.
GPU Profiler
GPU Profiler is a commonly used tool that can quickly capture resource utilization while a workload is being executed on a virtual machine. This tool is typically used during a POC to help size the virtual environment to ensure acceptable user performance. GPU Profiler was run on a single VM with various vGPU profiles while the nVector knowledge worker workload ran. The following metrics were captured:
Framebuffer %
vCPU %
RAM %
Video Encode
Video Decode

A good rule of thumb is that frame buffer utilization should not exceed 90% for short periods or average over 70% on the 1 GB (1B) profile. If high utilization is observed, the vPC VM should be assigned a 2 GB (2B) profile. These results are reflective of the work profile mentioned in section 1.1. Because users leverage different applications with varying degrees of utilization, we recommend performing a POC within your internal environment.
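As an illustration of that rule of thumb, the following minimal sketch checks a set of frame buffer utilization samples against the 90% peak and 70% average limits. The function name and sample values are hypothetical and not part of GPU Profiler or nVector.

```python
# Illustrative sketch (not part of the nVector tooling): applying the
# frame buffer rule of thumb to utilization samples captured during a run.

def needs_larger_profile(fb_samples_pct, peak_limit=90.0, avg_limit=70.0):
    """Return True if a 1B profile looks undersized for these samples.

    fb_samples_pct: frame buffer utilization samples in percent,
    e.g. as reported by GPU Profiler or nvidia-smi.
    """
    peak = max(fb_samples_pct)
    avg = sum(fb_samples_pct) / len(fb_samples_pct)
    return peak > peak_limit or avg > avg_limit

# Hypothetical samples from a 1B (1 GB) vPC VM:
samples = [55.0, 62.5, 71.0, 93.0, 68.0, 74.5]
if needs_larger_profile(samples):
    print("Frame buffer pressure detected - consider a 2B (2 GB) profile.")
else:
    print("1B profile appears sufficient for this workload.")
```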
Typical VDI deployments have two conflicting goals: achieving the best possible user experience and maximizing user density on server hardware. Problems can arise as density is scaled up, because beyond a certain point higher density negatively impacts user experience. Scalability testing used nVector to execute tests at scale on 64 and 128 VMs while leveraging dual HD (1920x1080) monitors. Capacity planning for the server often depends on both server resource utilization metrics and user experience. This testing phase examined both, and the following sections summarize their importance and how to analyze these metrics.
Server Utilization Metrics
Observing overall server utilization allows you to assess the trade-offs between end-user experience and resource utilization. To do this, monitoring tools periodically sample CPU core and GPU utilization during a single workload session. To isolate the 'steady state' portion of the workload, samples are filtered to exclude the periods when users are logging on and when the workload ramps up and down. Once a steady state has been established, all samples are aggregated to get the total CPU core utilization on the server.
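A minimal sketch of that steady-state aggregation is shown below. The sample layout and the fraction of samples trimmed at each end are assumptions chosen for illustration; nVector performs this filtering automatically.

```python
# Sketch of steady-state filtering and averaging of utilization samples.

def steady_state_average(samples, ramp_up=0.15, ramp_down=0.15):
    """Average utilization over the steady-state portion of a run.

    samples: time-ordered utilization values (percent) for one session.
    ramp_up / ramp_down: fraction of samples discarded at the start
    (logon and workload ramp-up) and at the end (ramp-down) of the run.
    """
    start = int(len(samples) * ramp_up)
    end = len(samples) - int(len(samples) * ramp_down)
    steady = samples[start:end]
    return sum(steady) / len(steady)

# Hypothetical per-host CPU core utilization samples (percent):
cpu_samples = [12, 35, 58, 61, 63, 62, 64, 60, 59, 41, 18]
print(f"Steady-state CPU utilization: {steady_state_average(cpu_samples):.1f}%")
```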
The utilization of the GPU compute engine, the frame buffer, the encoder, and the decoder can all be monitored and logged through the NVIDIA System Management Interface (nvidia-smi), a command-line tool. In addition, NVIDIA vGPU metrics are integrated through management packs such as VMware vRealize Operations. For our testing purposes, nVector automated the capture of these server metrics. It is strongly advised to test your specific workloads during a POC. You can run nvidia-smi commands on the hypervisor to monitor the GPU utilization of the physical GPU; refer to Deployment Best Practices for further syntax information.
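The sketch below shows one way to poll nvidia-smi and log physical GPU utilization and frame buffer usage to a CSV file. The polling interval, sample count, and output path are arbitrary choices for illustration, not part of the nVector workflow; refer to Deployment Best Practices for the recommended syntax on your hypervisor.

```python
# Simple polling wrapper around nvidia-smi (run where nvidia-smi is
# available, for example on the hypervisor host).
import subprocess
import time

QUERY = "timestamp,name,utilization.gpu,utilization.memory,memory.used,memory.total"

def sample_gpus():
    """Return one CSV line per physical GPU from nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip().splitlines()

if __name__ == "__main__":
    with open("gpu_utilization.csv", "a") as log:
        for _ in range(12):          # about one minute at a 5-second interval
            for line in sample_gpus():
                log.write(line + "\n")
            time.sleep(5)
```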
User Experience Metrics
NVIDIA’s nVector benchmarking tool has built-in mechanisms to measure user experience. This next section will dig deeper into how the end-user experience is measured and how results are obtained.
Latency Metrics
Latency defines the responsiveness, or "feel," of the end-user experience when working with applications in the VDI. Increased latency can produce a poor experience, including mouse cursor delay, text display issues when typing, and audio/video sync issues. The lower the latency, the better! Imagine that you are working on a PowerPoint presentation, adding a shape and resizing it. On the first attempt, this process is instantaneous; the second attempt, however, is delayed by several seconds or feels sluggish. With such inconsistency, the user tends to overshoot or have trouble getting the mouse into the correct position. This lack of a consistent experience can be very frustrating and often results in high error rates as the user clicks too fast or too slow, trying to pace themselves against an unpredictable response time. NVIDIA's nVector benchmarking tool measures the variation in end-user latency and how frequently it is experienced.
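nVector's latency measurement is internal to the tool, but the idea of quantifying both the typical delay and its variability can be sketched with basic statistics. The per-interaction latency samples below are hypothetical and are only meant to show why jitter matters as much as the median.

```python
# Illustrative only: summarizing hypothetical per-interaction latency
# samples (milliseconds) by their typical value and their variability.
import statistics

latencies_ms = [45, 48, 51, 47, 180, 49, 52, 46, 175, 50]

median = statistics.median(latencies_ms)
p95 = sorted(latencies_ms)[int(0.95 * (len(latencies_ms) - 1))]
jitter = statistics.pstdev(latencies_ms)

print(f"median latency:   {median:.0f} ms")
print(f"95th percentile:  {p95:.0f} ms")    # captures the occasional slow response
print(f"jitter (std dev): {jitter:.0f} ms")  # high jitter = inconsistent feel
```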
Remoted Frames Metrics
Frame rate metrics are captured on the endpoint and provide an excellent indicator of the likely end-user experience. The average frame rate is captured and calculated across the simulated workload. A low frame rate can cause slow screen refresh and stuttering during scrolling or zooming. The higher the frame rate, the better!
Remoted frames are a standard measure of user experience. NVIDIA’s nVector benchmarking tool collects data on the ‘frames per second’ provided by the remote protocol vendor for the entire workload duration. The tool then tallies the data for all VDI sessions to get the total number of frames remoted for all users. Hypervisor vendors likewise measure total remoted frames as an indicator of the quality of user experience. The greater this number, the more fluid the user experience.
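A small sketch of that tally is shown below. The per-session frames-per-second samples and the sampling interval are made up for illustration; in practice these values come from the remote protocol vendor's counters.

```python
# Illustrative aggregation of remoted-frame data across VDI sessions.

SAMPLE_INTERVAL_S = 1  # assumed sampling interval, for illustration only

sessions = {
    "vm-001": [28, 30, 29, 31, 30],
    "vm-002": [22, 24, 23, 25, 24],
    "vm-003": [30, 30, 29, 30, 31],
}

# Total frames remoted for all users over the workload duration:
total_frames = sum(fps * SAMPLE_INTERVAL_S
                   for samples in sessions.values() for fps in samples)
# Average frame rate per session:
avg_fps = sum(sum(s) / len(s) for s in sessions.values()) / len(sessions)

print(f"total remoted frames across all sessions: {total_frames}")
print(f"average FPS per session: {avg_fps:.1f}")
```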
Image Quality
Image quality is determined by the remoting protocol, the configuration of the VDI environment, and the capability of the endpoint. For this sizing guide, the protocol used is VMware Blast Extreme with High Color Accuracy (HCA) YUV 4:4:4, a protocol configuration specific to vPC use cases. HCA does not remove chroma information from images and therefore provides much better image quality.
NVIDIA’s nVector benchmarking tool uses a lightweight agent on the VDI desktop and on the client to measure image quality. These agents take multiple screen captures on the VDI desktop and on the thin client for later comparison. The structural similarity (SSIM) of the screen capture taken on the client is computed by comparing it to the one taken on the VDI desktop. When the two images are similar, the heatmap reflects more colors toward the top of the spectrum shown on its right, with an SSIM value closer to 1.0 (Figure 9). As the images become less similar, the heatmap reflects more colors toward the bottom of the spectrum, with a value less than 1.0. More than a hundred pairs of images are obtained across an entire set of user sessions, and the average SSIM index of all pairs is computed to provide the overall remote session quality for all users. Poor image quality, under 0.90, can cause issues with text display, line sharpness, and other graphical artifacts. The threshold SSIM value is 0.98; scores above 0.98 indicate good image quality, with 1.0 being a perfect match.
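For readers who want to reproduce a comparable measurement offline, the sketch below computes SSIM between host-side and client-side captures using scikit-image's structural_similarity. The file names and capture pairs are hypothetical (RGB screenshots are assumed); only the 0.98 threshold mirrors the text, and this is not the nVector implementation.

```python
# Hypothetical offline SSIM comparison of VDI-desktop and client captures.
from skimage.io import imread
from skimage.color import rgb2gray
from skimage.metrics import structural_similarity

def ssim_for_pair(host_png, client_png):
    """SSIM between a capture taken on the VDI desktop and on the client."""
    host = rgb2gray(imread(host_png))      # assumes RGB captures
    client = rgb2gray(imread(client_png))
    return structural_similarity(host, client, data_range=1.0)

# Hypothetical capture pairs gathered across user sessions:
pairs = [("host_001.png", "client_001.png"),
         ("host_002.png", "client_002.png")]

scores = [ssim_for_pair(h, c) for h, c in pairs]
overall = sum(scores) / len(scores)
print(f"average SSIM: {overall:.3f} "
      f"({'good' if overall >= 0.98 else 'below threshold'})")
```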
