Benchmarking#

Compute performance and memory bandwidth benchmarks can confirm that your system is healthy and correctly configured.

NVBandwidth#

NVBandwidth is a performance analysis tool developed by NVIDIA to measure memory bandwidth and latency between various components in systems equipped with NVIDIA GPUs. The open-source utility, available through NVIDIA’s GitHub repository, provides detailed measurements of data transfer rates between host CPUs and GPU devices (host-to-device and device-to-host), as well as inter-GPU communication bandwidth across NVLink and PCIe interconnects.
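As a minimal sketch of scripting a single measurement, the Python helper below builds and runs one nvbandwidth invocation. It assumes the tool has already been built, that the binary lives at the hypothetical path `./nvbandwidth`, and that `host_to_device_memcpy_ce` is among the test cases your build reports. The helper names `nvbandwidth_command` and `run_nvbandwidth` are illustrative and are not part of the tool.

```python
import shutil
import subprocess


def nvbandwidth_command(binary="./nvbandwidth", testcase=None):
    """Build the argument list for one nvbandwidth run.

    `binary` is an assumed path to a locally built executable.
    With no testcase, the tool runs every test case it knows about.
    """
    cmd = [binary]
    if testcase is not None:
        # -t selects a single test case by name.
        cmd += ["-t", testcase]
    return cmd


def run_nvbandwidth(binary="./nvbandwidth", testcase=None):
    """Run nvbandwidth and return its text report (requires NVIDIA GPUs)."""
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"nvbandwidth binary not found: {binary}")
    result = subprocess.run(
        nvbandwidth_command(binary, testcase),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


if __name__ == "__main__":
    # Only prints the command; call run_nvbandwidth() on a GPU system.
    print(nvbandwidth_command(testcase="host_to_device_memcpy_ce"))
```

Keeping command construction separate from execution lets the same script be checked on a machine without GPUs before it is run on the target system.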

Refer to NVIDIA/nvbandwidth for more information.

NCCL-Tests#

NCCL-tests is a comprehensive open-source test suite developed by NVIDIA to evaluate the performance and functional correctness of the NVIDIA Collective Communications Library (NCCL) in distributed GPU computing environments. Designed for deep learning frameworks and high-performance computing applications, these tests measure the bandwidth and latency of NCCL’s collective operations like all-reduce, broadcast, gather, and reduce-scatter across multi-GPU configurations.
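To help interpret the bandwidth figures, the Python sketch below derives algorithm bandwidth and bus bandwidth from a message size and a measured time, using the per-collective correction factors described in the nccl-tests performance notes. The function names are hypothetical, and the size is assumed to follow the convention nccl-tests uses for each collective, so treat this as an illustration rather than a replacement for the tool's own output.

```python
def bus_bandwidth_factor(collective, n_ranks):
    """Factor that converts algorithm bandwidth into bus bandwidth."""
    if n_ranks < 1:
        raise ValueError("n_ranks must be at least 1")
    if collective == "all_reduce":
        return 2 * (n_ranks - 1) / n_ranks
    if collective in ("all_gather", "reduce_scatter"):
        return (n_ranks - 1) / n_ranks
    if collective in ("broadcast", "reduce"):
        return 1.0
    raise ValueError(f"unsupported collective: {collective}")


def bandwidths_gb_per_s(size_bytes, time_s, collective, n_ranks):
    """Return (algorithm bandwidth, bus bandwidth) in GB/s (1 GB = 1e9 bytes)."""
    if time_s <= 0:
        raise ValueError("time_s must be positive")
    algbw = size_bytes / time_s / 1e9
    busbw = algbw * bus_bandwidth_factor(collective, n_ranks)
    return algbw, busbw


if __name__ == "__main__":
    # Example: an 8 GB all-reduce across 8 ranks completing in 0.5 s.
    algbw, busbw = bandwidths_gb_per_s(8_000_000_000, 0.5, "all_reduce", 8)
    print(f"algbw = {algbw:.2f} GB/s, busbw = {busbw:.2f} GB/s")
```

Bus bandwidth is usually the more useful number to compare against hardware peak, because the correction factor removes the dependence on the number of ranks.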

Refer to NVIDIA/nccl-tests for more information.

NVSHMEM#

The NVSHMEM performance benchmarks measure latency and bandwidth for stream-ordered and device-initiated communication that targets GPU memory. Benchmarks include one-sided point-to-point communication, synchronization, and collective operations. When communicating across multiple NVLink domains, users can choose from network transports such as the GPUDirect Async Kernel-initiated (GDA-KI) transport.

Refer to https://docs.nvidia.com/nvshmem/release-notes-install-guide/best-practice-guide/performance.html for more information about transport configuration and best practices.

Refer to https://docs.nvidia.com/nvshmem/api/gen/env.html for more information about runtime configuration parameters affecting performance.

NVSHMEM performance test binaries are shipped as part of the NVSHMEM Library package. The source code is also available as part of the source distributions, which can be downloaded from https://developer.nvidia.com/nvshmem-downloads.

Profiling CPU Behavior with Nsight Systems#

Refer to Measuring Workload Performance with Hardware Performance Counters for more information.