NVIDIA STREAM Benchmark#
x86 package folder structure#
stream-gpu-test.sh script in the root directory of the package to invoke the stream_test executable for NVIDIA GPUs.
NVIDIA STREAM in the folder ./stream-gpu-linux-x86_64
stream_test executable. NVIDIA STREAM benchmark for GPU with double precision elements
stream_test_fp32 executable. NVIDIA STREAM benchmark for GPU with single precision elements
aarch64 package folder structure#
stream-cpu-test.sh script in the root directory of the package to invoke the NVIDIA STREAM executable for NVIDIA Grace CPU.
stream-gpu-test.sh script in the root directory of the package to invoke the NVIDIA STREAM executable for NVIDIA Grace Hopper and NVIDIA Grace Blackwell.
NVIDIA STREAM in the folder ./stream-gpu-linux-aarch64
stream_test executable. NVIDIA STREAM benchmark for GPU with double precision elements
stream_test_fp32 executable. NVIDIA STREAM benchmark for GPU with single precision elements
NVIDIA STREAM in the folder ./stream-cpu-linux-aarch64
stream_test executable. NVIDIA STREAM benchmark for NVIDIA Grace CPU with double precision elements
Running the NVIDIA STREAM Benchmarks on x86_64 with NVIDIA GPUs, NVIDIA Grace Hopper, and NVIDIA Grace Blackwell systems#
The script stream-gpu-test.sh can be invoked on a command line or through a Slurm batch script to launch the NVIDIA STREAM benchmark.
The script stream-gpu-test.sh accepts the following optional parameters:
--d <int> device number
--n <int> number of elements in the arrays (default value 1308622848)
--dt fp32 enable fp32 stream test
--t <string> tests which will be executed, can be any combination of:
C - COPY test
S - SCALE test
A - ADD test
T - TRIAD test
For example, --t CST means that the COPY, SCALE, and TRIAD tests will be executed. Default value: CSAT
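The letter codes accepted by --t can be illustrated with a small shell helper. This is only a sketch: expand_tests is a hypothetical function, not part of the package, and it assumes the letters are uppercase and may appear in any combination, as described above. It maps a selection string to the test names it stands for and rejects letters outside C, S, A, and T.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the package): expand a --t selection
# such as "CST" into the test names it selects.
expand_tests() {
    local selection="$1" names=() letter i
    if [ -z "$selection" ]; then
        echo "empty test selection" >&2
        return 1
    fi
    for (( i = 0; i < ${#selection}; i++ )); do
        letter="${selection:i:1}"
        case "$letter" in
            C) names+=("COPY") ;;
            S) names+=("SCALE") ;;
            A) names+=("ADD") ;;
            T) names+=("TRIAD") ;;
            *) echo "unknown test letter: $letter" >&2; return 1 ;;
        esac
    done
    echo "${names[*]}"
}

expand_tests CST     # prints: COPY SCALE TRIAD
expand_tests CSAT    # prints: COPY SCALE ADD TRIAD (the default selection)
```

A check like this could run before launching a job, so that a mistyped selection fails immediately rather than after the job has been scheduled.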
Running the NVIDIA STREAM Benchmarks on NVIDIA Grace CPU only systems#
The script stream-cpu-test.sh can be invoked on a command line or through a Slurm batch script to launch the NVIDIA STREAM benchmark.
The script stream-cpu-test.sh accepts the following optional parameters:
--n <int> number of elements in the arrays (default value 120000000)
--t <int> number of threads
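To get a feel for what a given --n value means in memory terms, the sketch below estimates the total size of the benchmark arrays. It rests on an assumption that is not stated in this document: that NVIDIA STREAM, like the classic STREAM benchmark, works on three arrays of --n elements each. The function name stream_footprint_bytes is hypothetical; element sizes are 8 bytes for double precision and 4 bytes for fp32.

```shell
#!/usr/bin/env bash
# Assumption: three arrays of n elements each, as in the classic STREAM
# benchmark. This layout is not confirmed for NVIDIA STREAM.
stream_footprint_bytes() {
    local n="$1" elem_bytes="$2"
    echo $(( 3 * n * elem_bytes ))
}

stream_footprint_bytes 120000000 8     # CPU default --n, double precision: 2880000000
stream_footprint_bytes 1308622848 8    # GPU default --n, double precision: 31406948352
stream_footprint_bytes 1308622848 4    # GPU default --n with --dt fp32:    15703474176
```

Under that assumption, the GPU default works out to about 29.25 GiB in double precision, so the target device would need at least that much free memory.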
Examples:
Run NVIDIA STREAM for GPU on device 1 with 10000000 elements in the arrays:
    srun -N 1 --ntasks-per-node=4 --cpu-bind=none --mpi=pmix \
        ./stream-gpu-test.sh --d 1 --n 10000000
Run NVIDIA STREAM for CPU on 144 threads with 10000000 elements in the arrays:
    srun -N 1 --ntasks-per-node=1 --cpu-bind=none --mpi=pmix \
        ./stream-cpu-test.sh --t 144 --n 10000000
Note: it is recommended to use the default number of elements in the arrays. The value above is for demonstration only.
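The scripts can also be launched through a Slurm batch script. The snippet below is a minimal sketch that writes such a batch file. The file name stream-gpu.sbatch, the job name, the time limit, the output pattern, and the choice of device 0 are all assumptions for illustration, and most clusters will need additional site-specific directives such as a partition, an account, or a GPU resource request. The array size is left at its default.

```shell
#!/usr/bin/env bash
# Write a minimal Slurm batch file for the GPU benchmark.
# All #SBATCH values below are illustrative assumptions; adjust them per site.
cat > stream-gpu.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=nvidia-stream-gpu
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:10:00
#SBATCH --output=stream-gpu-%j.out

# Assumption: the job is submitted from the package root directory,
# so the launcher script is found at ./stream-gpu-test.sh.
# Device 0 is an assumed device number; --n is left at its default.
srun --cpu-bind=none --mpi=pmix ./stream-gpu-test.sh --d 0
EOF

echo "wrote stream-gpu.sbatch"
```

The generated file would then be submitted with sbatch stream-gpu.sbatch from the package root. The same pattern should carry over to stream-cpu-test.sh by swapping the script name and passing --t with the desired thread count.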