NVIDIA STREAM Benchmark

x86 package folder structure

stream-gpu-test.sh script in the root directory of the package to invoke the stream_test executable for NVIDIA GPUs.

  • NVIDIA STREAM in the folder ./stream-gpu-linux-x86_64

    • stream_test executable: NVIDIA STREAM benchmark for GPUs with double-precision elements

    • stream_test_fp32 executable: NVIDIA STREAM benchmark for GPUs with single-precision elements

aarch64 package folder structure

stream-cpu-test.sh script in the root directory of the package to invoke the NVIDIA STREAM executable on the NVIDIA Grace CPU.

stream-gpu-test.sh script in the root directory of the package to invoke the NVIDIA STREAM executables on NVIDIA Grace Hopper and NVIDIA Grace Blackwell.

  • NVIDIA STREAM in the folder ./stream-gpu-linux-aarch64

    • stream_test executable: NVIDIA STREAM benchmark for GPUs with double-precision elements

    • stream_test_fp32 executable: NVIDIA STREAM benchmark for GPUs with single-precision elements

  • NVIDIA STREAM in the folder ./stream-cpu-linux-aarch64

    • stream_test executable: NVIDIA STREAM benchmark for the NVIDIA Grace CPU with double-precision elements
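The two packages differ mainly in which launcher scripts they ship: the x86 package has only a GPU launcher, while the aarch64 package has both a GPU and a CPU launcher. As a minimal bash sketch, assuming the architecture is read from uname -m and the caller states whether the GPU or the CPU should be tested, the mapping from node type to launcher could look like this. The function name pick_stream_script is hypothetical; the script names follow the ones used in the run instructions below.

```shell
#!/usr/bin/env bash
# Sketch only: pick_stream_script is a hypothetical helper, not part of the package.
# Maps (architecture, target) to the launcher script in the package root.
#   arch:   value of `uname -m`, e.g. x86_64 or aarch64
#   target: "gpu" or "cpu"
pick_stream_script() {
  local arch="$1" target="$2"
  case "$arch:$target" in
    x86_64:gpu)  echo "stream-gpu-test.sh" ;;  # x86 package: GPU benchmark only
    aarch64:gpu) echo "stream-gpu-test.sh" ;;  # Grace Hopper / Grace Blackwell
    aarch64:cpu) echo "stream-cpu-test.sh" ;;  # Grace CPU
    *) echo "no NVIDIA STREAM launcher for $arch/$target" >&2; return 1 ;;
  esac
}

# Usage (hypothetical), from the package root:
#   ./"$(pick_stream_script "$(uname -m)" gpu)" --n 10000000
```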

Running the NVIDIA STREAM Benchmarks on x86_64 systems with NVIDIA GPUs and on NVIDIA Grace Hopper and NVIDIA Grace Blackwell systems

The script stream-gpu-test.sh can be invoked on a command line or through a Slurm batch script to launch the NVIDIA STREAM benchmark.

The script stream-gpu-test.sh accepts the following optional parameters:

  • --d <int> GPU device number

  • --n <int> number of elements in each array (default value 1308622848)

  • --dt fp32 run the STREAM test with single-precision (fp32) elements

  • --t <string> tests to execute; any combination of:

    • C - COPY test

    • S - SCALE test

    • A - ADD test

    • T - TRIAD test

    For example, --t CST runs the COPY, SCALE, and TRIAD tests.

    Default value: CSAT (all four tests)
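Because the default array size is large, it can help to estimate how much memory a run needs and how much data each test moves. The bash sketch below assumes, as in classic STREAM, three arrays of n elements (8-byte elements for stream_test, 4-byte for stream_test_fp32) and the classic STREAM traffic count (bytes read plus bytes written per pass); it is not taken from the NVIDIA STREAM sources, and the function names are hypothetical. Under these assumptions the default n = 1308622848 needs 3 × 1308622848 × 8 = 31406948352 bytes (29.25 GiB) of GPU memory in fp64, and half that in fp32.

```shell
#!/usr/bin/env bash
# Sketch only. Assumptions (not taken from the NVIDIA STREAM sources):
#   - three arrays of n elements each, as in classic STREAM;
#   - element size 8 bytes (stream_test, fp64) or 4 bytes (stream_test_fp32);
#   - traffic counted as in classic STREAM: bytes read + bytes written per pass.

# Bytes occupied by the three arrays.
stream_footprint_bytes() {
  local n="$1" elem="$2"
  echo $(( 3 * n * elem ))
}

# Bytes moved by one pass of a single test letter from --t.
#   C: c = a          (read 1 array,  write 1)
#   S: b = q * c      (read 1 array,  write 1)
#   A: c = a + b      (read 2 arrays, write 1)
#   T: a = b + q * c  (read 2 arrays, write 1)
stream_bytes_moved() {
  local kind="$1" n="$2" elem="$3"
  case "$kind" in
    C|S) echo $(( 2 * n * elem )) ;;
    A|T) echo $(( 3 * n * elem )) ;;
    *)   echo "unknown test: $kind" >&2; return 1 ;;
  esac
}

# Succeeds only if a --t value uses nothing but the letters C, S, A, T.
valid_stream_tests() {
  [[ "$1" =~ ^[CSAT]+$ ]]
}

# Examples with the default GPU array size:
#   stream_footprint_bytes 1308622848 8   -> 31406948352 (29.25 GiB, fp64)
#   stream_footprint_bytes 1308622848 4   -> 15703474176 (14.625 GiB, fp32)
```

Under the same counting assumption, dividing stream_bytes_moved by a test's measured time gives its bandwidth in bytes per second.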

Running the NVIDIA STREAM Benchmarks on NVIDIA Grace CPU-only systems

The script stream-cpu-test.sh can be invoked on a command line or through a Slurm batch script to launch the NVIDIA STREAM benchmark.

The script stream-cpu-test.sh accepts the following optional parameters:

  • --n <int> number of elements in the arrays (default value 120000000)

  • --t <int> number of threads
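Before choosing a larger --n on a CPU-only system, one can check that the arrays fit in available host memory. A minimal bash sketch, assuming three fp64 arrays of n elements (the classic STREAM layout) and a Linux /proc/meminfo whose MemAvailable field is reported in kB; the function names are hypothetical. With the default n = 120000000 the arrays take 3 × 120000000 × 8 = 2880000000 bytes (about 2.68 GiB).

```shell
#!/usr/bin/env bash
# Sketch only. Assumes three fp64 arrays of n elements (classic STREAM layout)
# and Linux /proc/meminfo, whose MemAvailable value is given in kB (1024 bytes).

# Succeeds if 3 * n * 8 bytes fit within avail_kb kilobytes.
fits_in_host_memory() {
  local n="$1" avail_kb="$2"
  local need_bytes=$(( 3 * n * 8 ))
  (( need_bytes <= avail_kb * 1024 ))
}

# Prints the MemAvailable value (in kB) from /proc/meminfo.
mem_available_kb() {
  awk '/^MemAvailable:/ { print $2 }' /proc/meminfo
}

# Usage (hypothetical):
#   n=120000000
#   fits_in_host_memory "$n" "$(mem_available_kb)" && ./stream-cpu-test.sh --n "$n"
```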

Examples:

Run NVIDIA STREAM on GPU device 1 with 10000000 elements in each array:

srun -N 1 --ntasks-per-node=4 --cpu-bind=none --mpi=pmix \
    ./stream-gpu-test.sh --d 1 --n 10000000
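In the example above, every one of the four tasks receives --d 1. If the intent is instead one GPU per task, each task can derive its device number from SLURM_LOCALID, which srun sets to the task's node-local index. The bash sketch below assumes one task per GPU and GPUs numbered 0..N-1 on each node; the wrapper file name stream-local-gpu.sh and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, e.g. saved as stream-local-gpu.sh next to stream-gpu-test.sh.
# Assumes one Slurm task per GPU and GPUs numbered 0..N-1 on each node.

# Device number for the calling task: srun sets SLURM_LOCALID to the
# node-local task index (0, 1, 2, ...); fall back to 0 outside of srun.
device_for_task() {
  echo "${SLURM_LOCALID:-0}"
}

# Run the GPU benchmark on this task's device, forwarding other options
# (--n, --t, --dt) unchanged.
run_stream_on_local_gpu() {
  ./stream-gpu-test.sh --d "$(device_for_task)" "$@"
}

# Usage (hypothetical):
#   srun -N 1 --ntasks-per-node=4 --cpu-bind=none --mpi=pmix \
#       bash -c 'source ./stream-local-gpu.sh && run_stream_on_local_gpu --n 10000000'
```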

Run NVIDIA STREAM on the CPU with 144 threads and 10000000 elements in each array:

srun -N 1 --ntasks-per-node=1 --cpu-bind=none --mpi=pmix \
    ./stream-cpu-test.sh --t 144 --n 10000000

Note: We recommend using the default number of array elements. The values in the examples above are for demonstration only.