Run NCCL Test on Dev Pods

The NCCL test is a tool for verifying the bandwidth between GPU nodes connected over InfiniBand (IB) or RoCE.

This example shows how to run the NCCL test on a set of Dev Pods.

Step 1: Create Dev Pods

The NCCL test requires two or more Dev Pods, each occupying a full node. This example uses two Dev Pods, each with 8x H100 GPUs.

Step 2: Set Up SSH Keys for Pod Communication

1. Generate an SSH key on Pod 1

On Pod 1, generate an SSH key pair and add the public key to the list of authorized keys:

mkdir -p /root/.ssh
chmod 700 /root/.ssh
ssh-keygen -t ed25519 -N "" -f /root/.ssh/id_ed25519
cat /root/.ssh/id_ed25519.pub >> /root/.ssh/authorized_keys

2. Apply the public key to Pods 2 through N

On each of Pod 2 through Pod N, prepare the .ssh directory and add the public key generated on Pod 1 to authorized_keys:

mkdir -p /root/.ssh
chmod 700 /root/.ssh
touch /root/.ssh/authorized_keys
# Then append the contents of /root/.ssh/id_ed25519.pub from Pod 1 to
# /root/.ssh/authorized_keys (for example with echo and >>) on Pod 2 ... Pod N.
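
The append step above can be sketched as a small helper that avoids duplicate entries if it is run more than once. This is a minimal sketch, not part of the original instructions: the function name add_authorized_key is hypothetical, and the key string in the usage comment is a placeholder for the real contents of /root/.ssh/id_ed25519.pub on Pod 1.

```shell
# Hypothetical helper: append a public key to an authorized_keys file
# only if that exact line is not already present.
add_authorized_key() {
  key="$1"
  auth_file="${2:-/root/.ssh/authorized_keys}"
  ssh_dir=$(dirname "$auth_file")

  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"
  touch "$auth_file"
  chmod 600 "$auth_file"

  # -q quiet, -x match the whole line, -F treat the key as a fixed string
  grep -qxF "$key" "$auth_file" || printf '%s\n' "$key" >> "$auth_file"
}

# Usage on Pod 2 ... Pod N (placeholder key, replace with the real one):
# add_authorized_key "ssh-ed25519 <PUBLIC_KEY_FROM_POD_1> root@pod-1"
```

Restricting the directory to mode 700 and the file to mode 600 matches what the OpenSSH server expects under its default strict permission checks.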

Step 3: Get the local IP addresses for Pod 1 to Pod N

You can find the local IP address of each node in the dashboard: click the node name and look for the address on the node detail page.

Step 4: Create hostfile on Pod 1

Create a file named /tmp/hostfile.txt on Pod 1. Each line should contain the private (local) IP address of one node on which a pod is running, as collected in Step 3.

touch /tmp/hostfile.txt
vi /tmp/hostfile.txt
# In the editor, enter one IP address per line:
# IP_ADDRESS_FOR_NODE_1
# IP_ADDRESS_FOR_NODE_2
# ...
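
If you prefer not to open an editor, the hostfile can also be written non-interactively. The following is a minimal sketch: write_hostfile is a hypothetical helper name, and the addresses in the usage comment are placeholders, not values from this guide.

```shell
# Hypothetical helper: write one IP address per line to a hostfile,
# replacing any previous contents.
write_hostfile() {
  out="$1"
  shift
  # Refuse to write an empty hostfile.
  [ "$#" -gt 0 ] || { echo "usage: write_hostfile FILE IP..." >&2; return 1; }
  printf '%s\n' "$@" > "$out"
}

# Usage on Pod 1 (placeholder addresses, use the IPs from Step 3):
# write_hostfile /tmp/hostfile.txt 10.0.0.11 10.0.0.12
```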

Step 5: Download the nccl-tests script on every Pod

wget https://pub-2f78d6ca875c410392d83a29768dd4ce.r2.dev/nccl_test_pod.bash -O ./nccl_test_pod.bash
chmod +x nccl_test_pod.bash
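
Before running a downloaded script, it can help to confirm that the download produced a non-empty file that parses as Bash. The sketch below is an optional sanity check and not part of the original steps; check_script is a hypothetical name. It only checks syntax with bash -n and does not verify what the script does.

```shell
# Hypothetical helper: basic sanity check on a downloaded Bash script.
check_script() {
  script="$1"
  # -s is true only if the file exists and is not empty.
  [ -s "$script" ] || { echo "missing or empty: $script" >&2; return 1; }
  # bash -n parses the script without executing any commands.
  bash -n "$script" || return 1
  echo "syntax ok: $script"
}

# Usage on every Pod:
# check_script ./nccl_test_pod.bash
```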

Step 6: Run the nccl-tests script on Pod 2 to Pod N

On Pod 2 to Pod N, run the following command:

./nccl_test_pod.bash --ssh-port <ssh-port>

Note

You can find the SSH port on the Pod detail page. It is commonly 2222, since the Pod occupies a full node.

You should see "Done" printed at the end of the output.
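
Before moving on, you may want to confirm from Pod 1 that every host in the hostfile accepts key-based SSH logins. The following is a rough sketch under several assumptions that the steps above do not state: that an SSH server is listening on the given port once Step 6 has finished, that the login user is root (inferred from the /root/.ssh paths), and that all pods use the same SSH port. The names check_hosts and SSH_BIN are hypothetical; SSH_BIN exists only so the client command can be swapped out.

```shell
# Hypothetical pre-flight check: try a key-based SSH login to each host
# listed in the hostfile and report which ones fail.
check_hosts() {
  hostfile="$1"
  port="$2"
  failed=0
  while IFS= read -r host; do
    # Skip empty lines.
    [ -n "$host" ] || continue
    # -n keeps ssh from consuming the rest of the hostfile on stdin.
    # BatchMode=yes fails instead of prompting for a password.
    if "${SSH_BIN:-ssh}" -n -p "$port" \
         -o BatchMode=yes -o ConnectTimeout=5 \
         -o StrictHostKeyChecking=accept-new \
         "root@$host" true; then
      echo "ok: $host"
    else
      echo "FAILED: $host"
      failed=1
    fi
  done < "$hostfile"
  return "$failed"
}

# Usage on Pod 1:
# check_hosts /tmp/hostfile.txt <ssh-port>
```

The accept-new setting records host keys on first contact without prompting; it requires OpenSSH 7.6 or later.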

Step 7: Run the nccl-tests script on Pod 1

Then, on Pod 1, run the following command:

./nccl_test_pod.bash --launcher --host-file /tmp/hostfile.txt --num-workers <N> --ssh-port <ssh-port>

The output will be similar to the following:

#
#                                                              out-of-place                       in-place
#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
           8             2     float     sum      -1    42.85    0.00    0.00    N/A    36.65    0.00    0.00    N/A
          16             4     float     sum      -1    36.15    0.00    0.00    N/A    36.02    0.00    0.00    N/A
          32             8     float     sum      -1    36.19    0.00    0.00    N/A    35.95    0.00    0.00    N/A
          64            16     float     sum      -1    36.22    0.00    0.00    N/A    36.30    0.00    0.00    N/A
         128            32     float     sum      -1    36.38    0.00    0.01    N/A    36.54    0.00    0.01    N/A
         256            64     float     sum      -1    62.06    0.00    0.01    N/A    36.49    0.01    0.01    N/A
         512           128     float     sum      -1    55.09    0.01    0.02    N/A    36.07    0.01    0.03    N/A
        1024           256     float     sum      -1    35.48    0.03    0.05    N/A    36.33    0.03    0.05    N/A
        2048           512     float     sum      -1    36.13    0.06    0.11    N/A    36.43    0.06    0.11    N/A
        4096          1024     float     sum      -1    36.23    0.11    0.21    N/A    36.23    0.11    0.21    N/A
        8192          2048     float     sum      -1    40.11    0.20    0.38    N/A    39.67    0.21    0.39    N/A
       16384          4096     float     sum      -1    47.88    0.34    0.64    N/A    47.94    0.34    0.64    N/A
       32768          8192     float     sum      -1    57.35    0.57    1.07    N/A    57.48    0.57    1.07    N/A
       65536         16384     float     sum      -1    63.68    1.03    1.93    N/A    63.04    1.04    1.95    N/A
      131072         32768     float     sum      -1    69.54    1.88    3.53    N/A    71.59    1.83    3.43    N/A
      262144         65536     float     sum      -1    70.81    3.70    6.94    N/A    67.62    3.88    7.27    N/A
      524288        131072     float     sum      -1    69.38    7.56   14.17    N/A    67.24    7.80   14.62    N/A
     1048576        262144     float     sum      -1    82.14   12.77   23.94    N/A    80.83   12.97   24.32    N/A
     2097152        524288     float     sum      -1    98.29   21.34   40.01    N/A    97.38   21.53   40.38    N/A
     4194304       1048576     float     sum      -1    111.6   37.57   70.44    N/A    110.5   37.94   71.14    N/A
     8388608       2097152     float     sum      -1    147.5   56.88  106.64    N/A    145.2   57.75  108.29    N/A
    16777216       4194304     float     sum      -1    192.9   86.98  163.09    N/A    193.0   86.93  162.99    N/A
    33554432       8388608     float     sum      -1    259.4  129.34  242.51    N/A    258.1  129.99  243.73    N/A
    67108864      16777216     float     sum      -1    465.2  144.27  270.51    N/A    461.3  145.47  272.76    N/A
   134217728      33554432     float     sum      -1    736.0  182.37  341.95    N/A    748.3  179.35  336.29    N/A
   268435456      67108864     float     sum      -1   1282.3  209.34  392.52    N/A   1284.0  209.06  391.98    N/A
   536870912     134217728     float     sum      -1   2338.5  229.57  430.45    N/A   2347.8  228.67  428.76    N/A
  1073741824     268435456     float     sum      -1   4483.0  239.51  449.09    N/A   4476.1  239.89  449.79    N/A
  2147483648     536870912     float     sum      -1   8768.3  244.91  459.21    N/A   8772.0  244.81  459.02    N/A
  4294967296    1073741824     float     sum      -1    17427  246.45  462.10    N/A    17365  247.33  463.75    N/A
  8589934592    2147483648     float     sum      -1    34683  247.67  464.38    N/A    34668  247.78  464.58    N/A
 17179869184    4294967296     float     sum      -1    69265  248.03  465.06    N/A    69282  247.97  464.95    N/A
# Out of bounds values : 0 OK
# Avg bus bandwidth    : 137.867
#
MPI job completed!
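
To read the table, it helps to know how the two bandwidth columns relate. In nccl-tests, algbw is the message size divided by the elapsed time, and for an all-reduce the bus bandwidth is algbw multiplied by 2(n-1)/n, where n is the number of ranks. The sketch below assumes the output above comes from the all-reduce test (the guide does not name the test, but the sum reduction and a root of -1 are consistent with it) and n = 16, that is, 2 pods with 8 GPUs each. The function name busbw_allreduce is hypothetical. It reproduces the last row of the table.

```shell
# Hypothetical helper: recompute algbw and busbw for an all-reduce row.
# $1 = message size in bytes, $2 = time in microseconds, $3 = number of ranks
busbw_allreduce() {
  awk -v size="$1" -v us="$2" -v n="$3" 'BEGIN {
    algbw = size / (us * 1e-6) / 1e9    # bytes per second, scaled to GB/s
    busbw = algbw * 2 * (n - 1) / n     # all-reduce correction factor
    printf "algbw=%.2f busbw=%.2f\n", algbw, busbw
  }'
}

# Last row of the sample output: 17179869184 bytes in 69265 us on 16 GPUs.
busbw_allreduce 17179869184 69265 16
# prints: algbw=248.03 busbw=465.06
```

The "Avg bus bandwidth" line averages over every message size, including the small ones where latency dominates, so the large-message rows are the better indicator of achievable link bandwidth.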

Note

This result was obtained by running the NCCL tests in the nvcr.io/nvidia/cuda-dl-base:24.10-cuda12.6-devel-ubuntu22.04 image.
