Run NCCL Test on Dev Pods
The NCCL test measures communication bandwidth between GPU nodes connected over InfiniBand (IB) or RoCE.
This example shows how to run the NCCL test on a set of Dev Pods.
Step 1: Create Dev Pods
The NCCL test requires 2 or more Dev Pods, each utilizing a full node. In this example, we will create 2 Dev Pods, each using 8x H100 GPUs.
Step 2: Set Up SSH Keys for Pod Communication
1. Generate an SSH key on Pod 1
On Pod 1, generate an SSH key pair and add the public key to the list of authorized keys:
mkdir -p /root/.ssh
chmod 700 /root/.ssh
ssh-keygen -t ed25519 -N "" -f /root/.ssh/id_ed25519
cat /root/.ssh/id_ed25519.pub >> /root/.ssh/authorized_keys
2. Apply the public key to Pods 2 through N
On each of Pod 2 through Pod N, prepare the .ssh directory and append the public key generated on Pod 1 to authorized_keys:
mkdir -p /root/.ssh
chmod 700 /root/.ssh
touch /root/.ssh/authorized_keys
# Then append the public key from Pod 1 (/root/.ssh/id_ed25519.pub) to
# /root/.ssh/authorized_keys on Pod 2 ... Pod N, using echo >> or any equivalent command.
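One way to write the key is sketched below. The helper name add_authorized_key is illustrative and not part of the original guide; it appends the key only when that exact line is missing, so it is safe to re-run. Replace the placeholder in the usage comment with the single line printed by cat /root/.ssh/id_ed25519.pub on Pod 1.

```shell
# Hypothetical helper: append a public key line to an authorized_keys file
# only if that exact line is not already present.
add_authorized_key() {
  key="$1"
  file="${2:-/root/.ssh/authorized_keys}"
  mkdir -p "$(dirname "$file")"
  touch "$file"
  # -x matches the whole line, -F treats the key as a fixed string.
  grep -qxF -- "$key" "$file" || printf '%s\n' "$key" >> "$file"
  chmod 600 "$file"
}

# Usage on Pod 2 ... Pod N:
# add_authorized_key "<contents of /root/.ssh/id_ed25519.pub from Pod 1>"
```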
Step 3: Get the local IP addresses for Pod 1 to Pod N
You can find a node's local IP address in the dashboard: click the node name and look on the node detail page.
Step 4: Create hostfile on Pod 1
Create a file named /tmp/hostfile.txt on Pod 1. Each line should contain the private (local) IP address of one node on which a Pod is running.
touch /tmp/hostfile.txt
vi /tmp/hostfile.txt
# IP_ADDRESS_FOR_NODE_1
# IP_ADDRESS_FOR_NODE_2
# ...
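If you prefer not to edit the file interactively, the hostfile can also be written in one command. This is a minimal sketch; the helper name write_hostfile is hypothetical, and the arguments in the usage comment are the same placeholders as above, to be replaced with the real addresses from Step 3.

```shell
# Hypothetical helper: write one IP address per line to a hostfile,
# replacing any previous contents of that file.
write_hostfile() {
  out="$1"
  shift
  printf '%s\n' "$@" > "$out"
}

# Usage on Pod 1:
# write_hostfile /tmp/hostfile.txt IP_ADDRESS_FOR_NODE_1 IP_ADDRESS_FOR_NODE_2
```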
Step 5: Download the nccl-tests script on every Pod
wget https://pub-2f78d6ca875c410392d83a29768dd4ce.r2.dev/nccl_test_pod.bash -O ./nccl_test_pod.bash
chmod +x nccl_test_pod.bash
Step 6: Run the nccl-tests script on Pod 2 to Pod N
On Pod 2 to Pod N, run the following command:
./nccl_test_pod.bash --ssh-port <ssh-port>
You can find the SSH port on the Pod detail page. It is commonly 2222 because the Pod uses a full node.
You should see "Done" printed at the end of the output.
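Before launching from Pod 1, it can help to confirm that Pod 1 can log in to every host in /tmp/hostfile.txt without a password prompt. The following is a minimal sketch and not part of the original guide: the function name check_hosts and the SSH_CMD override are hypothetical, and it assumes the launcher connects as root to each hostfile address on the given SSH port. An entry may also fail simply because no SSH server is listening on that port yet.

```shell
# Hypothetical pre-flight check: attempt a non-interactive SSH login to every
# host listed in the hostfile and report which ones fail.
check_hosts() {
  hostfile="$1"
  port="$2"
  ssh_cmd="${SSH_CMD:-ssh}"   # overridable so the loop can be dry-run
  failed=0
  while IFS= read -r host || [ -n "$host" ]; do
    # Skip blank lines and comment lines.
    case "$host" in ''|'#'*) continue ;; esac
    # -n keeps ssh from consuming the hostfile on stdin.
    if "$ssh_cmd" -n -p "$port" -o BatchMode=yes -o ConnectTimeout=5 \
         -o StrictHostKeyChecking=accept-new "root@$host" true; then
      echo "ok: $host"
    else
      echo "FAILED: $host"
      failed=1
    fi
  done < "$hostfile"
  return "$failed"
}

# Usage on Pod 1:
# check_hosts /tmp/hostfile.txt <ssh-port>
```

BatchMode=yes makes ssh fail instead of prompting, so a missing key in authorized_keys shows up as FAILED rather than as a hung prompt.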
Step 7: Run the nccl-tests script on Pod 1
Then, on Pod 1, run the following command:
./nccl_test_pod.bash --launcher --host-file /tmp/hostfile.txt --num-workers <N> --ssh-port <ssh-port>
The output will be similar to the following:
#
#                                                              out-of-place                       in-place
#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
           8             2     float     sum      -1    42.85    0.00    0.00    N/A    36.65    0.00    0.00    N/A
          16             4     float     sum      -1    36.15    0.00    0.00    N/A    36.02    0.00    0.00    N/A
          32             8     float     sum      -1    36.19    0.00    0.00    N/A    35.95    0.00    0.00    N/A
          64            16     float     sum      -1    36.22    0.00    0.00    N/A    36.30    0.00    0.00    N/A
         128            32     float     sum      -1    36.38    0.00    0.01    N/A    36.54    0.00    0.01    N/A
         256            64     float     sum      -1    62.06    0.00    0.01    N/A    36.49    0.01    0.01    N/A
         512           128     float     sum      -1    55.09    0.01    0.02    N/A    36.07    0.01    0.03    N/A
        1024           256     float     sum      -1    35.48    0.03    0.05    N/A    36.33    0.03    0.05    N/A
        2048           512     float     sum      -1    36.13    0.06    0.11    N/A    36.43    0.06    0.11    N/A
        4096          1024     float     sum      -1    36.23    0.11    0.21    N/A    36.23    0.11    0.21    N/A
        8192          2048     float     sum      -1    40.11    0.20    0.38    N/A    39.67    0.21    0.39    N/A
       16384          4096     float     sum      -1    47.88    0.34    0.64    N/A    47.94    0.34    0.64    N/A
       32768          8192     float     sum      -1    57.35    0.57    1.07    N/A    57.48    0.57    1.07    N/A
       65536         16384     float     sum      -1    63.68    1.03    1.93    N/A    63.04    1.04    1.95    N/A
      131072         32768     float     sum      -1    69.54    1.88    3.53    N/A    71.59    1.83    3.43    N/A
      262144         65536     float     sum      -1    70.81    3.70    6.94    N/A    67.62    3.88    7.27    N/A
      524288        131072     float     sum      -1    69.38    7.56   14.17    N/A    67.24    7.80   14.62    N/A
     1048576        262144     float     sum      -1    82.14   12.77   23.94    N/A    80.83   12.97   24.32    N/A
     2097152        524288     float     sum      -1    98.29   21.34   40.01    N/A    97.38   21.53   40.38    N/A
     4194304       1048576     float     sum      -1    111.6   37.57   70.44    N/A    110.5   37.94   71.14    N/A
     8388608       2097152     float     sum      -1    147.5   56.88  106.64    N/A    145.2   57.75  108.29    N/A
    16777216       4194304     float     sum      -1    192.9   86.98  163.09    N/A    193.0   86.93  162.99    N/A
    33554432       8388608     float     sum      -1    259.4  129.34  242.51    N/A    258.1  129.99  243.73    N/A
    67108864      16777216     float     sum      -1    465.2  144.27  270.51    N/A    461.3  145.47  272.76    N/A
   134217728      33554432     float     sum      -1    736.0  182.37  341.95    N/A    748.3  179.35  336.29    N/A
   268435456      67108864     float     sum      -1   1282.3  209.34  392.52    N/A   1284.0  209.06  391.98    N/A
   536870912     134217728     float     sum      -1   2338.5  229.57  430.45    N/A   2347.8  228.67  428.76    N/A
  1073741824     268435456     float     sum      -1   4483.0  239.51  449.09    N/A   4476.1  239.89  449.79    N/A
  2147483648     536870912     float     sum      -1   8768.3  244.91  459.21    N/A   8772.0  244.81  459.02    N/A
  4294967296    1073741824     float     sum      -1    17427  246.45  462.10    N/A    17365  247.33  463.75    N/A
  8589934592    2147483648     float     sum      -1    34683  247.67  464.38    N/A    34668  247.78  464.58    N/A
 17179869184    4294967296     float     sum      -1    69265  248.03  465.06    N/A    69282  247.97  464.95    N/A
# Out of bounds values : 0 OK
# Avg bus bandwidth    : 137.867
#
MPI job completed!
This result was obtained by running the NCCL tests in the nvcr.io/nvidia/cuda-dl-base:24.10-cuda12.6-devel-ubuntu22.04 container image.
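To read the table: algbw is the message size divided by the elapsed time, and busbw rescales algbw so that it can be compared against the hardware link bandwidth. For all-reduce, the nccl-tests performance notes give the factor as 2(n-1)/n for n ranks. The sketch below recomputes the last row under the assumption that this run is an all-reduce across 16 ranks (2 Pods x 8 GPUs); the function name bw_check is hypothetical.

```shell
# Hypothetical helper: recompute algbw and busbw from size, time, and rank count.
# Assumes an all-reduce, where busbw = algbw * 2 * (n - 1) / n.
bw_check() {
  awk -v s="$1" -v t="$2" -v n="$3" 'BEGIN {
    algbw = s / t / 1000            # bytes per microsecond -> GB/s (1 GB = 1e9 B)
    busbw = algbw * 2 * (n - 1) / n
    printf "algbw=%.2f GB/s busbw=%.2f GB/s\n", algbw, busbw
  }'
}

# Last out-of-place row of the table: 17179869184 bytes in 69265 us.
bw_check 17179869184 69265 16
# prints: algbw=248.03 GB/s busbw=465.06 GB/s
```

These values match the table, which supports the 16-rank all-reduce reading. The reported average bus bandwidth (137.867) is much lower than the large-message figures because it is averaged over all message sizes, including the small ones where latency dominates.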