NCCL Performance Benchmark Job

This example walks through running an NCCL performance benchmark job on DGX Cloud Lepton, step by step.

Step 1: Create a New Job

First, create a job on the platform by opening the Create Job page.

The page offers many configuration options, such as name, resource, and image. For a more detailed guide, see the documentation for creating a job.

This example uses the following configuration:

  • Name: nccl-benchmark or any name you want.
  • Resource: Choose 8xH100 so that the benchmark uses an entire node, and set the number of workers to 2 so that it runs across both nodes. Select a node group that matches this resource shape. A dedicated node group of your own is recommended for this job.
  • Image: Choose custom image and enter nvcr.io/nvidia/pytorch:24.11-py3 as the image name. This is the 24.11 release of the NVIDIA PyTorch container.
  • Run command: We will load code from a remote GitHub repository and run the NCCL performance benchmark. Enter the command as follows:

Replace YOUR_SSH_PUBLIC_KEY and YOUR_SSH_PRIVATE_KEY in the command with your own credentials.
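
As a quick sanity check on the settings listed above, the sketch below records them in a small Python structure and derives the total number of GPUs the benchmark will span. The JobSettings class and its field names are hypothetical and exist only for illustration. They are not part of any DGX Cloud Lepton API or SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobSettings:
    """Hypothetical container for the values entered on the Create Job page."""
    name: str
    gpus_per_worker: int  # 8xH100 means 8 GPUs on each worker
    workers: int          # one worker per node
    image: str

    @property
    def total_gpus(self) -> int:
        # Each GPU normally hosts one NCCL rank, so this is also the rank count.
        return self.gpus_per_worker * self.workers


settings = JobSettings(
    name="nccl-benchmark",
    gpus_per_worker=8,
    workers=2,
    image="nvcr.io/nvidia/pytorch:24.11-py3",
)

print(f"{settings.name}: {settings.total_gpus} GPUs across {settings.workers} workers")
```

With 8 GPUs per worker and 2 workers, the benchmark spans 16 GPUs in total.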

Step 2: Run the Job

Click the Create button. The job's detail page then shows its status. The job proceeds once both replicas are in the "Ready" state, which takes a few minutes.

To check the test result, click the Logs button to view the job's logs.
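
When reading the result, it can help to know how the bandwidth figures relate to each other. The sketch below assumes the benchmark reports all-reduce results in the style of NVIDIA's nccl-tests suite, which is an assumption, since this page does not state which tool the run command invokes. In that convention, algorithm bandwidth (algbw) is the message size divided by the elapsed time, and all-reduce bus bandwidth (busbw) is algbw multiplied by 2(n-1)/n for n ranks. The function names and the input numbers are made up for illustration and are not measured results.

```python
def algorithm_bandwidth(size_bytes: float, time_us: float) -> float:
    """Algorithm bandwidth in GB/s: bytes moved divided by elapsed time."""
    seconds = time_us / 1e6
    return size_bytes / seconds / 1e9


def allreduce_bus_bandwidth(algbw: float, ranks: int) -> float:
    """Bus bandwidth for all-reduce, using the nccl-tests factor 2*(n-1)/n."""
    if ranks < 2:
        raise ValueError("all-reduce bus bandwidth needs at least 2 ranks")
    return algbw * 2 * (ranks - 1) / ranks


# Made-up inputs for illustration only, not measured results.
RANKS = 16           # 2 workers x 8 GPUs, as configured in this example
size_bytes = 8e9     # hypothetical message size
time_us = 50_000.0   # hypothetical elapsed time in microseconds

algbw = algorithm_bandwidth(size_bytes, time_us)
busbw = allreduce_bus_bandwidth(algbw, RANKS)
print(f"algbw={algbw:.1f} GB/s  busbw={busbw:.1f} GB/s")
```

Bus bandwidth is usually the figure to compare against hardware link speeds, because the 2(n-1)/n factor removes the dependence on the number of ranks.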

After the job finishes, both replicas are terminated automatically and end in the Completed state.

Copyright © 2025, NVIDIA Corporation.