Templates
DGX Cloud Lepton provides two templates to help you get started quickly:
- MPI - For distributed computing with MPI
- Torchrun - For PyTorch distributed training
To use these templates, select the Template option at the top of the create job page.

MPI Template
To use the MPI template, follow the steps below:
- Go to the Batch Jobs tab and click Create Job.
- Select MPI at the top of the create job page.
- Fill in the MPI Command, job name, and other configurations.
  - MPI Command: The template provides a default command. You can customize it using the predefined environment variables provided by the MPI template.
    Note: In this documentation, we use the term 'primary' instead of 'master' to align with modern terminology. The UI, commands, and environment variables may still reference 'master'.
    The MPI template generates a bash script that sets up the MPI runtime environment. The following environment variables are automatically available in your job:
    - MASTER_ADDR: Hostname of the primary node (first node)
    - MASTER_IP: IP address of the primary node
    - THIS_ADDR: Hostname of the current node
    - HOST_ADDRS: Comma-separated list of all node hostnames
    - HOST_IPS: Comma-separated list of all node IP addresses
    - NNODES: Total workers for the current job
    - NODE_RANK: Rank of the current node among all workers
    - NGPUS: Number of GPUs on the current node
    - HOSTFILE: Path to the file containing all node IPs (default: "/tmp/hostfile.txt")
  - Job Name: Enter a descriptive name like mpi-job or your preferred identifier.
  - Other Configurations: Use the default settings or customize as needed. See batch job configurations for detailed options.
- Click Create to launch your job.
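The environment variables listed above can be combined in a small helper script inside the job. The following is a minimal sketch, not the template's actual default command. It assumes Open MPI's mpirun (with its -np and --hostfile options), one process per GPU, and the same GPU count on every node. The function names and train.py are hypothetical.

```python
import os
import shlex


def describe_mpi_job(env=None):
    """Summarize the MPI template variables for the current node."""
    env = os.environ if env is None else env
    nnodes = int(env["NNODES"])
    ngpus = int(env["NGPUS"])
    # HOST_IPS is documented as a comma-separated list of node IP addresses.
    host_ips = [ip for ip in env["HOST_IPS"].split(",") if ip]
    return {
        # The current node is the primary when both hostnames match.
        "is_primary": env["THIS_ADDR"] == env["MASTER_ADDR"],
        "host_ips": host_ips,
        # Assumption: one MPI process per GPU, same GPU count on every node.
        "total_procs": nnodes * ngpus,
    }


def build_mpirun_command(env, program):
    """Build an mpirun argument list (assumes Open MPI's -np and --hostfile)."""
    info = describe_mpi_job(env)
    hostfile = env.get("HOSTFILE", "/tmp/hostfile.txt")
    return [
        "mpirun",
        "-np", str(info["total_procs"]),
        "--hostfile", hostfile,
        *shlex.split(program),
    ]


if __name__ == "__main__":
    # Sample values for illustration only; a real job reads os.environ.
    sample_env = {
        "MASTER_ADDR": "node-0",
        "THIS_ADDR": "node-0",
        "HOST_IPS": "10.0.0.1,10.0.0.2",
        "NNODES": "2",
        "NGPUS": "8",
    }
    print(shlex.join(build_mpirun_command(sample_env, "python train.py")))
```

Passing the environment as a dictionary keeps the helper easy to check outside a running job. Inside a job, call build_mpirun_command(os.environ, ...) instead.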
Torchrun Template
The Torchrun template streamlines PyTorch distributed training. Follow these steps:
- Go to the Batch Jobs tab and click Create Job.
- Select Torchrun at the top of the create job page.
- Fill in the Torchrun Command, job name, and other configurations.
  - Torchrun Command: The template provides a default command. You can customize it using the predefined environment variables.
    Note: In this documentation, we use the term 'primary' instead of 'master' to align with modern terminology. The UI, commands, and environment variables may still reference 'master'.
    The Torchrun template generates a bash script that predefines environment variables for your job:
    - PET_MASTER_ADDR: Hostname of the primary node (first node)
    - MASTER_IP: IP address of the primary node
    - THIS_ADDR: Hostname of the current node
    - PET_NNODES: Total number of workers for the current job
    - PET_NODE_RANK: Rank of the current node among all workers
    - PET_NPROC_PER_NODE: Number of GPUs on the current node
    Environment variables prefixed with PET_ are automatically parsed as corresponding torchrun parameters. For example, PET_MASTER_ADDR becomes --master_addr. You can override these by explicitly setting parameters in your torchrun command.
  - Job Name: Enter a descriptive name like torchrun-job or your preferred identifier.
  - Other Configurations: Use the default settings or customize as needed. Refer to batch job configurations for detailed options.
- Click Create to launch your job.
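The PET_ naming rule described above can be illustrated with a short sketch. This is not torchrun's own parser. It only models the documented behavior: a PET_ variable corresponds to the lowercase flag of the same name, and a flag set explicitly on the command line takes precedence. The function names are hypothetical.

```python
def pet_to_flag(name):
    """Map a PET_-prefixed variable name to its torchrun flag name."""
    prefix = "PET_"
    if not name.startswith(prefix):
        raise ValueError(f"{name!r} does not start with {prefix!r}")
    return "--" + name[len(prefix):].lower()


def effective_torchrun_args(env, explicit=None):
    """Model which values apply: PET_ defaults first, explicit flags override."""
    # Only PET_-prefixed variables are treated as torchrun parameters.
    args = {pet_to_flag(k): v for k, v in env.items() if k.startswith("PET_")}
    # Flags written on the torchrun command line take precedence.
    args.update(explicit or {})
    return args


if __name__ == "__main__":
    # Sample values for illustration only.
    sample_env = {
        "PET_MASTER_ADDR": "node-0",
        "PET_NNODES": "2",
        "PET_NODE_RANK": "0",
        "PET_NPROC_PER_NODE": "8",
        "MASTER_IP": "10.0.0.1",  # no PET_ prefix, so it is not mapped
    }
    print(effective_torchrun_args(sample_env, {"--nproc_per_node": "4"}))
```

In this model, MASTER_IP and THIS_ADDR are left out of the result because they lack the PET_ prefix. They remain available to your command as ordinary environment variables.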