Environment Variables Reference
This page lists all the environment variables that can be used on DGX Cloud Lepton.
Basic Environment Variables
When a batch job is launched on DGX Cloud Lepton, the following environment variables are set automatically, with values that reflect the job configuration.
| Environment Variable Name | Description | Sample Value |
|---|---|---|
| LEPTON_RESOURCE_ACCELERATOR_NUM | Number of hardware accelerators allocated | 1 |
| LEPTON_JOB_WORKER_HOSTNAME_PREFIX | Prefix used for naming worker hostnames | worker |
| LEPTON_WORKSPACE_ID | Identifier for the current workspace | prod01awsuswest |
| LEPTON_RESOURCE_ACCELERATOR_TYPE | Type of hardware accelerator used | NVIDIA-A100-80GB |
| LEPTON_WORKER_ID | Unique identifier for the current worker | env-job-98bw-0-2nm7s |
| LEPTON_JOB_FAILURE_COUNT | Number of failed job attempts | 0 |
| LEPTON_JOB_TOTAL_WORKERS | Total number of workers assigned to the job | 1 |
| LEPTON_JOB_WORKER_INDEX | Index of the current worker within the job | 0 |
| LEPTON_SUBDOMAIN | Subdomain name assigned to the job service | env-job-98bw-job-svc |
| LEPTON_JOB_SERVICE_PREFIX | Prefix used for naming services related to the job | env-job-98bw |
| LEPTON_JOB_WORKER_PREFIX | Prefix used for naming services | worker |
| LEPTON_JOB_NAME | Name assigned to the job | env-job-98bw |
| LEPTON_VIRTUAL_ENV | Path to the Python virtual environment | /opt/lepton/venv |
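As a rough illustration of how a job script might read these variables, the sketch below prints a one-line summary of the current worker. The function name `lepton_worker_summary` is hypothetical, and the `unknown` fallback is an assumption so the snippet also runs outside a job, where the variables are not set.

```shell
# Sketch: summarize the current worker from the Lepton-provided variables.
# lepton_worker_summary is a hypothetical helper, not part of DGX Cloud Lepton.
lepton_worker_summary() {
  # Each variable falls back to "unknown" when it is unset or empty.
  printf 'job=%s worker=%s/%s accelerator=%sx%s\n' \
    "${LEPTON_JOB_NAME:-unknown}" \
    "${LEPTON_JOB_WORKER_INDEX:-unknown}" \
    "${LEPTON_JOB_TOTAL_WORKERS:-unknown}" \
    "${LEPTON_RESOURCE_ACCELERATOR_NUM:-unknown}" \
    "${LEPTON_RESOURCE_ACCELERATOR_TYPE:-unknown}"
}

lepton_worker_summary
```

With the sample values from the table, the summary would read `job=env-job-98bw worker=0/1 accelerator=1xNVIDIA-A100-80GB`.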
Reference for torch.distributed.launch
You can derive the environment variables that distributed training frameworks expect from the DGX Cloud Lepton environment variables listed above.
For example, if you use torch.distributed.launch, the required environment variables can be constructed as follows:
| Environment Variable Name | Meaning | Construction Method |
|---|---|---|
| MASTER_ADDR | Address of the master node for distributed training | ${LEPTON_JOB_WORKER_PREFIX}-0.${LEPTON_SUBDOMAIN} |
| MASTER_PORT | Port for master node communication | 29400 |
| WORLD_SIZE | Total number of workers assigned to the job | ${LEPTON_JOB_TOTAL_WORKERS} |
| WORKER_ADDRS | Comma-separated addresses of the worker nodes other than the master | $(seq 1 $((LEPTON_JOB_TOTAL_WORKERS - 1)) \| xargs -I {} echo ${LEPTON_JOB_WORKER_PREFIX}-{}.${LEPTON_SUBDOMAIN} \| paste -sd ',' -) |
| NODE_RANK | Rank of the current worker node in the distributed setup | ${LEPTON_JOB_WORKER_INDEX} |
Here is an example of a short script that sets up the environment variables for a job:
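A minimal sketch of such a script, following the construction methods in the table above, might look like this. The helper name `setup_torch_env` is hypothetical, and the plain loop used for `WORKER_ADDRS` is a stand-in for the `seq`/`xargs`/`paste` pipeline that should yield the same comma-separated list.

```shell
# Sketch: derive torch.distributed.launch variables from the Lepton job
# variables. setup_torch_env is a hypothetical helper name, not part of
# DGX Cloud Lepton.
setup_torch_env() {
  # Abort with a clear message if a required Lepton variable is missing.
  : "${LEPTON_JOB_WORKER_PREFIX:?LEPTON_JOB_WORKER_PREFIX is not set}"
  : "${LEPTON_SUBDOMAIN:?LEPTON_SUBDOMAIN is not set}"
  : "${LEPTON_JOB_TOTAL_WORKERS:?LEPTON_JOB_TOTAL_WORKERS is not set}"
  : "${LEPTON_JOB_WORKER_INDEX:?LEPTON_JOB_WORKER_INDEX is not set}"

  # Worker 0 acts as the master node.
  export MASTER_ADDR="${LEPTON_JOB_WORKER_PREFIX}-0.${LEPTON_SUBDOMAIN}"
  export MASTER_PORT=29400
  export WORLD_SIZE="${LEPTON_JOB_TOTAL_WORKERS}"
  export NODE_RANK="${LEPTON_JOB_WORKER_INDEX}"

  # Comma-separated addresses of workers 1..N-1 (empty for a single worker).
  WORKER_ADDRS=""
  i=1
  while [ "$i" -lt "$LEPTON_JOB_TOTAL_WORKERS" ]; do
    WORKER_ADDRS="${WORKER_ADDRS:+${WORKER_ADDRS},}${LEPTON_JOB_WORKER_PREFIX}-${i}.${LEPTON_SUBDOMAIN}"
    i=$((i + 1))
  done
  export WORKER_ADDRS
}

# Inside a job the LEPTON_* variables are present, so run the setup;
# outside a job this step is skipped.
if [ -n "${LEPTON_JOB_TOTAL_WORKERS:-}" ]; then
  setup_torch_env
  echo "MASTER_ADDR=${MASTER_ADDR} NODE_RANK=${NODE_RANK} WORLD_SIZE=${WORLD_SIZE}"
fi
```

Once exported, the values can be passed to the launcher, for instance as the `--nnodes`, `--node_rank`, `--master_addr`, and `--master_port` arguments of `torch.distributed.launch`, followed by your own training script.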