Run with NeMo-Run
In this guide, you will learn how to launch NeMo AutoModel training jobs using NeMo-Run. NeMo-Run supports multiple backends including Slurm, Kubernetes, Docker, and local execution. For cloud-based training, see Run on Any Cloud with SkyPilot. For direct sbatch usage, see Run on a Cluster (Slurm). For single-node workstation usage, see Run on Your Local Workstation.
NeMo-Run is an open-source tool from NVIDIA that manages job submission across different execution backends. You define your compute configuration once in a Python file and reuse it across all your training jobs.
Before You Begin
- Install NeMo-Run (it is not bundled with AutoModel):
- Create an executor definitions file at $NEMORUN_HOME/executors.py. NEMORUN_HOME defaults to ~/.nemo_run; set the environment variable to use a different location. This file tells NeMo-Run how to reach your compute target. Every executor you reference in a YAML config must be defined here. See Executor Setup for a complete example.
- Verify connectivity to the target in your executor (e.g. SSH for Slurm, kubeconfig for Kubernetes).
- Set required environment variables (if needed by your training config):
Executor Setup
The executor: field in your YAML config is a name that maps to an entry in $NEMORUN_HOME/executors.py. This file must define a module-level EXECUTOR_MAP dictionary. NeMo-Run supports several executor types — here are examples of the most common ones:
Slurm Executor
Kubernetes Executor
Multiple Executors
You can define as many executors as you need for different backends, clusters, or resource configurations:
- Keys in EXECUTOR_MAP are names you reference in YAML (executor: slurm_dev).
- Values can be executor instances or zero-argument callables that return one.
- Override fields in the YAML (nodes, devices, container_image, etc.) are applied on top of the executor defaults.
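The lookup rules above can be sketched in plain Python. This is only an illustration of the described behavior, not NeMo-Run's or AutoModel's actual code: FakeExecutor, make_big, resolve_executor, and the slurm_big entry are hypothetical stand-ins, and the field defaults are placeholders.

```python
from dataclasses import dataclass, replace
from typing import Callable, Union


@dataclass(frozen=True)
class FakeExecutor:
    """Hypothetical stand-in for an executor; not a real NeMo-Run class."""
    nodes: int = 1
    devices: int = 8
    container_image: str = "example/image:latest"  # placeholder image name


def make_big() -> FakeExecutor:
    # Zero-argument factory: the executor is only built when it is requested.
    return FakeExecutor(nodes=4)


# Same shape as EXECUTOR_MAP: names map to instances or zero-argument callables.
EXECUTOR_MAP: dict[str, Union[FakeExecutor, Callable[[], FakeExecutor]]] = {
    "slurm_dev": FakeExecutor(),
    "slurm_big": make_big,
}


def resolve_executor(name: str, overrides: dict) -> FakeExecutor:
    """Look up an executor by name, then layer YAML overrides on top."""
    if name not in EXECUTOR_MAP:
        raise KeyError(f"executor {name!r} is not defined in EXECUTOR_MAP")
    entry = EXECUTOR_MAP[name]
    executor = entry() if callable(entry) else entry
    # Only the overridden fields change; everything else keeps its default.
    return replace(executor, **overrides)


print(resolve_executor("slurm_big", {"devices": 4}))
```

Accepting callables as values means an expensive or environment-dependent executor is constructed only for the job that names it.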
Quickstart
Any existing AutoModel YAML config can be run via NeMo-Run by adding a nemo_run: section at the top. For example, given an existing config that you run locally:
Add a nemo_run: block to submit it to a remote executor instead:
Then run the same command:
The CLI detects the nemo_run: key, strips it from the training config, loads the named executor from $NEMORUN_HOME/executors.py, and submits the job — all in one command.
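As a rough sketch of the detect-and-strip step described above, the config can be treated as a plain dictionary. The function name split_config and the training keys in the sample config are hypothetical; only the nemo_run key and its executor and nodes fields come from this guide.

```python
import copy


def split_config(config: dict) -> tuple[dict, dict]:
    """Return (launcher_settings, training_config) without mutating the input."""
    training = copy.deepcopy(config)
    # pop() removes the key, so the training config never sees launcher settings.
    launcher = training.pop("nemo_run", None)
    if launcher is None:
        raise ValueError("no nemo_run: section found in the config")
    return launcher, training


config = {
    "nemo_run": {"executor": "slurm_dev", "nodes": 2},
    "model": {"name": "example-model"},       # hypothetical training key
    "step_scheduler": {"max_steps": 100},     # hypothetical training key
}
launcher, training = split_config(config)
print(launcher["executor"])
print(sorted(training))
```

Because the launcher section is removed before training starts, the same YAML file stays valid for both local and remote runs.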
Configuration Reference
All nemo_run: Fields
Examples
Single-Node Fine-Tuning (1 x 8 GPUs)
Multi-Node Distributed Training (2 x 8 GPUs)
For multi-node jobs the launcher automatically adds --nnodes, --node-rank, --rdzv-backend, --master-addr, and --master-port to the torchrun command.
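To make the flag handling concrete, here is a minimal sketch of how such a command line could be assembled. It is not the launcher's actual implementation: the function name, the static rendezvous backend, the default port, and the train.py entry point are all assumptions chosen for illustration.

```python
def build_torchrun_args(
    script: str,
    nproc_per_node: int,
    nnodes: int = 1,
    node_rank: int = 0,
    master_addr: str = "localhost",
    master_port: int = 29500,  # assumed default port
) -> list[str]:
    """Assemble a torchrun argument list; multi-node flags appear only when nnodes > 1."""
    args = ["torchrun", f"--nproc-per-node={nproc_per_node}"]
    if nnodes > 1:
        args += [
            f"--nnodes={nnodes}",
            f"--node-rank={node_rank}",
            "--rdzv-backend=static",  # assumption: the real launcher may pick another backend
            f"--master-addr={master_addr}",
            f"--master-port={master_port}",
        ]
    args.append(script)
    return args


# Single node: only --nproc-per-node is added.
print(build_torchrun_args("train.py", nproc_per_node=8))
# Two nodes: rendezvous arguments are appended as well.
print(build_torchrun_args("train.py", nproc_per_node=8, nnodes=2, master_addr="node0"))
```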
Custom Container Image and Mounts
Local Execution (No Cluster)
Use executor: local to run on the current machine. No $NEMORUN_HOME/executors.py entry is needed:
Monitor and Manage Jobs
NeMo-Run stores experiment metadata under $NEMORUN_HOME/experiments/. Set tail_logs: true in the YAML to stream job output after submission.
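If you want to check from a script which experiments have been recorded, a small helper like the following can resolve the directory. It assumes only what is stated in this guide (the ~/.nemo_run default and the experiments/ subdirectory); the function names are hypothetical and the layout inside each experiment directory is not inspected.

```python
import os
from pathlib import Path
from typing import Mapping


def nemorun_home(env: Mapping[str, str] = os.environ) -> Path:
    """Resolve NEMORUN_HOME, falling back to ~/.nemo_run when it is unset or empty."""
    return Path(env.get("NEMORUN_HOME") or "~/.nemo_run").expanduser()


def list_experiment_dirs(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of the subdirectories under $NEMORUN_HOME/experiments/."""
    root = nemorun_home(env) / "experiments"
    if not root.is_dir():
        return []  # nothing has been submitted yet, or the location differs
    return sorted(p.name for p in root.iterdir() if p.is_dir())


print(nemorun_home())
print(list_experiment_dirs())
```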
For Slurm-based executors, standard Slurm commands also work:
For Kubernetes-based executors, use kubectl to monitor pods and jobs.
How It Works
- The automodel CLI detects the nemo_run: key and imports NemoRunLauncher.
- The nemo_run: section is popped from the config. The remaining training config is written to nemo_run_jobs/<timestamp>/job_config.yaml for record-keeping.
- The launcher loads a pre-configured executor from $NEMORUN_HOME/executors.py by name (or creates a LocalExecutor for executor: local). Override fields are applied on top of the executor defaults.
- The training config YAML is embedded in a self-contained inline bash script via a heredoc, so no separate file transfer is needed.
- A torchrun command is built with --nproc-per-node and (for multi-node) distributed rendezvous arguments.
- The script is submitted via nemo_run.Experiment. By default the call returns immediately (detach=True).
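The heredoc step is the least obvious part of this flow, so here is a minimal sketch of how a config could be embedded in an inline bash script. This is an illustration under assumptions, not the launcher's real code: the function name, the delimiter, the /tmp/job_config.yaml path, and the train.py --config invocation are all hypothetical.

```python
import shlex


def build_inline_script(
    config_yaml: str,
    command: list[str],
    config_path: str = "/tmp/job_config.yaml",  # hypothetical path on the worker
) -> str:
    """Return a bash script that recreates the config file, then runs the command."""
    delimiter = "AUTOMODEL_CONFIG_EOF"  # hypothetical heredoc delimiter
    if delimiter in config_yaml:
        raise ValueError("config text collides with the heredoc delimiter")
    lines = [
        "#!/bin/bash",
        "set -euo pipefail",
        # Quoting the delimiter stops bash from expanding $VARS inside the YAML.
        f"cat > {shlex.quote(config_path)} <<'{delimiter}'",
        config_yaml.rstrip("\n"),
        delimiter,
        shlex.join(command),
    ]
    return "\n".join(lines) + "\n"


script = build_inline_script(
    "model:\n  name: example-model\n",  # hypothetical training config
    ["torchrun", "--nproc-per-node=8", "train.py", "--config", "/tmp/job_config.yaml"],
)
print(script)
```

Because the config travels inside the script text, the remote side needs nothing beyond the script itself to reproduce the run.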
Customize Configuration
Override any training parameter from the command line, same as with local runs: