Run with NeMo-Run

In this guide, you will learn how to launch NeMo AutoModel training jobs using NeMo-Run. NeMo-Run supports multiple backends including Slurm, Kubernetes, Docker, and local execution. For cloud-based training, see Run on Any Cloud with SkyPilot. For direct sbatch usage, see Run on a Cluster (Slurm). For single-node workstation usage, see Run on Your Local Workstation.

NeMo-Run is an open-source tool from NVIDIA that manages job submission across different execution backends. You define your compute configuration once in a Python file and reuse it across all your training jobs.

Before You Begin

Install NeMo-Run (it is not bundled with AutoModel):

$ pip install nemo-run

Create an executor definitions file at $NEMORUN_HOME/executors.py. NEMORUN_HOME defaults to ~/.nemo_run; set the environment variable to use a different location. This file tells NeMo-Run how to reach your compute target. Every executor you reference in a YAML config must be defined here. See Executor Setup for a complete example.
Verify that you can reach the compute target named in your executor (for example, SSH access to the login node for Slurm, or a working kubeconfig for Kubernetes).
Set required environment variables (if needed by your training config):

$ export HF_TOKEN=hf_...          # Required for gated models (e.g. Llama)
$ export WANDB_API_KEY=...        # Optional: Weights & Biases logging
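The location rule for the executor definitions file can be sketched in a few lines of Python. This is only an illustration of the rule stated above (use NEMORUN_HOME when it is set, otherwise ~/.nemo_run); the helper name `default_executors_file` is hypothetical and is not part of NeMo-Run or AutoModel.

```python
import os
from pathlib import Path


def default_executors_file(env=None):
    """Return the expected path of executors.py (hypothetical helper).

    Uses NEMORUN_HOME when it is set and non-empty, otherwise ~/.nemo_run.
    """
    env = os.environ if env is None else env
    home = env.get("NEMORUN_HOME") or str(Path.home() / ".nemo_run")
    return Path(home).expanduser() / "executors.py"


if __name__ == "__main__":
    path = default_executors_file()
    status = "found" if path.is_file() else "missing"
    print(f"{path}: {status}")
```

Running the snippet before you submit a job is a quick way to confirm that the file sits where you expect it to be.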

Executor Setup

The executor: field in your YAML config names an entry in $NEMORUN_HOME/executors.py, which must define a module-level EXECUTOR_MAP dictionary. NeMo-Run supports several executor types; the examples below cover the most common ones:

Slurm Executor

import nemo_run as run

def my_slurm_cluster():
    executor = run.SlurmExecutor(
        account="my_account",
        partition="batch",
        tunnel=run.SSHTunnel(
            user="myuser",
            host="login-node.example.com",
            job_dir="/remote/path/nemo_run/jobs",
        ),
        nodes=1,
        ntasks_per_node=8,
        gpus_per_node=8,
        mem="0",
        exclusive=True,
        packager=run.Packager(),
    )
    executor.container_image = "nvcr.io/nvidia/nemo-automodel:26.02"
    executor.container_mounts = ["/data:/data", "/checkpoints:/checkpoints"]
    executor.env_vars = {"HF_HOME": "/data/hf_cache"}
    executor.time = "04:00:00"
    return executor

EXECUTOR_MAP = {
    "my_slurm": my_slurm_cluster(),
}

Kubernetes Executor

import nemo_run as run

def my_k8s_cluster():
    return run.KubeflowExecutor(
        namespace="training",
        image="nvcr.io/nvidia/nemo-automodel:26.02",
        num_nodes=1,
        nprocs_per_node=8,
        gpus_per_node=8,
    )

EXECUTOR_MAP = {
    "my_k8s": my_k8s_cluster(),
}

Multiple Executors

You can define as many executors as you need for different backends, clusters, or resource configurations:

EXECUTOR_MAP = {
    "slurm_dev": my_slurm_dev(),
    "slurm_prod": my_slurm_prod(),
    "k8s": my_k8s_cluster(),
}

Keys in EXECUTOR_MAP are names you reference in YAML (executor: slurm_dev).
Values can be executor instances or zero-argument callables that return one.
Override fields in the YAML (nodes, ntasks_per_node, container_image, etc.) are applied on top of the executor defaults.
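The rule that a map value may be either an instance or a zero-argument callable can be illustrated with a small lookup function. This is a sketch, not the launcher's actual code: `resolve_executor` and `FakeExecutor` are hypothetical names, and the sketch assumes that executor instances are not themselves callable.

```python
class FakeExecutor:
    """Stand-in for a real executor object; for illustration only."""

    def __init__(self, nodes=1):
        self.nodes = nodes


def resolve_executor(executor_map, name):
    """Look up `name` and return an executor instance (hypothetical helper)."""
    if name not in executor_map:
        known = ", ".join(sorted(executor_map)) or "<none>"
        raise KeyError(f"Unknown executor {name!r}; defined executors: {known}")
    entry = executor_map[name]
    # Entries may be ready-made instances or zero-argument factories.
    return entry() if callable(entry) else entry


executor_map = {
    "dev": FakeExecutor(nodes=1),           # instance, built at import time
    "prod": lambda: FakeExecutor(nodes=4),  # callable, built on first lookup
}

print(resolve_executor(executor_map, "dev").nodes)
print(resolve_executor(executor_map, "prod").nodes)
```

A callable entry defers construction until that executor is requested, which can be useful when building an executor has side effects or depends on values that are only available at submission time.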

Quickstart

Any existing AutoModel YAML config can be run via NeMo-Run by adding a nemo_run: section at the top. For example, given an existing config that you run locally:

$ automodel examples/llm_finetune/qwen/qwen3_moe_30b_te_packed_sequence.yaml

Add a nemo_run: block to submit it to a remote executor instead:

# -- Add this section to any existing config ----------------------------------
nemo_run:
  executor: my_slurm             # Name from EXECUTOR_MAP in $NEMORUN_HOME/executors.py
  container_image: /images/custom.sqsh  # Override executor's default image
  nodes: 1                       # Override number of nodes
  ntasks_per_node: 8             # GPUs per node
  time: "04:00:00"               # Override time limit
  job_name: qwen3_moe_finetune   # Experiment and job name

# -- Everything below is your existing training config (unchanged) ------------
recipe: TrainFinetuneRecipeForNextTokenPrediction

step_scheduler:
  global_batch_size: 32
  # ... rest of your config ...

Then run the same command:

$ automodel your_config.yaml

The CLI detects the nemo_run: key, strips it from the training config, loads the named executor from $NEMORUN_HOME/executors.py, and submits the job — all in one command.
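The "detect and strip" step can be pictured with plain dictionaries. The sketch below is an assumption about the general shape of that step, not the CLI's real implementation; `split_nemo_run_section` is a hypothetical name, and the config is written as an already-parsed dict so the example needs no YAML library.

```python
import copy


def split_nemo_run_section(config):
    """Return (nemo_run_options, training_config) without mutating `config`.

    `nemo_run_options` is None when the config has no nemo_run: section.
    """
    training_config = copy.deepcopy(config)
    nemo_run_options = training_config.pop("nemo_run", None)
    return nemo_run_options, training_config


config = {
    "nemo_run": {"executor": "my_slurm", "nodes": 1, "ntasks_per_node": 8},
    "recipe": "TrainFinetuneRecipeForNextTokenPrediction",
    "step_scheduler": {"global_batch_size": 32},
}

options, training = split_nemo_run_section(config)
print(options["executor"])
print(sorted(training))
```

The training side never sees the nemo_run: keys, which is why the same YAML file keeps working unchanged once the block is removed.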

Configuration Reference

All `nemo_run:` Fields

| Field | Default | Description |
| --- | --- | --- |
| `executor` | `"local"` | Name from `EXECUTOR_MAP` in `$NEMORUN_HOME/executors.py`, or `"local"` for local execution |
| `job_name` | `<recipe_class_name>` | Experiment and job name |
| `detach` | `true` | Return immediately after submission |
| `tail_logs` | `false` | Stream logs after submission |
| `executors_file` | `$NEMORUN_HOME/executors.py` | Path to the executor definitions file |
| `job_dir` | `./nemo_run_jobs` | Local directory for job artifacts (config snapshot) |
| (any other key) | (from executor) | Applied to the executor via `setattr`, using the executor's native attribute names (e.g. `nodes`, `ntasks_per_node`, `partition`, `container_image`, `time`, `env_vars`). Dict values are merged into the executor's existing dict, and list values extend the existing list rather than replacing it. |
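The override behaviour described in the last row (plain values set with `setattr`, dicts merged, lists extended) might look roughly like the sketch below. It is not the launcher's code: `apply_overrides` and `LAUNCHER_KEYS` are hypothetical names, `SimpleNamespace` stands in for a real executor, and letting the YAML value win when a dict key exists on both sides is an assumption.

```python
from types import SimpleNamespace

# Keys consumed by the launcher itself (from the table above), not the executor.
LAUNCHER_KEYS = {"executor", "job_name", "detach", "tail_logs",
                 "executors_file", "job_dir"}


def apply_overrides(executor, options):
    """Apply non-launcher keys from the nemo_run: section to `executor`."""
    for key, value in options.items():
        if key in LAUNCHER_KEYS:
            continue
        current = getattr(executor, key, None)
        if isinstance(current, dict) and isinstance(value, dict):
            # Merge; on a key clash the YAML value wins (assumption).
            setattr(executor, key, {**current, **value})
        elif isinstance(current, list) and isinstance(value, list):
            # Extend the executor's list instead of replacing it.
            setattr(executor, key, [*current, *value])
        else:
            setattr(executor, key, value)
    return executor


executor = SimpleNamespace(
    nodes=1,
    env_vars={"HF_HOME": "/data/hf_cache"},
    container_mounts=["/data:/data"],
)
apply_overrides(executor, {
    "executor": "my_slurm",
    "nodes": 2,
    "env_vars": {"NCCL_DEBUG": "INFO"},
    "container_mounts": ["/scratch/checkpoints:/checkpoints"],
})
print(executor)
```

One practical consequence of merge-and-extend semantics is that a YAML override can add a mount or an environment variable without restating everything already defined in executors.py.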

Examples

Single-Node Fine-Tuning (1 x 8 GPUs)

nemo_run:
  executor: my_slurm
  nodes: 1
  ntasks_per_node: 8
  job_name: single_node_finetune

Multi-Node Distributed Training (2 x 8 GPUs)

nemo_run:
  executor: my_slurm
  nodes: 2
  ntasks_per_node: 8
  time: "08:00:00"
  job_name: multinode_pretrain

For multi-node jobs the launcher automatically adds --nnodes, --node-rank, --rdzv-backend, --master-addr, and --master-port to the torchrun command.
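To make the difference between the single-node and multi-node command lines concrete, here is a minimal sketch that assembles the torchrun arguments named above. The function name, the placeholder entrypoint `train.py`, and the default rendezvous values (`static`, `127.0.0.1`, `29500`) are assumptions for illustration; the values the launcher actually passes may differ.

```python
def build_torchrun_args(entrypoint, nproc_per_node, nnodes=1, node_rank=0,
                        rdzv_backend="static", master_addr="127.0.0.1",
                        master_port=29500):
    """Return a torchrun argument list (illustrative sketch only)."""
    args = ["torchrun", f"--nproc-per-node={nproc_per_node}"]
    if nnodes > 1:
        # Rendezvous arguments are only added for multi-node jobs.
        args += [
            f"--nnodes={nnodes}",
            f"--node-rank={node_rank}",
            f"--rdzv-backend={rdzv_backend}",
            f"--master-addr={master_addr}",
            f"--master-port={master_port}",
        ]
    return args + list(entrypoint)


# "train.py" is a placeholder, not the real AutoModel entrypoint.
single = build_torchrun_args(["train.py"], nproc_per_node=8)
multi = build_torchrun_args(["train.py"], nproc_per_node=8, nnodes=2,
                            node_rank=1, master_addr="node-0")

print(" ".join(single))
print(" ".join(multi))
print("world size:", 2 * 8)  # nodes x ntasks_per_node
```

The total number of training processes is nodes multiplied by ntasks_per_node, so the 2 x 8 example above starts 16 ranks.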

Custom Container Image and Mounts

nemo_run:
  executor: my_slurm
  container_image: /images/automodel_nightly.sqsh
  container_mounts:
    - /scratch/datasets:/datasets
    - /scratch/checkpoints:/checkpoints
  env_vars:
    HF_HOME: /datasets/hf_cache
    NCCL_DEBUG: INFO

Local Execution (No Cluster)

Use executor: local to run on the current machine. No $NEMORUN_HOME/executors.py entry is needed:

nemo_run:
  executor: local
  ntasks_per_node: 2
  job_name: local_test

Monitor and Manage Jobs

NeMo-Run stores experiment metadata under $NEMORUN_HOME/experiments/. Set tail_logs: true in the YAML to stream job output after submission.
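If you want a quick look at what has been recorded, a short script can list the entries in that directory. The sketch below assumes only that each experiment gets its own subdirectory under experiments/; it makes no assumption about the files inside, and `list_experiment_dirs` is a hypothetical helper rather than a NeMo-Run API.

```python
import os
from pathlib import Path


def list_experiment_dirs(nemorun_home=None):
    """Return experiment directories, most recently modified first."""
    home = Path(
        nemorun_home
        or os.environ.get("NEMORUN_HOME")
        or Path.home() / ".nemo_run"
    ).expanduser()
    experiments = home / "experiments"
    if not experiments.is_dir():
        return []
    entries = [p for p in experiments.iterdir() if p.is_dir()]
    return sorted(entries, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for path in list_experiment_dirs():
        print(path.name)
```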

For Slurm-based executors, standard Slurm commands also work:

$ squeue -u $USER                 # List your queued and running jobs
$ scancel <job_id>                # Cancel a running or pending job
$ sacct -j <job_id>               # View job accounting information

For Kubernetes-based executors, use kubectl to monitor pods and jobs.

How It Works

1. The automodel CLI detects the nemo_run: key and imports NemoRunLauncher.
2. The nemo_run: section is popped from the config. The remaining training config is written to nemo_run_jobs/<timestamp>/job_config.yaml for record-keeping.
3. The launcher loads a pre-configured executor from $NEMORUN_HOME/executors.py by name (or creates a LocalExecutor for executor: local). Override fields are applied on top of the executor defaults.
4. The training config YAML is embedded in a self-contained inline bash script via a heredoc, so no separate file transfer is needed.
5. A torchrun command is built with --nproc-per-node and (for multi-node) distributed rendezvous arguments.
6. The script is submitted via nemo_run.Experiment. By default the call returns immediately (detach=True).
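The heredoc technique from the steps above can be sketched as follows. This is an illustration of the general idea, not the script the launcher actually generates: `render_job_script`, the delimiter string, the target file name, and the example launch command are all assumptions.

```python
import shlex


def render_job_script(config_yaml, launch_command,
                      config_path="job_config.yaml",
                      delimiter="AUTOMODEL_CONFIG_EOF"):
    """Return a bash script that recreates the config file, then launches."""
    # A line equal to the delimiter would end the heredoc early.
    if delimiter in config_yaml.splitlines():
        raise ValueError("config contains the heredoc delimiter line")
    lines = [
        "#!/bin/bash",
        "set -euo pipefail",
        # Quoting the delimiter stops the shell from expanding $VAR or
        # backticks inside the embedded YAML.
        f"cat > {shlex.quote(config_path)} <<'{delimiter}'",
        config_yaml.rstrip("\n"),
        delimiter,
        launch_command,
    ]
    return "\n".join(lines) + "\n"


config_yaml = (
    "recipe: TrainFinetuneRecipeForNextTokenPrediction\n"
    "step_scheduler:\n"
    "  global_batch_size: 32\n"
)
# Placeholder command; the real entrypoint and flags are not shown here.
print(render_job_script(config_yaml, "torchrun --nproc-per-node=8 train.py"))
```

Because the config travels inside the script text, the remote side only needs the script itself; the file is recreated on the compute node when the script runs.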

Customize Configuration

Override any training parameter from the command line, same as with local runs:

$ automodel config_with_nemo_run.yaml \
>   --model.pretrained_model_name_or_path meta-llama/Llama-3.2-3B
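A dotted override such as `--model.pretrained_model_name_or_path` addresses a nested key in the training config. The sketch below shows one way such an override could be applied to a parsed config; `apply_dotted_override` is a hypothetical helper, and the real CLI may also convert value types, which this sketch does not attempt.

```python
def apply_dotted_override(config, dotted_key, value):
    """Set config[a][b]...[z] = value for a key written as --a.b...z."""
    keys = dotted_key.lstrip("-").split(".")
    node = config
    for key in keys[:-1]:
        # Create intermediate sections that do not exist yet.
        child = node.setdefault(key, {})
        if not isinstance(child, dict):
            raise TypeError(f"{key!r} is not a mapping; cannot descend into it")
        node = child
    node[keys[-1]] = value
    return config


config = {"model": {"pretrained_model_name_or_path": "some/other-model"}}
apply_dotted_override(
    config,
    "--model.pretrained_model_name_or_path",
    "meta-llama/Llama-3.2-3B",
)
print(config["model"]["pretrained_model_name_or_path"])
```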

When to Use NeMo-Run vs. SkyPilot vs. Slurm

| | NeMo-Run | SkyPilot | Slurm (sbatch) |
| --- | --- | --- | --- |
| Infrastructure | Slurm, Kubernetes, Docker, local | Public cloud (AWS, GCP, Azure) | On-prem HPC |
| Container support | Yes (Pyxis/Enroot, Docker, K8s pods) | N/A (cloud VMs) | Manual (in sbatch script) |
| Setup required | `nemo-run` + `$NEMORUN_HOME/executors.py` | Cloud credentials + `sky check` | Cluster access + sbatch script |
| Job submission | `automodel config.yaml` | `automodel config.yaml` | `sbatch slurm.sub` |
| Good for | Managed multi-backend execution, reusable executor configs | Cloud burst, cost optimization, spot instances | Direct Slurm scripts, full control over sbatch |