Run with NeMo-Run

In this guide, you will learn how to launch NeMo AutoModel training jobs using NeMo-Run. NeMo-Run supports multiple backends including Slurm, Kubernetes, Docker, and local execution. For cloud-based training, see Run on Any Cloud with SkyPilot. For direct sbatch usage, see Run on a Cluster (Slurm). For single-node workstation usage, see Run on Your Local Workstation.

NeMo-Run is an open-source tool from NVIDIA that manages job submission across different execution backends. You define your compute configuration once in a Python file and reuse it across all your training jobs.

Before You Begin

  1. Install NeMo-Run (it is not bundled with AutoModel):

$ pip install nemo-run

  2. Create an executor definitions file at $NEMORUN_HOME/executors.py. NEMORUN_HOME defaults to ~/.nemo_run; set the environment variable to use a different location. This file tells NeMo-Run how to reach your compute target. Every executor you reference in a YAML config must be defined here. See Executor Setup for a complete example.

  3. Verify connectivity to the target in your executor (e.g. SSH for Slurm, kubeconfig for Kubernetes).

  4. Set required environment variables (if needed by your training config):

$ export HF_TOKEN=hf_... # Required for gated models (e.g. Llama)
$ export WANDB_API_KEY=... # Optional: Weights & Biases logging
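
The lookup rule in step 2 can be checked with a few lines of Python before you submit anything. This is a minimal sketch: `executors_file` is a hypothetical helper written for this guide, not a NeMo-Run function, and it only mirrors the documented rule (NEMORUN_HOME if set, otherwise ~/.nemo_run).

```python
import os
from pathlib import Path


def executors_file(env=None):
    """Return the path where executors.py is expected (hypothetical helper)."""
    env = os.environ if env is None else env
    # NEMORUN_HOME takes precedence when set; otherwise fall back to ~/.nemo_run.
    home = env.get("NEMORUN_HOME") or str(Path.home() / ".nemo_run")
    return Path(home).expanduser() / "executors.py"


path = executors_file()
print(f"{path} exists: {path.exists()}")
```

If the printed path does not exist, create the file there or point NEMORUN_HOME at the directory that holds it.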

Executor Setup

The executor: field in your YAML config is a name that maps to an entry in $NEMORUN_HOME/executors.py. This file must define a module-level EXECUTOR_MAP dictionary. NeMo-Run supports several executor types; examples of the most common ones follow:

Slurm Executor

import nemo_run as run

def my_slurm_cluster():
    executor = run.SlurmExecutor(
        account="my_account",
        partition="batch",
        tunnel=run.SSHTunnel(
            user="myuser",
            host="login-node.example.com",
            job_dir="/remote/path/nemo_run/jobs",
        ),
        nodes=1,
        ntasks_per_node=8,
        gpus_per_node=8,
        mem="0",
        exclusive=True,
        packager=run.Packager(),
    )
    executor.container_image = "nvcr.io/nvidia/nemo-automodel:26.02"
    executor.container_mounts = ["/data:/data", "/checkpoints:/checkpoints"]
    executor.env_vars = {"HF_HOME": "/data/hf_cache"}
    executor.time = "04:00:00"
    return executor

EXECUTOR_MAP = {
    "my_slurm": my_slurm_cluster(),
}

Kubernetes Executor

import nemo_run as run

def my_k8s_cluster():
    return run.KubeflowExecutor(
        namespace="training",
        image="nvcr.io/nvidia/nemo-automodel:26.02",
        num_nodes=1,
        nprocs_per_node=8,
        gpus_per_node=8,
    )

EXECUTOR_MAP = {
    "my_k8s": my_k8s_cluster(),
}

Multiple Executors

You can define as many executors as you need for different backends, clusters, or resource configurations:

EXECUTOR_MAP = {
    "slurm_dev": my_slurm_dev(),
    "slurm_prod": my_slurm_prod(),
    "k8s": my_k8s_cluster(),
}

  • Keys in EXECUTOR_MAP are names you reference in YAML (executor: slurm_dev).
  • Values can be executor instances or zero-argument callables that return one.
  • Override fields in the YAML (nodes, devices, container_image, etc.) are applied on top of the executor defaults.
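
Because a value in EXECUTOR_MAP may be either an instance or a zero-argument callable, a lookup has to handle both. The sketch below shows one way to do that. `resolve_executor` and `FakeExecutor` are hypothetical names used only for illustration, and the sketch assumes executor instances are not themselves callable; it is not NeMo-Run's or AutoModel's actual lookup code.

```python
def resolve_executor(executor_map, name):
    """Return an executor instance for `name` (hypothetical helper)."""
    try:
        entry = executor_map[name]
    except KeyError:
        known = ", ".join(sorted(executor_map)) or "<none>"
        raise KeyError(f"Unknown executor {name!r}; defined: {known}") from None
    # Zero-argument callables are invoked on lookup; instances pass through.
    return entry() if callable(entry) else entry


class FakeExecutor:
    """Stand-in for a real executor so the sketch runs without nemo_run."""


ready_made = FakeExecutor()
demo_map = {"eager": ready_made, "lazy": lambda: FakeExecutor()}

print(resolve_executor(demo_map, "eager") is ready_made)
print(type(resolve_executor(demo_map, "lazy")).__name__)
```

Storing a callable defers construction until the executor is actually requested, which can help when building it has side effects such as opening an SSH tunnel.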

Quickstart

Any existing AutoModel YAML config can be run via NeMo-Run by adding a nemo_run: section at the top. For example, given an existing config that you run locally:

$ automodel examples/llm_finetune/qwen/qwen3_moe_30b_te_packed_sequence.yaml

Add a nemo_run: block to submit it to a remote executor instead:

# -- Add this section to any existing config ----------------------------------
nemo_run:
  executor: my_slurm # Name from EXECUTOR_MAP in $NEMORUN_HOME/executors.py
  container_image: /images/custom.sqsh # Override executor's default image
  nodes: 1 # Override number of nodes
  ntasks_per_node: 8 # GPUs per node
  time: "04:00:00" # Override time limit
  job_name: qwen3_moe_finetune # Experiment and job name

# -- Everything below is your existing training config (unchanged) ------------
recipe: TrainFinetuneRecipeForNextTokenPrediction

step_scheduler:
  global_batch_size: 32
  # ... rest of your config ...

Then run the same command:

$ automodel your_config.yaml

The CLI detects the nemo_run: key, strips it from the training config, loads the named executor from $NEMORUN_HOME/executors.py, and submits the job, all in one command.
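
The "detect and strip" step amounts to popping one top-level key from the parsed config. Here is a minimal sketch using plain dictionaries. `split_nemo_run` is a hypothetical name, and copying the input before popping is an assumption of this sketch rather than documented CLI behavior.

```python
import copy


def split_nemo_run(config):
    """Split a parsed YAML config into (launch settings, training config)."""
    training = copy.deepcopy(config)  # leave the caller's dict untouched
    launch = training.pop("nemo_run", None)  # None means: run locally as usual
    return launch, training


parsed = {
    "nemo_run": {"executor": "my_slurm", "nodes": 1},
    "recipe": "TrainFinetuneRecipeForNextTokenPrediction",
    "step_scheduler": {"global_batch_size": 32},
}
launch, training = split_nemo_run(parsed)
print(launch)
print(sorted(training))
```

The training half is exactly what the recipe would see in a local run, which is why the rest of the YAML needs no changes.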

Configuration Reference

All nemo_run: Fields

| Field | Default | Description |
|---|---|---|
| executor | "local" | Name from EXECUTOR_MAP in $NEMORUN_HOME/executors.py, or "local" for local execution |
| job_name | <recipe_class_name> | Experiment and job name |
| detach | true | Return immediately after submission |
| tail_logs | false | Stream logs after submission |
| executors_file | $NEMORUN_HOME/executors.py | Path to the executor definitions file |
| job_dir | ./nemo_run_jobs | Local directory for job artifacts (config snapshot) |
| (any other key) | (from executor) | Applied directly to the executor via setattr. Use the executor’s native attribute names (e.g. nodes, ntasks_per_node, partition, container_image, time, env_vars). Dicts are merged, lists are extended. |
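
The override rule in the last row (setattr for scalars, merge for dicts, extend for lists) can be sketched as follows. `apply_overrides` is a hypothetical helper, `SimpleNamespace` stands in for a real executor, and letting the YAML value win when a dict key collides is an assumption of this sketch.

```python
from types import SimpleNamespace


def apply_overrides(executor, overrides):
    """Apply YAML override fields on top of an executor's defaults."""
    for key, value in overrides.items():
        current = getattr(executor, key, None)
        if isinstance(current, dict) and isinstance(value, dict):
            # Dicts are merged; on a key collision the override wins (assumed).
            setattr(executor, key, {**current, **value})
        elif isinstance(current, list) and isinstance(value, list):
            # Lists are extended: defaults first, then the override entries.
            setattr(executor, key, [*current, *value])
        else:
            setattr(executor, key, value)
    return executor


executor = SimpleNamespace(
    nodes=1,
    env_vars={"HF_HOME": "/data/hf_cache"},
    container_mounts=["/data:/data"],
)
apply_overrides(
    executor,
    {
        "nodes": 2,
        "env_vars": {"NCCL_DEBUG": "INFO"},
        "container_mounts": ["/scratch/checkpoints:/checkpoints"],
    },
)
print(executor)
```

One consequence of this rule is that a YAML list adds to the executor's default mounts rather than replacing them; change the defaults in executors.py if you need to drop one.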

Examples

Single-Node Fine-Tuning (1 x 8 GPUs)

nemo_run:
  executor: my_slurm
  nodes: 1
  ntasks_per_node: 8
  job_name: single_node_finetune

Multi-Node Distributed Training (2 x 8 GPUs)

nemo_run:
  executor: my_slurm
  nodes: 2
  ntasks_per_node: 8
  time: "08:00:00"
  job_name: multinode_pretrain

For multi-node jobs, the launcher automatically adds --nnodes, --node-rank, --rdzv-backend, --master-addr, and --master-port to the torchrun command.
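
To make the shape of that command concrete, the sketch below assembles a torchrun argument list with the flags named above. `torchrun_args` is a hypothetical helper, and the values in the example (`$SLURM_NODEID`, `static`, `node0`, `29500`) are illustrative assumptions; this guide does not state what the launcher actually passes.

```python
def torchrun_args(nproc_per_node, nnodes=1, *, node_rank=None,
                  rdzv_backend=None, master_addr=None, master_port=None):
    """Build a torchrun argument list (illustrative, not the launcher's code)."""
    args = ["torchrun", f"--nproc-per-node={nproc_per_node}"]
    if nnodes > 1:
        multi_node = {
            "--node-rank": node_rank,
            "--rdzv-backend": rdzv_backend,
            "--master-addr": master_addr,
            "--master-port": master_port,
        }
        missing = [flag for flag, value in multi_node.items() if value is None]
        if missing:
            raise ValueError(f"multi-node run needs: {', '.join(missing)}")
        args.append(f"--nnodes={nnodes}")
        args.extend(f"{flag}={value}" for flag, value in multi_node.items())
    return args


# Single node: only the per-node process count is required.
print(" ".join(torchrun_args(8)))

# Two nodes: the rendezvous arguments are added (placeholder values).
print(" ".join(torchrun_args(
    8, 2,
    node_rank="$SLURM_NODEID",  # assumed: resolved by the shell on each node
    rdzv_backend="static",
    master_addr="node0",
    master_port=29500,
)))
```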

Custom Container Image and Mounts

nemo_run:
  executor: my_slurm
  container_image: /images/automodel_nightly.sqsh
  container_mounts:
    - /scratch/datasets:/datasets
    - /scratch/checkpoints:/checkpoints
  env_vars:
    HF_HOME: /datasets/hf_cache
    NCCL_DEBUG: INFO

Local Execution (No Cluster)

Use executor: local to run on the current machine. No $NEMORUN_HOME/executors.py entry is needed:

nemo_run:
  executor: local
  ntasks_per_node: 2
  job_name: local_test

Monitor and Manage Jobs

NeMo-Run stores experiment metadata under $NEMORUN_HOME/experiments/. Set tail_logs: true in the YAML to stream job output after submission.

For Slurm-based executors, standard Slurm commands also work:

$ squeue -u $USER # List your queued and running jobs
$ scancel <job_id> # Cancel a running or pending job
$ sacct -j <job_id> # View job accounting information

For Kubernetes-based executors, use kubectl to monitor pods and jobs.

How It Works

  1. The automodel CLI detects the nemo_run: key and imports NemoRunLauncher.
  2. The nemo_run: section is popped from the config. The remaining training config is written to nemo_run_jobs/<timestamp>/job_config.yaml for record-keeping.
  3. The launcher loads a pre-configured executor from $NEMORUN_HOME/executors.py by name (or creates a LocalExecutor for executor: local). Override fields are applied on top of the executor defaults.
  4. The training config YAML is embedded in a self-contained inline bash script via a heredoc, so no separate file transfer is needed.
  5. A torchrun command is built with --nproc-per-node and (for multi-node) distributed rendezvous arguments.
  6. The script is submitted via nemo_run.Experiment. By default the call returns immediately (detach=True).
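
Step 4 is the least obvious part of the flow, so here is a rough sketch of how a YAML config can be embedded in a bash script with a heredoc. Everything named here is an assumption for illustration: `build_inline_script`, the `/tmp/job_config.yaml` path, the `AUTOMODEL_CONFIG_EOF` delimiter, and the `train.py` entry point are placeholders, not what the launcher really emits.

```python
import shlex


def build_inline_script(config_yaml, command,
                        config_path="/tmp/job_config.yaml",
                        delimiter="AUTOMODEL_CONFIG_EOF"):
    """Return a bash script that recreates the config, then runs `command`."""
    if delimiter in config_yaml.splitlines():
        # A config line equal to the delimiter would end the heredoc early.
        raise ValueError("delimiter collides with a line in the config")
    if not config_yaml.endswith("\n"):
        config_yaml += "\n"
    return (
        "#!/bin/bash\n"
        "set -euo pipefail\n"
        # Quoting the delimiter stops bash from expanding $VARS in the YAML.
        f"cat > {shlex.quote(config_path)} <<'{delimiter}'\n"
        f"{config_yaml}"
        f"{delimiter}\n"
        f"{command}\n"
    )


script = build_inline_script(
    "recipe: TrainFinetuneRecipeForNextTokenPrediction\n",
    # train.py is a placeholder entry point, not the real AutoModel command.
    "torchrun --nproc-per-node=8 train.py --config /tmp/job_config.yaml",
)
print(script)
```

Because the config travels inside the script text, the remote side needs nothing beyond the script itself, which matches the "no separate file transfer" point in step 4.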

Customize Configuration

Override any training parameter from the command line, same as with local runs:

$ automodel config_with_nemo_run.yaml \
>   --model.pretrained_model_name_or_path meta-llama/Llama-3.2-3B
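
A dotted flag such as --model.pretrained_model_name_or_path addresses a nested key in the YAML. The sketch below illustrates that mapping on a plain dictionary. `apply_dotted_override` is a hypothetical helper, not AutoModel's argument parser, and it leaves out details such as type coercion.

```python
def apply_dotted_override(config, dotted_key, value):
    """Set config["a"]["b"] = value for dotted_key "a.b" (illustrative only)."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for part in parents:
        node = node.setdefault(part, {})  # create missing intermediate levels
        if not isinstance(node, dict):
            raise TypeError(f"{part!r} is not a mapping; cannot descend into it")
    node[leaf] = value
    return config


cfg = {"model": {"pretrained_model_name_or_path": "placeholder/model"}}
apply_dotted_override(
    cfg, "model.pretrained_model_name_or_path", "meta-llama/Llama-3.2-3B"
)
print(cfg)
```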

When to Use NeMo-Run vs. SkyPilot vs. Slurm

| | NeMo-Run | SkyPilot | Slurm (sbatch) |
|---|---|---|---|
| Infrastructure | Slurm, Kubernetes, Docker, local | Public cloud (AWS, GCP, Azure) | On-prem HPC |
| Container support | Yes (Pyxis/Enroot, Docker, K8s pods) | N/A (cloud VMs) | Manual (in sbatch script) |
| Setup required | nemo-run + $NEMORUN_HOME/executors.py | Cloud credentials + sky check | Cluster access + sbatch script |
| Job submission | automodel config.yaml | automodel config.yaml | sbatch slurm.sub |
| Good for | Managed multi-backend execution, reusable executor configs | Cloud burst, cost optimization, spot instances | Direct Slurm scripts, full control over sbatch |