Executors
An execution unit is a (task, executor) pair. The task defines what to run; the executor defines where and how. NeMo-Run keeps these two concerns separate so you can swap executors without changing your task configuration.
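To make the separation concrete, here is a minimal conceptual sketch in plain Python. It does not use NeMo-Run's API: `Task`, `LocalSketch`, and `DockerSketch` are hypothetical names, and reducing an executor to something that builds a launch command is a simplifying assumption. The point is only that the task stays the same while the executor changes.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Task:
    """What to run: a command and its arguments (hypothetical stand-in)."""

    argv: tuple[str, ...]


class Executor(Protocol):
    """Where and how to run: turns a task into a concrete launch command."""

    def launch_command(self, task: Task) -> list[str]: ...


@dataclass(frozen=True)
class LocalSketch:
    """Hypothetical executor that runs the task directly on this machine."""

    def launch_command(self, task: Task) -> list[str]:
        return list(task.argv)


@dataclass(frozen=True)
class DockerSketch:
    """Hypothetical executor that wraps the same task in `docker run`."""

    image: str

    def launch_command(self, task: Task) -> list[str]:
        return ["docker", "run", "--rm", self.image, *task.argv]


# One task definition, reused unchanged with two different executors.
task = Task(argv=("python", "train.py", "--epochs", "3"))
executors: list[Executor] = [LocalSketch(), DockerSketch(image="python:3.12")]
for executor in executors:
    print(" ".join(executor.launch_command(task)))
```

Swapping `LocalSketch()` for `DockerSketch(...)` changes the launch command but leaves `task` untouched, which is the property the (task, executor) split is meant to provide.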
Choose an executor
Pick the executor that matches your environment:
| Executor | When to use | Setup cost |
|---|---|---|
| | Prototyping, debugging, CI | None (works out of the box) |
| | Reproducible local runs, container-based workflows | Docker installed & running |
| | HPC clusters with Slurm and Pyxis | SSH access to a Slurm cluster |
| | Multi-cloud: AWS, GCP, Azure, Kubernetes | |
| | NVIDIA DGX Cloud via Run:ai | Pod access + PVC on DGX Cloud |
| | NVIDIA DGX Cloud Lepton (standard execution) | Lepton CLI installed & authenticated |
| | Distributed training via Kubeflow Training Operator v2 | kubectl + Kubeflow Training Operator v2 |
| | Ray workloads on Kubernetes | kubectl + KubeRay operator |
Packager support matrix
The packager controls how your code is bundled and sent to the execution environment.
| Executor | Packagers |
|---|---|
| LocalExecutor | |
| DockerExecutor | |
| SlurmExecutor | |
| SkypilotExecutor | |
| DGXCloudExecutor | |
| LeptonExecutor | |
| KubeflowExecutor | |
See Execution — Packagers for a description of each packager.
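As a rough illustration of what "bundling" means here, the sketch below packs the Python files under a directory into an in-memory `.tar.gz` using only the standard library. It is not how NeMo-Run's packagers are implemented: `bundle`, `list_bundle`, and the `*.py` filter are hypothetical choices made for this example.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def bundle(root: Path, pattern: str = "*.py") -> bytes:
    """Pack files under `root` that match `pattern` into an in-memory .tar.gz."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for path in sorted(root.rglob(pattern)):
            if path.is_file():
                # Store paths relative to `root` so the archive unpacks cleanly elsewhere.
                archive.add(path, arcname=path.relative_to(root).as_posix())
    return buffer.getvalue()


def list_bundle(data: bytes) -> list[str]:
    """Return the member names stored in a bundle produced by `bundle`."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as archive:
        return archive.getnames()


# Build a throwaway project, bundle it, and inspect what would be shipped.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "train.py").write_text("print('training')\n")
    (root / "pkg").mkdir()
    (root / "pkg" / "model.py").write_text("LAYERS = 2\n")
    (root / "notes.txt").write_text("not shipped\n")
    names = list_bundle(bundle(root))
    print(names)
```

Only the files that match the pattern end up in the archive, so the choice of packager (here, the choice of filter) decides which parts of the working tree reach the execution environment.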
Launcher support
The launcher controls how the process is started inside the executor.
| Launcher | Flag | Description |
|---|---|---|
| Default | | Direct subprocess; no special launcher |
| Torchrun | | Distributed training via |
| Fault Tolerance | | NVIDIA fault-tolerant launcher |
| SlurmRay | | Ray cluster on Slurm (see ray.md) |
See Execution — Launchers for details.
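One way to picture a launcher is as a rule for turning "script plus arguments" into the command that actually starts the process. The sketch below assumes that simplified model; `DefaultLauncherSketch` and `TorchrunLauncherSketch` are hypothetical names, not NeMo-Run classes. The only external detail relied on is PyTorch's `torchrun` CLI and its `--nproc_per_node` option.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DefaultLauncherSketch:
    """Hypothetical launcher: start the script as a plain Python subprocess."""

    def command(self, script: str, args: list[str]) -> list[str]:
        return ["python", script, *args]


@dataclass(frozen=True)
class TorchrunLauncherSketch:
    """Hypothetical launcher: start the script through PyTorch's `torchrun` CLI."""

    nproc_per_node: int = 1

    def command(self, script: str, args: list[str]) -> list[str]:
        # torchrun takes the script path directly and spawns the worker processes itself.
        return ["torchrun", f"--nproc_per_node={self.nproc_per_node}", script, *args]


# The same script and arguments, started two different ways.
script, args = "train.py", ["--epochs", "3"]
for launcher in (DefaultLauncherSketch(), TorchrunLauncherSketch(nproc_per_node=2)):
    print(" ".join(launcher.command(script, args)))
```

In this model the launcher is independent of where the command runs, so the same launcher choice could in principle be paired with different executors.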