nemo_runspec Package#
The following content is pulled from the package source for convenience. The canonical source file in the repository is src/nemo_runspec/README.md.
nemo_runspec#
Bridge layer for PEP 723 [tool.runspec] metadata. Parses declarative metadata
from recipe scripts and provides the shared CLI toolkit that Nemotron commands
build on.
Philosophy#
Recipe scripts should be self-describing. Rather than scattering identity,
container images, launch methods, and resource defaults across CLI wrappers and
config files, each recipe script declares all of this as standard PEP 723 inline
metadata in a [tool.runspec] block at the top of the file. The CLI layer reads
this metadata and stays thin – it doesn’t encode policy about how to run a
script, it just asks the script what it needs. This keeps recipes portable
(any tool can read the same metadata), eliminates hidden coupling between CLI
commands and the scripts they wrap, and makes it trivial to add a new recipe:
write the script, add the [tool.runspec] block, and the CLI machinery picks
it up automatically.
What it does#
nemo_runspec solves two problems:
1. Runspec parsing – Extracts [tool.runspec] TOML from PEP 723 inline script metadata blocks, returning a frozen Runspec dataclass describing a recipe’s identity, container image, launch method, config directory, and resource requirements.
2. CLI toolkit – Provides the reusable building blocks that every recipe command needs: config loading, env.toml profile resolution, display helpers, RecipeTyper, packaging, and nemo-run support.
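To make the "frozen dataclass" part concrete, here is a minimal sketch of how a parsed metadata table might be turned into an immutable object. RunspecSketch and build_spec are hypothetical stand-ins, not the package's actual Runspec class, and the rule that a relative config directory resolves against the script's folder is an assumption.

```python
from dataclasses import FrozenInstanceError, dataclass
from pathlib import Path


@dataclass(frozen=True)
class RunspecSketch:
    """Illustrative stand-in for a parsed runspec; not the real Runspec class."""

    name: str
    image: str
    config_dir: Path


def build_spec(table: dict, script_path: Path) -> RunspecSketch:
    # Assumption: a relative config directory is resolved against the script's folder.
    config_dir = (script_path.parent / table.get("config_dir", "config")).resolve()
    return RunspecSketch(name=table["name"], image=table["image"], config_dir=config_dir)


spec = build_spec(
    {"name": "example/pretrain", "image": "example.registry/image:tag"},
    Path("recipes/example/train.py"),
)

try:
    spec.image = "other:tag"  # frozen dataclasses reject attribute assignment
except FrozenInstanceError:
    mutation_blocked = True
else:
    mutation_blocked = False

print(spec.name, spec.config_dir.is_absolute(), mutation_blocked)
```

Freezing the object means CLI code can pass the spec around without worrying that a later step silently changes what the script declared.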
Quick start#
```python
from nemo_runspec import parse

SPEC = parse("src/nemotron/recipes/nano3/stage0_pretrain/train.py")

print(SPEC.name)        # "nano3/pretrain"
print(SPEC.image)       # "nvcr.io/nvidia/nemo:25.11.nemotron_3_nano"
print(SPEC.config_dir)  # Path("/abs/path/to/config")
```
Runspec schema#
See docs/runspec/v1/spec.md for the
full [tool.runspec] specification – field reference, format, and usage guide.
Package modules#
| Module | Purpose |
|---|---|
| | PEP 723 TOML extraction |
| | Frozen Runspec dataclass |
| | Config loading and OmegaConf resolver package |
| | Config pipeline: YAML loading, dotlist overrides, profile merging, job YAML |
| | OmegaConf resolvers |
| | Rich display utilities for dry-run output and job submission summaries |
| | nemo-run patches (Ray CPU template, rsync host key handling) |
| | Pipeline orchestration: local subprocess piping, nemo-run, and sbatch launchers |
| | Execution helpers: startup commands, env vars, executor creation, local run |
| | Container squash utilities (Docker to enroot sqsh, ensure squashed on cluster) |
| | Custom Ray CPU Slurm template |
| | Evaluator helpers: task flag parsing, W&B injection, config save, image collection |
| | Shared utilities |
env.toml#
Environment configuration uses TOML profiles with inheritance:
```toml
[base]
executor = "slurm"
account = "my-account"
remote_job_dir = "/lustre/jobs"

[dev]
extends = "base"
partition = "dev-gpu"
nodes = 1

[prod]
extends = "base"
partition = "prod-gpu"
nodes = 8

[wandb]
entity = "my-team"
project = "nemotron"

[artifacts]
backend = "file"
root = "/lustre/artifacts"

[cache]
git_dir = "/lustre/git-cache"
```
Profiles are selected via --run <profile> or --batch <profile>.
Special sections (wandb, cli, cache, artifacts) are not executor profiles.