Runspec: [tool.runspec] Specification#
What is runspec?#
Runspec is a declarative metadata format for recipe scripts. Each recipe embeds
a [tool.runspec] block inside standard PEP 723 inline script metadata,
describing what the script is and what it needs to run — identity, container
image, launch method, config layout, and resource defaults.
This keeps recipes self-describing and portable. Any tool (CLI, CI, notebook, another cookbook) can read the same metadata without importing cookbook-specific code.
Motivation#
Nemotron is a cookbook — a reference implementation you fork and customize. The key architectural goal is that an agent (or human) can freely port recipes between workflow engines (nemo-run, SkyPilot, plain bash) without reverse-engineering how a script should be launched. Runspec makes this possible by separating what a script needs (declarative metadata) from how it gets run (execution logic in CLI commands).
Without runspec, recipe metadata is scattered across CLI wrappers, config files, and docstrings. Adding a new recipe means editing multiple files. Porting to a different workflow engine means tracing through code to rediscover identity, image, launch method, and resource requirements. Runspec collapses all of this into the script itself:
Identity: what is this recipe called?
Environment: what container image and setup does it need?
Launch: how should it be started (torchrun, ray, direct)?
Config: where do YAML configs live relative to the script?
Resources: what are sensible defaults for nodes and GPUs?
Because this metadata is declarative and machine-readable, any tool can consume it — a CLI, a CI pipeline, a notebook, or an agent building a completely different execution backend. See the Design Philosophy for the broader principles behind this approach.
Format#
Runspec lives inside PEP 723 inline script metadata — the # /// script block
at the top of a Python file. Inside that block, runspec uses the [tool.runspec]
TOML table and its sub-tables.
#!/usr/bin/env python3
# /// script
# [tool.runspec]
# schema = "1"
# docs = "https://raw.githubusercontent.com/NVIDIA-NeMo/Nemotron/main/docs/runspec/v1/spec.md"
# name = "nano3/pretrain"
# image = "nvcr.io/nvidia/nemo:25.11.nemotron_3_nano"
# setup = "NeMo and all training dependencies are pre-installed in the image."
#
# [tool.runspec.run]
# launch = "torchrun"
#
# [tool.runspec.config]
# dir = "./config"
# default = "default"
# format = "omegaconf"
#
# [tool.runspec.resources]
# nodes = 1
# gpus_per_node = 8
# ///
Each line is prefixed with # per PEP 723 convention. The block starts with
# /// script and ends with # ///.
Schema (v1)#
[tool.runspec] — Top-level#
Field |
Type |
Default |
Description |
|---|---|---|---|
|
|
|
Schema version. Always |
|
|
|
URL to this specification. Enables agents and tools to fetch the spec for context. Use a raw GitHub URL (e.g., |
|
|
|
Recipe identity (e.g., |
|
|
|
Default container image for remote execution. |
|
|
|
Human-readable description of what the image provides or what setup is needed. |
[tool.runspec.run] — Launch configuration#
Field |
Type |
Default |
Description |
|---|---|---|---|
|
|
|
How to start the script: |
|
|
|
Command template. Supports |
|
|
|
Working directory inside the container (e.g., |
[tool.runspec.config] — Config layout#
Field |
Type |
Default |
Description |
|---|---|---|---|
|
|
|
Config directory, relative to the script file. |
|
|
|
Name of the default config (without extension). |
|
|
|
Config format: |
[tool.runspec.resources] — Resource defaults#
Field |
Type |
Default |
Description |
|---|---|---|---|
|
|
|
Default number of nodes. |
|
|
|
Default GPUs per node. |
[tool.runspec.env] — Environment variables#
An optional key-value table of environment variables to set at runtime:
[tool.runspec.env]
NCCL_NVLS_ENABLE = "0"
How the CLI uses runspec#
A typical CLI command reads runspec at import time and uses it throughout:
from nemo_runspec import parse as parse_runspec
SCRIPT_PATH = "src/nemotron/recipes/nano3/stage0_pretrain/train.py"
SPEC = parse_runspec(SCRIPT_PATH)
# SPEC.name → "nano3/pretrain"
# SPEC.image → container image for remote execution
# SPEC.config_dir → absolute path to config directory
# SPEC.config.default → "default" config name
# SPEC.run.launch → "torchrun"
The command file contains the visible execution logic — runspec tells it what to run, the command decides how (local, slurm, ray, etc.).
Adding a new recipe#
Write the training/data-prep script
Add a
[tool.runspec]block at the topCreate a CLI command that calls
parse_runspec()on itRegister the command in the typer group
The runspec block is the single source of truth. No need to duplicate metadata in the CLI layer.
Reference implementation#
The nemo_runspec Python package (src/nemo_runspec/) implements:
Parsing:
_parser.pyextracts the PEP 723 block and parses the TOMLModels:
_models.pydefines the frozenRunspecdataclassCLI toolkit: config loading, env.toml profiles, RecipeTyper, packaging, etc.
See src/nemo_runspec/README.md for package-level documentation.