Runspec: `[tool.runspec]` Specification#

What is runspec?#

Runspec is a declarative metadata format for recipe scripts. Each recipe embeds a [tool.runspec] block inside standard PEP 723 inline script metadata, describing what the script is and what it needs to run — identity, container image, launch method, config layout, and resource defaults.

This keeps recipes self-describing and portable. Any tool (CLI, CI, notebook, another cookbook) can read the same metadata without importing cookbook-specific code.

Motivation#

Nemotron is a cookbook — a reference implementation you fork and customize. The key architectural goal is that an agent (or human) can freely port recipes between workflow engines (nemo-run, SkyPilot, plain bash) without reverse-engineering how a script should be launched. Runspec makes this possible by separating what a script needs (declarative metadata) from how it gets run (execution logic in CLI commands).

Without runspec, recipe metadata is scattered across CLI wrappers, config files, and docstrings. Adding a new recipe means editing multiple files. Porting to a different workflow engine means tracing through code to rediscover identity, image, launch method, and resource requirements. Runspec collapses all of this into the script itself:

Identity: what is this recipe called?
Environment: what container image and setup does it need?
Launch: how should it be started (torchrun, ray, direct)?
Config: where do YAML configs live relative to the script?
Resources: what are sensible defaults for nodes and GPUs?

Because this metadata is declarative and machine-readable, any tool can consume it — a CLI, a CI pipeline, a notebook, or an agent building a completely different execution backend. See the Design Philosophy for the broader principles behind this approach.

Format#

Runspec lives inside PEP 723 inline script metadata — the # /// script block at the top of a Python file. Inside that block, runspec uses the [tool.runspec] TOML table and its sub-tables.

#!/usr/bin/env python3
# /// script
# [tool.runspec]
# schema = "1"
# docs = "https://raw.githubusercontent.com/NVIDIA-NeMo/Nemotron/main/docs/runspec/v1/spec.md"
# name = "nano3/pretrain"
# image = "nvcr.io/nvidia/nemo:25.11.nemotron_3_nano"
# setup = "NeMo and all training dependencies are pre-installed in the image."
#
# [tool.runspec.run]
# launch = "torchrun"
#
# [tool.runspec.config]
# dir = "./config"
# default = "default"
# format = "omegaconf"
#
# [tool.runspec.resources]
# nodes = 1
# gpus_per_node = 8
# ///

Each line is prefixed with # per PEP 723 convention. The block starts with # /// script and ends with # ///.

Schema (v1)#

`[tool.runspec]` — Top-level#

Field	Type	Default	Description
`schema`	`str`	`"1"`	Schema version. Always `"1"` for now.
`docs`	`str`	`""`	URL to this specification. Enables agents and tools to fetch the spec for context. Use a raw GitHub URL (e.g., `https://raw.githubusercontent.com/NVIDIA-NeMo/Nemotron/main/docs/runspec/v1/spec.md`).
`name`	`str`	`""`	Recipe identity (e.g., `"nano3/pretrain"`, `"nano3/data/prep/sft"`). Used in job names, display, and directory layout.
`image`	`str?`	`null`	Default container image for remote execution.
`setup`	`str`	`""`	Human-readable description of what the image provides or what setup is needed.

`[tool.runspec.run]` — Launch configuration#

Field	Type	Default	Description
`launch`	`str`	`"torchrun"`	How to start the script: `"torchrun"` (distributed), `"ray"` (Ray cluster), or `"direct"` (plain python).
`cmd`	`str`	`"python {script} --config {config}"`	Command template. Supports `{script}` and `{config}` placeholders.
`workdir`	`str?`	`null`	Working directory inside the container (e.g., `"/opt/nemo-rl"`).

`[tool.runspec.config]` — Config layout#

Field	Type	Default	Description
`dir`	`str`	`"./config"`	Config directory, relative to the script file.
`default`	`str`	`"default"`	Name of the default config (without extension).
`format`	`str`	`"omegaconf"`	Config format: `"omegaconf"`, `"yaml"`, or `"json"`.

`[tool.runspec.resources]` — Resource defaults#

Field	Type	Default	Description
`nodes`	`int`	`1`	Default number of nodes.
`gpus_per_node`	`int`	`8`	Default GPUs per node.

`[tool.runspec.env]` — Environment variables#

An optional key-value table of environment variables to set at runtime:

[tool.runspec.env]
NCCL_NVLS_ENABLE = "0"

How the CLI uses runspec#

A typical CLI command reads runspec at import time and uses it throughout:

from nemo_runspec import parse as parse_runspec

SCRIPT_PATH = "src/nemotron/recipes/nano3/stage0_pretrain/train.py"
SPEC = parse_runspec(SCRIPT_PATH)

# SPEC.name        → "nano3/pretrain"
# SPEC.image       → container image for remote execution
# SPEC.config_dir  → absolute path to config directory
# SPEC.config.default → "default" config name
# SPEC.run.launch  → "torchrun"

The command file contains the visible execution logic — runspec tells it what to run, the command decides how (local, slurm, ray, etc.).

Adding a new recipe#

Write the training/data-prep script
Add a [tool.runspec] block at the top
Create a CLI command that calls parse_runspec() on it
Register the command in the typer group

The runspec block is the single source of truth. No need to duplicate metadata in the CLI layer.

Reference implementation#

The nemo_runspec Python package (src/nemo_runspec/) implements:

Parsing: _parser.py extracts the PEP 723 block and parses the TOML
Models: _models.py defines the frozen Runspec dataclass
CLI toolkit: config loading, env.toml profiles, RecipeTyper, packaging, etc.

See src/nemo_runspec/README.md for package-level documentation.

Runspec: [tool.runspec] Specification#