Nemotron Architecture#

This directory contains documentation about Nemotron’s architecture and design principles.

Overview#

Nemotron is a cookbook - a reference implementation showing best practices for training LLMs at scale. It’s not a framework you install; it’s a codebase you fork and customize.

Documents#

Quick Start#

# Run pretraining locally
nemotron nano3 pretrain -c tiny

# Submit to cluster
nemotron nano3 pretrain -c tiny --run dgx

# See what's happening (execution logic is visible in the code)
# Open: src/nemotron/cli/commands/nano3/pretrain.py

Two-Layer Architecture#

Layer

What

Where

Fork When

Execution

How to run and track experiments

cli/commands/ + nemo_runspec/

nemo-run/wandb -> SkyPilot/mlflow

Runtime

Training/data processing

recipes/

Algorithm changes

Execution Layer                    Runtime Layer (recipes/)
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ cli/commands/nano3/      β”‚      β”‚ recipes/nano3/      β”‚
β”‚   pretrain.py            β”‚      β”‚   pretrain/train.py β”‚
β”‚                          β”‚      β”‚                     β”‚
β”‚ nemo_runspec (toolkit)   │─────►│ Megatron-Bridge     β”‚
β”‚   config, execution, env β”‚      β”‚                     β”‚
β”‚   artifact registry      β”‚      β”‚                     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

The runtime layer is typically a thin script that delegates to NVIDIA AI stack libraries. The execution layer contains all the job submission logic, which is what you’d change to swap nemo-run for SkyPilot or another backend.

Package Responsibilities#

Package

Scope

nemo_runspec

Generic CLI toolkit: PEP 723 runspec parsing, config loading, env.toml profiles, execution helpers, packaging, pipeline orchestration, artifact registry (art:// resolution, fsspec/wandb backends)

nemotron.kit

Domain-specific: artifact type definitions (pretrain data, SFT data, checkpoints), lineage trackers (W&B, file-based), W&B integration

nemotron.cli

CLI commands: visible execution logic per command, typer-based command tree

nemotron.recipes

Runtime scripts: training, data prep, RL (thin scripts delegating to NVIDIA AI stack)

Dependency direction: nemotron.cli -> nemo_runspec + nemotron.kit -> NVIDIA stack. Never the reverse.