Nemotron 3 Nano Training Recipe#
Reproducible training pipeline for Nemotron 3 Nano, an open Mixture-of-Experts hybrid Mamba-Transformer model optimized for agentic reasoning.
Quick Start#
Prerequisites#
Slurm cluster with GPU nodes (H100 recommended). See Execution through NeMo-Run.
Weights & Biases account for experiment tracking and artifact lineage
Container images:
Training: nvcr.io/nvidia/nemo:25.11.nemotron_3_nano
RL: nvcr.io/nvidia/nemo-rl:v0.4.0.nemotron_3_nano
Installation#
git clone https://github.com/NVIDIA/nemotron
cd nemotron
uv sync
Configuration#
Create an env.toml file (see Execution through NeMo-Run for details):
[wandb]
project = "nemotron"
entity = "YOUR-TEAM"
[YOUR-CLUSTER]
executor = "slurm"
account = "YOUR-ACCOUNT"
partition = "batch"
nodes = 2
ntasks_per_node = 8
gpus_per_node = 8
mounts = ["/lustre:/lustre"]
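Additional cluster profiles are just extra tables in the same `env.toml`. As a hypothetical example (the profile name and resource values below are placeholders to adapt, not shipped defaults), a larger profile can sit alongside the first:

```toml
[YOUR-CLUSTER-large]
executor = "slurm"
account = "YOUR-ACCOUNT"
partition = "batch"
nodes = 8                 # placeholder: scale to your allocation
ntasks_per_node = 8
gpus_per_node = 8
mounts = ["/lustre:/lustre"]
```

Select a profile at submission time, e.g. `--run YOUR-CLUSTER-large`.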
Run the Pipeline#
# Stage 0: Pretraining
$ uv run nemotron nano3 data prep pretrain --run YOUR-CLUSTER
$ uv run nemotron nano3 pretrain --run YOUR-CLUSTER
# Stage 1: Supervised Fine-Tuning
$ uv run nemotron nano3 data prep sft --run YOUR-CLUSTER
$ uv run nemotron nano3 sft --run YOUR-CLUSTER
# Stage 2: Reinforcement Learning
$ uv run nemotron nano3 data prep rl --run YOUR-CLUSTER
$ uv run nemotron nano3 rl --run YOUR-CLUSTER
# Compose pretrain + SFT as a single nemo-run Experiment
$ uv run nemotron nano3 pipe --run YOUR-CLUSTER
Note: The `pipe` command composes pretrain → SFT into a single nemo-run Experiment for coordinated remote execution. RL uses Ray and must be run separately.
Resources#
Tech Report: Nemotron 3 Nano Technical Report
Model Weights:
NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16 (Base model)
NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 (Instruct model)
NVIDIA-Nemotron-3-Nano-30B-A3B-FP8 (FP8 quantized)
Model Collection: NVIDIA Nemotron v3 Collection
Training Datasets:
Pre-training Datasets (Open pre-training data)
Post-training Datasets (SFT and RL data)
Training Pipeline#
| Stage | Name | Purpose |
|---|---|---|
| 0 | Pretraining | Base model on 25T tokens with curriculum learning |
| 1 | Supervised Fine-Tuning | Multi-domain instruction tuning with 12+ data sources |
| 2 | Reinforcement Learning | GRPO alignment with multi-environment rewards |
| 3 | Evaluation | Benchmark evaluation with NeMo Evaluator |
Model Specifications#
| Specification | Value |
|---|---|
| Total Parameters | 31.6B |
| Active Parameters | 3.6B (per forward pass) |
| Pretraining Tokens | 25 trillion |
| Context Length | Up to 1M tokens |
| Architecture | Hybrid Mamba-Transformer with sparse MoE |
For architecture details, see Tech Report Section 2.1.
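The sparsity implied by the table is easy to sanity-check: only a small fraction of the weights participates in each forward pass (plain arithmetic from the numbers above, not tied to any repo code):

```python
# Plain arithmetic from the specification table above.
total_params = 31.6e9    # total parameter count
active_params = 3.6e9    # parameters active per forward pass
active_fraction = active_params / total_params
print(f"{active_fraction:.1%} of weights active per token")  # about 11.4%
```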
Stage Summaries#
Stage 0: Pretraining#
Two-phase curriculum on 25 trillion tokens: Phase 1 (23.5T) focuses on diversity across web, code, math, and multilingual data; Phase 2 (1.5T) emphasizes high-quality sources. Includes long-context extension to 1M tokens.
Stage 1: Supervised Fine-Tuning#
Multi-domain instruction tuning covering 12+ data domains including competition math/code, InfinityByte cross-domain synthesis, STEM reasoning, conversational tool use, and multilingual support.
→ SFT Guide
Stage 2: Reinforcement Learning#
Multi-environment RLVR training across 7 reward environments using GRPO, plus GenRM-based RLHF and DPO for reducing tool hallucination.
→ RL Guide
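For intuition, GRPO scores each sampled completion against the statistics of its own sampling group rather than a learned value function. A minimal, illustrative sketch of that group-relative advantage (this is the textbook form of the estimator, not NeMo-RL's actual implementation):

```python
def grpo_advantages(group_rewards, eps=1e-6):
    """Normalize each reward against its sampling group's mean and std."""
    n = len(group_rewards)
    mean = sum(group_rewards) / n
    std = (sum((r - mean) ** 2 for r in group_rewards) / n) ** 0.5
    # eps avoids division by zero when all rewards in the group are equal.
    return [(r - mean) / (std + eps) for r in group_rewards]

# Completions that beat the group average get positive advantages.
print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))
```

Because advantages are computed per group, a single verifiable reward (e.g. pass/fail on a math problem) is enough to produce a useful learning signal.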
Execution Options#
All commands support NeMo-Run execution modes:
| Option | Behavior | Use Case |
|---|---|---|
| `--run` | Attached: submits job and streams logs | Interactive development |
| `--batch` | Detached: submits and exits immediately | Long-running jobs |
| `--dry-run` | Preview execution plan | Validation |
See Execution through NeMo-Run for profile configuration and advanced options.
Artifact Lineage#
The pipeline tracks lineage via W&B Artifacts, so you can trace any model back to the data it was trained on.
%%{init: {'theme': 'base', 'themeVariables': { 'primaryBorderColor': '#333333', 'lineColor': '#333333', 'primaryTextColor': '#333333', 'clusterBkg': '#ffffff', 'clusterBorder': '#333333'}}}%%
flowchart TB
subgraph pretrain["Stage 0: Pretraining"]
raw["Raw Text Data"] --> data0["PretrainBlendsArtifact<br/>(bin/idx)"]
data0 --> cmd0["uv run nemotron nano3 pretrain"]
cmd0 --> model0["ModelArtifact-pretrain"]
end
subgraph sft["Stage 1: SFT"]
data1["SFTDataArtifact<br/>(Parquet)"] --> cmd1["uv run nemotron nano3 sft"]
model0 --> cmd1
cmd1 --> model1["ModelArtifact-sft"]
end
subgraph rl["Stage 2: RL"]
data2["SplitJsonlDataArtifact<br/>(JSONL)"] --> cmd2["uv run nemotron nano3 rl"]
model1 --> cmd2
cmd2 --> model2["ModelArtifact-rl<br/>(Final Model)"]
end
style pretrain fill:#e1f5fe,stroke:#2196f3
style sft fill:#f3e5f5,stroke:#9c27b0
style rl fill:#e8f5e9,stroke:#4caf50
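Tracing a model back to its training data is conceptually a walk over the parent edges in the graph above. A toy, dependency-free sketch of that walk (the artifact names mirror the diagram; this is not the wandb API):

```python
# Parent edges from the lineage diagram: artifact -> inputs it was built from.
LINEAGE = {
    "ModelArtifact-rl": ["ModelArtifact-sft", "SplitJsonlDataArtifact"],
    "ModelArtifact-sft": ["ModelArtifact-pretrain", "SFTDataArtifact"],
    "ModelArtifact-pretrain": ["PretrainBlendsArtifact"],
    "PretrainBlendsArtifact": [],
    "SFTDataArtifact": [],
    "SplitJsonlDataArtifact": [],
}

def ancestors(artifact):
    """Collect every upstream artifact reachable from `artifact`."""
    seen = []
    stack = list(LINEAGE.get(artifact, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.append(node)
            stack.extend(LINEAGE.get(node, []))
    return seen

# The final RL model traces back through SFT and pretraining to the raw blends.
print(sorted(ancestors("ModelArtifact-rl")))
```

In practice W&B records these edges automatically when a run logs an artifact it consumed, so the same query works across the real project.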
Open-Source Data#
Note: These recipes train exclusively on the open-sourced subset of training data. Results will differ from the tech report benchmarks, which used additional proprietary data. Use these recipes as reference implementations to apply the methodology with your own data.
Coming Soon#
Native integrations with NVIDIA's NeMo ecosystem:
| Tool | Description | Status |
|---|---|---|
| | Data curation: deduplication, quality filtering, PII removal | Planned |
| | Synthetic data generation for instruction tuning and alignment | Planned |
| | Model export to TensorRT-LLM and deployment | Planned |
| | Model evaluation and benchmarking | Planned |
These integrations will connect data curation directly to model evaluation.
CLI Reference#
# Show available commands
$ uv run nemotron nano3 --help
Usage: nemotron nano3 [OPTIONS] COMMAND [ARGS]...
Nano3 training recipe
╭─ Commands ──────────────────────────────────────────────────────────────╮
│ data       Data curation and preparation commands                       │
│ model      Model evaluation and import commands                         │
╰─────────────────────────────────────────────────────────────────────────╯
╭─ Training Stages ───────────────────────────────────────────────────────╮
│ pretrain   Run pretraining with Megatron-Bridge (stage0).               │
│ sft        Run supervised fine-tuning with Megatron-Bridge (stage1).    │
│ rl         Run reinforcement learning with NeMo-RL GRPO (stage2).       │
╰─────────────────────────────────────────────────────────────────────────╯
╭─ Evaluation ────────────────────────────────────────────────────────────╮
│ eval       Run model evaluation with NeMo Evaluator.                    │
╰─────────────────────────────────────────────────────────────────────────╯
╭─ Pipeline ──────────────────────────────────────────────────────────────╮
│ pipe       Compose pretrain → SFT into a single nemo-run Experiment.    │
╰─────────────────────────────────────────────────────────────────────────╯
# View training command help (SFT example with artifact overrides)
$ uv run nemotron nano3 sft --help
Usage: nemotron nano3 sft [OPTIONS]
Run supervised fine-tuning with Megatron-Bridge (stage1).
╭─ Options ───────────────────────────────────────────────────────────────╮
│ --help  -h        Show this message and exit.                           │
╰─────────────────────────────────────────────────────────────────────────╯
╭─ Global Options ────────────────────────────────────────────────────────╮
│ -c, --config NAME     Config name or path                               │
│ -r, --run PROFILE     Submit to cluster (attached)                      │
│ -b, --batch PROFILE   Submit to cluster (detached)                      │
│ -d, --dry-run         Preview config without execution                  │
│ --stage               Stage files for interactive debugging             │
╰─────────────────────────────────────────────────────────────────────────╯
╭─ Configs (-c/--config) ─────────────────────────────────────────────────╮
│ Built-in: default, tiny                                                 │
│ Custom:   -c /path/to/your/config.yaml                                  │
╰─────────────────────────────────────────────────────────────────────────╯
╭─ Artifact Overrides (W&B artifact references) ──────────────────────────╮
│ run.model   Base model checkpoint artifact                              │
│ run.data    SFT data artifact (Packed Parquet)                          │
╰─────────────────────────────────────────────────────────────────────────╯
╭─ Run Overrides (override env.toml settings) ────────────────────────────╮
│ run.env.nodes             Number of nodes                               │
│ run.env.nproc_per_node    GPUs per node                                 │
│ run.env.partition         Slurm partition                               │
│ run.env.account           Slurm account                                 │
│ run.env.time              Job time limit (e.g., 04:00:00)               │
│ run.env.container_image   Override container image                      │
╰─────────────────────────────────────────────────────────────────────────╯
╭─ env.toml Profiles ─────────────────────────────────────────────────────╮
│ Available profiles: YOUR-CLUSTER, YOUR-CLUSTER-large                    │
│ Usage: --run PROFILE or --batch PROFILE                                 │
╰─────────────────────────────────────────────────────────────────────────╯
╭─ Examples ──────────────────────────────────────────────────────────────╮
│ $ ... sft -c tiny                       Local execution                 │
│ $ ... sft -c tiny --dry-run             Preview config                  │
│ $ ... sft -c tiny --run my-cluster      Submit to cluster               │
│ $ ... sft -c tiny -r cluster run.env.nodes=4                            │
╰─────────────────────────────────────────────────────────────────────────╯
Troubleshooting#
W&B authentication: See W&B Integration for setup.
wandb login
Container not found: Verify image path in config files.
Job submission fails: Check Slurm account and partition in env.toml. See Execution through NeMo-Run.