Nemotron 3 Nano Training Recipe#
Reproducible training pipeline for Nemotron 3 Nano, an open Mixture-of-Experts hybrid Mamba-Transformer model optimized for agentic reasoning.
Quick Start#
Prerequisites#
Slurm cluster with GPU nodes (H100 recommended). See Execution through NeMo-Run.
Weights & Biases account for experiment tracking and artifact lineage
Container images:
Training: nvcr.io/nvidia/nemo:25.11.nemotron_3_nano
RL: nvcr.io/nvidia/nemo-rl:v0.4.0.nemotron_3_nano
Installation#
git clone https://github.com/NVIDIA/nemotron
cd nemotron
uv sync
Configuration#
Create an env.toml file (see Execution through NeMo-Run for details):
[wandb]
project = "nemotron"
entity = "YOUR-TEAM"
[YOUR-CLUSTER]
executor = "slurm"
account = "YOUR-ACCOUNT"
partition = "batch"
nodes = 2
ntasks_per_node = 8
gpus_per_node = 8
mounts = ["/lustre:/lustre"]
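Additional cluster profiles are just extra tables in the same `env.toml`. As a hypothetical example (the profile name and resource values below are placeholders to adapt, not shipped defaults), a larger profile can sit alongside the first:

```toml
[YOUR-CLUSTER-large]
executor = "slurm"
account = "YOUR-ACCOUNT"
partition = "batch"
nodes = 8                 # placeholder: scale to your allocation
ntasks_per_node = 8
gpus_per_node = 8
mounts = ["/lustre:/lustre"]
```

Select a profile at submission time, e.g. `--run YOUR-CLUSTER-large`.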
Run the Pipeline#
# Stage 0: Pretraining
$ uv run nemotron nano3 data prep pretrain --run YOUR-CLUSTER
$ uv run nemotron nano3 pretrain --run YOUR-CLUSTER
# Stage 1: Supervised Fine-Tuning
$ uv run nemotron nano3 data prep sft --run YOUR-CLUSTER
$ uv run nemotron nano3 sft --run YOUR-CLUSTER
# Stage 2: Reinforcement Learning
$ uv run nemotron nano3 data prep rl --run YOUR-CLUSTER
$ uv run nemotron nano3 rl --run YOUR-CLUSTER
# Compose pretrain + SFT as a single nemo-run Experiment
$ uv run nemotron nano3 pipe --run YOUR-CLUSTER
Note: The `pipe` command composes pretrain → SFT into a single nemo-run Experiment for coordinated remote execution. RL uses Ray and must be run separately.
Resources#
Tech Report: Nemotron 3 Nano Technical Report
Model Weights:
NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16 (Base model)
NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 (Instruct model)
NVIDIA-Nemotron-3-Nano-30B-A3B-FP8 (FP8 quantized)
Model Collection: NVIDIA Nemotron v3 Collection
Training Datasets:
Pre-training Datasets (Open pre-training data)
Post-training Datasets (SFT and RL data)
Training Pipeline#
| Stage | Name | Purpose |
|---|---|---|
| 0 | Pretraining | Base model on 25T tokens with curriculum learning |
| 1 | Supervised Fine-Tuning | Multi-domain instruction tuning with 12+ data sources |
| 2 | Reinforcement Learning | GRPO alignment with multi-environment rewards |
| 3 | Evaluation | Benchmark evaluation with NeMo Evaluator |
Model Specifications#
| Specification | Value |
|---|---|
| Total Parameters | 31.6B |
| Active Parameters | 3.6B (per forward pass) |
| Pretraining Tokens | 25 trillion |
| Context Length | Up to 1M tokens |
| Architecture | Hybrid Mamba-Transformer with sparse MoE |
For architecture details, see Tech Report Section 2.1.
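The sparsity implied by the table is easy to sanity-check: only a small fraction of the weights participates in each forward pass (plain arithmetic from the numbers above, not tied to any repo code):

```python
# Plain arithmetic from the specification table above.
total_params = 31.6e9    # total parameter count
active_params = 3.6e9    # parameters active per forward pass
active_fraction = active_params / total_params
print(f"{active_fraction:.1%} of weights active per token")  # about 11.4%
```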
Stage Summaries#
Stage 0: Pretraining#
Two-phase curriculum on 25 trillion tokens: Phase 1 (23.5T) focuses on diversity across web, code, math, and multilingual data; Phase 2 (1.5T) emphasizes high-quality sources. Includes long-context extension to 1M tokens.
Stage 1: Supervised Fine-Tuning#
Multi-domain instruction tuning covering 12+ data domains including competition math/code, InfinityByte cross-domain synthesis, STEM reasoning, conversational tool use, and multilingual support.
→ SFT Guide
Stage 2: Reinforcement Learning#
Multi-environment RLVR training across 7 reward environments using GRPO, plus GenRM-based RLHF and DPO for reducing tool hallucination.
→ RL Guide
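For intuition, GRPO scores each sampled completion against the statistics of its own sampling group rather than a learned value function. A minimal, illustrative sketch of that group-relative advantage (this is the textbook form of the estimator, not NeMo-RL's actual implementation):

```python
def grpo_advantages(group_rewards, eps=1e-6):
    """Normalize each reward against its sampling group's mean and std."""
    n = len(group_rewards)
    mean = sum(group_rewards) / n
    std = (sum((r - mean) ** 2 for r in group_rewards) / n) ** 0.5
    # eps avoids division by zero when all rewards in the group are equal.
    return [(r - mean) / (std + eps) for r in group_rewards]

# Completions that beat the group average get positive advantages.
print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))
```

Because advantages are computed per group, a single verifiable reward (e.g. pass/fail on a math problem) is enough to produce a useful learning signal.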
Execution Options#
All commands support NeMo-Run execution modes:
| Option | Behavior | Use Case |
|---|---|---|
| `--run` | Attached: submits job and streams logs | Interactive development |
| `--batch` | Detached: submits and exits immediately | Long-running jobs |
| `--dry-run` | Preview execution plan | Validation |
See Execution through NeMo-Run for profile configuration and advanced options.
Artifact Lineage#
The pipeline tracks lineage via W&B Artifacts, so you can trace any model back to the data it was trained on.
%%{init: {'theme': 'base', 'themeVariables': { 'primaryBorderColor': '#333333', 'lineColor': '#333333', 'primaryTextColor': '#333333', 'clusterBkg': '#ffffff', 'clusterBorder': '#333333'}}}%%
flowchart TB
subgraph pretrain["Stage 0: Pretraining"]
raw["Raw Text Data"] --> data0["PretrainBlendsArtifact<br/>(bin/idx)"]
data0 --> cmd0["uv run nemotron nano3 pretrain"]
cmd0 --> model0["ModelArtifact-pretrain"]
end
subgraph sft["Stage 1: SFT"]
data1["SFTDataArtifact<br/>(Parquet)"] --> cmd1["uv run nemotron nano3 sft"]
model0 --> cmd1
cmd1 --> model1["ModelArtifact-sft"]
end
subgraph rl["Stage 2: RL"]
data2["SplitJsonlDataArtifact<br/>(JSONL)"] --> cmd2["uv run nemotron nano3 rl"]
model1 --> cmd2
cmd2 --> model2["ModelArtifact-rl<br/>(Final Model)"]
end
style pretrain fill:#e1f5fe,stroke:#2196f3
style sft fill:#f3e5f5,stroke:#9c27b0
style rl fill:#e8f5e9,stroke:#4caf50
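Tracing a model back to its training data is conceptually a walk over the parent edges in the graph above. A toy, dependency-free sketch of that walk (the artifact names mirror the diagram; this is not the wandb API):

```python
# Parent edges from the lineage diagram: artifact -> inputs it was built from.
LINEAGE = {
    "ModelArtifact-rl": ["ModelArtifact-sft", "SplitJsonlDataArtifact"],
    "ModelArtifact-sft": ["ModelArtifact-pretrain", "SFTDataArtifact"],
    "ModelArtifact-pretrain": ["PretrainBlendsArtifact"],
    "PretrainBlendsArtifact": [],
    "SFTDataArtifact": [],
    "SplitJsonlDataArtifact": [],
}

def ancestors(artifact):
    """Collect every upstream artifact reachable from `artifact`."""
    seen = []
    stack = list(LINEAGE.get(artifact, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.append(node)
            stack.extend(LINEAGE.get(node, []))
    return seen

# The final RL model traces back through SFT and pretraining to the raw blends.
print(sorted(ancestors("ModelArtifact-rl")))
```

In practice W&B records these edges automatically when a run logs an artifact it consumed, so the same query works across the real project.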
Open-Source Data#
Note: These recipes train exclusively on the open-sourced subset of training data. Results will differ from the tech report benchmarks, which used additional proprietary data. Use these recipes as reference implementations to apply the methodology with your own data.
Coming Soon#
Native integrations with NVIDIA's NeMo ecosystem:
| Tool | Description | Status |
|---|---|---|
| | Data curation: deduplication, quality filtering, PII removal | Planned |
| | Synthetic data generation for instruction tuning and alignment | Planned |
| | Model export to TensorRT-LLM and deployment | Planned |
| | Model evaluation and benchmarking | Planned |
These integrations will connect data curation directly to model evaluation.
CLI Reference#
# Show available commands
$ uv run nemotron nano3 --help
Usage: nemotron nano3 [OPTIONS] COMMAND [ARGS]...
Nano3 training recipe
╭─ Commands ──────────────────────────────────────────────────────────────╮
│ data       Data curation and preparation commands                       │
│ model      Model evaluation and import commands                         │
╰─────────────────────────────────────────────────────────────────────────╯
╭─ Training Stages ───────────────────────────────────────────────────────╮
│ pretrain   Run pretraining with Megatron-Bridge (stage0).               │
│ sft        Run supervised fine-tuning with Megatron-Bridge (stage1).    │
│ rl         Run reinforcement learning with NeMo-RL GRPO (stage2).       │
╰─────────────────────────────────────────────────────────────────────────╯
╭─ Evaluation ────────────────────────────────────────────────────────────╮
│ eval       Run model evaluation with NeMo Evaluator.                    │
╰─────────────────────────────────────────────────────────────────────────╯
╭─ Pipeline ──────────────────────────────────────────────────────────────╮
│ pipe       Compose pretrain → SFT into a single nemo-run Experiment.    │
╰─────────────────────────────────────────────────────────────────────────╯
# View training command help (SFT example with artifact overrides)
$ uv run nemotron nano3 sft --help
Usage: nemotron nano3 sft [OPTIONS]
Run supervised fine-tuning with Megatron-Bridge (stage1).
╭─ Options ───────────────────────────────────────────────────────────────╮
│ --help  -h        Show this message and exit.                           │
╰─────────────────────────────────────────────────────────────────────────╯
╭─ Global Options ────────────────────────────────────────────────────────╮
│ -c, --config NAME     Config name or path                               │
│ -r, --run PROFILE     Submit to cluster (attached)                      │
│ -b, --batch PROFILE   Submit to cluster (detached)                      │
│ -d, --dry-run         Preview config without execution                  │
│ --stage               Stage files for interactive debugging             │
╰─────────────────────────────────────────────────────────────────────────╯
╭─ Configs (-c/--config) ─────────────────────────────────────────────────╮
│ Built-in: default, tiny                                                 │
│ Custom:   -c /path/to/your/config.yaml                                  │
╰─────────────────────────────────────────────────────────────────────────╯
╭─ Artifact Overrides (W&B artifact references) ──────────────────────────╮
│ run.model   Base model checkpoint artifact                              │
│ run.data    SFT data artifact (Packed Parquet)                          │
╰─────────────────────────────────────────────────────────────────────────╯
╭─ Run Overrides (override env.toml settings) ────────────────────────────╮
│ run.env.nodes             Number of nodes                               │
│ run.env.nproc_per_node    GPUs per node                                 │
│ run.env.partition         Slurm partition                               │
│ run.env.account           Slurm account                                 │
│ run.env.time              Job time limit (e.g., 04:00:00)               │
│ run.env.container_image   Override container image                      │
╰─────────────────────────────────────────────────────────────────────────╯
╭─ env.toml Profiles ─────────────────────────────────────────────────────╮
│ Available profiles: YOUR-CLUSTER, YOUR-CLUSTER-large                    │
│ Usage: --run PROFILE or --batch PROFILE                                 │
╰─────────────────────────────────────────────────────────────────────────╯
╭─ Examples ──────────────────────────────────────────────────────────────╮
│ $ ... sft -c tiny                       Local execution                 │
│ $ ... sft -c tiny --dry-run             Preview config                  │
│ $ ... sft -c tiny --run my-cluster      Submit to cluster               │
│ $ ... sft -c tiny -r cluster run.env.nodes=4                            │
╰─────────────────────────────────────────────────────────────────────────╯
Troubleshooting#
W&B authentication: See W&B Integration for setup.
wandb login
Container not found: Verify image path in config files.
Job submission fails: Check Slurm account and partition in env.toml. See Execution through NeMo-Run.