Nemotron Training Recipes#
Open and efficient models for agentic AI. Reproducible training pipelines with transparent data, techniques, and weights.
Quick Start#
# Install the Nemotron training recipes
$ git clone https://github.com/NVIDIA/nemotron
$ cd nemotron && uv sync
# Run the Nano3 pipeline stage by stage
$ uv run nemotron nano3 data prep pretrain --run YOUR-CLUSTER
$ uv run nemotron nano3 pretrain --run YOUR-CLUSTER
$ uv run nemotron nano3 data prep sft --run YOUR-CLUSTER
$ uv run nemotron nano3 sft --run YOUR-CLUSTER
$ uv run nemotron nano3 data prep rl --run YOUR-CLUSTER
$ uv run nemotron nano3 rl --run YOUR-CLUSTER
Note: The --run YOUR-CLUSTER flag submits jobs to your configured Slurm cluster via NeMo-Run. See Execution through NeMo-Run for setup instructions.
Usage Cookbook & Examples#
Deployment guides for Nemotron models: TensorRT-LLM, vLLM, SGLang, NIM, and Hugging Face.
End-to-end applications: RAG agents, ML agents, and multi-agent systems.
Available Training Recipes#
Nemotron 3 Nano (nano3): 31.6B total / 3.6B active parameters, 25T tokens, up to 1M context. Hybrid Mamba-Transformer with sparse MoE.
Stages: Pretraining → SFT → RL
Training Pipeline#
The Nemotron training pipeline has three stages, each tracked through artifact lineage:
| Stage | Name | Description |
|---|---|---|
| 0 | Pretraining | Base model training on a large text corpus |
| 1 | SFT | Supervised fine-tuning for instruction following |
| 2 | RL | Reinforcement learning for alignment |
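The stage order follows the Quick Start commands exactly. As a minimal sketch, the full sequence can be assembled in a small wrapper script; by default it only prints the submission plan rather than submitting anything, and YOUR-CLUSTER is a placeholder for your configured NeMo-Run cluster name:

```shell
#!/usr/bin/env bash
# Sketch: build the full Nano3 stage sequence (data prep, then training, per stage).
# YOUR-CLUSTER is a placeholder -- substitute your configured NeMo-Run cluster.
set -euo pipefail
CLUSTER="${CLUSTER:-YOUR-CLUSTER}"

plan=()
for stage in pretrain sft rl; do
  # Each stage first prepares its data blend, then launches training on it.
  plan+=("uv run nemotron nano3 data prep $stage --run $CLUSTER")
  plan+=("uv run nemotron nano3 $stage --run $CLUSTER")
done

# Print the plan; run each line yourself to actually submit the jobs.
printf '%s\n' "${plan[@]}"
```

The script mirrors the six documented Quick Start commands and adds no flags beyond what they show.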
Why Nemotron?#
| Open Models | Transparent training data, techniques, and weights for community innovation |
|---|---|
| Compute Efficiency | Model pruning enabling higher throughput via TensorRT-LLM |
| High Accuracy | Built on frontier open models with human-aligned reasoning |
| Flexible Deployment | Deploy anywhere: edge, single GPU, or data center with NIM |
Features#
End-to-end pipelines from raw data to deployment-ready models
Artifact lineage via W&B from data to model
Built on NVIDIA's NeMo stack (Megatron-Bridge, NeMo-RL)
Reproducible with versioned configs, data blends, and checkpoints
Resources#
Tech Report – Nemotron 3 Nano methodology
Model Weights – pre-trained checkpoints on Hugging Face
Pre-training Datasets – open pre-training data
Post-training Datasets – SFT and RL data
Artifact Lineage – W&B integration guide