Nemotron Training Recipes#
Open and efficient models for agentic AI. Reproducible training pipelines with transparent data, techniques, and weights.
Quick Start#
// Install the Nemotron training recipes
$ git clone https://github.com/NVIDIA-NeMo/Nemotron
$ cd Nemotron && uv sync
// Run a tiny SFT job on your cluster
$ uv run nemotron steps run sft/automodel -c tiny --run YOUR-CLUSTER
// Run the Nano3 pipeline stage by stage
$ uv run nemotron nano3 data prep pretrain --run YOUR-CLUSTER
$ uv run nemotron nano3 pretrain --run YOUR-CLUSTER
$ uv run nemotron nano3 data prep sft --run YOUR-CLUSTER
$ uv run nemotron nano3 sft --run YOUR-CLUSTER
$ uv run nemotron nano3 data prep rl --run YOUR-CLUSTER
$ uv run nemotron nano3 rl --run YOUR-CLUSTER
Note: The --run YOUR-CLUSTER flag submits jobs to your configured Slurm cluster via NeMo-Run. See Execution through NeMo-Run for setup instructions.
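The stage-by-stage Nano3 commands above can be chained in a small wrapper script. The following is a minimal sketch, not part of the repository: the CLUSTER and DRY_RUN variables and the function names are hypothetical, and only the commands shown in the Quick Start are used. It defaults to a dry run that prints each command. Whether each submission waits for the job to finish depends on your NeMo-Run setup, so treat the ordering as submission order only.

```shell
#!/bin/sh
# Sketch: submit the Nano3 stages from the Quick Start in order and stop at
# the first command that fails. CLUSTER and DRY_RUN are hypothetical knobs
# introduced for this example only.
set -eu

CLUSTER="${CLUSTER:-YOUR-CLUSTER}"   # name passed to --run
DRY_RUN="${DRY_RUN:-1}"              # 1 = print commands without running them

# Stage subcommands, in the order listed in the Quick Start.
nano3_stages() {
  cat <<'EOF'
data prep pretrain
pretrain
data prep sft
sft
data prep rl
rl
EOF
}

# Build the full command line for one stage.
nano3_command() {
  echo "uv run nemotron nano3 $1 --run $CLUSTER"
}

run_nano3_pipeline() {
  nano3_stages | while read -r stage; do
    cmd="$(nano3_command "$stage")"
    echo "+ $cmd"
    if [ "$DRY_RUN" != "1" ]; then
      # Unquoted on purpose so the command splits into arguments.
      # stdin is redirected so the command cannot consume the stage list.
      $cmd </dev/null
    fi
  done
}

run_nano3_pipeline
```

Run it once as-is to review the printed commands, then set DRY_RUN=0 and CLUSTER to your configured cluster name to submit the jobs.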
Sample Deployments and Applications#
Deployment guides for Nemotron models: TensorRT-LLM, vLLM, SGLang, NIM, Hugging Face, and agent harnesses.
End-to-end applications: RAG agents, ML agents, and multi-agent systems.
Customization Workflows with Nemotron Steps#
Translate JSONL or Parquet corpora with translate/nemo_curator, NeMo Curator backends, and optional FAITH quality scoring.
Generate and translate custom multiple-choice benchmarks with byob/mcq.
Filter JSONL text with curate/nemo_curator before translation or training data preparation.
Use sdg/data_designer to produce SFT, tool-use, and preference datasets.
Evaluate hosted endpoints or checkpoints with eval/model_eval.
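As a rough illustration of how the steps listed above might be invoked, the sketch below prints one command per step ID. It assumes each step follows the same pattern as the Quick Start SFT example (nemotron steps run STEP --run YOUR-CLUSTER); that pattern is an assumption for every step other than sft/automodel. Each step probably needs its own config passed with -c, and this page does not list those config names, so the script only prints commands and never runs them. The CLUSTER variable and function names are hypothetical.

```shell
#!/bin/sh
# Sketch: print a candidate command for each customization step named above.
# Assumption: every step accepts the same "steps run <id> --run <cluster>"
# form as the Quick Start SFT example. Config names (-c ...) are not listed
# on this page, so they are left out and nothing is executed.
set -eu

CLUSTER="${CLUSTER:-YOUR-CLUSTER}"   # hypothetical variable for this example

# Step IDs taken from the list above, ordered curate -> translate -> rest.
step_ids() {
  cat <<'EOF'
curate/nemo_curator
translate/nemo_curator
byob/mcq
sdg/data_designer
eval/model_eval
EOF
}

# Build the assumed command line for one step ID.
step_command() {
  echo "uv run nemotron steps run $1 --run $CLUSTER"
}

print_step_commands() {
  step_ids | while read -r step; do
    step_command "$step"
  done
}

print_step_commands
```

Check each printed command against the step's own documentation before running it.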
Training Recipes#
550B total / 55B active parameters, 20T tokens, up to 1M context. Hybrid Mamba-Attention MoE with LatentMoE and MTP.
Stages: Pretraining → SFT → RLVR → MOPD
120.6B total / 12.7B active parameters, up to 1M context. Hybrid Mamba-Transformer with sparse Latent MoE.
Stages: Pretraining → SFT → RL → Quantization → Eval
31.6B total / 3.6B active parameters, 25T tokens, up to 1M context. Hybrid Mamba-Transformer with sparse MoE.
Stages: Pretraining → SFT → RL
GA-checkpoint multimodal post-training recipe with stage-local container builds and a three-step RL stack.
Stages: SFT → RL MPO → RL text → RL vision → Eval
Fine-tune Llama-Nemotron-Embed-1B-v2 on domain-specific data with synthetic data generation, evaluation, and NIM deployment.
Stages: SDG → Data Prep → Finetune → Eval → Export → Deploy
Fine-tune Llama-Nemotron-Rerank-1B-v2 cross-encoders for domain-specific reranking with synthetic data generation, evaluation, export, and NIM deployment.
Stages: SDG → Data Prep → Finetune → Eval → Export → Deploy
Recipe Layout#
Nemotron keeps data-producing recipes separate from model-family training recipes:
| Path | Purpose | Example |
|---|---|---|
| | Filter, dedup, and curate existing corpora | |
| | Generate synthetic datasets that can feed multiple families | Long-document SDG feeding Omni3 SFT |
| | Family-specific training, RL, evaluation, and model lifecycle commands | |
Training Pipeline#
Each recipe family has its own stage layout, and all of them can be tracked through artifact lineage:
| Family | Stage layout |
|---|---|
| | Pretraining → SFT → RL |
| | SFT → RL MPO → RL text → RL vision → Eval |
| | Pretraining → SFT → RL → Quantization → Eval |
| | Pretraining → SFT → RLVR → MOPD |
| | SDG → Data Prep → Finetune → Eval → Export → Deploy |
| | SDG → Data Prep → Finetune → Eval → Export → Deploy |
Why Nemotron?#
| Open Models | Transparent training data, techniques, and weights for community innovation |
| Compute Efficiency | Model pruning enabling higher throughput via TensorRT-LLM |
| High Accuracy | Built on frontier open models with human-aligned reasoning |
| Flexible Deployment | Deploy anywhere: edge, single GPU, or data center with NIM |
Features#
End-to-end pipelines from raw data to deployment-ready models
Artifact lineage via W&B from data to model
Built on NVIDIA’s NeMo stack (Megatron-Bridge, NeMo-RL)
Reproducible with versioned configs, data blends, and checkpoints
Resources#
Tech Report – Nemotron 3 Nano methodology
Model Weights – pre-trained checkpoints on Hugging Face
Pre-training Datasets – open pre-training data
Post-training Datasets – SFT and RL data
Artifact Lineage – W&B integration guide
Model training steps – SFT, PEFT, RL, and optimization with nemotron step run