Nemotron Training Recipes#

Open and efficient models for agentic AI. Reproducible training pipelines with transparent data, techniques, and weights.

Quick Start#

# Install the Nemotron training recipes
$ git clone https://github.com/NVIDIA-NeMo/Nemotron
$ cd Nemotron && uv sync

# Run a tiny SFT job on your cluster
$ uv run nemotron steps run sft/automodel -c tiny --run YOUR-CLUSTER

# Run the Nano3 pipeline stage by stage
$ uv run nemotron nano3 data prep pretrain --run YOUR-CLUSTER
$ uv run nemotron nano3 pretrain --run YOUR-CLUSTER
$ uv run nemotron nano3 data prep sft --run YOUR-CLUSTER
$ uv run nemotron nano3 sft --run YOUR-CLUSTER
$ uv run nemotron nano3 data prep rl --run YOUR-CLUSTER
$ uv run nemotron nano3 rl --run YOUR-CLUSTER

Note: The --run YOUR-CLUSTER flag submits jobs to your configured Slurm cluster via NeMo-Run. See Execution through NeMo-Run for setup instructions.
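
The six Nano3 commands above can also be driven from a short script. The following is a minimal sketch and not part of the Nemotron tooling: the helper names `nano3_commands` and `run_pipeline` are hypothetical, and the only CLI invocations it builds are the ones listed in the Quick Start. It defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess

# The six Nano3 stage subcommands from the Quick Start, in order.
# Only subcommands shown above are used; nothing else is assumed about the CLI.
NANO3_STAGES = [
    "data prep pretrain",
    "pretrain",
    "data prep sft",
    "sft",
    "data prep rl",
    "rl",
]


def nano3_commands(cluster: str) -> list[list[str]]:
    """Build the argv list for each Nano3 stage (hypothetical helper)."""
    return [
        ["uv", "run", "nemotron", "nano3", *stage.split(), "--run", cluster]
        for stage in NANO3_STAGES
    ]


def run_pipeline(cluster: str, dry_run: bool = True) -> None:
    """Print each command, and execute it unless dry_run is set.

    check=True stops at the first command that exits non-zero.
    """
    for argv in nano3_commands(cluster):
        print("$", shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_pipeline("YOUR-CLUSTER")  # dry run: prints the six commands only
```

Whether each command blocks until the submitted job finishes depends on your NeMo-Run configuration, so running the commands one after another does not by itself guarantee that the stages execute in order on the cluster.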

Sample Deployments and Applications#

Deployment Guides

Deployment guides for Nemotron models: TensorRT-LLM, vLLM, SGLang, NIM, Hugging Face, and agent harnesses.

Deployment Guides

Sample Applications

End-to-end applications: RAG agents, ML agents, and multi-agent systems.

Application Examples

Customization Workflows with Nemotron Steps#

Translation

Translate JSONL or Parquet corpora with translate/nemo_curator, NeMo Curator backends, and optional FAITH quality scoring.

Translation With Nemotron

Build MCQ Benchmarks

Generate and translate custom multiple-choice benchmarks with byob/mcq.

About Building Multiple-Choice Question Benchmarks

Data Curation

Filter JSONL text with curate/nemo_curator before translation or training data preparation.

About Data Curation With NeMo Curator

Synthetic Data Generation

Use sdg/data_designer to produce SFT, tool-use, and preference datasets.

About Synthetic Data Generation

Model Evaluation

Evaluate hosted endpoints or checkpoints with eval/model_eval.

About Model Evaluation
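
The step IDs named in the cards above can be collected in one place and turned into command lines. This is a sketch under a stated assumption: that these steps accept the same `steps run <step> --run <cluster>` form the Quick Start shows for `sft/automodel`. The labels in `STEP_IDS` and the `step_command` helper are hypothetical, and no `-c` config name is passed because none are listed here for these steps.

```python
import shlex

# Step IDs named in the workflow cards above, keyed by hypothetical short labels.
STEP_IDS = {
    "translation": "translate/nemo_curator",
    "mcq_benchmark": "byob/mcq",
    "curation": "curate/nemo_curator",
    "synthetic_data": "sdg/data_designer",
    "evaluation": "eval/model_eval",
}


def step_command(workflow: str, cluster: str) -> str:
    """Return a `nemotron steps run` command line for a workflow label.

    Assumption: each step is invoked the same way as sft/automodel in the
    Quick Start. Check the per-step guide for the options it really takes.
    """
    step_id = STEP_IDS[workflow]  # raises KeyError for an unknown label
    argv = ["uv", "run", "nemotron", "steps", "run", step_id, "--run", cluster]
    return shlex.join(argv)


if __name__ == "__main__":
    for label in STEP_IDS:
        print(step_command(label, "YOUR-CLUSTER"))
```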

Training Recipes#

Nemotron 3 Ultra

550B total / 55B active parameters, 20T tokens, up to 1M context. Hybrid Mamba-Attention MoE with LatentMoE and MTP.

Stages: Pretraining → SFT → RLVR → MOPD

Nemotron 3 Ultra Training Recipe

Nemotron 3 Super

120.6B total / 12.7B active parameters, up to 1M context. Hybrid Mamba-Transformer with sparse Latent MoE.

Stages: Pretraining → SFT → RL → Quantization → Eval

Nemotron 3 Super Training Recipe

Nemotron 3 Nano

31.6B total / 3.6B active parameters, 25T tokens, up to 1M context. Hybrid Mamba-Transformer with sparse MoE.

Stages: Pretraining → SFT → RL

Nemotron 3 Nano Training Recipe

Nemotron 3 Omni

GA-checkpoint multimodal post-training recipe with stage-local container builds and a three-step RL stack.

Stages: SFT → RL MPO → RL text → RL vision → Eval

Nemotron 3 Omni Training Recipe

Embedding Fine-Tuning

Fine-tune Llama-Nemotron-Embed-1B-v2 on domain-specific data with synthetic data generation, evaluation, and NIM deployment.

Stages: SDG → Data Prep → Finetune → Eval → Export → Deploy

Embedding Model Fine-Tuning Recipe

Reranking Fine-Tuning

Fine-tune Llama-Nemotron-Rerank-1B-v2 cross-encoders for domain-specific reranking with synthetic data generation, evaluation, export, and NIM deployment.

Stages: SDG → Data Prep → Finetune → Eval → Export → Deploy

Reranking Model Fine-Tuning Recipe
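
The total and active parameter counts quoted for Ultra, Super, and Nano imply that roughly a tenth of each model's parameters are active for any given token. The short calculation below works this out from the figures in the cards; the names `PARAMS_B` and `active_fraction` are illustrative only.

```python
# (total, active) parameter counts in billions, as quoted in the recipe cards.
PARAMS_B = {
    "Nemotron 3 Ultra": (550.0, 55.0),
    "Nemotron 3 Super": (120.6, 12.7),
    "Nemotron 3 Nano": (31.6, 3.6),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Fraction of a model's parameters that are active per token."""
    return active_b / total_b


if __name__ == "__main__":
    for name, (total_b, active_b) in PARAMS_B.items():
        print(f"{name}: {active_fraction(total_b, active_b):.1%} active")
    # Nemotron 3 Ultra: 10.0% active
    # Nemotron 3 Super: 10.5% active
    # Nemotron 3 Nano: 11.4% active
```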

Recipe Layout#

Nemotron keeps data-producing recipes separate from model-family training recipes:

| Path | Purpose | Example |
| --- | --- | --- |
| src/nemotron/recipes/data/curation/ | Filter, dedup, and curate existing corpora | Nemotron-CC |
| src/nemotron/recipes/data/sdg/ | Generate synthetic datasets that can feed multiple families | Long-document SDG feeding Omni3 SFT |
| src/nemotron/recipes/<family>/ | Family-specific training, RL, evaluation, and model lifecycle commands | Nano3, Omni3 |
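
The layout above can be read as a simple rule on the path under `src/nemotron/recipes/`. The sketch below encodes that rule; `classify_recipe_path` is a hypothetical helper, and the subdirectory names used in the usage lines (such as `nano3/sft`) are made up for illustration and may not match the actual directory names in the repository.

```python
from pathlib import PurePosixPath

RECIPES_ROOT = PurePosixPath("src/nemotron/recipes")


def classify_recipe_path(path: str) -> str:
    """Classify a path under the recipes root using the layout table above.

    Raises ValueError if the path is not under RECIPES_ROOT.
    """
    parts = PurePosixPath(path).relative_to(RECIPES_ROOT).parts
    if parts[:2] == ("data", "curation"):
        return "data curation"
    if parts[:2] == ("data", "sdg"):
        return "synthetic data generation"
    if parts and parts[0] != "data":
        return f"family recipe: {parts[0]}"
    return "unknown"


if __name__ == "__main__":
    # Hypothetical example paths; only the three prefixes come from the table.
    print(classify_recipe_path("src/nemotron/recipes/data/curation/example"))
    print(classify_recipe_path("src/nemotron/recipes/data/sdg/example"))
    print(classify_recipe_path("src/nemotron/recipes/nano3/sft"))
```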

Training Pipeline#

Each recipe family has its own stage layout, and all of them can be tracked through artifact lineage:

| Family | Stage layout |
| --- | --- |
| Nano3 | Pretraining → SFT → RL |
| Omni3 | SFT → RL MPO → RL text → RL vision → Eval |
| Super3 | Pretraining → SFT → RL → Quantization → Eval |
| Ultra3 | Pretraining → SFT → RLVR → MOPD |
| Embed | SDG → Data Prep → Finetune → Eval → Export → Deploy |
| Rerank | SDG → Data Prep → Finetune → Eval → Export → Deploy |
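
The stage layouts above map naturally onto ordered lists. As an illustration only (this structure is not taken from the Nemotron codebase, and `next_stage` and `families_with` are hypothetical helpers), the table can be queried like this:

```python
from typing import Optional

# Stage layouts copied from the table above; list order is execution order.
STAGE_LAYOUTS = {
    "Nano3": ["Pretraining", "SFT", "RL"],
    "Omni3": ["SFT", "RL MPO", "RL text", "RL vision", "Eval"],
    "Super3": ["Pretraining", "SFT", "RL", "Quantization", "Eval"],
    "Ultra3": ["Pretraining", "SFT", "RLVR", "MOPD"],
    "Embed": ["SDG", "Data Prep", "Finetune", "Eval", "Export", "Deploy"],
    "Rerank": ["SDG", "Data Prep", "Finetune", "Eval", "Export", "Deploy"],
}


def next_stage(family: str, stage: str) -> Optional[str]:
    """Return the stage that follows `stage`, or None if it is the last one."""
    stages = STAGE_LAYOUTS[family]
    i = stages.index(stage)  # raises ValueError for an unknown stage
    return stages[i + 1] if i + 1 < len(stages) else None


def families_with(stage: str) -> list[str]:
    """List the families whose layout includes `stage`, in table order."""
    return [family for family, stages in STAGE_LAYOUTS.items() if stage in stages]


if __name__ == "__main__":
    print(next_stage("Super3", "RL"))    # Quantization
    print(families_with("Pretraining"))  # ['Nano3', 'Super3', 'Ultra3']
```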

Why Nemotron?#

Open Models

Transparent training data, techniques, and weights for community innovation

Compute Efficiency

Model pruning enables higher throughput via TensorRT-LLM

High Accuracy

Built on frontier open models with human-aligned reasoning

Flexible Deployment

Deploy anywhere: edge, single GPU, or data center with NIM

Features#

  • End-to-end pipelines from raw data to deployment-ready models

  • Artifact lineage via W&B from data to model

  • Built on NVIDIA’s NeMo stack (Megatron-Bridge, NeMo-RL)

  • Reproducible with versioned configs, data blends, and checkpoints

Resources#