About Nemotron Steps

A Nemotron step is a named, reusable unit of work that you invoke with the nemotron steps CLI. Each step declares the artifacts it consumes, the artifacts it produces, and a set of named configurations that you can run on your laptop, on a single node, or on a cluster. Steps are the building blocks of every Nemotron pipeline.
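As a rough mental model, a step can be pictured as a small record holding its name, the artifacts it consumes and produces, and its named configurations. The sketch below is illustrative only: the `StepSpec` class, the `resolve_config` helper, the artifact names, and the configuration names and keys are all assumptions made for this example, not the schema or API that Nemotron uses. Only the step name `data_prep/sft_packing` comes from this page.

```python
from dataclasses import dataclass, field


@dataclass
class StepSpec:
    """Hypothetical model of a step; not the real Nemotron schema."""

    name: str                   # step identifier, e.g. "data_prep/sft_packing"
    consumes: tuple[str, ...]   # artifacts the step reads
    produces: tuple[str, ...]   # artifacts the step writes
    configs: dict[str, dict] = field(default_factory=dict)  # named configurations


def resolve_config(step: StepSpec, config_name: str) -> dict:
    """Return the settings for one named configuration of a step."""
    if config_name not in step.configs:
        known = ", ".join(sorted(step.configs)) or "none"
        raise ValueError(
            f"{step.name} has no configuration {config_name!r} (known: {known})"
        )
    return step.configs[config_name]


# Artifact names, configuration names, and keys below are invented for illustration.
sft_packing = StepSpec(
    name="data_prep/sft_packing",
    consumes=("sft_jsonl",),
    produces=("packed_sft_shards",),
    configs={
        "laptop": {"num_workers": 1},
        "cluster": {"num_workers": 64},
    },
)

settings = resolve_config(sft_packing, "laptop")
```

The point of the sketch is that a configuration is selected by name, so the same step definition can serve a laptop run and a cluster run without changes to the step itself.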

This section is the entry point for the step model itself. Use it to learn what a step is, to explore the available steps from the CLI, and to find the right domain section for the work you have in mind.

The Basics

Nemotron Steps Basics

Definitions of step, configuration, environment profile, and artifact. Start here if you have not run a step before.

Concepts

Getting Started With Steps

List the available steps, inspect their inputs and outputs, and chain steps together.

Beginner

Building Block Steps

Pipelines are modular. You can run a single step in isolation, and you can compose steps into longer flows. The cards below group the available steps by the outcome they support. Follow the link in each card for tutorials, how-to guides, concepts, and reference material in that domain.
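One way to think about composing steps is that each step in a flow should find its inputs among the artifacts already produced earlier in the flow, or among artifacts you supply up front. The following is a minimal sketch of that check under stated assumptions: the `Step` tuple, the `find_missing_inputs` function, and the artifact names are hypothetical and are not part of the Nemotron CLI. The two step names are taken from the Data Curation and Preparation card.

```python
from typing import NamedTuple


class Step(NamedTuple):
    """Hypothetical description of a step for this example only."""

    name: str
    consumes: tuple
    produces: tuple


def find_missing_inputs(flow, initial=()):
    """Return (step_name, artifact) pairs whose input is not yet available.

    `initial` lists artifacts that exist before the flow starts.
    """
    available = set(initial)
    missing = []
    for step in flow:
        for artifact in step.consumes:
            if artifact not in available:
                missing.append((step.name, artifact))
        # Outputs of this step become available to every later step.
        available.update(step.produces)
    return missing


# Artifact names are invented for illustration.
curate = Step("curate/nemo_curator", ("raw_jsonl",), ("filtered_jsonl",))
prep = Step("data_prep/pretrain_prep", ("filtered_jsonl",), ("token_shards",))

ok = find_missing_inputs([curate, prep], initial={"raw_jsonl"})
wrong_order = find_missing_inputs([prep, curate], initial={"raw_jsonl"})
```

With the steps in the intended order, `ok` is empty. With the order reversed, `wrong_order` reports that `data_prep/pretrain_prep` is missing `filtered_jsonl`, which is the kind of mismatch you would look for when inspecting step inputs and outputs before chaining them.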

Build your own dataset

Synthetic Data Generation

Generate supervised fine-tuning (SFT) chat data, tool-calling data, or preference pairs with NeMo Data Designer. Backed by the sdg/data_designer step.

About Synthetic Data Generation

Translation

Translate JSON Lines or Apache Parquet corpora with NeMo Curator, with optional faithfulness, accuracy, integrity, and translation-quality holistic (FAITH) scoring. Backed by the translate/nemo_curator step.

Translation With Nemotron

Data Curation and Preparation

Filter raw text with curate/nemo_curator, then tokenize and shard it with the data_prep/pretrain_prep, data_prep/sft_packing, and data_prep/rl_prep steps. Use the curation docs for JSONL filtering and the training docs for data preparation.

About Data Curation With NeMo Curator

Build your own benchmarks

Multiple-Choice Question Benchmarks

Generate a custom multiple-choice question (MCQ) benchmark from your own documents, with optional translation. Backed by the byob step.

About Building Multiple-Choice Question Benchmarks

Build your own models

Model Training

Pretrain, fine-tune, align, and optimize models with the pretrain/, sft/, peft/, rl/, optimize/, and convert/ step families.

Model Training with Nemotron Steps

Model Evaluation

Score a trained checkpoint on standard benchmarks with NeMo Evaluator. Backed by the eval/model_eval step.

About Model Evaluation

Shared Infrastructure

Every remote run depends on an environment profile that describes the cluster, the container image, the resource shape, and the mount points. The env/env_toml step generates these profile files from compact YAML templates for Lepton or Slurm. The Basics page covers profiles and the env/env_toml step in detail.
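To make the four parts of a profile concrete, the sketch below models them as a small Python record and reports which parts are still empty. It is an assumption-laden illustration: the `EnvProfile` class, its field names, the `missing_parts` helper, and the example values are hypothetical and do not reflect the actual profile file format or the templates that `env/env_toml` reads.

```python
from dataclasses import dataclass, field, fields


@dataclass
class EnvProfile:
    """Hypothetical profile record; field names are not the real schema."""

    cluster: str = ""            # which cluster to target, e.g. a Slurm or Lepton one
    container_image: str = ""    # image the step runs in
    resources: dict = field(default_factory=dict)  # resource shape, e.g. nodes and GPUs
    mounts: list = field(default_factory=list)     # mount points as "host:container"


def missing_parts(profile: EnvProfile) -> list:
    """Return the names of profile parts that are empty or unset."""
    return [f.name for f in fields(profile) if not getattr(profile, f.name)]


# Example values are invented for illustration.
draft = EnvProfile(
    cluster="my-slurm-cluster",
    resources={"nodes": 2, "gpus_per_node": 8},
    mounts=["/data:/data"],
)

todo = missing_parts(draft)
```

Here `todo` contains only `container_image`, since that is the one part the draft leaves unset. A check like this is useful as a reminder that a remote run needs all four parts filled in before it can start.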

I Want To

Goal	Go To
Learn what a step, configuration, and profile are	Nemotron Steps Basics
List the available steps from the CLI	Getting Started With Steps
Run steps in an air-gapped environment	Airgap
Curate JSONL text	Data Curation
Generate synthetic training data	Synthetic Data Generation
Translate a corpus	Translation
Build an MCQ benchmark	Build MCQ Benchmarks
Fine-tune or align a model	Model Training
Evaluate a model	Model Evaluation
Set up a Lepton or Slurm environment profile	Nemotron Steps Basics