About Nemotron Steps

A Nemotron step is a named, reusable unit of work that you invoke with the nemotron steps CLI. Each step declares the artifacts it consumes, the artifacts it produces, and a set of named configurations that you can run on your laptop, on a single node, or on a cluster. Steps are the building blocks of every Nemotron pipeline.
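As a rough mental model, the declaration described above can be sketched as a small Python data structure. The class name `StepSpec`, its fields, and the artifact and configuration names are illustrative assumptions, not the actual Nemotron schema. Only the step name `curate/nemo_curator` comes from this documentation.

```python
from dataclasses import dataclass, field


@dataclass
class StepSpec:
    """Hypothetical model of a step declaration; not the real Nemotron schema."""

    name: str
    consumes: tuple[str, ...] = ()  # artifacts the step reads
    produces: tuple[str, ...] = ()  # artifacts the step writes
    configs: dict[str, dict] = field(default_factory=dict)  # named configurations

    def get_config(self, config_name: str) -> dict:
        """Return one named configuration, listing the valid names on a miss."""
        try:
            return self.configs[config_name]
        except KeyError:
            valid = ", ".join(sorted(self.configs)) or "(none)"
            raise KeyError(
                f"{self.name} has no config {config_name!r}; valid: {valid}"
            ) from None


# Artifact and configuration names below are made up for illustration.
curate = StepSpec(
    name="curate/nemo_curator",
    consumes=("raw_text",),
    produces=("filtered_text",),
    configs={"laptop": {"nodes": 1}, "cluster": {"nodes": 8}},
)

print(curate.get_config("laptop"))
```

The point of the sketch is that a step is fully described by its name, its inputs, its outputs, and a lookup table of named configurations, so the same step can be selected with a different configuration for each target.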

This section is the entry point for the step model itself. Use it to learn what a step is, to explore the available steps from the CLI, and to find the right domain section for the work you have in mind.

The Basics

Nemotron Steps Basics

Definitions of step, configuration, environment profile, and artifact. Start here if you have not run a step before.

Getting Started With Steps

List the available steps, inspect their inputs and outputs, and chain steps together.


Building Block Steps

Pipelines are modular. You can run a single step in isolation, and you can compose steps into longer flows. The cards below group the available steps by the outcome they support. Follow the link in each card for tutorials, how-to guides, concepts, and reference material in that domain.
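One way to reason about composing steps is to check that each step's inputs are either supplied up front or produced by an earlier step. The following is a minimal sketch of that check, assuming steps are represented as plain dictionaries. The function `find_unmet_inputs` and the artifact names are hypothetical, and this is not how the `nemotron steps` CLI validates a flow.

```python
def find_unmet_inputs(chain, available):
    """Return (step name, missing artifacts) for every step in the chain
    whose inputs are neither supplied up front nor produced earlier."""
    available = set(available)
    problems = []
    for step in chain:
        missing = step["consumes"] - available
        if missing:
            problems.append((step["name"], sorted(missing)))
        # Outputs of this step become available to every later step.
        available |= step["produces"]
    return problems


# Step names come from this page; the artifact names are illustrative assumptions.
chain = [
    {
        "name": "curate/nemo_curator",
        "consumes": {"raw_text"},
        "produces": {"filtered_text"},
    },
    {
        "name": "data_prep/pretrain_prep",
        "consumes": {"filtered_text"},
        "produces": {"token_shards"},
    },
]

print(find_unmet_inputs(chain, {"raw_text"}))
```

Running a single step in isolation corresponds to a chain of length one, in which case every input must already be available.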

Build your own dataset

Synthetic Data Generation

Generate supervised fine-tuning (SFT) chat data, tool-calling data, or preference pairs with NeMo Data Designer. Backed by the sdg/data_designer step.

About Synthetic Data Generation

Translation

Translate JSON Lines or Apache Parquet corpora with NeMo Curator, with optional faithfulness, accuracy, integrity, and translation-quality holistic (FAITH) scoring. Backed by the translate/nemo_curator step.

Translation With Nemotron

Data Curation and Preparation

Filter raw text with curate/nemo_curator, then tokenize and shard it with the data_prep/pretrain_prep, data_prep/sft_packing, and data_prep/rl_prep steps. Use the curation docs for JSONL filtering and the training docs for data preparation.

About Data Curation With NeMo Curator

Build your own benchmarks

Multiple-Choice Question Benchmarks

Generate a custom multiple-choice question (MCQ) benchmark from your own documents, with optional translation. Backed by the byob step.

About Building Multiple-Choice Question Benchmarks

Build your own models

Model Training

Pretrain, fine-tune, align, and optimize models with the pretrain/, sft/, peft/, rl/, optimize/, and convert/ step families.

Model Training with Nemotron Steps

Model Evaluation

Score a trained checkpoint on standard benchmarks with NeMo Evaluator. Backed by the eval/model_eval step.

About Model Evaluation

Shared Infrastructure

Every remote run depends on an environment profile that describes the cluster, the container image, the resource shape, and the mount points. The env/env_toml step generates these profile files from compact YAML templates for Lepton or Slurm. The Basics page covers profiles and the env/env_toml step in detail.

I Want To

| Goal | Go To |
| --- | --- |
| Learn what a step, configuration, and profile are | Nemotron Steps Basics |
| List the available steps from the CLI | Getting Started With Steps |
| Run steps in an air-gapped environment | Airgap |
| Curate JSONL text | Data Curation |
| Generate synthetic training data | Synthetic Data Generation |
| Translate a corpus | Translation |
| Build an MCQ benchmark | Build MCQ Benchmarks |
| Fine-tune or align a model | Model Training |
| Evaluate a model | Model Evaluation |
| Set up a Lepton or Slurm environment profile | Nemotron Steps Basics |