Choose an SFT Backend#
Supervised fine-tuning (SFT) is available through two interchangeable steps. Pick one based on your data format, the checkpoint format you need, and the scale of the run.
Options#
| Step id | Best when | Primary input artifact | Primary output artifact |
|---|---|---|---|
| sft/automodel | You have OpenAI chat-formatted JSON Lines (JSONL), you want Hugging Face style checkpoints, or you want the smallest cluster footprint for iteration | Chat-formatted JSONL | Hugging Face safetensors checkpoint |
| sft/megatron_bridge | You need distributed Megatron Bridge training with packed sequences and an Apache Parquet pipeline | Packed Parquet | Megatron distributed checkpoint |
Decision Flow#
- If your data is already chat-formatted JSON Lines (JSONL) and downstream tools expect Hugging Face safetensors, start with sft/automodel.
- If your data is packed Parquet produced by the packing prep step, or you require Megatron distributed checkpoints without an export round trip, use sft/megatron_bridge.
- If you start on one backend and later need the other output format, plan an explicit conversion step in your pipeline. Do not switch backends silently without conversion.
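The decision flow above can be sketched as a small helper. This is an illustrative sketch only: the function name and the string labels for data and checkpoint formats are hypothetical and are not part of the nemotron CLI; only the two step ids come from this page.

```python
def choose_sft_step(data_format: str, checkpoint_format: str) -> str:
    """Return the step id suggested by the decision flow.

    The labels "chat_jsonl", "packed_parquet", "hf_safetensors", and
    "megatron_distributed" are hypothetical names used only in this sketch.
    """
    known_data = {"chat_jsonl", "packed_parquet"}
    known_ckpt = {"hf_safetensors", "megatron_distributed"}
    if data_format not in known_data or checkpoint_format not in known_ckpt:
        raise ValueError(
            f"unrecognized combination: {data_format!r}, {checkpoint_format!r}"
        )

    # Rule 1: chat JSONL in, Hugging Face safetensors out.
    if data_format == "chat_jsonl" and checkpoint_format == "hf_safetensors":
        return "sft/automodel"

    # Rule 2: packed Parquet input, or Megatron distributed checkpoints
    # required without an export round trip.
    return "sft/megatron_bridge"


if __name__ == "__main__":
    print(choose_sft_step("chat_jsonl", "hf_safetensors"))
    print(choose_sft_step("packed_parquet", "megatron_distributed"))
```

A mixed combination such as packed Parquet input with a Hugging Face output requirement still resolves to one backend here; the explicit conversion step described above remains your responsibility.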
Prerequisites for Megatron Bridge#
Megatron Bridge SFT expects packed Parquet that is compatible with the tokenizer and sequence length you will use in training. The pack size in prep must match the training sequence length. If they diverge, you risk shape errors mid-run.
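A pre-flight check along these lines can catch a mismatch before the run starts. This is a minimal sketch, assuming you can read both values from your own prep and training configs; the names `pack_size` and `train_seq_length` are hypothetical and do not refer to actual config keys.

```python
def check_pack_size(pack_size: int, train_seq_length: int) -> None:
    """Fail fast if the prep pack size and training sequence length differ.

    Both arguments are hypothetical names for values taken from your own
    prep and training configuration.
    """
    if pack_size <= 0 or train_seq_length <= 0:
        raise ValueError("pack size and sequence length must be positive")
    if pack_size != train_seq_length:
        raise ValueError(
            f"pack size {pack_size} does not match training sequence "
            f"length {train_seq_length}; re-run packing prep or update "
            f"the training config before launching"
        )


if __name__ == "__main__":
    check_pack_size(4096, 4096)  # matching values pass silently
    try:
        check_pack_size(4096, 8192)
    except ValueError as err:
        print(f"pre-flight check failed: {err}")
```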
Sample Commands#
$ uv run nemotron steps run sft/automodel -c tiny
$ uv run nemotron steps run sft/megatron_bridge -c tiny
Success Criteria#
- The commands nemotron steps show sft/automodel and nemotron steps show sft/megatron_bridge list the consumes types your workspace must provide.
- Loss decreases on a small slice before you scale data or learning rate.
- Tokenizer, chat template, and sequence length stay aligned with evaluation and with any later reinforcement learning (RL) step that reuses the policy.
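The loss and alignment criteria above can be checked with a few lines of Python. This is a sketch under stated assumptions: the loss values are whatever you collect from your own training logs, the 5% relative-drop threshold and window size are arbitrary choices for illustration, and the settings keys (`tokenizer`, `chat_template`, `seq_length`) are hypothetical names rather than real config fields.

```python
from statistics import mean


def loss_decreased(losses, window=5, min_rel_drop=0.05):
    """Compare the mean loss of the first and last `window` steps.

    Returns True when the final window is at least `min_rel_drop` lower
    (relative) than the initial window. Threshold and window are arbitrary.
    """
    if len(losses) < 2 * window:
        raise ValueError("need at least 2 * window loss values")
    start = mean(losses[:window])
    end = mean(losses[-window:])
    return end <= start * (1 - min_rel_drop)


def mismatched_settings(train, other,
                        keys=("tokenizer", "chat_template", "seq_length")):
    """Return the keys whose values differ between two settings dicts."""
    return [k for k in keys if train.get(k) != other.get(k)]


if __name__ == "__main__":
    # Made-up loss values for illustration, not measured results.
    print(loss_decreased([2.0, 1.9, 1.8, 1.7, 1.6, 1.5, 1.4, 1.3, 1.2, 1.1]))

    train_cfg = {"tokenizer": "tok-a", "chat_template": "v1", "seq_length": 4096}
    eval_cfg = {"tokenizer": "tok-a", "chat_template": "v1", "seq_length": 2048}
    print(mismatched_settings(train_cfg, eval_cfg))
```

Comparing windowed means rather than single steps keeps step-to-step noise from producing a false pass or fail on a small slice.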