sft/automodel#
This step runs supervised fine-tuning (SFT) on a Hugging Face (HF) format model from a JSON Lines (JSONL) chat dataset, using the NeMo AutoModel library.
The same step supports full fine-tuning and low-rank adaptation (LoRA) tuning, controlled by the peft parameter.
Syntax#
nemotron steps run sft/automodel \
[-c <config-name-or-path>] \
[-r <run-profile> | -b <batch-profile>] \
[-d] \
[--force-squash] \
[<dotlist-overrides>...] \
[<passthrough-args>...]
Refer to the Nemotron Steps CLI Reference for the shared flag set.
Configuration Files#
The step ships two configuration files under src/nemotron/steps/sft/automodel/config/.
File |
Purpose |
|---|---|
|
Full-shape training on |
|
Short validation run with five training steps and a 64-record dataset slice from |
Pass either name with -c:
$ nemotron steps run sft/automodel -c tiny
$ nemotron steps run sft/automodel -c default
Inputs and Outputs#
Direction |
Artifact Type |
Required |
Description |
|---|---|---|---|
Consumes |
|
Yes |
Instruction data in JSONL with a |
Produces |
|
— |
Hugging Face checkpoint directory. The output is a full model checkpoint when |
Supported Models#
The manifest declares three reference base models.
Other Hugging Face causal language models that load through AutoModelForCausalLM.from_pretrained also work.
Model |
Minimum GPUs |
Default |
Notes |
|---|---|---|---|
|
8 |
Yes |
Mixture-of-experts (MoE) base model used by |
|
4 |
No |
Common dense baseline for single-node SFT and LoRA. |
|
2 |
No |
Use this model when GPU count is small or iteration speed matters. |
Override the base model from the command line:
$ nemotron steps run sft/automodel -c default \
model.pretrained_model_name_or_path=meta-llama/Llama-3.1-8B-Instruct
Step Parameters#
The manifest declares one Nemotron-specific parameter. Pass it as a dotlist override after the options.
- peft=VALUE#
Selects adapter-style training instead of full fine-tuning. Set this value to
lorato train a LoRA adapter, or tonullfor full SFT.Choices:
lora,null.Default:
null.Example:
peft=lora
You can also override any key from the compiled YAML configuration. The frequently used keys include the following.
- step_scheduler.max_steps=N#
The maximum number of optimizer steps for the run.
Example:
step_scheduler.max_steps=200
- step_scheduler.global_batch_size=N#
The global batch size across all data-parallel workers.
Example:
step_scheduler.global_batch_size=64
- dataset.path_or_dataset_id=<id-or-path>#
The Hugging Face dataset identifier or a local path that resolves to a JSONL chat dataset.
Example:
dataset.path_or_dataset_id=/data/my-instructions.jsonl
- checkpoint.checkpoint_dir=PATH#
The directory where the AutoModel recipe writes checkpoints.
Example:
checkpoint.checkpoint_dir=/output/qwen-sft
Strategies#
The manifest records four operator strategies for sft/automodel.
When the run has one or two GPUs, or memory is tight, set
peft=loraand start from a Mistral-class model.When the run has three or four GPUs and the chosen model fits comfortably, consider full fine-tuning only when the resulting checkpoint size and iteration speed remain acceptable.
When the dataset already uses OpenAI chat-format JSONL, skip
data_prep/sft_packingand train directly fromtraining_jsonl.When you need an immediately deployable HF checkpoint, keep the safetensors save format and the consolidated HF output layout that
config/default.yamlsets.
Common Errors#
- chat_template_missing#
Cause: the tokenizer for the chosen model does not include a chat template, so the AutoModel recipe cannot render the
messagesfield into prompt-and-completion form.Recovery: choose a tokenizer with chat-template support, or convert the data to prompt-and-completion format before training.
- oom#
Cause: GPU memory is exhausted during forward or backward passes.
Recovery: set
peft=lora, reducestep_scheduler.global_batch_size, or move to a smaller model.
Command Examples#
Run the tiny validation configuration locally:
$ nemotron steps run sft/automodel -c tiny
Compile the default configuration on a Lepton profile without submitting the job:
$ nemotron steps run sft/automodel -c default -r lepton_sft_automodel --dry-run
Submit an attached LoRA run on a Lepton profile with a smaller base model:
$ nemotron steps run sft/automodel -c default -r lepton_sft_automodel \
peft=lora \
model.pretrained_model_name_or_path=mistralai/Mistral-7B-Instruct-v0.3 \
step_scheduler.max_steps=500
Submit a detached full SFT on Slurm and write checkpoints to a shared scratch directory:
$ nemotron steps run sft/automodel -c default -b slurm_sft_automodel \
peft=null \
step_scheduler.global_batch_size=64 \
checkpoint.checkpoint_dir=/lustre/runs/qwen-sft