# Sequence Classification (SFT/PEFT) with NeMo Automodel

## Introduction
Sequence classification tasks (e.g., sentiment analysis, topic classification, GLUE tasks) map input text to a discrete label. NeMo Automodel provides a lightweight recipe specialized for this setting that integrates with popular pretrained model formats and dataset sources, including Hugging Face models and datasets.
This guide shows how to train a sequence classification model using the `TrainFinetuneRecipeForSequenceClassification` recipe, including optional Parameter-Efficient Fine-Tuning (LoRA).
## Quickstart
Use the example config for GLUE MRPC with RoBERTa-large + LoRA:
```bash
python3 examples/llm_seq_cls/seq_cls.py --config examples/llm_seq_cls/glue/mrpc_roberta_lora.yaml
```

This run:

- Loads `roberta-large` with `num_labels: 2`
- Builds GLUE MRPC datasets (train/validation)
- Optionally enables LoRA via the `peft` block
- Trains and validates per `step_scheduler`
## What is the Sequence Classification Recipe?

`TrainFinetuneRecipeForSequenceClassification` is a config-driven trainer that orchestrates:

- Model and optimizer construction
- Dataset/Dataloader setup
- Training and validation loops
- Checkpointing and logging
It follows the same design as the SFT recipe in the fine-tune guide, but uses a standard cross-entropy classification loss and a simplified batching pipeline.
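The classification objective itself is the standard cross-entropy that Hugging Face models compute when `labels` are passed. The sketch below illustrates a single such step with plain `transformers`/PyTorch APIs; it is not the recipe's internal code, and the model name and toy inputs are only for illustration.

```python
# Illustrative only: one cross-entropy classification step with plain
# Hugging Face APIs. The recipe wraps this same objective with its own
# batching, distributed, and checkpointing machinery.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-large")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large", num_labels=2)

# A toy MRPC-style sentence-pair batch: is the second sentence a paraphrase?
batch = tokenizer(
    ["The cat sat on the mat.", "Stocks fell sharply on Monday."],
    ["A cat was sitting on a mat.", "The weather was sunny all week."],
    padding=True, truncation=True, return_tensors="pt",
)
labels = torch.tensor([1, 0])  # 1 = paraphrase, 0 = not a paraphrase

outputs = model(**batch, labels=labels)  # HF computes cross-entropy when labels are given
outputs.loss.backward()
```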
## Minimal Config Anatomy

```yaml
# GLUE MRPC with RoBERTa-large + LoRA
step_scheduler:
  global_batch_size: 32
  local_batch_size: 32
  ckpt_every_steps: 200
  val_every_steps: 100
  num_epochs: 2
  max_steps: 10

dist_env:
  backend: nccl
  timeout_minutes: 1

model:
  _target_: nemo_automodel.NeMoAutoModelForSequenceClassification.from_pretrained
  pretrained_model_name_or_path: roberta-large
  num_labels: 2

checkpoint:
  enabled: true
  checkpoint_dir: checkpoints/
  model_save_format: safetensors
  save_consolidated: true

distributed:
  _target_: nemo_automodel.components.distributed.fsdp2.FSDP2Manager
  dp_size: none
  dp_replicate_size: 1
  tp_size: 1
  cp_size: 1
  sequence_parallel: false

peft:
  _target_: nemo_automodel.components._peft.lora.PeftConfig
  target_modules:
    - "*.query"
    - "*.value"
  dim: 8
  alpha: 16
  dropout: 0.1

dataset:
  _target_: nemo_automodel.components.datasets.llm.seq_cls.GLUE_MRPC
  split: train

dataloader:
  _target_: torchdata.stateful_dataloader.StatefulDataLoader
  collate_fn: nemo_automodel.components.datasets.utils.default_collater

validation_dataset:
  _target_: nemo_automodel.components.datasets.llm.seq_cls.GLUE_MRPC
  split: validation

validation_dataloader:
  _target_: torchdata.stateful_dataloader.StatefulDataLoader
  collate_fn: nemo_automodel.components.datasets.utils.default_collater

optimizer:
  _target_: torch.optim.AdamW
  betas: [0.9, 0.999]
  eps: 1e-8
  lr: 3.0e-4
  weight_decay: 0
```
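Each `_target_` entry is a dotted path to a Python callable that the recipe resolves and instantiates, typically with the sibling keys as arguments. The sketch below mimics that resolution with `importlib` purely for illustration; `resolve_target` is a hypothetical helper, and the recipe's own config loader handles this internally (and may behave differently for nested attributes such as `NeMoAutoModelForSequenceClassification.from_pretrained`).

```python
# Illustrative sketch of the `_target_` convention, not the recipe's loader.
import importlib

def resolve_target(dotted_path: str):
    """Resolve 'pkg.module.Attr' to the attribute `Attr` of module `pkg.module`."""
    module_name, attr_name = dotted_path.rsplit(".", 1)
    return getattr(importlib.import_module(module_name), attr_name)

optimizer_cls = resolve_target("torch.optim.AdamW")  # from the optimizer block above
print(optimizer_cls)  # <class 'torch.optim.adamw.AdamW'>
```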
## Dataset Notes

- For single-sentence datasets (e.g., `yelp_review_full`, `imdb`), use `YelpReviewFull` or `IMDB` from `nemo_automodel.components.datasets.llm.seq_cls`.
- For GLUE MRPC (sentence-pair classification), use `GLUE_MRPC`, which tokenizes `(sentence1, sentence2)` pairs with padding/truncation (see the inspection snippet below for the raw fields).
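If you want to see the raw fields these wrappers tokenize, GLUE MRPC can be inspected directly with the Hugging Face `datasets` library. This is a side inspection only; the recipe itself needs nothing beyond the YAML `dataset` block.

```python
# Inspect the raw GLUE MRPC fields that the GLUE_MRPC wrapper tokenizes.
# Requires the Hugging Face `datasets` package.
from datasets import load_dataset

mrpc = load_dataset("glue", "mrpc")
print(mrpc)              # splits: train / validation / test
print(mrpc["train"][0])  # {'sentence1': ..., 'sentence2': ..., 'label': 0 or 1, 'idx': ...}
```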
## LoRA (PEFT) Settings

- `target_modules`: glob patterns that select the linear layers to adapt (e.g., `"*.proj"`); see the preview sketch below.
- `dim` (rank), `alpha`, `dropout`: tune per model/compute budget. The values `dim=8`, `alpha=16`, `dropout=0.1` are a good starting point for RoBERTa.
- The recipe applies the adapters automatically; no additional code changes are required.
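To preview which layers a glob would touch before training, you can match the patterns against the model's module names. The rough sketch below uses `fnmatch` and is only a sanity check under that assumption; the recipe performs its own matching internally and its exact glob semantics may differ.

```python
# Preview which module names the target_modules globs would match.
# Illustrative only -- the recipe applies its own matching logic.
import fnmatch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("roberta-large", num_labels=2)
patterns = ["*.query", "*.value"]

matches = [
    name
    for name, _ in model.named_modules()
    if any(fnmatch.fnmatch(name, p) for p in patterns)
]
print(matches[:4])
# e.g. ['roberta.encoder.layer.0.attention.self.query',
#       'roberta.encoder.layer.0.attention.self.value', ...]
```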
## Running with torchrun

```bash
torchrun --nproc-per-node=2 examples/llm_seq_cls/seq_cls.py --config examples/llm_seq_cls/glue/mrpc_roberta_lora.yaml
```

Adjust the number of GPUs as needed via the `--nproc-per-node` flag.
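If you are unsure how many GPUs are visible on the node, a quick check from Python can inform the `--nproc-per-node` value:

```python
# Print the number of visible CUDA devices; use this for --nproc-per-node.
import torch

print(torch.cuda.device_count())
```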