Sequence Classification (SFT/PEFT) with NeMo AutoModel
Introduction
Sequence classification tasks (e.g., sentiment analysis, topic classification, GLUE tasks) map input text to a discrete label. NeMo AutoModel provides a lightweight recipe specialized for this setting that integrates with popular pretrained model formats and dataset sources, including Hugging Face.
This guide shows how to train a sequence classification model using the TrainFinetuneRecipeForSequenceClassification recipe, including optional Parameter-Efficient Fine-Tuning (LoRA).
Quickstart
Use the example config for GLUE MRPC with RoBERTa-large + LoRA, which does the following:
- Loads roberta-large with num_labels: 2
- Builds GLUE MRPC datasets (train/validation)
- Optionally enables LoRA via the peft block
- Trains and validates per step_scheduler
What is the Sequence Classification Recipe?
TrainFinetuneRecipeForSequenceClassification is a config-driven trainer that orchestrates:
- Model and optimizer construction
- Dataset/Dataloader setup
- Training and validation loops
- Checkpointing and logging
It follows the same design as the SFT recipe in the fine-tune guide, but uses a standard cross-entropy classification loss and a simplified batching pipeline.
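To make the loss concrete, here is a minimal pure-Python sketch of cross-entropy over class logits. It is an illustration of the standard formula only, not the recipe's actual implementation (which presumably relies on a framework loss function); the function names `cross_entropy` and `batch_cross_entropy` are hypothetical.

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy for one example: -log(softmax(logits)[label])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def batch_cross_entropy(batch_logits, labels):
    """Mean cross-entropy over a batch of examples."""
    losses = [cross_entropy(z, y) for z, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)


# Two-class example, as in a num_labels: 2 setup such as MRPC.
batch_logits = [[2.0, 0.0], [0.0, 0.0]]
labels = [0, 1]
loss = batch_cross_entropy(batch_logits, labels)
print(f"mean loss: {loss:.4f}")
```

A confident correct prediction (first example) contributes a small loss, while uniform logits (second example) contribute log(2), the loss of a coin flip between two classes.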
Minimal Config Anatomy
Dataset Notes
- For single-sentence datasets (e.g., yelp_review_full, imdb), use YelpReviewFull or IMDB from nemo_automodel.components.datasets.llm.seq_cls.
- For GLUE MRPC (sentence-pair classification), use GLUE_MRPC, which tokenizes (sentence1, sentence2) with padding/truncation.
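The following toy sketch illustrates what pair tokenization with padding and truncation means in practice. It is not the GLUE_MRPC implementation: it assumes whitespace splitting instead of a real subword tokenizer, a RoBERTa-style layout (`<s> A </s> </s> B </s>`), longest-first truncation, and right padding. The function `encode_pair` and its parameters are hypothetical.

```python
def encode_pair(sentence1, sentence2, max_length=12, pad="<pad>"):
    """Toy sentence-pair encoder with truncation and right padding."""
    if max_length < 4:
        raise ValueError("max_length must leave room for 4 special tokens")
    a, b = sentence1.split(), sentence2.split()
    # The pair layout <s> A </s> </s> B </s> uses 4 special tokens.
    budget = max_length - 4
    # Longest-first truncation: trim the longer side one token at a time.
    while len(a) + len(b) > budget:
        if len(a) >= len(b):
            a.pop()
        else:
            b.pop()
    tokens = ["<s>", *a, "</s>", "</s>", *b, "</s>"]
    attention_mask = [1] * len(tokens)
    # Right-pad to the fixed length; padded positions get mask 0.
    n_pad = max_length - len(tokens)
    tokens += [pad] * n_pad
    attention_mask += [0] * n_pad
    return tokens, attention_mask


tokens, mask = encode_pair("hello world", "hi", max_length=10)
print(tokens)
print(mask)
```

Every encoded pair has exactly `max_length` positions, so examples can be stacked into a batch without further collation logic.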
LoRA (PEFT) Settings
- target_modules: glob to select linear layers (e.g., "*.proj").
- dim (rank), alpha, dropout: tune per model/compute budget. Values dim=8, alpha=16, dropout=0.1 are a good starting point for RoBERTa.
- The recipe automatically applies the adapters; no additional code changes are required.
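As a rough illustration of these settings, the sketch below shows how a glob such as "*.proj" could select modules by name and what dim and alpha imply for adapter size. It makes several assumptions: the module names are made up, shell-style matching via Python's `fnmatch` stands in for whatever matching the recipe actually performs, and the alpha/dim scaling follows the common LoRA convention rather than anything confirmed for this recipe. `lora_stats` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Hypothetical module names, for illustration only.
module_names = [
    "layers.0.attn.proj",
    "layers.0.mlp.fc1",
    "layers.0.mlp.proj",
    "classifier.dense",
]

# Assumption: target_modules behaves like a shell-style glob on module names.
selected = [name for name in module_names if fnmatchcase(name, "*.proj")]


def lora_stats(in_features, out_features, dim=8, alpha=16):
    """Sizes for one adapted linear layer under the usual LoRA formulation."""
    return {
        # Conventional LoRA scaling factor applied to the low-rank update.
        "scale": alpha / dim,
        # Two low-rank factors: (dim x in_features) and (out_features x dim).
        "adapter_params": dim * (in_features + out_features),
        # The original weight matrix, which stays frozen.
        "frozen_params": in_features * out_features,
    }


stats = lora_stats(1024, 1024)  # 1024 is the hidden size of RoBERTa-large
print(selected)
print(stats)
```

With dim=8 on a 1024 x 1024 layer, the adapter trains 16,384 parameters against 1,048,576 frozen ones, which is why a small rank is a reasonable starting point on a limited compute budget.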
Running with torchrun
Adjust the number of GPUs with torchrun's --nproc-per-node option.