Sequence Classification (SFT/PEFT) with NeMo AutoModel

Introduction

Sequence classification tasks (e.g., sentiment analysis, topic classification, GLUE tasks) map input text to a discrete label. NeMo AutoModel provides a lightweight recipe specialized for this setting that integrates with popular pretrained model formats and dataset sources, including Hugging Face.

This guide shows how to train a sequence classification model using the TrainFinetuneRecipeForSequenceClassification recipe, including optional Parameter-Efficient Fine-Tuning (LoRA).

Quickstart

Use the example config for GLUE MRPC with RoBERTa-large + LoRA:

$ python3 examples/llm_seq_cls/seq_cls.py --config examples/llm_seq_cls/glue/mrpc_roberta_lora.yaml

This command:

  • Loads roberta-large with num_labels: 2
  • Builds GLUE MRPC datasets (train/validation)
  • Optionally, enables LoRA via the peft block
  • Trains and validates per step_scheduler

What is the Sequence Classification Recipe?

TrainFinetuneRecipeForSequenceClassification is a config-driven trainer that orchestrates:

  • Model and optimizer construction
  • Dataset/Dataloader setup
  • Training and validation loops
  • Checkpointing and logging

It follows the same design as the SFT recipe in the fine-tune guide, but uses a standard cross-entropy classification loss and a simplified batching pipeline.
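
To make the loss concrete, here is a minimal pure-Python sketch of mean cross-entropy over a batch of class logits. It only illustrates the quantity being minimized; the function name `cross_entropy` and the sample values are made up for this example, and the recipe itself presumably relies on a framework implementation rather than code like this.

```python
import math

def cross_entropy(logits, labels):
    """Mean cross-entropy over a batch.

    logits: list of per-example score lists, one score per class.
    labels: list of integer class indices.
    """
    total = 0.0
    for scores, label in zip(logits, labels):
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        # Negative log-probability of the correct class.
        total += log_z - scores[label]
    return total / len(labels)

# Two examples, two classes (num_labels: 2, as in the MRPC config).
batch_logits = [[2.0, 0.0], [0.0, 0.0]]
batch_labels = [0, 1]
loss = cross_entropy(batch_logits, batch_labels)
```

The first example is classified confidently and correctly, so it contributes little to the loss; the second has equal logits and contributes log(2).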

Minimal Config Anatomy

# GLUE MRPC with RoBERTa-large + LoRA
step_scheduler:
  global_batch_size: 32
  local_batch_size: 32
  ckpt_every_steps: 200
  val_every_steps: 100
  num_epochs: 2
  max_steps: 10

dist_env:
  backend: nccl
  timeout_minutes: 1

model:
  _target_: nemo_automodel.NeMoAutoModelForSequenceClassification.from_pretrained
  pretrained_model_name_or_path: roberta-large
  num_labels: 2

checkpoint:
  enabled: true
  checkpoint_dir: checkpoints/
  model_save_format: safetensors
  save_consolidated: true

distributed:
  strategy: fsdp2
  dp_size: null
  dp_replicate_size: null
  tp_size: 1
  cp_size: 1
  sequence_parallel: false

peft:
  _target_: nemo_automodel.components._peft.lora.PeftConfig
  target_modules:
    - "*.query"
    - "*.value"
  dim: 8
  alpha: 16
  dropout: 0.1

dataset:
  _target_: nemo_automodel.components.datasets.llm.seq_cls.GLUE_MRPC
  split: train

dataloader:
  _target_: torchdata.stateful_dataloader.StatefulDataLoader
  collate_fn: nemo_automodel.components.datasets.utils.default_collater

validation_dataset:
  _target_: nemo_automodel.components.datasets.llm.seq_cls.GLUE_MRPC
  split: validation

validation_dataloader:
  _target_: torchdata.stateful_dataloader.StatefulDataLoader
  collate_fn: nemo_automodel.components.datasets.utils.default_collater

optimizer:
  _target_: torch.optim.AdamW
  betas: [0.9, 0.999]
  eps: 1e-8
  lr: 3.0e-4
  weight_decay: 0
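
Several blocks in the config name a callable through a `_target_` key and pass the remaining keys as arguments. The sketch below shows one way such a node could be resolved, assuming a Hydra-style convention; `resolve_target` and `instantiate` are hypothetical helpers written for this example, not NeMo AutoModel functions, and the library's own loader may behave differently. A standard-library target stands in for the real ones so the sketch runs without extra packages.

```python
import importlib

def resolve_target(path):
    """Import the object named by a dotted path such as 'pkg.mod.Class.method'."""
    parts = path.split(".")
    # Try the longest importable module prefix, then walk the remaining attributes.
    for i in range(len(parts), 0, -1):
        try:
            obj = importlib.import_module(".".join(parts[:i]))
        except ImportError:
            continue
        for attr in parts[i:]:
            obj = getattr(obj, attr)
        return obj
    raise ImportError(f"cannot resolve {path!r}")

def instantiate(node):
    """Call node['_target_'] with the remaining keys as keyword arguments."""
    kwargs = {k: v for k, v in node.items() if k != "_target_"}
    return resolve_target(node["_target_"])(**kwargs)

# Stand-in node: same shape as the optimizer or dataset blocks, stdlib target.
node = {"_target_": "fractions.Fraction", "numerator": 3, "denominator": 4}
value = instantiate(node)
```

Under this reading, a path ending in `.from_pretrained` (as in the model block) resolves to the class method, which is then called with `pretrained_model_name_or_path` and `num_labels`.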

Dataset Notes

  • For single-sentence datasets (e.g., yelp_review_full, imdb), use YelpReviewFull or IMDB from nemo_automodel.components.datasets.llm.seq_cls.
  • For GLUE MRPC (sentence-pair classification), use GLUE_MRPC, which tokenizes (sentence1, sentence2) with padding/truncation.
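
As a rough illustration of what tokenizing a sentence pair with padding and truncation produces, here is a toy encoder. It is not the GLUE_MRPC implementation: the whitespace tokenizer, the `<cls>`/`<sep>` markers, the integer ids, and the function name `encode_pair` are all invented for this sketch, and a real tokenizer uses a fixed subword vocabulary and its own special tokens.

```python
def encode_pair(sentence1, sentence2, max_length, pad_id=0):
    """Toy sentence-pair encoder: whitespace tokens mapped to made-up integer ids."""
    vocab = {"<cls>": 1, "<sep>": 2}

    def to_ids(text):
        # Assign ids on first sight; a real tokenizer has a fixed vocabulary.
        return [vocab.setdefault(tok, len(vocab) + 1) for tok in text.lower().split()]

    a, b = to_ids(sentence1), to_ids(sentence2)
    # Truncation: trim the longer sentence until the pair plus 3 special tokens fits.
    while len(a) + len(b) + 3 > max_length and (a or b):
        (a if len(a) >= len(b) else b).pop()
    ids = [vocab["<cls>"]] + a + [vocab["<sep>"]] + b + [vocab["<sep>"]]
    attention_mask = [1] * len(ids)
    # Padding: extend to max_length; the mask marks padded positions with 0.
    pad = max_length - len(ids)
    return {
        "input_ids": ids + [pad_id] * pad,
        "attention_mask": attention_mask + [0] * pad,
    }

padded = encode_pair("the cat sat", "a cat sat down", max_length=12)
truncated = encode_pair("the cat sat", "a cat sat down", max_length=8)
```

Both results have exactly `max_length` positions: `padded` ends with two pad ids and two zeros in its mask, while `truncated` drops tokens from the ends of the sentences and needs no padding.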

LoRA (PEFT) Settings

  • target_modules: glob patterns that select the linear layers to adapt (e.g., "*.proj"; the example config uses "*.query" and "*.value").
  • dim (rank), alpha, dropout: tune per model/compute budget. Values dim=8, alpha=16, dropout=0.1 are a good starting point for RoBERTa.
  • The recipe automatically applies the adapters; no additional code changes are required.
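
The sketch below illustrates the settings above: how glob patterns could pick out modules by name, and how few parameters a rank-8 adapter adds to one layer. The module names are hypothetical, written in the style of a Hugging Face RoBERTa encoder, and the matching uses Python's `fnmatch` as a stand-in; the library's own matching logic may differ. The alpha/dim scaling shown is the usual LoRA convention and is assumed here.

```python
from fnmatch import fnmatch

# Hypothetical module names in the style of a Hugging Face RoBERTa encoder layer.
module_names = [
    "roberta.encoder.layer.0.attention.self.query",
    "roberta.encoder.layer.0.attention.self.key",
    "roberta.encoder.layer.0.attention.self.value",
    "roberta.encoder.layer.0.attention.output.dense",
]
patterns = ["*.query", "*.value"]

# A module is adapted if its name matches any pattern.
selected = [n for n in module_names if any(fnmatch(n, p) for p in patterns)]

def lora_param_count(in_features, out_features, dim):
    """Trainable parameters added by one adapter: A is (dim x in), B is (out x dim)."""
    return dim * in_features + out_features * dim

hidden = 1024  # roberta-large hidden size
added = lora_param_count(hidden, hidden, dim=8)
frozen = hidden * hidden  # weight matrix of the wrapped linear layer (bias excluded)
scaling = 16 / 8  # alpha / dim, assumed scaling applied to the adapter output
```

With these values each adapted projection gains 16,384 trainable parameters next to 1,048,576 frozen ones, which is why LoRA fits in smaller compute budgets.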

Running with torchrun

$ torchrun --nproc-per-node=2 examples/llm_seq_cls/seq_cls.py --config examples/llm_seq_cls/glue/mrpc_roberta_lora.yaml

Set --nproc-per-node to the number of GPUs you want to use on the node.