Sequence Classification (SFT/PEFT) with NeMo AutoModel

Introduction

Sequence classification tasks (e.g., sentiment analysis, topic classification, GLUE tasks) map input text to a discrete label. NeMo AutoModel provides a lightweight recipe for this setting that integrates with popular pretrained model formats and dataset sources, including Hugging Face.

This guide shows how to train a sequence classification model with the TrainFinetuneRecipeForSequenceClassification recipe, with optional parameter-efficient fine-tuning (PEFT) via LoRA.

Quickstart

Use the example config for GLUE MRPC with RoBERTa-large + LoRA:

$ python3 examples/llm_seq_cls/seq_cls.py --config examples/llm_seq_cls/glue/mrpc_roberta_lora.yaml

This command:

Loads roberta-large with num_labels: 2
Builds the GLUE MRPC datasets (train/validation)
Optionally enables LoRA via the peft block
Trains and validates according to step_scheduler

What is the Sequence Classification Recipe?

TrainFinetuneRecipeForSequenceClassification is a config-driven trainer that orchestrates:

Model and optimizer construction
Dataset/Dataloader setup
Training and validation loops
Checkpointing and logging

It follows the same design as the SFT recipe in the fine-tune guide, but uses a standard cross-entropy classification loss and a simplified batching pipeline.
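
To make the "standard cross-entropy classification loss" concrete, here is a minimal pure-Python sketch of the computation for a batch of logits. It is an illustration of the formula only, not the recipe's actual implementation (which presumably relies on the framework's built-in loss); the function names are hypothetical.

```python
import math

def cross_entropy(logits, label):
    """Cross-entropy for one example: -log softmax(logits)[label]."""
    # Subtract the max logit for numerical stability (log-sum-exp trick).
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def batch_loss(batch_logits, labels):
    """Mean cross-entropy over a batch (the usual 'mean' reduction)."""
    losses = [cross_entropy(z, y) for z, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)

# Two-class example (num_labels: 2), e.g. paraphrase vs. not paraphrase.
logits = [[2.0, -1.0], [0.0, 0.0]]
labels = [0, 1]
loss = batch_loss(logits, labels)
```

The first example is classified confidently and correctly, so it contributes a small loss; the second has uniform logits and contributes log(2).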

Minimal Config Anatomy

# GLUE MRPC with RoBERTa-large + LoRA
step_scheduler:
  global_batch_size: 32
  local_batch_size: 32
  ckpt_every_steps: 200
  val_every_steps: 100
  num_epochs: 2
  max_steps: 10

dist_env:
  backend: nccl
  timeout_minutes: 1

model:
  _target_: nemo_automodel.NeMoAutoModelForSequenceClassification.from_pretrained
  pretrained_model_name_or_path: roberta-large
  num_labels: 2

checkpoint:
  enabled: true
  checkpoint_dir: checkpoints/
  model_save_format: safetensors
  save_consolidated: final

distributed:
  strategy: fsdp2
  dp_size: null
  dp_replicate_size: null
  tp_size: 1
  cp_size: 1
  sequence_parallel: false

peft:
  _target_: nemo_automodel.components._peft.lora.PeftConfig
  target_modules:
  - "*.query"
  - "*.value"
  dim: 8
  alpha: 16
  dropout: 0.1

dataset:
  _target_: nemo_automodel.components.datasets.llm.seq_cls.GLUE_MRPC
  split: train

dataloader:
  _target_: torchdata.stateful_dataloader.StatefulDataLoader
  collate_fn: nemo_automodel.components.datasets.utils.default_collater

validation_dataset:
  _target_: nemo_automodel.components.datasets.llm.seq_cls.GLUE_MRPC
  split: validation

validation_dataloader:
  _target_: torchdata.stateful_dataloader.StatefulDataLoader
  collate_fn: nemo_automodel.components.datasets.utils.default_collater

optimizer:
  _target_: torch.optim.AdamW
  betas: [0.9, 0.999]
  eps: 1e-8
  lr: 3.0e-4
  weight_decay: 0
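
Each block with a _target_ key names a Python callable by its dotted path, and the remaining keys become its keyword arguments. The sketch below shows one way such a block could be resolved. It is an assumption about the general pattern, not NeMo AutoModel's actual loader; resolve_target and instantiate are hypothetical helper names, and the examples use a standard-library class so the code runs without NeMo installed.

```python
import importlib

def resolve_target(path):
    """Import the object named by a dotted path such as 'pkg.mod.Class.method'."""
    parts = path.split(".")
    # Try the longest importable module prefix first, then walk attributes.
    for i in range(len(parts), 0, -1):
        try:
            obj = importlib.import_module(".".join(parts[:i]))
        except ImportError:
            continue
        for name in parts[i:]:
            obj = getattr(obj, name)
        return obj
    raise ImportError(f"cannot resolve {path!r}")

def instantiate(block, **overrides):
    """Call the block's _target_ with the other keys as keyword arguments."""
    kwargs = {k: v for k, v in block.items() if k != "_target_"}
    kwargs.update(overrides)
    return resolve_target(block["_target_"])(**kwargs)

# A class target, analogous to the optimizer block.
three_quarters = instantiate(
    {"_target_": "fractions.Fraction", "numerator": 3, "denominator": 4}
)

# A classmethod target, analogous to '...ForSequenceClassification.from_pretrained'.
half = instantiate({"_target_": "fractions.Fraction.from_float", "f": 0.5})
```

Under this reading, the model block in the config above amounts to calling from_pretrained with pretrained_model_name_or_path="roberta-large" and num_labels=2.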

Dataset Notes

For single-sentence datasets (e.g., yelp_review_full, imdb), use YelpReviewFull or IMDB from nemo_automodel.components.datasets.llm.seq_cls.
For GLUE MRPC (sentence-pair classification), use GLUE_MRPC, which tokenizes (sentence1, sentence2) with padding/truncation.
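
As an illustration of what tokenizing a (sentence1, sentence2) pair with padding/truncation involves, here is a toy sketch that works on pre-split word tokens. It follows RoBERTa's pair layout (<s> A </s></s> B </s>) and trims the longer segment first, but it is not the GLUE_MRPC implementation; encode_pair and its parameters are hypothetical, and a real tokenizer operates on subword IDs rather than words.

```python
def encode_pair(tokens_a, tokens_b, max_length, pad="<pad>"):
    """Toy sentence-pair encoding with truncation and padding.

    Returns (tokens, attention_mask), both of length max_length.
    """
    a, b = list(tokens_a), list(tokens_b)
    budget = max_length - 4  # four special tokens in the pair layout
    if budget < 0:
        raise ValueError("max_length must be at least 4")
    # Truncation: drop tokens from the end of the longer segment until the pair fits.
    while len(a) + len(b) > budget:
        (a if len(a) >= len(b) else b).pop()
    tokens = ["<s>", *a, "</s>", "</s>", *b, "</s>"]
    attention_mask = [1] * len(tokens)
    # Padding: fill up to max_length; padded positions get mask 0.
    n_pad = max_length - len(tokens)
    return tokens + [pad] * n_pad, attention_mask + [0] * n_pad

s1 = "the cat sat".split()
s2 = "a cat was sitting there".split()

truncated, truncated_mask = encode_pair(s1, s2, max_length=10)
padded, padded_mask = encode_pair(s1, s2, max_length=14)
```

With max_length=10 the second sentence loses its last two words; with max_length=14 nothing is trimmed and two pad positions are appended with attention mask 0.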

LoRA (PEFT) Settings

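target_modules: glob patterns that select the linear layers to adapt (e.g., "*.query" and "*.value" for RoBERTa, as in the config above, or a pattern such as "*.proj" for other architectures).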
dim (rank), alpha, dropout: tune per model/compute budget. Values dim=8, alpha=16, dropout=0.1 are a good starting point for RoBERTa.
The recipe automatically applies the adapters; no additional code changes are required.
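
The sketch below illustrates how glob patterns like those in the config could select modules, and how many trainable parameters the resulting adapters add. It assumes shell-style matching against fully qualified module names (the library's matcher may differ), the standard LoRA formulation with scaling alpha / dim, and roberta-large's 24 layers with hidden size 1024. The helper names are hypothetical.

```python
from fnmatch import fnmatchcase

def select_modules(module_names, patterns):
    """Return the module names matched by at least one glob pattern."""
    return [n for n in module_names if any(fnmatchcase(n, p) for p in patterns)]

def lora_param_count(in_features, out_features, dim):
    """Parameters added by one adapter: A is dim x in, B is out x dim."""
    return dim * (in_features + out_features)

# Linear layers in one RoBERTa attention block, as named in Hugging Face models.
names = [
    "roberta.encoder.layer.0.attention.self.query",
    "roberta.encoder.layer.0.attention.self.key",
    "roberta.encoder.layer.0.attention.self.value",
    "roberta.encoder.layer.0.attention.output.dense",
]
selected = select_modules(names, ["*.query", "*.value"])

# roberta-large: 24 layers, hidden size 1024; query and value adapted in each.
per_adapter = lora_param_count(1024, 1024, dim=8)
total = per_adapter * 2 * 24

# Standard LoRA scales the low-rank update by alpha / dim.
scaling = 16 / 8
```

Under these assumptions the adapters add 786,432 trainable parameters in the encoder, a small fraction of the full model.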

Running with torchrun

$ torchrun --nproc-per-node=2 examples/llm_seq_cls/seq_cls.py --config examples/llm_seq_cls/glue/mrpc_roberta_lora.yaml

Set --nproc-per-node to the number of GPUs you want to use on the node.
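
torchrun starts one process per GPU and passes each its position through the RANK, LOCAL_RANK, and WORLD_SIZE environment variables. The recipe handles this for you; the sketch below only shows how a script could read them, with single-process defaults when launched with plain python3. dist_info is a hypothetical helper.

```python
import os

def dist_info(env=None):
    """Read the process layout set by torchrun, defaulting to a single process."""
    env = os.environ if env is None else env
    return {
        "rank": int(env.get("RANK", 0)),              # global process index
        "local_rank": int(env.get("LOCAL_RANK", 0)),  # index on this node (GPU id)
        "world_size": int(env.get("WORLD_SIZE", 1)),  # total number of processes
    }

# What the second of two processes would see under --nproc-per-node=2.
second = dist_info({"RANK": "1", "LOCAL_RANK": "1", "WORLD_SIZE": "2"})

# Plain `python3 script.py` launch: no variables set.
single = dist_info({})
```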