Reranking Models#

Introduction#

Reranking models use cross-encoders to score a query-document pair jointly. They are typically used after an embedding model has produced an initial candidate set. NeMo AutoModel supports optimized bidirectional Llama rerankers and falls back to Hugging Face AutoModelForSequenceClassification for other architectures.

For first-stage dense retrieval, see Embedding Models.

Optimized Backbones (Bidirectional Attention)#

Owner	Model	Architecture	Wrapper Class	Tasks
NVIDIA	llama-nemotron-rerank-1b-v2	`LlamaBidirectionalForSequenceClassification`	`NeMoAutoModelCrossEncoder`	Reranking

Hugging Face Auto Backbones#

Any Hugging Face model loadable using AutoModelForSequenceClassification can be used as a reranking backbone. This fallback path uses the model’s native attention; no bidirectional conversion is applied.

Supported Workflows#

Fine-tuning (Cross-Encoder): Cross-entropy training on query-document pairs to produce rerankers
LoRA/PEFT: Parameter-efficient fine-tuning for reranking backbones

Dataset#

Retrieval fine-tuning requires query-document pairs: each example is a query paired with one positive document and one or more negative documents. Both inline JSONL and corpus ID-based JSON formats are supported. See the Retrieval Dataset guide.