Embedding Models#
Introduction#
Text embedding models transform text into dense vector representations that power semantic search, dense retrieval, retrieval-augmented generation (RAG), and classification tasks. NeMo AutoModel includes a training recipe for converting Llama decoder-only models into encoder architectures with bidirectional attention, and falls back to Hugging Face AutoModel for other encoder backbones.
For cross-encoder pairwise scoring, see Reranking Models.
Embedding models use bi-encoders to produce dense representations for queries and documents independently. They are the standard path for embedding generation and first-stage dense retrieval.
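The bi-encoder pattern described above can be sketched in a few lines. This is only an illustration: `toy_embed` is a hypothetical stand-in (normalized bag-of-words counts) for a trained embedding model, and none of the names below come from NeMo AutoModel. The point it shows is that the query and each document are encoded separately and compared only through a vector similarity.

```python
import math
from collections import Counter

def toy_embed(text, vocab):
    """Hypothetical stand-in for an embedding model: L2-normalized word counts."""
    counts = Counter(text.lower().split())
    vec = [float(counts[word]) for word in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def rank(query, documents, vocab):
    """Score documents against a query by cosine similarity of their vectors."""
    # Each document is encoded without seeing the query, so in a real system
    # these vectors could be computed once and stored in an index.
    doc_vecs = [toy_embed(doc, vocab) for doc in documents]
    query_vec = toy_embed(query, vocab)
    # Vectors are unit length, so the dot product equals cosine similarity.
    scores = [sum(q * d for q, d in zip(query_vec, vec)) for vec in doc_vecs]
    order = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
    return order, scores

documents = [
    "the cat sat on the mat",
    "stock prices fell sharply today",
    "a cat chased a mouse",
]
vocab = sorted({word for doc in documents for word in doc.split()})
order, scores = rank("cat on the mat", documents, vocab)
print(order)
```

Because documents never see the query during encoding, first-stage retrieval reduces to a nearest-neighbor lookup over precomputed vectors.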
Optimized Backbones (Bidirectional Attention)#
| Owner | Model | Architecture | Auto Class | Tasks |
|---|---|---|---|---|
| NVIDIA | | | | Embedding, Dense Retrieval |
| Mistral AI | | | | Embedding, Dense Retrieval |
Hugging Face Auto Backbones#
Any Hugging Face model that can be loaded with AutoModel can be used as an embedding backbone. This fallback path uses the model’s native attention; no bidirectional conversion is applied.
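To make the distinction between native and bidirectional attention concrete, here is a minimal sketch of the two mask shapes. It assumes a decoder-only backbone whose native attention is causal; the helper names are hypothetical and this is not how NeMo AutoModel implements the conversion.

```python
def causal_mask(seq_len):
    """Decoder-style mask: position i may attend only to positions j <= i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]

def bidirectional_mask(attention_mask):
    """Encoder-style mask: every position may attend to every non-padding token.

    attention_mask is a list of 0/1 flags, with 1 marking a real token.
    """
    seq_len = len(attention_mask)
    return [[bool(attention_mask[j]) for j in range(seq_len)] for _ in range(seq_len)]

causal = causal_mask(3)
bidirectional = bidirectional_mask([1, 1, 0])  # last position is padding

print(causal[0])         # first token sees only itself
print(bidirectional[0])  # first token sees both real tokens, not the padding
```

Under the causal mask, early tokens never see later context, which is one reason embedding recipes may prefer bidirectional attention when the backbone supports it.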
Example Recipes#
| Recipe | Description |
|---|---|
| | Bi-encoder: Llama 3.2 1B embedding model |
| | Bi-encoder: Llama-Embed-Nemotron-8B reproduction recipe |
| | Bi-encoder: Ministral3-3B recipe |
Supported Workflows#
Fine-tuning (Bi-Encoder): Contrastive learning on query-document pairs to produce embedding models
LoRA/PEFT: Parameter-efficient fine-tuning for embedding backbones
ONNX Export: Export trained embedding models for deployment (support varies by model)
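As a rough illustration of the contrastive objective used in bi-encoder fine-tuning, the sketch below computes an InfoNCE-style loss for one query with one positive and several negatives. The loss form and the `temperature` value are assumptions chosen for illustration; the recipes may use a different objective or hyperparameters.

```python
import math

def info_nce_loss(query, positive, negatives, temperature=0.05):
    """Cross-entropy over similarity scores, with the positive as the target.

    All vectors are plain lists of floats, assumed to be L2-normalized.
    temperature is an illustrative default, not a value from any recipe.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # The positive document's logit comes first, followed by the negatives.
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, neg) / temperature for neg in negatives]

    # Numerically stable log-sum-exp for the softmax denominator.
    max_logit = max(logits)
    log_denominator = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_denominator - logits[0]

query = [1.0, 0.0]
aligned = info_nce_loss(query, positive=[1.0, 0.0], negatives=[[0.0, 1.0]])
misaligned = info_nce_loss(query, positive=[0.0, 1.0], negatives=[[1.0, 0.0]])
print(aligned < misaligned)
```

The loss is small when the query is closer to its positive than to any negative, and grows as a negative outscores the positive.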
Dataset#
Retrieval fine-tuning requires query-document pairs: each example is a query paired with one positive document and one or more negative documents. Both inline JSONL and corpus ID-based JSON formats are supported. See the Retrieval Dataset guide.
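The shape of such an example can be sketched as follows. The field names `query`, `positive`, and `negatives` are hypothetical placeholders chosen for illustration; the actual schema is defined in the Retrieval Dataset guide and may differ.

```python
import json

# Hypothetical inline JSONL: one JSON object per line. Field names are
# illustrative assumptions, not the schema NeMo AutoModel expects.
SAMPLE_JSONL = "\n".join([
    json.dumps({
        "query": "what is dense retrieval?",
        "positive": "Dense retrieval ranks documents by embedding similarity.",
        "negatives": ["Bananas are rich in potassium."],
    }),
    json.dumps({
        "query": "how do bi-encoders work?",
        "positive": "Bi-encoders embed queries and documents independently.",
        "negatives": ["The match ended in a draw.", "Rain is expected on Friday."],
    }),
])

def load_examples(jsonl_text):
    """Parse JSONL and check each example has a query, one positive, and >= 1 negative."""
    examples = []
    for line_no, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        if not isinstance(record.get("query"), str):
            raise ValueError(f"line {line_no}: missing query")
        if not isinstance(record.get("positive"), str):
            raise ValueError(f"line {line_no}: missing positive document")
        negatives = record.get("negatives")
        if not isinstance(negatives, list) or len(negatives) < 1:
            raise ValueError(f"line {line_no}: need at least one negative document")
        examples.append(record)
    return examples

examples = load_examples(SAMPLE_JSONL)
print(len(examples))
```

Validating the structure up front surfaces malformed examples before training starts rather than partway through an epoch.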