Embedding Models#

Introduction#

Text embedding models transform text into dense vector representations that power semantic search, dense retrieval, retrieval-augmented generation (RAG), and classification tasks. NeMo AutoModel includes a training recipe for converting Llama decoder-only models into encoder architectures with bidirectional attention, and falls back to Hugging Face AutoModel for other encoder backbones.

For cross-encoder pairwise scoring, see Reranking Models.

Embedding models use bi-encoders to produce dense representations for queries and documents independently. They are the standard path for embedding generation and first-stage dense retrieval.
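The independent encoding described above can be sketched in a few lines. This is a minimal illustration, not the NeMo AutoModel API: `toy_encode` is a hypothetical stand-in for a trained encoder (a hashed bag-of-words), and `retrieve` and `DIM` are names chosen for this example. Because documents are embedded without seeing the query, their vectors can be computed once and reused for every search.

```python
import math
import zlib

DIM = 256  # toy embedding size, chosen only for this illustration


def toy_encode(text):
    """Hypothetical stand-in for a trained encoder: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash() for str.
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a, b):
    """For unit-length vectors, the dot product equals cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(query, doc_vectors, top_k=2):
    """Embed the query on its own, then rank precomputed document vectors."""
    q = toy_encode(query)
    scored = [(cosine(q, d), i) for i, d in enumerate(doc_vectors)]
    return sorted(scored, reverse=True)[:top_k]


documents = [
    "bidirectional attention for llama encoders",
    "recipes for baking sourdough bread",
    "contrastive training of embedding models",
]
# Document vectors are built once, independently of any query.
doc_vectors = [toy_encode(d) for d in documents]

results = retrieve("llama bidirectional attention", doc_vectors)
for score, idx in results:
    print(f"{score:.3f}  {documents[idx]}")
```

A cross-encoder, by contrast, scores each query-document pair jointly, so nothing can be precomputed; that is why bi-encoders are typically used for the first retrieval stage.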

Optimized Backbones (Bidirectional Attention)#

| Owner | Model | Architecture | Auto Class | Tasks |
|---|---|---|---|---|
| NVIDIA | Llama (Bidirectional) | LlamaBidirectionalModel | NeMoAutoModelBiEncoder | Embedding, Dense Retrieval |
| Mistral AI | Ministral3 (Bidirectional) | Ministral3BidirectionalModel | NeMoAutoModelBiEncoder | Embedding, Dense Retrieval |

Hugging Face Auto Backbones#

Any Hugging Face model that can be loaded with AutoModel can be used as an embedding backbone. This fallback path uses the model’s native attention; no bidirectional conversion is applied.
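To make the distinction concrete: for a decoder-only backbone, native attention is causal, whereas the optimized backbones attend in both directions. The sketch below only builds the two mask patterns for a short sequence; it is a conceptual illustration with hypothetical helper names and does not reflect how NeMo AutoModel implements the conversion.

```python
def causal_mask(n):
    """Decoder-style mask: position i may attend only to positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Encoder-style mask: every position may attend to every other position."""
    return [[True] * n for _ in range(n)]


def render(mask):
    """Draw a mask as a grid: 'x' means attention is allowed, '.' means blocked."""
    return "\n".join(
        " ".join("x" if allowed else "." for allowed in row) for row in mask
    )


print("causal:")
print(render(causal_mask(4)))
print("bidirectional:")
print(render(bidirectional_mask(4)))
```

With a causal mask, early tokens never see later context, so only the last position summarizes the full input. A bidirectional mask lets every token representation draw on the whole sequence, which is generally what pooled embeddings benefit from.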

Example Recipes#

| Recipe | Description |
|---|---|
| llama3_2_1b.yaml | Bi-encoder: Llama 3.2 1B embedding model |
| llama_embed_nemotron_8b.yaml | Bi-encoder: Llama-Embed-Nemotron-8B reproduction recipe |
| ministral3_3b_instruct.yaml | Bi-encoder: Ministral3-3B recipe |

Supported Workflows#

  • Fine-tuning (Bi-Encoder): Contrastive learning on query-document pairs to produce embedding models

  • LoRA/PEFT: Parameter-efficient fine-tuning for embedding backbones

  • ONNX Export: Export trained embedding models for deployment (support varies by model)
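As a rough illustration of the contrastive objective behind bi-encoder fine-tuning, the sketch below computes an InfoNCE-style loss for one query. Whether the recipes use exactly this form is an assumption; the function name, the temperature of 0.05, and the toy vectors are all chosen for the example, and the inputs are assumed to be unit-normalized embeddings.

```python
import math


def info_nce_loss(query, positive, negatives, temperature=0.05):
    """Cross-entropy over similarity scores, with the positive document at index 0.

    The temperature default is an illustrative assumption, not a recipe setting.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    logits = [dot(query, positive) / temperature]
    logits += [dot(query, neg) / temperature for neg in negatives]

    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - logits[0]


query = [1.0, 0.0]
aligned = [1.0, 0.0]      # points the same way as the query
orthogonal = [0.0, 1.0]   # unrelated to the query

# Low loss: the positive is the document closest to the query.
good = info_nce_loss(query, positive=aligned, negatives=[orthogonal])
# High loss: a negative is closer to the query than the positive.
bad = info_nce_loss(query, positive=orthogonal, negatives=[aligned])
print(f"good={good:.6f} bad={bad:.6f}")
```

Minimizing this loss pulls the query toward its positive document and pushes it away from the negatives, which is why each training example needs at least one of each.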

Dataset#

Retrieval fine-tuning requires query-document pairs: each example is a query paired with one positive document and one or more negative documents. Both inline JSONL and corpus ID-based JSON formats are supported. See the Retrieval Dataset guide.
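A minimal sketch of reading inline JSONL examples of this shape follows. The field names `query`, `positive`, and `negatives` are hypothetical placeholders picked for illustration, as is the `parse_examples` helper; the actual schema is defined in the Retrieval Dataset guide.

```python
import json

# Hypothetical field names and toy records; see the Retrieval Dataset guide
# for the schema the recipes actually expect.
jsonl_text = """\
{"query": "what is a bi-encoder", "positive": "A bi-encoder embeds queries and documents separately.", "negatives": ["Sourdough needs a starter."]}
{"query": "what is lora", "positive": "LoRA trains low-rank adapter matrices.", "negatives": ["Paris is in France.", "Cats sleep a lot."]}
"""


def parse_examples(text):
    """Parse one JSON object per line and check the query/positive/negatives shape."""
    examples = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        missing = {"query", "positive", "negatives"} - record.keys()
        if missing:
            raise ValueError(f"line {line_number}: missing fields {sorted(missing)}")
        if not record["negatives"]:
            raise ValueError(f"line {line_number}: at least one negative is required")
        examples.append(record)
    return examples


examples = parse_examples(jsonl_text)
for ex in examples:
    print(ex["query"], "->", len(ex["negatives"]), "negative(s)")
```

Validating the shape up front surfaces malformed records before training starts rather than partway through an epoch.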