# Embedding Models

This page provides detailed technical specifications for the embedding model family supported by NeMo Customizer. For information about supported features and capabilities, refer to Tested Models.

## Llama 3.2 NV EmbedQA 1B v2

| Property | Value |
|----------|-------|
| Creator | NVIDIA |
| Architecture | Transformer |
| Description | Llama 3.2 NV EmbedQA 1B v2 is a specialized embedding model optimized for question-answering and retrieval tasks. |
| Max Sequence Length | 2048 |
| Parameters | 1 billion |
| Training Data | Specialized QA and retrieval datasets |
| Recommended GPUs for Customization | 1 |
| Default Name | `nvidia/llama-3.2-nv-embedqa-1b-v2` |
| Base Model | `nvidia/llama-3.2-nv-embedqa-1b-v2` |
| NGC Model URI | `ngc://nvidia/nemo/llama-3_2-1b-embedding-base:0.0.1` |

### Customization Target Configuration

The following configuration is used for the customization target:

| Configuration | Value |
|---------------|-------|
| Namespace | `nvidia` |
| Name | `llama-3.2-nv-embedqa-1b@v2` |
| Model Path | `llama32_1b-embedding` |
| Base Model | `nvidia/llama-3.2-nv-embedqa-1b-v2` |
| Number of Parameters | 1,000,000,000 |
| Precision | `bf16-mixed` |
| Enabled | Configurable (default: `false`) |
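
Because the `Enabled` flag is configurable per deployment, you may want to verify how the target is set in your environment. The sketch below is illustrative only: the `customizer_url` value and the `/v1/customization/targets` path are assumptions modeled on NeMo Customizer's REST conventions, so check your deployment's API reference for the actual endpoint.

```python
# List customization targets and report whether this one is enabled.
# NOTE: the base URL and API path below are assumptions; consult your
# NeMo Customizer API reference for the actual endpoint.
import requests

customizer_url = "http://nemo-customizer:8000"  # replace with your endpoint

resp = requests.get(f"{customizer_url}/v1/customization/targets")
resp.raise_for_status()
for target in resp.json().get("data", []):
    if target.get("name") == "llama-3.2-nv-embedqa-1b@v2":
        print(target["name"], "enabled:", target.get("enabled"))
```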

### Training Configuration

The model supports the following training configuration:

| Training Option | Value |
|-----------------|-------|
| Training Type | SFT (Supervised Fine-Tuning) |
| Fine-tuning Type | All Weights |
| Number of GPUs | 1 |
| Number of Nodes | 1 |
| Tensor Parallel Size | 4 |
| Micro Batch Size | 8 |
| Max Sequence Length | 2048 |
| Prompt Template | `{prompt} {completion}` |
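
To see how these options map onto an actual job request, here is a hedged sketch of starting an SFT all-weights run against this configuration. The endpoint path, payload field names, and the dataset reference are assumptions based on NeMo Customizer's general job-creation pattern; the Fine-tuning tutorial documents the authoritative request shape.

```python
# Start an SFT (all-weights) customization job -- illustrative only.
# Endpoint path and payload fields are assumptions; verify against your
# NeMo Customizer API reference before use.
import requests

customizer_url = "http://nemo-customizer:8000"  # replace with your endpoint

resp = requests.post(
    f"{customizer_url}/v1/customization/jobs",
    json={
        "config": "nvidia/llama-3.2-nv-embedqa-1b@v2",
        "dataset": {"namespace": "default", "name": "my-qa-dataset"},  # hypothetical dataset
        "hyperparameters": {
            "training_type": "sft",
            "finetuning_type": "all_weights",
            "epochs": 3,
            "batch_size": 8,  # matches the micro batch size above
        },
    },
)
resp.raise_for_status()
print("job id:", resp.json().get("id"))
```

Each record in the training dataset supplies `prompt` and `completion` fields, which the `{prompt} {completion}` template joins with a single space at training time.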

### Resource Requirements

- **Minimum GPU Memory**: 8 GB (see the check below)
- **Recommended GPU**: A100
- **Training Time**: varies with dataset size and number of epochs
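
A node can be checked against the 8 GB minimum directly from Python. This sketch assumes PyTorch with CUDA support is installed:

```python
# Verify the local GPU meets the 8 GB minimum memory requirement.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    total_gib = props.total_memory / (1024 ** 3)
    print(f"{props.name}: {total_gib:.1f} GiB")
    if total_gib < 8:
        print("Warning: below the 8 GB minimum for customization")
else:
    print("No CUDA device detected")
```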

### Deployment with NIM

This model supports inference deployment through NVIDIA Inference Microservices (NIM). To deploy this model for inference:

1. **Deploy using Deployment Management Service**: Follow the Deploy NIM guide to deploy the base model (a rough sketch of this call follows this list).
2. **Access through NIM Proxy**: Once deployed, the model can be accessed through the NIM Proxy service.
3. **Fine-tuned models**: After customization, the fine-tuned model can be deployed following the same NIM deployment process.
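
As an illustration of step 1, the sketch below creates a model deployment through the Deployment Management Service. The base URL, endpoint path, deployment name, and payload fields are assumptions; the Deploy NIM guide has the authoritative request shape.

```python
# Create a NIM deployment via the Deployment Management Service.
# Illustrative only: base URL, path, and payload fields are assumptions.
import requests

deployment_mgmt_url = "http://nemo-deployment-management:8000"  # replace with your endpoint

resp = requests.post(
    f"{deployment_mgmt_url}/v1/deployment/model-deployments",
    json={
        "name": "llama-32-nv-embedqa-1b-v2",  # hypothetical deployment name
        "namespace": "nvidia",
        "config": {
            "model": "nvidia/llama-3.2-nv-embedqa-1b-v2",
            # Choose an embedding-capable NIM image per the compatibility matrix.
        },
    },
)
resp.raise_for_status()
```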

> **Note**
>
> The embedding model requires specific NIM container images that support embedding inference. Refer to the NIM compatibility matrix for supported image versions.

### Model Name Mapping

When using this model, note the following name mappings:

- **API/External Name**: `nvidia/llama-3.2-nv-embedqa-1b-v2` (used in inference requests and external documentation)
- **Customization Target Name**: `nvidia/llama-3.2-nv-embedqa-1b@v2` (the `nvidia` namespace plus target name from the customization target configuration above, used when creating customization jobs)

### Example Usage

After fine-tuning and deployment, you can use the model for embedding tasks:

```python
# Example inference call through NIM Proxy
import requests

nim_proxy_url = "http://nim-proxy:8000"  # replace with your NIM Proxy endpoint

response = requests.post(
    f"{nim_proxy_url}/v1/embeddings",
    headers={"Content-Type": "application/json"},
    json={
        "model": "nvidia/llama-3.2-nv-embedqa-1b-v2",
        "input": ["What is the capital of France?"],
        "encoding_format": "float",
    },
)
response.raise_for_status()
```
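
Assuming the endpoint returns the OpenAI-compatible embeddings schema that NIM services generally expose (this response shape is an assumption, not taken from the specifications above), the vector can be read from the response body like so:

```python
# Read the embedding vector from an OpenAI-compatible response body.
result = response.json()
embedding = result["data"][0]["embedding"]  # list of floats
print("embedding dimension:", len(embedding))
```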

For detailed fine-tuning instructions, refer to the Fine-tuning tutorial.