Embedding Models

This page provides detailed technical specifications for the embedding model family supported by NeMo Customizer. For information about supported features and capabilities, refer to Tested Models.

Llama 3.2 NV EmbedQA 1B v2

| Property | Value |
|---|---|
| Creator | NVIDIA |
| Architecture | transformer |
| Description | Llama 3.2 NV EmbedQA 1B v2 is a specialized embedding model optimized for question-answering and retrieval tasks. |
| Max Sequence Length | 2048 |
| Parameters | 1 billion |
| Training Data | Specialized QA and retrieval datasets |
| Recommended GPUs for Customization | 1 |
| Default Name | nvidia/llama-3.2-nv-embedqa-1b-v2 |
| Base Model | nvidia/llama-3.2-nv-embedqa-1b-v2 |
| NGC Model URI | ngc://nvidia/nemo/llama-3_2-1b-embedding-base:0.0.1 |
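
The default name above is the identifier used in inference requests. As a quick check that the model is being served, you can list the models exposed by the NIM Proxy. This is a minimal sketch: the nim_proxy_url value is a placeholder for your deployment, and it assumes the proxy exposes the OpenAI-style /v1/models listing.

# Minimal sketch: confirm the embedding model is visible through NIM Proxy.
import requests

nim_proxy_url = "http://nemo-nim-proxy:8000"  # placeholder; use your NIM Proxy base URL

models = requests.get(f"{nim_proxy_url}/v1/models").json()
print([m["id"] for m in models.get("data", [])])
# Expect "nvidia/llama-3.2-nv-embedqa-1b-v2" in the list once the NIM is deployed.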

Customization Target Configuration

The following configuration is used for the customization target:

| Configuration | Value |
|---|---|
| Namespace | nvidia |
| Name | llama-3.2-nv-embedqa-1b@v2 |
| Model Path | llama32_1b-embedding |
| Base Model | nvidia/llama-3.2-nv-embedqa-1b-v2 |
| Number of Parameters | 1,000,000,000 |
| Precision | bf16-mixed |
| Enabled | Configurable (default: false) |
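
To check whether this customization configuration is registered and enabled in your environment, you can query the Customizer API. The sketch below is illustrative only: customizer_url is a placeholder, and the /v1/customization/configs listing path should be verified against your NeMo Customizer API reference.

# Minimal sketch: look up the embedding customization config by namespace and name.
import requests

customizer_url = "http://nemo-customizer:8000"  # placeholder; use your Customizer base URL

configs = requests.get(f"{customizer_url}/v1/customization/configs").json()
for cfg in configs.get("data", []):
    if cfg.get("namespace") == "nvidia" and cfg.get("name") == "llama-3.2-nv-embedqa-1b@v2":
        print(cfg)  # inspect "enabled" and related fields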

Training Configuration

The model supports the following training configuration:

| Training Option | Value |
|---|---|
| Training Type | SFT (Supervised Fine-Tuning) |
| Fine-tuning Type | All Weights |
| Number of GPUs | 1 |
| Number of Nodes | 1 |
| Tensor Parallel Size | 4 |
| Micro Batch Size | 8 |
| Max Sequence Length | 2048 |
| Prompt Template | {prompt} {completion} |
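
The prompt template concatenates each training record's prompt and completion fields into a single training string. The snippet below illustrates the rendering with made-up record contents:

# How the "{prompt} {completion}" template renders one training record (illustrative data).
record = {
    "prompt": "What is the capital of France?",
    "completion": "Paris is the capital of France.",
}
rendered = "{prompt} {completion}".format(**record)
print(rendered)  # "What is the capital of France? Paris is the capital of France."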

Resource Requirements

  • Minimum GPU Memory: 8 GB

  • Recommended GPU: A100

  • Training Time: Varies based on dataset size and epochs

Hyperparameter and Data Recommendations

This fine-tuning recipe supports full fine-tuning, updating all 1 billion parameters, and requires careful hyperparameter and data selection to prevent overfitting.

The following table lists conservative hyperparameter defaults chosen to reduce the risk of overfitting when fine-tuning this embedding model:

| Parameter | API Field Name | Type | Description | Recommended Value |
|---|---|---|---|---|
| Learning Rate | learning_rate | number | Step size for updating model parameters. Lower values help prevent overfitting in embedding models. | 5e-6 |
| Weight Decay | weight_decay | number | Regularization parameter that discourages overfitting by penalizing large weights. | 0.01 |
| Number of Epochs | epochs | integer | Number of complete passes through the training dataset. Kept low to prevent overfitting. | 1 |
| Training Data Size | N/A | N/A | Number of training examples that balances overfitting risk against model performance. | 5,000-10,000 examples |
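
These defaults map onto the hyperparameters object of a customization job request. The following is a minimal sketch rather than an authoritative request: customizer_url, the /v1/customization/jobs path, the dataset name, and the exact training_type and finetuning_type string values are assumptions to verify against the Fine-tuning tutorial and API reference.

# Minimal sketch: start an SFT job using the conservative defaults above.
import requests

customizer_url = "http://nemo-customizer:8000"  # placeholder; use your Customizer base URL

job = requests.post(
    f"{customizer_url}/v1/customization/jobs",
    json={
        "config": "nvidia/llama-3.2-nv-embedqa-1b@v2",
        "dataset": {"namespace": "default", "name": "my-embedding-dataset"},  # hypothetical dataset
        "hyperparameters": {
            "training_type": "sft",            # assumed API value for SFT
            "finetuning_type": "all_weights",  # assumed API value for all-weights fine-tuning
            "epochs": 1,
            "learning_rate": 5e-6,
            "weight_decay": 0.01,
        },
    },
).json()
print(job.get("id"))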

NVIDIA recommends evaluating fine-tuned embedding models against the baseline to detect overfitting and potential performance degradation.
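
A lightweight way to make this comparison is to embed a held-out set of query and positive-passage pairs with both the base and fine-tuned models and compare the average query-passage cosine similarity. The sketch below assumes both models are reachable through the NIM Proxy embeddings endpoint; nim_proxy_url and the fine-tuned model name are placeholders.

# Minimal sketch: compare query/passage cosine similarity for the base vs. fine-tuned model.
import math
import requests

nim_proxy_url = "http://nemo-nim-proxy:8000"  # placeholder; use your NIM Proxy base URL

def embed(model, texts):
    resp = requests.post(
        f"{nim_proxy_url}/v1/embeddings",
        json={"model": model, "input": texts, "encoding_format": "float"},
    )
    resp.raise_for_status()
    return [item["embedding"] for item in resp.json()["data"]]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

pairs = [("What is the capital of France?", "Paris is the capital of France.")]  # held-out pairs

for model in ("nvidia/llama-3.2-nv-embedqa-1b-v2", "default/my-finetuned-embedqa"):  # second name is hypothetical
    scores = [cosine(*embed(model, [query, passage])) for query, passage in pairs]
    print(model, sum(scores) / len(scores))

Average similarity on held-out positives is only a rough signal; for a more rigorous check, compare retrieval metrics such as recall@k on an evaluation set.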

Deployment with NIM

This model supports inference deployment through NVIDIA Inference Microservices (NIM). To deploy this model for inference:

  1. Deploy using Deployment Management Service: Follow the Deploy NIM guide to deploy the base model (see the sketch after this list).

  2. Access through NIM Proxy: Once deployed, the model can be accessed through the NIM Proxy service.

  3. Fine-tuned models: After customization, the fine-tuned model can be deployed following the same NIM deployment process.
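
As a rough illustration of step 1, the request below asks the Deployment Management Service to deploy the base embedding model NIM. Treat it strictly as a sketch: deployment_mgmt_url, the endpoint path, and the payload fields are assumptions, so follow the Deploy NIM guide for the exact schema and container image settings.

# Minimal sketch: request a NIM deployment of the base embedding model.
import requests

deployment_mgmt_url = "http://nemo-deployment-management:8000"  # placeholder base URL

deployment = requests.post(
    f"{deployment_mgmt_url}/v1/deployment/model-deployments",  # assumed endpoint path
    json={
        "name": "llama-embedqa-1b",  # hypothetical deployment name
        "namespace": "nvidia",
        "config": {
            "model": "nvidia/llama-3.2-nv-embedqa-1b-v2",
            "nim_deployment": {"gpus_per_node": 1},  # assumed field names
        },
    },
).json()
print(deployment)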

Note

The embedding model requires specific NIM container images that support embedding inference. Refer to the NIM compatibility matrix for supported image versions.

Model Name Mapping

When using this model, note the following name mapping:

  • API/External Name: nvidia/llama-3.2-nv-embedqa-1b-v2 (used in inference requests and external documentation)

Example Usage

After fine-tuning and deployment, you can use the model for embedding tasks:

# Example inference call through NIM Proxy
import requests

nim_proxy_url = "http://nemo-nim-proxy:8000"  # replace with your NIM Proxy base URL

response = requests.post(
    f"{nim_proxy_url}/v1/embeddings",
    headers={"Content-Type": "application/json"},
    json={
        "model": "nvidia/llama-3.2-nv-embedqa-1b-v2",
        "input": ["What is the capital of France?"],
        "encoding_format": "float"
    }
)
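
Assuming the endpoint returns an OpenAI-compatible embeddings response, the vector can be read from the data field; verify the exact schema against your NIM version.

# Extract the embedding vector from the (assumed) OpenAI-style response body.
response.raise_for_status()
embedding = response.json()["data"][0]["embedding"]
print(len(embedding))  # dimensionality of the returned vector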

For detailed fine-tuning instructions, refer to the Fine-tuning tutorial.

For more information about formatting training datasets for the embedding model, refer to Dataset Format Requirements.