# Embedding Models

This page provides detailed technical specifications for the embedding model family supported by NeMo Customizer. For information about supported features and capabilities, refer to Tested Models.

## Llama 3.2 NV EmbedQA 1B v2

| Property | Value |
|----------|-------|
| Creator | NVIDIA |
| Architecture | Transformer |
| Description | Llama 3.2 NV EmbedQA 1B v2 is a specialized embedding model optimized for question-answering and retrieval tasks. |
| Max Sequence Length | 2048 |
| Parameters | 1 billion |
| Training Data | Specialized QA and retrieval datasets |
| Recommended GPUs for Customization | 1 |
| Default Name | `nvidia/llama-3.2-nv-embedqa-1b-v2` |
| Base Model | `nvidia/llama-3.2-nv-embedqa-1b-v2` |
| NGC Model URI | `ngc://nvidia/nemo/llama-3_2-1b-embedding-base:0.0.1` |

### Customization Target Configuration

The following configuration is used for the customization target:

| Configuration | Value |
|---------------|-------|
| Namespace | `nvidia` |
| Name | `llama-3.2-nv-embedqa-1b@v2` |
| Model Path | `llama32_1b-embedding` |
| Base Model | `nvidia/llama-3.2-nv-embedqa-1b-v2` |
| Number of Parameters | 1,000,000,000 |
| Precision | `bf16-mixed` |
| Enabled | Configurable (default: `false`) |
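
Because the `Enabled` flag is configurable per deployment, you may want to verify how the target is set in your environment. The sketch below is illustrative only: the `customizer_url` value and the `/v1/customization/targets` path are assumptions modeled on NeMo Customizer's REST conventions, so check your deployment's API reference for the actual endpoint.

```python
# List customization targets and report whether this one is enabled.
# NOTE: the base URL and API path below are assumptions; consult your
# NeMo Customizer API reference for the actual endpoint.
import requests

customizer_url = "http://nemo-customizer:8000"  # replace with your endpoint

resp = requests.get(f"{customizer_url}/v1/customization/targets")
resp.raise_for_status()
for target in resp.json().get("data", []):
    if target.get("name") == "llama-3.2-nv-embedqa-1b@v2":
        print(target["name"], "enabled:", target.get("enabled"))
```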

### Training Configuration

The model supports the following training configuration:

| Training Option | Value |
|-----------------|-------|
| Training Type | SFT (Supervised Fine-Tuning) |
| Fine-tuning Type | All Weights |
| Number of GPUs | 1 |
| Number of Nodes | 1 |
| Tensor Parallel Size | 4 |
| Micro Batch Size | 8 |
| Max Sequence Length | 2048 |
| Prompt Template | `{prompt} {completion}` |
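
To see how these options map onto an actual job request, here is a hedged sketch of starting an SFT all-weights run against this configuration. The endpoint path, payload field names, and the dataset reference are assumptions based on NeMo Customizer's general job-creation pattern; the Fine-tuning tutorial documents the authoritative request shape.

```python
# Start an SFT (all-weights) customization job -- illustrative only.
# Endpoint path and payload fields are assumptions; verify against your
# NeMo Customizer API reference before use.
import requests

customizer_url = "http://nemo-customizer:8000"  # replace with your endpoint

resp = requests.post(
    f"{customizer_url}/v1/customization/jobs",
    json={
        "config": "nvidia/llama-3.2-nv-embedqa-1b@v2",
        "dataset": {"namespace": "default", "name": "my-qa-dataset"},  # hypothetical dataset
        "hyperparameters": {
            "training_type": "sft",
            "finetuning_type": "all_weights",
            "epochs": 3,
            "batch_size": 8,  # matches the micro batch size above
        },
    },
)
resp.raise_for_status()
print("job id:", resp.json().get("id"))
```

Each record in the training dataset supplies `prompt` and `completion` fields, which the `{prompt} {completion}` template joins with a single space at training time.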

### Resource Requirements

- **Minimum GPU Memory**: 8 GB (see the check below)
- **Recommended GPU**: A100
- **Training Time**: varies with dataset size and number of epochs
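
A node can be checked against the 8 GB minimum directly from Python. This sketch assumes PyTorch with CUDA support is installed:

```python
# Verify the local GPU meets the 8 GB minimum memory requirement.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    total_gib = props.total_memory / (1024 ** 3)
    print(f"{props.name}: {total_gib:.1f} GiB")
    if total_gib < 8:
        print("Warning: below the 8 GB minimum for customization")
else:
    print("No CUDA device detected")
```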

### Deployment with NIM

This model supports inference deployment through NVIDIA Inference Microservices (NIM). To deploy this model for inference:

1. **Deploy using Deployment Management Service**: Follow the Deploy NIM guide to deploy the base model (a rough sketch of this call follows this list).
2. **Access through NIM Proxy**: Once deployed, the model can be accessed through the NIM Proxy service.
3. **Fine-tuned models**: After customization, the fine-tuned model can be deployed following the same NIM deployment process.
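
As an illustration of step 1, the sketch below creates a model deployment through the Deployment Management Service. The base URL, endpoint path, deployment name, and payload fields are assumptions; the Deploy NIM guide has the authoritative request shape.

```python
# Create a NIM deployment via the Deployment Management Service.
# Illustrative only: base URL, path, and payload fields are assumptions.
import requests

deployment_mgmt_url = "http://nemo-deployment-management:8000"  # replace with your endpoint

resp = requests.post(
    f"{deployment_mgmt_url}/v1/deployment/model-deployments",
    json={
        "name": "llama-32-nv-embedqa-1b-v2",  # hypothetical deployment name
        "namespace": "nvidia",
        "config": {
            "model": "nvidia/llama-3.2-nv-embedqa-1b-v2",
            # Choose an embedding-capable NIM image per the compatibility matrix.
        },
    },
)
resp.raise_for_status()
```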

> **Note**
>
> The embedding model requires specific NIM container images that support embedding inference. Refer to the NIM compatibility matrix for supported image versions.

### Model Name Mapping

When using this model, note the following name mappings:

- **API/External Name**: `nvidia/llama-3.2-nv-embedqa-1b-v2` (used in inference requests and external documentation)
- **Customization Target Name**: `nvidia/llama-3.2-nv-embedqa-1b@v2` (the `nvidia` namespace plus target name from the customization target configuration above, used when creating customization jobs)

### Example Usage

After fine-tuning and deployment, you can use the model for embedding tasks:

```python
# Example inference call through NIM Proxy
import requests

nim_proxy_url = "http://nim-proxy:8000"  # replace with your NIM Proxy endpoint

response = requests.post(
    f"{nim_proxy_url}/v1/embeddings",
    headers={"Content-Type": "application/json"},
    json={
        "model": "nvidia/llama-3.2-nv-embedqa-1b-v2",
        "input": ["What is the capital of France?"],
        "encoding_format": "float",
    },
)
response.raise_for_status()
```
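
Assuming the endpoint returns the OpenAI-compatible embeddings schema that NIM services generally expose (this response shape is an assumption, not taken from the specifications above), the vector can be read from the response body like so:

```python
# Read the embedding vector from an OpenAI-compatible response body.
result = response.json()
embedding = result["data"][0]["embedding"]  # list of floats
print("embedding dimension:", len(embedding))
```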

For detailed fine-tuning instructions, refer to the Fine-tuning tutorial.