# Embedding Models
This page provides detailed technical specifications for the embedding model family supported by NeMo Customizer. For information about supported features and capabilities, refer to Tested Models.
## Llama 3.2 NV EmbedQA 1B v2

| Property | Value |
|---|---|
| Creator | NVIDIA |
| Architecture | transformer |
| Description | Llama 3.2 NV EmbedQA 1B v2 is a specialized embedding model optimized for question-answering and retrieval tasks. |
| Max Sequence Length | 2048 |
| Parameters | 1 billion |
| Training Data | Specialized QA and retrieval datasets |
| Recommended GPUs for Customization | 1 |
| Default Name | `nvidia/llama-3.2-nv-embedqa-1b-v2` |
| Base Model | `nvidia/llama-3.2-nv-embedqa-1b-v2` |
| NGC Model URI | |
### Customization Target Configuration
The following configuration is used for the customization target:
| Configuration | Value |
|---|---|
| Namespace | `nvidia` |
| Name | `llama-3.2-nv-embedqa-1b@v2` |
| Model Path | `llama32_1b-embedding` |
| Base Model | `nvidia/llama-3.2-nv-embedqa-1b-v2` |
| Number of Parameters | 1,000,000,000 |
| Precision | `bf16-mixed` |
| Enabled | Configurable (default: `false`) |
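For example, before submitting a job you can verify that this target is enabled. The following is a minimal sketch, assuming the Customizer API is reachable at a placeholder `customizer_url` and exposes a `/v1/customization/configs` listing endpoint; the path and response fields are assumptions, so check them against the NeMo Customizer API reference.

```python
# Hypothetical sketch: list customization configs and check whether the
# embedding target is enabled. Endpoint path and response fields are
# assumptions; consult the NeMo Customizer API reference.
import requests

customizer_url = "http://<customizer-host>:8000"  # placeholder base URL

response = requests.get(f"{customizer_url}/v1/customization/configs")
response.raise_for_status()

for config in response.json().get("data", []):
    if "llama-3.2-nv-embedqa-1b" in config.get("name", ""):
        state = "enabled" if config.get("enabled") else "disabled"
        print(f'{config["name"]}: {state}')
```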
### Training Configuration
The model supports the following training configuration:
| Training Option | Value |
|---|---|
| Training Type | SFT (Supervised Fine-Tuning) |
| Fine-tuning Type | All Weights |
| Number of GPUs | 1 |
| Number of Nodes | 1 |
| Tensor Parallel Size | 4 |
| Micro Batch Size | 8 |
| Max Sequence Length | 2048 |
| Prompt Template | |
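As an illustration, the training options above map onto a customization job request roughly as follows. This is a hedged sketch: the `/v1/customization/jobs` endpoint, the payload field names, and the `my-qa-dataset` dataset are assumptions, so confirm the request schema against the NeMo Customizer API reference.

```python
# Hypothetical sketch: submit an all-weights SFT job against the
# embedding target. Endpoint, field names, and dataset are assumptions.
import requests

customizer_url = "http://<customizer-host>:8000"  # placeholder base URL

response = requests.post(
    f"{customizer_url}/v1/customization/jobs",
    headers={"Content-Type": "application/json"},
    json={
        "config": "nvidia/llama-3.2-nv-embedqa-1b@v2",
        "dataset": {"name": "my-qa-dataset"},  # placeholder dataset name
        "hyperparameters": {
            "training_type": "sft",
            "finetuning_type": "all_weights",
            "epochs": 2,        # illustrative value
            "batch_size": 8,    # matches the micro batch size above
        },
    },
)
response.raise_for_status()
print(response.json())
```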
### Resource Requirements

- Minimum GPU Memory: 8 GB
- Recommended GPU: A100
- Training Time: varies based on dataset size and number of epochs
### Deployment with NIM

This model supports inference deployment through NVIDIA Inference Microservices (NIM). To deploy this model for inference:

1. **Deploy using the Deployment Management Service**: Follow the Deploy NIM guide to deploy the base model.
2. **Access through NIM Proxy**: Once deployed, the model can be accessed through the NIM Proxy service.
3. **Fine-tuned models**: After customization, the fine-tuned model can be deployed following the same NIM deployment process (see the sketch after the note below).
> **Note:** The embedding model requires specific NIM container images that support embedding inference. Refer to the NIM compatibility matrix for supported image versions.
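For orientation, a request to the Deployment Management Service might look like the sketch below. The `/v1/deployment/model-deployments` endpoint, payload fields, and image placeholders are assumptions; follow the Deploy NIM guide and the compatibility matrix for the authoritative request.

```python
# Hypothetical sketch: deploy the base embedding model through the
# Deployment Management Service. Endpoint, fields, and image values
# are assumptions; see the Deploy NIM guide for exact usage.
import requests

deployment_url = "http://<deployment-mgmt-host>:8000"  # placeholder base URL

response = requests.post(
    f"{deployment_url}/v1/deployment/model-deployments",
    headers={"Content-Type": "application/json"},
    json={
        "name": "llama-32-nv-embedqa-1b-v2",
        "namespace": "nvidia",
        "config": {
            "model": "nvidia/llama-3.2-nv-embedqa-1b-v2",
            "nim_deployment": {
                "image_name": "<embedding-nim-image>",  # see compatibility matrix
                "image_tag": "<tag>",                   # see compatibility matrix
                "gpu": 1,
            },
        },
    },
)
response.raise_for_status()
print(response.json())
```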
### Model Name Mapping

When using this model, note the following name mapping:

- **API/External Name**: `nvidia/llama-3.2-nv-embedqa-1b-v2` (used in inference requests and external documentation)
### Example Usage

After fine-tuning and deployment, you can use the model for embedding tasks. In the snippet below, `nim_proxy_url` is a placeholder for your NIM Proxy endpoint:
```python
# Example inference call through the NIM Proxy
import requests

nim_proxy_url = "http://<nim-proxy-host>:8000"  # placeholder: your NIM Proxy base URL

response = requests.post(
    f"{nim_proxy_url}/v1/embeddings",
    headers={"Content-Type": "application/json"},
    json={
        "model": "nvidia/llama-3.2-nv-embedqa-1b-v2",
        "input": ["What is the capital of France?"],
        "encoding_format": "float",
    },
)
```
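Assuming the NIM Proxy returns an OpenAI-compatible embeddings payload, the vector can then be read from the `data` field of the response. The exact shape can vary by NIM version, so treat this as a sketch:

```python
# Parse the embedding from an OpenAI-style embeddings response.
# The "data"/"embedding" field names are an assumption based on the
# OpenAI-compatible schema; verify against your NIM version.
result = response.json()
embedding = result["data"][0]["embedding"]  # list of floats
print(f"Embedding dimension: {len(embedding)}")
```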
For detailed fine-tuning instructions, refer to the Fine-tuning tutorial.