Embedding Models
This page provides detailed technical specifications for the embedding model family supported by NeMo Customizer. For information about supported features and capabilities, refer to Tested Models.
Llama Nemotron Embedding 1B v2
Model Entity Configuration
Create a Model Entity for this embedding model:
Training Options
- LoRA (merged): 1x 80GB GPU, tensor parallel size 1
- Full SFT: 1x 80GB GPU, tensor parallel size 1
For LoRA training, embedding models support only merged adapters (peft with merge=True). Unmerged LoRA adapters are not supported because the embedding NIM requires the model in ONNX format, which cannot represent standalone adapters.
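To illustrate what merging means, the following minimal sketch folds a low-rank update into a base weight matrix using the standard LoRA formulation W' = W + (alpha / rank) * B A. The matrices, shapes, and the helper names `matmul` and `merge_lora` are hypothetical and purely illustrative; this is not NeMo Customizer's implementation.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, b, a, alpha, rank):
    """Return W + (alpha / rank) * (B @ A), the merged weight matrix."""
    scale = alpha / rank
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Hypothetical 2x2 base weight with a rank-1 adapter (B is 2x1, A is 1x2).
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]
A = [[0.5, -0.5]]

# After merging there is one weight matrix and no separate adapter to ship.
merged = merge_lora(W, B, A, alpha=2, rank=1)
```

Because the merged result is a single set of weights, it can be exported as one model graph, which is consistent with the ONNX constraint described above.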
Resource Requirements
- Minimum GPU Memory: 80GB
- Recommended GPU: A100
- Training Time: Varies based on dataset size and epochs
Hyperparameter and Data Recommendations
This fine-tuning recipe supports full fine-tuning, which updates all 1 billion parameters. Because every weight can change, it requires careful hyperparameter and data selection to prevent overfitting.
The following table provides conservative hyperparameter defaults specifically optimized to prevent overfitting for embedding models:
NVIDIA recommends evaluating fine-tuned embedding models against the baseline to detect overfitting and potential performance degradation.
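One way to run such a comparison is to embed the same held-out queries and passages with both models and compare a retrieval metric. The sketch below computes recall@1 from precomputed embeddings using cosine similarity. The choice of metric, the function names, and the toy vectors are assumptions for illustration, not an NVIDIA-prescribed evaluation procedure.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recall_at_1(query_vecs, passage_vecs, gold):
    """Fraction of queries whose top-ranked passage is the labeled one.

    gold[i] is the index into passage_vecs of the correct passage for query i.
    """
    hits = 0
    for query, target in zip(query_vecs, gold):
        best = max(
            range(len(passage_vecs)),
            key=lambda j: cosine(query, passage_vecs[j]),
        )
        hits += best == target
    return hits / len(query_vecs)


# Toy held-out set: in practice each model embeds the same texts itself.
gold = [0, 1]
baseline = {"queries": [[0.9, 0.1], [0.2, 0.8]], "passages": [[1.0, 0.0], [0.0, 1.0]]}
finetuned = {"queries": [[0.4, 0.6], [0.1, 0.9]], "passages": [[1.0, 0.0], [0.0, 1.0]]}

baseline_score = recall_at_1(baseline["queries"], baseline["passages"], gold)
finetuned_score = recall_at_1(finetuned["queries"], finetuned["passages"], gold)

if finetuned_score < baseline_score:
    print("Fine-tuned model scores below baseline on held-out data; check for overfitting.")
```

A held-out set that the model never saw during training is what makes the comparison meaningful; scoring on the training data would hide overfitting rather than reveal it.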
Deployment Configuration
- Full SFT and LoRA (merged):
  - NIM Image: nvcr.io/nim/nvidia/llama-nemotron-embed-1b-v2:1.13.0
  - GPU Count: 1x 80GB
Deployment and Inference
This model supports inference deployment through NVIDIA Inference Microservices (NIM). After customization, access your model through the Inference Gateway:
- Deploy the model: Create a ModelDeploymentConfig and ModelDeployment to deploy your fine-tuned model. See about for details.
- Access through Inference Gateway: The Inference Gateway provides unified access to all deployed models via three routing patterns, including:
  - Model Entity routing: /v2/workspaces/{workspace}/inference/gateway/model/{name}/-/v1/embeddings
  - Provider routing: /v2/workspaces/{workspace}/inference/gateway/provider/{deployment}/-/v1/embeddings
The embedding model requires NIM container images that support embedding inference. When the deployment reaches READY state, a ModelProvider is automatically created for routing inference requests.
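The Model Entity and Provider routes listed above are path templates. As a minimal sketch, they can be filled in as follows; the base URL, workspace, entity name, and deployment name are hypothetical placeholders, and `gateway_url` is an illustrative helper rather than part of any NeMo SDK.

```python
from urllib.parse import quote

# Route templates copied from the routing patterns on this page.
MODEL_ENTITY_ROUTE = "/v2/workspaces/{workspace}/inference/gateway/model/{name}/-/v1/embeddings"
PROVIDER_ROUTE = "/v2/workspaces/{workspace}/inference/gateway/provider/{deployment}/-/v1/embeddings"


def gateway_url(base_url, route, **parts):
    """Fill a route template with URL-quoted path segments and prefix the base URL."""
    quoted = {key: quote(value, safe="") for key, value in parts.items()}
    return base_url.rstrip("/") + route.format(**quoted)


# Hypothetical values; substitute those of your own platform and deployment.
BASE_URL = "http://nemo.example.com"
entity_url = gateway_url(
    BASE_URL, MODEL_ENTITY_ROUTE, workspace="default", name="my-embedding-model"
)
provider_url = gateway_url(
    BASE_URL, PROVIDER_ROUTE, workspace="default", deployment="my-embedding-deployment"
)
```

Quoting each segment keeps a name containing a slash or space from silently changing the path.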
Example Usage
After fine-tuning and deployment, you can use the model for embedding tasks:
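The following sketch shows one way such a request might look, using only the Python standard library. It assumes the endpoint accepts an OpenAI-style embeddings payload with an `input_type` field and returns vectors under `data[i].embedding`; the URL, the model name, and both helper functions are hypothetical and should be checked against your deployment.

```python
import json
import urllib.request


def build_embedding_request(url, model, texts, input_type="query"):
    """Build (but do not send) a POST request for an embeddings endpoint."""
    # Assumed payload shape: OpenAI-style fields plus an input_type hint.
    payload = {"model": model, "input": texts, "input_type": input_type}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def embed(request, timeout=30):
    """Send the request and return one embedding vector per input text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumed response shape: {"data": [{"index": 0, "embedding": [...]}, ...]}
    ordered = sorted(body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in ordered]


# Hypothetical gateway URL and model name, following the Model Entity route.
request = build_embedding_request(
    "http://nemo.example.com/v2/workspaces/default/inference/gateway/model/"
    "my-embedding-model/-/v1/embeddings",
    model="my-embedding-model",
    texts=["What is the capital of France?"],
)
# vectors = embed(request)  # run this once the deployment is reachable
```

Building the request separately from sending it makes the payload easy to inspect before any network call is made.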
For detailed fine-tuning instructions, refer to the Embedding Customization tutorial.
For more information about formatting training datasets for the embedding model, refer to Dataset Format Requirements.