# Use Fine-Tuned Models With NVIDIA NeMo Retriever Embedding NIM
Use this documentation to learn how to deploy a fine-tuned embedding model with NVIDIA NeMo Retriever Embedding NIM.
## Prerequisites
Before proceeding, complete the setup steps in Get Started With NVIDIA NeMo Retriever Embedding NIM, including NGC authentication and Docker login.
## Supported Model Formats
The following model formats are supported for fine-tuned models:
| Format | File Extension |
|---|---|
| ONNX | |
| TensorRT | |
The NIM automatically detects the model format based on the file extension.
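For example, if your model directory contains a single file whose extension identifies the format, the NIM can pick it up without further configuration. The sketch below illustrates the idea of extension-based detection; the directory layout and the file name `model.onnx` are hypothetical, not prescribed by the NIM.

```shell
# Illustrative only: the directory and file name below are hypothetical,
# not prescribed by the NIM.
MODEL_DIR=$(mktemp -d)
touch "$MODEL_DIR/model.onnx"

# Rough sketch of how extension-based detection works:
for f in "$MODEL_DIR"/*; do
  case "$f" in
    *.onnx) echo "detected ONNX model: $(basename "$f")" ;;
    *)      echo "unrecognized file: $(basename "$f")" ;;
  esac
done
```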
## Launch the NIM With a Fine-Tuned Model
To use a fine-tuned model, mount the directory containing your model files into the Docker container, and set the `NIM_CUSTOM_MODEL` environment variable to the path of that directory inside the container.
```bash
# Choose a container name for bookkeeping
export NIM_MODEL_NAME=nvidia/llama-nemotron-embed-1b-v2
export CONTAINER_NAME=$(basename "$NIM_MODEL_NAME")

# Choose a NIM Image from NGC
export IMG_NAME="nvcr.io/nim/nvidia/$CONTAINER_NAME:1.13.0"

# Choose a path on your system to cache the downloaded models
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"

# Set the path to your fine-tuned model on the host
export LOCAL_MODEL_PATH=/path/to/your/fine-tuned-model

# Start the NIM with the fine-tuned model
docker run -it --rm --name="$CONTAINER_NAME" \
  --runtime=nvidia \
  --gpus all \
  --shm-size=16GB \
  -e NGC_API_KEY \
  -e NIM_CUSTOM_MODEL=/opt/custom-model \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -v "$LOCAL_MODEL_PATH:/opt/custom-model" \
  -u $(id -u) \
  -p 8000:8000 \
  $IMG_NAME
```
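Once the container reports that it is ready, you can send an embedding request. The sketch below assumes the NIM exposes the OpenAI-compatible `/v1/embeddings` endpoint on port 8000; the `model` value and the `input_type` field are assumptions for illustration, so check `GET /v1/models` on your deployment for the actual served model name.

```shell
# Assumes the NIM is listening on localhost:8000 and serves the
# OpenAI-compatible /v1/embeddings endpoint. The "model" and
# "input_type" values below are assumptions; verify them against
# GET /v1/models on your deployment.
payload='{
  "model": "nvidia/llama-nemotron-embed-1b-v2",
  "input": ["Sample query to embed"],
  "input_type": "query"
}'

curl -s http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d "$payload"
```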
The following table describes the flags that are specific to launching a fine-tuned model. For the full list of standard flags, see Get Started With NVIDIA NeMo Retriever Embedding NIM.
| Flag | Description |
|---|---|
| `-e NIM_CUSTOM_MODEL=<path>` | Sets the path to the directory containing the fine-tuned model inside the container. The NIM loads this model instead of the default model. |
| `-v <host-path>:<container-path>` | Mounts the host directory that contains your fine-tuned model files to a path inside the container. |
## Fine-Tuning
To learn more about how to fine-tune a NIM-compatible embedding model, refer to the NeMo AutoModel documentation.