Use Fine-Tuned Models With NVIDIA NeMo Retriever Embedding NIM#

Use this documentation to learn how to deploy a fine-tuned embedding model with NVIDIA NeMo Retriever Embedding NIM.

Prerequisites#

Before proceeding, complete the setup steps in Get Started With NVIDIA NeMo Retriever Embedding NIM, including NGC authentication and Docker login.

Supported Model Formats#

The following model formats are supported for fine-tuned models:

| Format   | File Extension       |
|----------|----------------------|
| ONNX     | `.onnx`              |
| TensorRT | `.plan` or `.engine` |

The NIM automatically detects the model format based on the file extension.
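For example, a fine-tuned ONNX model directory could look like the following. This is a hypothetical layout: the file name `model.onnx` and the temporary directory are illustrative; the extension is what identifies the format.

```shell
# Create an illustrative model directory (hypothetical file name;
# the format is detected from the .onnx / .plan / .engine extension)
MODEL_DIR=$(mktemp -d)
touch "$MODEL_DIR/model.onnx"
ls "$MODEL_DIR"
```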

Launch the NIM With a Fine-Tuned Model#

To use a fine-tuned model, mount the directory containing your model files into the Docker container, and set the NIM_CUSTOM_MODEL environment variable to the directory path inside the container.

```bash
# Choose a container name for bookkeeping
export NIM_MODEL_NAME=nvidia/llama-nemotron-embed-1b-v2
export CONTAINER_NAME=$(basename $NIM_MODEL_NAME)

# Choose a NIM Image from NGC
export IMG_NAME="nvcr.io/nim/nvidia/$CONTAINER_NAME:1.13.0"

# Choose a path on your system to cache the downloaded models
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"

# Set the path to your fine-tuned model on the host
export LOCAL_MODEL_PATH=/path/to/your/fine-tuned-model

# Start the NIM with the fine-tuned model
docker run -it --rm --name=$CONTAINER_NAME \
  --runtime=nvidia \
  --gpus all \
  --shm-size=16GB \
  -e NGC_API_KEY \
  -e NIM_CUSTOM_MODEL=/opt/custom-model \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -v "$LOCAL_MODEL_PATH:/opt/custom-model" \
  -u $(id -u) \
  -p 8000:8000 \
  $IMG_NAME
```
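After the container reports that it is ready, you can sanity-check the deployment by sending an embedding request. The sketch below builds and validates the request payload before sending it; the model name and port are assumptions based on the launch commands above (the name the NIM reports for a custom model may differ, so check `GET /v1/models` on the running service first).

```shell
# Build the request payload (hypothetical query text; model name assumed
# from the launch example above -- verify it against GET /v1/models)
PAYLOAD='{"input": ["What is retrieval-augmented generation?"], "model": "nvidia/llama-nemotron-embed-1b-v2", "input_type": "query"}'
echo "$PAYLOAD" | python3 -m json.tool >/dev/null && echo "payload is valid JSON"

# Send it to the running NIM (requires the container launched above)
# curl -s http://localhost:8000/v1/embeddings \
#   -H 'Content-Type: application/json' \
#   -d "$PAYLOAD"
```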

The following table describes the flags that are specific to launching a fine-tuned model. For the full list of standard flags, see Get Started With NVIDIA NeMo Retriever Embedding NIM.

| Flag | Description |
|------|-------------|
| `-e NIM_CUSTOM_MODEL=/opt/custom-model` | Sets the path to the directory containing the fine-tuned model inside the container. The NIM loads this model instead of the default model. |
| `-v "$LOCAL_MODEL_PATH:/opt/custom-model"` | Mounts the host directory that contains your fine-tuned model files to a path inside the container. |

Fine-Tuning#

To learn how to fine-tune a NIM-compatible embedding model, refer to the NeMo AutoModel documentation.