Model Caching for Speech NIM Containers#
On first startup, the container downloads models from NGC. You can avoid repeated downloads by caching models locally. The caching approach depends on the model format (prebuilt or RMIR); refer to the support matrix for your service (ASR, TTS, or NMT).
Set CONTAINER_ID and NIM_TAGS_SELECTOR for your chosen model (and profile) from the support matrix or from the ASR, TTS, or NMT tutorials. Include model_type=prebuilt in NIM_TAGS_SELECTOR when applicable.
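NIM_TAGS_SELECTOR is a comma-separated list of key=value filters. As a minimal sketch (the helper function is illustrative, not part of the NIM tooling), the selector string can be composed from its parts like this:

```shell
# Illustrative helper (not part of NIM): join key=value filter pairs
# into the comma-separated string that NIM_TAGS_SELECTOR expects.
build_tags_selector() {
  local IFS=,
  echo "$*"
}

# Example: the prebuilt ASR selector used later on this page.
build_tags_selector "name=parakeet-1-1b-ctc-en-us" "mode=str" "model_type=prebuilt"
# → name=parakeet-1-1b-ctc-en-us,mode=str,model_type=prebuilt
```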
Create a cache directory and run the container with it mounted:
# Create the cache directory on the host
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
chmod 777 "$LOCAL_NIM_CACHE"
# Set values for your model (example: ASR Parakeet 1.1b en-US streaming)
export CONTAINER_ID=parakeet-1-1b-ctc-en-us
export NIM_TAGS_SELECTOR="name=parakeet-1-1b-ctc-en-us,mode=str,model_type=prebuilt"
docker run -it --rm --name=$CONTAINER_ID \
--runtime=nvidia \
--gpus '"device=0"' \
--shm-size=8GB \
-e NGC_API_KEY \
-e NIM_TAGS_SELECTOR \
-e NIM_HTTP_API_PORT=9000 \
-e NIM_GRPC_API_PORT=50051 \
-p 9000:9000 \
-p 50051:50051 \
-v $LOCAL_NIM_CACHE:/opt/nim/.cache \
nvcr.io/nim/nvidia/$CONTAINER_ID:latest
On later runs, the container loads models from the cache. Refer to Runtime parameters for flag details.
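As a quick sanity check before relying on the cache, you can verify that the mounted directory is non-empty after the first run. This is a sketch (the helper is illustrative, not part of NIM); the cache layout itself is container-managed, so only non-emptiness is checked:

```shell
# Illustrative check (not part of NIM): a populated cache directory
# means the first run finished its downloads, so later runs can load
# models from the cache instead of NGC.
cache_populated() {
  [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Example usage:
# cache_populated "$LOCAL_NIM_CACHE" && echo "cache ready"
```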
RMIR (intermediate) models must be built and exported once; then you can run from the export path without re-downloading from NGC.
Step 1: Export the Model
Create an export directory and run the container with NIM_TAGS_SELECTOR set to the RMIR profile (model_type=rmir). The container builds the model, writes it to the mounted export path, and then exits.
# Create the export directory on the host
export NIM_EXPORT_PATH=~/nim_export
mkdir -p "$NIM_EXPORT_PATH"
chmod 777 "$NIM_EXPORT_PATH"
# Set values for your model (example: ASR Parakeet 1.1b en-US streaming, RMIR)
export CONTAINER_ID=parakeet-1-1b-ctc-en-us
export NIM_TAGS_SELECTOR="name=parakeet-1-1b-ctc-en-us,mode=str,model_type=rmir"
docker run -it --rm --name=$CONTAINER_ID \
--runtime=nvidia \
--gpus '"device=0"' \
--shm-size=8GB \
-e NGC_API_KEY \
-e NIM_TAGS_SELECTOR \
-e NIM_HTTP_API_PORT=9000 \
-e NIM_GRPC_API_PORT=50051 \
-p 9000:9000 \
-p 50051:50051 \
-v $NIM_EXPORT_PATH:/opt/nim/export \
-e NIM_EXPORT_PATH=/opt/nim/export \
nvcr.io/nim/nvidia/$CONTAINER_ID:latest
When the export finishes, you should see logs similar to:
INFO:inference:Riva model generation completed
INFO:inference:Models exported to /opt/nim/export
INFO:inference:Exiting container
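If you capture the container output to a file (for example with tee), a small check for the completion line above can gate Step 2 in a script. This is a sketch (the helper is illustrative, not part of NIM), and the exact log text may vary between releases:

```shell
# Illustrative helper (not part of NIM): confirm the export step
# finished by looking for the completion line in a saved log file.
export_completed() {
  grep -q "Models exported to /opt/nim/export" "$1"
}

# Example usage (log file name is hypothetical):
# docker run ... 2>&1 | tee export.log
# export_completed export.log && echo "export finished"
```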
Step 2: Run Using the Exported Model
Start the container with the same export path and set NIM_DISABLE_MODEL_DOWNLOAD=true so it uses the exported models instead of downloading again:
docker run -it --rm --name=$CONTAINER_ID \
--runtime=nvidia \
--gpus '"device=0"' \
--shm-size=8GB \
-e NGC_API_KEY \
-e NIM_TAGS_SELECTOR \
-e NIM_DISABLE_MODEL_DOWNLOAD=true \
-e NIM_HTTP_API_PORT=9000 \
-e NIM_GRPC_API_PORT=50051 \
-p 9000:9000 \
-p 50051:50051 \
-v $NIM_EXPORT_PATH:/opt/nim/export \
-e NIM_EXPORT_PATH=/opt/nim/export \
nvcr.io/nim/nvidia/$CONTAINER_ID:latest
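Before sending requests, you may want to wait until the service is accepting connections on the mapped ports. A minimal sketch using bash's /dev/tcp redirection (the helper is illustrative, not part of NIM):

```shell
# Illustrative helper (not part of NIM): poll until a TCP port
# accepts connections, up to a retry limit. Requires bash for the
# /dev/tcp redirection.
wait_for_port() {
  local host="$1" port="$2" tries="${3:-30}" i=0
  while [ "$i" -lt "$tries" ]; do
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Example usage:
# wait_for_port 127.0.0.1 50051 && echo "gRPC port is up"
```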
Not all services or models support both prebuilt and RMIR; check the support matrix for your service.