Model Caching for Speech NIM Containers#
On first startup, the container downloads models from NGC. You can avoid repeated downloads by caching models locally. The caching approach depends on the model format (prebuilt or RMIR); refer to the support matrix for your service (ASR, TTS, or NMT).
Set CONTAINER_ID and NIM_TAGS_SELECTOR for your chosen model (and profile) from the support matrix or from the ASR, TTS, or NMT tutorials. Include model_type=prebuilt in NIM_TAGS_SELECTOR when applicable.
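NIM_TAGS_SELECTOR is a comma-separated list of key=value filters. As a minimal sketch (the helper function is illustrative, not part of the NIM tooling), the selector string can be composed from its parts like this:

```shell
# Illustrative helper (not part of NIM): join key=value filter pairs
# into the comma-separated string that NIM_TAGS_SELECTOR expects.
build_tags_selector() {
  local IFS=,
  echo "$*"
}

# Example: the prebuilt ASR selector used later on this page.
build_tags_selector "name=parakeet-1-1b-ctc-en-us" "mode=str" "model_type=prebuilt"
# → name=parakeet-1-1b-ctc-en-us,mode=str,model_type=prebuilt
```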
Create a cache directory and run the container with it mounted:
# Create the cache directory on the host
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
chmod 777 "$LOCAL_NIM_CACHE"
# Set values for your model (example: ASR Parakeet 1.1b en-US streaming)
export CONTAINER_ID=parakeet-1-1b-ctc-en-us
export NIM_TAGS_SELECTOR="name=parakeet-1-1b-ctc-en-us,mode=str,model_type=prebuilt"
docker run -it --rm --name=$CONTAINER_ID \
--runtime=nvidia \
--gpus '"device=0"' \
--shm-size=8GB \
-e NGC_API_KEY \
-e NIM_TAGS_SELECTOR \
-e NIM_HTTP_API_PORT=9000 \
-e NIM_GRPC_API_PORT=50051 \
-p 9000:9000 \
-p 50051:50051 \
-v $LOCAL_NIM_CACHE:/opt/nim/.cache \
nvcr.io/nim/nvidia/$CONTAINER_ID:latest
On later runs, the container loads models from the cache. Refer to Runtime parameters for flag details.
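As a quick sanity check before relying on the cache, you can verify that the mounted directory is non-empty after the first run. This is a sketch (the helper is illustrative, not part of NIM); the cache layout itself is container-managed, so only non-emptiness is checked:

```shell
# Illustrative check (not part of NIM): a populated cache directory
# means the first run finished its downloads, so later runs can load
# models from the cache instead of NGC.
cache_populated() {
  [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Example usage:
# cache_populated "$LOCAL_NIM_CACHE" && echo "cache ready"
```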
RMIR (intermediate) models must be built and exported once; then you can run from the export path without re-downloading from NGC.
Step 1: Export the Model
Create an export directory and run the container with NIM_TAGS_SELECTOR set to the RMIR profile (model_type=rmir). The container builds the model, writes it to the mounted export path, and then exits.
# Create the export directory on the host
export NIM_EXPORT_PATH=~/nim_export
mkdir -p "$NIM_EXPORT_PATH"
chmod 777 "$NIM_EXPORT_PATH"
# Set values for your model (example: ASR Parakeet 1.1b en-US streaming, RMIR)
export CONTAINER_ID=parakeet-1-1b-ctc-en-us
export NIM_TAGS_SELECTOR="name=parakeet-1-1b-ctc-en-us,mode=str,model_type=rmir"
docker run -it --rm --name=$CONTAINER_ID \
--runtime=nvidia \
--gpus '"device=0"' \
--shm-size=8GB \
-e NGC_API_KEY \
-e NIM_TAGS_SELECTOR \
-e NIM_HTTP_API_PORT=9000 \
-e NIM_GRPC_API_PORT=50051 \
-p 9000:9000 \
-p 50051:50051 \
-v $NIM_EXPORT_PATH:/opt/nim/export \
-e NIM_EXPORT_PATH=/opt/nim/export \
nvcr.io/nim/nvidia/$CONTAINER_ID:latest
When the export finishes, you should see logs similar to:
INFO:inference:Riva model generation completed
INFO:inference:Models exported to /opt/nim/export
INFO:inference:Exiting container
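If you capture the container output to a file (for example with tee), a small check for the completion line above can gate Step 2 in a script. This is a sketch (the helper is illustrative, not part of NIM), and the exact log text may vary between releases:

```shell
# Illustrative helper (not part of NIM): confirm the export step
# finished by looking for the completion line in a saved log file.
export_completed() {
  grep -q "Models exported to /opt/nim/export" "$1"
}

# Example usage (log file name is hypothetical):
# docker run ... 2>&1 | tee export.log
# export_completed export.log && echo "export finished"
```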
Step 2: Run Using the Exported Model
Start the container with the same export path and set NIM_DISABLE_MODEL_DOWNLOAD=true so it uses the exported models instead of downloading again:
docker run -it --rm --name=$CONTAINER_ID \
--runtime=nvidia \
--gpus '"device=0"' \
--shm-size=8GB \
-e NGC_API_KEY \
-e NIM_TAGS_SELECTOR \
-e NIM_DISABLE_MODEL_DOWNLOAD=true \
-e NIM_HTTP_API_PORT=9000 \
-e NIM_GRPC_API_PORT=50051 \
-p 9000:9000 \
-p 50051:50051 \
-v $NIM_EXPORT_PATH:/opt/nim/export \
-e NIM_EXPORT_PATH=/opt/nim/export \
nvcr.io/nim/nvidia/$CONTAINER_ID:latest
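Before sending requests, you may want to wait until the service is accepting connections on the mapped ports. A minimal sketch using bash's /dev/tcp redirection (the helper is illustrative, not part of NIM):

```shell
# Illustrative helper (not part of NIM): poll until a TCP port
# accepts connections, up to a retry limit. Requires bash for the
# /dev/tcp redirection.
wait_for_port() {
  local host="$1" port="$2" tries="${3:-30}" i=0
  while [ "$i" -lt "$tries" ]; do
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Example usage:
# wait_for_port 127.0.0.1 50051 && echo "gRPC port is up"
```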
Not all services or models support both prebuilt and RMIR; check the support matrix for your service.