# Deploying NVIDIA Speech NIM Microservices with Docker
Deploy NVIDIA Speech NIM microservices as Docker containers. Each NIM runs in its own container with GPU acceleration using CUDA, TensorRT, and Triton.
This section describes how to configure, run, and cache models for the Speech NIM microservices:
- **Configuration**: GPU selection, shared memory, environment variables, and container setup common to all Speech NIM containers.
- **Runtime Parameters**: `docker run` flags and environment variables (ports, API keys, cache path, optional settings).
- **Model Caching**: caching model artifacts locally (prebuilt or RMIR) to avoid repeated NGC downloads on startup.
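The pieces above come together in a single `docker run` invocation. The sketch below is illustrative, not a definitive command: the image path, tag, and port numbers are placeholders, and the in-container cache path is an assumption; substitute the values documented for the specific Speech NIM you deploy.

```shell
# Hypothetical launch of a Speech NIM container.
# <your-ngc-api-key>, <speech-nim-image>, <tag>, and the ports are placeholders.
export NGC_API_KEY=<your-ngc-api-key>   # credential for pulling model artifacts from NGC
export LOCAL_NIM_CACHE=~/.cache/nim     # host directory that persists downloaded models
mkdir -p "$LOCAL_NIM_CACHE"

docker run -it --rm \
    --gpus '"device=0"' \
    --shm-size=8GB \
    -e NGC_API_KEY \
    -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
    -p 9000:9000 \
    -p 50051:50051 \
    nvcr.io/nim/nvidia/<speech-nim-image>:<tag>
```

The `--gpus` flag pins the container to a specific GPU, `--shm-size` provides the shared memory Triton needs, and the `-v` mount is what enables model caching: on the first start the container downloads model artifacts into the mounted directory, and subsequent starts reuse them instead of re-downloading from NGC.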