Prerequisites

Verify that your environment meets the following requirements before deploying NVIDIA Speech NIM microservices.

License and Hardware

Requirement	Details
NVIDIA AI Enterprise (NVAIE)	Required license for self-hosting Speech NIMs.
NVIDIA GPU	Model-specific GPU and memory requirements vary by service. Refer to the support matrix for supported GPU and model combinations.
CPU Architecture	x86_64 only.

Operating System

Use a Linux distribution that meets the following requirements:

Supported by the NVIDIA Container Toolkit.
glibc >= 2.35. Verify with ldd --version; the first line of output reports the glibc version.
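
The glibc requirement can be checked with a short script. This is a minimal sketch, assuming a glibc-based distribution where `ldd --version` prints the version as the last field of its first line and a `sort` that supports `-V`. The helper names `version_ge` and `glibc_version` are hypothetical.

```shell
# Succeed when version $1 is greater than or equal to version $2.
# Uses version-aware sorting, so 2.4 is correctly treated as older than 2.35.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the glibc version from the first line of `ldd --version`
# (for example "ldd (Ubuntu GLIBC 2.35-0ubuntu3) 2.35" -> "2.35").
glibc_version() {
  ldd --version 2>/dev/null | head -n 1 | awk '{print $NF}'
}

required="2.35"
found="$(glibc_version)"
if [ -n "$found" ] && version_ge "$found" "$required"; then
  echo "glibc $found meets the >= $required requirement"
else
  echo "glibc ${found:-unknown} does not meet the >= $required requirement"
fi
```

On a host that does not use glibc, the script reports the version as unknown instead of failing.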

CUDA Drivers

Install CUDA drivers by following the CUDA installation guide for Linux.

Use a package manager installation. Skip the CUDA toolkit – the required libraries are bundled in the NIM container.
Install open GPU kernel modules matching your driver version.

Supported Driver Versions

Major Version	EOL	Data Center and RTX/Quadro	GeForce
> 550	TBD	Yes	Yes
550	Feb 2025	Yes	Yes
545	Oct 2023	Yes	Yes
535	June 2026	Yes	–
525	Nov 2023	Yes	–
470	Sept 2024	Yes	–
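
To compare the installed driver against the table, a check along the following lines may help. It is a sketch only: it assumes `nvidia-smi` is on the PATH and uses its `--query-gpu=driver_version` query. The helper names `driver_major` and `is_listed_major` are hypothetical. The check only confirms that the major version appears in the table; it does not evaluate EOL dates or GPU family.

```shell
# Print the major component of a driver version string, e.g. 550.54.14 -> 550.
driver_major() {
  printf '%s\n' "${1%%.*}"
}

# Succeed when the major version appears in the table (470, 525, 535, 545, 550)
# or is newer than 550. Non-numeric input is rejected.
is_listed_major() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
  case "$1" in
    470|525|535|545|550) return 0 ;;
  esac
  [ "$1" -gt 550 ]
}

if command -v nvidia-smi >/dev/null 2>&1; then
  # Query only the driver version; take the first GPU's line.
  version="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
  major="$(driver_major "$version")"
  if is_listed_major "$major"; then
    echo "Driver $version (major $major) is listed in the table"
  else
    echo "Driver $version (major $major) is not listed in the table"
  fi
else
  echo "nvidia-smi not found; install the NVIDIA driver first"
fi
```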

Docker

Install Docker Engine for your Linux distribution by following the Docker Engine installation guide.

After installation, verify that the Docker daemon is running and that your user can execute docker commands without sudo. Add your user to the docker group if needed:

```
sudo usermod -aG docker $USER
```

Log out and back in for the group change to take effect.
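
After logging back in, both conditions can be confirmed with a quick check. This is a sketch rather than an official procedure: `in_group_list` is a hypothetical helper, and `docker info` is used only because it needs a working connection to the daemon.

```shell
# Succeed when group name $2 appears in the space-separated group list $1.
in_group_list() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# `id -nG` lists the groups of the current session, which only picks up
# a new group after logging out and back in.
if in_group_list "$(id -nG)" docker; then
  echo "Current session has docker group membership"
else
  echo "Current session is not in the docker group"
fi

# `docker info` contacts the daemon, so it fails if the daemon is stopped
# or if the current user lacks permission on the Docker socket.
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
  echo "Docker daemon is reachable without sudo"
else
  echo "Docker daemon is not reachable without sudo"
fi
```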

NVIDIA Container Toolkit

The NVIDIA Container Toolkit enables Docker containers to access the host GPU.

Install the toolkit by following the NVIDIA Container Toolkit installation guide.
Configure Docker to use the NVIDIA runtime by following the Docker configuration steps.
Restart the Docker daemon after configuration:
```
sudo systemctl restart docker
```
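
The configuration step and a follow-up check can be sketched as below. The `nvidia-ctk runtime configure --runtime=docker` command comes from the NVIDIA Container Toolkit documentation and is left commented out because it modifies the host. The check makes three assumptions: the daemon configuration lives at `/etc/docker/daemon.json`, `has_nvidia_runtime` is a hypothetical helper, and `DOCKER_DAEMON_JSON` is a hypothetical override variable.

```shell
# Typical configuration and restart commands (require root, so commented out):
#   sudo nvidia-ctk runtime configure --runtime=docker
#   sudo systemctl restart docker

# Succeed when the given daemon config file references the NVIDIA runtime binary.
# has_nvidia_runtime is a hypothetical helper, not part of Docker or the toolkit.
has_nvidia_runtime() {
  [ -r "$1" ] && grep -q 'nvidia-container-runtime' "$1"
}

# Assumed default location; DOCKER_DAEMON_JSON is a hypothetical override.
config="${DOCKER_DAEMON_JSON:-/etc/docker/daemon.json}"
if has_nvidia_runtime "$config"; then
  echo "NVIDIA runtime is registered in $config"
else
  echo "NVIDIA runtime not found in $config"
fi
```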

Verify GPU Access

Confirm that containers can access the GPU:

```
docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
```

The output should display the driver version, CUDA version, and available GPU(s):

```
| NVIDIA-SMI 550.54.14   Driver Version: 550.54.14   CUDA Version: 12.4     |
| GPU  Name                 ...
```

If this succeeds, your environment is ready to run Speech NIM containers.

WSL2 (Windows)

For Windows deployments with WSL2, refer to NVIDIA NIM on WSL2. Check the support matrix for your service to confirm WSL2-compatible models. You might need to adjust WSL memory allocation using .wslconfig and use podman instead of docker.

Client-Side Prerequisites for Python Clients

Some Riva Python client scripts import system audio libraries at module load time, so a fresh system needs additional packages before running them, even if you do not use microphone capture or speaker playback.

```
sudo apt-get install -y portaudio19-dev
python3 -m pip install pyaudio
```

Package	Why It Is Needed
`portaudio19-dev` + `pyaudio`	Required by `scripts/asr/transcribe_mic.py` (microphone capture) and `scripts/tts/realtime_tts_client.py` (the realtime TTS WebSocket client). In the current release, both scripts import `pyaudio` at the top of the module, so on a fresh host they fail with `ModuleNotFoundError: No module named 'pyaudio'` even when run with `--output output.wav`.
`sox`	Optional. Used only by the TTS tutorial’s HTTP streaming example to prepend a WAV header to raw LPCM output. The same step can be done in pure Python with the standard-library `wave` module. For more information, refer to the TTS tutorial.
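
A quick preflight for the packages in this table might look like the following sketch. It assumes `python3` is on the PATH and that the client scripts run under that same interpreter. `python_module_available` is a hypothetical helper name.

```shell
# Succeed when the named Python module can be imported by python3.
# python_module_available is a hypothetical helper for illustration.
python_module_available() {
  command -v python3 >/dev/null 2>&1 && python3 -c "import $1" >/dev/null 2>&1
}

# pyaudio is needed by the client scripts; wave ships with the standard library.
for module in pyaudio wave; do
  if python_module_available "$module"; then
    echo "python module $module: available"
  else
    echo "python module $module: missing"
  fi
done

# sox is optional, so its absence is reported but not treated as an error.
if command -v sox >/dev/null 2>&1; then
  echo "sox: available"
else
  echo "sox: not installed (optional)"
fi
```

If `pyaudio` is reported as missing after the pip install, confirm that pip and the client scripts use the same Python interpreter.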