Troubleshooting NVIDIA ASR NIM Microservice Issues#
This page covers troubleshooting issues specific to the NVIDIA ASR NIM microservices. For issues shared across all NVIDIA Speech NIM microservices, see Common Issues.
Idle Stream Sequence Error#
Symptom#
Server logs show an error requiring the START flag on the first request of the sequence.
Cause#
The error occurs when:
A streaming sequence is idle (no audio chunks received) longer than the configured max_sequence_idle_microseconds timeout.
The server automatically releases the idle sequence.
The client sends a new audio chunk for the already-released sequence.
The server rejects the request because it expects a START flag for what it now considers a new sequence, but the client does not provide it.
The stream timeout in the NIM HTTP layer is set to 60 seconds (stream_timeout=60000000 microseconds).
Solution#
Option 1: Implement Client-Side Mitigations (Recommended)#
A. Send Silence Buffers During Pauses#
If the microphone is muted but the stream should stay active:
def handle_mic_mute():
    # Keep the sequence alive while the microphone is muted
    while mic_is_muted and stream_is_active:
        send_silence_buffer(stream_id)
        time.sleep(30.0)  # send a silence chunk every 30 seconds
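What counts as a silence buffer depends on your audio format; for 16-bit PCM it is simply zero-valued samples. A minimal sketch, assuming 16 kHz mono 16-bit audio (`make_silence_chunk` is an illustrative helper, not part of the NIM client API):

```python
def make_silence_chunk(duration_s=0.5, sample_rate=16000, sample_width=2):
    """Return zero-valued 16-bit PCM bytes representing silence."""
    num_samples = int(duration_s * sample_rate)
    return b"\x00" * (num_samples * sample_width)
```

The returned bytes can be sent as an ordinary audio chunk so the sequence never reaches the idle timeout.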
B. Stream Timeout Mechanism#
Implement a fail-safe to detect and handle idle streams:
STREAM_TIMEOUT = 30  # seconds

def monitor_stream_activity(stream_id, last_activity_time):
    idle_duration = time.time() - last_activity_time
    if idle_duration > STREAM_TIMEOUT:
        # End the idle stream cleanly and start a new one
        send_stream_end_signal(stream_id)
        return new_stream_id()
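The monitor above needs the last activity time for each stream. One way to track it is a small per-stream record updated on every chunk sent (a sketch; the class and its names are illustrative, not part of the NIM client API):

```python
import time

STREAM_TIMEOUT = 30  # seconds

class StreamTracker:
    """Remembers when a stream last sent audio."""

    def __init__(self, stream_id):
        self.stream_id = stream_id
        self.last_activity_time = time.time()

    def mark_activity(self):
        # Call this every time an audio chunk is sent on the stream.
        self.last_activity_time = time.time()

    def is_idle(self, now=None):
        now = time.time() if now is None else now
        return (now - self.last_activity_time) > STREAM_TIMEOUT
```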
Option 2: Increase Idle Timeout (Server-Side)#
During Model Build#
Set the parameter during riva-build:
riva-build ... --asr_ensemble_backend.max_sequence_idle_microseconds=120000000
Configuration Edit in the Existing Model Repository#
Edit models/conformer-<LANG>-asr-streaming-*-asr-bls-ensemble/config.pbtxt:
parameters: {
  key: "max_sequence_idle_microseconds"
  value: {
    string_value: "120000000"  # 120 seconds (doubled from the default 60s)
  }
}
Restart the Riva server after saving the configuration changes.
Too Many Open Files Error#
Symptom#
The NIM container fails to start with an error such as:
OSError: [Errno 24] Too many open files: '/tmp/tmp64in_90a'
Cause#
The NIM container starts with a too-low file-descriptor limit because the correct ulimit was not propagated from the host system to the container.
Solution#
Add --ulimit nofile=2048:2048 to the docker run command, where 2048:2048 represents the soft and hard limits for the number of open file descriptors.
docker run --ulimit nofile=2048:2048 [other options] <image>
If the error persists, increase the limit further (for example, 4096:4096). Ensure the host hard limit supports the value you set.
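To confirm the limit the container actually received, you can read it from inside the container with Python's standard resource module:

```python
import resource

# RLIMIT_NOFILE is the per-process limit on open file descriptors.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open files: soft={soft}, hard={hard}")
```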
Audio File Too Large#
Symptom#
The HTTP transcription request returns 400 Bad Request with the message audio too long.
Cause#
The NIM enforces a maximum audio file size of 25 MB per request. Files exceeding this limit are rejected before processing.
Solution#
Ensure the audio file is under 25 MB. Check the file size before uploading:
ls -lh audio.wav
For large audio files, split them into smaller segments:
sox input.wav output_%03d.wav trim 0 60 : newfile : restart
Use a compressed audio format (OPUS or FLAC) instead of uncompressed WAV to reduce file size while maintaining quality.
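A client-side guard can reject oversized files before they reach the server. A minimal sketch (the 25 MB limit comes from the error above; `check_audio_size` is an illustrative helper, not part of the NIM client API):

```python
import os

MAX_AUDIO_BYTES = 25 * 1024 * 1024  # 25 MB per-request limit

def check_audio_size(path):
    """Raise before upload if the server would reject the file."""
    size = os.path.getsize(path)
    if size > MAX_AUDIO_BYTES:
        raise ValueError(f"{path} is {size} bytes; limit is {MAX_AUDIO_BYTES}")
    return size
```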
Model Not Found for Language#
Symptom#
The transcription request returns 404 Not Found with the message Model not found for language <language_code>.
Cause#
The specified language code does not match any deployed model. The NIM first attempts an exact match on the full language code (for example, en-US), then falls back to matching the base language code (for example, en). If neither matches a deployed model, the request fails.
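The exact-then-base-language fallback described above can be sketched as follows (the registry contents and model names are hypothetical):

```python
def find_model(language_code, deployed):
    """Exact match on the full code, then fall back to the base language."""
    if language_code in deployed:
        return deployed[language_code]
    base = language_code.split("-")[0]
    if base in deployed:
        return deployed[base]
    return None  # no match: the request fails with 404

# Hypothetical registry mapping language codes to deployed model names
deployed = {"en-US": "parakeet-ctc-en-us", "zh": "parakeet-ctc-zh"}
```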
Solution#
Verify the language code matches the deployed model. Confirm which models are loaded by checking the container startup logs for lines containing Found ASR model. Alternatively, use the --list-models option with the Riva Python client script for ASR to query the deployed models on the server:
python python-clients/scripts/asr/transcribe_file.py --list-models --server localhost:50051
Use the correct language code format. Common formats:

Language            Code
English (US)        en-US
Simplified Chinese  zh-CN
Spanish             es-ES or es-US

If the model or language parameter is omitted entirely, the request returns 400 Bad Request, need model or language. Provide at least one.
CTC vs RNNT Container Deployment#
Symptom#
Confusion about which container image to use when deploying CTC-only, RNNT-only, or both CTC and RNNT models.
Clarification#
CTC container: A slimmer container image based on TensorRT with minimal dependencies. Use it when you need CTC-only deployments.
RNNT container: A larger container image that includes dependencies for both RNNT and CTC. It is a superset—you can run both RNNT and CTC models from the same deployment.
Recommendation: If you need both RNNT and CTC, use the RNNT container; do not run separate CTC and RNNT containers for the same workload.
Deploying RNNT and Calling CTC Models#
When using the RNNT container, you can load and serve both RNNT and CTC models. Configure your deployment (for example, in Helm or your manifest) to use the RNNT image and reference the CTC model by its name or language code in your client requests. The same NIM instance can serve both model types.
FAQ: Container Image Size (RNNT vs CTC)#
Why is the RNNT container larger than the CTC container?
The CTC container is built with a smaller set of dependencies (TensorRT-focused) for CTC-only inference. The RNNT container includes additional runtimes and dependencies required for RNNT, and also supports CTC. If you only need CTC and want a smaller image and faster pulls, use the CTC container. If you need both model types or only RNNT, use the RNNT container.
Deploying a custom model from NGC without copying .tar.gz#
To use a custom ASR model from the NGC model registry inside a container (without manually copying .tar.gz files), upload the model to NGC, generate a custom manifest from it, then build and use a container that includes that manifest.
Upload the model to NGC (if not already there).
Export your NGC API key (if not already set):
export NGC_API_KEY=<your_api_key>
Generate a custom manifest by running the NIM container with nim_download_to_cache. This downloads and validates the NGC model and writes custom_manifest.yaml. Replace the model URI with your NGC org, team, model name, and version:
docker run -it --rm -e NGC_API_KEY -v /tmp/output:/data -u root --entrypoint nim_download_to_cache <container>:<container-version> --model-uri ngc://<ngc-org>/<ngc-team>/<model-name>:<model-version> --manifest-file /data/custom_manifest.yaml
Example with a specific model:
docker run -it --rm -e NGC_API_KEY -v /tmp/output:/data -u root --entrypoint nim_download_to_cache nvcr.io/nim/nvidia/parakeet-ctc-0.6b-vi:1.0.0 --model-uri ngc://nvstaging/nim/parakeet-ctc-0.6b-vi:a100x1-ofl-25.08-fp16--hv6tezmqw --manifest-file /data/custom_manifest.yaml
The manifest is created in the mounted folder (for example, /tmp/output/custom_manifest.yaml).
Build a Docker image that includes your custom_manifest.yaml. The default manifest in the image is at /opt/nim/etc/default/model_manifest.yaml. Override it by placing your manifest in the image and setting the NIM_MANIFEST_PATH environment variable to its path.
Deploy using the new image.