Troubleshooting NVIDIA ASR NIM Microservice Issues#
This page covers troubleshooting issues specific to the NVIDIA ASR NIM microservices. For issues shared across all NVIDIA Speech NIM microservices, see Common Issues.
Idle Stream Sequence Error#
Symptom#
Server logs show an error requiring the START flag on the first request of the sequence.
Cause#
The error occurs when:
A streaming sequence is idle (no audio chunks received) longer than the configured max_sequence_idle_microseconds timeout.
The server automatically releases the idle sequence.
The client sends a new audio chunk for the already-released sequence.
The server rejects the request because it expects a START flag for what it now considers a new sequence, but the client does not provide it.
The stream timeout in the NIM HTTP layer is set to 60 seconds (stream_timeout=60000000 microseconds).
Solution#
Option 1: Implement Client-Side Mitigations (Recommended)#
A. Send Silence Buffers During Pauses#
If the microphone is muted but the stream should stay active:
def handle_mic_mute():
    # Keep the sequence alive while the microphone is muted
    while mic_is_muted and stream_is_active:
        send_silence_buffer(stream_id)
        time.sleep(30.0)  # send a silence chunk every 30 seconds
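What counts as a silence buffer depends on your audio format; for 16-bit PCM it is simply zero-valued samples. A minimal sketch, assuming 16 kHz mono 16-bit audio (`make_silence_chunk` is an illustrative helper, not part of the NIM client API):

```python
def make_silence_chunk(duration_s=0.5, sample_rate=16000, sample_width=2):
    """Return zero-valued 16-bit PCM bytes representing silence."""
    num_samples = int(duration_s * sample_rate)
    return b"\x00" * (num_samples * sample_width)
```

The returned bytes can be sent as an ordinary audio chunk so the sequence never reaches the idle timeout.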
B. Stream Timeout Mechanism#
Implement a fail-safe to detect and handle idle streams:
STREAM_TIMEOUT = 30  # seconds

def monitor_stream_activity(stream_id, last_activity_time):
    idle_duration = time.time() - last_activity_time
    if idle_duration > STREAM_TIMEOUT:
        # End the idle stream cleanly and start a new one
        send_stream_end_signal(stream_id)
        return new_stream_id()
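The monitor above needs the last activity time for each stream. One way to track it is a small per-stream record updated on every chunk sent (a sketch; the class and its names are illustrative, not part of the NIM client API):

```python
import time

STREAM_TIMEOUT = 30  # seconds

class StreamTracker:
    """Remembers when a stream last sent audio."""

    def __init__(self, stream_id):
        self.stream_id = stream_id
        self.last_activity_time = time.time()

    def mark_activity(self):
        # Call this every time an audio chunk is sent on the stream.
        self.last_activity_time = time.time()

    def is_idle(self, now=None):
        now = time.time() if now is None else now
        return (now - self.last_activity_time) > STREAM_TIMEOUT
```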
Option 2: Increase Idle Timeout (Server-Side)#
During Model Build#
Set the parameter during riva-build:
riva-build ... --asr_ensemble_backend.max_sequence_idle_microseconds=120000000
Configuration Edit in the Existing Model Repository#
Edit models/conformer-<LANG>-asr-streaming-*-asr-bls-ensemble/config.pbtxt:
parameters: {
  key: "max_sequence_idle_microseconds"
  value: {
    string_value: "120000000"  # 120 seconds (doubled from the default 60s)
  }
}
Restart the Riva server after saving the configuration changes.
Too Many Open Files Error#
Symptom#
The NIM container fails to start with an error such as:
OSError: [Errno 24] Too many open files: '/tmp/tmp64in_90a'
Cause#
The NIM container starts with a too-low file-descriptor limit because the correct ulimit was not propagated from the host system to the container.
Solution#
Add --ulimit nofile=2048:2048 to the docker run command, where 2048:2048 represents the soft and hard limits for the number of open file descriptors.
docker run --ulimit nofile=2048:2048 [other options] <image>
If the error persists, increase the limit further (for example, 4096:4096). Ensure the host hard limit supports the value you set.
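To confirm the limit the container actually received, you can read it from inside the container with Python's standard resource module:

```python
import resource

# RLIMIT_NOFILE is the per-process limit on open file descriptors.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open files: soft={soft}, hard={hard}")
```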
Audio File Too Large#
Symptom#
The HTTP transcription request returns 400 Bad Request with the message audio too long.
Cause#
The NIM enforces a maximum audio file size of 25 MB per request. Files exceeding this limit are rejected before processing.
Solution#
Ensure the audio file is under 25 MB. Check the file size before uploading:
ls -lh audio.wav
For large audio files, split them into smaller segments:
sox input.wav output_%03d.wav trim 0 60 : newfile : restart
Use a compressed audio format (OPUS or FLAC) instead of uncompressed WAV to reduce file size while maintaining quality.
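A client-side guard can reject oversized files before they reach the server. A minimal sketch (the 25 MB limit comes from the error above; `check_audio_size` is an illustrative helper, not part of the NIM client API):

```python
import os

MAX_AUDIO_BYTES = 25 * 1024 * 1024  # 25 MB per-request limit

def check_audio_size(path):
    """Raise before upload if the server would reject the file."""
    size = os.path.getsize(path)
    if size > MAX_AUDIO_BYTES:
        raise ValueError(f"{path} is {size} bytes; limit is {MAX_AUDIO_BYTES}")
    return size
```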
Model Not Found for Language#
Symptom#
The transcription request returns 404 Not Found with the message Model not found for language <language_code>.
Cause#
The specified language code does not match any deployed model. The NIM first attempts an exact match on the full language code (for example, en-US), then falls back to matching the base language code (for example, en). If neither matches a deployed model, the request fails.
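The exact-then-base-language fallback described above can be sketched as follows (the registry contents and model names are hypothetical):

```python
def find_model(language_code, deployed):
    """Exact match on the full code, then fall back to the base language."""
    if language_code in deployed:
        return deployed[language_code]
    base = language_code.split("-")[0]
    if base in deployed:
        return deployed[base]
    return None  # no match: the request fails with 404

# Hypothetical registry mapping language codes to deployed model names
deployed = {"en-US": "parakeet-ctc-en-us", "zh": "parakeet-ctc-zh"}
```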
Solution#
Verify the language code matches the deployed model. Confirm which models are loaded by checking the container startup logs for lines containing Found ASR model. Alternatively, use the --list-models option with the Riva Python client script for ASR to query the deployed models on the server:
python python-clients/scripts/asr/transcribe_file.py --list-models --server localhost:50051
Use the correct language code format. Common formats:

Language            Code
English (US)        en-US
Simplified Chinese  zh-CN
Spanish             es-ES or es-US

If the model or language parameter is omitted entirely, the request returns 400 Bad Request, need model or language. Provide at least one.
CTC vs RNNT Container Deployment#
Symptom#
Confusion about which container image to use when deploying CTC-only, RNNT-only, or both CTC and RNNT models.
Clarification#
CTC container: A slimmer container image based on TensorRT with minimal dependencies. Use it when you need CTC-only deployments.
RNNT container: A larger container image that includes dependencies for both RNNT and CTC. It is a superset—you can run both RNNT and CTC models from the same deployment.
Recommendation: If you need both RNNT and CTC, use the RNNT container; do not run separate CTC and RNNT containers for the same workload.
Deploying RNNT and Calling CTC Models#
When using the RNNT container, you can load and serve both RNNT and CTC models. Configure your deployment (for example, in Helm or your manifest) to use the RNNT image and reference the CTC model by its name or language code in your client requests. The same NIM instance can serve both model types.
FAQ: Container Image Size (RNNT vs CTC)#
Why is the RNNT container larger than the CTC container?
The CTC container is built with a smaller set of dependencies (TensorRT-focused) for CTC-only inference. The RNNT container includes additional runtimes and dependencies required for RNNT, and also supports CTC. If you only need CTC and want a smaller image and faster pulls, use the CTC container. If you need both model types or only RNNT, use the RNNT container.
Deploying a custom model from NGC without copying .tar.gz#
To use a custom ASR model from the NGC model registry inside a container (without manually copying .tar.gz files), upload the model to NGC, generate a custom manifest from it, then build and use a container that includes that manifest.
Upload the model to NGC (if not already there).
Export your NGC API key (if not already set):
export NGC_API_KEY=<your_api_key>
Generate a custom manifest by running the NIM container with nim_download_to_cache. This downloads and validates the NGC model and writes custom_manifest.yaml. Replace the model URI with your NGC org, team, model name, and version:
docker run -it --rm -e NGC_API_KEY -v /tmp/output:/data -u root --entrypoint nim_download_to_cache <container>:<container-version> --model-uri ngc://<ngc-org>/<ngc-team>/<model-name>:<model-version> --manifest-file /data/custom_manifest.yaml
Example with a specific model:
docker run -it --rm -e NGC_API_KEY -v /tmp/output:/data -u root --entrypoint nim_download_to_cache nvcr.io/nim/nvidia/parakeet-ctc-0.6b-vi:1.0.0 --model-uri ngc://nvstaging/nim/parakeet-ctc-0.6b-vi:a100x1-ofl-25.08-fp16--hv6tezmqw --manifest-file /data/custom_manifest.yaml
The manifest is created in the mounted folder (for example, /tmp/output/custom_manifest.yaml).
Build a Docker image that includes your custom_manifest.yaml. The default manifest in the image is at /opt/nim/etc/default/model_manifest.yaml. Override it by placing your manifest in the image and setting the NIM_MANIFEST_PATH environment variable to its path.
Deploy using the new image.