To share feedback or ask questions about this release, visit the NVIDIA Riva Developer Forum.


Before pulling images or models from NGC, run the following commands with your API key.

docker login
ngc config set
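
The login step above can be scripted roughly as follows. This is a minimal sketch, assuming the API key is stored in an environment variable named NGC_API_KEY (a name chosen here for illustration) and that the registry host is nvcr.io with the literal user name $oauthtoken, which is the usual convention for API-key logins to NGC. The ngc config set command prompts interactively and is not covered here.

```shell
# Log Docker in to the NGC registry using an API key held in NGC_API_KEY.
# NGC_API_KEY is an assumed variable name, not something defined by Riva.
ngc_docker_login() {
    if [ -z "${NGC_API_KEY:-}" ]; then
        echo "NGC_API_KEY is not set" >&2
        return 1
    fi
    # The user name is the literal string $oauthtoken (single quotes stop expansion).
    # --password-stdin keeps the key off the command line.
    printf '%s\n' "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
}
```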

To locate Riva-related materials, go to NGC Catalogs.

Model Export and ServiceMaker

  • If the .riva file is encrypted, provide the encryption key to both riva-build and riva-deploy.

  • If you want to overwrite a previously generated RMIR (Riva Model Intermediate Representation) file or directory that contains Triton Inference Server artifacts, pass -f to either riva-build or riva-deploy.
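
As an illustration of the two points above, the following sketch builds and deploys a model with an encryption key and the -f flag. It assumes that the key is appended to each model path in path:key form and that the pipeline type is speech_recognition. Neither detail is stated in this section, and the function names, paths, and key are hypothetical.

```shell
# Append an encryption key to a model path in "path:key" form (assumed convention).
# If the key is empty, the path is returned unchanged.
with_key() {
    if [ -n "$2" ]; then
        printf '%s:%s\n' "$1" "$2"
    else
        printf '%s\n' "$1"
    fi
}

# Build an RMIR from a .riva file, then deploy it, overwriting earlier output.
# Arguments: <riva file> <rmir file> <model repository dir> <encryption key>
build_and_deploy() {
    riva-build speech_recognition "$(with_key "$2" "$4")" "$(with_key "$1" "$4")" -f &&
    riva-deploy -f "$(with_key "$2" "$4")" "$3"
}

# Example (hypothetical paths and key):
# build_and_deploy /servicemaker-dev/asr.riva /servicemaker-dev/asr.rmir /data/models my_key
```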



  • If the server crashes while running in a Kubernetes environment, the pod should restart itself. Outside of a Kubernetes environment, you can implement similar behavior by polling the server periodically (for example, every 10 seconds) to determine its state:

    /bin/grpc_health_probe -addr=:50051
  • Services may accept more streams than the referenced maximum. If this happens, performance begins to degrade and connections may start to time out.

  • GPU utilization and memory consumption are available through the metrics API on port 8002. For more information, refer to Triton Inference Server Metrics.
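
The polling and metrics points above can be sketched as two small shell helpers. This is only an outline: the function names are made up for illustration, restarting the riva-speech container on failure is an assumption about how you run the server, and the metric names are those Triton normally exposes in Prometheus format at /metrics.

```shell
# Run a probe command repeatedly until it succeeds.
# Arguments: <max attempts> <seconds between attempts> <probe command...>
# Returns 0 as soon as the probe succeeds, 1 if every attempt fails.
wait_for_healthy() {
    attempts=$1
    interval=$2
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        # Sleep only between attempts, not after the last one.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$interval"
        fi
    done
    return 1
}

# Keep only the GPU utilization and memory lines from metrics text on stdin.
gpu_metrics() {
    grep -E '^nv_gpu_(utilization|memory_used_bytes|memory_total_bytes)'
}

# Example usage (not executed here):
# wait_for_healthy 3 10 /bin/grpc_health_probe -addr=:50051 || docker restart riva-speech
# curl -s localhost:8002/metrics | gpu_metrics
```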

Automatic Speech Recognition (ASR)

  • For very long audio, use the streaming recognition service.

  • For noisy audio, use the Jasper acoustic model for improved accuracy. The provided English-language Jasper model has been trained to be robust to medium levels of background noise. The provided English QuartzNet model is not as robust to background noise. The model can be set in RecognitionConfig.

  • When using streaming recognition, the client sends chunks of audio, normally from a microphone. Clients can use any chunk size. The Riva server creates chunks of 100 ms or 800 ms depending on the server configuration. Streaming recognition mode uses more GPU memory than offline recognition mode.

  • In the offline recognition mode, clients send a request containing all the audio. The server segments the audio into chunks under the hood.

  • The server automatically upsamples 8 kHz audio to 16 kHz.

  • If you want to use domain-specific ASR models, you can either fine-tune the acoustic model or train a domain-specific language model and deploy with Riva ServiceMaker.

  • To fully saturate hardware, you might want to use multiple streams in real time. In the client example, simply specify the number of parallel streams by setting --num_parallel_requests.
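
To make the chunk sizes mentioned above concrete, the sketch below converts a chunk duration into a byte count and counts the chunks needed for a buffer. It assumes 16-bit mono linear PCM, which this section does not state, and the function names are illustrative only.

```shell
# Bytes of 16-bit mono PCM in one chunk (2 bytes per sample is an assumption).
# Arguments: <sample rate in Hz> <chunk duration in ms>
chunk_bytes() {
    echo $(( $1 * $2 / 1000 * 2 ))
}

# Number of chunks needed to cover an audio buffer, rounding up.
# Arguments: <total bytes> <bytes per chunk>
chunk_count() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Example: at 16 kHz, a 100 ms chunk is 3200 bytes and an 800 ms chunk is 25600 bytes.
# Ten seconds of audio (320000 bytes) is 100 chunks of 100 ms or 13 chunks of 800 ms.
```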

Text-To-Speech (TTS)

  • For real-time applications, use the online streaming mode (option --online=True in the command-line client, or function SynthesizeOnline from the gRPC API).

  • To fully saturate hardware, you might want to use multiple streams in real time. In the client example, simply specify the number of parallel streams by setting --num_parallel_requests. Using more streams than supported by your hardware configuration might cause issues, such as some requests timing out.

Client Integration

  • If you encounter the error "Cannot create GRPC channel at uri localhost:50051", check whether the Riva server is running by inspecting its log with docker logs riva-speech.
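
The check described above could be wrapped in a helper along these lines. This is a sketch, assuming the server runs in a local Docker container named riva-speech as in the log command above. The function name and the 20-line log tail are arbitrary choices.

```shell
# Report whether a container with the given name is running and, if so,
# print the tail of its log. Defaults to the riva-speech container.
check_riva_container() {
    name=${1:-riva-speech}
    # The name filter matches substrings, so other containers with similar names also match.
    running=$(docker ps --filter "name=$name" --format '{{.Names}}')
    if [ -z "$running" ]; then
        echo "container $name is not running"
        return 1
    fi
    docker logs --tail 20 "$name"
}
```

If the function reports that the container is not running, start the server before retrying the client.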

Riva Helm Deployment

A variety of issues can occur when installing with Helm. Some of the more common ones are covered below:

  • During installation, to watch the logs for the init container, run:

    kubectl logs $POD riva-model-init --follow
  • During installation, to watch the logs of the server (the server will not start until the above finishes), run:

    kubectl logs $POD --follow
  • To ensure that the pods are correctly launched, run:

    kubectl get pods -A
  • To verify which services are running, run:

    kubectl get services
  • If you are using load balancing from Traefik, to validate the ingress route, run:

    kubectl get ingressroutes
  • To ensure the ingress route points at the correct service, run:

    kubectl get service `kubectl get ingressroutes riva-ingressroute -o=jsonpath='{..spec.routes[0].services[0].name}'`


  • The SpeechSquad container connectivity must be functioning. To open a shell in the SpeechSquad pod and check it, run:

    POD=`kubectl get pods | grep speechsquad | awk '{print $1}'`
    kubectl exec -it $POD -- /bin/bash
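
Several commands above use $POD without showing how it is set. One way to fill it in is sketched below. The helper name first_pod is made up, and the pattern riva-api in the usage comment is only a guess at the pod name, so substitute whatever kubectl get pods shows in your cluster.

```shell
# Print the name of the first pod whose name contains the given pattern.
# Looks only in the current namespace.
first_pod() {
    kubectl get pods --no-headers | awk -v pat="$1" 'index($1, pat) { print $1; exit }'
}

# Example usage (pod name pattern is an assumption):
# POD=$(first_pod riva-api)
# kubectl logs "$POD" riva-model-init --follow
```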