To share feedback or ask questions about this release, visit the NVIDIA Riva Developer Forum.


Before pulling images or models from NGC, log in with your API key by running the following commands:

$ docker login
$ ngc config set

To locate Riva-related materials, go to the NGC Catalog. Under the Name section on the right, select ea-riva-stage, then navigate to the Private Registry to find Riva-related items.

Model Export and ServiceMaker

  • If tao ... export from the TAO Toolkit launcher is used to export models from NeMo to Riva, training with the most recent NeMo release (1.0.0.b4) is recommended.

  • If the .riva file is encrypted, you must provide the encryption key to both riva-build and riva-deploy.

  • If you want to overwrite a previously generated RMIR file, or a directory containing Triton Inference Server artifacts, pass -f to riva-build or riva-deploy, respectively.
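As an illustration, the two points above can be combined in one ServiceMaker session. This is a sketch only: the model paths, the asr.* file names, and the <encryption_key> placeholder are assumptions, not values from this release.

```shell
# Sketch: build and deploy an encrypted .riva model, overwriting prior output.
# The :<encryption_key> suffix on each path supplies the key to the tool;
# -f forces overwriting of existing artifacts. Paths are illustrative.
riva-build speech_recognition \
    /servicemaker-dev/asr.rmir:<encryption_key> \
    /servicemaker-dev/asr.riva:<encryption_key> \
    -f
riva-deploy -f /servicemaker-dev/asr.rmir:<encryption_key> /data/models
```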

ASR

  • For very long audio, it is recommended to use the streaming recognition service.

  • For noisy audio, it is recommended to use the Jasper acoustic model for improved accuracy. The provided English-language Jasper model has been trained to be robust to medium levels of background noise. The provided English QuartzNet model is not as robust to background noise. The model can be set in RecognitionConfig.

  • When using streaming recognition, the client sends chunks of audio, normally from a microphone. Clients can use any chunk size; the Riva server creates chunks of 100 ms or 800 ms depending on the server configuration. Note that streaming recognition mode uses more GPU memory than offline recognition mode.

  • In the offline recognition mode, clients send a request containing all the audio. The server segments the audio into chunks under the hood.

  • The server automatically upsamples 8 kHz audio to 16 kHz.

  • If you want to use a domain-specific ASR model, you can either fine-tune the acoustic model or train a domain-specific language model, then deploy with Riva ServiceMaker.

  • To fully saturate hardware, you might want to use multiple streams in real-time. In the client example, simply specify the number of parallel streams by setting --num_parallel_requests.
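For example, the streaming client can be driven with multiple concurrent streams as described above. This is a sketch under assumptions: the sample audio path and the stream count of 32 are illustrative, not recommendations.

```shell
# Sketch: drive streaming ASR with 32 parallel streams to load the GPU.
# The wav path and stream count below are illustrative assumptions.
riva_streaming_asr_client \
    --audio_file=/opt/riva/wav/en-US_sample.wav \
    --num_parallel_requests=32
```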

TTS

  • For real-time applications, use the online streaming mode (option --online=True in the command-line client, or function SynthesizeOnline from the gRPC API).

  • To fully saturate hardware, you might want to use multiple streams in real-time. In the client example, simply specify the number of parallel streams by setting --num_parallel_requests. Note that using more streams than supported by your hardware configuration might cause issues, such as some requests timing out.
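A streaming synthesis request from the command-line client might look like the following sketch; the input text and output path are illustrative assumptions.

```shell
# Sketch: request online (streaming) synthesis; --online=True selects the
# SynthesizeOnline API. Text and output path are illustrative.
riva_tts_client --online=True \
    --text="Hello from Riva streaming synthesis." \
    --audio_file=/tmp/riva_tts_output.wav
```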

Client Integration

  • If you encounter the error Cannot create GRPC channel at uri localhost:50051, check whether the Riva server is running by inspecting its logs with docker logs riva-speech.

Riva Helm Deployment

A variety of issues can occur when installing with Helm. Some of the more common ones are captured below:

  • During installation, to watch the logs for the init container, run:

    kubectl logs $POD riva-model-init --follow
  • During installation, to watch the logs of the server (the server won't start until the init container above finishes), run:

    kubectl logs $POD --follow
  • To ensure the pods are correctly launched, run:

    kubectl get pods -A
  • To verify which services are running, run:

    kubectl get services
  • If using load balancing from Traefik, validate the ingress route by running:

    kubectl get ingressroutes
  • To ensure the ingress route points at the correct service, run:

    kubectl get service `kubectl get ingressroutes riva-ingressroute -o=jsonpath='{..spec.routes[0].services[0].name}'`


  • Verify that connectivity to the SpeechSquad container is functioning:

    POD=`kubectl get pods | grep speechsquad | awk '{print $1}'`
    kubectl exec -it $POD -- /bin/bash