Release Notes¶

Riva Speech Skills 2.0.0¶

All published functionality in the release notes has been fully tested and verified with known limitations documented. To share feedback about this release, access our NVIDIA Riva Developer Forum.

Note

If you are upgrading to 2.0.0 from a previous version, rerun riva-build for existing models. If you are using the Quick Start tool, run riva_clean.sh followed by riva_init.sh.

Breaking Changes¶

All legacy Jarvis APIs have been removed and are no longer supported.
The Riva TTS service now returns the audio waveform as int16 samples, for compatibility with the linear PCM wave format currently supported by Riva.
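
For client code that saves synthesized audio, the int16 change means the raw bytes can be wrapped in a WAV container with a sample width of 2 bytes. The following is a minimal sketch using only the Python standard library. It does not call the Riva API: `fake_int16_pcm` is a hypothetical stand-in that produces a sine tone in place of a real TTS response, and the sample rate and mono channel layout are assumptions made for illustration.

```python
import math
import os
import struct
import tempfile
import wave

SAMPLE_RATE_HZ = 22050  # assumed for illustration; use the rate you requested


def fake_int16_pcm(duration_s: float = 0.1, freq_hz: float = 440.0) -> bytes:
    """Hypothetical stand-in for TTS audio: a sine tone as little-endian int16."""
    n = int(SAMPLE_RATE_HZ * duration_s)
    samples = (
        int(0.5 * 32767 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE_HZ))
        for i in range(n)
    )
    return b"".join(struct.pack("<h", s) for s in samples)


def write_wav(path: str, pcm: bytes, sample_rate_hz: int = SAMPLE_RATE_HZ) -> None:
    """Wrap raw mono int16 PCM bytes in a WAV container."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono (assumed)
        wav.setsampwidth(2)   # 2 bytes per sample = int16
        wav.setframerate(sample_rate_hz)
        wav.writeframes(pcm)


pcm = fake_int16_pcm()
wav_path = os.path.join(tempfile.gettempdir(), "tts_output.wav")
write_wav(wav_path, pcm)
```

Clients written against earlier releases that assumed a different sample type would need the equivalent change wherever they decode or store the audio buffer.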

New Features / Enhancements¶

Riva supports Linux ARM64 platforms, specifically NVIDIA Jetson AGX Xavier™ and NVIDIA Jetson Xavier NX, which are referred to as embedded throughout the documentation.
Riva provides two new pre-trained TTS voices that are easily deployable via the Quick Start scripts.
Phoneme SSML tags support manually overriding pronunciations.
SSL/TLS connections to the Riva server are supported.
There is a new option for generating additional tokenizations for words in the lexicon. This is an experimental feature that may boost recognition accuracy.
Inverse text normalization grammars must be provided during the riva-build stage to allow customizations for inverse text normalization.
Added the ability to provide an opt-in API key for sending telemetry back to NVIDIA.

Bug fixes¶

Fixed an issue in ServiceMaker that caused punctuation and capitalization models generated with recent NeMo versions to lead to inaccurate results.
Fixed an issue that could lead to a crash when using word boosting.

Known Issues¶

Deployment of Citrinet models for offline recognition can fail during the riva-deploy phase if large chunk sizes are used. To work around this issue, we recommend passing the parameter max-dim=100000 to nemo2riva when converting the .nemo model to .riva. This enables chunk sizes of up to 900 seconds during the riva-deploy phase.
On embedded platforms, the ASR examples in the asr-python-basics and asr-python-boosting Jupyter notebooks don’t work by default, because they invoke the offline recognition API and embedded platforms don’t have an offline ASR model enabled by default. To get these examples working, either deploy an offline ASR model or modify the examples to use the streaming recognition API.
When deploying the offline ASR models with riva-deploy, TensorRT warnings indicating that memory requirements of format conversion cannot be satisfied might appear in the logs. These warnings should not affect functionality and can be ignored.

Riva Speech Skills 1.10.0 Beta¶

This is a beta release. All published functionality in the release notes has been fully tested and verified with known limitations documented. To share feedback about this release, access our NVIDIA Riva Developer Forum.

Note

Users upgrading to 1.10.0 Beta from previous versions must rerun riva-build for existing models. Those using the Quick Start tool should run riva_clean.sh followed by riva_init.sh.

New Features / Enhancements¶

Riva 1.10.0 Beta now uses Triton 2.19.0 and TensorRT 8.2.
The default behavior of Riva TTS’s G2P pipeline has changed. Words that have multiple phonetic representations now default to use graphemes. This was done to match the default NeMo training behavior. To revert to the old behavior, please add --preprocessor.g2p_ignore_ambiguous=False to riva-build.
ASR word boosting at request time is supported in Riva. This feature allows you to provide a list of words that should be given a higher score when decoding the output of the acoustic model. Refer to the gRPC ASR protobuf file (riva/proto/riva_asr.proto) for more information on how to include boosted words with the ASR request.

Bug fixes¶

Fixed an issue that could cause acoustic models exported from NeMo 1.5+ to incorrectly include spaces in transcripts.
Fixed an issue in nemo2riva that prevented conversion of models from NeMo versions earlier than 1.3.0.
Fixed an issue that could lead to irregular rhythm of speech when a TTS model was trained with mixed representation input.
Fixed an issue that could cause incorrect transcripts when the server is under heavy load.

Known Issues¶

The Riva Speech Samples image nvcr.io/nvidia/riva/riva-speech-client:1.10.0-beta-samples does not exist. Use nvcr.io/nvidia/riva/riva-speech-client:1.8.0-beta-samples instead.
The ASR word boosting feature in Riva currently does not support boosting of phrases or combinations of words. This will be supported in a future version of Riva.
nemo2riva and riva-build are currently broken for newer WaveGlow NeMo TTS checkpoints. As a workaround, use this WaveGlow.riva file instead: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/speechsynthesis_waveglow/files.
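
Because word boosting in this release applies to single words only, a client may want to separate multi-word phrases from its boost list before building a request. The sketch below is one possible way to do that in plain Python. `split_boostable` is a hypothetical helper, not part of the Riva API, and it treats any entry containing whitespace as a phrase.

```python
def split_boostable(entries):
    """Separate single words from multi-word phrases.

    Returns (words, phrases): `words` are single-token entries that can be
    boosted in this release; `phrases` contain whitespace and cannot.
    """
    words, phrases = [], []
    for entry in entries:
        cleaned = entry.strip()
        if not cleaned:
            continue  # skip empty or whitespace-only entries
        if len(cleaned.split()) > 1:
            phrases.append(cleaned)
        else:
            words.append(cleaned)
    return words, phrases


words, phrases = split_boostable(["AntiBERTa", "ABlooper", "New York", "  "])
```

The `words` list would then be supplied with the ASR request as described in riva/proto/riva_asr.proto, while `phrases` can be logged or held back until phrase boosting is supported.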

Riva Speech Skills 1.9.0 Beta¶

This is a beta release. All published functionality in the release notes has been fully tested and verified with known limitations documented. To share feedback about this release, access our NVIDIA Riva Developer Forum.

Note

Users upgrading to 1.9.0 Beta from previous versions must rerun riva-build for existing models. Those using the Quick Start tool should run riva_clean.sh followed by riva_init.sh.

New Features / Enhancements¶

Improved customization for Automatic Speech Recognition (ASR) Spanish (es-US) and German (de-DE) language models.
The rate SSML attribute supports x-low, low, medium, high, x-high, and default.
The pitch SSML attribute supports x-low, low, medium, high, x-high, and default.
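
As an illustration of the named values listed above, the following sketch builds an SSML string with a `<prosody>` tag and rejects anything outside that list. It only constructs text and does not call the Riva TTS service. The `prosody_ssml` helper is hypothetical, and wrapping the content in a `<speak>` root follows the SSML specification rather than anything stated in these notes. Numeric pitch or rate values are out of scope for this sketch.

```python
import xml.etree.ElementTree as ET

# Named values listed in the release note for both attributes.
NAMED_LEVELS = {"x-low", "low", "medium", "high", "x-high", "default"}


def prosody_ssml(text: str, pitch: str = "default", rate: str = "default") -> str:
    """Return an SSML string wrapping `text` in a <prosody> tag."""
    for name, value in (("pitch", pitch), ("rate", rate)):
        if value not in NAMED_LEVELS:
            raise ValueError(f"unsupported {name} value: {value!r}")
    speak = ET.Element("speak")  # root element per the SSML spec (assumption)
    prosody = ET.SubElement(speak, "prosody", {"pitch": pitch, "rate": rate})
    prosody.text = text  # ElementTree escapes &, <, > on serialization
    return ET.tostring(speak, encoding="unicode")


ssml = prosody_ssml("Hello, world.", pitch="high", rate="low")
```

Building the markup with an XML library rather than string concatenation keeps special characters in the input text from producing malformed SSML.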

Known Issues¶

The pre-trained model used to add punctuation and capitalization to ASR transcripts supports a maximum input length of 128 tokens. Currently, if an ASR transcript containing more than 128 tokens is passed to the punctuation and capitalization model, it will be truncated to 128 tokens. This will be addressed in a future release of Riva.
The pitch SSML attribute is not currently in compliance with the SSML specs, and does not support Hz, st, % changes.
When deploying the offline ASR models with riva-deploy, TensorRT warnings indicating that memory requirements of format conversion cannot be satisfied might appear in the logs. These warnings should not affect functionality and can be ignored.
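
Until the 128-token limit is lifted, one possible client-side mitigation is to split long transcripts into smaller pieces before sending them to the punctuation and capitalization model. The sketch below is an approximation under stated assumptions: it counts whitespace-separated words, whereas the model's limit is in its own tokens, which are typically more numerous than words. The default `budget` of 64 words is an arbitrary safety margin chosen for illustration, and `chunk_transcript` is a hypothetical helper. Splitting at fixed boundaries may also reduce punctuation quality near the cut points.

```python
from typing import List

MAX_TOKENS = 128  # limit stated in the release note


def chunk_transcript(transcript: str, budget: int = 64) -> List[str]:
    """Split a transcript into pieces of at most `budget` whitespace-separated words."""
    if not 0 < budget <= MAX_TOKENS:
        raise ValueError("budget must be between 1 and MAX_TOKENS")
    words = transcript.split()
    return [" ".join(words[i:i + budget]) for i in range(0, len(words), budget)]


# A 150-word transcript is split into pieces of 64, 64, and 22 words.
chunks = chunk_transcript(" ".join(f"w{i}" for i in range(150)))
```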

Riva Speech Skills 1.8.0 Beta¶

This is a beta release. All published functionality in the release notes has been fully tested and verified with known limitations documented. To share feedback about this release, access our NVIDIA Riva Developer Forum.

Note

Users upgrading to 1.8.0 Beta from previous versions must rerun riva-build for existing models. Those using the Quick Start tool should run riva_clean.sh followed by riva_init.sh.

New Features / Enhancements¶

Released new pretrained models for German (de-DE), Russian (ru-RU), and Spanish (es-US) speech recognition.
Increased recognition accuracy of English (en-US) speech recognition models.
Introduced partial support for Speech Synthesis Markup Language (SSML) within the TTS API. Support has been added for pitch and rate attributes of the <prosody> tag to control pitch and duration of synthesized speech. Additional SSML support is planned for future releases.
Added word boosting support to the Speech Recognition API to bias the ASR engine to recognize particular words of interest at request time. This release is limited to boosting of in-vocabulary words; out-of-vocabulary word boosting will be available in an upcoming release.
Minor ASR inference speed improvements in online mode.
Improved offline ASR recognition accuracy.
Added support for the Automatic Speech Recognition (ASR) Conformer-CTC model. The Conformer-CTC model is a non-autoregressive variant of the Conformer model for ASR which uses CTC loss/decoding instead of Transducer.

Bug fixes¶

Fixed an issue in the TTS pipeline that could sometimes cause an audible ‘pop’ at the end of an utterance.

Known Issues¶

The pre-trained model used to add punctuation and capitalization to ASR transcripts supports a maximum input length of 128 tokens. Currently, if an ASR transcript containing more than 128 tokens is passed to the punctuation and capitalization model, it will be truncated to 128 tokens. This will be addressed in a future release of Riva.
The rate SSML attribute does not support x-low, low, medium, high, x-high, or default.
The pitch SSML attribute is not currently in compliance with the SSML specs, and does not support Hz, st, % changes, nor does it support x-low, low, medium, high, x-high, or default.
When deploying the offline ASR models with riva-deploy, TensorRT warnings indicating that memory requirements of format conversion cannot be satisfied might appear in the logs. These warnings should not affect functionality and can be ignored.

Riva Speech Skills 1.7.0 Beta¶

This is a beta release. All published functionality in the release notes has been fully tested and verified with known limitations documented. To share feedback about this release, access our NVIDIA Riva Developer Forum.

Note

Users upgrading to 1.7.0 Beta from previous versions must rerun riva-build for existing models. Those using the Quick Start tool should run riva_clean.sh followed by riva_init.sh.

New Features / Enhancements¶

Added support for models trained by NVIDIA TAO Toolkit 21.11.
Riva Streaming TTS now supports resampling, if necessary, to match the requested audio sample rate.
Default Riva English ASR model updated with higher accuracy.
Minor improvements in English text normalization and inverse text normalization models.
Increased maximum message size to support larger audio inputs in offline ASR.

Bug fixes¶

Fixed minor issues that could cause the synthesized audio generated by the TTS service to be prematurely truncated.
Fixed an issue related to custom pronunciations being mishandled by text normalization for TTS.

Known Issues¶

When running the nemo2riva package with EFF version 0.5.2, an ignored exception warning is printed. This should not affect functionality of the generated .riva models. This will be addressed in a future release of EFF.
During ASR pipeline execution, inverse text normalization will not convert digits into numerals (one->1) unless there are ten digits. This limitation will be addressed in a future version of Riva.
The punctuation pipeline does not support Unicode character input. This will be fixed in the next release.

Riva Speech Skills 1.6.0 Beta¶

This is a beta release. All published functionality in the release notes has been fully tested and verified with known limitations documented. To share feedback about this release, access our NVIDIA Riva Developer Forum.

Note

Users upgrading to 1.6.0 Beta from previous versions must rerun riva-build for existing models. Those using the Quick Start tool should run riva_clean.sh followed by riva_init.sh.

New Features / Enhancements¶

The Riva TTS service no longer limits input strings to 400 characters.
Updated the performance page of the documentation to include performance of Citrinet and FastPitch + HiFi-GAN models.

Bug fixes¶

Fixed minor issues that could cause the synthesized audio generated by the TTS service to be prematurely truncated.

Known Issues¶

riva-build does not support providing a 1-gram language model in .arpa format. This is due to a limitation in the KenLM utility used to build language model binaries.
NLP Question Answering functionality may cause a segmentation fault when using TensorRT files generated from the NeMo -> Riva -> RMIR -> TensorRT path. This will be addressed in a future release.
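
To catch the 1-gram limitation before a build fails, the n-gram order of an .arpa file can be read from its header. The standard ARPA format opens with a `\data\` section containing one `ngram N=count` line per order. The sketch below, using only the Python standard library, returns the highest declared order. `arpa_max_order` is a hypothetical helper and the sample inputs are made up for illustration.

```python
import io
import re

_NGRAM_COUNT = re.compile(r"^ngram\s+(\d+)\s*=\s*(\d+)\s*$")


def arpa_max_order(lines) -> int:
    """Return the highest n-gram order declared in an ARPA \\data\\ header."""
    in_header = False
    max_order = 0
    for raw in lines:
        line = raw.strip()
        if line == "\\data\\":
            in_header = True
            continue
        if not in_header:
            continue  # ignore anything before the header
        match = _NGRAM_COUNT.match(line)
        if match:
            max_order = max(max_order, int(match.group(1)))
        elif line.startswith("\\"):
            break  # a section marker such as \1-grams: ends the header
    return max_order


# Made-up sample headers: one unigram-only model and one bigram model.
unigram_only = io.StringIO("\\data\\\nngram 1=3\n\n\\1-grams:\n-1.0\ta\n\\end\\\n")
bigram = io.StringIO("\\data\\\nngram 1=3\nngram 2=2\n\n\\1-grams:\n")
orders = (arpa_max_order(unigram_only), arpa_max_order(bigram))
```

A result of 1 indicates a unigram-only model, which this release cannot build. With a real file, pass an open file handle instead of the `io.StringIO` objects.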

Riva Speech Skills 1.5.0 Beta¶

This is a beta release. All published functionality in the release notes has been fully tested and verified with known limitations documented. To share feedback about this release, access our NVIDIA Riva Developer Forum.

Note

Users upgrading to 1.5.0 Beta from previous versions must rerun riva-build for existing models. Those using the Quick Start tool should run riva_clean.sh followed by riva_init.sh.

Announcements¶

NVIDIA Transfer Learning Toolkit (TLT) has been renamed to NVIDIA TAO Toolkit starting in the 1.5.0-beta release.

New Features / Enhancements¶

Support for training n-gram language models for ASR has been added to TAO Toolkit. These language models are fully supported in Riva.
FastPitch now leverages Tensor Cores for improved inference performance.
nemo2riva now provides a warning when attempting to convert unsupported models.
Minor enhancements were made to cover additional cases in text normalization/inverse text normalization for English.

Bug fixes¶

Fixed a failure in Quick Start for some versions of the NGC client.
Fixed minor issues that could cause occasional artifacts or reduced quality in TTS generated audio.
Eliminated misleading error messages during the riva-build process.

Known Issues¶

NLP Question Answering functionality may cause a segmentation fault when using TensorRT files generated from the NeMo -> Riva -> RMIR -> TensorRT path. This will be addressed in a future release.

Riva Speech Skills 1.4.0 Beta¶

This is a beta release. All published functionality in the release notes has been fully tested and verified with known limitations documented. To share feedback about this release, access our NVIDIA Riva Developer Forum.

Note

Users upgrading to 1.4.0 Beta from previous versions must rerun riva-build for existing models. Those using the Quick Start tool should run riva_clean.sh followed by riva_init.sh.

Announcements¶

The Jarvis framework has been renamed to Riva starting in the 1.4.0-beta release. Jarvis Speech Skills has been renamed to Riva Speech Skills. Documentation, scripts, and commands have been updated accordingly.
- The Jarvis API is supported but deprecated beginning with this release. It will be removed in a future release. Old Jarvis clients are expected to work as-is with this version of Riva Speech Skills. However, users will need to migrate to the Riva API after the Jarvis API is removed.
- The Riva API modifies the following service names:
  - JarvisASR -> RivaSpeechRecognition
  - JarvisNLP -> RivaLanguageUnderstanding
  - JarvisCoreNLP -> RivaLanguageUnderstanding
  - JarvisTTS -> RivaSpeechSynthesis
- jarvis-build and jarvis-deploy commands have been replaced with the equivalent riva-build and riva-deploy commands.
The riva-build command parameters for ASR pipelines have changed.
- The --lm_decoder_cpu parameter is deprecated. Replace --lm_decoder_cpu.decoder_type=<decoder_type> with --decoder_type=<decoder_type> and replace --lm_decoder_cpu.<param_name>=<param_value> with --<decoder_type>_decoder.<param_name>=<param_value>. For example, instead of using --lm_decoder_cpu.decoder_type=greedy --lm_decoder_cpu.asr_model_delay=-1, use --decoder_type=greedy --greedy_decoder.asr_model_delay=-1.
- The type of decoder to use must be explicitly set by using --decoder_type=<decoder_type> where <decoder_type> must be one of greedy, os2s, flashlight, or kaldi.
Refer to ASR Pipeline Configuration for example riva-build commands to use with different acoustic models.
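
The decoder parameter renaming described above is mechanical, so existing build scripts can be updated with a small rewrite step. The following sketch applies the two substitution rules to a list of arguments. `migrate_decoder_args` is a hypothetical helper written for illustration, not a tool shipped with Riva, and it only handles arguments written in the `--name=value` form.

```python
OLD_PREFIX = "--lm_decoder_cpu."
VALID_DECODERS = {"greedy", "os2s", "flashlight", "kaldi"}


def migrate_decoder_args(args):
    """Rewrite deprecated --lm_decoder_cpu.* arguments to the new form."""
    # First pass: find the decoder type, which the new parameter names depend on.
    decoder_type = None
    for arg in args:
        if arg.startswith(OLD_PREFIX + "decoder_type="):
            decoder_type = arg.split("=", 1)[1]

    has_old_args = any(arg.startswith(OLD_PREFIX) for arg in args)
    if has_old_args and decoder_type not in VALID_DECODERS:
        raise ValueError(
            "decoder type must be one of: " + ", ".join(sorted(VALID_DECODERS))
        )

    # Second pass: rewrite old-style arguments, pass everything else through.
    migrated = []
    for arg in args:
        if not arg.startswith(OLD_PREFIX):
            migrated.append(arg)
        elif arg.startswith(OLD_PREFIX + "decoder_type="):
            migrated.append(f"--decoder_type={decoder_type}")
        else:
            rest = arg[len(OLD_PREFIX):]  # "<param_name>=<param_value>"
            migrated.append(f"--{decoder_type}_decoder.{rest}")
    return migrated


new_args = migrate_decoder_args(
    ["--lm_decoder_cpu.decoder_type=greedy", "--lm_decoder_cpu.asr_model_delay=-1"]
)
# new_args == ["--decoder_type=greedy", "--greedy_decoder.asr_model_delay=-1"]
```

The helper raises an error when old-style parameters are present without a decoder type, since the new parameter names cannot be derived without one.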

Bug Fixes¶

Minor stability improvements were made to the ASR and TTS services.
Exposed the model_name parameter in the nlp_classify_tokens sample client.
Fixed an issue with the ASR language model hyperparameter tuning tool.

Jarvis Speech Skills 1.3.0 Beta

This is a beta release. All published functionality in the release notes has been fully tested and verified with known limitations documented. To share feedback about this release, access our NVIDIA Riva Developer Forum.

Note

Users upgrading to 1.3.0 Beta from previous versions must rerun jarvis-build for existing models. Those using the Quick Start tool should run jarvis_clean.sh followed by jarvis_init.sh.
