Release Notes#

Important

If you are upgrading from a previous Riva version, refer to the Upgrading section.

Riva Speech Skills 2.3.0#

New Features and Enhancements#

Support has been added for the volume attribute of the <prosody> SSML tag to control the volume of synthesized speech. To use this attribute, the FastPitch .riva file must be rebuilt from a .nemo or .tao file.
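
As an illustration of the volume attribute, the following minimal sketch builds an SSML string in Python using only the standard library. The helper name with_volume is hypothetical, and the value "soft" comes from the W3C SSML specification; which volume values Riva accepts is an assumption to check against the TTS documentation. Sending the string to the TTS service is not shown.

```python
from xml.sax.saxutils import escape, quoteattr


def with_volume(text: str, volume: str) -> str:
    """Wrap plain text in <speak><prosody volume=...> markup.

    The text is XML-escaped so characters such as '&' do not break the SSML.
    The set of volume values accepted by the server is an assumption here.
    """
    return (
        f"<speak><prosody volume={quoteattr(volume)}>"
        f"{escape(text)}"
        "</prosody></speak>"
    )


ssml = with_volume("Turn left & continue.", "soft")
print(ssml)
```

The resulting string would be passed as the text of a synthesis request in place of plain text.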

Deprecation Warning#

The Tacotron 2 and WaveGlow TTS pipeline will be deprecated in a future version of Riva. Consider switching to the FastPitch and HiFi-GAN pipeline, which is faster and more robust, and offers quality similar to the Tacotron 2 and WaveGlow pipeline.

Riva Speech Skills 2.2.1#

Fixed Issues#

Fixed a throughput performance regression in the speech synthesis service.
WordInfo objects now contain properly punctuated words in offline speech recognition mode.
When word boosting in speech recognition, a warning is now returned instead of an error when requested words cannot be boosted.

Riva Speech Skills 2.2.0#

Important

We recommend using the Riva Speech Skills 2.2.1 (22.05.1) release instead of v2.2.0.

Breaking Changes#

In the intent_slot pipeline, the --contextual command-line option is removed. The contextual mode behavior is still supported by the Riva client API and ServiceMaker using the contextual model config attribute. The default is false.

Model Updates#

Added new Hindi speech recognition model (Conformer).
Improved the Mandarin language model.
Added Mandarin punctuation support.

New Features and Enhancements#

Riva supports the NVIDIA Jetson Orin platform.
Punctuation models support arbitrary sequence length, and no longer truncate inputs.
Added the option to share the feature extractor between multiple ASR pipelines.

Fixed Issues#

Fixed an issue in TTS where the pitch and rate attributes were not applied where specified.
Fixed an issue in reading non-standard WAV headers that could marginally increase the latency of the first returned result.
Fixed improperly required channel_count in speech recognition request configuration.
Fixed a potential crash when deploying TTS for a novel language with text normalization disabled.

Known Issues#

The Mandarin punctuation model clips the output when there are English words present in the input text.
The Mandarin punctuation model accuracy is low compared to other languages. It will be improved in a future version of Riva.
The Riva server currently does not return timestamps for every Mandarin character in the transcript. This will be fixed in a future version of Riva.
On the Jetson Xavier NX, the German ASR model doesn’t fit into the available 8 GB RAM.

Riva Speech Skills 2.1.0#

Breaking Changes#

Removed the following environment variables related to text normalization in TTS: NORM_PROTO_CONFIG and NORM_PROTO_PATH.
In previous versions, TTS applied text normalization by default if none was specified. Now, text normalization is not performed unless it is specified.

New Features and Enhancements#

Added text normalization options as part of the riva-build process. Refer to the TTS Pipeline Configuration section for more information.
Added multiple tutorials.

Riva Speech Skills 2.0.0#

All published functionality in the release notes has been fully tested and verified with known limitations documented. To share feedback about this release, access our NVIDIA Riva Developer Forum.

Breaking Changes#

All legacy Jarvis APIs have been removed and are no longer supported.
The audio waveform returned by the Riva TTS service is now of type int16, to be compatible with the linear PCM wave format currently supported by Riva.
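
For client code that previously handled a different sample type, the int16 change can be sketched as follows. This is a minimal standard-library example, not Riva client code: the sample values, the 22050 Hz sample rate, the little-endian byte order, and the helper name pcm16_to_wav_bytes are all assumptions made for illustration, and the bytes stand in for the audio field of a synthesis response.

```python
import io
import struct
import wave


def pcm16_to_wav_bytes(pcm: bytes, sample_rate_hz: int, channels: int = 1) -> bytes:
    """Wrap raw 16-bit linear PCM samples in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # int16 means 2 bytes per sample
        wav.setframerate(sample_rate_hz)
        wav.writeframes(pcm)
    return buf.getvalue()


# Stand-in for audio bytes from a TTS response (assumed little-endian int16).
pcm = struct.pack("<5h", 0, 1000, -1000, 32767, -32768)

# Decode the raw bytes back into integer samples.
samples = struct.unpack(f"<{len(pcm) // 2}h", pcm)

wav_bytes = pcm16_to_wav_bytes(pcm, sample_rate_hz=22050)
```

Each sample occupies two bytes, so the number of samples is the byte length divided by two.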

New Features and Enhancements#

Riva supports Linux ARM64 platforms, that is, NVIDIA Jetson AGX Xavier™ and NVIDIA Jetson Xavier NX, referred to as embedded throughout the documentation.
Riva provides two new pretrained TTS voices that are easily deployable via the Quick Start scripts.
Phoneme SSML tags support manually overriding pronunciations.
SSL/TLS connections to the Riva server are supported.
There is a new option for generating additional tokenizations for words in the lexicon (this is an experimental feature, which may boost recognition accuracy).
Inverse text normalization grammars must be provided during the riva-build stage to allow customizations for inverse text normalization.
Added the ability to provide an opt-in API key for sending telemetry back to NVIDIA.

Bug Fixes#

Fixed an issue in ServiceMaker that caused punctuation and capitalization models generated with recent NeMo versions to lead to inaccurate results.
Fixed an issue that could lead to a crash when using word boosting.

Known Issues#

Deployment of Citrinet models for offline recognition can fail during the riva-deploy phase if large chunk sizes are used. To work around this issue, we recommend passing the parameter max-dim=100000 to nemo2riva when converting the .nemo model to .riva. This enables chunk sizes of up to 900 seconds during the riva-deploy phase.
On embedded platforms, the ASR examples in the asr-python-basics and asr-python-boosting Jupyter notebooks do not work by default, because they invoke the offline recognition API and embedded platforms do not have an offline ASR model enabled by default. To get these examples working, either deploy an offline ASR model or modify the examples to use the streaming recognition API.
When deploying the offline ASR models with riva-deploy, TensorRT warnings indicating that memory requirements of format conversion cannot be satisfied might appear in the logs. These warnings should not affect functionality and can be ignored.

Riva Speech Skills 1.10.0 Beta#

This is a beta release. All published functionality in the release notes has been fully tested and verified with known limitations documented. To share feedback about this release, access our NVIDIA Riva Developer Forum.

Note

Users upgrading to 1.10.0 Beta from previous versions must rerun riva-build for existing models. Those using the Quick Start tool should run riva_clean.sh followed by riva_init.sh.

New Features and Enhancements#

Riva 1.10.0 Beta now uses Triton 2.19.0 and TensorRT 8.2.
The default behavior of Riva TTS’s G2P pipeline has changed. Words that have multiple phonetic representations now default to use graphemes. This was done to match the default NeMo training behavior. To revert to the old behavior, please add --preprocessor.g2p_ignore_ambiguous=False to riva-build.
ASR word boosting at request time is supported in Riva. This feature allows you to provide a list of words that should be given a higher score when decoding the output of the acoustic model. Refer to the gRPC ASR protobuf file (riva/proto/riva_asr.proto) for more information on how to include boosted words with the ASR request.

Bug Fixes#

Fixed an issue that could cause acoustic models exported from NeMo 1.5+ to incorrectly include spaces in the transcript.
Fixed an issue in nemo2riva that prevented conversion of models from NeMo versions earlier than 1.3.0.
Fixed an issue that could lead to irregular rhythm of speech when a TTS model was trained with mixed representation input.
Fixed an issue that can cause incorrect transcripts when the server is under a heavy load.

Known Issues#

The Riva Speech Samples image nvcr.io/nvidia/riva/riva-speech-client:1.10.0-beta-samples does not exist. Use nvcr.io/nvidia/riva/riva-speech-client:1.8.0-beta-samples instead.
The ASR word boosting feature in Riva currently does not support boosting of phrases or combinations of words. This will be supported in a future version of Riva.
nemo2riva and riva-build are currently broken for newer WaveGlow NeMo TTS checkpoints. As a workaround, use this WaveGlow .riva file instead: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/speechsynthesis_waveglow/files.

Riva Speech Skills 1.9.0 Beta#

Note

Users upgrading to 1.9.0 Beta from previous versions must rerun riva-build for existing models. Those using the Quick Start tool should run riva_clean.sh followed by riva_init.sh.

New Features and Enhancements#

Improved customization for Automatic Speech Recognition (ASR) Spanish (es-US) and German (de-DE) language models.
The rate SSML attribute supports x-low, low, medium, high, x-high, and default.
The pitch SSML attribute supports x-low, low, medium, high, x-high, and default.
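
The named levels above can be applied with a small helper such as the following. This is a minimal sketch using only the Python standard library; the function name prosody and the PROSODY_LEVELS constant are hypothetical, and the validation covers only the six named levels listed in this release, not any other values the service may accept.

```python
from xml.sax.saxutils import escape

# Named levels listed for the pitch and rate attributes in this release.
PROSODY_LEVELS = {"x-low", "low", "medium", "high", "x-high", "default"}


def prosody(text: str, pitch: str = "default", rate: str = "default") -> str:
    """Build an SSML string with named pitch and rate levels."""
    for name, value in (("pitch", pitch), ("rate", rate)):
        if value not in PROSODY_LEVELS:
            raise ValueError(f"unsupported {name} level: {value!r}")
    return (
        f'<speak><prosody pitch="{pitch}" rate="{rate}">'
        f"{escape(text)}"
        "</prosody></speak>"
    )


ssml = prosody("Hello", pitch="high", rate="low")
print(ssml)
```

Rejecting unknown levels on the client side gives an early error instead of relying on how the server handles an unrecognized value.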

Known Issues#

The pretrained model used to add punctuation and capitalization to ASR transcripts supports a maximum input length of 128 tokens. Currently, if an ASR transcript containing more than 128 tokens is passed to the punctuation and capitalization model, it will be truncated to 128 tokens. This will be addressed in a future release of Riva.
The pitch SSML attribute is not currently in compliance with the SSML specs, and does not support Hz, st, % changes.
When deploying the offline ASR models with riva-deploy, TensorRT warnings indicating that memory requirements of format conversion cannot be satisfied might appear in the logs. These warnings should not affect functionality and can be ignored.

Riva Speech Skills 1.8.0 Beta#

Note

Users upgrading to 1.8.0 Beta from previous versions must rerun riva-build for existing models. Those using the Quick Start tool should run riva_clean.sh followed by riva_init.sh.

New Features and Enhancements#

Released new pretrained models for German (de-DE), Russian (ru-RU), and Spanish (es-US) speech recognition.
Increased recognition accuracy of English (en-US) speech recognition models.
Introduced partial support for Speech Synthesis Markup Language (SSML) within the TTS API. Support has been added for pitch and rate attributes of the <prosody> tag to control pitch and duration of synthesized speech. Additional SSML support is planned for future releases.
Added word boosting support to the Speech Recognition API to bias the ASR engine to recognize particular words of interest at request time. This release is limited to boosting of in-vocabulary words; out-of-vocabulary word boosting will be available in an upcoming release.
Minor ASR inference speed improvements in online mode.
Improved offline ASR recognition accuracy.
Added support for the Automatic Speech Recognition (ASR) Conformer-CTC model. The Conformer-CTC model is a nonautoregressive variant of the Conformer model for ASR, which uses CTC loss/decoding instead of Transducer.

Bug Fixes#

Fixed an issue in the TTS pipeline that could sometimes cause an audible ‘pop’ at the end of an utterance.

Known Issues#

The pretrained model used to add punctuation and capitalization to ASR transcripts supports a maximum input length of 128 tokens. Currently, if an ASR transcript containing more than 128 tokens is passed to the punctuation and capitalization model, it will be truncated to 128 tokens. This will be addressed in a future release of Riva.
The rate SSML attribute does not support x-low, low, medium, high, x-high, or default.
The pitch SSML attribute is not currently in compliance with the SSML specs, and does not support Hz, st, % changes, nor does it support x-low, low, medium, high, x-high, or default.
When deploying the offline ASR models with riva-deploy, TensorRT warnings indicating that memory requirements of format conversion cannot be satisfied might appear in the logs. These warnings should not affect functionality and can be ignored.

Riva Speech Skills 1.7.0 Beta#

Note

Users upgrading to 1.7.0 Beta from previous versions must rerun riva-build for existing models. Those using the Quick Start tool should run riva_clean.sh followed by riva_init.sh.

New Features and Enhancements#

Added support for models trained by NVIDIA TAO Toolkit 21.11.
Riva Streaming TTS now supports resampling, if necessary, to match the requested audio sample rate.
Updated the default Riva English ASR model for higher accuracy.
Minor improvements in English text normalization and inverse text normalization models.
Increased maximum message size to support larger audio inputs in offline ASR.

Bug Fixes#

Fixed minor issues that could cause the synthesized audio generated by the TTS service to be prematurely truncated.
Fixed an issue related to custom pronunciations being mishandled by text normalization for TTS.

Known Issues#

When running the nemo2riva package with EFF version 0.5.2, an ignored exception warning is printed. This should not affect functionality of the generated .riva models. This will be addressed in a future release of EFF.
During ASR pipeline execution, inverse text normalization will not convert digits into numerals (one->1) unless there are 10 digits. This limitation will be addressed in a future version of Riva.
The punctuation pipeline does not support unicode character input. This will be fixed in the next release.

Riva Speech Skills 1.6.0 Beta#

Note

Users upgrading to 1.6.0 Beta from previous versions must rerun riva-build for existing models. Those using the Quick Start tool should run riva_clean.sh followed by riva_init.sh.

New Features and Enhancements#

The Riva TTS service is no longer limited to input strings of 400 characters.
Updated the performance page of the documentation to include the performance of Citrinet, FastPitch, and HiFi-GAN models.

Bug Fixes#

Fixed minor issues that could cause the synthesized audio generated by the TTS service to be prematurely truncated.

Known Issues#

riva-build does not support providing a 1-gram language model in .arpa format. This is due to a limitation in the KenLM utility used to build language model binaries.
NLP Question Answering functionality may cause a segmentation fault when using TensorRT files generated from the NeMo -> Riva -> RMIR -> TensorRT path. This will be addressed in a future release.

Riva Speech Skills 1.5.0 Beta#

Note

Users upgrading to 1.5.0 Beta from previous versions must rerun riva-build for existing models. Those using the Quick Start tool should run riva_clean.sh followed by riva_init.sh.

Announcements#

NVIDIA Transfer Learning Toolkit (TLT) has been renamed to NVIDIA TAO Toolkit starting in the 1.5.0-beta release.

New Features and Enhancements#

Support for training n-gram language models for ASR has been added to TAO Toolkit. These language models are fully supported in Riva.
FastPitch now leverages Tensor Cores for improved inference performance.
nemo2riva now provides a warning when attempting to convert unsupported models.
Minor enhancements were made to cover additional cases in text normalization/inverse text normalization for English.

Bug Fixes#

Fixed a failure in the Quick Start scripts for some versions of the NGC client.
Fixed minor issues that could cause occasional artifacts or reduced quality in TTS generated audio.
Eliminated misleading error messages during riva-build process.

Known Issues#

NLP Question Answering functionality may cause a segmentation fault when using TensorRT files generated from the NeMo -> Riva -> RMIR -> TensorRT path. This will be addressed in a future release.

Riva Speech Skills 1.4.0 Beta#

Note

Users upgrading to 1.4.0 Beta from previous versions must rerun riva-build for existing models. Those using the Quick Start tool should run riva_clean.sh followed by riva_init.sh.

Announcements#

The Jarvis framework has been renamed to Riva starting in the 1.4.0-beta release. Jarvis Speech Skills has been renamed to Riva Speech Skills. Documentation, scripts, and commands have been updated accordingly.
- The Jarvis API is supported but deprecated beginning with this release. It will be removed in a future release. Old Jarvis clients are expected to work as-is with this version of Riva Speech Skills. However, users will need to migrate to the Riva API after the Jarvis API is removed.
- The Riva API modifies the following service names:
  - JarvisASR -> RivaSpeechRecognition
  - JarvisNLP -> RivaLanguageUnderstanding
  - JarvisCoreNLP -> RivaLanguageUnderstanding
  - JarvisTTS -> RivaSpeechSynthesis
- jarvis-build and jarvis-deploy commands have been replaced with the equivalent riva-build and riva-deploy commands.
The riva-build command parameters for ASR pipelines have changed.
- The --lm_decoder_cpu parameter is deprecated. Replace --lm_decoder_cpu.decoder_type=<decoder_type> with --decoder_type=<decoder_type> and replace --lm_decoder_cpu.<param_name>=<param_value> with --<decoder_type>_decoder.<param_name>=<param_value>. For example, instead of using --lm_decoder_cpu.decoder_type=greedy --lm_decoder_cpu.asr_model_delay=-1, use --decoder_type=greedy --greedy_decoder.asr_model_delay=-1.
- The type of decoder to use must be explicitly set by using --decoder_type=<decoder_type> where <decoder_type> must be one of greedy, os2s, flashlight, or kaldi.
Refer to ASR Pipeline Configuration for example riva-build commands to use with different acoustic models.
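
The parameter renaming described above is mechanical, so it can be sketched as a small script that rewrites an old argument list. This is an illustrative helper only, not part of Riva: the function name migrate_decoder_args is hypothetical, and it handles just the two substitutions stated in this section.

```python
VALID_DECODER_TYPES = {"greedy", "os2s", "flashlight", "kaldi"}
OLD_PREFIX = "--lm_decoder_cpu."
OLD_TYPE_FLAG = OLD_PREFIX + "decoder_type="


def migrate_decoder_args(args):
    """Rewrite deprecated --lm_decoder_cpu.* arguments to the new form."""
    # First pass: find the decoder type, which the new prefix depends on.
    decoder_type = None
    for arg in args:
        if arg.startswith(OLD_TYPE_FLAG):
            decoder_type = arg[len(OLD_TYPE_FLAG):]
    if decoder_type is not None and decoder_type not in VALID_DECODER_TYPES:
        raise ValueError(f"unknown decoder type: {decoder_type!r}")

    # Second pass: rewrite each deprecated argument in place.
    migrated = []
    for arg in args:
        if arg.startswith(OLD_TYPE_FLAG):
            migrated.append(f"--decoder_type={decoder_type}")
        elif arg.startswith(OLD_PREFIX):
            if decoder_type is None:
                raise ValueError("decoder type must be set explicitly")
            rest = arg[len(OLD_PREFIX):]
            migrated.append(f"--{decoder_type}_decoder.{rest}")
        else:
            migrated.append(arg)
    return migrated


old = ["--lm_decoder_cpu.decoder_type=greedy", "--lm_decoder_cpu.asr_model_delay=-1"]
new = migrate_decoder_args(old)
print(new)
```

Applied to the example from this section, the helper yields --decoder_type=greedy and --greedy_decoder.asr_model_delay=-1, and leaves unrelated arguments unchanged.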

Bug Fixes#

Minor stability improvements were made to the ASR and TTS services.
Exposed the model_name parameter in the nlp_classify_tokens sample client.
Fixed an issue with the ASR language model hyperparameter tuning tool.

Jarvis Speech Skills 1.3.0 Beta

Note

Users upgrading to 1.3.0 Beta from previous versions must rerun jarvis-build for existing models. Those using the Quick Start tool should run jarvis_clean.sh followed by jarvis_init.sh.
