Release Notes
Riva Speech Skills 2.0.0¶
All published functionality in the release notes has been fully tested and verified with known limitations documented. To share feedback about this release, access our NVIDIA Riva Developer Forum.
Note
If you are upgrading to 2.0.0 from a previous version, rerun riva-build for existing models.
If you are using the Quick Start tool, run riva_clean.sh followed by riva_init.sh.
Breaking Changes¶
All legacy Jarvis APIs have been removed and are no longer supported.
The audio waveform returned by the Riva TTS service is now of type int16, for compatibility with the linear PCM wave format currently supported by Riva.
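Because the waveform is int16, the raw bytes can be stored directly as 16-bit linear PCM. The following is a minimal sketch using only the Python standard library. The generated sine tone stands in for the audio bytes of a TTS response, and the 22050 Hz sample rate and the helper names synth_int16_tone and write_pcm16_wav are assumptions made for illustration, not part of the Riva API.

```python
import io
import math
import struct
import wave


def synth_int16_tone(freq_hz=440.0, duration_s=0.1, sample_rate=22050):
    """Return little-endian int16 PCM bytes (a stand-in for TTS response audio)."""
    n = int(duration_s * sample_rate)
    samples = [
        int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n)
    ]
    return struct.pack("<%dh" % n, *samples)


def write_pcm16_wav(dest, pcm_bytes, sample_rate=22050, channels=1):
    """Write int16 PCM bytes to a WAV file path or file-like object."""
    with wave.open(dest, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = int16
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)


pcm = synth_int16_tone()
buffer = io.BytesIO()
write_pcm16_wav(buffer, pcm)
```

Passing a file path instead of the in-memory buffer writes the same data to disk. The sample rate must match the rate requested from the TTS service.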
New Features / Enhancements¶
Riva supports Linux ARM64 platforms, specifically NVIDIA Jetson AGX Xavier™ and NVIDIA Jetson Xavier NX, which are referred to as embedded throughout the documentation.
Riva provides two new pre-trained TTS voices that are easily deployable via the Quick Start scripts.
Phoneme SSML tags support manually overriding pronunciations.
SSL/TLS connections to the Riva server are supported.
There’s a new option for generating additional tokenizations for words in the lexicon. This is an experimental feature that may boost recognition accuracy.
Inverse text normalization grammars must be provided during the riva-build stage to allow customizations for inverse text normalization.
Added the ability to provide an opt-in API key for sending telemetry back to NVIDIA.
Bug fixes¶
Fixed an issue in ServiceMaker that caused punctuation and capitalization models generated with recent NeMo versions to lead to inaccurate results.
Fixed an issue that could lead to a crash when using word boosting.
Known Issues¶
Deployment of Citrinet models for offline recognition can fail during the riva-deploy phase if large chunk sizes are used. To work around this issue, we recommend passing the parameter max-dim=100000 to nemo2riva when converting the .nemo model to .riva. This enables chunk sizes of up to 900 seconds during the riva-deploy phase.
On embedded platforms, the ASR examples in the asr-python-basics and asr-python-boosting Jupyter notebooks don’t work by default, since they invoke the offline recognition API and embedded platforms don’t have an offline ASR model enabled by default. To get these examples working, either deploy an offline ASR model or modify the examples to use the streaming recognition API.
When deploying the offline ASR models with riva-deploy, TensorRT warnings indicating that memory requirements of format conversion cannot be satisfied might appear in the logs. These warnings should not affect functionality and can be ignored.
Riva Speech Skills 1.10.0 Beta¶
This is a beta release. All published functionality in the release notes has been fully tested and verified with known limitations documented. To share feedback about this release, access our NVIDIA Riva Developer Forum.
Note
Users upgrading to 1.10.0 Beta from previous versions must rerun riva-build for existing models.
Those using the Quick Start tool should run riva_clean.sh followed by riva_init.sh.
New Features / Enhancements¶
Riva 1.10.0 Beta now uses Triton 2.19.0 and TensorRT 8.2.
The default behavior of Riva TTS’s G2P pipeline has changed. Words that have multiple phonetic representations now default to graphemes, which matches the default NeMo training behavior. To revert to the old behavior, add --preprocessor.g2p_ignore_ambiguous=False to riva-build.
ASR word boosting at request time is supported in Riva. This feature allows you to provide a list of words that should be given a higher score when decoding the output of the acoustic model. Refer to the gRPC ASR protobuf file (riva/proto/riva_asr.proto) for more information on how to include boosted words with the ASR request.
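The idea behind word boosting can be illustrated with a toy rescoring step. This is only a conceptual sketch, not Riva's decoder: the candidate transcripts, their scores, the additive boost value, and the function names are all hypothetical. For the actual request fields, see riva/proto/riva_asr.proto.

```python
def boosted_score(hypothesis, base_score, boosted_words, boost=4.0):
    """Add a fixed bonus to the score for every boosted word in the hypothesis."""
    words = hypothesis.lower().split()
    hits = sum(1 for word in words if word in boosted_words)
    return base_score + boost * hits


def pick_best(candidates, boosted_words, boost=4.0):
    """Return the transcript with the highest score after boosting.

    candidates is a list of (transcript, base_score) pairs, where a higher
    score means a more likely transcript.
    """
    best = max(
        candidates,
        key=lambda item: boosted_score(item[0], item[1], boosted_words, boost),
    )
    return best[0]


# Hypothetical decoder candidates with made-up log-scores.
candidates = [("i work at and video", -10.0), ("i work at nvidia", -12.5)]

best_plain = pick_best(candidates, boosted_words=set())
best_boosted = pick_best(candidates, boosted_words={"nvidia"})
```

Without boosting, the higher-scoring but incorrect candidate wins. With "nvidia" boosted, the second candidate's score rises from -12.5 to -8.5 and it is selected instead.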
Bug fixes¶
Fixed an issue that could cause acoustic models exported from NeMo 1.5+ to incorrectly include spaces in transcripts.
Fixed an issue in nemo2riva that prevented conversion of models from NeMo versions earlier than 1.3.0.
Fixed an issue that could lead to irregular rhythm of speech when a TTS model was trained with mixed representation input.
Fixed an issue that can cause incorrect transcripts when the server is under a heavy load.
Known Issues¶
The Riva Speech Samples image nvcr.io/nvidia/riva/riva-speech-client:1.10.0-beta-samples does not exist. Use nvcr.io/nvidia/riva/riva-speech-client:1.8.0-beta-samples instead.
The ASR word boosting feature in Riva currently does not support boosting of phrases or combinations of words. This will be supported in a future version of Riva.
nemo2riva and riva-build are currently broken for newer WaveGlow NeMo TTS checkpoints. As a workaround, use this WaveGlow.riva file instead: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/speechsynthesis_waveglow/files.
Riva Speech Skills 1.9.0 Beta¶
This is a beta release. All published functionality in the release notes has been fully tested and verified with known limitations documented. To share feedback about this release, access our NVIDIA Riva Developer Forum.
Note
Users upgrading to 1.9.0 Beta from previous versions must rerun riva-build for existing models.
Those using the Quick Start tool should run riva_clean.sh followed by riva_init.sh.
New Features / Enhancements¶
Improved customization for Automatic Speech Recognition (ASR) Spanish (es-US) and German (de-DE) language models.
The rate SSML attribute supports x-low, low, medium, high, x-high, and default.
The pitch SSML attribute supports x-low, low, medium, high, x-high, and default.
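As a rough illustration of the named rate and pitch levels listed above, the sketch below builds an SSML string and rejects any value outside that list. The helper name prosody_ssml and the use of a <speak> root element (standard SSML) are assumptions. The check covers only the six named levels and says nothing about other value forms the service may accept.

```python
from xml.sax.saxutils import escape, quoteattr

# Named levels listed in these release notes for both rate and pitch.
ALLOWED_LEVELS = {"x-low", "low", "medium", "high", "x-high", "default"}


def prosody_ssml(text, rate="default", pitch="default"):
    """Wrap text in a <prosody> tag after validating rate and pitch."""
    for name, value in (("rate", rate), ("pitch", pitch)):
        if value not in ALLOWED_LEVELS:
            raise ValueError(f"unsupported {name} value: {value!r}")
    return (
        f"<speak><prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>"
        f"{escape(text)}</prosody></speak>"
    )


ssml = prosody_ssml("Release notes & fixes", rate="high", pitch="low")
```

Escaping the text keeps characters such as & from producing malformed markup in the request.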
Known Issues¶
The pre-trained model used to add punctuation and capitalization to ASR transcripts supports a maximum input length of 128 tokens. Currently, if an ASR transcript containing more than 128 tokens is passed to the punctuation and capitalization model, it will be truncated to 128 tokens. This will be addressed in a future release of Riva.
The pitch SSML attribute is not currently in compliance with the SSML specs, and does not support Hz, st, or % changes.
When deploying the offline ASR models with riva-deploy, TensorRT warnings indicating that memory requirements of format conversion cannot be satisfied might appear in the logs. These warnings should not affect functionality and can be ignored.
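One possible way to stay under the 128-token limit of the punctuation and capitalization model noted above is to split long transcripts before submitting them. The sketch below is an assumption-laden illustration: it treats whitespace-separated words as tokens, whereas the model's own tokenizer may produce more tokens per word, so a smaller max_tokens value may be needed in practice. The function name split_transcript is hypothetical.

```python
def split_transcript(transcript, max_tokens=128):
    """Split a transcript into chunks of at most max_tokens words.

    Words are separated on whitespace, which only approximates the
    tokenization used by the model.
    """
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    words = transcript.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]


# A synthetic 300-word transcript used only for demonstration.
transcript = " ".join(f"w{i}" for i in range(300))
chunks = split_transcript(transcript)
```

Fixed-size splitting can cut a sentence in half, which may reduce punctuation quality near chunk boundaries.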
Riva Speech Skills 1.8.0 Beta¶
This is a beta release. All published functionality in the release notes has been fully tested and verified with known limitations documented. To share feedback about this release, access our NVIDIA Riva Developer Forum.
Note
Users upgrading to 1.8.0 Beta from previous versions must rerun riva-build for existing models.
Those using the Quick Start tool should run riva_clean.sh followed by riva_init.sh.
New Features / Enhancements¶
Released new pretrained models for German (de-DE), Russian (ru-RU), and Spanish (es-US) speech recognition.
Increased recognition accuracy of English (en-US) speech recognition models.
Introduced partial support for Speech Synthesis Markup Language (SSML) within the TTS API. Support has been added for the pitch and rate attributes of the <prosody> tag to control pitch and duration of synthesized speech. Additional SSML support is planned for future releases.
Added word boosting support to the Speech Recognition API to bias the ASR engine to recognize particular words of interest at request time. This release is limited to boosting of in-vocabulary words; out-of-vocabulary word boosting will be available in an upcoming release.
Minor ASR inference speed improvements in online mode.
Improved offline ASR recognition accuracy.
Added support for the Automatic Speech Recognition (ASR) Conformer-CTC model. The Conformer-CTC model is a non-autoregressive variant of the Conformer model for ASR which uses CTC loss/decoding instead of Transducer.
Bug fixes¶
Fixed an issue in the TTS pipeline that could sometimes cause an audible ‘pop’ at the end of an utterance.
Known Issues¶
The pre-trained model used to add punctuation and capitalization to ASR transcripts supports a maximum input length of 128 tokens. Currently, if an ASR transcript containing more than 128 tokens is passed to the punctuation and capitalization model, it will be truncated to 128 tokens. This will be addressed in a future release of Riva.
The rate SSML attribute does not support x-low, low, medium, high, x-high, or default.
The pitch SSML attribute is not currently in compliance with the SSML specs, and does not support Hz, st, or % changes, nor does it support x-low, low, medium, high, x-high, or default.
When deploying the offline ASR models with riva-deploy, TensorRT warnings indicating that memory requirements of format conversion cannot be satisfied might appear in the logs. These warnings should not affect functionality and can be ignored.
Riva Speech Skills 1.7.0 Beta¶
This is a beta release. All published functionality in the release notes has been fully tested and verified with known limitations documented. To share feedback about this release, access our NVIDIA Riva Developer Forum.
Note
Users upgrading to 1.7.0 Beta from previous versions must rerun riva-build for existing models.
Those using the Quick Start tool should run riva_clean.sh followed by riva_init.sh.
New Features / Enhancements¶
Added support for models trained by NVIDIA TAO Toolkit 21.11.
Riva Streaming TTS now supports resampling, if necessary, to match the requested audio sample rate.
Default Riva English ASR model updated with higher accuracy.
Minor improvements in English text normalization and inverse text normalization models.
Increased maximum message size to support larger audio inputs in offline ASR.
Bug fixes¶
Fixed minor issues that could cause the synthesized audio generated by the TTS service to be prematurely truncated.
Fixed an issue that caused custom pronunciations to be mishandled by text normalization for TTS.
Known Issues¶
When running the nemo2riva package with EFF version 0.5.2, an ignored exception warning is printed. This should not affect functionality of the generated .riva models. This will be addressed in a future release of EFF.
During ASR pipeline execution, inverse text normalization will not convert digits into numerals (for example, one -> 1) unless there are ten digits. This limitation will be addressed in a future version of Riva.
The punctuation pipeline does not support Unicode character input. This will be fixed in the next release.
Riva Speech Skills 1.6.0 Beta¶
This is a beta release. All published functionality in the release notes has been fully tested and verified with known limitations documented. To share feedback about this release, access our NVIDIA Riva Developer Forum.
Note
Users upgrading to 1.6.0 Beta from previous versions must rerun riva-build for existing models.
Those using the Quick Start tool should run riva_clean.sh followed by riva_init.sh.
New Features / Enhancements¶
The Riva TTS service is no longer limited to input strings of 400 characters.
Updated the performance page of the documentation to include performance of Citrinet and FastPitch+HiFi-GAN models.
Bug fixes¶
Fixed minor issues that could cause the synthesized audio generated by the TTS service to be prematurely truncated.
Known Issues¶
Riva build does not support providing a 1-gram language model in .arpa format. This is due to a limitation in the KenLM utility to build language model binaries.
NLP Question Answering functionality may cause a segmentation fault when using TensorRT files generated from the NeMo -> Riva -> RMIR -> TensorRT path. This will be addressed in a future release.
Riva Speech Skills 1.5.0 Beta¶
This is a beta release. All published functionality in the release notes has been fully tested and verified with known limitations documented. To share feedback about this release, access our NVIDIA Riva Developer Forum.
Note
Users upgrading to 1.5.0 Beta from previous versions must rerun riva-build for existing models.
Those using the Quick Start tool should run riva_clean.sh followed by riva_init.sh.
Announcements¶
NVIDIA Transfer Learning Toolkit (TLT) has been renamed to NVIDIA TAO Toolkit starting in the 1.5.0-beta release.
New Features / Enhancements¶
Support for training n-gram language models for ASR has been added to TAO Toolkit. These language models are fully supported in Riva.
FastPitch now leverages Tensor Cores for improved inference performance.
nemo2riva now provides a warning when attempting to convert unsupported models.
Minor enhancements were made to cover additional cases in text normalization/inverse text normalization for English.
Bug fixes¶
Fixed a failure in the Quick Start tool for some versions of the NGC client.
Fixed minor issues that could cause occasional artifacts or reduced quality in TTS generated audio.
Eliminated misleading error messages during the riva-build process.
Known Issues¶
NLP Question Answering functionality may cause a segmentation fault when using TensorRT files generated from the NeMo -> Riva -> RMIR -> TensorRT path. This will be addressed in a future release.
Riva Speech Skills 1.4.0 Beta¶
This is a beta release. All published functionality in the release notes has been fully tested and verified with known limitations documented. To share feedback about this release, access our NVIDIA Riva Developer Forum.
Note
Users upgrading to 1.4.0 Beta from previous versions must rerun riva-build for existing models.
Those using the Quick Start tool should run riva_clean.sh followed by riva_init.sh.
Announcements¶
The Jarvis framework has been renamed to Riva starting in the 1.4.0-beta release. Jarvis Speech Skills has been renamed to Riva Speech Skills. Documentation, scripts, and commands have been updated accordingly.
The Jarvis API is supported but deprecated beginning with this release. It will be removed in a future release. Old Jarvis clients are expected to work as-is with this version of Riva Speech Skills; however, users will need to migrate to the Riva API after the Jarvis API is removed.
The Riva API modifies the following service names:
JarvisASR -> RivaSpeechRecognition
JarvisNLP -> RivaLanguageUnderstanding
JarvisCoreNLP -> RivaLanguageUnderstanding
JarvisTTS -> RivaSpeechSynthesis
The jarvis-build and jarvis-deploy commands have been replaced with the equivalent riva-build and riva-deploy commands.
The riva-build command parameters for ASR pipelines have changed.
The --lm_decoder_cpu parameter is deprecated. Replace --lm_decoder_cpu.decoder_type=<decoder_type> with --decoder_type=<decoder_type> and replace --lm_decoder_cpu.<param_name>=<param_value> with --<decoder_type>_decoder.<param_name>=<param_value>. For example, instead of using --lm_decoder_cpu.decoder_type=greedy --lm_decoder_cpu.asr_model_delay=-1, use --decoder_type=greedy --greedy_decoder.asr_model_delay=-1.
The type of decoder to use must be explicitly set by using --decoder_type=<decoder_type>, where <decoder_type> must be one of greedy, os2s, flashlight, or kaldi.
Refer to ASR Pipeline Configuration for example riva-build commands to use with different acoustic models.
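The flag renaming described above is mechanical, so it can be sketched as a small script for updating existing riva-build invocations. This is an illustrative helper, not an NVIDIA tool: the function name migrate_decoder_flags is hypothetical, and it assumes each flag is passed as a single --name=value argument.

```python
LEGACY_PREFIX = "--lm_decoder_cpu."
DECODER_TYPES = {"greedy", "os2s", "flashlight", "kaldi"}


def migrate_decoder_flags(args):
    """Rewrite deprecated --lm_decoder_cpu.* flags to the newer form."""
    # Find the decoder type, which determines the new flag prefix.
    decoder_type = None
    for arg in args:
        if arg.startswith(LEGACY_PREFIX + "decoder_type="):
            decoder_type = arg.split("=", 1)[1]

    has_legacy = any(arg.startswith(LEGACY_PREFIX) for arg in args)
    if not has_legacy:
        return list(args)
    if decoder_type not in DECODER_TYPES:
        raise ValueError("legacy flags need a valid decoder_type to migrate")

    migrated = []
    for arg in args:
        if not arg.startswith(LEGACY_PREFIX):
            migrated.append(arg)  # unrelated flags pass through unchanged
            continue
        name_value = arg[len(LEGACY_PREFIX):]
        if name_value.startswith("decoder_type="):
            migrated.append(f"--decoder_type={decoder_type}")
        else:
            migrated.append(f"--{decoder_type}_decoder.{name_value}")
    return migrated


old_args = [
    "--lm_decoder_cpu.decoder_type=greedy",
    "--lm_decoder_cpu.asr_model_delay=-1",
]
new_args = migrate_decoder_flags(old_args)
```

Applied to the example from these notes, the result is --decoder_type=greedy --greedy_decoder.asr_model_delay=-1.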
Bug Fixes¶
Minor stability improvements were made to the ASR and TTS services.
Exposed the model_name parameter in the nlp_classify_tokens sample client.
Fixed an issue with the ASR language model hyperparameter tuning tool.
Jarvis Speech Skills 1.3.0 Beta
This is a beta release. All published functionality in the release notes has been fully tested and verified with known limitations documented. To share feedback about this release, access our NVIDIA Riva Developer Forum.
Note
Users upgrading to 1.3.0 Beta from previous versions must rerun jarvis-build for existing models.
Those using the Quick Start tool should run jarvis_clean.sh followed by jarvis_init.sh.