Contents
Getting Started
Installation
Tutorials
- Speech Recognition
- How do I use Riva ASR APIs with out-of-the-box models?
- Creating Grammars for Speech Hints
- How to Customize Riva ASR Vocabulary and Pronunciation with Lexicon Mapping
- How to Deploy a Custom Language Model (n-gram) Trained with NeMo on Riva
- How to Deploy a Custom Acoustic Model (Citrinet) Trained with NeMo on Riva
- How to Deploy a Custom Acoustic Model (Conformer-CTC) Trained with NeMo on Riva
- How to Deploy a Conformer-CTC Acoustic Model with WFST Decoders
- How to Fine-Tune a Riva ASR Acoustic Model with NVIDIA NeMo
- How to Customize a Riva ASR Acoustic Model (Conformer-CTC) with Adapters
- ASR with Adapters
- What are Adapters?
- Advantages and Limitations of Adapter Training
- Preparing the Acoustic Encoder for Adapter Training
- Preparing the Model and Dataset for Adaptation
- Creating and Training an Adapter
- Evaluating the Model
- Export the Model to Riva
- What’s Next?
- How to Fine-Tune a Riva ASR Acoustic Model with NVIDIA NeMo
- How to Improve Recognition of Specific Words
- Conclusion
- How to Synthesize a Noisy Dataset that can be used to Train a Noise Robust ASR Model
- How to Improve the Accuracy on Noisy Speech by Fine-Tuning the Acoustic Model (Conformer-CTC) in the Riva ASR Pipeline
- How to Train, Evaluate, and Fine-Tune an n-gram Language Model
- How do I Use Speaker Diarization with Riva ASR?
- Requirements and Setup
- How do I boost specific words at runtime with word boosting?
- Support for Class Based n-gram Language Models in Riva (WFST Decoder)
- WFST Decoding
- Speech Recognition - New Language Adaptation
- Cloud Deployment
- Speech Synthesis
- Translation
- How do I perform Language Translation using Riva NMT APIs with out-of-the-box models?
- How to deploy a NeMo-finetuned NMT model on the Riva Speech Skills server
- How to fine-tune a Riva NMT Bilingual model with NVIDIA NeMo
- How to perform synthetic data generation using Riva NMT Multilingual model with NVIDIA NeMo
- How to fine-tune a Riva NMT Multilingual model with NVIDIA NeMo
Architecture
Speech Recognition
- ASR Overview
- Basics of Speech Recognition and Customization of Riva ASR
- Basics of Automatic Speech Recognition
- Evaluation of ASR Accuracy
- Riva ASR
- Riva Speech Recognition Pipeline
- Pipeline Configuration
- Streaming/Offline Recognition
- Language Models
- Flashlight Decoder Lexicon
- Flashlight Decoder Lexicon Free
- OpenSeq2Seq Decoder
- Beginning/End of Utterance Detection
- Streaming Speaker Diarization
- Neural-Based Voice Activity Detection
- Generating Multiple Transcript Hypotheses
- Impact of Chunk Size and Padding Size on Performance and Accuracy (Advanced)
- Sharing Acoustic and Feature Extractor Models Across Multiple ASR Pipelines (Advanced)
- Riva-build Optional Parameters
- Performance
- ASR Advanced Details
Speech Synthesis
Natural Language Processing
Translation
SDKs and Sample Apps
Reference
- Models
- gRPC & Protocol Buffers
- Troubleshooting
- Upgrading
- Acknowledgements
- Google APIs
- GoogleTest
- gflags
- Google Logging Library (glog)
- speexdsp
- libFLAC
- gRPC
- Triton Inference Server
- NVlabs cub
- KenLM
- Kaldi
- grpc_health_probe
- OpenFST
- Yamale
- PyTorch
- requests
- PyCUDA
- RapidJSON
- protobuf
- onnx
- librosa
- omegaconf
- utf8proc
- re2
- thrax
- Sparrowhawk
- SentencePiece
- YouTokenToMe
- MS-SNSD
- Silero VAD
- End User License Agreement
- Notice