Speech Recognition

How do I use Riva ASR APIs with out-of-the-box models?
- NVIDIA Riva Overview
- Transcription with Riva ASR APIs
- Go deeper into Riva capabilities
Creating Grammars for Speech Hints
- Dependencies
- Prerequisites
- Overview
- Using Existing Speech Hint Grammars in Python
- Sample Grammar for Handling Alphabet Sequences
How to Customize Riva ASR Vocabulary and Pronunciation with Lexicon Mapping
- Overview
- What can be customized?
- Extending the vocabulary
- Customizing pronunciation with lexicon mapping
- Go deeper into Riva capabilities
How to Deploy a Custom Language Model (n-gram) Trained with NeMo on Riva
- NVIDIA Riva Overview
- NeMo (Neural Modules) and nemo2riva
- Prerequisites
- Riva ServiceMaker
- Start the Riva Server
- Run Inference
How to Deploy a Custom Acoustic Model (Citrinet) Trained with NeMo on Riva
- NVIDIA Riva Overview
- NeMo (Neural Modules) and nemo2riva
- Prerequisites
- Riva ServiceMaker
- Start the Riva Server
- Run Inference
How to Deploy a Custom Acoustic Model (Conformer-CTC) Trained with NeMo on Riva
- NVIDIA Riva Overview
- NeMo (Neural Modules) and nemo2riva
- Prerequisites
- Riva ServiceMaker
- Start the Riva Server
- Run Inference
How to Deploy a Conformer-CTC Acoustic Model with WFST Decoders
- NVIDIA Riva Overview
- NeMo (Neural Modules) and nemo2riva
- Prerequisites
- Riva ServiceMaker
- Start the Riva Server
- Run Inference
How to Fine-Tune a Riva ASR Acoustic Model with NVIDIA NeMo
- NVIDIA Riva Overview
- NeMo (Neural Modules)
- Fine-Tuning an ASR model with NeMo
- More Resources
- What’s Next?
How to Customize a Riva ASR Acoustic Model (Conformer-CTC) with Adapters
- NVIDIA Riva Overview
- NeMo (Neural Modules)
- ASR with Adapters
- What are Adapters?
- Advantages and Limitations of Adapter Training
- Preparing the Acoustic Encoder for Adapter Training
- Preparing the Model and Dataset for Adaptation
- Creating and Training an Adapter
- Evaluating the Model
- Export the Model to Riva
- What’s Next?
How to Fine-Tune a Riva ASR Acoustic Model with NVIDIA NeMo
- NVIDIA Riva Overview
- NeMo (Neural Modules)
- Fine-Tuning an ASR model with NeMo
- More Resources
- What’s Next?
How to Improve Recognition of Specific Words
- Overview of Riva customization techniques
- 1. Word boosting
- 2. Custom vocabulary
- 3. Custom pronunciation (Lexicon mapping)
- 4. Retrain language model
- 5. Fine-tune the acoustic model
- Conclusion
How to Synthesize a Noisy Dataset that can be used to Train a Noise Robust ASR Model
How to Improve the Accuracy on Noisy Speech by Fine-Tuning the Acoustic Model (Conformer-CTC) in the Riva ASR Pipeline
- NVIDIA Riva Overview
How To Train, Evaluate, and Fine-Tune an n-gram Language Model
- n-gram Language Model
- Prerequisites
- Training and Fine-tuning LM with KenLM and NeMo
- Download the Evaluation Dataset
- Generating the Base Language Model
- Load the ASR Model
- Fine-Tuning and Interpolation
How do I Use Speaker Diarization with Riva ASR?
- NVIDIA Riva Overview
- Speaker Diarization with Riva ASR APIs
- Requirements and Setup
- Import the Riva Client Libraries
- Create a Riva Client and Connect to the Riva Speech API Server
- Go Deeper into Riva Capabilities
How do I boost specific words at runtime with word boosting?
- NVIDIA Riva Overview
- Word boosting with Riva ASR APIs
- Go deeper into Riva capabilities
Support for Class Based n-gram Language Models in Riva (WFST Decoder)
- WFST Decoding
- Supporting Class Language Models in the WFST Framework
- Process to Create TLG.fst Supporting Word Classes
- Evaluation
- Deploying in Riva