Tutorials
The best way to get started with NeMo is to work through one of our tutorials. They cover various domains and include both introductory and advanced topics.
Speech AI
Most NeMo Speech AI tutorials can be run on Google Colab.
Running Tutorials on Colab
To run a tutorial:
1. Click the Colab link for the tutorial you are interested in, in the tables below.
2. Once in Colab, connect to an instance with a GPU: click Runtime > Change runtime type and select GPU as the hardware accelerator.
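Before running a tutorial's cells, it can help to confirm that the GPU runtime is actually active. The following is a minimal Python sketch, not part of NeMo or of the tutorials themselves: the helper name `gpu_available` is hypothetical, and it assumes that either PyTorch (typically preinstalled on Colab) or NVIDIA's `nvidia-smi` tool is present in the runtime.

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Hypothetical helper: return True if a CUDA GPU appears to be visible."""
    try:
        # PyTorch is usually preinstalled on Colab; this is an assumption.
        import torch
        return bool(torch.cuda.is_available())
    except ImportError:
        pass
    # Fallback: `nvidia-smi -L` lists attached NVIDIA GPUs, one per line.
    smi = shutil.which("nvidia-smi")
    if smi is None:
        return False
    try:
        result = subprocess.run(
            [smi, "-L"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


if __name__ == "__main__":
    if gpu_available():
        print("GPU runtime detected.")
    else:
        print("No GPU detected: select Runtime > Change runtime type > GPU.")
```

Running this in the first notebook cell gives a quick sanity check; if it reports no GPU, change the runtime type before continuing, since switching runtimes restarts the session.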
Speech AI Fundamentals
| Title | GitHub / Colab URL |
|---|---|
| Getting Started: NeMo Fundamentals | |
| Getting Started: Audio translator example | |
| Getting Started: Voice swap example | |
| Getting Started: NeMo Models | |
| Getting Started: NeMo Adapters | |
| Getting Started: NeMo Models on Hugging Face Hub | |
Automatic Speech Recognition (ASR) Tutorials
| Title | GitHub / Colab URL |
|---|---|
| ASR with NeMo | |
| ASR with Subword Tokenization | |
| Offline ASR | |
| Online ASR Microphone Cache Aware Streaming | |
| Online ASR Microphone Buffered Streaming | |
| ASR CTC Language Fine-Tuning | |
| Intro to Transducers | |
| ASR with Transducers | |
| ASR with Adapters | |
| Speech Commands | |
| Online Offline Microphone Speech Commands | |
| Voice Activity Detection | |
| Online Offline Microphone VAD | |
| Speaker Recognition and Verification | |
| Speaker Diarization Inference | |
| ASR with Speaker Diarization | |
| Online Noise Augmentation | |
| ASR for Telephony Speech | |
| Streaming inference | |
| Buffered Transducer inference | |
| Buffered Transducer inference with LCS Merge | |
| Offline ASR with VAD for CTC models | |
| Self-supervised Pre-training for ASR | |
| Multi-lingual ASR | |
| Hybrid ASR-TTS Models | |
| ASR Confidence Estimation | |
| Confidence-based Ensembles | |
Text-to-Speech (TTS) Tutorials
| Title | GitHub / Colab URL |
|---|---|
| Basic and Advanced: NeMo TTS Primer | |
| Basic and Advanced: TTS Speech/Text Aligner Inference | |
| Basic and Advanced: FastPitch and MixerTTS Model Training | |
| Basic and Advanced: FastPitch Finetuning | |
| Basic and Advanced: FastPitch and HiFiGAN Model Training for German | |
| Basic and Advanced: Tacotron2 Model Training | |
| Basic and Advanced: FastPitch Duration and Pitch Control | |
| Basic and Advanced: FastPitch Speaker Interpolation | |
| Basic and Advanced: TTS Inference and Model Selection | |
| Basic and Advanced: TTS Pronunciation Customization | |
Tools and Utilities
| Title | GitHub / Colab URL |
|---|---|
| Utility Tools for Speech and Text: NeMo Forced Aligner | |
| Utility Tools for Speech and Text: Speech Data Explorer | |
| Utility Tools for Speech and Text: CTC Segmentation | |
Text Processing (TN/ITN) Tutorials
| Title | GitHub / Colab URL |
|---|---|
| Text Normalization Techniques: Text Normalization | |
| Text Normalization Techniques: Inverse Text Normalization with Thutmose Tagger | |
| Text Normalization Techniques: WFST Tutorial | |