Tutorials

The best way to get started with NeMo is to work through one of our tutorials. The tutorials span several domains, range from introductory to advanced topics, and are designed to help you understand and use the NeMo toolkit effectively.

Running Tutorials on Colab

Most NeMo tutorials can be run on Google’s Colab.

To run a tutorial:

1. Click the Colab link associated with the tutorial you are interested in from the tables below.
2. Once in Colab, connect to an instance with a GPU by clicking Runtime > Change runtime type and selecting GPU as the hardware accelerator.
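Before running a tutorial, it can help to confirm that the runtime actually has a GPU attached. The sketch below is a hypothetical helper (`gpu_summary` is not part of NeMo); it assumes an NVIDIA-backed runtime where the `nvidia-smi` tool is on the `PATH`, and it returns `None` on a CPU-only runtime instead of raising an error.

```python
import shutil
import subprocess
from typing import Optional


def gpu_summary() -> Optional[str]:
    """Return the GPU list reported by `nvidia-smi -L`, or None if no GPU is visible.

    Hypothetical helper, not a NeMo API. Assumes an NVIDIA runtime: on a
    CPU-only runtime nvidia-smi is usually absent or fails, so we return None.
    """
    # nvidia-smi ships with the NVIDIA driver; if it is not on PATH, no GPU is usable.
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        # -L prints one line per GPU, e.g. "GPU 0: <name> (UUID: ...)".
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


if __name__ == "__main__":
    summary = gpu_summary()
    if summary is None:
        print("No GPU detected: check Runtime > Change runtime type.")
    else:
        print(summary)
```

Shelling out to `nvidia-smi` keeps the check dependency-free; inside a tutorial notebook that already imports PyTorch, `torch.cuda.is_available()` is an equivalent check.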

Tutorial Overview

**General Tutorials**

| Domain | Title | GitHub URL |
| --- | --- | --- |
| General | Getting Started: NeMo Fundamentals | NeMo Fundamentals |
| General | Getting Started: Audio translator example | Audio translator example |
| General | Getting Started: Voice swap example | Voice swap example |
| General | Getting Started: NeMo Models | NeMo Models |
| General | Getting Started: NeMo Adapters | NeMo Adapters |
| General | Getting Started: NeMo Models on Hugging Face Hub | NeMo Models on HF Hub |

**Multimodal Tutorials**

| Domain | Title | GitHub URL |
| --- | --- | --- |
| Multimodal | Preparations and Advanced Applications: Multimodal Data Preparation | Multimodal Data Preparation |
| Multimodal | Preparations and Advanced Applications: NeVA (LLaVA) Tutorial | NeVA (LLaVA) Tutorial |
| Multimodal | Preparations and Advanced Applications: Stable Diffusion Tutorial | Stable Diffusion Tutorial |
| Multimodal | Preparations and Advanced Applications: DreamBooth Tutorial | DreamBooth Tutorial |
| Multimodal | Preparations and Advanced Applications: Stable Diffusion XL Quantization Tutorial | SDXL Quantization Tutorial |

**Automatic Speech Recognition (ASR) Tutorials**

| Domain | Title | GitHub URL |
| --- | --- | --- |
| ASR | ASR with NeMo | ASR with NeMo |
| ASR | ASR with Subword Tokenization | ASR with Subword Tokenization |
| ASR | Offline ASR | Offline ASR |
| ASR | Online ASR Microphone Cache Aware Streaming | Online ASR Microphone Cache Aware Streaming |
| ASR | Online ASR Microphone Buffered Streaming | Online ASR Microphone Buffered Streaming |
| ASR | ASR CTC Language Fine-Tuning | ASR CTC Language Fine-Tuning |
| ASR | Intro to Transducers | Intro to Transducers |
| ASR | ASR with Transducers | ASR with Transducers |
| ASR | ASR with Adapters | ASR with Adapters |
| ASR | Speech Commands | Speech Commands |
| ASR | Online Offline Microphone Speech Commands | Online Offline Microphone Speech Commands |
| ASR | Voice Activity Detection | Voice Activity Detection |
| ASR | Online Offline Microphone VAD | Online Offline Microphone VAD |
| ASR | Speaker Recognition and Verification | Speaker Recognition and Verification |
| ASR | Speaker Diarization Inference | Speaker Diarization Inference |
| ASR | ASR with Speaker Diarization | ASR with Speaker Diarization |
| ASR | Online Noise Augmentation | Online Noise Augmentation |
| ASR | ASR for Telephony Speech | ASR for Telephony Speech |
| ASR | Streaming inference | Streaming inference |
| ASR | Buffered Transducer inference | Buffered Transducer inference |
| ASR | Buffered Transducer inference with LCS Merge | Buffered Transducer inference with LCS Merge |
| ASR | Offline ASR with VAD for CTC models | Offline ASR with VAD for CTC models |
| ASR | Self-supervised Pre-training for ASR | Self-supervised Pre-training for ASR |
| ASR | Multi-lingual ASR | Multi-lingual ASR |
| ASR | Hybrid ASR-TTS Models | Hybrid ASR-TTS Models |
| ASR | ASR Confidence Estimation | ASR Confidence Estimation |
| ASR | Confidence-based Ensembles | Confidence-based Ensembles |

**Text-to-Speech (TTS) Tutorials**

| Domain | Title | GitHub URL |
| --- | --- | --- |
| TTS | Basic and Advanced: NeMo TTS Primer | NeMo TTS Primer |
| TTS | Basic and Advanced: TTS Speech/Text Aligner Inference | TTS Speech/Text Aligner Inference |
| TTS | Basic and Advanced: FastPitch and MixerTTS Model Training | FastPitch and MixerTTS Model Training |
| TTS | Basic and Advanced: FastPitch Finetuning | FastPitch Finetuning |
| TTS | Basic and Advanced: FastPitch and HiFiGAN Model Training for German | FastPitch and HiFiGAN Model Training for German |
| TTS | Basic and Advanced: Tacotron2 Model Training | Tacotron2 Model Training |
| TTS | Basic and Advanced: FastPitch Duration and Pitch Control | FastPitch Duration and Pitch Control |
| TTS | Basic and Advanced: FastPitch Speaker Interpolation | FastPitch Speaker Interpolation |
| TTS | Basic and Advanced: TTS Inference and Model Selection | TTS Inference and Model Selection |
| TTS | Basic and Advanced: TTS Pronunciation Customization | TTS Pronunciation Customization |

**Tools and Utilities**

| Domain | Title | GitHub URL |
| --- | --- | --- |
| Utility Tools | Utility Tools for Speech and Text: NeMo Forced Aligner | NeMo Forced Aligner |
| Utility Tools | Utility Tools for Speech and Text: Speech Data Explorer | Speech Data Explorer |
| Utility Tools | Utility Tools for Speech and Text: CTC Segmentation | CTC Segmentation |

**Text Processing (TN/ITN) Tutorials**

| Domain | Title | GitHub URL |
| --- | --- | --- |
| Text Processing | Text Normalization Techniques: Text Normalization | Text Normalization |
| Text Processing | Text Normalization Techniques: Inverse Text Normalization with Thutmose Tagger | Inverse Text Normalization with Thutmose Tagger |
| Text Processing | Text Normalization Techniques: WFST Tutorial | WFST Tutorial |