Tutorials

The best way to get started with NeMo is through one of our tutorials. They cover several domains, range from introductory to advanced topics, and are designed to help you understand and use the NeMo toolkit effectively.

Running Tutorials on Colab

Most NeMo tutorials can be run on Google Colab.

To run a tutorial:

  1. Click the Colab link associated with the tutorial you are interested in from the tables below.

  2. Once in Colab, connect to an instance with a GPU by clicking Runtime > Change runtime type and selecting GPU as the hardware accelerator.
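To confirm that the notebook actually picked up a GPU before running a tutorial, a quick check such as the following can help. This is a minimal sketch and not part of NeMo: the helper name `gpu_runtime_available` is hypothetical, and it assumes PyTorch (typically preinstalled on Colab) for the CUDA check, falling back to looking for the `nvidia-smi` binary when PyTorch is absent.

```python
import importlib.util
import shutil


def gpu_runtime_available() -> bool:
    """Hypothetical helper: return True if a CUDA GPU appears to be visible.

    Assumes PyTorch is installed (as it usually is on Colab); if it is not,
    falls back to checking whether the nvidia-smi binary is on the PATH.
    """
    if importlib.util.find_spec("torch") is not None:
        import torch  # imported lazily so the check also works without torch

        return bool(torch.cuda.is_available())
    # Without torch, the presence of nvidia-smi is only a rough hint of a GPU driver.
    return shutil.which("nvidia-smi") is not None


if __name__ == "__main__":
    if gpu_runtime_available():
        print("GPU runtime detected.")
    else:
        print("No GPU found: select Runtime > Change runtime type > GPU and reconnect.")
```

Run it in the first cell of a notebook; if it reports no GPU, change the runtime type as described in step 2 and run it again.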

Tutorial Overview

General Tutorials

Domain | Title | GitHub URL
General | Getting Started: NeMo Fundamentals | NeMo Fundamentals
General | Getting Started: Audio translator example | Audio translator example
General | Getting Started: Voice swap example | Voice swap example
General | Getting Started: NeMo Models | NeMo Models
General | Getting Started: NeMo Adapters | NeMo Adapters
General | Getting Started: NeMo Models on Hugging Face Hub | NeMo Models on HF Hub

Multimodal Tutorials

Domain | Title | GitHub URL
Multimodal | Preparations and Advanced Applications: Multimodal Data Preparation | Multimodal Data Preparation
Multimodal | Preparations and Advanced Applications: NeVA (LLaVA) Tutorial | NeVA (LLaVA) Tutorial
Multimodal | Preparations and Advanced Applications: Stable Diffusion Tutorial | Stable Diffusion Tutorial
Multimodal | Preparations and Advanced Applications: DreamBooth Tutorial | DreamBooth Tutorial
Multimodal | Preparations and Advanced Applications: Stable Diffusion XL Quantization Tutorial | Stable Diffusion XL Quantization Tutorial

Automatic Speech Recognition (ASR) Tutorials

Domain | Title | GitHub URL
ASR | ASR with NeMo | ASR with NeMo
ASR | ASR with Subword Tokenization | ASR with Subword Tokenization
ASR | Offline ASR | Offline ASR
ASR | Online ASR Microphone Cache Aware Streaming | Online ASR Microphone Cache Aware Streaming
ASR | Online ASR Microphone Buffered Streaming | Online ASR Microphone Buffered Streaming
ASR | ASR CTC Language Fine-Tuning | ASR CTC Language Fine-Tuning
ASR | Intro to Transducers | Intro to Transducers
ASR | ASR with Transducers | ASR with Transducers
ASR | ASR with Adapters | ASR with Adapters
ASR | Speech Commands | Speech Commands
ASR | Online Offline Microphone Speech Commands | Online Offline Microphone Speech Commands
ASR | Voice Activity Detection | Voice Activity Detection
ASR | Online Offline Microphone VAD | Online Offline Microphone VAD
ASR | Speaker Recognition and Verification | Speaker Recognition and Verification
ASR | Speaker Diarization Inference | Speaker Diarization Inference
ASR | ASR with Speaker Diarization | ASR with Speaker Diarization
ASR | Online Noise Augmentation | Online Noise Augmentation
ASR | ASR for Telephony Speech | ASR for Telephony Speech
ASR | Streaming inference | Streaming inference
ASR | Buffered Transducer inference | Buffered Transducer inference
ASR | Buffered Transducer inference with LCS Merge | Buffered Transducer inference with LCS Merge
ASR | Offline ASR with VAD for CTC models | Offline ASR with VAD for CTC models
ASR | Self-supervised Pre-training for ASR | Self-supervised Pre-training for ASR
ASR | Multi-lingual ASR | Multi-lingual ASR
ASR | Hybrid ASR-TTS Models | Hybrid ASR-TTS Models
ASR | ASR Confidence Estimation | ASR Confidence Estimation
ASR | Confidence-based Ensembles | Confidence-based Ensembles

Text-to-Speech (TTS) Tutorials

Domain | Title | GitHub URL
TTS | Basic and Advanced: NeMo TTS Primer | NeMo TTS Primer
TTS | Basic and Advanced: TTS Speech/Text Aligner Inference | TTS Speech/Text Aligner Inference
TTS | Basic and Advanced: FastPitch and MixerTTS Model Training | FastPitch and MixerTTS Model Training
TTS | Basic and Advanced: FastPitch Finetuning | FastPitch Finetuning
TTS | Basic and Advanced: FastPitch and HiFiGAN Model Training for German | FastPitch and HiFiGAN Model Training for German
TTS | Basic and Advanced: Tacotron2 Model Training | Tacotron2 Model Training
TTS | Basic and Advanced: FastPitch Duration and Pitch Control | FastPitch Duration and Pitch Control
TTS | Basic and Advanced: FastPitch Speaker Interpolation | FastPitch Speaker Interpolation
TTS | Basic and Advanced: TTS Inference and Model Selection | TTS Inference and Model Selection
TTS | Basic and Advanced: TTS Pronunciation Customization | TTS Pronunciation Customization

Tools and Utilities

Domain | Title | GitHub URL
Utility Tools | Utility Tools for Speech and Text: NeMo Forced Aligner | NeMo Forced Aligner
Utility Tools | Utility Tools for Speech and Text: Speech Data Explorer | Speech Data Explorer
Utility Tools | Utility Tools for Speech and Text: CTC Segmentation | CTC Segmentation

Text Processing (TN/ITN) Tutorials

Domain | Title | GitHub URL
Text Processing | Text Normalization Techniques: Text Normalization | Text Normalization
Text Processing | Text Normalization Techniques: Inverse Text Normalization with Thutmose Tagger | Inverse Text Normalization with Thutmose Tagger
Text Processing | Text Normalization Techniques: WFST Tutorial | WFST Tutorial