Tutorials

The best way to get started with NeMo is to work through one of our tutorials. They span several domains, cover both introductory and advanced topics, and are designed to help you understand and use the NeMo toolkit effectively.

Most NeMo tutorials can be run on Google Colab.

To run a tutorial:

  1. In the tables below, click the Colab link for the tutorial you are interested in.

  2. Once in Colab, connect to an instance with a GPU: click Runtime > Change runtime type and select GPU as the hardware accelerator.
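
After switching the runtime type, it can be useful to confirm that a GPU is actually attached before running the rest of a notebook. The following is a minimal sketch, not part of the NeMo tutorials themselves: it assumes the nvidia-smi command-line tool that ships with the NVIDIA driver is on the PATH, and the helper name list_visible_gpus is a hypothetical one chosen for illustration.

```python
import shutil
import subprocess


def list_visible_gpus() -> list[str]:
    """Return the GPU names reported by nvidia-smi, or [] if none are visible.

    Hypothetical helper for illustration; it is not part of NeMo.
    """
    # nvidia-smi is installed with the NVIDIA driver. If it is missing,
    # assume that no GPU runtime is attached.
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=30,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    # A non-zero exit code usually means the driver cannot reach a GPU.
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = list_visible_gpus()
    if gpus:
        print("GPU runtime detected:", ", ".join(gpus))
    else:
        print("No GPU detected. Select GPU under Runtime > Change runtime type.")
```

If the check reports no GPU, repeat step 2 and reconnect to the runtime before continuing.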

General Tutorials

| Domain | Title | GitHub URL |
| --- | --- | --- |
| General | Getting Started: NeMo Fundamentals | NeMo Fundamentals |
| General | Getting Started: Audio translator example | Audio translator example |
| General | Getting Started: Voice swap example | Voice swap example |
| General | Getting Started: NeMo Models | NeMo Models |
| General | Getting Started: NeMo Adapters | NeMo Adapters |
| General | Getting Started: NeMo Models on Hugging Face Hub | NeMo Models on HF Hub |

Multimodal Tutorials

| Domain | Title | GitHub URL |
| --- | --- | --- |
| Multimodal | Preparations and Advanced Applications: Multimodal Data Preparation | Multimodal Data Preparation |
| Multimodal | Preparations and Advanced Applications: NeVA (LLaVA) Tutorial | NeVA (LLaVA) Tutorial |
| Multimodal | Preparations and Advanced Applications: Stable Diffusion Tutorial | Stable Diffusion Tutorial |
| Multimodal | Preparations and Advanced Applications: DreamBooth Tutorial | DreamBooth Tutorial |
| Multimodal | Preparations and Advanced Applications: Stable Diffusion XL Quantization Tutorial | Stable Diffusion XL Quantization Tutorial |

Automatic Speech Recognition (ASR) Tutorials

| Domain | Title | GitHub URL |
| --- | --- | --- |
| ASR | ASR with NeMo | ASR with NeMo |
| ASR | ASR with Subword Tokenization | ASR with Subword Tokenization |
| ASR | Offline ASR | Offline ASR |
| ASR | Online ASR Microphone Cache Aware Streaming | Online ASR Microphone Cache Aware Streaming |
| ASR | Online ASR Microphone Buffered Streaming | Online ASR Microphone Buffered Streaming |
| ASR | ASR CTC Language Fine-Tuning | ASR CTC Language Fine-Tuning |
| ASR | Intro to Transducers | Intro to Transducers |
| ASR | ASR with Transducers | ASR with Transducers |
| ASR | ASR with Adapters | ASR with Adapters |
| ASR | Speech Commands | Speech Commands |
| ASR | Online Offline Microphone Speech Commands | Online Offline Microphone Speech Commands |
| ASR | Voice Activity Detection | Voice Activity Detection |
| ASR | Online Offline Microphone VAD | Online Offline Microphone VAD |
| ASR | Speaker Recognition and Verification | Speaker Recognition and Verification |
| ASR | Speaker Diarization Inference | Speaker Diarization Inference |
| ASR | ASR with Speaker Diarization | ASR with Speaker Diarization |
| ASR | Online Noise Augmentation | Online Noise Augmentation |
| ASR | ASR for Telephony Speech | ASR for Telephony Speech |
| ASR | Streaming inference | Streaming inference |
| ASR | Buffered Transducer inference | Buffered Transducer inference |
| ASR | Buffered Transducer inference with LCS Merge | Buffered Transducer inference with LCS Merge |
| ASR | Offline ASR with VAD for CTC models | Offline ASR with VAD for CTC models |
| ASR | Self-supervised Pre-training for ASR | Self-supervised Pre-training for ASR |
| ASR | Multi-lingual ASR | Multi-lingual ASR |
| ASR | Hybrid ASR-TTS Models | Hybrid ASR-TTS Models |
| ASR | ASR Confidence Estimation | ASR Confidence Estimation |
| ASR | Confidence-based Ensembles | Confidence-based Ensembles |

Text-to-Speech (TTS) Tutorials

| Domain | Title | GitHub URL |
| --- | --- | --- |
| TTS | Basic and Advanced: NeMo TTS Primer | NeMo TTS Primer |
| TTS | Basic and Advanced: TTS Speech/Text Aligner Inference | TTS Speech/Text Aligner Inference |
| TTS | Basic and Advanced: FastPitch and MixerTTS Model Training | FastPitch and MixerTTS Model Training |
| TTS | Basic and Advanced: FastPitch Finetuning | FastPitch Finetuning |
| TTS | Basic and Advanced: FastPitch and HiFiGAN Model Training for German | FastPitch and HiFiGAN Model Training for German |
| TTS | Basic and Advanced: Tacotron2 Model Training | Tacotron2 Model Training |
| TTS | Basic and Advanced: FastPitch Duration and Pitch Control | FastPitch Duration and Pitch Control |
| TTS | Basic and Advanced: FastPitch Speaker Interpolation | FastPitch Speaker Interpolation |
| TTS | Basic and Advanced: TTS Inference and Model Selection | TTS Inference and Model Selection |
| TTS | Basic and Advanced: TTS Pronunciation Customization | TTS Pronunciation Customization |

Tools and Utilities

| Domain | Title | GitHub URL |
| --- | --- | --- |
| Utility Tools | Utility Tools for Speech and Text: NeMo Forced Aligner | NeMo Forced Aligner |
| Utility Tools | Utility Tools for Speech and Text: Speech Data Explorer | Speech Data Explorer |
| Utility Tools | Utility Tools for Speech and Text: CTC Segmentation | CTC Segmentation |

Text Processing (TN/ITN) Tutorials

| Domain | Title | GitHub URL |
| --- | --- | --- |
| Text Processing | Text Normalization Techniques: Text Normalization | Text Normalization |
| Text Processing | Text Normalization Techniques: Inverse Text Normalization with Thutmose Tagger | Inverse Text Normalization with Thutmose Tagger |
| Text Processing | Text Normalization Techniques: WFST Tutorial | WFST Tutorial |

© Copyright 2023-2024, NVIDIA. Last updated on May 17, 2024.