Tutorials

The best way to get started with NeMo is to work through one of our tutorials.

Most NeMo tutorials can be run on Google Colab.

To run a tutorial:

  1. Click the Colab link (see table below).

  2. Connect to an instance with a GPU. For example, click Runtime > Change runtime type and select GPU for the hardware accelerator.
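Once the runtime is connected, it can be worth confirming that a GPU is actually visible before running a tutorial. The following is a minimal sketch, not part of the NeMo tutorials themselves: the helper name `gpu_available` is hypothetical, and it assumes either that PyTorch is installed (Colab images typically include it) or that the `nvidia-smi` tool is on the path.

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Return True if a CUDA-capable GPU appears to be visible to this runtime.

    Hypothetical helper for illustration; not a NeMo API.
    """
    # Prefer PyTorch's own check when PyTorch is installed.
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None:
        return bool(torch.cuda.is_available())

    # Otherwise fall back to the NVIDIA driver tool, if it is on PATH.
    if shutil.which("nvidia-smi") is None:
        return False
    result = subprocess.run(
        ["nvidia-smi", "-L"],  # -L lists the GPUs the driver can see
        capture_output=True,
        text=True,
    )
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU available:", gpu_available())
```

If this prints `False` in Colab, the runtime type was most likely not switched to GPU, and step 2 should be repeated.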

Tutorials

| Domain | Title | GitHub URL |
| --- | --- | --- |
| General | Getting Started: Exploring NeMo Fundamentals | NeMo Fundamentals |
| General | Getting Started: Sample Conversational AI application | Audio translator example |
| General | Getting Started: Voice swap application | Voice swap example |
| General | Exploring NeMo Model Construction | NeMo Models |
| General | Exploring NeMo Adapters | NeMo Adapters |
| General | Publishing NeMo models on Hugging Face Hub | NeMo Models on HF Hub |
| ASR | ASR with NeMo | ASR with NeMo |
| ASR | ASR with Subword Tokenization | ASR with Subword Tokenization |
| ASR | Offline ASR Inference with Beam Search and External Language Model Rescoring | Offline ASR |
| ASR | Online ASR inference with Microphone (Cache-Aware Streaming) | Online ASR Microphone Cache Aware Streaming |
| ASR | Online ASR inference with Microphone (Buffered Streaming) | Online ASR Microphone Buffered Streaming |
| ASR | Fine-tuning CTC Models on New Languages | ASR CTC Language Fine-Tuning |
| ASR | Intro to Transducers | Intro to Transducers |
| ASR | ASR with Transducers | ASR with Transducers |
| ASR | ASR with Adapters | ASR with Adapters |
| ASR | Speech Commands | Speech Commands |
| ASR | Online and Offline Speech Commands Inference | Online Offline Microphone Speech Commands |
| ASR | Voice Activity Detection (VAD) | Voice Activity Detection |
| ASR | Online and Offline VAD Inference | Online Offline Microphone VAD |
| ASR | Speaker Recognition and Verification | Speaker Recognition and Verification |
| ASR | Speaker Diarization Inference | Speaker Diarization Inference |
| ASR | ASR with Speaker Diarization | ASR with Speaker Diarization |
| ASR | Online Noise Augmentation | Online Noise Augmentation |
| ASR | ASR for Telephony Speech | ASR for Telephony Speech |
| ASR | Streaming inference for ASR | Streaming inference |
| ASR | Buffered Transducer inference for ASR | Buffered Transducer inference |
| ASR | Buffered Transducer inference with LCS Merge Algorithm | Buffered Transducer inference with LCS Merge |
| ASR | Offline ASR with VAD for CTC models | Offline ASR with VAD for CTC models |
| ASR | Self-supervised pre-training for ASR | Self-supervised Pre-training for ASR |
| ASR | Multi-lingual ASR | Multi-lingual ASR |
| ASR | Hybrid ASR-TTS Models | Hybrid ASR-TTS Models |
| ASR | ASR Confidence Estimation | ASR Confidence Estimation |
| ASR | Confidence-based Ensembles | Confidence-based Ensembles |
| NLP | Using Pretrained Language Models for Downstream Tasks | Pretrained Language Models for Downstream Tasks |
| NLP | Exploring NeMo NLP Tokenizers | NLP Tokenizers |
| NLP | Text Classification (Sentiment Analysis) with BERT | Text Classification (Sentiment Analysis) |
| NLP | Question Answering | Question Answering |
| NLP | Token Classification (Named Entity Recognition) | Token Classification: Named Entity Recognition |
| NLP | Joint Intent Classification and Slot Filling | Joint Intent and Slot Classification |
| NLP | GLUE Benchmark | GLUE Benchmark |
| NLP | Punctuation and Capitalization | Punctuation and Capitalization |
| NLP | Spellchecking ASR Customization - SpellMapper | Spellchecking ASR Customization - SpellMapper |
| NLP | Entity Linking | Entity Linking |
| NLP | Named Entity Recognition - BioMegatron | Named Entity Recognition - BioMegatron |
| NLP | Relation Extraction - BioMegatron | Relation Extraction - BioMegatron |
| NLP | P-Tuning/Prompt-Tuning | P-Tuning/Prompt-Tuning |
| NLP | Synthetic Tabular Data Generation | Synthetic Tabular Data Generation |
| TTS | NeMo TTS Primer | NeMo TTS Primer |
| TTS | TTS Speech/Text Aligner Inference | TTS Speech/Text Aligner Inference |
| TTS | FastPitch and MixerTTS Model Training | FastPitch and MixerTTS Model Training |
| TTS | FastPitch Finetuning | FastPitch Finetuning |
| TTS | FastPitch and HiFiGAN Model Training for German | FastPitch and HiFiGAN Model Training for German |
| TTS | Tacotron2 Model Training | Tacotron2 Model Training |
| TTS | FastPitch Duration and Pitch Control | FastPitch Duration and Pitch Control |
| TTS | FastPitch Speaker Interpolation | FastPitch Speaker Interpolation |
| TTS | Inference and Model Selection | TTS Inference and Model Selection |
| TTS | Pronunciation_customization | TTS Pronunciation_customization |
| Tools | NeMo Forced Aligner | NeMo Forced Aligner |
| Tools | Speech Data Explorer | Speech Data Explorer |
| Tools | CTC Segmentation | CTC Segmentation |
| Text Processing (TN/ITN) | Text Normalization and Inverse Normalization for ASR and TTS | Text Normalization |
| Text Processing (TN/ITN) | Inverse Text Normalization for ASR - Thutmose Tagger | Inverse Text Normalization with Thutmose Tagger |
| Text Processing (TN/ITN) | Constructing Normalization Grammars with WFSTs | WFST Tutorial |