Tutorials

The best way to get started with NeMo is to work through one of our tutorials.

Most NeMo tutorials can be run on Google Colab.

To run a tutorial:

  1. Click the Colab link (see table below).

  2. Connect to an instance with a GPU. For example, click Runtime > Change runtime type and select GPU for the hardware accelerator.
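Once the runtime is connected, it can help to confirm that a GPU is actually attached before running the rest of a notebook. The following is a minimal sketch, not part of the NeMo tutorials themselves: the helper name gpu_available is hypothetical, and it assumes that either PyTorch is importable (as is typically the case on Colab) or the nvidia-smi tool is on the PATH.

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Return True if a CUDA GPU appears to be attached to this runtime.

    Hypothetical helper for illustration; not part of NeMo.
    """
    try:
        # Assumption: PyTorch is installed (typically true on Colab).
        import torch
        return bool(torch.cuda.is_available())
    except ImportError:
        pass

    # Fallback: ask the NVIDIA driver tool to list GPUs, if it exists.
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.SubprocessError):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


if __name__ == "__main__":
    if gpu_available():
        print("GPU runtime detected.")
    else:
        print("No GPU detected. Check Runtime > Change runtime type.")
```

If the check reports no GPU, repeat step 2 and reconnect before continuing.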

Tutorials

Domain | Title | GitHub URL
General | Getting Started: Exploring NeMo Fundamentals | NeMo Fundamentals
General | Getting Started: Sample Conversational AI application | Audio translator example
General | Getting Started: Voice swap application | Voice swap example
General | Exploring NeMo Model Construction | NeMo Models
General | Exploring NeMo Adapters | NeMo Adapters
General | Publishing NeMo models on Hugging Face Hub | NeMo Models on HF Hub
ASR | ASR with NeMo | ASR with NeMo
ASR | ASR with Subword Tokenization | ASR with Subword Tokenization
ASR | Offline ASR Inference with Beam Search and External Language Model Rescoring | Offline ASR
ASR | Online ASR inference with Microphone (Cache-Aware Streaming) | Online ASR Microphone Cache Aware Streaming
ASR | Online ASR inference with Microphone (Buffered Streaming) | Online ASR Microphone Buffered Streaming
ASR | Fine-tuning CTC Models on New Languages | ASR CTC Language Fine-Tuning
ASR | Intro to Transducers | Intro to Transducers
ASR | ASR with Transducers | ASR with Transducers
ASR | ASR with Adapters | ASR with Adapters
ASR | Speech Commands | Speech Commands
ASR | Online and Offline Speech Commands Inference | Online Offline Microphone Speech Commands
ASR | Voice Activity Detection (VAD) | Voice Activity Detection
ASR | Online and Offline VAD Inference | Online Offline Microphone VAD
ASR | Speaker Recognition and Verification | Speaker Recognition and Verification
ASR | Speaker Diarization Inference | Speaker Diarization Inference
ASR | ASR with Speaker Diarization | ASR with Speaker Diarization
ASR | Online Noise Augmentation | Online Noise Augmentation
ASR | ASR for Telephony Speech | ASR for Telephony Speech
ASR | Streaming inference for ASR | Streaming inference
ASR | Buffered Transducer inference for ASR | Buffered Transducer inference
ASR | Buffered Transducer inference with LCS Merge Algorithm | Buffered Transducer inference with LCS Merge
ASR | Offline ASR with VAD for CTC models | Offline ASR with VAD for CTC models
ASR | Self-supervised pre-training for ASR | Self-supervised Pre-training for ASR
ASR | Multi-lingual ASR | Multi-lingual ASR
ASR | Hybrid ASR-TTS Models | Hybrid ASR-TTS Models
ASR | ASR Confidence Estimation | ASR Confidence Estimation
ASR | Confidence-based Ensembles | Confidence-based Ensembles
NLP | Using Pretrained Language Models for Downstream Tasks | Pretrained Language Models for Downstream Tasks
NLP | Exploring NeMo NLP Tokenizers | NLP Tokenizers
NLP | Text Classification (Sentiment Analysis) with BERT | Text Classification (Sentiment Analysis)
NLP | Question Answering | Question Answering
NLP | Token Classification (Named Entity Recognition) | Token Classification: Named Entity Recognition
NLP | Joint Intent Classification and Slot Filling | Joint Intent and Slot Classification
NLP | GLUE Benchmark | GLUE Benchmark
NLP | Punctuation and Capitalization | Punctuation and Capitalization
NLP | Spellchecking ASR Customization - SpellMapper | Spellchecking ASR Customization - SpellMapper
NLP | Entity Linking | Entity Linking
NLP | Named Entity Recognition - BioMegatron | Named Entity Recognition - BioMegatron
NLP | Relation Extraction - BioMegatron | Relation Extraction - BioMegatron
NLP | P-Tuning/Prompt-Tuning | P-Tuning/Prompt-Tuning
NLP | Synthetic Tabular Data Generation | Synthetic Tabular Data Generation
Multimodal | Multimodal Data Preparation | Multimodal Data Preparation
Multimodal | NeVA (LLaVA) Tutorial | NeVA (LLaVA) Tutorial
Multimodal | Stable Diffusion Tutorial | Stable Diffusion Tutorial
Multimodal | DreamBooth Tutorial | DreamBooth Tutorial
TTS | NeMo TTS Primer | NeMo TTS Primer
TTS | TTS Speech/Text Aligner Inference | TTS Speech/Text Aligner Inference
TTS | FastPitch and MixerTTS Model Training | FastPitch and MixerTTS Model Training
TTS | FastPitch Finetuning | FastPitch Finetuning
TTS | FastPitch and HiFiGAN Model Training for German | FastPitch and HiFiGAN Model Training for German
TTS | Tacotron2 Model Training | Tacotron2 Model Training
TTS | FastPitch Duration and Pitch Control | FastPitch Duration and Pitch Control
TTS | FastPitch Speaker Interpolation | FastPitch Speaker Interpolation
TTS | Inference and Model Selection | TTS Inference and Model Selection
TTS | Pronunciation_customization | TTS Pronunciation_customization
Tools | NeMo Forced Aligner | NeMo Forced Aligner
Tools | Speech Data Explorer | Speech Data Explorer
Tools | CTC Segmentation | CTC Segmentation
Text Processing (TN/ITN) | Text Normalization and Inverse Normalization for ASR and TTS | Text Normalization
Text Processing (TN/ITN) | Inverse Text Normalization for ASR - Thutmose Tagger | Inverse Text Normalization with Thutmose Tagger
Text Processing (TN/ITN) | Constructing Normalization Grammars with WFSTs | WFST Tutorial
© Copyright 2023-2024, NVIDIA. Last updated on Apr 12, 2024.