Tutorials
The best way to get started with NeMo is to work through one of our tutorials.
Most NeMo tutorials can be run on Google Colab.
To run a tutorial:
1. Click the Colab link (see table below).
2. Connect to an instance with a GPU. For example, click Runtime > Change runtime type and select GPU for the hardware accelerator.
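
After connecting, it can help to confirm that the runtime actually exposes a GPU before running a tutorial. The following is a minimal sketch rather than part of any NeMo tutorial: the helper name `gpu_visible` is hypothetical, and the check assumes the NVIDIA driver's `nvidia-smi` tool is on the `PATH`, which is typically the case on GPU-backed Colab instances.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if nvidia-smi is installed and lists at least one GPU.

    Hypothetical helper for illustration; uses only the standard library.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # No NVIDIA tooling found, so the runtime is most likely CPU-only.
        return False
    try:
        # "nvidia-smi -L" prints one line per detected GPU.
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.SubprocessError):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU visible:", gpu_visible())
```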
Domain | Title | GitHub URL |
---|---|---|
General | Getting Started: Exploring Nemo Fundamentals | |
General | Getting Started: Sample Conversational AI application | |
General | Getting Started: Voice swap application | |
General | Exploring NeMo Model Construction | |
General | Exploring NeMo Adapters | |
General | Publishing NeMo models on Hugging Face Hub | |
ASR | ASR with NeMo | |
ASR | ASR with Subword Tokenization | |
ASR | Offline ASR Inference with Beam Search and External Language Model Rescoring | |
ASR | Online ASR inference with Microphone | |
ASR | Fine-tuning CTC Models on New Languages | |
ASR | Intro to Transducers | |
ASR | ASR with Transducers | |
ASR | ASR with Adapters | |
ASR | Speech Commands | |
ASR | Online and Offline Speech Commands Inference | |
ASR | Voice Activity Detection (VAD) | |
ASR | Online and Offline VAD Inference | |
ASR | Speaker Recognition and Verification | |
ASR | Speaker Diarization Inference | |
ASR | ASR with Speaker Diarization | |
ASR | Online Noise Augmentation | |
ASR | ASR for Telephony Speech | |
ASR | Streaming inference for ASR | |
ASR | Buffered Transducer inference for ASR | |
ASR | Buffered Transducer inference with LCS Merge Algorithm | |
ASR | Offline ASR with VAD for CTC models | |
ASR | Self-supervised pre-training for ASR | |
ASR | Multi-lingual ASR | |
ASR | Hybrid ASR-TTS Models | |
ASR | ASR Confidence Estimation | |
ASR | Confidence-based Ensembles | |
NLP | Using Pretrained Language Models for Downstream Tasks | |
NLP | Exploring NeMo NLP Tokenizers | |
NLP | Text Classification (Sentiment Analysis) with BERT | |
NLP | Question Answering | |
NLP | Token Classification (Named Entity Recognition) | |
NLP | Joint Intent Classification and Slot Filling | |
NLP | GLUE Benchmark | |
NLP | Punctuation and Capitalization | |
NLP | Spellchecking ASR Customization - SpellMapper | |
NLP | Entity Linking | |
NLP | Named Entity Recognition - BioMegatron | |
NLP | Relation Extraction - BioMegatron | |
NLP | P-Tuning/Prompt-Tuning | |
NLP | Synthetic Tabular Data Generation | |
TTS | NeMo TTS Primer | |
TTS | TTS Speech/Text Aligner Inference | |
TTS | FastPitch and MixerTTS Model Training | |
TTS | FastPitch Finetuning | |
TTS | FastPitch and HiFiGAN Model Training for German | |
TTS | Tacotron2 Model Training | |
TTS | FastPitch Duration and Pitch Control | |
TTS | FastPitch Speaker Interpolation | |
TTS | Inference and Model Selection | |
TTS | Pronunciation_customization | |
Tools | NeMo Forced Aligner | |
Tools | Speech Data Explorer | |
Tools | CTC Segmentation | |
Text Processing (TN/ITN) | Text Normalization and Inverse Normalization for ASR and TTS | |
Text Processing (TN/ITN) | Inverse Text Normalization for ASR - Thutmose Tagger | |
Text Processing (TN/ITN) | Constructing Normalization Grammars with WFSTs | |