The best way to get started with NeMo is to work through one of our tutorials.
Most NeMo tutorials can be run on Google Colab.
To run a tutorial:

1. Click the Colab link (see table below).
2. Connect to an instance with a GPU. For example, click Runtime > Change runtime type and select GPU for the hardware accelerator.
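
After switching the runtime type, it can help to confirm that the notebook actually sees a GPU before running a tutorial. The following is a minimal sketch, not part of the NeMo tutorials themselves: the helper name `gpu_available` is hypothetical, and it assumes PyTorch is installed in the runtime (it falls back to looking for the `nvidia-smi` utility if it is not).

```python
import importlib.util
import shutil


def gpu_available() -> bool:
    """Return True if a CUDA GPU appears to be usable in this runtime.

    Hypothetical helper for illustration; not a NeMo API.
    """
    # Prefer PyTorch's own check when PyTorch is installed.
    if importlib.util.find_spec("torch") is not None:
        import torch

        return bool(torch.cuda.is_available())
    # Fallback: the NVIDIA driver utility on PATH is only a weak hint
    # that a GPU is attached, not proof that it is usable.
    return shutil.which("nvidia-smi") is not None


print("GPU available:", gpu_available())
```

If this reports `False` in Colab, the runtime is probably still set to CPU; repeating the Runtime > Change runtime type step and reconnecting usually resolves it.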
Domain | Title | GitHub URL |
---|---|---|
General | Getting Started: Exploring NeMo Fundamentals | NeMo Fundamentals |
General | Getting Started: Sample Conversational AI application | Audio translator example |
General | Getting Started: Voice swap application | Voice swap example |
General | Exploring NeMo Model Construction | NeMo Models |
General | Exploring NeMo Adapters | NeMo Adapters |
General | Publishing NeMo models on Hugging Face Hub | NeMo Models on HF Hub |
ASR | ASR with NeMo | ASR with NeMo |
ASR | ASR with Subword Tokenization | ASR with Subword Tokenization |
ASR | Offline ASR Inference with Beam Search and External Language Model Rescoring | Offline ASR |
ASR | Online ASR inference with Microphone (Cache-Aware Streaming) | Online ASR Microphone Cache Aware Streaming |
ASR | Online ASR inference with Microphone (Buffered Streaming) | Online ASR Microphone Buffered Streaming |
ASR | Fine-tuning CTC Models on New Languages | ASR CTC Language Fine-Tuning |
ASR | Intro to Transducers | Intro to Transducers |
ASR | ASR with Transducers | ASR with Transducers |
ASR | ASR with Adapters | ASR with Adapters |
ASR | Speech Commands | Speech Commands |
ASR | Online and Offline Speech Commands Inference | Online Offline Microphone Speech Commands |
ASR | Voice Activity Detection (VAD) | Voice Activity Detection |
ASR | Online and Offline VAD Inference | Online Offline Microphone VAD |
ASR | Speaker Recognition and Verification | Speaker Recognition and Verification |
ASR | Speaker Diarization Inference | Speaker Diarization Inference |
ASR | ASR with Speaker Diarization | ASR with Speaker Diarization |
ASR | Online Noise Augmentation | Online Noise Augmentation |
ASR | ASR for Telephony Speech | ASR for Telephony Speech |
ASR | Streaming inference for ASR | Streaming inference |
ASR | Buffered Transducer inference for ASR | Buffered Transducer inference |
ASR | Buffered Transducer inference with LCS Merge Algorithm | Buffered Transducer inference with LCS Merge |
ASR | Offline ASR with VAD for CTC models | Offline ASR with VAD for CTC models |
ASR | Self-supervised pre-training for ASR | Self-supervised Pre-training for ASR |
ASR | Multi-lingual ASR | Multi-lingual ASR |
ASR | Hybrid ASR-TTS Models | Hybrid ASR-TTS Models |
ASR | ASR Confidence Estimation | ASR Confidence Estimation |
ASR | Confidence-based Ensembles | Confidence-based Ensembles |
NLP | Using Pretrained Language Models for Downstream Tasks | Pretrained Language Models for Downstream Tasks |
NLP | Exploring NeMo NLP Tokenizers | NLP Tokenizers |
NLP | Text Classification (Sentiment Analysis) with BERT | Text Classification (Sentiment Analysis) |
NLP | Question Answering | Question Answering |
NLP | Token Classification (Named Entity Recognition) | Token Classification: Named Entity Recognition |
NLP | Joint Intent Classification and Slot Filling | Joint Intent and Slot Classification |
NLP | GLUE Benchmark | GLUE Benchmark |
NLP | Punctuation and Capitalization | Punctuation and Capitalization |
NLP | Spellchecking ASR Customization - SpellMapper | Spellchecking ASR Customization - SpellMapper |
NLP | Entity Linking | Entity Linking |
NLP | Named Entity Recognition - BioMegatron | Named Entity Recognition - BioMegatron |
NLP | Relation Extraction - BioMegatron | Relation Extraction - BioMegatron |
NLP | P-Tuning/Prompt-Tuning | P-Tuning/Prompt-Tuning |
NLP | Synthetic Tabular Data Generation | Synthetic Tabular Data Generation |
Multimodal | Multimodal Data Preparation | Multimodal Data Preparation |
Multimodal | NeVA (LLaVA) Tutorial | NeVA (LLaVA) Tutorial |
Multimodal | Stable Diffusion Tutorial | Stable Diffusion Tutorial |
Multimodal | DreamBooth Tutorial | DreamBooth Tutorial |
TTS | NeMo TTS Primer | NeMo TTS Primer |
TTS | TTS Speech/Text Aligner Inference | TTS Speech/Text Aligner Inference |
TTS | FastPitch and MixerTTS Model Training | FastPitch and MixerTTS Model Training |
TTS | FastPitch Finetuning | FastPitch Finetuning |
TTS | FastPitch and HiFiGAN Model Training for German | FastPitch and HiFiGAN Model Training for German |
TTS | Tacotron2 Model Training | Tacotron2 Model Training |
TTS | FastPitch Duration and Pitch Control | FastPitch Duration and Pitch Control |
TTS | FastPitch Speaker Interpolation | FastPitch Speaker Interpolation |
TTS | Inference and Model Selection | TTS Inference and Model Selection |
TTS | Pronunciation Customization | TTS Pronunciation Customization |
Tools | NeMo Forced Aligner | NeMo Forced Aligner |
Tools | Speech Data Explorer | Speech Data Explorer |
Tools | CTC Segmentation | CTC Segmentation |
Text Processing (TN/ITN) | Text Normalization and Inverse Normalization for ASR and TTS | Text Normalization |
Text Processing (TN/ITN) | Inverse Text Normalization for ASR - Thutmose Tagger | Inverse Text Normalization with Thutmose Tagger |
Text Processing (TN/ITN) | Constructing Normalization Grammars with WFSTs | WFST Tutorial |