The best way to get started with NeMo is to work through one of our tutorials. They cover a range of domains, include both introductory and advanced topics, and are designed to help you understand and use the NeMo toolkit effectively.
Most NeMo tutorials can be run on Google Colab.
To run a tutorial:

1. Click the Colab link for the tutorial you are interested in from the tables below.
2. Once in Colab, connect to a GPU instance: click Runtime > Change runtime type and select GPU as the hardware accelerator.
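
After changing the runtime type, it can help to confirm that the notebook actually sees a GPU before starting a tutorial. The cell below is a minimal sketch, assuming PyTorch (which NeMo is built on, and which Colab typically preinstalls) is available; if it is not, the check falls back to listing devices with `nvidia-smi`. The helper name `gpu_available` is hypothetical and is not part of NeMo.

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Return True if a CUDA GPU appears to be visible to this runtime (hypothetical helper)."""
    try:
        import torch  # assumption: PyTorch is installed, as it usually is on Colab
    except ImportError:
        torch = None
    if torch is not None:
        return torch.cuda.is_available()
    # Fallback when PyTorch is missing: ask the NVIDIA driver utility to list GPUs.
    smi = shutil.which("nvidia-smi")
    if smi is None:
        return False
    result = subprocess.run([smi, "-L"], capture_output=True, text=True)
    return result.returncode == 0 and "GPU" in result.stdout


if __name__ == "__main__":
    if gpu_available():
        print("GPU runtime detected.")
    else:
        print("No GPU detected: select Runtime > Change runtime type > GPU.")
```

If the check reports no GPU, switching the runtime type and rerunning the cell is usually enough; Colab restarts the session when the accelerator changes.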

Domain | Title | GitHub URL |
---|---|---|
General | Getting Started: NeMo Fundamentals | NeMo Fundamentals |
General | Getting Started: Audio translator example | Audio translator example |
General | Getting Started: Voice swap example | Voice swap example |
General | Getting Started: NeMo Models | NeMo Models |
General | Getting Started: NeMo Adapters | NeMo Adapters |
General | Getting Started: NeMo Models on Hugging Face Hub | NeMo Models on HF Hub |

Domain | Title | GitHub URL |
---|---|---|
Multimodal | Preparations and Advanced Applications: Multimodal Data Preparation | Multimodal Data Preparation |
Multimodal | Preparations and Advanced Applications: NeVA (LLaVA) Tutorial | NeVA (LLaVA) Tutorial |
Multimodal | Preparations and Advanced Applications: Stable Diffusion Tutorial | Stable Diffusion Tutorial |
Multimodal | Preparations and Advanced Applications: DreamBooth Tutorial | DreamBooth Tutorial |
Multimodal | Preparations and Advanced Applications: Stable Diffusion XL Quantization Tutorial | Stable Diffusion XL Quantization Tutorial |

Domain | Title | GitHub URL |
---|---|---|
ASR | ASR with NeMo | ASR with NeMo |
ASR | ASR with Subword Tokenization | ASR with Subword Tokenization |
ASR | Offline ASR | Offline ASR |
ASR | Online ASR Microphone Cache Aware Streaming | Online ASR Microphone Cache Aware Streaming |
ASR | Online ASR Microphone Buffered Streaming | Online ASR Microphone Buffered Streaming |
ASR | ASR CTC Language Fine-Tuning | ASR CTC Language Fine-Tuning |
ASR | Intro to Transducers | Intro to Transducers |
ASR | ASR with Transducers | ASR with Transducers |
ASR | ASR with Adapters | ASR with Adapters |
ASR | Speech Commands | Speech Commands |
ASR | Online Offline Microphone Speech Commands | Online Offline Microphone Speech Commands |
ASR | Voice Activity Detection | Voice Activity Detection |
ASR | Online Offline Microphone VAD | Online Offline Microphone VAD |
ASR | Speaker Recognition and Verification | Speaker Recognition and Verification |
ASR | Speaker Diarization Inference | Speaker Diarization Inference |
ASR | ASR with Speaker Diarization | ASR with Speaker Diarization |
ASR | Online Noise Augmentation | Online Noise Augmentation |
ASR | ASR for Telephony Speech | ASR for Telephony Speech |
ASR | Streaming inference | Streaming inference |
ASR | Buffered Transducer inference | Buffered Transducer inference |
ASR | Buffered Transducer inference with LCS Merge | Buffered Transducer inference with LCS Merge |
ASR | Offline ASR with VAD for CTC models | Offline ASR with VAD for CTC models |
ASR | Self-supervised Pre-training for ASR | Self-supervised Pre-training for ASR |
ASR | Multi-lingual ASR | Multi-lingual ASR |
ASR | Hybrid ASR-TTS Models | Hybrid ASR-TTS Models |
ASR | ASR Confidence Estimation | ASR Confidence Estimation |
ASR | Confidence-based Ensembles | Confidence-based Ensembles |

Domain | Title | GitHub URL |
---|---|---|
TTS | Basic and Advanced: NeMo TTS Primer | NeMo TTS Primer |
TTS | Basic and Advanced: TTS Speech/Text Aligner Inference | TTS Speech/Text Aligner Inference |
TTS | Basic and Advanced: FastPitch and MixerTTS Model Training | FastPitch and MixerTTS Model Training |
TTS | Basic and Advanced: FastPitch Finetuning | FastPitch Finetuning |
TTS | Basic and Advanced: FastPitch and HiFiGAN Model Training for German | FastPitch and HiFiGAN Model Training for German |
TTS | Basic and Advanced: Tacotron2 Model Training | Tacotron2 Model Training |
TTS | Basic and Advanced: FastPitch Duration and Pitch Control | FastPitch Duration and Pitch Control |
TTS | Basic and Advanced: FastPitch Speaker Interpolation | FastPitch Speaker Interpolation |
TTS | Basic and Advanced: TTS Inference and Model Selection | TTS Inference and Model Selection |
TTS | Basic and Advanced: TTS Pronunciation Customization | TTS Pronunciation Customization |

Domain | Title | GitHub URL |
---|---|---|
Utility Tools | Utility Tools for Speech and Text: NeMo Forced Aligner | NeMo Forced Aligner |
Utility Tools | Utility Tools for Speech and Text: Speech Data Explorer | Speech Data Explorer |
Utility Tools | Utility Tools for Speech and Text: CTC Segmentation | CTC Segmentation |

Domain | Title | GitHub URL |
---|---|---|
Text Processing | Text Normalization Techniques: Text Normalization | Text Normalization |
Text Processing | Text Normalization Techniques: Inverse Text Normalization with Thutmose Tagger | Inverse Text Normalization with Thutmose Tagger |
Text Processing | Text Normalization Techniques: WFST Tutorial | WFST Tutorial |