1.0.0rc1

Getting Started

  • Introduction
    • Requirements
    • Quick Start
    • Installation
      • Pip
      • Pip from source
      • From source
      • Docker containers
    • FAQ
    • Contributing
    • License
  • Tutorials

NeMo Core

  • NeMo Models
    • Basics
    • Pretrained
    • Training
      • PyTorch Lightning LightningModule
      • PyTorch Lightning Trainer
    • Configuration
      • YAML
      • CLI
      • Dataclasses
    • Optimization
      • Optimizers
      • Optimizer Params
      • Register Optimizer
      • Learning Rate Schedulers
      • Scheduler Params
      • Register scheduler
    • Save and Restore
      • Save
      • Restore
  • Experiment Manager
  • Neural Modules
  • Neural Types
    • Motivation
    • NeuralType class
    • Type Comparison Results
    • Examples
      • Long vs short notation
      • Transpose same
      • VoidType for elements
      • Element type inheritance
      • Custom element types
      • Enforcing dimensions
      • Generic Axis kind
      • Container types
  • Core APIs
    • Base class for all NeMo models
    • Base Neural Module class
    • Neural Type classes
    • Experiment manager

Automatic Speech Recognition

  • Automatic Speech Recognition (ASR)
    • Models
      • Jasper
      • QuartzNet
      • Citrinet
      • Conformer-CTC
      • References
    • Datasets
      • LibriSpeech
      • Fisher English Training Speech
      • 2000 HUB5 English Evaluation Speech
      • AN4 Dataset
      • Aishell1
      • Aishell2
      • Preparing Custom ASR Data
      • Tarred Datasets
        • Conversion to Tarred Datasets
    • Checkpoints
      • Loading Local Checkpoints
      • NGC Pretrained Checkpoints
        • Transcribing/Inference
        • Automatic Speech Recognition Models
      • Speech Recognition (Languages)
        • English
        • Mandarin
        • German
        • Polish
        • Italian
        • Russian
        • Spanish
        • Catalan
    • NeMo ASR Configuration Files
      • Dataset Configuration
      • Preprocessor Configuration
      • Augmentation Configurations
      • Tokenizer Configurations
      • Model Architecture Configurations
        • Jasper and QuartzNet
        • Citrinet
        • Conformer-CTC
    • NeMo ASR collection API
      • Model Classes
      • Modules
      • Parts
      • Mixins
      • Datasets
        • Character Encoding Datasets
        • Subword Encoding Datasets
      • Audio Preprocessors
      • Audio Augmentors
      • Miscellaneous Classes
        • RNNT Decoding
        • Hypotheses
    • Resource and Documentation Guide
  • Speech Classification
    • Models
      • MatchboxNet (Speech Commands)
      • MarbleNet (VAD)
      • References
    • Datasets
      • Freesound
      • Google Speech Commands Dataset
      • Speech Command & Freesound for VAD
      • Preparing Custom Speech Classification Data
      • Tarred Datasets
    • Checkpoints
      • Loading Local Checkpoints
      • Transcribing/Inference
      • NGC Pretrained Checkpoints
        • Speech Classification Models
    • NeMo Speech Classification Configuration Files
      • Dataset Configuration
      • Preprocessor Configuration
      • Augmentation Configurations
      • Model Architecture Configurations
      • Decoder Configurations
    • Resource and Documentation Guide
  • Speaker Recognition (SR)
    • Models
      • SpeakerNet
      • References
    • NeMo ASR Configuration Files
      • Dataset Configuration
      • Preprocessor Configuration
      • Augmentation Configurations
      • Model Architecture Configurations
      • Decoder Configurations
    • Datasets
      • HI-MIA
      • All-other Datasets
      • Tarred Datasets
    • Checkpoints
      • Loading Local Checkpoints
      • Inference
      • NGC Pretrained Checkpoints
        • Speaker Recognition Models
    • Resource and Documentation Guide
  • Speaker Diarization
    • Models
    • Datasets
      • Preparing Evaluation Dataset
      • Preparing ORACLE manifest
    • Checkpoints
      • Loading Local Checkpoints
      • Inference
      • NGC Pretrained Checkpoints
        • Models for Speaker Diarization
    • NeMo Speaker Diarization Configuration Files
      • Dataset Configuration
      • Diarizer Architecture Configurations
    • NeMo Speaker Diarization API
      • Model Classes
      • Mixins
    • Resource and Documentation Guide

Natural Language Processing

  • Natural Language Processing (NLP)
    • Models
      • Punctuation and Capitalization Model
        • Quick Start
        • Model Description
        • Raw Data Format
        • NeMo Data Format
        • Converting Raw Data to NeMo Format
        • Training Punctuation and Capitalization Model
        • Inference
        • Model Evaluation
        • References
      • Token Classification (Named Entity Recognition) Model
        • Quick Start
        • Data Input for Token Classification Model
        • Dataset Conversion
        • Training Token Classification Model
        • Inference
        • Model Evaluation
        • References
      • Joint Intent and Slot Classification
        • NeMo Data Format
        • Dataset Conversion
        • Model Training
        • Model Evaluation and Inference
        • References
      • Text Classification Model
        • Data Format
        • Dataset Conversion
        • Model Training
        • Model Evaluation and Inference
        • References
      • BERT
        • Quick Start
        • Data Input for BERT Model
        • Training BERT Model
        • References
      • Language Modeling
      • Question Answering Model
        • Quick Start
        • Data Format
        • Dataset Download
        • Model Training
        • Inference
        • Model Evaluation
        • References
      • Dialogue State Tracking - SGD-QA Model
      • GLUE Benchmark
      • Information Retrieval
      • Model NLP
      • Machine Translation Models
        • Quick Start
        • Data Format
        • Data Cleaning, Normalization & Tokenization
        • Training a BPE Tokenization
        • Applying BPE Tokenization, batching, bucketing and padding
        • Tarred Datasets for Large Corpora
        • Model Configuration and Training
        • Model Inference
        • References
    • Megatron-LM for Downstream Tasks
      • Fine-tuning
      • BioMegatron
      • References
    • NeMo NLP collection API
      • Model Classes
      • Modules

Text To Speech

  • Speech Synthesis (TTS)
    • Available Models
    • Base Classes
    • Training

Common

  • Common Collection
    • Tokenizers
    • Losses
    • Metrics

Tools

  • NeMo Tools
    • Dataset Creation Tool Based on CTC-Segmentation
      • References
    • Speech Data Explorer
    • Text Normalization
      • Prediction
      • Evaluation
      • References

NVIDIA NeMo User Guide

Getting Started

  • Introduction
    • Requirements
    • Quick Start
    • Installation
    • FAQ
    • Contributing
    • License
  • Tutorials

NeMo Core

  • NeMo Models
    • Basics
    • Pretrained
    • Training
    • Configuration
    • Optimization
    • Save and Restore
  • Experiment Manager
  • Neural Modules
  • Neural Types
    • Motivation
    • NeuralType class
    • Type Comparison Results
    • Examples
  • Core APIs
    • Base class for all NeMo models
    • Base Neural Module class
    • Neural Type classes
    • Experiment manager

Automatic Speech Recognition

  • Automatic Speech Recognition (ASR)
    • Models
    • Datasets
    • Checkpoints
    • NeMo ASR Configuration Files
    • NeMo ASR collection API
    • Resource and Documentation Guide
  • Speech Classification
    • Models
    • Datasets
    • Checkpoints
    • NeMo Speech Classification Configuration Files
    • Resource and Documentation Guide
  • Speaker Recognition (SR)
    • Models
    • NeMo ASR Configuration Files
    • Datasets
    • Checkpoints
    • Resource and Documentation Guide
  • Speaker Diarization
    • Models
    • Datasets
    • Checkpoints
    • NeMo Speaker Diarization Configuration Files
    • NeMo Speaker Diarization API
    • Resource and Documentation Guide

Natural Language Processing

  • Natural Language Processing (NLP)
    • Models
    • Megatron-LM for Downstream Tasks
    • NeMo NLP collection API

Text To Speech

  • Speech Synthesis (TTS)
    • Available Models
    • Base Classes
    • Training

Common

  • Common Collection
    • Tokenizers
    • Losses
    • Metrics

Tools

  • NeMo Tools
    • Dataset Creation Tool Based on CTC-Segmentation
    • Speech Data Explorer
    • Text Normalization

© Copyright 2021-, NVIDIA CORPORATION. Last updated on Apr 26, 2021.
