NeMo-Speech

Speech Data Processor

Speech Data Processor (SDP) is a toolkit that makes it easy to:
  1. write code to process a new dataset with minimal boilerplate.

  2. share the steps for processing a speech dataset.

SDP is hosted on GitHub in the NVIDIA/NeMo-speech-data-processor repository.

To learn more about SDP, see the [documentation](https://nvidia.github.io/NeMo-speech-data-processor/).


Copyright © 2026, NVIDIA Corporation.