> Process audio data using ASR inference, quality assessment, audio analysis, and text integration for high-quality speech datasets

# Process Data for Audio Curation

Process audio data that you've loaded into `AudioTask` objects using NeMo Curator's audio processing stages.

NeMo Curator provides a specialized suite of tools for processing speech and audio data as part of the AI training pipeline. These tools help you transcribe, analyze, filter, and integrate audio datasets to ensure high-quality input for ASR model training and multimodal applications.

## How it Works

NeMo Curator's audio processing capabilities are organized into six main categories:

1. **ASR Inference**: Transcribe audio using NeMo Framework's pretrained ASR models
2. **Quality Assessment**: Calculate transcription accuracy metrics and filter samples based on them
3. **Quality Filtering**: Segment, filter, and diarize raw audio into clean single-speaker training segments
4. **Audio Analysis**: Extract audio characteristics such as duration and validate formats
5. **ALM Data Curation**: Extract fixed-duration training windows from diarized audio segments for audio language models
6. **Text Integration**: Hand off processed audio data to text processing workflows

Each category provides GPU-accelerated implementations optimized for different speech curation needs. The result is a cleaned and filtered audio dataset with high-quality transcriptions ready for model training.

***

## ASR Inference

Transcribe audio files using NeMo Framework's state-of-the-art ASR models with GPU acceleration.

<Cards>
  <Card title="NeMo ASR Models" href="/curate-audio/process-data/asr-inference/nemo-models">
    Use pretrained NeMo ASR models for accurate speech recognition
    pretrained
    multilingual
    gpu-accelerated
  </Card>

  <Card title="Batch Processing" href="/curate-audio/process-data/asr-inference">
    Efficiently process large audio datasets with configurable batch sizes
    batch-inference
    memory-optimization
    scalable
  </Card>
</Cards>

## Quality Assessment

Evaluate audio samples and filter them by transcription accuracy and audio characteristics.

<Cards>
  <Card title="WER Filtering" href="/curate-audio/process-data/quality-assessment/wer-filtering">
    Filter audio samples based on Word Error Rate thresholds
    accuracy
    quality-metrics
    filtering
  </Card>

  <Card title="Duration Filtering" href="/curate-audio/process-data/quality-assessment/duration-filtering">
    Filter audio samples by duration ranges and speech rate metrics
    duration
    speech-rate
    range-filtering
  </Card>
</Cards>
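
To make the WER threshold idea concrete, here is a minimal pure-Python sketch of how a Word Error Rate can be computed and used as a filter. It is not NeMo Curator's implementation: the function names, the `text` and `pred_text` field names, and the choice to express WER as a percentage are all assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by reference length, as a percentage."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion
                    curr[j - 1] + 1,     # insertion
                    prev[j - 1] + cost,  # substitution or match
                )
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def keep_below_wer(samples: list[dict], max_wer: float) -> list[dict]:
    """Keep samples whose WER is at or below max_wer (hypothetical field names)."""
    return [
        s for s in samples
        if word_error_rate(s["text"], s["pred_text"]) <= max_wer
    ]


samples = [
    {"text": "the cat sat on the mat", "pred_text": "the cat sat on the mat"},
    {"text": "the cat sat on the mat", "pred_text": "the cat sit on mat"},
]
kept = keep_below_wer(samples, max_wer=20.0)
```

In this sketch the second sample has one substitution and one deletion against a six-word reference, so it exceeds the 20% threshold and is dropped.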

## Quality Filtering

Compose VAD, band, UTMOS, SIGMOS, and speaker-separation stages to extract clean single-speaker training segments from raw audio.

<Cards>
  <Card title="Quality Filtering Overview" href="/curate-audio/process-data/quality-filtering">
    End-to-end pipeline of preprocessing, segmentation, and filtering stages
    vad
    mos-scoring
    diarization
  </Card>

  <Card title="AudioDataFilterStage Composite" href="/curate-audio/process-data/quality-filtering/audio-data-filter-stage">
    Single composite stage that decomposes into the full filtering pipeline from a YAML config
    composite
    yaml-config
    end-to-end
  </Card>
</Cards>

## Audio Analysis

Extract and analyze audio file characteristics for quality control and metadata generation.

<Cards>
  <Card title="Duration Calculation" href="/curate-audio/process-data/audio-analysis/duration-calculation">
    Calculate precise audio duration using the `soundfile` library
    soundfile
    precision
    metadata
  </Card>

  <Card title="Format Validation" href="/curate-audio/process-data/audio-analysis/format-validation">
    Validate audio file formats and detect corrupted files
    validation
    error-handling
    format-support
  </Card>
</Cards>
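
The arithmetic behind duration calculation is the frame count divided by the sample rate. The sketch below shows it with Python's standard `wave` module so that it runs without extra dependencies. It is an illustration only, not the stage NeMo Curator ships: the helper names are hypothetical, and `wave` handles only uncompressed WAV files, whereas `soundfile` supports more formats.

```python
import os
import tempfile
import wave


def wav_duration_seconds(path: str) -> float:
    """Duration in seconds = number of frames / sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def write_silence(path: str, seconds: float, sample_rate: int = 16000) -> None:
    """Write a mono 16-bit PCM WAV file of silence (demo helper only)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))


# Create a temporary 2.5-second file and measure it.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "silence.wav")
    write_silence(demo_path, seconds=2.5)
    demo_duration = wav_duration_seconds(demo_path)
```

With `soundfile` installed, `soundfile.info(path).duration` reports the same quantity.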

## ALM Data Curation

Curate training data for audio language models by extracting fixed-duration windows from diarized audio segments.

<Cards>
  <Card title="ALM Data Builder" href="/curate-audio/process-data/alm/data-builder">
    Construct candidate training windows from consecutive segments with quality filtering
    windowing
    speaker-count
    bandwidth
  </Card>

  <Card title="ALM Overlap Filtering" href="/curate-audio/process-data/alm/overlap-filtering">
    Remove redundant overlapping windows based on configurable thresholds
    deduplication
    overlap-ratio
    target-duration
  </Card>
</Cards>
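
As a rough illustration of threshold-based overlap filtering, the sketch below treats each window as a `(start, end)` pair in seconds and greedily drops any window that overlaps an already kept window by more than a given ratio. The definition of overlap ratio used here (intersection divided by the shorter window's length), the greedy earliest-first policy, and all names are assumptions; the actual stage may define and resolve overlaps differently.

```python
Window = tuple[float, float]  # (start_seconds, end_seconds)


def overlap_ratio(a: Window, b: Window) -> float:
    """Intersection length divided by the shorter window's length."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    shorter = min(a[1] - a[0], b[1] - b[0])
    return intersection / shorter if shorter > 0 else 0.0


def filter_overlapping(windows: list[Window], max_ratio: float) -> list[Window]:
    """Keep windows in start-time order, skipping those that overlap a kept one too much."""
    kept: list[Window] = []
    for window in sorted(windows):
        if all(overlap_ratio(window, k) <= max_ratio for k in kept):
            kept.append(window)
    return kept


candidates = [(0.0, 120.0), (60.0, 180.0), (120.0, 240.0)]
unique_windows = filter_overlapping(candidates, max_ratio=0.25)
```

Here the middle window shares half of its length with the first one, so it is removed, while the third window only touches the first at a boundary and is kept.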

## Text Integration

Convert processed audio data into a form that text processing workflows can consume for multimodal applications.

<Cards>
  <Card title="Audio-to-Text Conversion" href="/curate-audio/process-data/text-integration">
    Convert AudioTask objects to DocumentBatch for text processing
    format-conversion
    pipeline-integration
    multimodal
  </Card>
</Cards>