Learn the basics of audio processing with NeMo Curator using the FLEURS multilingual speech dataset. This tutorial walks you through a complete audio processing pipeline from data loading to quality assessment and filtering.
Overview
This tutorial demonstrates the core audio curation workflow:
Load Dataset: Download and prepare the FLEURS dataset
ASR Inference: Transcribe audio using NeMo ASR models
Quality Assessment: Calculate Word Error Rate (WER)
Duration Analysis: Extract audio file durations
Filtering: Keep only high-quality samples
Export: Save processed results
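Word Error Rate, the quality metric in the workflow above, is the word-level edit distance between the reference transcript and the ASR output, divided by the number of reference words. The following is a minimal pure-Python sketch of that definition. The function name `word_error_rate` is illustrative and is not the implementation used by GetPairwiseWerStage. Returning a percentage is an assumption made here for readability.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage: word-level edit distance / reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous[j] is the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current

    return 100.0 * previous[-1] / len(ref_words)


# One missing word out of six reference words.
example_wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

A lower value means the ASR output is closer to the reference, so a filter on this metric keeps samples at or below a chosen threshold.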
What you’ll learn:
Building an end-to-end audio curation pipeline
Loading multilingual speech datasets (FLEURS)
Running ASR inference with NeMo models
Calculating quality metrics (WER, duration)
Filtering audio by quality thresholds
Exporting curated results in JSONL format
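As an illustration of the duration metric listed above, the sketch below reads the duration of an uncompressed WAV file with Python's standard `wave` module. This is an assumption-laden stand-in: the tutorial does not show how GetAudioDurationStage computes durations, and the helper names `wav_duration_seconds` and `write_silence` are hypothetical. The approach only works for PCM WAV files.

```python
import os
import tempfile
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        # Duration = number of audio frames / frames per second.
        return wav_file.getnframes() / wav_file.getframerate()


def write_silence(path: str, seconds: float, sample_rate: int = 16000) -> None:
    """Write a mono 16-bit WAV file of silence (fixture for the demo only)."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * int(seconds * sample_rate))


with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "silence.wav")
    write_silence(sample_path, seconds=1.5)
    duration = wav_duration_seconds(sample_path)
```

Compressed formats such as FLAC or MP3 need a dedicated audio library instead of `wave`.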
Time to complete: Approximately 15-30 minutes (depending on dataset size and GPU availability)
Working Example Location
The complete working code for this tutorial is located at:
<nemo_curator_repository>/tutorials/audio/fleurs/
├── README.md # Tutorial documentation
├── pipeline.py # Main tutorial script
├── pipeline.yaml # Configuration file for run.py
└── run.py # Same as pipeline.py, but builds the pipeline from the YAML file
Prerequisites
NVIDIA GPU (required for ASR inference; at least 16 GB of VRAM recommended)
Internet connection for dataset download
Basic Python knowledge
CUDA-compatible PyTorch installation
Sufficient disk space (FLEURS dataset requires ~10-50GB depending on language and split)
If you don’t have a GPU available, you can skip the ASR inference stage and work with pre-existing transcriptions. See the Custom Manifests guide for details.
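Before starting, it can help to check the prerequisites programmatically. The sketch below is one possible check, not part of the tutorial code: the function name `check_prerequisites` and the 50 GB default are assumptions taken from the upper end of the disk-space estimate above. It uses only `shutil.disk_usage` and, if PyTorch is installed, `torch.cuda.is_available()`.

```python
import shutil


def check_prerequisites(data_dir: str = ".", min_free_gb: float = 50.0) -> dict:
    """Report whether PyTorch, CUDA, and enough free disk space are available."""
    try:
        import torch  # optional: only needed for the GPU check

        torch_installed = True
        cuda_available = bool(torch.cuda.is_available())
    except ImportError:
        torch_installed = False
        cuda_available = False

    # Free space on the filesystem that holds data_dir, in gigabytes.
    free_gb = shutil.disk_usage(data_dir).free / 1e9
    return {
        "torch_installed": torch_installed,
        "cuda_available": cuda_available,
        "free_disk_gb": round(free_gb, 1),
        "enough_disk": free_gb >= min_free_gb,
    }


report = check_prerequisites(".", min_free_gb=50.0)
```

If `cuda_available` is false, the ASR inference stage cannot run on this machine, and the pre-existing transcription route mentioned above is the fallback.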
Step-by-Step Walkthrough
Step 1: Import Required Modules
Import all necessary stages and components for the audio curation pipeline:
from nemo_curator.pipeline import Pipeline
from nemo_curator.backends.xenna import XennaExecutor
from nemo_curator.stages.audio.datasets.fleurs.create_initial_manifest import CreateInitialManifestFleursStage
from nemo_curator.stages.audio.inference.asr_nemo import InferenceAsrNemoStage
from nemo_curator.stages.audio.metrics.get_wer import GetPairwiseWerStage
from nemo_curator.stages.audio.common import GetAudioDurationStage, PreserveByValueStage
from nemo_curator.stages.audio.io.convert import AudioToDocumentStage
from nemo_curator.stages.text.io.writer import JsonlWriter
from nemo_curator.stages.resources import Resources
Key components:
Pipeline: Container for organizing and executing processing stages
XennaExecutor: Backend executor for running the pipeline
CreateInitialManifestFleursStage: Downloads and prepares FLEURS dataset
InferenceAsrNemoStage: Runs ASR inference with NeMo models
GetPairwiseWerStage: Calculates Word Error Rate
PreserveByValueStage: Filters data based on threshold values
JsonlWriter: Exports results in JSONL format
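To make the last two components concrete, here is a plain-Python sketch of threshold filtering followed by JSONL export. It does not call NeMo Curator. The helper names `preserve_by_value` and `to_jsonl`, the field names (`audio_filepath`, `wer`, `duration`), the sample values, and the threshold of 75.0 are all hypothetical and chosen only for illustration.

```python
import json
import operator


def preserve_by_value(entries, key, threshold, compare=operator.le):
    """Keep entries whose value under `key` satisfies compare(value, threshold)."""
    return [entry for entry in entries if compare(entry[key], threshold)]


def to_jsonl(entries) -> str:
    """Serialize entries as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(entry, ensure_ascii=False) + "\n" for entry in entries)


# Made-up manifest entries for the demo.
entries = [
    {"audio_filepath": "a.wav", "wer": 12.5, "duration": 4.2},
    {"audio_filepath": "b.wav", "wer": 91.0, "duration": 7.8},
    {"audio_filepath": "c.wav", "wer": 75.0, "duration": 3.1},
]

# Keep samples whose WER is at or below the threshold.
kept = preserve_by_value(entries, key="wer", threshold=75.0)
jsonl_text = to_jsonl(kept)
```

Passing a different comparison, such as `operator.ge`, would turn the same helper into a minimum-value filter, for example to drop very short clips by duration.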
Step 2: Create the Pipeline
Build the audio curation pipeline by adding stages in sequence:
def create_audio_pipeline(args):
    """Create audio curation pipeline."""

    pipeline = Pipeline(name="audio_inference", description="Process FLEURS dataset with ASR")

    # Stage 1: Load FLEURS dataset
    pipeline.add_stage(
        CreateInitialManifestFleursStage(
            lang=args.lang,  # e.g., "hy_am" for Armenian
            split=args.split,  # "dev", "train", or "test"
            raw_data_dir=args.raw_data_dir
        ).with_(batch_size=4)  # Process 4 samples per batch