
Copyright © 2026, NVIDIA Corporation.


nemo_curator.stages.audio.datasets.readspeech.create_initial_manifest

Module Contents

Classes

Name | Description
CreateInitialManifestReadSpeechStage | Stage to create initial manifest for the DNS Challenge Read Speech dataset.

Data

DNS_READSPEECH_URL

SAMPLE_RATE_48KHZ

_MIN_FILENAME_PARTS

API

class nemo_curator.stages.audio.datasets.readspeech.create_initial_manifest.CreateInitialManifestReadSpeechStage(
raw_data_dir: str,
max_samples: int = 5000,
auto_download: bool = True,
filepath_key: str = 'audio_filepath',
text_key: str = 'text',
name: str = 'CreateInitialManifestReadSpeech',
batch_size: int = 1
)
Dataclass

Bases: ProcessingStage[_EmptyTask, AudioTask]

Stage to create initial manifest for the DNS Challenge Read Speech dataset.

Dataset: Microsoft DNS Challenge 5 - Read Speech (Track 1 Headset)
Source: https://github.com/microsoft/DNS-Challenge

The dataset ships as a single archive (4.88 GB) containing 14,279 WAV files at 48 kHz (19.3 hours in total). When auto_download=True, the stage downloads and extracts the archive automatically.
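The figures quoted above imply an average clip length, which is useful when estimating how much audio a given max_samples value yields. The arithmetic below is a quick sketch that uses only the numbers stated in this description; it assumes nothing about how clip lengths are actually distributed.

```python
# Figures quoted in the dataset description.
NUM_FILES = 14_279
TOTAL_HOURS = 19.3
SAMPLE_RATE_HZ = 48_000  # same value as SAMPLE_RATE_48KHZ

total_seconds = TOTAL_HOURS * 3600
avg_seconds_per_file = total_seconds / NUM_FILES
avg_samples_per_file = avg_seconds_per_file * SAMPLE_RATE_HZ

print(f"average clip length: {avg_seconds_per_file:.2f} s")
print(f"average samples per clip: {avg_samples_per_file:,.0f}")
```

The average comes out at roughly 4.9 seconds per file.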

Parameters:

raw_data_dir
str

Directory where data will be downloaded/extracted to.

max_samples
int, defaults to 5000

Maximum number of samples to include (-1 for all).

auto_download
bool, defaults to True

If True, automatically download and extract dataset.

auto_download: bool = True
batch_size: int = 1
filepath_key: str = 'audio_filepath'
max_samples: int = 5000
name: str = 'CreateInitialManifestReadSpeech'
raw_data_dir: str
text_key: str = 'text'
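A minimal sketch of configuring the stage with the fields listed above. The keyword names and defaults come from the documented signature; the directory path and the build_stage helper are hypothetical, and the import is guarded so the snippet still runs where nemo_curator is not installed. How the stage behaves when auto_download=False and the directory is empty is not described on this page.

```python
import os
import tempfile

# Keyword arguments mirroring the documented dataclass fields.
stage_kwargs = {
    "raw_data_dir": os.path.join(tempfile.gettempdir(), "readspeech"),  # hypothetical path
    "max_samples": 100,       # documented default is 5000; -1 means all
    "auto_download": False,   # documented default is True
    "filepath_key": "audio_filepath",
    "text_key": "text",
}


def build_stage(kwargs: dict):
    """Return a configured stage, or None if nemo_curator cannot be imported."""
    try:
        from nemo_curator.stages.audio.datasets.readspeech.create_initial_manifest import (
            CreateInitialManifestReadSpeechStage,
        )
    except ImportError:
        return None
    return CreateInitialManifestReadSpeechStage(**kwargs)


stage = build_stage(stage_kwargs)
print("stage configured" if stage is not None else "nemo_curator not installed")
```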
nemo_curator.stages.audio.datasets.readspeech.create_initial_manifest.CreateInitialManifestReadSpeechStage.__post_init__()
nemo_curator.stages.audio.datasets.readspeech.create_initial_manifest.CreateInitialManifestReadSpeechStage._collect_wavs_recursive(
directory: str
) -> list[str]
nemo_curator.stages.audio.datasets.readspeech.create_initial_manifest.CreateInitialManifestReadSpeechStage._count_wavs_recursive(
directory: str
) -> int
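The two helpers above are documented only by their signatures. As an illustration of what a recursive WAV search with those signatures could look like, here is a standalone sketch built on os.walk. It is not the library's implementation; the case-insensitive extension match and the sorted output are assumptions.

```python
import os


def collect_wavs(directory: str) -> list[str]:
    """Return sorted paths of all .wav files found anywhere under directory."""
    found = []
    for root, _dirs, files in os.walk(directory):
        for fname in files:
            # Assumption: match the extension case-insensitively.
            if fname.lower().endswith(".wav"):
                found.append(os.path.join(root, fname))
    return sorted(found)


def count_wavs(directory: str) -> int:
    """Return the number of .wav files found anywhere under directory."""
    return len(collect_wavs(directory))
```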
nemo_curator.stages.audio.datasets.readspeech.create_initial_manifest.CreateInitialManifestReadSpeechStage._extract_archive(
archive_path: str,
extract_path: str
) -> None
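The archive format is not stated on this page, and the download URL is truncated. Under that uncertainty, one way to write an extraction helper with the same signature is to let the standard library infer the format from the file extension. The sketch below uses shutil.unpack_archive and is an assumption about a reasonable approach, not a description of what _extract_archive does.

```python
import os
import shutil


def extract_archive(archive_path: str, extract_path: str) -> None:
    """Unpack archive_path into extract_path, inferring the format from the extension."""
    os.makedirs(extract_path, exist_ok=True)
    # shutil.unpack_archive supports zip and the common tar variants out of the box.
    shutil.unpack_archive(archive_path, extract_path)
```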
nemo_curator.stages.audio.datasets.readspeech.create_initial_manifest.CreateInitialManifestReadSpeechStage._find_extracted_wavs(
search_dir: str
) -> str | None
nemo_curator.stages.audio.datasets.readspeech.create_initial_manifest.CreateInitialManifestReadSpeechStage.collect_audio_files(
search_dir: str
) -> list[dict]
nemo_curator.stages.audio.datasets.readspeech.create_initial_manifest.CreateInitialManifestReadSpeechStage.download_and_extract() -> str

Download and extract DNS Challenge Read Speech dataset (~4.88 GB).

nemo_curator.stages.audio.datasets.readspeech.create_initial_manifest.CreateInitialManifestReadSpeechStage.inputs() -> tuple[list[str], list[str]]
nemo_curator.stages.audio.datasets.readspeech.create_initial_manifest.CreateInitialManifestReadSpeechStage.outputs() -> tuple[list[str], list[str]]
nemo_curator.stages.audio.datasets.readspeech.create_initial_manifest.CreateInitialManifestReadSpeechStage.parse_filename(
filename: str
) -> dict
nemo_curator.stages.audio.datasets.readspeech.create_initial_manifest.CreateInitialManifestReadSpeechStage.process(
_: nemo_curator.tasks._EmptyTask
) -> list[nemo_curator.tasks.AudioTask]

Main processing method. Returns list[AudioTask] with one AudioTask per file.

nemo_curator.stages.audio.datasets.readspeech.create_initial_manifest.CreateInitialManifestReadSpeechStage.ray_stage_spec() -> dict[str, typing.Any]
nemo_curator.stages.audio.datasets.readspeech.create_initial_manifest.CreateInitialManifestReadSpeechStage.select_samples(
entries: list[dict]
) -> list[dict]
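The max_samples parameter is documented as "-1 for all", but this page does not say how select_samples chooses which entries to keep. The sketch below shows the simplest reading, keeping the first max_samples entries. The function name, the first-N policy, and the rejection of other negative values are all assumptions; the library may sample differently.

```python
def select_first_n(entries: list[dict], max_samples: int) -> list[dict]:
    """Keep at most max_samples entries; -1 keeps everything."""
    if max_samples == -1:
        return list(entries)
    if max_samples < 0:
        # Assumption: only -1 is a valid negative value.
        raise ValueError("max_samples must be -1 or a non-negative integer")
    return entries[:max_samples]


entries = [{"audio_filepath": f"clip_{i}.wav"} for i in range(5)]
print(len(select_first_n(entries, 3)))
print(len(select_first_n(entries, -1)))
```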
nemo_curator.stages.audio.datasets.readspeech.create_initial_manifest.CreateInitialManifestReadSpeechStage.verify_dataset_structure(
entries: list[dict]
) -> None
nemo_curator.stages.audio.datasets.readspeech.create_initial_manifest.DNS_READSPEECH_URL = 'https://dnschallengepublic.blob.core.windows.net/dns5archive/V5_training_datase...
nemo_curator.stages.audio.datasets.readspeech.create_initial_manifest.SAMPLE_RATE_48KHZ = 48000
nemo_curator.stages.audio.datasets.readspeech.create_initial_manifest._MIN_FILENAME_PARTS = 6