
Copyright © 2026, NVIDIA Corporation.


Duration Filtering


Filter audio samples by duration ranges, speech rate metrics, and temporal characteristics to create optimal datasets for ASR training and speech processing applications.

Duration-Based Quality Control

Why Duration Matters

Training Efficiency: Duration filtering can improve ASR training by removing samples that may be problematic, such as clips too short to contain a complete utterance or too long to fit comfortably in a training batch.

Processing Performance: Duration affects computational requirements:

  • Memory usage scales with audio length
  • Batch processing efficiency varies with duration variance
  • GPU utilization is typically best when the samples in a batch have similar lengths
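
To make the batching point concrete, the following sketch estimates how much of a padded batch is padding when clip lengths vary. It assumes every clip is padded to the longest clip in the batch, which is a common batching scheme but not something this page specifies. `padding_waste` is a hypothetical helper, not part of NeMo Curator, and the durations are made up for illustration.

```python
def padding_waste(durations: list[float]) -> float:
    """Fraction of a padded batch occupied by padding rather than audio.

    Assumes every clip is padded to the longest clip in the batch.
    """
    if not durations:
        return 0.0
    padded_total = max(durations) * len(durations)
    if padded_total == 0:
        return 0.0
    return 1.0 - sum(durations) / padded_total


# Made-up clip durations in seconds, for illustration only.
uniform_batch = [5.0, 5.5, 6.0, 5.0]
mixed_batch = [1.0, 2.0, 3.0, 30.0]

print(f"uniform: {padding_waste(uniform_batch):.0%} padding")
print(f"mixed:   {padding_waste(mixed_batch):.0%} padding")
```

Under this assumption, a single long clip in an otherwise short batch dominates the padded size, which is one reason an upper duration bound tends to help throughput.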

Basic Duration Filtering

Simple Duration Range

from nemo_curator.pipeline import Pipeline
from nemo_curator.stages.audio.common import GetAudioDurationStage, PreserveByValueStage

# Calculate duration for each audio file
duration_stage = GetAudioDurationStage(
    audio_filepath_key="audio_filepath",
    duration_key="duration"
)

# Filter for optimal duration range (1-15 seconds)
min_duration_filter = PreserveByValueStage(
    input_value_key="duration",
    target_value=1.0,
    operator="ge"  # greater than or equal
)

max_duration_filter = PreserveByValueStage(
    input_value_key="duration",
    target_value=15.0,
    operator="le"  # less than or equal
)

# Add to pipeline; the duration stage runs first so the filters can read "duration"
pipeline = Pipeline(name="duration_filtering")
pipeline.add_stage(duration_stage)
pipeline.add_stage(min_duration_filter)
pipeline.add_stage(max_duration_filter)
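
The effect of the two filters can be sketched in plain Python. This is an illustration only: it assumes that the `"ge"` and `"le"` strings behave like the corresponding comparisons in Python's `operator` module and that each sample is a dict with a `duration` field. `preserve_by_value` is a hypothetical stand-in, not the library implementation, and the sample records are made up.

```python
import operator

# Assumed mapping from operator strings to comparisons (illustrative only).
COMPARATORS = {"ge": operator.ge, "le": operator.le}


def preserve_by_value(samples, key, target, op):
    """Keep the samples for which `sample[key] <op> target` holds."""
    compare = COMPARATORS[op]
    return [s for s in samples if compare(s[key], target)]


# Made-up manifest entries, for illustration only.
samples = [
    {"audio_filepath": "a.wav", "duration": 0.4},
    {"audio_filepath": "b.wav", "duration": 1.0},
    {"audio_filepath": "c.wav", "duration": 7.2},
    {"audio_filepath": "d.wav", "duration": 15.0},
    {"audio_filepath": "e.wav", "duration": 42.0},
]

kept = preserve_by_value(samples, "duration", 1.0, "ge")
kept = preserve_by_value(kept, "duration", 15.0, "le")
print([s["audio_filepath"] for s in kept])
```

In this sketch both bounds are inclusive, so clips of exactly 1.0 and 15.0 seconds are kept.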

Use Case-Specific Ranges

from nemo_curator.stages.audio.common import PreserveByValueStage

# Duration ranges (in seconds) for different applications
duration_configs = {
    "asr_training": {
        "min_duration": 1.0,
        "max_duration": 20.0,
        "optimal_range": (2.0, 10.0)
    },

    "voice_cloning": {
        "min_duration": 3.0,
        "max_duration": 10.0,
        "optimal_range": (4.0, 8.0)
    },

    "speech_synthesis": {
        "min_duration": 2.0,
        "max_duration": 15.0,
        "optimal_range": (3.0, 12.0)
    },

    "keyword_spotting": {
        "min_duration": 0.5,
        "max_duration": 3.0,
        "optimal_range": (1.0, 2.0)
    }
}

def create_use_case_duration_filter(use_case: str) -> list[PreserveByValueStage]:
    """Create min and max duration filters for a specific use case."""

    # Unknown use cases fall back to the ASR training ranges
    config = duration_configs.get(use_case, duration_configs["asr_training"])

    return [
        PreserveByValueStage(
            input_value_key="duration",
            target_value=config["min_duration"],
            operator="ge"
        ),
        PreserveByValueStage(
            input_value_key="duration",
            target_value=config["max_duration"],
            operator="le"
        )
    ]
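
The `optimal_range` entries are not used by `create_use_case_duration_filter`. One possible use, sketched here as an assumption rather than documented behavior, is to label each clip so you can see how much of a dataset falls in the preferred band before choosing hard cutoffs. `duration_band` is a hypothetical helper, and the config is a copy of the `asr_training` entry.

```python
# Copy of the "asr_training" entry, repeated so the sketch is self-contained.
ASR_TRAINING = {
    "min_duration": 1.0,
    "max_duration": 20.0,
    "optimal_range": (2.0, 10.0),
}


def duration_band(duration: float, config: dict) -> str:
    """Return 'optimal', 'acceptable', or 'rejected' for a clip duration."""
    low, high = config["optimal_range"]
    if low <= duration <= high:
        return "optimal"
    if config["min_duration"] <= duration <= config["max_duration"]:
        return "acceptable"
    return "rejected"


for seconds in (0.5, 1.5, 6.0, 18.0, 25.0):
    print(seconds, duration_band(seconds, ASR_TRAINING))
```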

Speech Rate Analysis

Speech rate metrics (words per second, characters per second) help identify samples with speaking speeds appropriate for your use case.

Calculate Speech Rate Metrics

The built-in get_wordrate() and get_charrate() utility functions can be called from a custom processing stage to compute speaking speed and add the results to your pipeline data.
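
As a rough sketch of what such a calculation involves, words per second and characters per second can be derived from a transcript and a duration. The functions below are self-contained stand-ins written for illustration. They assume whitespace-separated words, a positive duration, and `text` and `duration` fields on each sample. They are not the library's `get_wordrate()` and `get_charrate()` implementations, whose exact behavior may differ.

```python
def word_rate(text: str, duration: float) -> float:
    """Words per second, splitting the transcript on whitespace."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    return len(text.split()) / duration


def char_rate(text: str, duration: float) -> float:
    """Characters per second, counting every character in the transcript."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    return len(text) / duration


def add_speech_rates(sample: dict) -> dict:
    """Return a copy of the sample with word_rate and char_rate fields added."""
    enriched = dict(sample)
    enriched["word_rate"] = word_rate(sample["text"], sample["duration"])
    enriched["char_rate"] = char_rate(sample["text"], sample["duration"])
    return enriched


# Made-up sample, for illustration only.
sample = {"text": "the quick brown fox jumps", "duration": 2.0}
print(add_speech_rates(sample))
```

Writing the results to `word_rate` and `char_rate` matches the field names that the filtering examples on this page expect.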

Speech Rate Filtering

If you have pre-calculated speech rate metrics in your data, you can filter based on them:

from nemo_curator.stages.audio.common import PreserveByValueStage
from nemo_curator.pipeline import Pipeline

# Example: Filter by speech rate if you have word_rate field in your data
pipeline = Pipeline(name="speech_rate_filtering")

# Filter by speech rate (1.5-5 words per second)
pipeline.add_stage(
    PreserveByValueStage(
        input_value_key="word_rate",  # Assumes this field exists in your data
        target_value=1.5,
        operator="ge"
    )
)

pipeline.add_stage(
    PreserveByValueStage(
        input_value_key="word_rate",
        target_value=5.0,
        operator="le"
    )
)

This example assumes you have already calculated and stored speech rate metrics in your audio data. The built-in stages do not calculate speech rates automatically; you need to create a custom stage for that.

Filtering by Speech Rate

After you calculate speech rate metrics, filter samples to keep those with appropriate speaking speeds:

Normal Speech Rate Range

from nemo_curator.stages.audio.common import PreserveByValueStage

# Filter by word rate (assumes word_rate field exists in your data)
word_rate_min_filter = PreserveByValueStage(
    input_value_key="word_rate",
    target_value=1.5,
    operator="ge"
)

word_rate_max_filter = PreserveByValueStage(
    input_value_key="word_rate",
    target_value=5.0,
    operator="le"
)

# Filter by character rate (assumes char_rate field exists in your data)
char_rate_min_filter = PreserveByValueStage(
    input_value_key="char_rate",
    target_value=8.0,
    operator="ge"
)

char_rate_max_filter = PreserveByValueStage(
    input_value_key="char_rate",
    target_value=30.0,
    operator="le"
)

These examples assume you have pre-calculated speech rate metrics in your audio data. Use the get_wordrate() and get_charrate() utility functions to calculate these values in a custom processing stage.

Normal Speech Rate Ranges

Typical speech rates for different contexts:

| Context | Words/Second | Characters/Second | Use Case |
|---|---|---|---|
| Slow/Clear Speech | 1.5 - 2.5 | 8 - 15 | Educational content, accessibility |
| Normal Conversation | 2.5 - 4.0 | 15 - 24 | General ASR training |
| Fast Speech | 4.0 - 5.0 | 24 - 30 | News, presentations |
| Very Fast | >5.0 | >30 | May indicate errors or problematic samples |
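
The table can be turned into a small lookup if you want to tag samples by speaking speed. This sketch uses the words-per-second column only. The handling of values that fall exactly on a boundary, and the label for rates below 1.5, are choices made here and are not defined by the table. `speech_rate_context` is a hypothetical helper.

```python
def speech_rate_context(words_per_second: float) -> str:
    """Map a word rate to the context labels used in the table."""
    if words_per_second < 1.5:
        return "Below listed ranges"  # not covered by the table
    if words_per_second <= 2.5:
        return "Slow/Clear Speech"
    if words_per_second <= 4.0:
        return "Normal Conversation"
    if words_per_second <= 5.0:
        return "Fast Speech"
    return "Very Fast"


for rate in (1.0, 2.0, 3.0, 4.5, 6.0):
    print(rate, speech_rate_context(rate))
```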

Best Practices

Duration Filtering Strategy

  1. Analyze First: Understand your dataset’s duration distribution
  2. Use Case Alignment: Align duration ranges with intended use
  3. Progressive Filtering: Apply duration filters before computationally expensive stages
  4. Quality Correlation: Consider correlation between duration and other quality metrics
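
For the "Analyze First" step, a minimal sketch using only the Python standard library is shown here. It assumes you have already collected clip durations in seconds into a list (for example from the `duration` field of a manifest). `duration_summary` is a hypothetical helper, and the input needs at least two values.

```python
import statistics


def duration_summary(durations: list[float]) -> dict:
    """Summarize clip durations (seconds) with a few order statistics."""
    ordered = sorted(durations)
    # quantiles(n=20) returns 19 cut points at 5%, 10%, ..., 95%.
    cuts = statistics.quantiles(ordered, n=20)
    return {
        "count": len(ordered),
        "min": ordered[0],
        "p05": cuts[0],
        "median": statistics.median(ordered),
        "p95": cuts[-1],
        "max": ordered[-1],
    }


# Made-up durations in seconds, for illustration only.
durations = [float(n) for n in range(1, 21)]
for name, value in duration_summary(durations).items():
    print(f"{name:>6}: {value}")
```

Comparing the 5th and 95th percentiles with your intended minimum and maximum gives a quick sense of how much data a given range would remove.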

Common Pitfalls

Over-Filtering: Removing too much data

# Check the retention rate after a trial run of your filters.
# original_count and filtered_count are the sample counts before and after filtering.
retention_rate = filtered_count / original_count
if retention_rate < 0.5:  # Less than 50% retained
    print("Warning: Very aggressive filtering - consider relaxing thresholds")

Under-Filtering: Keeping problematic samples that may negatively impact training or processing efficiency.
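
One way to steer between the two pitfalls is to measure retention for a few candidate ranges before committing to one. The sketch below is plain Python and assumes a list of clip durations in seconds. `retention_rate` is a hypothetical helper, the 50% warning level mirrors the check above, and the durations are made up.

```python
def retention_rate(durations, min_duration, max_duration):
    """Fraction of clips whose duration lies in [min_duration, max_duration]."""
    if not durations:
        return 0.0
    kept = sum(1 for d in durations if min_duration <= d <= max_duration)
    return kept / len(durations)


# Made-up durations in seconds, for illustration only.
durations = [0.3, 0.8, 1.2, 2.5, 4.0, 6.5, 9.0, 14.0, 22.0, 45.0]

for low, high in [(1.0, 15.0), (2.0, 10.0), (4.0, 8.0)]:
    rate = retention_rate(durations, low, high)
    flag = "  <- aggressive" if rate < 0.5 else ""
    print(f"{low}-{high} s: {rate:.0%} retained{flag}")
```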

Real Working Example

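Here’s a complete working example from the NeMo Curator tutorials. It calculates a duration field for each sample with GetAudioDurationStage and filters on WER; it does not itself filter on duration, but a PreserveByValueStage on the "duration" key can be added after the duration stage in the same way: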

from nemo_curator.pipeline import Pipeline
from nemo_curator.stages.audio.common import GetAudioDurationStage, PreserveByValueStage
from nemo_curator.stages.audio.datasets.fleurs.create_initial_manifest import CreateInitialManifestFleursStage
from nemo_curator.stages.audio.inference.asr_nemo import InferenceAsrNemoStage
from nemo_curator.stages.audio.metrics.get_wer import GetPairwiseWerStage
from nemo_curator.stages.audio.io.convert import AudioToDocumentStage
from nemo_curator.stages.resources import Resources

def create_audio_pipeline(raw_data_dir: str, wer_threshold: float = 75.0) -> Pipeline:
    """Real working pipeline from NeMo Curator tutorials."""

    pipeline = Pipeline(name="audio_inference", description="Inference audio and filter by WER threshold.")

    # Load FLEURS dataset
    pipeline.add_stage(
        CreateInitialManifestFleursStage(
            lang="hy_am",
            split="dev",
            raw_data_dir=raw_data_dir,
        ).with_(batch_size=4)
    )

    # ASR inference
    pipeline.add_stage(
        InferenceAsrNemoStage(
            model_name="nvidia/stt_hy_fastconformer_hybrid_large_pc"
        ).with_(resources=Resources(gpus=1.0))
    )

    # Calculate WER
    pipeline.add_stage(
        GetPairwiseWerStage(
            text_key="text",
            pred_text_key="pred_text",
            wer_key="wer"
        )
    )

    # Calculate duration
    pipeline.add_stage(
        GetAudioDurationStage(
            audio_filepath_key="audio_filepath",
            duration_key="duration"
        )
    )

    # Filter by WER threshold
    pipeline.add_stage(
        PreserveByValueStage(
            input_value_key="wer",
            target_value=wer_threshold,
            operator="le"
        )
    )

    # Convert to document format
    pipeline.add_stage(AudioToDocumentStage().with_(batch_size=1))

    return pipeline

This example comes directly from tutorials/audio/fleurs/pipeline.py and shows the correct parameter names and usage patterns for the built-in stages.

Related Topics

  • Quality Assessment Overview: Complete quality filtering workflow
  • WER Filtering: Transcription accuracy filtering
  • Audio Analysis: Duration calculation and analysis