Copyright © 2026, NVIDIA Corporation.


ALM Data Curation


Curate training data for audio language models by extracting fixed-duration windows from diarized audio segments. The ALM stages read JSONL manifests, build candidate windows that meet quality constraints, remove overlapping windows, and write the filtered results.

How it Works

The ALM pipeline processes audio manifests through a four-stage chain:

  1. ALMManifestReader reads JSONL manifests line-by-line, producing one AudioTask per entry
  2. ALMDataBuilderStage constructs candidate windows from consecutive segments, applying sample rate, bandwidth, speaker count, and duration constraints
  3. ALMDataOverlapStage removes windows that share too much audio content, keeping windows closest to the target duration
  4. ALMManifestWriterStage writes filtered results as JSONL
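The first and last steps both rely on the JSONL convention of one JSON object per line. As a rough illustration of that convention only, here is a minimal sketch in plain Python. The helper names `iter_manifest_entries` and `write_manifest_entries` and the field names in the usage note are hypothetical; this is not the implementation of ALMManifestReader or ALMManifestWriterStage.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, Iterator


def iter_manifest_entries(manifest_path: str) -> Iterator[Dict]:
    """Yield one dict per non-empty line of a JSONL file.

    Hypothetical helper for illustration; not a NeMo Curator API.
    """
    with Path(manifest_path).open("r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def write_manifest_entries(entries: Iterable[Dict], output_path: str) -> int:
    """Write each entry as one JSON object per line and return the count written."""
    count = 0
    with Path(output_path).open("w", encoding="utf-8") as handle:
        for entry in entries:
            handle.write(json.dumps(entry) + "\n")
            count += 1
    return count
```

For example, `write_manifest_entries([{"audio_filepath": "a.wav"}], "out.jsonl")` followed by `list(iter_manifest_entries("out.jsonl"))` round-trips the entry. The `audio_filepath` key is an assumed field name, not one taken from this page.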

All stages run on CPU and support both Xenna and Ray Data backends.

ALM Stages

ALM Data Builder

Construct candidate training windows from diarized audio segments with quality filtering. Tags: windowing, speaker-count, bandwidth
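To make the windowing idea concrete, the sketch below groups consecutive segments into candidate windows. It makes several assumptions that this page does not confirm: segments are dicts with hypothetical `start`, `end`, and `speaker` keys, sorted by start time; `tolerance` is a fraction of the target duration; and a window's duration is measured from the first segment's start to the last segment's end. Sample rate and bandwidth checks are omitted. Treat it as an illustration of the concept, not as the logic of ALMDataBuilderStage.

```python
from typing import Dict, List


def build_windows(
    segments: List[Dict],
    target_window_duration: float = 120.0,
    tolerance: float = 0.1,
    min_speakers: int = 2,
    max_speakers: int = 5,
) -> List[Dict]:
    """Return candidate windows made of consecutive segments.

    Illustrative only: segment keys and the meaning of `tolerance`
    (a fraction of the target duration) are assumptions.
    """
    low = target_window_duration * (1.0 - tolerance)
    high = target_window_duration * (1.0 + tolerance)
    windows = []
    for i in range(len(segments)):
        speakers = set()
        for j in range(i, len(segments)):
            speakers.add(segments[j]["speaker"])
            duration = segments[j]["end"] - segments[i]["start"]
            if duration > high:
                break  # adding more segments only makes the window longer
            if duration >= low and min_speakers <= len(speakers) <= max_speakers:
                windows.append(
                    {
                        "start": segments[i]["start"],
                        "end": segments[j]["end"],
                        "duration": duration,
                        "speakers": sorted(speakers),
                    }
                )
    return windows
```

With five back-to-back 30-second segments alternating between two speakers, this sketch yields two 120-second candidates that overlap heavily, which is the situation the overlap filter is meant to handle.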

ALM Overlap Filtering

Remove redundant overlapping windows based on configurable thresholds. Tags: deduplication, overlap-ratio, target-duration
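One way to picture overlap filtering is a greedy pass that ranks windows by how close they are to the target duration and keeps a window only if it does not overlap too much with any window already kept. The sketch below assumes windows are dicts with hypothetical `start` and `end` keys and measures overlap as the shared time divided by the shorter window's duration. The stage's actual overlap definition and tie-breaking rules are not stated on this page, so this is an assumption-laden illustration rather than the behavior of ALMDataOverlapStage.

```python
from typing import Dict, List


def overlap_ratio(a: Dict, b: Dict) -> float:
    """Shared time divided by the shorter window's duration (assumed definition)."""
    intersection = min(a["end"], b["end"]) - max(a["start"], b["start"])
    if intersection <= 0:
        return 0.0
    shorter = min(a["end"] - a["start"], b["end"] - b["start"])
    return intersection / shorter


def filter_overlaps(
    windows: List[Dict],
    overlap_percentage: float = 50,
    target_duration: float = 120.0,
) -> List[Dict]:
    """Greedily keep windows closest to the target duration.

    A candidate is dropped when its overlap with any kept window
    exceeds `overlap_percentage`.
    """
    threshold = overlap_percentage / 100.0
    # Rank by distance from the target duration; sorted() is stable for ties.
    ranked = sorted(
        windows,
        key=lambda w: abs((w["end"] - w["start"]) - target_duration),
    )
    kept: List[Dict] = []
    for candidate in ranked:
        if all(overlap_ratio(candidate, k) <= threshold for k in kept):
            kept.append(candidate)
    return sorted(kept, key=lambda w: w["start"])
```

A greedy pass is simple and deterministic, but it is not guaranteed to keep the largest possible set of windows; it is used here only because it matches the "keep windows closest to the target duration" description.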

Quick Example

```python
from nemo_curator.pipeline import Pipeline
from nemo_curator.stages.audio.alm import (
    ALMManifestReader,
    ALMDataBuilderStage,
    ALMDataOverlapStage,
    ALMManifestWriterStage,
)

pipeline = Pipeline(name="alm_curation")

# Read input manifests
pipeline.add_stage(ALMManifestReader(manifest_path="/data/manifests/"))

# Build 120-second training windows
pipeline.add_stage(
    ALMDataBuilderStage(
        target_window_duration=120.0,
        tolerance=0.1,
        min_speakers=2,
        max_speakers=5,
    )
)

# Remove windows with more than 50% overlap
pipeline.add_stage(
    ALMDataOverlapStage(
        overlap_percentage=50,
        target_duration=120.0,
    )
)

# Write results
pipeline.add_stage(ALMManifestWriterStage(output_path="/data/output/alm.jsonl"))
```

Related Topics

  • ALM Pipeline Concepts: Architectural overview of the ALM pipeline
  • ALM Tutorial: Step-by-step guide with sample data
  • Manifests and Ingest: General manifest format concepts