
Video Curation Tutorials


Use the tutorials in this section to learn video curation with NeMo Curator.

Tutorials are organized by complexity and typically build on one another.


Beginner Tutorial

Run your first splitting pipeline with the Python example, including model preparation and common flags.
Tags: video-splitting, embeddings, captioning

Split and Deduplicate Videos

Split videos, then remove near-duplicates using KMeans + Pairwise semantic deduplication.
Tags: splitting, semantic-deduplication

Pipeline Customization Series

Customize pipelines by composing ProcessingStage classes and tuning resources.
Tags: stages, resources, custom-pipelines
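
To give a feel for what "KMeans + Pairwise" semantic deduplication means before you open the Split and Deduplicate tutorial, here is a minimal NumPy-only sketch of the general idea: cluster clip embeddings, then compare items pairwise within each cluster and drop those that are too similar to one already kept. This is not the NeMo Curator API. The function names `kmeans` and `semantic_dedup`, the `threshold` parameter, and the toy 2-D embeddings are all assumptions made for illustration.

```python
import numpy as np


def kmeans(embeddings, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns a cluster label per row (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows chosen at random.
    centroids = embeddings[rng.choice(len(embeddings), size=k, replace=False)]
    labels = np.zeros(len(embeddings), dtype=int)
    for _ in range(iters):
        # Distance from every embedding to every centroid.
        dists = np.linalg.norm(
            embeddings[:, None, :] - centroids[None, :, :], axis=2
        )
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = embeddings[labels == c]
            if len(members):  # leave an empty cluster's centroid unchanged
                centroids[c] = members.mean(axis=0)
    return labels


def semantic_dedup(embeddings, k=2, threshold=0.95):
    """Return sorted indices of items kept after within-cluster pairwise dedup.

    `threshold` is a hypothetical cosine-similarity cutoff: an item is dropped
    when it is at least this similar to an item already kept in its cluster.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    # Normalize rows so a dot product equals cosine similarity.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    labels = kmeans(normed, k)
    keep = []
    for c in range(k):
        kept_in_cluster = []
        for i in np.flatnonzero(labels == c):
            # Pairwise check only against items already kept in this cluster.
            if all(normed[i] @ normed[j] < threshold for j in kept_in_cluster):
                kept_in_cluster.append(i)
        keep.extend(kept_in_cluster)
    return sorted(int(i) for i in keep)


if __name__ == "__main__":
    # Toy embeddings: rows 0 and 1 are near-duplicates, as are rows 2 and 3.
    clips = [[1.0, 0.0], [0.999, 0.01], [0.0, 1.0], [0.01, 0.999]]
    print(semantic_dedup(clips, k=2, threshold=0.95))
```

Clustering first keeps the pairwise step cheap, because each item is compared only with items in its own cluster rather than with the whole dataset. The trade-off is that near-duplicates assigned to different clusters are not compared.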


Copyright © 2026, NVIDIA Corporation.