
Copyright © 2026, NVIDIA Corporation.

NeMo Curator Documentation

Welcome to the NeMo Curator documentation.

Introduction to Curator

Learn about NeMo Curator, how it works at a high level, and its key features.

About Curator

Overview of NeMo Curator and its capabilities. Tags: target-users, how-it-works

Key Features

Discover the main features of NeMo Curator for data curation. Tags: features, capabilities, deployments

Concepts

Explore the core concepts for each modality in NeMo Curator. Tags: data-loading, data-processing, data-generation

Quickstarts

Install and run NeMo Curator for specific modalities.

Text Curation Quickstart

Set up and run text curation workflows.

Image Curation Quickstart

Set up and run image curation workflows.

Video Curation Quickstart

Set up and run video curation workflows.

Audio Curation Quickstart

Set up and run audio curation workflows.

Data Curation Workflows

Workflow Modalities

Explore how you can use NeMo Curator across different content modalities.

Curate Text

Curate and prepare high-quality text datasets for LLM training. Tags: filtering, formatting, deduplication

Curate Images

Curate image-text datasets with embedding, classification, and deduplication. Tags: embedding, classification, semantic-deduplication

Curate Videos

Curate and process videos with GPU-accelerated pipelines and sharding. Tags: video-splitting, video-sharding, gpu-accelerated

Curate Audio

Transcribe, filter, and curate speech and audio datasets with ASR models. Tags: asr, transcription, quality-filtering

Tutorial Highlights

Work through these tutorials to get started quickly with the NeMo Curator library.

Text Beginner Tutorial

Learn the basics of text data processing with NeMo Curator. Tags: beginner, text-processing, data-preparation

Image Beginner Tutorial

Learn the basics of image data processing with NeMo Curator. Tags: beginner, image-processing, data-curation

Video Beginner Tutorial

Learn the basics of video pipeline construction and execution. Tags: video-splitting, video-sharding, custom-pipelines

Audio Beginner Tutorial

Learn the basics of speech data processing with NeMo Curator. Tags: beginner, asr-inference, quality-assessment