> NeMo Curator is an open-source, scalable data curation platform for curating large datasets across text, image, video, and audio modalities to improve AI model training.

# NeMo Curator Documentation

Welcome to the NeMo Curator documentation.

## Introduction to Curator

Learn about NeMo Curator, how it works at a high level, and its key features.

Overview of NeMo Curator and its capabilities. Tags: `target-users`, `how-it-works`.

Discover the main features of NeMo Curator for data curation. Tags: `features`, `capabilities`, `deployments`.

Explore the core concepts for each modality in NeMo Curator. Tags: `data-loading`, `data-processing`, `data-generation`.

## Quickstarts

Install and run NeMo Curator for specific modalities.

Set up and run text curation workflows.

Set up and run image curation workflows.

Set up and run video curation workflows.

Set up and run audio curation workflows.

## Data Curation Workflows

### Workflow Modalities

Explore how you can use NeMo Curator across different content modalities.

Curate and prepare high-quality text datasets for LLM training. Tags: `filtering`, `formatting`, `deduplication`.

Curate image-text datasets with embedding, classification, and deduplication. Tags: `embedding`, `classification`, `semantic-deduplication`.

Curate and process videos with GPU-accelerated pipelines and sharding. Tags: `video-splitting`, `video-sharding`, `gpu-accelerated`.

Transcribe, filter, and curate speech and audio datasets with ASR models. Tags: `asr`, `transcription`, `quality-filtering`.

## Tutorial Highlights

Work through these tutorials to get started with the NeMo Curator library.

Learn the basics of text data processing with NeMo Curator. Tags: `beginner`, `text-processing`, `data-preparation`.

Learn the basics of image data processing with NeMo Curator. Tags: `beginner`, `image-processing`, `data-curation`.

Learn the basics of video pipeline construction and execution. Tags: `video-splitting`, `video-sharding`, `custom-pipelines`.

Learn the basics of speech data processing with NeMo Curator. Tags: `beginner`, `asr-inference`, `quality-assessment`.
