NeMo Curator Documentation

Welcome to the NeMo Curator documentation.

Introduction to Curator

Learn about NeMo Curator, how it works at a high level, and its key features.

About Curator

Overview of NeMo Curator and its capabilities.

target-users how-it-works

Overview of NeMo Curator

Key Features

Discover the main features of NeMo Curator for data curation.

features capabilities deployments

Key Features

Concepts

Explore the core concepts for each modality in NeMo Curator.

data-loading data-processing data-generation

Concepts

Quickstarts

Install and run NeMo Curator for specific modalities.

Text Curation Quickstart

Set up and run text curation workflows.

Get Started with Text Curation

Image Curation Quickstart

Set up and run image curation workflows.

Get Started with Image Curation

Video Curation Quickstart

Set up and run video curation workflows.

Get Started with Video Curation

Audio Curation Quickstart

Set up and run audio curation workflows.

Get Started with Audio Curation

Data Curation Workflows

Workflow Modalities

Explore how to use NeMo Curator across different content modalities.

Curate Text

Curate and prepare high-quality text datasets for LLM training.

filtering formatting deduplication

About Text Curation

Curate Images

Curate image-text datasets with embedding, classification, and deduplication.

embedding classification semantic-deduplication

About Image Curation

Curate Videos

Curate and process videos with GPU-accelerated pipelines and sharding.

video-splitting video-sharding gpu-accelerated

About Video Curation

Curate Audio

Transcribe, filter, and curate speech and audio datasets with ASR models.

asr transcription quality-filtering

About Audio Curation

Tutorial Highlights

Work through these tutorials to get started quickly with the NeMo Curator library.

Text Beginner Tutorial

Learn the basics of text data processing with NeMo Curator.

beginner text-processing data-preparation

Get Started with Text Curation

Image Beginner Tutorial

Learn the basics of image data processing with NeMo Curator.

beginner image-processing data-curation

Get Started with Image Curation

Video Beginner Tutorial

Learn the basics of video pipeline construction and execution.

video-splitting video-sharding custom-pipelines

Create a Video Pipeline

Audio Beginner Tutorial

Learn the basics of speech data processing with NeMo Curator.

beginner asr-inference quality-assessment

Get Started with Audio Curation