NeMo Curator Documentation

Welcome to the NeMo Curator documentation.

Introduction to Curator

Learn about NeMo Curator, how it works at a high level, and its key features.

About Curator

Overview of NeMo Curator and its capabilities.

Overview of NeMo Curator

Key Features

Discover the main features of NeMo Curator for data curation.

Key Features

Concepts

Explore the core concepts for each modality in NeMo Curator.

Concepts

Data Curation Workflows

Workflow Modalities

Explore how you can use NeMo Curator across different content modalities.

Curate Text

Curate and prepare high-quality text datasets for LLM training.

About Text Curation

Curate Images

Curate image-text datasets with embedding, classification, and deduplication.

About Image Curation

Quickstart Guides

Install and run NeMo Curator for specific modalities.

Text Curation Quickstart

Quickly set up and run text curation workflows.

Get Started with Text Curation

Image Curation Quickstart

Quickly set up and run image curation workflows.

Get Started with Image Curation

Tutorial Highlights

Work through these tutorials to get started with the NeMo Curator library.

Text Beginner Tutorial

Learn the basics of text data processing with NeMo Curator.

Get Started with Text Curation

Image Beginner Tutorial

Learn the basics of image data processing with NeMo Curator.

Get Started with Image Curation