About Getting Started

Before You Start

Welcome to NeMo Curator! This toolkit enables you to curate large-scale datasets for training generative AI models across text, image, and video modalities.

Who are these quickstarts for?

  • Data scientists and ML engineers who want to quickly test NeMo Curator’s capabilities

  • Users who want to run their first curation pipeline with minimal setup

  • Anyone exploring NeMo Curator before committing to a full production deployment

What you’ll find here: Each quickstart below gets you up and running with a specific modality in under 30 minutes and includes basic installation steps, sample data, and a working example.

Tip

For production deployments, cluster configurations, or detailed system requirements, see the Setup & Deployment documentation.


Modality Quickstarts

Use the following quickstarts to try NeMo Curator with a specific modality.

Text Curation Quickstart

Set up your environment and run your first text curation pipeline with NeMo Curator. Learn how to install the toolkit, prepare your data, and use the DocumentDataset class and modular filters to curate large-scale text datasets efficiently.

Get Started with Text Curation

Image Curation Quickstart

Set up your environment and install NeMo Curator’s image modules. Learn about prerequisites, installation methods, and how to use the toolkit to curate large-scale image-text datasets for generative model training.

Get Started with Image Curation