About Getting Started
Before You Start
Welcome to NeMo Curator! This toolkit enables you to curate large-scale datasets for training generative AI models across text, image, and video modalities.
Who are these quickstarts for?
Data scientists and ML engineers who want to quickly test NeMo Curator’s capabilities
Users who want to run their first curation pipeline with minimal setup
Anyone exploring NeMo Curator before committing to a full production deployment
What you’ll find here: Each quickstart below gets you up and running with a specific modality in under 30 minutes. Each one includes basic installation steps, sample data, and a working example.
Tip
For production deployments, cluster configurations, or detailed system requirements, see the Setup & Deployment documentation.
Modality Quickstarts
The following quickstarts let you try NeMo Curator on a specific modality.
Set up your environment and run your first text curation pipeline with NeMo Curator. Learn how to install the toolkit, prepare your data, and use the DocumentDataset and modular filters to curate large-scale text datasets efficiently.
Set up your environment and install NeMo Curator’s image modules. Learn about prerequisites, installation methods, and how to use the toolkit to curate large-scale image-text datasets for generative model training.