
Copyright © 2026, NVIDIA Corporation.

About Getting Started


Before You Start

Welcome to NeMo Curator! This framework streamlines the curation and pre-processing of large-scale datasets for training generative AI models across text, image, audio, and video modalities.

Who are these quickstarts for?

  • AI/ML engineers and researchers who want to quickly test NeMo Curator’s capabilities
  • Users looking to run an initial curation pipeline with minimal setup
  • Individuals exploring NeMo Curator prior to a full production deployment

What you’ll find here: Each quickstart gets you running with a single modality in less than 30 minutes. Quickstarts provide basic installation steps, sample data, and a working example.

For production deployments, cluster configurations, or detailed system requirements, refer to the Setup & Deployment documentation.


Install

Install NeMo Curator once with support for every modality, then jump into a quickstart below.

Install (All Modalities)

Full installation guide covering PyPI, source, and container methods, package extras for each modality, and post-install verification steps.


Modality Quickstarts

Each of the following quickstarts lets you try NeMo Curator on a single data modality.

Text Curation Quickstart

Set up your environment and execute your first text curation pipeline with NeMo Curator. Instructions cover installation, data preparation, and use of the modular pipeline architecture for efficient large-scale text dataset curation.

Image Curation Quickstart

Set up your environment and install the NeMo Curator image modules. The quickstart explains prerequisites, installation methods, and the use of the framework to curate large-scale image-text datasets for generative AI model training.

Video Curation Quickstart

Set up your environment and execute your first video curation pipeline. The instructions include prerequisites, installation options, and guidance on splitting, encoding, embedding, and exporting curated video clips at scale.

Audio Curation Quickstart

Set up your environment and execute your first audio curation pipeline with NeMo Curator. Instructions cover installation, data preparation, and use of the modular pipeline architecture for efficient large-scale speech audio dataset curation.