Copyright © 2026, NVIDIA Corporation.

Image Curation Concepts

This document covers the essential concepts for image data curation in NVIDIA NeMo Curator. It assumes basic familiarity with data science and machine learning principles.

Core Concept Areas

Image curation in NVIDIA NeMo Curator focuses on these key areas:

Data Loading

Core concepts for loading and managing image datasets

Data Processing

Concepts for embedding generation, classification, filtering, and deduplication

Data Export

Concepts for saving, exporting, and resharding curated image datasets

Infrastructure Components

The image curation concepts build on NVIDIA NeMo Curator’s core infrastructure components, which are shared across all modalities (text, image, video). These components include:

Memory Management

Optimize memory usage when processing large datasets (partitioning, batching, monitoring)

GPU Acceleration

Leverage NVIDIA GPUs for faster data processing (CUDA, DALI, performance)

Resumable Processing

Continue interrupted operations across large datasets (checkpoints, recovery, batching)