Image Curation Concepts
This document covers the essential concepts for image data curation in NVIDIA NeMo Curator. It assumes basic familiarity with data science and machine learning principles.
Core Concept Areas
Image curation in NVIDIA NeMo Curator focuses on these key areas:
Data Loading
Core concepts for loading and managing image datasets
Data Processing
Concepts for embedding generation, classification, filtering, and deduplication
Data Export
Concepts for saving, exporting, and resharding curated image datasets
Infrastructure Components
The image curation concepts build on NVIDIA NeMo Curator’s core infrastructure components, which are shared across all modalities (text, image, video). These components include: