> Essential concepts for image data curation, including loading, processing, and export with GPU acceleration.

# Image Curation Concepts

This document covers the essential concepts for image data curation in NVIDIA NeMo Curator. It assumes basic familiarity with data science and machine learning principles.

## Core Concept Areas

Image curation in NVIDIA NeMo Curator focuses on these key areas:

<Cards>
  <Card title="Data Loading" href="/about/concepts/image/data/loading">
    Core concepts for loading and managing image datasets
  </Card>

  <Card title="Data Processing" href="/about/concepts/image/data/processing">
    Concepts for embedding generation, classification, filtering, and deduplication
  </Card>

  <Card title="Data Export" href="/about/concepts/image/data/export">
    Concepts for saving, exporting, and resharding curated image datasets
  </Card>
</Cards>

## Infrastructure Components

The image curation concepts build on NVIDIA NeMo Curator's core infrastructure components, which are shared across all modalities (text, image, video). These components include:

<Cards>
  <Card title="Memory Management" href="/reference/infra/memory-management">
    Optimize memory usage when processing large datasets.
    Topics: partitioning, batching, monitoring.
  </Card>

  <Card title="GPU Acceleration" href="/reference/infra/gpu-processing">
    Leverage NVIDIA GPUs for faster data processing.
    Topics: CUDA, DALI, performance.
  </Card>

  <Card title="Resumable Processing" href="/reference/infra/resumable-processing">
    Continue interrupted operations across large datasets.
    Topics: checkpoints, recovery, batching.
  </Card>
</Cards>