---
description: >-
  Core concepts and terminology for NeMo Curator across text, image, video, and
  audio data curation modalities
categories:
  - concepts-architecture
tags:
  - concepts
  - fundamentals
  - multimodal
  - architecture
personas:
  - data-scientist-focused
  - mle-focused
difficulty: beginner
content_type: concept
modality: universal
---

# Concepts

Learn about the core components and concepts introduced by NeMo Curator.

## Modality Concepts

Learn about working with specific modalities using NeMo Curator.

<Cards>
  <Card title="Text Curation Concepts" href="/about/concepts/text">
    Learn about text data curation, covering data loading and processing (filtering, classification, deduplication).
  </Card>

  <Card title="Image Curation Concepts" href="/about/concepts/image">
    Explore key concepts for image data curation, including scalable loading, processing (embedding, classification, filtering), and dataset export.
  </Card>

  <Card title="Video Curation Concepts" href="/about/concepts/video">
    Discover video data curation concepts, such as distributed processing, pipeline stages, execution modes, and efficient data flow.
  </Card>

  <Card title="Audio Curation Concepts" href="/about/concepts/audio">
    Learn about speech data curation, ASR inference, quality assessment, and audio-text integration workflows.
  </Card>
</Cards>

## Universal Concepts

Core concepts that apply across all modalities in NeMo Curator.

<Cards>
  <Card title="Deduplication Concepts" href="/about/concepts/deduplication">
    Comprehensive overview of deduplication techniques across text, image, and video modalities, including exact, fuzzy, and semantic approaches.
  </Card>
</Cards>
