Concepts

Learn about the core components and concepts introduced by NeMo Curator.

Modality Concepts

Learn about working with specific modalities using NeMo Curator.

Text Curation Concepts

Learn about text data curation, covering data loading and processing (filtering, classification, deduplication).

Image Curation Concepts

Explore key concepts for image data curation, including scalable loading, processing (embedding, classification, filtering), and dataset export.

Video Curation Concepts

Discover video data curation concepts, such as distributed processing, pipeline stages, execution modes, and efficient data flow.

Audio Curation Concepts

Learn about speech data curation, ASR inference, quality assessment, and audio-text integration workflows.


Universal Concepts

Core concepts that apply across all modalities in NeMo Curator.

Deduplication Concepts

Get an overview of deduplication techniques across text, image, and video modalities, including exact, fuzzy, and semantic approaches.
