Video Curation Concepts

This document covers the essential concepts for video data curation in NVIDIA NeMo Curator. It assumes basic familiarity with data science and machine learning principles.

Core Concept Areas

Video curation in NVIDIA NeMo Curator focuses on these key areas:

Notes on Modalities and Backends

Video pipelines in Curator run on Ray with the XennaExecutor integration for streaming and batch execution. Other modalities, such as text and image, also use RAPIDS and Curator’s distributed backends in parts of their workflows. Refer to the modality-specific guides for details.
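To make the distinction between batch and streaming execution concrete, here is a minimal, library-free sketch in plain Python. It does not use Ray, XennaExecutor, or any Curator API. The stage functions (`split_clips`, `embed`) and the runners (`run_batch`, `run_streaming`) are hypothetical names invented for illustration only. Refer to the modality-specific guides for how actual pipelines are built.

```python
from typing import Callable, Iterable, Iterator, List

# A "stage" here is just a function from one item to one item.
# This is an illustrative assumption, not Curator's stage interface.
Stage = Callable[[str], str]


def run_batch(items: Iterable[str], stages: List[Stage]) -> List[str]:
    """Batch style: each stage finishes over ALL items before the next starts."""
    current = list(items)
    for stage in stages:
        current = [stage(item) for item in current]
    return current


def run_streaming(items: Iterable[str], stages: List[Stage]) -> Iterator[str]:
    """Streaming style: each item flows through every stage as soon as it arrives."""
    for item in items:
        for stage in stages:
            item = stage(item)
        yield item


# Hypothetical stand-ins for real video stages.
def split_clips(video: str) -> str:
    return f"{video}:clips"


def embed(video: str) -> str:
    return f"{video}:embedded"


videos = ["a.mp4", "b.mp4"]
stages = [split_clips, embed]

batch_result = run_batch(videos, stages)
stream_result = list(run_streaming(videos, stages))
```

Both runners produce the same results. They differ in when intermediate data exists: the batch runner holds every intermediate output of a stage at once, while the streaming runner holds only the item currently in flight.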

Infrastructure Components

The video curation concepts build on NVIDIA NeMo Curator’s core infrastructure components, which all modalities (text, image, video, and audio) share. These components include: