---
description: >-
  Essential concepts for video data curation, including distributed processing,
  pipeline stages, and execution modes
categories:
  - concepts-architecture
tags:
  - concepts
  - video-curation
  - distributed
  - pipeline
  - ray
  - autoscaling
personas:
  - data-scientist-focused
  - mle-focused
difficulty: beginner
content_type: concept
modality: video-only
---
# Video Curation Concepts
This document covers the essential concepts for video data curation in NVIDIA NeMo Curator. These concepts assume basic familiarity with data science and machine learning principles.
## Core Concept Areas
Video curation in NVIDIA NeMo Curator focuses on these key areas:

- Core concepts for distributed processing, the Ray foundation, and autoscaling
- Stages, pipelines, and execution modes in video curation workflows
- How data moves through the system, from ingestion to output
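
To make the stage, pipeline, and data-flow vocabulary concrete, here is a minimal pure-Python sketch. It does not use NeMo Curator or Ray. The names `VideoTask`, `Stage`, and `run_pipeline`, and the two example stages, are hypothetical. They only illustrate the general idea that a pipeline is an ordered list of stages and that each batch of tasks passes through every stage on its way from ingestion to output.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class VideoTask:
    """Hypothetical unit of work: one video path plus metadata added by stages."""
    path: str
    metadata: dict = field(default_factory=dict)


@dataclass
class Stage:
    """Hypothetical stage: a name plus a function that maps a batch to a batch."""
    name: str
    process_batch: Callable[[List[VideoTask]], List[VideoTask]]


def run_pipeline(stages: List[Stage], tasks: List[VideoTask], batch_size: int = 2) -> List[VideoTask]:
    """Sequential stand-in for an executor: push each batch through every stage in order."""
    outputs: List[VideoTask] = []
    for start in range(0, len(tasks), batch_size):
        batch = tasks[start:start + batch_size]
        for stage in stages:
            batch = stage.process_batch(batch)
        outputs.extend(batch)
    return outputs


# Example stage (hypothetical): record each video's lowercase file extension.
def add_extension(batch: List[VideoTask]) -> List[VideoTask]:
    for task in batch:
        task.metadata["ext"] = task.path.rsplit(".", 1)[-1].lower()
    return batch


# Example stage (hypothetical): keep only .mp4 files.
def keep_mp4(batch: List[VideoTask]) -> List[VideoTask]:
    return [task for task in batch if task.metadata["ext"] == "mp4"]


pipeline = [Stage("tag_extension", add_extension), Stage("filter_mp4", keep_mp4)]
result = run_pipeline(pipeline, [VideoTask("a.mp4"), VideoTask("b.MOV"), VideoTask("c.mp4")])
print([task.path for task in result])  # ['a.mp4', 'c.mp4']
```

The loop runs everything sequentially in one process. In Curator itself, scheduling stages across workers and scaling them is the executor's responsibility, which this sketch deliberately leaves out.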
## Notes on Modalities and Backends
Video pipelines in Curator run on Ray, using the `XennaExecutor` integration for both streaming and batch execution. Other modalities, such as text and image, additionally rely on RAPIDS and other Curator distributed backends for parts of their workflows. Refer to the modality-specific guides for details.
## Infrastructure Components
The video curation concepts build on NVIDIA NeMo Curator's core infrastructure components, which all modalities (text, image, video, and audio) share. These components include:

- Optimize memory usage for large datasets (topics: partitioning, batching, monitoring)
- Use NVIDIA GPU acceleration for faster data processing (topics: CUDA, RMM, performance)
- Resume interrupted operations on large datasets (topics: checkpoints, recovery, batching)
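
The batching, checkpoint, and recovery topics above can be illustrated together with a small pure-Python sketch of resumable batch processing. This is not NeMo Curator's checkpointing mechanism. The function `process_with_checkpoints`, its JSON checkpoint format, and the `fail_at` interruption hook are hypothetical. The idea it shows is that fixed-size batches bound how much data is handled at once, and recording finished batch indices lets a restarted run skip work it already completed.

```python
import json
import os
import tempfile
from typing import Callable, List, Optional


def process_with_checkpoints(
    items: List[str],
    work: Callable[[List[str]], None],
    checkpoint_path: str,
    batch_size: int = 2,
    fail_at: Optional[int] = None,
) -> List[int]:
    """Process items in batches, recording finished batch indices in a JSON file.

    Returns the batch indices processed during this call. `fail_at` is a
    hypothetical hook that simulates an interruption before that batch index.
    """
    done = set()
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = set(json.load(f))
    processed_now: List[int] = []
    for batch_idx, start in enumerate(range(0, len(items), batch_size)):
        if batch_idx in done:
            continue  # finished in an earlier run, so skip it
        if fail_at is not None and batch_idx == fail_at:
            raise RuntimeError(f"simulated interruption at batch {batch_idx}")
        work(items[start:start + batch_size])  # only this batch is passed to the worker
        done.add(batch_idx)
        processed_now.append(batch_idx)
        # Write to a temporary file, then rename it, so a crash mid-write
        # cannot leave a truncated checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(sorted(done), f)
        os.replace(tmp_path, checkpoint_path)
    return processed_now


# Usage: five items in batches of two gives three batches (indices 0, 1, 2).
seen: List[str] = []
items = [f"video_{i}.mp4" for i in range(5)]
with tempfile.TemporaryDirectory() as tmp:
    ckpt = os.path.join(tmp, "progress.json")
    try:
        process_with_checkpoints(items, seen.extend, ckpt, fail_at=2)
    except RuntimeError:
        pass  # the first run stops after finishing batches 0 and 1
    resumed = process_with_checkpoints(items, seen.extend, ckpt)
print(resumed)  # [2]
```

Checkpointing after every batch trades a little extra I/O for losing at most one batch of work when a run is interrupted.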