> Essential concepts for video data curation including distributed processing, pipeline stages, and execution modes

# Video Curation Concepts

This document covers the essential concepts for video data curation in NVIDIA NeMo Curator. It assumes basic familiarity with data science and machine learning principles.

## Core Concept Areas

Video curation in NVIDIA NeMo Curator focuses on these key areas:

- Core concepts for distributed processing, the Ray foundation, and auto-scaling
- Stages, pipelines, and execution modes in video curation workflows
- How data moves through the system from ingestion to output

## Notes on Modalities and Backends

Video pipelines in Curator run on Ray with the `XennaExecutor` integration for streaming and batch execution. Other modalities, such as text and image, also use RAPIDS and Curator’s distributed backends in parts of their workflows. Refer to the modality-specific guides for details.
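
To make the difference between streaming and batch execution concrete, here is a minimal plain-Python sketch. It does not use the Curator or `XennaExecutor` API: the `decode` and `embed` stages, the `run_batch` and `run_streaming` helpers, and the file names are all hypothetical, and the sketch only shows the order in which work happens, not Ray scheduling, parallelism, or auto-scaling.

```python
from typing import Callable, Iterable, Iterator, List

Stage = Callable[[str], str]


# Hypothetical stand-ins for real video stages; not part of the Curator API.
def decode(item: str) -> str:
    return item + "|decoded"


def embed(item: str) -> str:
    return item + "|embedded"


def run_batch(items: Iterable[str], stages: List[Stage], log: List[str]) -> List[str]:
    """Batch mode: every item finishes one stage before the next stage starts."""
    current = list(items)
    for stage in stages:
        results = []
        for item in current:
            log.append(stage.__name__)  # record which stage ran, in order
            results.append(stage(item))
        current = results
    return current


def run_streaming(items: Iterable[str], stages: List[Stage], log: List[str]) -> Iterator[str]:
    """Streaming mode: each item flows through all stages as soon as it arrives."""
    for item in items:
        for stage in stages:
            log.append(stage.__name__)  # record which stage ran, in order
            item = stage(item)
        yield item


videos = ["a.mp4", "b.mp4"]  # hypothetical inputs
batch_log: List[str] = []
stream_log: List[str] = []

batch_out = run_batch(videos, [decode, embed], batch_log)
stream_out = list(run_streaming(videos, [decode, embed], stream_log))

print(batch_log)   # stage order under batch execution
print(stream_log)  # stage order under streaming execution
```

Both modes produce the same outputs. What differs is when each stage runs: batch execution holds all intermediate results for a stage before moving on, while streaming execution lets later stages start as soon as earlier ones emit an item.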

## Infrastructure Components

Video curation builds on NVIDIA NeMo Curator's core infrastructure components, which all modalities (text, image, video, and audio) share. These components include:

- Optimize memory usage for large datasets (partitioning, batching, monitoring)
- Leverage NVIDIA GPU acceleration for faster data processing (CUDA, RMM, performance)
- Continue interrupted operations on large datasets (checkpoints, recovery, batching)