Copyright © 2026, NVIDIA Corporation.

Video Curation Concepts

This document covers the essential concepts for video data curation in NVIDIA NeMo Curator. It assumes basic familiarity with data science and machine learning principles.

Core Concept Areas

Video curation in NVIDIA NeMo Curator focuses on these key areas:

Architecture

Core concepts for distributed processing, Ray foundation, and auto-scaling

Key Abstractions

Stages, pipelines, and execution modes in video curation workflows

Data Flow

How data moves through the system from ingestion to output
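
To make the ideas of stages, pipelines, and execution modes more concrete, here is a minimal sketch in plain Python. It does not use the NeMo Curator API: `Clip`, `drop_short`, `split_fixed`, `run_batch`, and `run_streaming` are hypothetical names invented for illustration, and the 10-second segment length is an arbitrary assumption. The sketch only shows the general pattern of chaining stages and running them in either a batch or a streaming fashion.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass
class Clip:
    """Hypothetical unit of work: a video clip path and its duration in seconds."""
    path: str
    duration_s: float


# In this sketch a stage is any function mapping one clip to zero or more clips.
Stage = Callable[[Clip], Iterable[Clip]]


def drop_short(clip: Clip) -> Iterable[Clip]:
    """Filter stage: keep only clips that are at least 10 seconds long."""
    if clip.duration_s >= 10.0:
        yield clip


def split_fixed(clip: Clip) -> Iterable[Clip]:
    """Splitting stage: emit one 10-second segment per full 10 seconds of input."""
    for i in range(int(clip.duration_s // 10)):
        yield Clip(f"{clip.path}#seg{i}", 10.0)


def run_batch(stages: List[Stage], clips: Iterable[Clip]) -> List[Clip]:
    """Batch mode: each stage finishes the whole dataset before the next starts."""
    current = list(clips)
    for stage in stages:
        current = [out for clip in current for out in stage(clip)]
    return current


def _apply(stage: Stage, stream: Iterable[Clip]) -> Iterator[Clip]:
    """Lazily apply one stage to every clip in a stream."""
    for clip in stream:
        yield from stage(clip)


def run_streaming(stages: List[Stage], clips: Iterable[Clip]) -> Iterator[Clip]:
    """Streaming mode: each clip flows through all stages as soon as it is read."""
    stream: Iterable[Clip] = clips
    for stage in stages:
        stream = _apply(stage, stream)
    return iter(stream)


if __name__ == "__main__":
    inputs = [Clip("a.mp4", 25.0), Clip("b.mp4", 5.0)]
    pipeline = [drop_short, split_fixed]
    for clip in run_streaming(pipeline, inputs):
        print(clip.path)
```

Both modes produce the same clips in this toy example. The difference is when the work happens: batch mode holds each stage's full output in memory, while streaming mode keeps only the clips currently in flight.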

Notes on Modalities and Backends

Video pipelines in Curator run on Ray with the XennaExecutor integration for streaming and batch execution. Other modalities, such as text and image, also use RAPIDS and Curator’s distributed backends in parts of their workflows. Refer to the modality-specific guides for details.

Infrastructure Components

The video curation concepts build on NVIDIA NeMo Curator’s core infrastructure components, which all modalities (text, image, video, and audio) share:

Memory Management

Optimize memory usage for large datasets. Tags: partitioning, batching, monitoring.

GPU Acceleration

Leverage NVIDIA GPU acceleration for faster data processing. Tags: CUDA, RMM, performance.

Resumable Processing

Continue interrupted operations on large datasets. Tags: checkpoints, recovery, batching.
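
As a rough illustration of resumable processing, the following sketch records completed items in a checkpoint file after each batch, so a rerun skips work that already finished. It is a generic pattern written in plain Python, not NeMo Curator's implementation: the function name `process_resumable`, the JSON checkpoint format, and the default batch size are all assumptions made for this example.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, List


def process_resumable(
    items: Iterable[str],
    work: Callable[[str], None],
    checkpoint: Path,
    batch_size: int = 2,
) -> List[str]:
    """Process items in batches and return the items handled in this call.

    Completed items are stored in a JSON checkpoint (hypothetical format),
    so a later call with the same checkpoint skips them.
    """
    done = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    pending = [item for item in items if item not in done]
    processed: List[str] = []

    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        for item in batch:
            work(item)  # may raise; the checkpoint then still reflects earlier batches
            processed.append(item)
        done.update(batch)
        # Write to a temporary file and rename it, so an interruption during
        # the write does not leave a truncated checkpoint behind.
        tmp = checkpoint.with_suffix(".tmp")
        tmp.write_text(json.dumps(sorted(done)))
        tmp.replace(checkpoint)

    return processed
```

If a run fails partway through a batch, that batch is repeated on the next run, so `work` should be safe to apply to the same item more than once. Smaller batches reduce repeated work at the cost of more frequent checkpoint writes.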