---
description: >-
  Architecture overview of NeMo Curator's Ray-based distributed video
  curation system with autoscaling
categories:
  - concepts-architecture
tags:
  - architecture
  - distributed
  - ray
  - autoscaling
  - video-curation
  - performance
personas:
  - data-scientist-focused
  - mle-focused
  - admin-focused
difficulty: intermediate
content_type: concept
modality: video-only
only: not ga
---

# Architecture

NeMo Curator's video curation system builds on Ray, a distributed framework for scalable, high-throughput data processing across machine clusters.

## Ray Foundation

NeMo Curator leverages two essential Ray Core capabilities:

* **Distributed Actor Management**: Creates and manages Ray actors across a cluster. Cosmos-Xenna supports per-stage runtime environments; in Curator today, however, per-stage `runtime_env` is not user-configurable through stage specs, and the integration sets only a limited set of executor-level environment variables.
* **Ray Object Store and References**: Uses Ray's object store and data references to reduce data movement and increase throughput.

![Ray Architecture](https://files.buildwithfern.com/nemo-curator.docs.buildwithfern.com/nemo/curator/3fc21d7c5143ec2fdd9672c83b1448d48bb22b6c0cb4c1caf55918c171ee5cee/assets/images/stages-pipelines-diagram.png)

## Execution and Auto Scaling

Curator runs pipelines through an executor. The Cosmos-Xenna executor (`XennaExecutor`) translates `ProcessingStage` definitions into Cosmos-Xenna stage specifications and runs them on Ray in either streaming or batch mode.

During streaming execution, the auto-scaling mechanism:

* Monitors each stage's throughput
* Dynamically adjusts worker allocation
* Optimizes pipeline performance by balancing resources across stages

This dynamic scaling reduces bottlenecks and uses hardware efficiently for large-scale video curation tasks.
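The balancing idea can be illustrated with a small, self-contained sketch. This is *not* Cosmos-Xenna's actual algorithm (which works from live throughput measurements on the cluster); it is a simplified greedy allocation that captures the core intuition: the slowest stage bottlenecks a streaming pipeline, so spare workers go to whichever stage currently has the lowest aggregate throughput.

```python
def allocate_workers(per_worker_throughput, total_workers):
    """Greedy illustration of throughput balancing across pipeline stages.

    per_worker_throughput: items/sec a single worker achieves at each stage
    total_workers: total worker slots available across the cluster
    Returns a worker count per stage.
    """
    num_stages = len(per_worker_throughput)
    workers = [1] * num_stages  # every stage needs at least one worker
    for _ in range(total_workers - num_stages):
        # Aggregate stage throughput = workers * per-worker rate;
        # give the next worker to the slowest stage.
        rates = [w * t for w, t in zip(workers, per_worker_throughput)]
        slowest = rates.index(min(rates))
        workers[slowest] += 1
    return workers

# Three stages; the middle stage is 4x slower per worker than the first,
# so it ends up with the most workers.
print(allocate_workers([8.0, 2.0, 4.0], total_workers=8))  # → [2, 4, 2]
```

In the real system, the executor re-evaluates this balance periodically (see `autoscale_interval_s` below) using observed rather than assumed throughput.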
Key executor configuration options:

* `logging_interval`: Seconds between status logs (default: `60`)
* `ignore_failures`: Continue on failures (default: `False`)
* `execution_mode`: `"streaming"` or `"batch"` (default: `"streaming"`)
* `cpu_allocation_percentage`: CPU allocation ratio (default: `0.95`)
* `autoscale_interval_s`: Auto-scaling interval in seconds; applies only in streaming mode (default: `180`)

Use `Pipeline.describe()` to review stage resources and input/output requirements at a glance during development.
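Collected together, the documented keys and their defaults look like this. How the configuration is passed to `XennaExecutor` (constructor keyword arguments versus a config dict) is an assumption here; check the executor's API reference before relying on a particular form.

```python
# Executor configuration keys and defaults as documented on this page.
# Passing this dict to XennaExecutor is an assumption for illustration.
executor_config = {
    "logging_interval": 60,             # seconds between status logs
    "ignore_failures": False,           # stop the pipeline on stage failure
    "execution_mode": "streaming",      # or "batch"
    "cpu_allocation_percentage": 0.95,  # share of cluster CPUs the executor may use
    "autoscale_interval_s": 180,        # streaming-mode autoscaling cadence
}
```

In batch mode, `autoscale_interval_s` has no effect, since worker allocation is fixed for the run.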