Configure and optimize execution backends to run NeMo Curator pipelines efficiently across single machines, multi-GPU systems, and distributed clusters.
Execution backends (executors) are the engines that run NeMo Curator Pipeline workflows across your compute resources. They handle:
Choosing the right executor impacts:
This guide covers all execution backends available in NeMo Curator and applies to all modalities: text, image, video, and audio curation.
All pipelines follow this standard execution pattern:
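A minimal sketch of the shape of that pattern, using toy stand-in classes rather than the real NeMo Curator API. `ToyPipeline`, `ToyExecutor`, `add_stage`, and `execute` are hypothetical names chosen for illustration only. The assumed flow is: build a pipeline, add stages, create an executor, and hand the executor to the pipeline's run call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyPipeline:
    """Hypothetical stand-in for a pipeline: an ordered list of stages."""
    name: str
    stages: List[Callable[[int], int]] = field(default_factory=list)

    def add_stage(self, stage: Callable[[int], int]) -> "ToyPipeline":
        self.stages.append(stage)
        return self  # returning self allows chained add_stage calls

    def run(self, executor: "ToyExecutor", inputs: List[int]) -> List[int]:
        # The pipeline only describes the work; the executor decides how to run it.
        return executor.execute(self.stages, inputs)


class ToyExecutor:
    """Hypothetical stand-in for an execution backend (runs stages one after another)."""

    def execute(self, stages: List[Callable[[int], int]], inputs: List[int]) -> List[int]:
        results = list(inputs)
        for stage in stages:
            results = [stage(item) for item in results]
        return results


pipeline = ToyPipeline(name="demo")
pipeline.add_stage(lambda x: x + 1).add_stage(lambda x: x * 2)
results = pipeline.run(ToyExecutor(), [1, 2, 3])
print(results)
```

Because the pipeline definition is kept separate from the executor, swapping backends in this sketch means passing a different executor object to `run` without touching the stages.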
Key points:
XennaExecutor (recommended)
XennaExecutor uses Cosmos-Xenna, a Ray-based execution engine optimized for distributed data processing. Xenna provides native streaming support, automatic resource scaling, and built-in fault tolerance. This executor is the recommended choice for most workloads, especially for video and multimodal pipelines.
Key Features:
Configuration Parameters:
For more details, refer to the official NVIDIA Cosmos-Xenna project.
RayActorPoolExecutor
RayActorPoolExecutor uses Ray’s ActorPool for efficient distributed processing with fine-grained resource management. This executor creates pools of Ray actors per stage, enabling better load balancing and fault tolerance through Ray’s native mechanisms. Deduplication workflows automatically use this executor for GPU-accelerated stages.
Key Features:
map_unordered for efficient work distribution across actors
ignore_head_node parameter to reserve the Ray cluster’s head node for coordination tasks only
Configuration Parameters:
For more details, refer to Text Deduplication.
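To illustrate the map_unordered style of work distribution mentioned in the features above, here is a standard-library analogue. It is not Ray code: threads stand in for actors, and the helper `map_unordered` below is a hypothetical function written for this sketch. The idea it shows is that results are consumed in completion order, so a slow task does not hold up results from faster workers.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, Iterator


def process(task: int) -> int:
    """Toy stage logic: square the task value."""
    return task * task


def map_unordered(
    fn: Callable[[int], int], tasks: Iterable[int], num_workers: int = 4
) -> Iterator[int]:
    """Yield results in completion order, not submission order."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(fn, task) for task in tasks]
        for future in as_completed(futures):
            yield future.result()


# Completion order is not deterministic, so sort before comparing or printing.
results = sorted(map_unordered(process, range(6)))
print(results)
```

The trade-off is that downstream code cannot rely on output order matching input order; if order matters, carry an index alongside each task.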
RayDataExecutor
RayDataExecutor uses Ray Data, a scalable data processing library built on Ray Core. Ray Data provides a familiar DataFrame-like API for distributed data transformations. This executor is best suited for large-scale text processing tasks that benefit from Ray Data’s optimized data loading and transformation pipelines.
Key Features:
Constructor Parameters:
Config Dictionary Keys (passed via config={...}):
All three backends support per-stage runtime environments, which allow individual stages to declare isolated Python dependencies. When a stage sets a runtime_env, the backend forwards it to Ray so that each stage’s workers run in a dedicated virtualenv. This enables pipelines where stages require incompatible library versions.
See the Per-Stage Runtime Environments reference for configuration details and examples.
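As a rough sketch of the forwarding idea described above, the snippet below models stages that optionally declare a runtime_env and a backend that copies it into per-stage worker options. `ToyStage` and `worker_options` are hypothetical names, and the package pins are placeholders; the `{"pip": [...]}` layout follows Ray's runtime environment dictionary format. Refer to the Per-Stage Runtime Environments reference for the actual configuration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ToyStage:
    """Hypothetical stage that may declare its own isolated dependencies."""
    name: str
    runtime_env: Optional[Dict[str, Any]] = None


def worker_options(stage: ToyStage) -> Dict[str, Any]:
    """Build the options a backend might hand to Ray for this stage's workers."""
    options: Dict[str, Any] = {"name": stage.name}
    if stage.runtime_env is not None:
        # Forward the stage's environment unchanged; stages without one
        # fall back to the cluster's default environment.
        options["runtime_env"] = stage.runtime_env
    return options


stages = [
    ToyStage("tokenize", runtime_env={"pip": ["example-lib==1.0"]}),  # placeholder pin
    ToyStage("classify", runtime_env={"pip": ["example-lib==2.0"]}),  # incompatible pin
    ToyStage("write"),
]
options = [worker_options(stage) for stage in stages]
print(options)
```

Keeping the environment on the stage rather than on the executor is what lets two stages in one pipeline pin conflicting versions of the same library.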
Ray-based executors provide enhanced scalability and performance for large-scale data processing tasks. These executors are beneficial for:
All executors can deliver strong performance; choose based on your workload requirements:
XennaExecutor: Default for most workloads due to maturity and extensive real-world usage (including video pipelines); supports streaming and batch execution with auto-scaling.
RayActorPoolExecutor: Automatically used for deduplication workflows; provides GPU-accelerated processing with RAFT integration.
RayDataExecutor: Best for batch data transformations using Ray Data’s DataFrame-like API.
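The guidance above can be condensed into a small lookup. This is only a sketch: `suggest_executor` and the workload labels are hypothetical, and the function returns executor names as strings rather than constructing real executor objects.

```python
def suggest_executor(workload: str) -> str:
    """Map a coarse workload label to the executor name suggested in this guide."""
    guidance = {
        "deduplication": "RayActorPoolExecutor",  # used automatically for dedup workflows
        "batch_transform": "RayDataExecutor",     # batch transformations with Ray Data
    }
    # XennaExecutor is the default recommendation for everything else.
    return guidance.get(workload, "XennaExecutor")


for label in ("video", "deduplication", "batch_transform"):
    print(label, "->", suggest_executor(label))
```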