Configure and optimize execution backends to run NeMo Curator pipelines efficiently across single machines, multi-GPU systems, and distributed clusters.
Execution backends (executors) are the engines that run NeMo Curator Pipeline workflows across your compute resources. They handle:
Choosing the right executor impacts:
This guide covers all execution backends available in NeMo Curator and applies to all modalities: text, image, video, and audio curation.
All pipelines follow this standard execution pattern:
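Here is a rough, self-contained sketch of this pattern: define stages, assemble them into a pipeline, choose an executor, and run the pipeline with it. `Stage`, `Pipeline`, and `LocalExecutor` are hypothetical stand-ins that only mimic the shape of the workflow. They are not NeMo Curator's actual classes, and the real `run` signature may differ.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-ins that mimic the *shape* of the pattern only;
# they are not NeMo Curator's real classes or signatures.


@dataclass
class Stage:
    name: str
    fn: Callable[[dict], dict]  # transforms one task (here: a plain dict)


@dataclass
class Pipeline:
    name: str
    stages: List[Stage] = field(default_factory=list)

    def add_stage(self, stage: Stage) -> "Pipeline":
        self.stages.append(stage)
        return self  # allow chaining

    def run(self, executor: "LocalExecutor", tasks: List[dict]) -> List[dict]:
        # The pipeline describes *what* to do; the executor decides *how*.
        return executor.execute(self.stages, tasks)


class LocalExecutor:
    """Hypothetical executor: runs every stage sequentially in-process."""

    def execute(self, stages: List[Stage], tasks: List[dict]) -> List[dict]:
        for stage in stages:
            tasks = [stage.fn(t) for t in tasks]
        return tasks


pipeline = (
    Pipeline(name="demo")
    .add_stage(Stage("lowercase", lambda t: {**t, "text": t["text"].lower()}))
    .add_stage(Stage("count_words", lambda t: {**t, "n_words": len(t["text"].split())}))
)

results = pipeline.run(LocalExecutor(), [{"text": "Hello NeMo Curator"}])
print(results)
```

In this sketch, switching to a different executor leaves the pipeline definition untouched. That separation between the pipeline and the engine that runs it is what lets the same workflow move from a single machine to a cluster.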
Key points:
XennaExecutor (recommended)
XennaExecutor is the production-ready executor that uses Cosmos-Xenna, a Ray-based execution engine optimized for distributed data processing. Xenna provides native streaming support, automatic resource scaling, and built-in fault tolerance. It's the recommended choice for most production workloads, especially for video and multimodal pipelines.
Key Features:
Configuration Parameters:
For more details, refer to the official NVIDIA Cosmos-Xenna project.
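The sketch below shows one way to assemble and sanity-check a configuration dictionary before handing it to XennaExecutor. Only the streaming/batch distinction for `execution_mode` comes from this guide. The other key names (`logging_interval`, `ignore_failures`, `cpu_allocation_percentage`, `autoscale_interval_s`), their example values, and the helper `build_xenna_config` are assumptions for illustration, so verify them against the NeMo Curator API reference.

```python
from typing import Any, Dict

# ASSUMED key names with example values; check the XennaExecutor API reference.
EXAMPLE_CONFIG: Dict[str, Any] = {
    "execution_mode": "streaming",      # "streaming" or "batch"
    "logging_interval": 60,             # assumed: seconds between progress logs
    "ignore_failures": False,           # assumed: whether to continue past failed tasks
    "cpu_allocation_percentage": 0.95,  # assumed: fraction of available CPUs to allocate
    "autoscale_interval_s": 180,        # assumed: seconds between autoscaling decisions
}


def build_xenna_config(**overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: merge overrides onto the example config and sanity-check them."""
    unknown = set(overrides) - set(EXAMPLE_CONFIG)
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    config = {**EXAMPLE_CONFIG, **overrides}
    if config["execution_mode"] not in ("streaming", "batch"):
        raise ValueError("execution_mode must be 'streaming' or 'batch'")
    if not 0.0 < config["cpu_allocation_percentage"] <= 1.0:
        raise ValueError("cpu_allocation_percentage must be in (0, 1]")
    return config


config = build_xenna_config(execution_mode="batch", logging_interval=30)
print(config["execution_mode"], config["logging_interval"])
# With NeMo Curator installed, the dict would then be passed along the lines of
# XennaExecutor(config=config)  (assumed constructor signature).
```

Validating the dictionary up front catches a misspelled key or an invalid mode locally, before any cluster resources are requested.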
RayDataExecutor
RayDataExecutor uses Ray Data, a scalable data processing library built on Ray Core. Ray Data provides a familiar DataFrame-like API for distributed data transformations. This executor is experimental and best suited for large-scale batch processing tasks that benefit from Ray Data's optimized data loading and transformation pipelines.
Key Features:
RayDataExecutor currently has limited configuration options. For more control over execution, consider using XennaExecutor or RayActorPoolExecutor.
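Ray Data's transformations are typically expressed batch-wise, for example with `Dataset.map_batches`, which applies a user function to each batch and distributes the work across the cluster. The following is a local, single-process analogue of that idea using only the standard library. The functions `iter_batches`, `map_batches`, and `add_length` are illustrative and are not Ray or NeMo Curator APIs. Ray Data also passes batches in its own formats, such as dicts of NumPy arrays or pandas DataFrames, so the list-of-rows representation here is a simplification.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]
Batch = List[Row]


def iter_batches(rows: Iterable[Row], batch_size: int) -> Iterator[Batch]:
    """Yield consecutive fixed-size batches (the last one may be shorter)."""
    batch: Batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def map_batches(rows: Iterable[Row], fn: Callable[[Batch], Batch], batch_size: int) -> List[Row]:
    """Local, sequential analogue of a batch-wise map; a real engine runs batches in parallel."""
    out: List[Row] = []
    for batch in iter_batches(rows, batch_size):
        out.extend(fn(batch))
    return out


def add_length(batch: Batch) -> Batch:
    # Working on a whole batch per call lets real engines amortize per-call overhead.
    return [{**row, "length": len(str(row["text"]))} for row in batch]


rows = [{"text": t} for t in ["a", "bb", "ccc", "dddd", "eeeee"]]
result = map_batches(rows, add_length, batch_size=2)
print([r["length"] for r in result])  # [1, 2, 3, 4, 5]
```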
RayActorPoolExecutor
RayActorPoolExecutor uses pools of Ray actors for custom distributed processing patterns such as deduplication.
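An actor is a long-lived worker that keeps state between calls, which is why actor pools suit tasks such as deduplication. The sketch below imitates an actor pool with the standard library's `concurrent.futures`. It is an analogue, not how RayActorPoolExecutor is implemented. Each hypothetical `DedupActor` owns a set of content hashes and processes its calls one at a time. Documents are hash-partitioned so that identical content always reaches the same actor.

```python
import hashlib
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List, Set, Tuple


class DedupActor:
    """Stateful worker that remembers content hashes it has already seen."""

    def __init__(self) -> None:
        self.seen: Set[str] = set()
        # One worker thread per actor means calls run one at a time, in order,
        # mimicking how an actor processes its method calls serially.
        self._mailbox = ThreadPoolExecutor(max_workers=1)

    def _check(self, digest: str) -> bool:
        if digest in self.seen:
            return False  # duplicate
        self.seen.add(digest)
        return True  # first occurrence

    def submit(self, digest: str) -> Future:
        return self._mailbox.submit(self._check, digest)

    def shutdown(self) -> None:
        self._mailbox.shutdown()


def dedup(docs: List[str], num_actors: int = 4) -> List[str]:
    pool = [DedupActor() for _ in range(num_actors)]
    pending: List[Tuple[str, Future]] = []
    for doc in docs:
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        # Hash partitioning: identical documents always reach the same actor.
        actor = pool[int(digest, 16) % num_actors]
        pending.append((doc, actor.submit(digest)))
    kept = [doc for doc, fut in pending if fut.result()]
    for actor in pool:
        actor.shutdown()
    return kept


print(dedup(["a", "b", "a", "c", "b"]))  # ['a', 'b', 'c']
```

Because each actor's state lives with that actor, the workers never have to share or lock a global set of hashes.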
Ray-based executors provide enhanced scalability and performance for large-scale data processing tasks. They’re beneficial for:
Consider Ray executors when:
Recommendation: Use XennaExecutor for production workloads and Ray executors for experimental large-scale processing.
Ray executors emit an experimental warning because their API and performance characteristics may change.
Both options can deliver strong performance; choose based on API fit and maturity:
XennaExecutor: Default for most workloads due to maturity and extensive real-world usage (including video pipelines); supports streaming and batch execution with auto-scaling.
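The following hypothetical helper distills the recommendations in this guide into a small rule of thumb. The `Workload` fields and `choose_executor` are illustrative names, not part of NeMo Curator, and real decisions should also weigh API fit and cluster setup.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    production: bool
    needs_actor_pool_pattern: bool = False  # custom patterns such as deduplication
    ray_data_batch_job: bool = False        # large-scale batch job built around Ray Data


def choose_executor(w: Workload) -> str:
    """Hypothetical rule of thumb based on the recommendations in this guide."""
    if w.needs_actor_pool_pattern:
        return "RayActorPoolExecutor"
    if w.ray_data_batch_job and not w.production:
        return "RayDataExecutor"  # experimental, so keep it out of production
    return "XennaExecutor"  # default for most workloads


print(choose_executor(Workload(production=True)))  # XennaExecutor
```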