NeMo Curator makes it straightforward to allocate CPUs and GPUs across pipeline stages. Each stage declares its own resource requirements, and the executor schedules work accordingly.
This design improves performance on CPU-only stages in pipelines that also use GPUs, because CPU stages no longer block GPU resources.
Every ProcessingStage can specify a Resources object that declares its CPU and GPU needs.
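A per-stage declaration can be sketched as follows. This is a minimal, self-contained illustration and does not import NeMo Curator: the Resources dataclass and the ProcessingStage base class here are stand-ins, the cpus field and the needs_gpu helper are assumptions, and only gpus and gpu_memory_gb are taken from the parameter names used in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Stand-in for the library's Resources object (field set is assumed)."""
    cpus: float = 1.0           # CPU cores per worker (assumed field name)
    gpus: float = 0.0           # GPUs per worker; fractional values allowed
    gpu_memory_gb: float = 0.0  # GPU memory per worker, on a single GPU


class ProcessingStage:
    """Hypothetical minimal base class: each stage carries a declaration."""
    resources: Resources = Resources()  # default: one CPU, no GPU

    @property
    def needs_gpu(self) -> bool:
        # A stage is GPU-bound if it asks for any GPU share or GPU memory.
        return self.resources.gpus > 0 or self.resources.gpu_memory_gb > 0


class TokenizeStage(ProcessingStage):
    resources = Resources(cpus=2.0)  # CPU-only stage


class EmbedStage(ProcessingStage):
    resources = Resources(cpus=1.0, gpus=1.0)  # one full GPU per worker


for stage in (TokenizeStage(), EmbedStage()):
    print(type(stage).__name__, stage.resources, stage.needs_gpu)
```

Keeping the declaration on the stage class means the requirement travels with the stage, so a scheduler can read it without knowing what the stage does.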
When a pipeline runs, the executor reads each stage’s resource declaration and schedules tasks to satisfy those constraints. Stages that need GPUs are placed on GPU-equipped nodes; CPU-only stages can run on any available worker.
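The placement rule described above can be illustrated with a toy eligibility check. This is not the executor's actual scheduling algorithm, only a sketch of the constraint it has to satisfy. The Node type, the eligible_nodes function, and the example cluster are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical worker node with its currently free capacity."""
    name: str
    free_cpus: float
    free_gpus: float


def eligible_nodes(nodes, cpus, gpus=0.0):
    """Return the names of nodes whose free capacity covers the request."""
    return [
        n.name
        for n in nodes
        if n.free_cpus >= cpus and n.free_gpus >= gpus
    ]


cluster = [
    Node("cpu-node-0", free_cpus=32, free_gpus=0),
    Node("gpu-node-0", free_cpus=16, free_gpus=4),
]

# A CPU-only request fits on every node; a GPU request only on the GPU node.
cpu_stage_nodes = eligible_nodes(cluster, cpus=2)
gpu_stage_nodes = eligible_nodes(cluster, cpus=1, gpus=1)
print(cpu_stage_nodes)
print(gpu_stage_nodes)
```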
The most impactful optimization is correctly separating CPU and GPU work. In a mixed pipeline, CPU-only stages (tokenization, text parsing, filtering) should not request GPU resources, which frees GPUs for the inference stages that actually need them.
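The effect of a misdeclared CPU stage can be shown with simple bookkeeping. The sketch below assumes a hypothetical 4-GPU machine and hypothetical worker counts. StageSpec and gpus_reserved are illustrative names, not library API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageSpec:
    """Hypothetical summary of one stage's per-worker request."""
    name: str
    cpus: float
    gpus: float
    workers: int


def gpus_reserved(stages):
    """Total GPUs held by the given stages across all of their workers."""
    return sum(s.gpus * s.workers for s in stages)


TOTAL_GPUS = 4.0  # assumed cluster size for the example

# CPU-only work that wrongly requests a GPU per worker.
misdeclared = [
    StageSpec("tokenize", cpus=1.0, gpus=1.0, workers=2),
    StageSpec("filter", cpus=1.0, gpus=1.0, workers=1),
]

# The same stages declared as CPU-only.
separated = [
    StageSpec("tokenize", cpus=1.0, gpus=0.0, workers=2),
    StageSpec("filter", cpus=1.0, gpus=0.0, workers=1),
]

left_when_misdeclared = TOTAL_GPUS - gpus_reserved(misdeclared)
left_when_separated = TOTAL_GPUS - gpus_reserved(separated)
print("GPUs left for inference (misdeclared):", left_when_misdeclared)
print("GPUs left for inference (separated):", left_when_separated)
```

In this example the misdeclared stages reserve three of the four GPUs without using them, while the CPU-only declarations leave all four for inference.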
Some GPU stages don’t need an entire GPU. You can request a fraction of a GPU with Resources(gpus=0.25), or reserve a specific amount of GPU memory with Resources(gpu_memory_gb=10).
This is useful for inference stages where the model fits in a fraction of GPU memory, allowing you to increase parallelism without requiring more hardware. Note that gpu_memory_gb reserves memory on a single GPU.
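To see how much parallelism a fractional request buys, the arithmetic can be sketched as below. The two helper functions are hypothetical, the 80 GB device size is an assumed example, and how the executor actually converts gpu_memory_gb into a packing decision is not specified here.

```python
import math


def workers_per_gpu_by_fraction(gpus: float) -> int:
    """How many workers fit on one GPU when each requests `gpus` of it."""
    if not 0 < gpus <= 1:
        raise ValueError("expected a fractional request in (0, 1]")
    # Small epsilon guards against floating-point results just below an integer.
    return math.floor(1 / gpus + 1e-9)


def workers_per_gpu_by_memory(gpu_memory_gb: float, device_memory_gb: float) -> int:
    """How many workers fit on one GPU when each reserves `gpu_memory_gb`."""
    if gpu_memory_gb <= 0:
        raise ValueError("expected a positive memory request")
    return int(device_memory_gb // gpu_memory_gb)


# Resources(gpus=0.25): four workers can share a single GPU.
print(workers_per_gpu_by_fraction(0.25))

# Resources(gpu_memory_gb=10) on an assumed 80 GB device: eight workers.
print(workers_per_gpu_by_memory(10, device_memory_gb=80))
```

A memory-based request adapts to the device size, whereas a fixed fraction gives the same worker count per GPU regardless of how much memory the device has.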