
Scaling Up with Ray Resource Allocation


NeMo Curator makes it straightforward to allocate both CPU and GPU resources across pipeline stages. Each stage declares its own resource requirements, and the executor schedules work accordingly.

This design improves the performance of mixed pipelines: CPU-only stages no longer hold GPU resources, so GPUs stay available for the stages that actually need them.

How It Works

Every ProcessingStage can specify a Resources object that declares its CPU and GPU needs:

```python
from nemo_curator.stages.core import ProcessingStage
from nemo_curator.stages.function_definitions import processing_stage
from nemo_curator.stages.resources import Resources
from nemo_curator.tasks import DocumentBatch


class TokenizerStage(ProcessingStage[DocumentBatch, DocumentBatch]):
    name: str = "TokenizerStage"
    resources: Resources = Resources(cpus=1.0)  # CPU-only: no GPU needed

    def __init__(self):
        super().__init__()
        # ... stage logic ...


class ModelStage(ProcessingStage[DocumentBatch, DocumentBatch]):
    name: str = "ModelStage"

    def __init__(self, model_path: str):
        super().__init__()
        # ... stage logic ...


@processing_stage(name="custom_filter", resources=Resources(cpus=1))
def custom_filter_stage(task: DocumentBatch) -> DocumentBatch:
    # ... filter logic ...
    return task


model_stage = ModelStage(model_path="path/to/model").with_(resources=Resources(gpus=1))
```

When a pipeline runs, the executor reads each stage’s resource declaration and schedules tasks to satisfy those constraints. Stages that need GPUs are placed on GPU-equipped nodes; CPU-only stages can run on any available worker.
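The placement logic described above can be illustrated with a small self-contained sketch. This is plain Python, not NeMo Curator's actual executor: the `Resources`, `Worker`, and `schedule` names here are illustrative stand-ins that mimic how a scheduler can read per-stage resource declarations and place GPU stages only on GPU-equipped workers.

```python
# Toy illustration (not NeMo Curator internals): place each stage on a worker
# that satisfies its declared resource constraint.
from dataclasses import dataclass


@dataclass
class Resources:
    cpus: float = 0.0
    gpus: float = 0.0


@dataclass
class Worker:
    name: str
    has_gpu: bool


def schedule(stages: dict[str, Resources], workers: list[Worker]) -> dict[str, str]:
    """Assign each stage to the first worker that satisfies its GPU constraint."""
    placement = {}
    for stage_name, res in stages.items():
        for w in workers:
            if res.gpus > 0 and not w.has_gpu:
                continue  # GPU stage: skip CPU-only workers
            placement[stage_name] = w.name
            break
    return placement


stages = {
    "tokenizer": Resources(cpus=1.0),        # CPU-only, runs anywhere
    "model": Resources(cpus=1.0, gpus=1.0),  # must land on a GPU node
}
workers = [Worker("cpu-node", has_gpu=False), Worker("gpu-node", has_gpu=True)]
print(schedule(stages, workers))  # tokenizer -> cpu-node, model -> gpu-node
```

The key point the sketch captures: because the tokenizer declares `gpus=0.0`, it is eligible for any worker, while the model stage is restricted to GPU nodes.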

Key Concepts

CPU-Only vs. GPU Stages

The most impactful optimization is correctly separating CPU and GPU work. In a mixed pipeline, CPU-only stages (tokenization, text parsing, filtering) should not request GPU resources — this frees GPUs for inference stages that actually need them:

```python
# CPU-only: tokenization, filtering, I/O
# Runs on any worker, doesn't block GPU resources
@processing_stage(name="tokenizer", resources=Resources(cpus=1))
def tokenizer_stage(task: DocumentBatch) -> DocumentBatch:
    return task


# GPU: model inference, embeddings
# Scheduled only on GPU-equipped nodes
model_stage = ModelStage(model_path="path/to/model").with_(resources=Resources(gpus=1))
```

Fractional GPU Allocation

Some GPU stages don’t need an entire GPU. You can use fractional allocation via Resources(gpus=0.25) or reserve a specific amount of GPU memory with Resources(gpu_memory_gb=10):

```python
# 4 workers share one GPU via fractional allocation
model_stage = ModelStage(model_path="path/to/model").with_(resources=Resources(gpus=0.25))

# Or reserve a specific amount of GPU memory
model_stage = ModelStage(model_path="path/to/model").with_(resources=Resources(gpu_memory_gb=10))
```

This is useful for inference stages where the model fits in a fraction of GPU memory, allowing you to increase parallelism without requiring more hardware. Note that gpu_memory_gb reserves memory on a single GPU; it does not aggregate memory across multiple GPUs.
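The parallelism gain is simple arithmetic, sketched below in plain Python (not a NeMo Curator API). The 40 GB capacity is an assumed example value, e.g. an A100-40GB.

```python
# How many model workers fit on one GPU under each allocation scheme?


def workers_per_gpu(gpus_per_worker: float) -> int:
    # Fractional allocation: gpus=0.25 means 4 workers per physical GPU.
    return int(1.0 / gpus_per_worker)


def workers_by_memory(gpu_capacity_gb: float, gpu_memory_gb: float) -> int:
    # Memory reservation: workers fit until the GPU's memory is exhausted.
    return int(gpu_capacity_gb // gpu_memory_gb)


print(workers_per_gpu(0.25))      # 4 workers share one GPU
print(workers_by_memory(40, 10))  # 4 x 10 GB reservations on a 40 GB GPU
```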

Best Practices

  • Start with defaults. Most stages have sensible default resource declarations. Override only when you observe resource contention or underutilization.
  • Separate CPU and GPU stages. This is the single highest-impact optimization — it allows the executor to parallelize across heterogeneous hardware.
  • Profile before tuning. Use Ray Dashboard or stage performance stats to identify bottlenecks before adjusting allocations.
  • Match hardware to workload. If your pipeline is mostly CPU-bound (text filtering), you may not need GPU nodes at all.