Resources | NeMo Curator

The Resources dataclass defines compute requirements for processing stages.

Import

from nemo_curator.stages.resources import Resources

Class Definition

from dataclasses import dataclass

@dataclass
class Resources:
    """Define compute requirements for a stage.

    Attributes:
        cpus: Number of CPU cores (default: 1.0).
        gpu_memory_gb: GPU memory in GB for single-GPU stages (default: 0.0).
        gpus: Number of full GPUs (1 or more) for GPU stages (default: 0.0).
    """

    cpus: float = 1.0
    gpu_memory_gb: float = 0.0
    gpus: float = 0.0

Properties

`requires_gpu`

Check if any GPU resources are requested.

@property
def requires_gpu(self) -> bool:
    """Returns True if any GPU resources are requested (gpus or gpu_memory_gb)."""
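
The property body is not shown above. A minimal sketch of the check it describes, using a stand-in dataclass (ResourcesSketch is a hypothetical name, not part of NeMo Curator) and assuming the check is simply whether either GPU field is positive:

```python
from dataclasses import dataclass


@dataclass
class ResourcesSketch:
    """Hypothetical stand-in mirroring the documented fields of Resources."""

    cpus: float = 1.0
    gpu_memory_gb: float = 0.0
    gpus: float = 0.0

    @property
    def requires_gpu(self) -> bool:
        # Assumption: a stage needs a GPU if either GPU field is positive.
        return self.gpus > 0 or self.gpu_memory_gb > 0


cpu_only = ResourcesSketch(cpus=4.0)
partial_gpu = ResourcesSketch(gpu_memory_gb=16.0)
full_gpus = ResourcesSketch(gpus=2.0)
```

Under this assumption, only cpu_only reports that no GPU is required.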

Usage Examples

CPU-Only Stage

# Default: 1 CPU core
resources = Resources()

# Multiple CPU cores
resources = Resources(cpus=4.0)

Single-GPU Stage

Use gpu_memory_gb for stages that need a fraction of a GPU:

# Request 16GB of GPU memory
resources = Resources(
    cpus=4.0,
    gpu_memory_gb=16.0,
)

The system automatically calculates the GPU fraction based on available GPU memory.
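
The exact calculation is not documented here. As a rough illustration, assuming the fraction is the requested memory divided by the total memory of one device (the helper gpu_fraction and the 80 GB device size are hypothetical, chosen only for the example):

```python
def gpu_fraction(requested_gb: float, total_gb: float) -> float:
    """Hypothetical helper: share of one GPU implied by a memory request."""
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    if requested_gb < 0:
        raise ValueError("requested_gb must be non-negative")
    if requested_gb > total_gb:
        raise ValueError("request exceeds the memory of a single GPU")
    return requested_gb / total_gb


# A 16 GB request on an assumed 80 GB device is one fifth of the GPU.
fraction = gpu_fraction(16.0, 80.0)
```

Under this assumption, several stages with small memory requests could be scheduled onto the same device.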

Multi-GPU Stage

Use gpus for stages that need one or more full GPUs:

# Request 2 full GPUs
resources = Resources(
    cpus=8.0,
    gpus=2.0,
)

Important Constraints

You cannot specify both gpus and gpu_memory_gb. Choose one:

Use gpu_memory_gb for stages that need less than one full GPU.
Use gpus for stages that need one or more full GPUs.

# ❌ Invalid - cannot specify both
resources = Resources(gpus=1.0, gpu_memory_gb=16.0)

# ✅ Valid - use gpu_memory_gb for partial GPU
resources = Resources(gpu_memory_gb=16.0)

# ✅ Valid - use gpus for full GPUs
resources = Resources(gpus=2.0)
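
How the invalid combination is rejected is not shown here. One way such a rule could be enforced is a check in __post_init__. The sketch below uses a hypothetical stand-in class (ValidatedResources) and assumes a ValueError is raised, which may differ from what the library actually does:

```python
from dataclasses import dataclass


@dataclass
class ValidatedResources:
    """Hypothetical stand-in, not the NeMo Curator class."""

    cpus: float = 1.0
    gpu_memory_gb: float = 0.0
    gpus: float = 0.0

    def __post_init__(self) -> None:
        # Assumed rule: the two GPU fields are mutually exclusive.
        if self.gpus > 0 and self.gpu_memory_gb > 0:
            raise ValueError("Specify either gpus or gpu_memory_gb, not both.")


def is_valid(**kwargs: float) -> bool:
    """Return True if the stand-in accepts the given field values."""
    try:
        ValidatedResources(**kwargs)
    except ValueError:
        return False
    return True


both = is_valid(gpus=1.0, gpu_memory_gb=16.0)
partial = is_valid(gpu_memory_gb=16.0)
full = is_valid(gpus=2.0)
```

Checking at construction time means a misconfigured stage would fail when it is defined rather than when the pipeline is scheduled.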

Using Resources with Stages

from dataclasses import dataclass, field
from nemo_curator.stages.base import ProcessingStage
from nemo_curator.stages.resources import Resources
from nemo_curator.tasks import DocumentBatch

@dataclass
class GPUClassifierStage(ProcessingStage[DocumentBatch, DocumentBatch]):
    name: str = "GPUClassifier"
    # default_factory gives each stage instance its own Resources object
    resources: Resources = field(
        default_factory=lambda: Resources(cpus=4.0, gpu_memory_gb=16.0)
    )

    def process(self, task: DocumentBatch) -> DocumentBatch:
        # GPU-accelerated classification
        ...

Configuring Resources at Runtime

Use with_() to override resource configurations:

stage = GPUClassifierStage()

# Override with more resources
high_resource_stage = stage.with_(
    resources=Resources(cpus=8.0, gpu_memory_gb=32.0)
)
