API Reference

Resources

The Resources dataclass defines compute requirements for processing stages.

Import

from nemo_curator.stages.resources import Resources

Class Definition

from dataclasses import dataclass

@dataclass
class Resources:
    """Define compute requirements for a stage.

    Attributes:
        cpus: Number of CPU cores (default: 1.0).
        gpu_memory_gb: GPU memory in GB for single-GPU stages (default: 0.0).
        entire_gpu: Allocate entire GPU regardless of memory (default: False).
        gpus: Number of GPUs for multi-GPU stages (default: 0.0).
    """

    cpus: float = 1.0
    gpu_memory_gb: float = 0.0
    entire_gpu: bool = False
    gpus: float = 0.0

Properties

requires_gpu

Check if any GPU resources are requested.

@property
def requires_gpu(self) -> bool:
    """Returns True if any GPU resources are requested (gpus, gpu_memory_gb, or entire_gpu)."""
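
As a rough illustration of the check described above, the sketch below uses a local stand-in dataclass (ResourcesSketch, a hypothetical name that is not part of nemo_curator) with the documented fields and defaults. It assumes the property is a simple "any GPU field set" test; the library's actual implementation may differ.

```python
from dataclasses import dataclass


@dataclass
class ResourcesSketch:
    """Hypothetical stand-in mirroring the documented fields and defaults."""

    cpus: float = 1.0
    gpu_memory_gb: float = 0.0
    entire_gpu: bool = False
    gpus: float = 0.0

    @property
    def requires_gpu(self) -> bool:
        # Assumption: any non-zero GPU field, or entire_gpu, counts as a GPU request.
        return self.gpus > 0 or self.gpu_memory_gb > 0 or self.entire_gpu


cpu_only = ResourcesSketch(cpus=4.0)
partial_gpu = ResourcesSketch(gpu_memory_gb=16.0)
whole_gpu = ResourcesSketch(entire_gpu=True)
```

Under these assumptions, only cpu_only reports that no GPU is needed.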

Usage Examples

CPU-Only Stage

# Default: 1 CPU core
resources = Resources()

# Multiple CPU cores
resources = Resources(cpus=4.0)

Single-GPU Stage

Use gpu_memory_gb for stages that need a fraction of a GPU:

# Request 16 GB of GPU memory
resources = Resources(
    cpus=4.0,
    gpu_memory_gb=16.0,
)

The system automatically calculates the GPU fraction based on available GPU memory.
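
To make the fraction calculation concrete, here is a minimal sketch. The helper gpu_fraction is hypothetical, and so is the assumption that the fraction equals the requested memory divided by one GPU's total memory; this page does not specify the formula the scheduler uses.

```python
def gpu_fraction(requested_gb: float, total_gpu_memory_gb: float) -> float:
    """Hypothetical helper: share of one GPU implied by a memory request."""
    if total_gpu_memory_gb <= 0:
        raise ValueError("total_gpu_memory_gb must be positive")
    if requested_gb < 0:
        raise ValueError("requested_gb must be non-negative")
    # Assumption: a plain ratio, capped at one full GPU.
    return min(requested_gb / total_gpu_memory_gb, 1.0)


# A 16 GB request on an 80 GB GPU works out to one fifth of the device.
fraction = gpu_fraction(16.0, 80.0)
```

Under this assumption, the same gpu_memory_gb value maps to a larger fraction on GPUs with less memory, so the number of stage workers that fit on one device depends on the hardware.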

Multi-GPU Stage

Use gpus for stages that need one or more full GPUs:

# Request 2 full GPUs
resources = Resources(
    cpus=8.0,
    gpus=2.0,
)

Entire GPU Allocation

Use entire_gpu=True to allocate a full GPU regardless of memory:

resources = Resources(cpus=4.0, entire_gpu=True)

Important Constraints

You cannot specify both gpus and gpu_memory_gb. Choose one:

  • Use gpu_memory_gb for single-GPU stages (< 1 GPU)
  • Use gpus for multi-GPU stages (>= 1 GPU)

# ❌ Invalid - cannot specify both
resources = Resources(gpus=1.0, gpu_memory_gb=16.0)

# ✅ Valid - use gpu_memory_gb for partial GPU
resources = Resources(gpu_memory_gb=16.0)

# ✅ Valid - use gpus for full GPUs
resources = Resources(gpus=2.0)
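
One way such a constraint could be enforced is a check at construction time. The sketch below is an assumption, not the library's code: ValidatedResources and is_valid are hypothetical names, and the choice of ValueError is illustrative, since this page does not state which exception is raised.

```python
from dataclasses import dataclass


@dataclass
class ValidatedResources:
    """Hypothetical stand-in; not the nemo_curator class."""

    cpus: float = 1.0
    gpu_memory_gb: float = 0.0
    entire_gpu: bool = False
    gpus: float = 0.0

    def __post_init__(self) -> None:
        # Assumption: the conflict is detected when the object is created
        # and reported as a ValueError.
        if self.gpus > 0 and self.gpu_memory_gb > 0:
            raise ValueError("Specify either gpus or gpu_memory_gb, not both.")


def is_valid(**kwargs: float) -> bool:
    """Return True if the stand-in accepts the given resource request."""
    try:
        ValidatedResources(**kwargs)
    except ValueError:
        return False
    return True
```

With this stand-in, is_valid(gpus=1.0, gpu_memory_gb=16.0) returns False, while either field on its own is accepted.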

Using Resources with Stages

from dataclasses import dataclass, field
from nemo_curator.stages.base import ProcessingStage
from nemo_curator.stages.resources import Resources
from nemo_curator.tasks import DocumentBatch

@dataclass
class GPUClassifierStage(ProcessingStage[DocumentBatch, DocumentBatch]):
    name: str = "GPUClassifier"
    # default_factory gives each stage instance its own Resources object
    resources: Resources = field(
        default_factory=lambda: Resources(cpus=4.0, gpu_memory_gb=16.0)
    )

    def process(self, task: DocumentBatch) -> DocumentBatch:
        # GPU-accelerated classification
        ...

Configuring Resources at Runtime

Use with_() to override resource configurations:

stage = GPUClassifierStage()

# Override with more resources
high_resource_stage = stage.with_(
    resources=Resources(cpus=8.0, gpu_memory_gb=32.0)
)
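
This page does not say whether with_() modifies the stage in place or returns a modified copy; the assignment to a new name suggests a copy. Under that assumption, the pattern resembles dataclasses.replace from the standard library. The sketch below uses hypothetical stand-ins (ResourceSpec and StageSketch) rather than the nemo_curator classes.

```python
from dataclasses import dataclass, field, replace


@dataclass
class ResourceSpec:
    """Hypothetical stand-in for Resources (fields trimmed for brevity)."""

    cpus: float = 1.0
    gpu_memory_gb: float = 0.0


@dataclass
class StageSketch:
    """Hypothetical stand-in for a stage; not a ProcessingStage subclass."""

    name: str = "GPUClassifier"
    resources: ResourceSpec = field(
        default_factory=lambda: ResourceSpec(cpus=4.0, gpu_memory_gb=16.0)
    )


base = StageSketch()
# replace() builds a new object and leaves `base` untouched; whether with_()
# behaves the same way is an assumption.
bigger = replace(base, resources=ResourceSpec(cpus=8.0, gpu_memory_gb=32.0))
```

Keeping the original object unchanged makes it possible to reuse one stage definition with different resource settings in different pipelines.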
