nemo_curator.backends.experimental.utils
Module Contents
Classes
Functions
Data
API
Bases: enum.Enum
String enum of different flags that define keys inside ray_stage_spec.
Ray remote function to execute setup_on_node for a stage.
This runs as a Ray remote task, not an actor. vLLM’s auto-detection forces the spawn multiprocessing method only inside Ray actors, so in a Ray task vLLM defaults to fork and fails with RuntimeError: Cannot re-initialize CUDA in forked subprocess. To prevent this, the function explicitly sets the environment variable to spawn before running setup.
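The override described above can be sketched as follows. This is a minimal illustration, not the module's actual code: the helper name run_setup_with_spawn and its setup_fn parameter are hypothetical, and the variable name VLLM_WORKER_MULTIPROC_METHOD is an assumption, since the text here does not name the environment variable it sets.

```python
import os
from typing import Callable


def run_setup_with_spawn(setup_fn: Callable[[], None]) -> None:
    """Force the spawn start method, then run a stage's node-level setup.

    Hypothetical helper: in a real pipeline this body would live inside a
    function decorated with ray.remote so that it runs as a Ray task.
    """
    # Assumption: this is the variable vLLM reads to choose its worker
    # multiprocessing method. It must be set before vLLM starts workers.
    os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"
    setup_fn()
```

Setting the variable inside the task body, rather than in the driver process, means the value is in place in the process that actually runs the setup.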
Execute setup on node for a stage.
Get available CPU and GPU resources from Ray.
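As a rough sketch of what such a lookup might involve, the helper below extracts CPU and GPU counts from a resource mapping of the shape that ray.available_resources() returns. The name summarize_resources is hypothetical and is not the function documented here; it takes the mapping as an argument so that it can be tried without a running cluster.

```python
def summarize_resources(resources: dict) -> tuple:
    """Return (cpus, gpus) from a Ray-style resource mapping.

    Hypothetical helper. Ray reports resources under the keys "CPU" and
    "GPU"; a key is absent when the cluster has none of that resource,
    so both default to 0.
    """
    cpus = float(resources.get("CPU", 0))
    gpus = float(resources.get("GPU", 0))
    return cpus, gpus
```

With a connected cluster, one would pass ray.available_resources() for currently free resources, or ray.cluster_resources() for the cluster totals.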
Get the head node ID from the Ray cluster, with lazy evaluation and caching.
Returns: str | None
The head node ID if a head node exists, otherwise None.
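The lazy lookup with caching might be sketched as below. Everything here is illustrative: the function names and the module-level cache are hypothetical, and identifying the head node by the "node:__internal_head__" resource is an assumption about how Ray marks it, not something stated in this reference. Node records are assumed to have the shape returned by ray.nodes() ("NodeID", "Alive", "Resources").

```python
from typing import Any, Callable, Optional

# Assumption: Ray attaches this marker resource to the head node.
HEAD_RESOURCE = "node:__internal_head__"

# Cache for the lookup result; filled on first successful call.
_cached_head_node_id: Optional[str] = None


def find_head_node_id(nodes: list) -> Optional[str]:
    """Scan ray.nodes()-style records for a live node with the head marker."""
    for node in nodes:
        if node.get("Alive") and HEAD_RESOURCE in node.get("Resources", {}):
            return node["NodeID"]
    return None


def get_head_node_id(list_nodes: Callable[[], Any]) -> Optional[str]:
    """Return the head node ID, querying the cluster only when needed.

    list_nodes is injected (for example, ray.nodes) so the sketch can be
    exercised without a cluster. A None result is not cached, so a later
    call retries the lookup.
    """
    global _cached_head_node_id
    if _cached_head_node_id is None:
        _cached_head_node_id = find_head_node_id(list_nodes())
    return _cached_head_node_id
```

Under these assumptions, checking whether a given node is the head node reduces to comparing its ID with the cached value.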
Get the worker metadata and node id from the runtime context.
Check if a node is the head node.