---
layout: overview
slug: nemo-curator/nemo_curator/backends/experimental/utils
title: nemo_curator.backends.experimental.utils
---

## Module Contents

### Classes

| Name | Description |
| ---- | ----------- |
| [`RayStageSpecKeys`](#nemo_curator-backends-experimental-utils-RayStageSpecKeys) | String enum of flags that define keys inside `ray_stage_spec`. |

### Functions

| Name | Description |
| ---- | ----------- |
| [`_setup_stage_on_node`](#nemo_curator-backends-experimental-utils-_setup_stage_on_node) | Ray remote function to execute `setup_on_node` for a stage. |
| [`execute_setup_on_node`](#nemo_curator-backends-experimental-utils-execute_setup_on_node) | Execute `setup_on_node` for each stage. |
| [`get_available_cpu_gpu_resources`](#nemo_curator-backends-experimental-utils-get_available_cpu_gpu_resources) | Get available CPU and GPU resources from Ray. |
| [`get_head_node_id`](#nemo_curator-backends-experimental-utils-get_head_node_id) | Get the head node ID from the Ray cluster, with lazy evaluation and caching. |
| [`get_worker_metadata_and_node_id`](#nemo_curator-backends-experimental-utils-get_worker_metadata_and_node_id) | Get the worker metadata and node ID from the runtime context. |
| [`is_head_node`](#nemo_curator-backends-experimental-utils-is_head_node) | Check whether a node is the head node. |

### Data

[`_HEAD_NODE_ID_CACHE`](#nemo_curator-backends-experimental-utils-_HEAD_NODE_ID_CACHE)

### API

```python
class nemo_curator.backends.experimental.utils.RayStageSpecKeys
```

**Bases:** `enum.Enum`

String enum of flags that define keys inside `ray_stage_spec`.
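As an illustration of the string-enum pattern this class uses, here is a minimal sketch. The member names below are hypothetical, not the actual keys defined by `RayStageSpecKeys`, and the `str` mixin is one common way to make enum values usable directly as dictionary keys:

```python
from enum import Enum


# Hypothetical members for illustration only; the real RayStageSpecKeys
# defines its own set of keys for ray_stage_spec.
class DemoStageSpecKeys(str, Enum):
    IS_ACTOR_STAGE = "is_actor_stage"
    NUM_WORKERS = "num_workers"


# Because each member carries a string value, .value can key a plain dict:
spec = {DemoStageSpecKeys.IS_ACTOR_STAGE.value: True}
```

A backend can then look up `spec[DemoStageSpecKeys.IS_ACTOR_STAGE.value]` without scattering raw string literals through the code.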
```python
nemo_curator.backends.experimental.utils._setup_stage_on_node(
    stage: nemo_curator.stages.base.ProcessingStage,
    node_info: nemo_curator.backends.base.NodeInfo,
    worker_metadata: nemo_curator.backends.base.WorkerMetadata
) -> None
```

Ray remote function to execute `setup_on_node` for a stage.

This runs as a Ray remote task (not an actor). vLLM's auto-detection forces the spawn multiprocessing method only inside Ray actors, not in Ray tasks. Without an override, vLLM defaults to fork in tasks and hits `RuntimeError: Cannot re-initialize CUDA in forked subprocess`. We explicitly set the environment variable to spawn to prevent this.

```python
nemo_curator.backends.experimental.utils.execute_setup_on_node(
    stages: list[nemo_curator.stages.base.ProcessingStage],
    ignore_head_node: bool = False
) -> None
```

Execute `setup_on_node` for each stage.

```python
nemo_curator.backends.experimental.utils.get_available_cpu_gpu_resources(
    init_and_shutdown: bool = False,
    ignore_head_node: bool = False
) -> tuple[int, int]
```

Get available CPU and GPU resources from Ray.

```python
nemo_curator.backends.experimental.utils.get_head_node_id() -> str | None
```

Get the head node ID from the Ray cluster, with lazy evaluation and caching.

**Returns:**

`str | None` — The head node ID if a head node exists, otherwise None.

```python
nemo_curator.backends.experimental.utils.get_worker_metadata_and_node_id() -> tuple[nemo_curator.backends.base.NodeInfo, nemo_curator.backends.base.WorkerMetadata]
```

Get the worker metadata and node ID from the runtime context.

```python
nemo_curator.backends.experimental.utils.is_head_node(
    node: dict[str, typing.Any]
) -> bool
```

Check whether a node is the head node.

```python
nemo_curator.backends.experimental.utils._HEAD_NODE_ID_CACHE = None
```
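The fork-vs-spawn override described for `_setup_stage_on_node` can be sketched without Ray or vLLM installed. Note the env var name `VLLM_WORKER_MULTIPROC_METHOD` is an assumption on this page (the text above only says "the environment variable"), and `force_spawn_start_method` is a hypothetical helper, not part of this module:

```python
import multiprocessing
import os


def force_spawn_start_method() -> str:
    """Sketch of the spawn override (hypothetical helper name).

    "spawn" starts each worker in a fresh interpreter, so CUDA can
    initialize cleanly; "fork" inherits the parent's already-initialized
    CUDA state and raises "Cannot re-initialize CUDA in forked subprocess".
    """
    # Assumption: vLLM consults this variable when choosing how to start
    # its worker processes, and the backend sets it before stage setup.
    os.environ.setdefault("VLLM_WORKER_MULTIPROC_METHOD", "spawn")
    # Sanity check: "spawn" is always an available start method in CPython.
    assert "spawn" in multiprocessing.get_all_start_methods()
    return os.environ["VLLM_WORKER_MULTIPROC_METHOD"]
```

Using `setdefault` rather than an unconditional assignment means a user who deliberately exported a different method is not silently overridden; whether the real backend does the same is not stated on this page.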