# nemo_curator.backends.utils

## Module Contents

### Classes

| Name                                                                | Description                                                              |
| ------------------------------------------------------------------- | ------------------------------------------------------------------------ |
| [`RayStageSpecKeys`](#nemo_curator-backends-utils-RayStageSpecKeys) | String enum of the flags that serve as keys inside `ray_stage_spec`. |

### Functions

| Name                                                                                              | Description                                                                |
| ------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------- |
| [`_logger_custom_deserializer`](#nemo_curator-backends-utils-_logger_custom_deserializer)         | -                                                                          |
| [`_logger_custom_serializer`](#nemo_curator-backends-utils-_logger_custom_serializer)             | -                                                                          |
| [`_setup_stage_on_node`](#nemo_curator-backends-utils-_setup_stage_on_node)                       | Ray remote function to execute setup\_on\_node for a stage.                |
| [`check_total_gpu_capacity`](#nemo_curator-backends-utils-check_total_gpu_capacity)               | Raise if the cluster doesn't have enough GPUs to satisfy aggregate demand. |
| [`execute_setup_on_node`](#nemo_curator-backends-utils-execute_setup_on_node)                     | Execute `setup_on_node` for every stage on every alive Ray node.           |
| [`get_available_cpu_gpu_resources`](#nemo_curator-backends-utils-get_available_cpu_gpu_resources) | Get available CPU and GPU resources from Ray.                              |
| [`get_worker_metadata_and_node_id`](#nemo_curator-backends-utils-get_worker_metadata_and_node_id) | Get the worker metadata and node id from the runtime context.              |
| [`merge_executor_configs`](#nemo_curator-backends-utils-merge_executor_configs)                   | Recursively merge two executor configs with deep merging of nested dicts.  |
| [`register_loguru_serializer`](#nemo_curator-backends-utils-register_loguru_serializer)           | Initialize a new local Ray cluster or connect to an existing one.          |
| [`warn_on_env_var_override`](#nemo_curator-backends-utils-warn_on_env_var_override)               | -                                                                          |

### API

```python
class nemo_curator.backends.utils.RayStageSpecKeys
```

**Bases:** `enum.Enum`

String enum of the flags that serve as keys inside `ray_stage_spec`.

```python
nemo_curator.backends.utils._logger_custom_deserializer(
    _: None
) -> loguru.Logger
```

```python
nemo_curator.backends.utils._logger_custom_serializer(
    _: loguru.Logger
) -> None
```
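The two private helpers above have telling signatures: the serializer takes a `Logger` and returns `None`, and the deserializer takes that `None` and returns a `Logger`. In other words, no logger state crosses the wire, and each worker presumably gets its own process-global loguru logger back. Ray exposes this kind of hook through `ray.util.register_serializer`; whether `register_loguru_serializer` uses exactly that call is an assumption. The stdlib sketch below shows the same "send nothing, rebuild locally" idea with `copyreg` and `pickle` instead of Ray, using a hypothetical `ProcessLogger` class in place of loguru's logger:

```python
import copyreg
import pickle
import threading


class ProcessLogger:
    """Hypothetical stand-in for a process-wide logger such as loguru's."""

    def __init__(self) -> None:
        # Locks cannot be pickled, so naive serialization of this object fails.
        self._lock = threading.Lock()
        self.messages: list[str] = []

    def info(self, msg: str) -> None:
        with self._lock:
            self.messages.append(msg)


# The single logger instance owned by the current process.
LOCAL_LOGGER = ProcessLogger()


def _get_local_logger() -> ProcessLogger:
    # Deserializer side: there is no payload; return this process's logger.
    return LOCAL_LOGGER


def _reduce_logger(_: ProcessLogger):
    # Serializer side: ship no state, only "rebuild by calling _get_local_logger()".
    return (_get_local_logger, ())


# Register the reducer so pickle uses it for every ProcessLogger instance.
copyreg.pickle(ProcessLogger, _reduce_logger)

# Usage: an object graph holding the logger now round-trips through pickle.
payload = {"logger": LOCAL_LOGGER, "batch_id": 7}
restored = pickle.loads(pickle.dumps(payload))
```

Because the deserializer resolves the logger on the receiving side, any sinks or handlers configured in the worker process stay in effect instead of being overwritten by a copy from the driver.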

```python
nemo_curator.backends.utils._setup_stage_on_node(
    stage: nemo_curator.stages.base.ProcessingStage
) -> None
```

Ray remote function to execute setup\_on\_node for a stage.

This function runs as a Ray remote task, not an actor. vLLM's auto-detection forces the `spawn` multiprocessing method only inside Ray actors, not inside Ray tasks, so without an override vLLM defaults to `fork` in a task and fails with `RuntimeError: Cannot re-initialize CUDA in forked subprocess`. To prevent this, the function explicitly sets vLLM's multiprocessing-method environment variable to `spawn`.
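A minimal sketch of the override pattern, assuming the variable in question is vLLM's `VLLM_WORKER_MULTIPROC_METHOD` (the source does not name it). `DummyStage`, `library_start_method`, and `setup_stage_on_node_sketch` are hypothetical, the `@ray.remote` decorator is omitted so the sketch runs without Ray, and `multiprocessing.get_context` stands in for the library choosing how to start its workers:

```python
import multiprocessing
import os

# Assumption: vLLM reads this variable to pick its worker start method.
ENV_VAR = "VLLM_WORKER_MULTIPROC_METHOD"


def library_start_method() -> str:
    # Stand-in for a library that defaults to "fork" unless told otherwise.
    return os.environ.get(ENV_VAR, "fork")


class DummyStage:
    """Hypothetical stage whose setup_on_node would launch worker processes."""

    def setup_on_node(self) -> str:
        ctx = multiprocessing.get_context(library_start_method())
        # A real stage would start processes from ctx; we just report the method.
        return ctx.get_start_method()


def setup_stage_on_node_sketch(stage: DummyStage) -> str:
    # Set the override *before* setup, since the library reads it at start-up time.
    os.environ[ENV_VAR] = "spawn"
    return stage.setup_on_node()
```

The order matters: setting the variable after the library has already created a forked worker would not undo the CUDA re-initialization error.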

```python
nemo_curator.backends.utils.check_total_gpu_capacity(
    gpus_needed: int,
    ignore_head_node: bool = False
) -> None
```

Raise if the cluster doesn't have enough GPUs to satisfy aggregate demand.

Intended as a coarse pre-check before submitting placement groups (PGs): Ray's
PG scheduler can hang indefinitely on `pg.ready()` when demand exceeds
capacity, so failing fast with an explicit error that reports the actual numbers
is friendlier than waiting for a timeout.
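A sketch of the coarse pre-check, assuming a hypothetical `NodeSnapshot` record in place of whatever the real function reads from Ray. The `RuntimeError` type and the message wording are also assumptions; the point is that the comparison is aggregate (total GPUs vs. total demand), not a per-bundle placement check:

```python
from dataclasses import dataclass


@dataclass
class NodeSnapshot:
    """Hypothetical per-node view; the real function would query Ray instead."""

    node_id: str
    num_gpus: int
    is_head: bool = False
    alive: bool = True


def check_total_gpu_capacity_sketch(
    nodes: list[NodeSnapshot], gpus_needed: int, ignore_head_node: bool = False
) -> None:
    # Sum GPUs over alive nodes, optionally excluding the head node.
    available = sum(
        n.num_gpus
        for n in nodes
        if n.alive and not (ignore_head_node and n.is_head)
    )
    if gpus_needed > available:
        # Fail fast with concrete numbers instead of letting pg.ready() hang.
        raise RuntimeError(
            f"Pipeline needs {gpus_needed} GPUs but the cluster has only "
            f"{available} available (ignore_head_node={ignore_head_node})."
        )
```

Passing this check does not guarantee the placement groups will schedule (for example, a bundle may need more GPUs than any single node has), which is why it is described as coarse.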

```python
nemo_curator.backends.utils.execute_setup_on_node(
    stages: list[nemo_curator.stages.base.ProcessingStage],
    ignore_head_node: bool = False
) -> None
```

Execute `setup_on_node` for every stage on every alive Ray node.

All `(stage, node)` setup tasks are submitted up front and awaited with a single
`ray.get`, so total wall-clock time is governed by the slowest task rather than
the sum of per-stage times. This matters when setup is heavy (model downloads,
weight loading) and stages don't contend for the same resources.
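The submit-then-wait pattern can be sketched with the standard library, using `ThreadPoolExecutor` futures as a stand-in for Ray task references and one pass over `Future.result()` in place of a single `ray.get` call. `execute_setup_on_node_sketch` and its `run_setup` callback are hypothetical:

```python
import itertools
from collections.abc import Callable, Sequence
from concurrent.futures import ThreadPoolExecutor
from typing import Any


def execute_setup_on_node_sketch(
    stages: Sequence[str],
    node_ids: Sequence[str],
    run_setup: Callable[[str, str], Any],
) -> list[Any]:
    """Submit every (stage, node) task first, then wait on all of them once."""
    pairs = list(itertools.product(stages, node_ids))
    with ThreadPoolExecutor(max_workers=max(1, len(pairs))) as pool:
        # Phase 1: submit everything; nothing blocks inside this loop.
        futures = [pool.submit(run_setup, stage, node) for stage, node in pairs]
        # Phase 2: block once, so wall-clock is roughly the slowest task.
        return [f.result() for f in futures]
```

Calling `.result()` right after each `submit` would serialize the tasks and bring back the sum-of-times behavior that the single `ray.get` avoids.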

```python
nemo_curator.backends.utils.get_available_cpu_gpu_resources(
    init_and_shutdown: bool = False,
    ignore_head_node: bool = False
) -> tuple[int, int]
```

Get available CPU and GPU resources from Ray.

```python
nemo_curator.backends.utils.get_worker_metadata_and_node_id() -> tuple[nemo_curator.backends.base.NodeInfo, nemo_curator.backends.base.WorkerMetadata]
```

Get the node info and worker metadata from the runtime context, returned as a `(NodeInfo, WorkerMetadata)` tuple.

```python
nemo_curator.backends.utils.merge_executor_configs(
    base_config: dict | None,
    override_config: dict | None
) -> dict
```

Recursively merge two executor configs with deep merging of nested dicts.

**Parameters:**

- `base_config` (`dict | None`): Base configuration dictionary.
- `override_config` (`dict | None`): Configuration to merge on top of `base_config`.

**Returns:** `dict`

Merged configuration dictionary with all nested dicts recursively merged

**Examples:**

```python
>>> from nemo_curator.backends.utils import merge_executor_configs
>>> base = {"runtime_env": {"env_vars": {"A": "1", "B": "2"}}}
>>> override = {"runtime_env": {"env_vars": {"B": "3", "C": "4"}}}
>>> merge_executor_configs(base, override)
{'runtime_env': {'env_vars': {'A': '1', 'B': '3', 'C': '4'}}}
```

```python
nemo_curator.backends.utils.register_loguru_serializer() -> None
```

Initialize a new local Ray cluster or connect to an existing one.

```python
nemo_curator.backends.utils.warn_on_env_var_override(
    existing_config: dict | None,
    merged_config: dict | None
) -> None
```