---
layout: overview
slug: nemo-curator/nemo_curator/core/utils
title: nemo_curator.core.utils
---
## Module Contents
### Functions
| Name | Description |
| ------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------- |
| [`_logger_custom_deserializer`](#nemo_curator-core-utils-_logger_custom_deserializer) | - |
| [`_logger_custom_serializer`](#nemo_curator-core-utils-_logger_custom_serializer) | - |
| [`check_ray_responsive`](#nemo_curator-core-utils-check_ray_responsive) | - |
| [`get_free_port`](#nemo_curator-core-utils-get_free_port) | Checks if start\_port is free. |
| [`init_cluster`](#nemo_curator-core-utils-init_cluster) | Initialize a new local Ray cluster or connect to an existing one. |
| [`split_table_by_group_max_bytes`](#nemo_curator-core-utils-split_table_by_group_max_bytes) | Split an Arrow table by approximate byte size without splitting group rows. |
### API
```python
nemo_curator.core.utils._logger_custom_deserializer(
_: None
) -> loguru.Logger
```
```python
nemo_curator.core.utils._logger_custom_serializer(
_: loguru.Logger
) -> None
```
```python
nemo_curator.core.utils.check_ray_responsive(
timeout_s: int = RAY_CLUSTER_START_VERIFICAT...
) -> bool
```
```python
nemo_curator.core.utils.get_free_port(
start_port: int,
get_next_free_port: bool = True
) -> int
```
Checks whether `start_port` is free and returns it if so.
If it is in use and `get_next_free_port` is True, returns the next free port found from `start_port` upward.
If it is in use and `get_next_free_port` is False, raises an error.
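The port-probing behavior described above can be illustrated with a minimal standard-library sketch. This is not the module's implementation: the name `find_free_port`, the use of `127.0.0.1`, and the `RuntimeError` exception type are all assumptions made for illustration.

```python
import socket


def find_free_port(start_port: int, get_next_free_port: bool = True) -> int:
    """Hypothetical stand-in that mimics the documented behavior."""
    for port in range(start_port, 65536):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                # A successful bind means nothing else holds this port right now.
                sock.bind(("127.0.0.1", port))
            except OSError:
                if not get_next_free_port:
                    # Strict mode: the caller asked for exactly start_port.
                    # (The exception type is an assumption, not the library's.)
                    raise RuntimeError(f"Port {port} is already in use") from None
                continue  # Lenient mode: probe the next port.
            return port
    raise RuntimeError(f"No free port found at or above {start_port}")
```

A probe like this is inherently racy: another process may claim the port between the check and the moment the caller binds it, so callers that need a guarantee should be prepared to retry.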
```python
nemo_curator.core.utils.init_cluster(
ray_port: int,
ray_temp_dir: str,
ray_dashboard_port: int,
ray_metrics_port: int,
ray_client_server_port: int,
ray_dashboard_host: str,
num_gpus: int | None = None,
num_cpus: int | None = None,
object_store_memory: int | None = None,
enable_object_spilling: bool = False,
block: bool = True,
ip_address: str | None = None,
stdouterr_capture_file: str | None = None
) -> subprocess.Popen
```
Initialize a new local Ray cluster or connect to an existing one.
```python
nemo_curator.core.utils.split_table_by_group_max_bytes(
table: pyarrow.Table,
group_column: str,
max_batch_bytes: int | None
) -> list[pyarrow.Table]
```
Split an Arrow table by approximate byte size without splitting group rows.
Each unique value in `group_column` is kept in a single output table.
If a single group exceeds `max_batch_bytes`, it is still emitted as one chunk.
Note: null values in `group_column` are grouped together (consecutive
nulls are not split). Callers should ensure the column is non-nullable
or handle nulls upstream.
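The packing rule described above (groups stay whole, and an oversized group is emitted on its own) can be sketched in plain Python. This is a dependency-free illustration, not the module's implementation: lists of dicts stand in for the Arrow table, and the name `split_rows_by_group`, the `size_of` callback, and the sample rows are hypothetical. It assumes rows of the same group are adjacent and does not cover the `max_batch_bytes=None` case.

```python
from itertools import groupby
from operator import itemgetter


def split_rows_by_group(rows, group_key, max_batch_bytes, size_of):
    """Greedily pack consecutive groups into batches without splitting a group."""
    batches, current, current_bytes = [], [], 0
    # groupby merges adjacent rows with equal keys, so adjacent None keys
    # also end up together in one group.
    for _, group in groupby(rows, key=itemgetter(group_key)):
        group_rows = list(group)
        group_bytes = sum(size_of(row) for row in group_rows)
        # Close the current batch if adding this group would exceed the limit.
        if current and current_bytes + group_bytes > max_batch_bytes:
            batches.append(current)
            current, current_bytes = [], 0
        # A group larger than the limit still lands in a single batch.
        current.extend(group_rows)
        current_bytes += group_bytes
    if current:
        batches.append(current)
    return batches


rows = [
    {"id": "a", "size": 40}, {"id": "a", "size": 40},
    {"id": "b", "size": 30},
    {"id": "c", "size": 200},  # exceeds the limit on its own
    {"id": "d", "size": 10}, {"id": "d", "size": 10},
]
batches = split_rows_by_group(rows, "id", 100, size_of=itemgetter("size"))
print([[row["id"] for row in batch] for batch in batches])
# → [['a', 'a'], ['b'], ['c'], ['d', 'd']]
```

Because the limit is only checked at group boundaries, batch sizes are approximate: a batch can exceed `max_batch_bytes` whenever a single group does.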