# data\_designer.config.utils.io\_helpers

## Module Contents

### Functions

| Name                                                                                                              | Description                                                                                   |
| ----------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------- |
| [`ensure_config_dir_exists`](#data_designerconfigutilsio_helpersensure_config_dir_exists)                         | Create configuration directory if it doesn't exist.                                           |
| [`load_config_file`](#data_designerconfigutilsio_helpersload_config_file)                                         | Load a YAML configuration file.                                                               |
| [`save_config_file`](#data_designerconfigutilsio_helperssave_config_file)                                         | Save configuration to a YAML file.                                                            |
| [`list_processor_names`](#data_designerconfigutilsio_helperslist_processor_names)                                 | Discover processor names from directories and parquet files under the given path.             |
| [`load_processor_dataset`](#data_designerconfigutilsio_helpersload_processor_dataset)                             | Load a processor's output dataset, checking for a directory first then a single parquet file. |
| [`read_parquet_dataset`](#data_designerconfigutilsio_helpersread_parquet_dataset)                                 | Read a parquet dataset from a path.                                                           |
| [`validate_dataset_file_path`](#data_designerconfigutilsio_helpersvalidate_dataset_file_path)                     | Validate that a dataset file path has a valid extension and optionally exists.                |
| [`validate_path_contains_files_of_type`](#data_designerconfigutilsio_helpersvalidate_path_contains_files_of_type) | Validate that a path contains files of a specific type.                                       |
| [`smart_load_dataframe`](#data_designerconfigutilsio_helperssmart_load_dataframe)                                 | Load a dataframe from file if a path is given, otherwise return the dataframe.                |
| [`smart_load_yaml`](#data_designerconfigutilsio_helperssmart_load_yaml)                                           | Return the YAML config as a dict given flexible input types.                                  |
| [`_smart_load_yaml_internal`](#data_designerconfigutilsio_helpers_smart_load_yaml_internal)                       | Internal YAML loader with context to prevent URL recursion on fetched payloads.               |
| [`is_http_url`](#data_designerconfigutilsio_helpersis_http_url)                                                   | Check whether a string is an HTTP or HTTPS URL.                                               |
| [`_maybe_rewrite_url`](#data_designerconfigutilsio_helpers_maybe_rewrite_url)                                     | Rewrite known hosting-provider file-view URLs to raw-content URLs.                            |
| [`_safe_url_for_log`](#data_designerconfigutilsio_helpers_safe_url_for_log)                                       | Return URL without query/fragment for safe logging.                                           |
| [`_maybe_rewrite_github_url`](#data_designerconfigutilsio_helpers_maybe_rewrite_github_url)                       | Rewrite GitHub blob URLs to raw\.githubusercontent.com equivalents.                           |
| [`_maybe_rewrite_huggingface_hub_url`](#data_designerconfigutilsio_helpers_maybe_rewrite_huggingface_hub_url)     | Rewrite Hugging Face Hub blob URLs to raw URL equivalents.                                    |
| [`_raise_for_failed_http_status`](#data_designerconfigutilsio_helpers_raise_for_failed_http_status)               | Raise a ValueError with actionable details for failing HTTP status codes.                     |
| [`_load_config_from_url`](#data_designerconfigutilsio_helpers_load_config_from_url)                               | Fetch a remote YAML/JSON config URL and return the parsed dict.                               |
| [`serialize_data`](#data_designerconfigutilsio_helpersserialize_data)                                             | Serialize data to a string.                                                                   |
| [`_convert_to_serializable`](#data_designerconfigutilsio_helpers_convert_to_serializable)                         | Convert non-JSON-serializable objects to JSON-serializable Python-native types.               |

### Data

[`logger`](#data_designerconfigutilsio_helperslogger)
[`MAX_CONFIG_URL_SIZE_BYTES`](#data_designerconfigutilsio_helpersmax_config_url_size_bytes)
[`VALID_DATASET_FILE_EXTENSIONS`](#data_designerconfigutilsio_helpersvalid_dataset_file_extensions)
[`VALID_CONFIG_FILE_EXTENSIONS`](#data_designerconfigutilsio_helpersvalid_config_file_extensions)

### API

```python
logger = getLogger(...)
```

```python
MAX_CONFIG_URL_SIZE_BYTES
```

```python
VALID_DATASET_FILE_EXTENSIONS
```

```python
VALID_CONFIG_FILE_EXTENSIONS
```

```python
data_designer.config.utils.io_helpers.ensure_config_dir_exists(config_dir: pathlib.Path) -> None
```

Create configuration directory if it doesn't exist.

**Parameters:**

`config_dir`: Directory path to create.
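As a rough illustration of the "create if it doesn't exist" behavior, the sketch below uses only `pathlib`. The function name `ensure_dir_exists_sketch` is hypothetical, and creating missing parent directories is an assumption rather than documented behavior.

```python
import tempfile
from pathlib import Path


def ensure_dir_exists_sketch(config_dir: Path) -> None:
    """Create config_dir if it is missing; do nothing if it already exists."""
    # exist_ok=True makes the call idempotent.
    # parents=True (create intermediate directories) is an assumption.
    config_dir.mkdir(parents=True, exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "nested" / "config"
    ensure_dir_exists_sketch(target)
    ensure_dir_exists_sketch(target)  # repeated call does not raise
    created = target.is_dir()
```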

```python
data_designer.config.utils.io_helpers.load_config_file(file_path: pathlib.Path) -> dict
```

Load a YAML configuration file.

**Parameters:**

`file_path`: Path to the YAML file.

**Returns:**

`dict`

Parsed YAML content as dictionary

**Raises:**

If the file doesn't exist.

If the YAML is malformed.

If the file is empty.

```python
data_designer.config.utils.io_helpers.save_config_file(
    file_path: pathlib.Path,
    config: dict
) -> None
```

Save configuration to a YAML file.

**Parameters:**

`file_path`: Path where the file is saved.

`config`: Configuration dictionary to save.

**Raises:**

If the file cannot be written.

```python
data_designer.config.utils.io_helpers.list_processor_names(processors_outputs_path: pathlib.Path) -> list[str]
```

Discover processor names from directories and parquet files under the given path.
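One way such discovery could work is sketched below with the standard library only. The name `list_processor_names_sketch` is hypothetical, and the details (returning an empty list for a missing path, deduplicating, and sorting) are assumptions, not documented behavior.

```python
import tempfile
from pathlib import Path


def list_processor_names_sketch(processors_outputs_path: Path) -> list[str]:
    """Collect names from subdirectories and from *.parquet file stems."""
    if not processors_outputs_path.is_dir():
        return []  # assumption: a missing path yields no processors
    names = set()
    for entry in processors_outputs_path.iterdir():
        if entry.is_dir():
            names.add(entry.name)  # directory name is the processor name
        elif entry.suffix == ".parquet":
            names.add(entry.stem)  # file name without the .parquet suffix
    return sorted(names)  # assumption: sorted, duplicate-free output


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "dedupe").mkdir()
    (root / "filter.parquet").touch()
    (root / "notes.txt").touch()  # ignored: neither a directory nor parquet
    names = list_processor_names_sketch(root)
```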

```python
data_designer.config.utils.io_helpers.load_processor_dataset(
    processors_outputs_path: pathlib.Path,
    processor_name: str
) -> pandas.DataFrame
```

Load a processor's output dataset, checking for a directory first then a single parquet file.
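The "directory first, then a single parquet file" lookup order can be sketched as a path-resolution step. The helper below is hypothetical and stops short of reading any data; the `<processor_name>.parquet` file naming and the `FileNotFoundError` are assumptions.

```python
import tempfile
from pathlib import Path


def resolve_processor_output_sketch(processors_outputs_path: Path, processor_name: str) -> Path:
    """Return the directory if present, else the single parquet file."""
    directory = processors_outputs_path / processor_name
    if directory.is_dir():
        return directory  # the directory takes precedence
    single_file = processors_outputs_path / f"{processor_name}.parquet"
    if single_file.is_file():
        return single_file
    # Assumption: the real function may raise a different exception type.
    raise FileNotFoundError(f"No output found for processor {processor_name!r}")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "dedupe").mkdir()
    (root / "dedupe.parquet").touch()  # shadowed by the directory above
    (root / "filter.parquet").touch()
    chosen = [resolve_processor_output_sketch(root, name).name for name in ("dedupe", "filter")]
```

The resolved path could then be handed to a parquet reader such as `read_parquet_dataset`, which accepts either a file or a directory.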

```python
data_designer.config.utils.io_helpers.read_parquet_dataset(path: pathlib.Path) -> pandas.DataFrame
```

Read a parquet dataset from a path.

**Parameters:**

`path`: The path to the parquet dataset; it can be either a file or a directory.

**Returns:**

`pandas.DataFrame`

The parquet dataset as a pandas DataFrame.

```python
data_designer.config.utils.io_helpers.validate_dataset_file_path(
    file_path: str | pathlib.Path,
    should_exist: bool = True
) -> pathlib.Path
```

Validate that a dataset file path has a valid extension and optionally exists.

**Parameters:**

`file_path`: The path to validate, either as a string or Path object.

`should_exist`: If True, verify that the file exists. Defaults to True.

**Returns:**

`pathlib.Path`

The validated path as a Path object.

**Raises:**

If the path is not a file.

If the path does not have a valid extension.
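The two checks described above (existence and extension) might look like the following sketch. The function name, the contents of `ASSUMED_VALID_EXTENSIONS`, and both exception types are assumptions; the real `VALID_DATASET_FILE_EXTENSIONS` value is not shown in this reference.

```python
from pathlib import Path
from typing import Union

# Hypothetical stand-in for VALID_DATASET_FILE_EXTENSIONS.
ASSUMED_VALID_EXTENSIONS = {"parquet", "csv", "json", "jsonl"}


def validate_dataset_file_path_sketch(file_path: Union[str, Path], should_exist: bool = True) -> Path:
    """Normalize to Path, then check existence (optional) and extension."""
    path = Path(file_path)
    if should_exist and not path.is_file():
        # Assumption: the exception type is not documented.
        raise FileNotFoundError(f"Path {path} is not a file.")
    extension = path.suffix.lower().lstrip(".")
    if extension not in ASSUMED_VALID_EXTENSIONS:
        raise ValueError(f"Unsupported dataset extension: {path.suffix!r}")
    return path


# should_exist=False skips the filesystem check and validates the name only.
validated = validate_dataset_file_path_sketch("data/train.parquet", should_exist=False)
```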

```python
data_designer.config.utils.io_helpers.validate_path_contains_files_of_type(
    path: str | pathlib.Path,
    file_extension: str
) -> None
```

Validate that a path contains files of a specific type.

**Parameters:**

`path`: The path to validate. Can contain wildcards like `*.parquet`.

`file_extension`: The extension of the files to validate (without the dot, e.g., "parquet").

**Returns:**

`None`

`None` if the path contains files of the specified type; otherwise an error is raised.

**Raises:**

If the path does not contain files of the specified type.
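A minimal sketch of this check using `glob` follows. The function name is hypothetical, and two details are assumptions: a path without wildcards is treated as a directory to search, and the failure is reported as `ValueError`.

```python
import glob
import tempfile
from pathlib import Path
from typing import Union


def validate_path_contains_files_of_type_sketch(path: Union[str, Path], file_extension: str) -> None:
    """Raise if no file with the given extension matches the path."""
    pattern = str(path)
    if not any(char in pattern for char in "*?["):
        # Assumption: a plain path is a directory, so search inside it.
        pattern = str(Path(pattern) / f"*.{file_extension}")
    matches = [m for m in glob.glob(pattern) if m.endswith(f".{file_extension}")]
    if not matches:
        raise ValueError(f"No .{file_extension} files found at {path}")


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "part-0.parquet").touch()
    validate_path_contains_files_of_type_sketch(tmp, "parquet")  # directory form
    validate_path_contains_files_of_type_sketch(f"{tmp}/*.parquet", "parquet")  # wildcard form
    try:
        validate_path_contains_files_of_type_sketch(tmp, "csv")
        csv_found = True
    except ValueError:
        csv_found = False
```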

```python
data_designer.config.utils.io_helpers.smart_load_dataframe(dataframe: str | pathlib.Path | pandas.DataFrame) -> pandas.DataFrame
```

Load a dataframe from file if a path is given, otherwise return the dataframe.

**Parameters:**

`dataframe`: A path to a file or a pandas DataFrame object.

**Returns:**

`pandas.DataFrame`

A pandas DataFrame object.

```python
data_designer.config.utils.io_helpers.smart_load_yaml(yaml_in: str | pathlib.Path | dict) -> dict
```

Return the YAML config as a dict given flexible input types.

**Parameters:**

`yaml_in`: The config as a dict, a YAML string, or a path to a YAML file.

**Returns:**

`dict`

The config as a dict.

```python
data_designer.config.utils.io_helpers._smart_load_yaml_internal(
    yaml_in: str | pathlib.Path | dict,
    *,
    from_url: bool
) -> dict
```

Internal YAML loader with context to prevent URL recursion on fetched payloads.

```python
data_designer.config.utils.io_helpers.is_http_url(value: str) -> bool
```

Check whether a string is an HTTP or HTTPS URL.
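A check like this is commonly written with `urllib.parse`. The sketch below is one plausible version, not the module's actual code; requiring a non-empty host is an assumption.

```python
from urllib.parse import urlparse


def is_http_url_sketch(value: str) -> bool:
    """Return True for strings with an http/https scheme and a host."""
    parsed = urlparse(value)
    # urlparse lowercases the scheme, so "HTTPS://..." is accepted too.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

Under these assumptions, local paths such as `configs/model.yaml` and other schemes such as `ftp://` are rejected, which lets a caller decide between fetching a URL and reading a file.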

```python
data_designer.config.utils.io_helpers._maybe_rewrite_url(url: str) -> str
```

Rewrite known hosting-provider file-view URLs to raw-content URLs.

```python
data_designer.config.utils.io_helpers._safe_url_for_log(url: str) -> str
```

Return URL without query/fragment for safe logging.
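Query strings and fragments often carry tokens or signed parameters, which is why they are dropped before logging. A minimal sketch with `urllib.parse` (the function name is hypothetical):

```python
from urllib.parse import urlsplit, urlunsplit


def safe_url_for_log_sketch(url: str) -> str:
    """Rebuild the URL from scheme, host, and path only."""
    parts = urlsplit(url)
    # Empty strings for query and fragment remove them from the output.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


safe = safe_url_for_log_sketch("https://example.com/config.yaml?token=secret#section")
```

Note that this sketch leaves any `user:password@` prefix in the host portion untouched; whether the real helper strips credentials is not stated here.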

```python
data_designer.config.utils.io_helpers._maybe_rewrite_github_url(url: str) -> str
```

Rewrite GitHub blob URLs to raw\.githubusercontent.com equivalents.

GitHub blob URLs (e.g. [https://github.com/org/repo/blob/main/config.yaml](https://github.com/org/repo/blob/main/config.yaml))
serve HTML pages, not raw file content. This rewrites them so that
downstream fetchers get the actual file.
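The rewrite can be illustrated with a small sketch. It assumes the blob path has the shape `/<org>/<repo>/blob/<ref>/<file path>`, returns any other URL unchanged, and drops query strings; the function name is hypothetical and the real helper may handle more cases.

```python
from urllib.parse import urlsplit


def rewrite_github_blob_url_sketch(url: str) -> str:
    """Map a github.com blob URL to its raw.githubusercontent.com form."""
    parts = urlsplit(url)
    if parts.netloc != "github.com":
        return url  # not a GitHub URL: leave untouched
    segments = parts.path.strip("/").split("/")
    # Expect org / repo / "blob" / ref / at least one path segment.
    if len(segments) < 5 or segments[2] != "blob":
        return url
    org, repo, _blob, *ref_and_path = segments
    return "https://raw.githubusercontent.com/" + "/".join([org, repo, *ref_and_path])


raw_url = rewrite_github_blob_url_sketch("https://github.com/org/repo/blob/main/config.yaml")
```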

```python
data_designer.config.utils.io_helpers._maybe_rewrite_huggingface_hub_url(url: str) -> str
```

Rewrite Hugging Face Hub blob URLs to raw URL equivalents.
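A comparable sketch for Hugging Face Hub URLs is shown below. It assumes that the raw equivalent is obtained by swapping the first `/blob/` path segment for `/raw/` on `huggingface.co`; the real helper may target a different endpoint (for example `/resolve/`), and the function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_hf_blob_url_sketch(url: str) -> str:
    """Swap /blob/ for /raw/ in huggingface.co file-view URLs."""
    parts = urlsplit(url)
    if parts.netloc != "huggingface.co" or "/blob/" not in parts.path:
        return url  # unrelated host or not a file-view URL
    # Assumption: only the first /blob/ segment marks the file view.
    raw_path = parts.path.replace("/blob/", "/raw/", 1)
    return urlunsplit((parts.scheme, parts.netloc, raw_path, parts.query, parts.fragment))


hf_raw_url = rewrite_hf_blob_url_sketch("https://huggingface.co/org/model/blob/main/config.yaml")
```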

```python
data_designer.config.utils.io_helpers._raise_for_failed_http_status(
    url: str,
    response: requests.Response
) -> None
```

Raise a ValueError with actionable details for failing HTTP status codes.
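To show what "actionable details" might mean, the sketch below takes a plain status code instead of a `requests.Response` so that it runs without third-party packages. The hint texts, the `>= 400` threshold, and the function name are all assumptions.

```python
def raise_for_failed_http_status_sketch(url: str, status_code: int, reason: str = "") -> None:
    """Raise ValueError with a hint for 4xx/5xx status codes."""
    if status_code < 400:
        return  # success and redirect codes pass through
    # Hypothetical hints; the real messages are not documented here.
    hints = {
        401: "the URL may require authentication",
        403: "access to the file may be restricted",
        404: "check that the URL points to an existing file",
    }
    hint = hints.get(status_code, "check the URL and try again")
    raise ValueError(f"Failed to fetch {url}: HTTP {status_code} {reason}".rstrip() + f" ({hint})")
```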

```python
data_designer.config.utils.io_helpers._load_config_from_url(url: str) -> dict
```

Fetch a remote YAML/JSON config URL and return the parsed dict.

**Parameters:**

`url`: HTTP(S) URL pointing to a YAML or JSON configuration file.

**Returns:**

`dict`

The parsed configuration as a dictionary.

**Raises:**

If the URL extension is unsupported, the fetch fails,
the response exceeds the size limit, or parsing produces a
non-dict result.

```python
data_designer.config.utils.io_helpers.serialize_data(
    data: dict | list | str | numbers.Number,
    **kwargs
) -> str
```

```python
data_designer.config.utils.io_helpers._convert_to_serializable(obj: typing.Any) -> typing.Any
```

Convert non-JSON-serializable objects to JSON-serializable Python-native types.

**Raises:**

If the object type is not supported for serialization.
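A converter of this kind is typically passed to `json.dumps` as the `default` hook. The sketch below covers a few standard-library types only; the set of supported types, the `TypeError`, and the function name are assumptions, and the real helper may handle other types.

```python
import json
from datetime import date, datetime
from decimal import Decimal
from pathlib import Path
from typing import Any


def convert_to_serializable_sketch(obj: Any) -> Any:
    """Map a few non-JSON types to JSON-friendly Python values."""
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()
    if isinstance(obj, Path):
        return obj.as_posix()
    if isinstance(obj, Decimal):
        return float(obj)
    if isinstance(obj, (set, frozenset)):
        return list(obj)
    # Assumption: the documented error is modeled here as TypeError.
    raise TypeError(f"Type {type(obj).__name__} is not supported for serialization")


payload = {"created": date(2024, 1, 2), "path": Path("out/data.parquet"), "tags": {"a"}}
# json.dumps calls the hook only for objects it cannot encode itself.
encoded = json.dumps(payload, default=convert_to_serializable_sketch)
```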