
Copyright © 2026, NVIDIA Corporation.


data_designer.config.utils.io_helpers


Module Contents

Functions

| Name | Description |
| --- | --- |
| `ensure_config_dir_exists` | Create configuration directory if it doesn’t exist. |
| `load_config_file` | Load a YAML configuration file. |
| `save_config_file` | Save configuration to a YAML file. |
| `list_processor_names` | Discover processor names from directories and parquet files under the given path. |
| `load_processor_dataset` | Load a processor’s output dataset, checking for a directory first then a single parquet file. |
| `read_parquet_dataset` | Read a parquet dataset from a path. |
| `validate_dataset_file_path` | Validate that a dataset file path has a valid extension and optionally exists. |
| `validate_path_contains_files_of_type` | Validate that a path contains files of a specific type. |
| `smart_load_dataframe` | Load a dataframe from file if a path is given, otherwise return the dataframe. |
| `smart_load_yaml` | Return the yaml config as a dict given flexible input types. |
| `_smart_load_yaml_internal` | Internal YAML loader with context to prevent URL recursion on fetched payloads. |
| `is_http_url` | Check whether a string is an HTTP or HTTPS URL. |
| `_maybe_rewrite_url` | Rewrite known hosting-provider file-view URLs to raw-content URLs. |
| `_safe_url_for_log` | Return URL without query/fragment for safe logging. |
| `_maybe_rewrite_github_url` | Rewrite GitHub blob URLs to raw.githubusercontent.com equivalents. |
| `_maybe_rewrite_huggingface_hub_url` | Rewrite Hugging Face Hub blob URLs to raw URL equivalents. |
| `_raise_for_failed_http_status` | Raise a ValueError with actionable details for failing HTTP status codes. |
| `_load_config_from_url` | Fetch a remote YAML/JSON config URL and return the parsed dict. |
| `serialize_data` | (no description provided) |
| `_convert_to_serializable` | Convert non-JSON-serializable objects to JSON-serializable Python-native types. |

Data

logger, MAX_CONFIG_URL_SIZE_BYTES, VALID_DATASET_FILE_EXTENSIONS, VALID_CONFIG_FILE_EXTENSIONS

API

logger = getLogger(...)

MAX_CONFIG_URL_SIZE_BYTES

VALID_DATASET_FILE_EXTENSIONS

VALID_CONFIG_FILE_EXTENSIONS

data_designer.config.utils.io_helpers.ensure_config_dir_exists(config_dir: pathlib.Path) -> None

Create configuration directory if it doesn’t exist.

Parameters:

config_dir
pathlib.Path

Directory path to create
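
The create-if-missing behavior described above can be sketched with the standard library. This is a minimal illustration, not the library's implementation: the name `ensure_dir_exists_sketch` is hypothetical, and creating missing parent directories is an assumption the source does not confirm.

```python
import tempfile
from pathlib import Path


def ensure_dir_exists_sketch(config_dir: Path) -> None:
    """Create config_dir if it is missing; do nothing if it already exists."""
    # Assumption: missing parent directories are created as well.
    config_dir.mkdir(parents=True, exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "nested" / "config"
    ensure_dir_exists_sketch(target)
    ensure_dir_exists_sketch(target)  # second call is a no-op, not an error
    created = target.is_dir()
```

Passing `exist_ok=True` is what makes the call safe to repeat.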

data_designer.config.utils.io_helpers.load_config_file(file_path: pathlib.Path) -> dict

Load a YAML configuration file.

Parameters:

file_path
pathlib.Path

Path to the YAML file

Returns:

dict

Parsed YAML content as dictionary

Raises:

InvalidFilePathError

If file doesn’t exist

InvalidFileFormatError

If YAML is malformed

InvalidConfigError

If file is empty

data_designer.config.utils.io_helpers.save_config_file(
    file_path: pathlib.Path,
    config: dict
) -> None

Save configuration to a YAML file.

Parameters:

file_path
pathlib.Path

Path where to save the file

config
dict

Configuration dictionary to save

Raises:

IOError

If file cannot be written

data_designer.config.utils.io_helpers.list_processor_names(processors_outputs_path: pathlib.Path) -> list[str]

Discover processor names from directories and parquet files under the given path.
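
One plausible reading of this discovery rule is that each subdirectory contributes its name and each parquet file contributes its stem. The sketch below illustrates that reading with the standard library only. The function name `list_names_sketch`, the sorted and de-duplicated result, and the empty list for a missing path are all assumptions, not documented behavior.

```python
import tempfile
from pathlib import Path


def list_names_sketch(outputs_path: Path) -> list[str]:
    """Collect directory names and parquet file stems directly under outputs_path."""
    if not outputs_path.is_dir():
        return []  # assumption: a missing path yields no names
    names = set()
    for entry in outputs_path.iterdir():
        if entry.is_dir():
            names.add(entry.name)
        elif entry.suffix == ".parquet":
            names.add(entry.stem)
    return sorted(names)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "dedupe").mkdir()          # processor stored as a directory
    (root / "scores.parquet").touch()  # processor stored as a single file
    (root / "notes.txt").touch()       # ignored: not a directory or parquet file
    found = list_names_sketch(root)
```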

data_designer.config.utils.io_helpers.load_processor_dataset(
    processors_outputs_path: pathlib.Path,
    processor_name: str
) -> pandas.DataFrame

Load a processor’s output dataset, checking for a directory first then a single parquet file.

data_designer.config.utils.io_helpers.read_parquet_dataset(path: pathlib.Path) -> pandas.DataFrame

Read a parquet dataset from a path.

Parameters:

path
pathlib.Path

The path to the parquet dataset, can be either a file or a directory.

Returns:

pandas.DataFrame

The parquet dataset as a pandas DataFrame.

data_designer.config.utils.io_helpers.validate_dataset_file_path(
    file_path: str | pathlib.Path,
    should_exist: bool = True
) -> pathlib.Path

Validate that a dataset file path has a valid extension and optionally exists.

Parameters:

file_path
str | pathlib.Path

The path to validate, either as a string or Path object.

should_exist
bool (defaults to True)

If True, verify that the file exists. Defaults to True.

Returns:

pathlib.Path

The validated path as a Path object.

Raises:

InvalidFilePathError

If the path is not a file.

InvalidFileFormatError

If the path does not have a valid extension.
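
The two checks and their exceptions can be sketched as follows. This is a hedged illustration rather than the library's code: the exception classes are local stand-ins for the library's `InvalidFilePathError` and `InvalidFileFormatError`, the `ALLOWED_EXTENSIONS` set is hypothetical (the actual value of `VALID_DATASET_FILE_EXTENSIONS` is not shown on this page), and the order of the checks is assumed.

```python
from pathlib import Path

# Hypothetical extension set; the real VALID_DATASET_FILE_EXTENSIONS may differ.
ALLOWED_EXTENSIONS = {".parquet", ".csv", ".json", ".jsonl"}


class InvalidFilePathError(Exception):
    """Local stand-in for the library exception of the same name."""


class InvalidFileFormatError(Exception):
    """Local stand-in for the library exception of the same name."""


def validate_path_sketch(file_path, should_exist: bool = True) -> Path:
    """Return file_path as a Path after checking existence and extension."""
    path = Path(file_path)
    if should_exist and not path.is_file():
        raise InvalidFilePathError(f"Path is not a file: {path}")
    if path.suffix.lower() not in ALLOWED_EXTENSIONS:
        raise InvalidFileFormatError(f"Unsupported extension: {path.suffix!r}")
    return path


# With should_exist=False only the extension is checked.
validated = validate_path_sketch("data/out.parquet", should_exist=False)

try:
    validate_path_sketch("data/out.txt", should_exist=False)
    format_rejected = False
except InvalidFileFormatError:
    format_rejected = True

try:
    validate_path_sketch("no-such-dir/out.parquet")  # should_exist defaults to True
    missing_rejected = False
except InvalidFilePathError:
    missing_rejected = True
```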

data_designer.config.utils.io_helpers.validate_path_contains_files_of_type(
    path: str | pathlib.Path,
    file_extension: str
) -> None

Validate that a path contains files of a specific type.

Parameters:

path
str | pathlib.Path

The path to validate. Can contain wildcards like *.parquet.

file_extension
str

The extension of the files to validate (without the dot, e.g., “parquet”).

Returns:

None

None if the path contains files of the specified type, raises an error otherwise.

Raises:

InvalidFilePathError

If the path does not contain files of the specified type.
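
A rough sketch of this check, covering both a plain directory and a wildcard pattern, might look like the following. The helper name `validate_contains_sketch` and the local `InvalidFilePathError` class are hypothetical stand-ins. How the library expands wildcards or searches directories is not documented here, so the `glob`-based approach is an assumption.

```python
import glob
import tempfile
from pathlib import Path


class InvalidFilePathError(Exception):
    """Local stand-in for the library exception of the same name."""


def validate_contains_sketch(path, file_extension: str) -> None:
    """Raise if path (directory or wildcard pattern) matches no files with the extension."""
    path_str = str(path)
    suffix = f".{file_extension}"
    if any(ch in path_str for ch in "*?["):
        # Wildcard pattern: expand it and keep only matches with the extension.
        matches = [m for m in glob.glob(path_str) if m.endswith(suffix)]
    else:
        # Plain directory: look for files with the extension directly inside it.
        matches = [str(p) for p in Path(path_str).glob(f"*{suffix}")]
    if not matches:
        raise InvalidFilePathError(f"No {suffix} files found at {path_str}")


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "part-0.parquet").touch()
    validate_contains_sketch(tmp, "parquet")                            # directory form
    validate_contains_sketch(str(Path(tmp) / "*.parquet"), "parquet")   # wildcard form
    try:
        validate_contains_sketch(tmp, "csv")
        csv_rejected = False
    except InvalidFilePathError:
        csv_rejected = True
```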

data_designer.config.utils.io_helpers.smart_load_dataframe(dataframe: str | pathlib.Path | pandas.DataFrame) -> pandas.DataFrame

Load a dataframe from file if a path is given, otherwise return the dataframe.

Parameters:

dataframe
str | pathlib.Path | pandas.DataFrame

A path to a file or a pandas DataFrame object.

Returns:

pandas.DataFrame

A pandas DataFrame object.

data_designer.config.utils.io_helpers.smart_load_yaml(yaml_in: str | pathlib.Path | dict) -> dict

Return the yaml config as a dict given flexible input types.

Parameters:

yaml_in
str | pathlib.Path | dict

The config as a dict, yaml string, or yaml file path.

Returns:

dict

The config as a dict.

data_designer.config.utils.io_helpers._smart_load_yaml_internal(
    yaml_in: str | pathlib.Path | dict,
    *,
    from_url: bool
) -> dict

Internal YAML loader with context to prevent URL recursion on fetched payloads.

data_designer.config.utils.io_helpers.is_http_url(value: str) -> bool

Check whether a string is an HTTP or HTTPS URL.
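
A check of this kind can be approximated with `urllib.parse`. The sketch below is illustrative only. The function name `is_http_url_sketch` is hypothetical, and requiring a non-empty host is an assumption about how strict the real check is.

```python
from urllib.parse import urlparse


def is_http_url_sketch(value: str) -> bool:
    """Return True when value parses as an http(s) URL with a host."""
    parsed = urlparse(value)
    # urlparse lowercases the scheme, so "HTTPS://..." is accepted too.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

Under these assumptions, `is_http_url_sketch("https://example.com/config.yaml")` is true, while a local path such as `configs/config.yaml` or an `ftp://` URL is not.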

data_designer.config.utils.io_helpers._maybe_rewrite_url(url: str) -> str

Rewrite known hosting-provider file-view URLs to raw-content URLs.

data_designer.config.utils.io_helpers._safe_url_for_log(url: str) -> str

Return URL without query/fragment for safe logging.
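
Query strings and fragments often carry tokens or signatures, which is presumably why they are dropped before logging. A minimal standard-library sketch of that idea follows. The name `safe_url_sketch` is hypothetical, and the real helper may handle edge cases differently.

```python
from urllib.parse import urlsplit, urlunsplit


def safe_url_sketch(url: str) -> str:
    """Return url with its query string and fragment removed."""
    parts = urlsplit(url)
    # Rebuild the URL from scheme, host, and path only.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


cleaned = safe_url_sketch("https://example.com/config.yaml?token=secret#top")
```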

data_designer.config.utils.io_helpers._maybe_rewrite_github_url(url: str) -> str

Rewrite GitHub blob URLs to raw.githubusercontent.com equivalents.

GitHub blob URLs (e.g. https://github.com/org/repo/blob/main/config.yaml) serve HTML pages, not raw file content. This rewrites them so that downstream fetchers get the actual file.
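
The rewrite can be sketched with a regular expression that moves the owner, repository, and ref-plus-path onto the raw.githubusercontent.com host. This is an illustrative approximation, not the library's code. The name `rewrite_github_blob_sketch` and the exact pattern are assumptions.

```python
import re

# Matches https://github.com/<owner>/<repo>/blob/<ref>/<path>
_GITHUB_BLOB = re.compile(r"^https?://github\.com/([^/]+)/([^/]+)/blob/(.+)$")


def rewrite_github_blob_sketch(url: str) -> str:
    """Rewrite a GitHub blob URL to its raw-content form; return other URLs unchanged."""
    match = _GITHUB_BLOB.match(url)
    if match is None:
        return url
    owner, repo, ref_and_path = match.groups()
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{ref_and_path}"


raw_url = rewrite_github_blob_sketch("https://github.com/org/repo/blob/main/config.yaml")
```

Returning non-matching URLs unchanged is consistent with the `_maybe_` prefix in the function name, though the source does not spell out that behavior.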

data_designer.config.utils.io_helpers._maybe_rewrite_huggingface_hub_url(url: str) -> str

Rewrite Hugging Face Hub blob URLs to raw URL equivalents.
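
By analogy with the GitHub case, a Hub file-view URL can be turned into a raw URL by swapping the `/blob/` path segment for `/raw/`. The sketch below assumes that mapping. The source does not state the exact target form, and the name `rewrite_hf_blob_sketch` is hypothetical.

```python
import re

# Matches https://huggingface.co/<repo path>/blob/<ref>/<file path>
_HF_BLOB = re.compile(r"^(https?://huggingface\.co/.+?)/blob/(.+)$")


def rewrite_hf_blob_sketch(url: str) -> str:
    """Swap the first /blob/ segment for /raw/ on Hub URLs; return other URLs unchanged."""
    match = _HF_BLOB.match(url)
    if match is None:
        return url
    # Assumption: the raw equivalent uses /raw/ in place of /blob/.
    return f"{match.group(1)}/raw/{match.group(2)}"


hf_raw = rewrite_hf_blob_sketch(
    "https://huggingface.co/datasets/org/repo/blob/main/config.yaml"
)
```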

data_designer.config.utils.io_helpers._raise_for_failed_http_status(
    url: str,
    response: requests.Response
) -> None

Raise a ValueError with actionable details for failing HTTP status codes.

data_designer.config.utils.io_helpers._load_config_from_url(url: str) -> dict

Fetch a remote YAML/JSON config URL and return the parsed dict.

Parameters:

url
str

HTTP(S) URL pointing to a YAML or JSON configuration file.

Returns:

dict

The parsed configuration as a dictionary.

Raises:

ValueError

If the URL extension is unsupported, the fetch fails, the response exceeds the size limit, or parsing produces a non-dict result.
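
Three of the failure modes listed above (unsupported extension, oversized response, non-dict result) can be illustrated without any network access. The sketch below covers only JSON payloads and uses hypothetical names and values throughout: `SIZE_LIMIT_BYTES` and `CONFIG_EXTENSIONS` stand in for `MAX_CONFIG_URL_SIZE_BYTES` and `VALID_CONFIG_FILE_EXTENSIONS`, whose real values are not shown on this page.

```python
import json
from pathlib import PurePosixPath
from urllib.parse import urlsplit

SIZE_LIMIT_BYTES = 1_000_000                    # hypothetical limit
CONFIG_EXTENSIONS = {".yaml", ".yml", ".json"}  # hypothetical extension set


def check_extension_sketch(url: str) -> str:
    """Return the URL path's extension, or raise ValueError if it is unsupported."""
    suffix = PurePosixPath(urlsplit(url).path).suffix.lower()
    if suffix not in CONFIG_EXTENSIONS:
        raise ValueError(f"Unsupported config extension: {suffix!r}")
    return suffix


def parse_json_payload_sketch(payload: bytes) -> dict:
    """Enforce the size limit, parse JSON, and require a dict at the top level."""
    if len(payload) > SIZE_LIMIT_BYTES:
        raise ValueError("Response exceeds the size limit")
    parsed = json.loads(payload.decode("utf-8"))
    if not isinstance(parsed, dict):
        raise ValueError("Config must parse to a dict")
    return parsed


config = parse_json_payload_sketch(b'{"columns": []}')

try:
    parse_json_payload_sketch(b"[1, 2, 3]")  # valid JSON, but not a dict
    list_rejected = False
except ValueError:
    list_rejected = True
```

Taking the extension from the URL path rather than the full URL keeps a query string such as `?download=1` from hiding a valid `.json` suffix.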

data_designer.config.utils.io_helpers.serialize_data(
    data: dict | list | str | numbers.Number,
    **kwargs
) -> str

data_designer.config.utils.io_helpers._convert_to_serializable(obj: typing.Any) -> typing.Any

Convert non-JSON-serializable objects to JSON-serializable Python-native types.

Raises:

TypeError

If the object type is not supported for serialization.
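
A converter with this contract fits the `default` hook of `json.dumps`, which is called only for objects the encoder cannot handle natively. The sketch below shows that pattern. The set of supported types is an assumption (the source does not list them), and `to_serializable_sketch` is a hypothetical name.

```python
import json
from datetime import date, datetime
from decimal import Decimal
from pathlib import Path


def to_serializable_sketch(obj):
    """Map a few common non-JSON types to JSON-friendly Python values."""
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()
    if isinstance(obj, Path):
        return str(obj)
    if isinstance(obj, (set, frozenset)):
        return list(obj)
    if isinstance(obj, Decimal):
        return float(obj)
    # Unsupported types surface as TypeError, matching the documented contract.
    raise TypeError(f"Object of type {type(obj).__name__} is not serializable")


encoded = json.dumps(
    {"created": date(2024, 1, 2), "tags": {"a"}},
    default=to_serializable_sketch,
)

try:
    json.dumps({"bad": object()}, default=to_serializable_sketch)
    unsupported_rejected = False
except TypeError:
    unsupported_rejected = True
```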