
Copyright © 2026, NVIDIA Corporation.


data_designer.config.fingerprint

Deterministic content-addressable fingerprint for a workflow config.

The fingerprint identifies the data-relevant portion of a DataDesignerConfig so that two configs producing the same dataset hash to the same value. Configs that differ only in excluded fields (environment, runtime, or post-generation analysis settings) hash to the same value, while configs that differ in any identity-relevant field hash to different values.

The hash is computed over a canonical JSON dump of the config (Pydantic model_dump(mode="json")) with non-identity fields removed. Column order is part of identity (DAG ordering); alias-keyed lookup tables (model_configs, tool_configs) are sorted by alias so their internal order is irrelevant. Empty/None optional collections are canonicalized to a single representation so that builder-API and YAML-loaded configs producing identical datasets fingerprint identically.
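The canonicalization described above can be sketched roughly as follows. This is a simplified illustration on plain dicts, not the module's actual code: the helper `sketch_hash`, the excluded key `example_runtime_setting`, and the sample configs are all hypothetical, and the use of `sort_keys` with fixed separators is an assumption about what "canonical JSON" means here. The real excluded fields are defined by the module-level constants.

```python
import hashlib
import json
from typing import Any

# Hypothetical excluded key, for illustration only. The real sets are the
# module-level _EXCLUDED_* constants.
EXAMPLE_EXCLUDED_KEYS = frozenset({"example_runtime_setting"})


def sketch_hash(config_dict: dict[str, Any]) -> str:
    """Hash a plain-dict config after a simplified normalization pass."""
    # Remove fields that should not affect identity.
    normalized = {k: v for k, v in config_dict.items() if k not in EXAMPLE_EXCLUDED_KEYS}
    # Alias-keyed tables are sorted so their internal order is irrelevant.
    if "model_configs" in normalized:
        normalized["model_configs"] = sorted(
            normalized["model_configs"], key=lambda m: m["alias"]
        )
    # Columns are left untouched: their order is part of identity.
    # sort_keys plus fixed separators yields one canonical byte string (assumed).
    payload = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


base = {
    "columns": [{"name": "a"}, {"name": "b"}],
    "model_configs": [{"alias": "x", "model": "m1"}, {"alias": "y", "model": "m2"}],
}
reordered_models = {**base, "model_configs": list(reversed(base["model_configs"]))}
reordered_columns = {**base, "columns": list(reversed(base["columns"]))}
with_runtime = {**base, "example_runtime_setting": 8}

same_after_alias_sort = sketch_hash(base) == sketch_hash(reordered_models)
same_after_exclusion = sketch_hash(base) == sketch_hash(with_runtime)
differs_on_column_order = sketch_hash(base) != sketch_hash(reordered_columns)
```

In this sketch, reordering the alias-keyed table or adding the excluded key leaves the hash unchanged, while reordering columns changes it.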

The normalization scheme is versioned via CONFIG_HASH_VERSION. Persist the version alongside the hash so future scheme changes can be detected as “unknown identity” rather than “definite mismatch”.
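One way a consumer might act on this advice is sketched below. The `Match` enum and `compare_fingerprints` helper are hypothetical and not part of the module; the dict keys follow the return value documented for fingerprint_config (config_hash, config_hash_algo, config_hash_version).

```python
from enum import Enum
from typing import Any


class Match(Enum):
    SAME = "same"
    DIFFERENT = "different"
    UNKNOWN = "unknown"


def compare_fingerprints(stored: dict[str, Any], current: dict[str, Any]) -> Match:
    """Compare two fingerprint dicts without over-claiming a mismatch."""
    # Hashes produced under different schemes or algorithms are not comparable,
    # so report "unknown identity" rather than "definite mismatch".
    if (
        stored.get("config_hash_version") != current.get("config_hash_version")
        or stored.get("config_hash_algo") != current.get("config_hash_algo")
    ):
        return Match.UNKNOWN
    if stored.get("config_hash") == current.get("config_hash"):
        return Match.SAME
    return Match.DIFFERENT


# Made-up fingerprint values, for illustration only.
old = {"config_hash": "sha256:aaa", "config_hash_algo": "sha256", "config_hash_version": 1}
new_same = dict(old)
new_changed = {**old, "config_hash": "sha256:bbb"}
new_scheme = {**old, "config_hash": "sha256:ccc", "config_hash_version": 2}
```

Here `compare_fingerprints(old, new_scheme)` yields `Match.UNKNOWN` even though the hash strings differ, because the two hashes were computed under different scheme versions.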

Module Contents

Functions

| Name | Description |
| --- | --- |
| fingerprint_config | Compute a deterministic fingerprint of a workflow config. |
| _drop_keys | No description. |
| _drop_empty_optional | Drop keys whose value is None or an empty list. |
| _normalize_model_config | No description. |
| _normalize_tool_config | No description. |
| _normalize_seed_config | No description. |
| _enrich_custom_columns | Replace each custom column’s serialized generator_function (just the bare __name__) with a richer identity dict that includes __qualname__, __module__, and the @custom_column_generator() decorator metadata. |
| _normalize_config_dict | No description. |

Data

  • CONFIG_HASH_VERSION
  • CONFIG_HASH_ALGO
  • _EXCLUDED_TOP_LEVEL_KEYS
  • _EXCLUDED_MODEL_KEYS
  • _EXCLUDED_INFERENCE_KEYS
  • _EXCLUDED_TOOL_CONFIG_KEYS
  • _EXCLUDED_HF_SEED_KEYS
  • _TOP_LEVEL_OPTIONAL_COLLECTIONS
  • _TOOL_CONFIG_OPTIONAL_COLLECTIONS

API

CONFIG_HASH_VERSION = 1

CONFIG_HASH_ALGO = sha256

_EXCLUDED_TOP_LEVEL_KEYS: frozenset[str]. Defaults to frozenset(...)

_EXCLUDED_MODEL_KEYS: frozenset[str]. Defaults to frozenset(...)

_EXCLUDED_INFERENCE_KEYS: frozenset[str]. Defaults to frozenset(...)

_EXCLUDED_TOOL_CONFIG_KEYS: frozenset[str]. Defaults to frozenset(...)

_EXCLUDED_HF_SEED_KEYS: frozenset[str]. Defaults to frozenset(...)

_TOP_LEVEL_OPTIONAL_COLLECTIONS: frozenset[str]. Defaults to frozenset(...)

_TOOL_CONFIG_OPTIONAL_COLLECTIONS: frozenset[str]. Defaults to frozenset(...)

data_designer.config.fingerprint.fingerprint_config(config: data_designer.config.data_designer_config.DataDesignerConfig) -> dict[str, str | int]

Compute a deterministic fingerprint of a workflow config.

The fingerprint is content-addressable: identical configs (modulo excluded fields) produce identical hashes across processes, Python versions, and module load orders. Changing any identity-relevant field changes the hash; changing an excluded field does not.

Identity-relevant fields:

  • columns - names, types, generator params, processors, validators, skip/drop flags. Column order is part of identity (DAG ordering).
  • model_configs - alias, model, provider, sampling-relevant inference params (temperature, top_p, max_tokens, extra_body). Sorted by alias.
  • tool_configs - alias, providers, allow_tools, max_tool_call_turns (the set of MCP tools shapes generation). Sorted by tool_alias.
  • seed_config - source path, sampling strategy, selection strategy.
  • constraints, top-level processors.

See module-level constants for the canonical excluded-fields table.

Custom column generators contribute their function’s __name__, __qualname__, __module__, generator_params, and the decorator metadata set by @custom_column_generator() (required_columns, side_effect_columns, model_aliases).

Limitation: closures captured via factory functions (e.g. make_gen(factor) returning a gen whose body references factor) share __name__, __qualname__, __module__, and source text, so two closures with different captured state will fingerprint identically. The fingerprint cannot see closure cell values.
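The limitation can be reproduced with plain Python, independent of the library. The sketch below reuses the `make_gen(factor)` example from the text; `name_identity` is a hypothetical helper that collects only the three name attributes mentioned above.

```python
def make_gen(factor):
    # Factory that returns a closure capturing `factor`.
    def gen(value):
        return value * factor

    return gen


def name_identity(func):
    # Only the name-based attributes; closure cell values are not included.
    return (func.__name__, func.__qualname__, func.__module__)


double = make_gen(2)
triple = make_gen(3)

# The two closures are indistinguishable by name-based identity...
identical_identity = name_identity(double) == name_identity(triple)
# ...even though they behave differently,
different_behavior = double(10) != triple(10)
# because the difference lives only in the closure cells.
captured = (double.__closure__[0].cell_contents, triple.__closure__[0].cell_contents)
```

Since generator_params are listed among the fields that contribute to the fingerprint, passing such state through generator_params instead of a closure may be one way to make it visible to the hash.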

Parameters:

config
data_designer.config.data_designer_config.DataDesignerConfig

The workflow config to fingerprint.

Returns:

dict[str, str | int]

A dict with config_hash ("sha256:..."), config_hash_algo, and config_hash_version suitable for embedding in dataset metadata.

data_designer.config.fingerprint._drop_keys(
    source: dict[str, typing.Any],
    keys: collections.abc.Iterable[str]
) -> dict[str, typing.Any]

data_designer.config.fingerprint._drop_empty_optional(
    source: dict[str, typing.Any],
    keys: collections.abc.Iterable[str]
) -> dict[str, typing.Any]

Drop keys whose value is None or an empty list.

None and [] are user-equivalent for optional collection fields; this collapses both to “absent” before hashing.
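A minimal sketch of this behavior, assuming the function returns a new dict and leaves keys outside `keys` untouched (neither detail is confirmed by the documentation). The name `drop_empty_optional_sketch` and the sample field values are hypothetical.

```python
from collections.abc import Iterable
from typing import Any


def drop_empty_optional_sketch(source: dict[str, Any], keys: Iterable[str]) -> dict[str, Any]:
    """Return a copy of `source` without listed keys whose value is None or []."""
    optional = set(keys)
    return {
        key: value
        for key, value in source.items()
        # Drop only listed keys, and only when the value is None or an empty list.
        if not (key in optional and (value is None or (isinstance(value, list) and not value)))
    }


from_builder = {"columns": ["c1"], "constraints": None}
from_yaml = {"columns": ["c1"], "constraints": []}
explicit = {"columns": ["c1"], "constraints": ["c1 > 0"]}

builder_clean = drop_empty_optional_sketch(from_builder, ["constraints"])
yaml_clean = drop_empty_optional_sketch(from_yaml, ["constraints"])
explicit_clean = drop_empty_optional_sketch(explicit, ["constraints"])
```

Both `builder_clean` and `yaml_clean` reduce to the same dict, which is what lets the two config styles hash identically; a non-empty value is kept as-is.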

data_designer.config.fingerprint._normalize_model_config(model_config: dict[str, typing.Any]) -> dict[str, typing.Any]

data_designer.config.fingerprint._normalize_tool_config(tool_config: dict[str, typing.Any]) -> dict[str, typing.Any]

data_designer.config.fingerprint._normalize_seed_config(seed_config: dict[str, typing.Any]) -> dict[str, typing.Any]

data_designer.config.fingerprint._enrich_custom_columns(
    config: data_designer.config.data_designer_config.DataDesignerConfig,
    columns_dump: list[dict[str, typing.Any]]
) -> list[dict[str, typing.Any]]

Replace each custom column’s serialized generator_function (just the bare __name__) with a richer identity dict that includes __qualname__, __module__, and the @custom_column_generator() decorator metadata.

Walks config.columns and columns_dump in lockstep so positional correspondence is reliable.
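The lockstep walk might look roughly like the sketch below. It is an illustration under stated assumptions, not the library's implementation: `enrich_sketch`, the `SimpleNamespace` stand-ins for column objects, and the keys of the identity dict are hypothetical, and the decorator metadata (required_columns, side_effect_columns, model_aliases) is omitted for brevity.

```python
from types import SimpleNamespace
from typing import Any


def enrich_sketch(columns: list[Any], columns_dump: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Pair each column object with its serialized dict by position."""
    # Positional correspondence only holds if both sequences have equal length.
    if len(columns) != len(columns_dump):
        raise ValueError("columns and columns_dump must have the same length")
    enriched = []
    for column, dumped in zip(columns, columns_dump):
        func = getattr(column, "generator_function", None)
        if callable(func):
            # Replace the bare name with a richer identity dict (copy, do not mutate).
            dumped = {
                **dumped,
                "generator_function": {
                    "name": func.__name__,
                    "qualname": func.__qualname__,
                    "module": func.__module__,
                },
            }
        enriched.append(dumped)
    return enriched


def my_generator(row):
    return row


# Stand-ins for real column config objects and their model_dump output.
columns = [
    SimpleNamespace(name="plain"),
    SimpleNamespace(name="custom", generator_function=my_generator),
]
columns_dump = [
    {"name": "plain"},
    {"name": "custom", "generator_function": "my_generator"},
]
result = enrich_sketch(columns, columns_dump)
```

Walking by position avoids a name-based lookup, at the cost of requiring that the dump preserves column order, which is why the length check comes first in this sketch.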

data_designer.config.fingerprint._normalize_config_dict(
    config_dict: dict[str, typing.Any],
    config: data_designer.config.data_designer_config.DataDesignerConfig
) -> dict[str, typing.Any]