
# data\_designer.config.fingerprint

Deterministic content-addressable fingerprint for a workflow config.

The fingerprint identifies the *data-relevant* portion of a `DataDesignerConfig`.
Two configs that produce the same dataset hash to the same value, even when
they differ in environment, runtime, or post-generation analysis settings.
Two configs that differ in any field that affects the generated data hash to
different values.

The hash is computed over a canonical JSON dump of the config (Pydantic
`model_dump(mode="json")`) with non-identity fields removed. Column order is
part of identity (DAG ordering); alias-keyed lookup tables (`model_configs`,
`tool_configs`) are sorted by alias so their internal order is irrelevant.
Empty/`None` optional collections are canonicalized to a single representation
so that builder-API and YAML-loaded configs producing identical datasets
fingerprint identically.
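
The normalization described above can be illustrated with a minimal sketch over plain dicts. This is not the module's implementation: the helper name `canonical_hash`, the choice of keys treated as optional collections, and the sample configs are assumptions made for illustration only.

```python
import hashlib
import json
from typing import Any


def canonical_hash(config_dict: dict[str, Any]) -> str:
    """Hash a plain config dict after a simplified normalization pass."""
    normalized = dict(config_dict)

    # Alias-keyed lookup table: sort so that list order does not affect the hash.
    if normalized.get("model_configs"):
        normalized["model_configs"] = sorted(
            normalized["model_configs"], key=lambda m: m["alias"]
        )

    # Collapse None and [] to "absent" (the key list here is an assumption).
    for key in ("model_configs", "tool_configs", "constraints", "processors"):
        if normalized.get(key) in (None, []):
            normalized.pop(key, None)

    # Columns keep their given order, because order is part of identity.
    payload = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


columns = [{"name": "x"}, {"name": "y"}]
config_a = {
    "columns": columns,
    "model_configs": [{"alias": "b"}, {"alias": "a"}],
    "constraints": None,
}
config_b = {
    "columns": columns,
    "model_configs": [{"alias": "a"}, {"alias": "b"}],
    "constraints": [],
}
config_c = {**config_a, "columns": list(reversed(columns))}

same = canonical_hash(config_a) == canonical_hash(config_b)
reordered_differs = canonical_hash(config_a) != canonical_hash(config_c)
```

In this sketch, `config_a` and `config_b` differ only in alias order and in `None` versus `[]`, so they hash identically, while reversing the column order in `config_c` changes the hash.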

The normalization scheme is versioned via `CONFIG_HASH_VERSION`. Persist the
version alongside the hash so future scheme changes can be detected as
"unknown identity" rather than "definite mismatch".

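One way a consumer might act on the stored version is sketched below. The function `compare_fingerprints` and its three return labels are hypothetical; only the dict keys (`config_hash`, `config_hash_algo`, `config_hash_version`) come from this module's documented return value.

```python
from typing import Any


def compare_fingerprints(stored: dict[str, Any], current: dict[str, Any]) -> str:
    """Return "match", "mismatch", or "unknown" for two fingerprint dicts."""
    # A different scheme version or algorithm means the hashes are not
    # comparable, so the result is "unknown" rather than a mismatch.
    if stored.get("config_hash_version") != current.get("config_hash_version"):
        return "unknown"
    if stored.get("config_hash_algo") != current.get("config_hash_algo"):
        return "unknown"
    if stored.get("config_hash") == current.get("config_hash"):
        return "match"
    return "mismatch"


old = {"config_hash": "sha256:aaa", "config_hash_algo": "sha256", "config_hash_version": 1}
new = {"config_hash": "sha256:bbb", "config_hash_algo": "sha256", "config_hash_version": 2}
result = compare_fingerprints(old, new)  # "unknown": versions differ
```

A stored record without a version would also compare as "unknown" against a versioned one, which is the reason to persist the version next to the hash.
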
## Module Contents

### Functions

| Name | Description |
| ---- | ----------- |
| [`fingerprint_config`](#data_designerconfigfingerprintfingerprint_config) | Compute a deterministic fingerprint of a workflow config. |
| [`_drop_keys`](#data_designerconfigfingerprint_drop_keys) | Undocumented. |
| [`_drop_empty_optional`](#data_designerconfigfingerprint_drop_empty_optional) | Drop keys whose value is `None` or an empty list. |
| [`_normalize_model_config`](#data_designerconfigfingerprint_normalize_model_config) | Undocumented. |
| [`_normalize_tool_config`](#data_designerconfigfingerprint_normalize_tool_config) | Undocumented. |
| [`_normalize_seed_config`](#data_designerconfigfingerprint_normalize_seed_config) | Undocumented. |
| [`_enrich_custom_columns`](#data_designerconfigfingerprint_enrich_custom_columns) | Replace each custom column's serialized `generator_function` (just the bare `__name__`) with a richer identity dict that includes `__qualname__`, `__module__`, and the `@custom_column_generator()` decorator metadata. |
| [`_normalize_config_dict`](#data_designerconfigfingerprint_normalize_config_dict) | Undocumented. |

### Data

* [`CONFIG_HASH_VERSION`](#data_designerconfigfingerprintconfig_hash_version)
* [`CONFIG_HASH_ALGO`](#data_designerconfigfingerprintconfig_hash_algo)
* [`_EXCLUDED_TOP_LEVEL_KEYS`](#data_designerconfigfingerprint_excluded_top_level_keys)
* [`_EXCLUDED_MODEL_KEYS`](#data_designerconfigfingerprint_excluded_model_keys)
* [`_EXCLUDED_INFERENCE_KEYS`](#data_designerconfigfingerprint_excluded_inference_keys)
* [`_EXCLUDED_TOOL_CONFIG_KEYS`](#data_designerconfigfingerprint_excluded_tool_config_keys)
* [`_EXCLUDED_HF_SEED_KEYS`](#data_designerconfigfingerprint_excluded_hf_seed_keys)
* [`_TOP_LEVEL_OPTIONAL_COLLECTIONS`](#data_designerconfigfingerprint_top_level_optional_collections)
* [`_TOOL_CONFIG_OPTIONAL_COLLECTIONS`](#data_designerconfigfingerprint_tool_config_optional_collections)

### API

```python
CONFIG_HASH_VERSION = 1
```

```python
CONFIG_HASH_ALGO = "sha256"
```

```python
data_designer.config.fingerprint.fingerprint_config(config: data_designer.config.data_designer_config.DataDesignerConfig) -> dict[str, str | int]
```

Compute a deterministic fingerprint of a workflow config.

The fingerprint is content-addressable: identical configs (modulo excluded
fields) produce identical hashes across processes, Python versions, and
module load orders. Changing any identity-relevant field changes the hash;
changing an excluded field does not.

Identity-relevant fields:

* `columns` - names, types, generator params, processors, validators,
  skip/drop flags. Column order is part of identity (DAG ordering).
* `model_configs` - alias, model, provider, sampling-relevant inference
  params (temperature, top\_p, max\_tokens, extra\_body). Sorted by alias.
* `tool_configs` - alias, providers, allow\_tools, max\_tool\_call\_turns
  (the set of MCP tools shapes generation). Sorted by tool\_alias.
* `seed_config` - source path, sampling strategy, selection strategy.
* `constraints`, top-level `processors`.

See module-level constants for the canonical excluded-fields table.

Custom column generators contribute their function's `__name__`,
`__qualname__`, `__module__`, `generator_params`, and the decorator
metadata set by `@custom_column_generator()` (`required_columns`,
`side_effect_columns`, `model_aliases`).

Limitation: closures captured via factory functions (e.g. `make_gen(factor)`
returning a `gen` whose body references `factor`) share `__name__`,
`__qualname__`, `__module__`, and source text, so two closures with
different captured state will fingerprint identically. The fingerprint
cannot see closure cell values.
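
The limitation can be reproduced with plain Python, independent of this library. In the sketch below, `make_gen` and `gen` follow the names used above; the `row` argument and its `"value"` key are hypothetical.

```python
def make_gen(factor):
    # Factory returning a closure; `factor` lives in a closure cell.
    def gen(row):
        return row["value"] * factor

    return gen


double = make_gen(2)
triple = make_gen(3)

# Every attribute a name-based fingerprint can read is identical.
same_identity = (
    double.__name__ == triple.__name__
    and double.__qualname__ == triple.__qualname__
    and double.__module__ == triple.__module__
    and double.__code__ is triple.__code__  # same compiled body
)

# The only difference is the captured state, which is not hashed.
captured = (
    double.__closure__[0].cell_contents,
    triple.__closure__[0].cell_contents,
)
```

Here `same_identity` is `True` while `captured` is `(2, 3)`. Since `generator_params` is listed above as contributing to the fingerprint, passing the varying value there instead of capturing it in a closure is one plausible way to keep such generators distinguishable.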

**Parameters:**

* `config` (`DataDesignerConfig`) - The workflow config to fingerprint.

**Returns:**

`dict[str, str | int]`

A dict with `config_hash` (`"sha256:..."`), `config_hash_algo`, and
`config_hash_version` suitable for embedding in dataset metadata.

```python
data_designer.config.fingerprint._drop_keys(
    source: dict[str, typing.Any],
    keys: collections.abc.Iterable[str]
) -> dict[str, typing.Any]
```

```python
data_designer.config.fingerprint._drop_empty_optional(
    source: dict[str, typing.Any],
    keys: collections.abc.Iterable[str]
) -> dict[str, typing.Any]
```

Drop keys whose value is `None` or an empty list.

For optional collection fields, `None` and `[]` mean the same thing to the
user; this collapses both to "absent" before hashing.
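
A minimal sketch of the behavior described above, matching the documented signature. The body is an assumption, not the module's source, and the function is named `drop_empty_optional` (no leading underscore) to make clear it is a stand-in.

```python
from collections.abc import Iterable
from typing import Any


def drop_empty_optional(source: dict[str, Any], keys: Iterable[str]) -> dict[str, Any]:
    """Return a copy of `source` without the listed keys that are None or []."""
    candidates = set(keys)
    return {
        key: value
        for key, value in source.items()
        # Only the listed keys are eligible; other empty values are kept.
        if not (
            key in candidates
            and (value is None or (isinstance(value, list) and not value))
        )
    }


cleaned = drop_empty_optional(
    {"constraints": None, "processors": [], "columns": []},
    keys=["constraints", "processors"],
)
```

In this sketch `cleaned` keeps `"columns": []` because `columns` was not listed in `keys`, and the input dict is left unmodified.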

```python
data_designer.config.fingerprint._normalize_model_config(model_config: dict[str, typing.Any]) -> dict[str, typing.Any]
```

```python
data_designer.config.fingerprint._normalize_tool_config(tool_config: dict[str, typing.Any]) -> dict[str, typing.Any]
```

```python
data_designer.config.fingerprint._normalize_seed_config(seed_config: dict[str, typing.Any]) -> dict[str, typing.Any]
```

```python
data_designer.config.fingerprint._enrich_custom_columns(
    config: data_designer.config.data_designer_config.DataDesignerConfig,
    columns_dump: list[dict[str, typing.Any]]
) -> list[dict[str, typing.Any]]
```

Replace each custom column's serialized `generator_function` (just the
bare `__name__`) with a richer identity dict that includes `__qualname__`,
`__module__`, and the `@custom_column_generator()` decorator metadata.

Walks `config.columns` and `columns_dump` in lockstep so positional
correspondence is reliable.
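
The lockstep walk can be sketched as follows. This is an illustration under assumptions: `enrich_dump` is a hypothetical stand-in that takes a list of generator callables (or `None` for non-custom columns) instead of a `DataDesignerConfig`, the identity dict keys are invented for the example, and decorator metadata is omitted.

```python
from typing import Any, Callable, Optional


def enrich_dump(
    generators: list[Optional[Callable[..., Any]]],
    columns_dump: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Pair each column with its dumped dict by position and enrich it."""
    # Positional pairing is only meaningful if both sides have equal length.
    if len(generators) != len(columns_dump):
        raise ValueError("columns and columns_dump must have the same length")

    enriched = []
    for fn, dumped in zip(generators, columns_dump):
        if fn is None:
            # Not a custom column: pass the dumped dict through unchanged.
            enriched.append(dumped)
            continue
        identity = {
            "name": fn.__name__,
            "qualname": fn.__qualname__,
            "module": fn.__module__,
        }
        enriched.append({**dumped, "generator_function": identity})
    return enriched


def my_gen(row):
    return row


out = enrich_dump(
    [None, my_gen],
    [{"name": "a"}, {"name": "b", "generator_function": "my_gen"}],
)
```

After the call, the first entry of `out` is unchanged and the second entry's `generator_function` is a dict rather than the bare name string.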

```python
data_designer.config.fingerprint._normalize_config_dict(
    config_dict: dict[str, typing.Any],
    config: data_designer.config.data_designer_config.DataDesignerConfig
) -> dict[str, typing.Any]
```