nat.atof.scripts.atof_to_atif_converter#

ATOF-to-ATIF converter.

Converts a list of ATOF events (JSON-Lines wire format from agent runtime subscriber callbacks) into an ATIF Trajectory using NAT’s native models.

Event model: 2 event kinds (ScopeEvent / MarkEvent) per ATOF spec v0.1. Dispatch keys on (kind, scope_category, category). Category-specific typed fields live inside the category_profile sub-object (spec §4.4) — model_name for llm, tool_call_id for tool.

Output conforms to ATIF v1.7. See the conversion rules in atif-alignment/docs/atof-to-atif-mapping.md; rule identifiers (R1-R12) referenced inline map to that document.

Producer-specific payload parsing is delegated to pluggable extractors (nat.atof.extractors) keyed on the event’s declared data_schema. Events without a matching registered extractor fall back to built-in OpenAI-chat-completions / generic extractors. Two fail-fast guardrails catch producers that would otherwise silently lose content:

  • DataSchemaViolationError — when the producer declares a data_schema registered in nat.atof.schemas and event.data fails JSON-Schema validation against it. Fires in the pre-pass.

  • ShapeMismatchError — when event.data is non-empty but the resolved extractor yields nothing usable (payload would drop).
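A caller might handle the two guardrails along the following lines. This is only a sketch: the two exception classes are local stand-ins that mirror the keyword-only signatures documented on this page, and `fake_convert` is a hypothetical placeholder for `convert()`. In real use, both would presumably be imported from this module rather than redefined.

```python
from typing import Any, Optional


class ShapeMismatchError(ValueError):
    """Local stand-in mirroring the documented keyword-only signature."""

    def __init__(self, *, kind: str, uuid: str,
                 data_schema: Optional[dict], data_keys: list):
        super().__init__(f"{kind} extraction was empty for event {uuid}")
        self.kind = kind
        self.uuid = uuid
        self.data_schema = data_schema
        self.data_keys = data_keys


class DataSchemaViolationError(ValueError):
    """Local stand-in mirroring the documented keyword-only signature."""

    def __init__(self, *, uuid: str, data_schema: dict,
                 path: list, message: str):
        super().__init__(f"event {uuid}: {message}")
        self.uuid = uuid
        self.data_schema = data_schema
        self.path = path
        self.message = message


def fake_convert(events: list) -> str:
    """Hypothetical placeholder for convert(): fails on the first odd payload."""
    for ev in events:
        data = ev.get("data")
        if data and "messages" not in data:
            raise ShapeMismatchError(
                kind="llm_input",
                uuid=ev["uuid"],
                data_schema=ev.get("data_schema"),
                data_keys=sorted(data),
            )
    return "trajectory"


def convert_or_report(events: list) -> tuple:
    """Return (result, None) on success or (None, diagnostic) on a guardrail hit."""
    try:
        return fake_convert(events), None
    except DataSchemaViolationError as exc:
        return None, f"schema violation in {exc.uuid} at {exc.path}: {exc.message}"
    except ShapeMismatchError as exc:
        return None, f"{exc.kind} payload of {exc.uuid} not extractable (keys: {exc.data_keys})"


result, diagnostic = convert_or_report([{"uuid": "e1", "data": {"prompt": "hi"}}])
```

Because both exceptions derive from ValueError, a single `except ValueError` would also catch them; separate handlers keep the diagnostics specific.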

Attributes#

Exceptions#

ShapeMismatchError

Raised when an event's non-empty data produced empty extraction.

DataSchemaViolationError

Raised when an event declares a registered data_schema but its data fails JSON-Schema validation against it.

Functions#

_validate_event_data_schema(→ None)

Validate event.data against its declared, registered data_schema.

_build_ancestry(→ dict)

Build a v1.7 ancestry dict for embedding in Step.extra["ancestry"] or ToolCall.extra["ancestry"].

_build_invocation_info(→ dict)

Build producer-scoped invocation info for step.extra (not part of ATIF v1.7 core).

_serialize_root_data(→ str | None)

Tier-1 boundary-step message serializer.

_is_scope_start(→ bool)

_is_scope_end(→ bool)

_build_category_map(→ dict[str, str])

UUID → category lookup from scope-start events.

_build_parent_map(→ dict[str, str | None])

UUID → parent_uuid for all unique UUIDs in the stream.

_find_subagent_roots(→ list[nat.atof.events.ScopeEvent])

Find agent scope-starts whose parent is a dispatcher scope (R7).

_collect_descendants(→ list[nat.atof.events.Event])

Events whose ancestry chain reaches root_uuid (inclusive of events with uuid == root_uuid).

_events_to_step_dicts(→ list[dict])

Convert typed ATOF events to ATIF v1.7 step dicts.

_materialize_steps(→ list[nat.atif.step.Step])

Build validated Step instances from raw step dicts.

convert(→ nat.atif.trajectory.Trajectory)

Convert a list of ATOF events to an ATIF v1.7 Trajectory.

_convert_impl(→ nat.atif.trajectory.Trajectory)

Internal converter supporting recursion on subagent sub-streams.

convert_file(→ nat.atif.trajectory.Trajectory)

Read an ATOF JSON-Lines file and convert to an ATIF Trajectory.

_ensure_subagent_trajectory_path_explicit(→ None)

Walk a dumped ATIF trajectory dict and ensure every subagent_trajectory_ref[i] entry has trajectory_path explicitly present.

Module Contents#

logger#
exception ShapeMismatchError(
*,
kind: str,
uuid: str,
data_schema: dict[str, Any] | None,
data_keys: list[str],
)#

Bases: ValueError

Raised when an event’s non-empty data produced empty extraction.

The resolved LlmPayloadExtractor for an event’s data_schema could not pull any usable content out of a non-empty payload. Left unchecked, the would-be-emitted content would be silently dropped; this exception surfaces that case as a hard failure so callers can either (a) fix the producer to emit the expected shape, (b) declare a matching data_schema and register a profile-specific extractor via register_llm_extractor(), or (c) wrap the call and handle the drop explicitly.

Attributes:

kind: "llm_input" or "llm_output", indicating which extraction missed.

uuid: UUID of the offending event.

data_schema: The producer-declared data_schema, if any.

data_keys: Sorted top-level keys observed in data.

kind#
uuid#
data_schema#
data_keys#
exception DataSchemaViolationError(
*,
uuid: str,
data_schema: dict[str, Any],
path: list[Any],
message: str,
)#

Bases: ValueError

Raised when an event declares a registered data_schema but its data fails JSON-Schema validation against it.

Producers declaring a schema enter a contract: their payload MUST conform. A violation here either reveals a producer bug or signals that the declared schema is wrong. Either way, downstream extraction would likely drop content, so the converter fails fast with actionable context — the offending event UUID, the declared schema identifier, the JSON-pointer path to the validation failure, and the underlying validator message.

Events whose data_schema is NOT in the registry skip validation entirely (a WARNING is logged instead).

Attributes:

uuid: UUID of the offending event.

data_schema: The producer-declared {name, version} identifier.

path: JSON-pointer segments to the offending value.

message: The underlying jsonschema validator message.

uuid#
data_schema#
path#
message#
_validate_event_data_schema(event: nat.atof.events.Event) None#

Validate event.data against its declared, registered data_schema.

_build_ancestry(
uuid: str,
name: str,
parent_uuid: str | None,
name_map: dict[str, str],
) dict#

Build a v1.7 ancestry dict for embedding in Step.extra["ancestry"] or ToolCall.extra["ancestry"]. Matches nat.atif.atif_step_extra.AtifAncestry shape: parent_id / parent_name are null at the root.
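The shape described above can be sketched as follows. Only parent_id and parent_name are named in the text; the function_id and function_name keys, and the lookup of the parent name through name_map, are assumptions made for illustration and may not match the real AtifAncestry model.

```python
from typing import Optional


def build_ancestry(uuid: str, name: str, parent_uuid: Optional[str],
                   name_map: dict) -> dict:
    """Sketch of an ancestry dict; the parent fields are None at the root."""
    parent_name = name_map.get(parent_uuid) if parent_uuid is not None else None
    return {
        "function_id": uuid,      # assumed key name
        "function_name": name,    # assumed key name
        "parent_id": parent_uuid,
        "parent_name": parent_name,
    }


names = {"a1": "root_agent", "t1": "search_tool"}
root = build_ancestry("a1", "root_agent", None, names)
child = build_ancestry("t1", "search_tool", "a1", names)
```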

_build_invocation_info(
start_micros: int | None,
end_micros: int | None,
invocation_id: str,
) dict#

Build producer-scoped invocation info for step.extra (not part of ATIF v1.7 core).

_serialize_root_data(data: Any) str | None#

Tier-1 boundary-step message serializer.

Used to lift an opaque root scope’s data payload into the ATIF user/agent boundary steps emitted by Branch A (root scope-start → user step) and Branch B (root scope-end → agent step) of the main converter loop.

Rules (locked-in by 260501-1ko quick plan brief):

  • str → return as-is.

  • dict with exactly one entry whose value is a str → return that string (single-key-dict lift heuristic — covers the common {"query": "..."} / {"result": "..."} shapes).

  • dict with anything else (multi-key, or single-key whose value is non-str and non-empty) → json.dumps(data, separators=(",", ":")) (compact JSON).

  • None or empty dict → None (caller skips emission entirely; no boundary step is produced).

  • Any other type → fall through to compact JSON for safety so we never silently drop content.
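The rules above can be sketched as a standalone function. This is an illustrative reimplementation, not the module's source. One case is not spelled out by the rules, namely a single-key dict whose value is empty and not a str (for example {"x": {}}); the sketch assumes it is serialized as compact JSON like any other non-empty dict.

```python
import json
from typing import Any, Optional


def serialize_root_data(data: Any) -> Optional[str]:
    """Sketch of the boundary-step serializer rules listed above."""
    if data is None:
        return None                      # nothing to emit
    if isinstance(data, str):
        return data                      # strings pass through unchanged
    if isinstance(data, dict):
        if not data:
            return None                  # empty dict: caller skips the step
        if len(data) == 1:
            (value,) = data.values()
            if isinstance(value, str):
                return value             # single-key-dict lift heuristic
        return json.dumps(data, separators=(",", ":"))
    # Any other type: compact JSON so content is never dropped silently.
    return json.dumps(data, separators=(",", ":"))


lifted = serialize_root_data({"query": "what is ATIF?"})
compact = serialize_root_data({"a": 1, "b": "x"})
```

Here `lifted` is the bare query string and `compact` is the whole dict rendered without whitespace.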

_is_scope_start(event: nat.atof.events.Event) bool#
_is_scope_end(event: nat.atof.events.Event) bool#
_build_category_map(
events: list[nat.atof.events.Event],
) dict[str, str]#

UUID → category lookup from scope-start events.

_build_parent_map(
events: list[nat.atof.events.Event],
) dict[str, str | None]#

UUID → parent_uuid for all unique UUIDs in the stream.
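Both lookup tables can be sketched over plain dicts. The real functions take typed nat.atof.events.Event objects; the dict field names used here (kind, scope_category, category, uuid, parent_uuid), the "start" / "end" values, and the first-occurrence-wins behavior for repeated UUIDs are all assumptions for illustration.

```python
from typing import Optional


def build_category_map(events: list) -> dict:
    """UUID -> category, taken only from scope-start events."""
    return {
        ev["uuid"]: ev["category"]
        for ev in events
        if ev.get("kind") == "scope" and ev.get("scope_category") == "start"
    }


def build_parent_map(events: list) -> dict:
    """UUID -> parent_uuid for every unique UUID (first occurrence wins)."""
    parent_map: dict = {}
    for ev in events:
        parent: Optional[str] = ev.get("parent_uuid")
        parent_map.setdefault(ev["uuid"], parent)
    return parent_map


events = [
    {"kind": "scope", "scope_category": "start", "category": "agent",
     "uuid": "a1", "parent_uuid": None},
    {"kind": "scope", "scope_category": "start", "category": "tool",
     "uuid": "t1", "parent_uuid": "a1"},
    {"kind": "mark", "uuid": "m1", "parent_uuid": "t1"},
    {"kind": "scope", "scope_category": "end", "category": "tool",
     "uuid": "t1", "parent_uuid": "a1"},
]
category_map = build_category_map(events)
parent_map = build_parent_map(events)
```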

_find_subagent_roots(
events: list[nat.atof.events.Event],
category_map: dict[str, str],
) list[nat.atof.events.ScopeEvent]#

Find agent scope-starts whose parent is a dispatcher scope (R7).

A dispatcher scope is a tool scope (regular delegation) or a context scope (R10 context-management subagent, e.g. a compaction subagent that summarizes prior turns).
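The dispatcher rule can be sketched as a filter over scope-start events. As before, events are modeled as plain dicts with assumed field names, and the category strings "agent", "tool", and "context" are inferred from the prose rather than taken from the source.

```python
# Categories that count as dispatchers per the description above.
DISPATCHER_CATEGORIES = {"tool", "context"}


def find_subagent_roots(events: list, category_map: dict) -> list:
    """Agent scope-starts whose parent scope is a tool or context scope."""
    roots = []
    for ev in events:
        is_agent_start = (
            ev.get("kind") == "scope"
            and ev.get("scope_category") == "start"
            and ev.get("category") == "agent"
        )
        parent_category = category_map.get(ev.get("parent_uuid"))
        if is_agent_start and parent_category in DISPATCHER_CATEGORIES:
            roots.append(ev)
    return roots


category_map = {"a1": "agent", "t1": "tool", "a2": "agent"}
events = [
    {"kind": "scope", "scope_category": "start", "category": "agent",
     "uuid": "a1", "parent_uuid": None},
    {"kind": "scope", "scope_category": "start", "category": "tool",
     "uuid": "t1", "parent_uuid": "a1"},
    {"kind": "scope", "scope_category": "start", "category": "agent",
     "uuid": "a2", "parent_uuid": "t1"},
]
subagent_roots = find_subagent_roots(events, category_map)
```

The top-level agent a1 has no parent and is therefore not a subagent root; only a2, which sits under the tool scope t1, qualifies.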

_collect_descendants(
root_uuid: str,
events: list[nat.atof.events.Event],
parent_map: dict[str, str | None],
) list[nat.atof.events.Event]#

Events whose ancestry chain reaches root_uuid (inclusive of events with uuid == root_uuid).

events preserves the caller’s order; the returned list preserves it too.
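One way to express this ancestry walk is to follow parent links upward from each event until the root is reached or the chain ends. The sketch below uses plain dicts with assumed uuid fields, and the cycle guard is a defensive addition not mentioned in the source.

```python
def collect_descendants(root_uuid: str, events: list, parent_map: dict) -> list:
    """Events whose ancestry chain reaches root_uuid, in the caller's order."""

    def reaches_root(uuid):
        seen = set()  # guards against malformed, cyclic parent links
        while uuid is not None and uuid not in seen:
            if uuid == root_uuid:
                return True
            seen.add(uuid)
            uuid = parent_map.get(uuid)
        return False

    return [ev for ev in events if reaches_root(ev["uuid"])]


parent_map = {"r": None, "a": "r", "b": "a", "c": None}
events = [{"uuid": "c"}, {"uuid": "b"}, {"uuid": "r"}, {"uuid": "a"}]
subtree = collect_descendants("r", events, parent_map)
```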

_events_to_step_dicts(
events: list[nat.atof.events.Event],
subagent_ref_by_tc_id: dict[str, dict] | None = None,
subagent_ref_by_context_uuid: dict[str, dict] | None = None,
) list[dict]#

Convert typed ATOF events to ATIF v1.7 step dicts.

subagent_ref_by_tc_id maps a tool_call_id to a SubagentTrajectoryRef-shaped dict (R7 tool-wraps-agent).

subagent_ref_by_context_uuid maps a context-scope UUID to a SubagentTrajectoryRef-shaped dict (R10 context-wrapped subagent, e.g. a compaction subagent). Either map MAY be empty.

Raises:

DataSchemaViolationError: if an event declares a registered data_schema and its data fails validation.

ShapeMismatchError: if an llm scope event’s non-empty data yields no extractable content (would drop payload silently).

_materialize_steps(step_dicts: list[dict]) list[nat.atif.step.Step]#

Build validated Step instances from raw step dicts.

ATIF v1.7: ancestry is no longer a typed top-level field — it’s embedded in Step.extra["ancestry"] and ToolCall.extra["ancestry"] as plain dicts (AtifAncestry shape, see nat.atif.atif_step_extra). No model conversion is needed here; the dicts pass through to the extra field unchanged.

convert(
events: list[nat.atof.events.Event],
) nat.atif.trajectory.Trajectory#

Convert a list of ATOF events to an ATIF v1.7 Trajectory.

Raises:

DataSchemaViolationError: if an event declares a registered data_schema (see nat.atof.schemas) and its data fails JSON-Schema validation.

ShapeMismatchError: if an llm scope event carries non-empty data that the reference extractors cannot parse. Silently dropping such a payload would lose producer content, so the converter fails fast instead.

_convert_impl(
events: list[nat.atof.events.Event],
explicit_root_uuid: str | None,
) nat.atif.trajectory.Trajectory#

Internal converter supporting recursion on subagent sub-streams.

When explicit_root_uuid is provided (recursive call), the root agent metadata is taken from the event with uuid == explicit_root_uuid rather than by searching for parent_uuid is None.
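The root-selection rule described here might look like the following. `pick_root` is a hypothetical helper name, events are plain dicts with assumed fields, and returning the first match is an assumption about tie-breaking.

```python
from typing import Optional


def pick_root(events: list, explicit_root_uuid: Optional[str] = None) -> Optional[dict]:
    """Select the event that supplies root agent metadata."""
    for ev in events:
        if explicit_root_uuid is not None:
            # Recursive call on a subagent sub-stream: match by UUID.
            if ev["uuid"] == explicit_root_uuid:
                return ev
        elif ev.get("parent_uuid") is None:
            # Top-level call: the root is the event without a parent.
            return ev
    return None


events = [
    {"uuid": "a1", "parent_uuid": None},
    {"uuid": "t1", "parent_uuid": "a1"},
    {"uuid": "a2", "parent_uuid": "t1"},
]
top_level_root = pick_root(events)
subagent_root = pick_root(events, explicit_root_uuid="a2")
```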

convert_file(
input_path: str | pathlib.Path,
output_path: str | pathlib.Path | None = None,
) nat.atif.trajectory.Trajectory#

Read an ATOF JSON-Lines file and convert to an ATIF Trajectory.

Raises:

ShapeMismatchError: see convert().
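The file-reading half of this function can be sketched with the standard library alone. `read_jsonl` is a hypothetical helper, not part of the module; skipping blank lines is an assumption, and the real function additionally parses each record into a typed event and runs the conversion.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl(path) -> list:
    """Parse one JSON object per non-blank line of a JSON-Lines file."""
    records = []
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # assumption: blank lines are tolerated and skipped
                records.append(json.loads(line))
    return records


# Round-trip a tiny sample file to show the expected input layout.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "events.jsonl"
    sample.write_text(
        '{"uuid": "a1", "kind": "scope"}\n\n{"uuid": "m1", "kind": "mark"}\n',
        encoding="utf-8",
    )
    records = read_jsonl(sample)
```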

_ensure_subagent_trajectory_path_explicit(obj: Any) None#

Walk a dumped ATIF trajectory dict and ensure every subagent_trajectory_ref[i] entry has trajectory_path explicitly present (null for embedded refs).

model_dump(exclude_none=True) strips optional None-valued fields, which produces valid ATIF v1.7 but loses back-compat visual alignment with ATIF v1.6 consumers that expect the key. Keeping the field explicit as null is spec-allowed (the field is optional, and null is a valid value) and aids consumer-side inspection.
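A recursive walk of this kind might be written as below. The sketch mutates the dumped dict in place and returns None, matching the documented signature; the exact nesting of subagent_trajectory_ref inside a dumped trajectory is assumed from the description rather than taken from the ATIF models.

```python
from typing import Any


def ensure_trajectory_path_explicit(obj: Any) -> None:
    """Add trajectory_path=None to every subagent_trajectory_ref entry lacking it."""
    if isinstance(obj, dict):
        refs = obj.get("subagent_trajectory_ref")
        if isinstance(refs, list):
            for ref in refs:
                if isinstance(ref, dict):
                    # setdefault leaves an existing path untouched.
                    ref.setdefault("trajectory_path", None)
        for value in obj.values():
            ensure_trajectory_path_explicit(value)
    elif isinstance(obj, list):
        for item in obj:
            ensure_trajectory_path_explicit(item)


dumped = {
    "steps": [
        {"observation": {"results": [
            {"subagent_trajectory_ref": [
                {"session_id": "s1"},
                {"session_id": "s2", "trajectory_path": "sub/s2.json"},
            ]},
        ]}},
    ],
}
ensure_trajectory_path_explicit(dumped)
refs = dumped["steps"][0]["observation"]["results"][0]["subagent_trajectory_ref"]
```

After the call, the first ref carries an explicit null path while the second keeps the path it already had.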