nat.atof.extractors#
Pluggable payload extractors for the ATOF→ATIF converter.
The ATOF wire envelope is producer-agnostic, but the contents of
event.data are producer-defined. The converter must translate those
contents into ATIF step fields (messages, tool calls, tool results,
mark-lifted sources). This module defines three Protocol interfaces and
three registries that let producers plug in their own extractors,
keyed on the producer-declared data_schema = {name, version}:
LlmPayloadExtractor— forcategory == "llm"scope events: parses input messages, output text, and assistant tool_calls.ToolPayloadExtractor— forcategory == "tool"scope-end events: serializes the tool result to a string.MarkPayloadExtractor— for mark events whose payload carries arolehint that should lift to an ATIF step source.
LLM extractors are produced by combining a declarative SchemaMap
with the generic SchemaMapLlmExtractor engine. A SchemaMap
captures the per-provider field paths (where input messages live, where
output text lives, where tool calls live) plus three optional hooks for
the irreducible per-provider transforms: polymorphic content unpacking,
output-message decomposition, and tool-call shape adaptation. Most
providers are expressible as pure paths; richer providers (Anthropic
content blocks, Gemini parts) use the hooks.
Ships one built-in extractor per protocol:
OpenAiChatCompletionsLlmExtractor— aSchemaMapLlmExtractorconfigured byOPENAI_CHAT_COMPLETIONS_V1_MAP. Registered foropenai/chat-completions@1and used as the fallback for LLM events without adata_schema.GenericToolResultExtractor— unwraps single-key{result}or{output}wrappers, otherwise serializes the payload as JSON. Used when no tool extractor is registered for an event’s schema.NatRoleMarkExtractor— lifts marks whosedata.roleis one of"user","system","agent". Used when no mark extractor is registered.
Register new extractors before calling the converter. For an OpenAI-shaped provider, define a SchemaMap and register it:
from nat.atof.extractors import (
SchemaMap, SchemaMapLlmExtractor, register_llm_extractor,
)
MYCO_MAP = SchemaMap(
name="myco/chat", version="1",
input_messages_paths=("messages",),
output_text_paths=("response",),
output_tool_calls_paths=("tool_calls",),
)
register_llm_extractor("myco/chat", "1", SchemaMapLlmExtractor(MYCO_MAP))
For richer shapes (Anthropic content blocks, Gemini parts), use the hook fields on SchemaMap to handle the irreducible transforms.
Attributes#
Classes#
Extracts ATIF-relevant fields from an |
|
Extracts a serialized result string from a |
|
Classifies a mark event payload as either a role-lifted step |
|
Declarative description of where ATIF-relevant fields live within a |
|
Generic LLM payload extractor driven by a |
|
Reference LLM extractor accepting both direct and nested OpenAI shapes. |
|
Unwraps |
|
Lifts a mark event to a sourced ATIF step when its payload carries |
Functions#
|
Install the Anthropic Messages JSON Schema and LLM extractor. |
Install the Gemini generateContent JSON Schema and LLM extractor. |
|
|
Register an LLM payload extractor for |
|
Register a tool payload extractor for |
|
Register a mark payload extractor for |
|
Return the LLM extractor registered for |
|
Return the tool extractor registered for |
|
Return the mark extractor registered for |
Module Contents#
- class LlmPayloadExtractor#
Bases:
ProtocolExtracts ATIF-relevant fields from an
llmscope event’sdata.Implementations MUST be pure functions over
data— no side effects, no network, no filesystem access. Return empty collections or strings when a field is not present; the converter distinguishes “legitimately empty” from “shape mismatch” at the dispatch layer.- extract_input_messages(data: Any) list[dict[str, Any]]#
Return the chat history messages from an LLM scope-start payload.
Each message SHOULD carry
roleandcontentkeys;contentMAY be a string or a multimodal part list (ATIF v1.6+).
- class ToolPayloadExtractor#
Bases:
ProtocolExtracts a serialized result string from a
toolscope-end payload.
- class MarkPayloadExtractor#
Bases:
ProtocolClassifies a mark event payload as either a role-lifted step (user/system/agent) or an opaque system step.
- extract_role_and_content(data: Any) tuple[str, Any] | None#
If the mark should lift to an ATIF step with a specific
source, return(source, content). Otherwise returnNoneto fall through to the opaque-system-step path.sourceMUST be one of"user","system","agent".contentis passed through as-is (string or part list).
- class SchemaMap#
Declarative description of where ATIF-relevant fields live within a provider’s LLM payload, plus optional hooks for irreducible transforms.
A
SchemaMapcaptures three things:Field paths — dotted paths (with numeric list indices) telling the engine where to find input messages, output text, and output tool calls. Each field accepts a tuple of candidate paths; the engine tries them in order and uses the first hit.
Per-tool-call sub-paths — for providers whose tool-call shape fits the OpenAI flat-or-nested convention. Each tool call is a dict; these paths name where ID/name/arguments live within that dict.
Optional hooks — escape hatches for the three transforms that can’t be expressed declaratively:
normalize_input_messages: inputdata→ ATIF-shaped message list. Use when content is polymorphic (Anthropic string-or-blocks, Gemini parts) and a single field-path can’t flatten it.normalize_output_message: outputdata→(text, tool_calls)pair. Use when output text and tool calls coexist in the same polymorphic structure (Anthropiccontentblocks).transform_tool_call: per-call dict adapter. Use when tool calls don’t carry an ID (Gemini synthesizes from name+index) or use non-OpenAI nesting.
Hooks always win over paths. If
normalize_output_messageis set, the engine ignoresoutput_text_pathsandoutput_tool_calls_paths.Pure-paths providers (OpenAI) leave the hooks at
None. Mixed providers (Anthropic, Gemini) use one or two hooks.- Parameters:
name – Schema name (e.g.
"openai/chat-completions").version – Schema version string.
input_messages_paths – Candidate paths to the input messages array.
output_text_paths – Candidate paths to the output assistant text.
output_tool_calls_paths – Candidate paths to the output tool-calls array.
tool_call_id_paths – Candidate sub-paths for tool-call ID.
tool_call_name_paths – Candidate sub-paths for tool-call function name.
tool_call_args_paths – Candidate sub-paths for tool-call arguments.
tool_call_args_parse_json – When True, parse string arguments as JSON.
role_aliases – Map of provider role values to canonical role values (e.g.,
{"model": "assistant"}for Gemini). Applied to messages extracted via field paths; hooks bypass this.normalize_input_messages – Optional hook overriding path-based input extraction. Signature:
(data) -> list[{"role", "content", ...}].normalize_output_message – Optional hook overriding path-based output extraction. Signature:
(data) -> (text, tool_calls).transform_tool_call – Optional per-call adapter. Signature:
(raw_call_dict, index) -> ATIF-shaped {"tool_call_id", "function_name", "arguments"}. When set, replaces the per-tool-call path resolution entirely.
- role_aliases: collections.abc.Mapping[str, str]#
- class SchemaMapLlmExtractor(schema_map: SchemaMap)#
Generic LLM payload extractor driven by a
SchemaMap.Implements
LlmPayloadExtractorby routing extraction through the map’s hooks (when set) or its declarative field paths (otherwise). A single instance per(name, version)is the intended pattern; register it withregister_llm_extractor().- schema_map#
- OPENAI_CHAT_COMPLETIONS_V1_MAP#
- class OpenAiChatCompletionsLlmExtractor#
Bases:
SchemaMapLlmExtractorReference LLM extractor accepting both direct and nested OpenAI shapes.
Thin convenience wrapper around
OPENAI_CHAT_COMPLETIONS_V1_MAP. Behavior is identical to instantiatingSchemaMapLlmExtractor(OPENAI_CHAT_COMPLETIONS_V1_MAP).Input shapes (extract_input_messages):
{"messages": [...]}{"content": {"messages": [...]}}
Output shapes (extract_output_text):
{"content": "..."}{"choices": [{"message": {"content": "..."}}]}
Tool-call shapes (extract_tool_calls):
Flat:
{"tool_calls": [{"id", "name", "arguments"}]}Nested:
{"choices": [{"message": {"tool_calls": [...]}}]}Per-call: either flat
{id, name, arguments}or the OpenAI{id, function: {name, arguments}}form.
- ANTHROPIC_MESSAGES_V1_MAP#
- register_anthropic_messages_v1() None#
Install the Anthropic Messages JSON Schema and LLM extractor.
Idempotent — safe to call multiple times. Registers
anthropic/messages@1in bothSCHEMA_REGISTRY(validation) andLLM_EXTRACTOR_REGISTRY(extraction). Call this once at process startup before invoking the converter on Anthropic-shaped payloads.
- GEMINI_GENERATE_CONTENT_V1_MAP#
- register_gemini_generate_content_v1() None#
Install the Gemini generateContent JSON Schema and LLM extractor.
Idempotent — safe to call multiple times. Registers
gemini/generate-content@1in bothSCHEMA_REGISTRYandLLM_EXTRACTOR_REGISTRY. Call this once at process startup before invoking the converter on Gemini-shaped payloads.
- class GenericToolResultExtractor#
Unwraps
{result: X}or{output: X}single-key wrappers into a primitive or JSON-serialized string; otherwise serializes the whole payload as compact JSON.
- class NatRoleMarkExtractor#
Lifts a mark event to a sourced ATIF step when its payload carries
data.role ∈ {"user", "system", "agent"}. Content is taken fromdata.contentthendata.message(string fallback"").- _VALID_ROLES#
- DEFAULT_LLM_EXTRACTOR: LlmPayloadExtractor#
- DEFAULT_TOOL_EXTRACTOR: ToolPayloadExtractor#
- DEFAULT_MARK_EXTRACTOR: MarkPayloadExtractor#
- register_llm_extractor(
- name: str,
- version: str,
- extractor: LlmPayloadExtractor,
Register an LLM payload extractor for
(name, version).
- register_tool_extractor(
- name: str,
- version: str,
- extractor: ToolPayloadExtractor,
Register a tool payload extractor for
(name, version).
- register_mark_extractor(
- name: str,
- version: str,
- extractor: MarkPayloadExtractor,
Register a mark payload extractor for
(name, version).
- resolve_llm_extractor( ) LlmPayloadExtractor#
Return the LLM extractor registered for
data_schema, or the built-in OpenAI chat-completions extractor if unregistered/absent.
- resolve_tool_extractor( ) ToolPayloadExtractor#
Return the tool extractor registered for
data_schema, or the generic result-unwrap extractor if unregistered/absent.
- resolve_mark_extractor( ) MarkPayloadExtractor#
Return the mark extractor registered for
data_schema, or the built-in role-lifting extractor if unregistered/absent.