nat.plugins.profiler.runtime_evaluator.atif_evaluate

ATIF-native runtime evaluators for the profiler package.

Classes

AverageLLMLatencyAtifEvaluator

ATIF-native mean latency between LLM start and end for agent steps with metrics.

AverageWorkflowRuntimeAtifEvaluator

ATIF-native workflow runtime per item: max(step.timestamp) - min(step.timestamp) across all steps.

AverageNumberOfLLMCallsAtifEvaluator

ATIF-native count of LLM calls per item: agent steps with metrics.

AverageTokensPerLLMEndAtifEvaluator

ATIF-native average total tokens per LLM call: (prompt_tokens + completion_tokens) from step.metrics.

Functions

_iso_to_epoch(→ float | None)

Convert ISO 8601 timestamp to epoch seconds, or None if invalid.

Module Contents

_iso_to_epoch(ts: str | None) → float | None

Convert ISO 8601 timestamp to epoch seconds, or None if invalid.
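The conversion described above can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the name `iso_to_epoch_sketch` is hypothetical, and treating timezone-naive timestamps as UTC is an assumption that the source does not state.

```python
from datetime import datetime, timezone
from typing import Optional


def iso_to_epoch_sketch(ts: Optional[str]) -> Optional[float]:
    """Hypothetical stand-in for _iso_to_epoch; not the library's code."""
    if not ts:
        return None
    try:
        # Older Python versions reject a trailing "Z", so normalize it first.
        parsed = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    except (ValueError, AttributeError, TypeError):
        return None
    if parsed.tzinfo is None:
        # Assumption: naive timestamps are interpreted as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.timestamp()


print(iso_to_epoch_sketch("1970-01-01T00:00:10Z"))  # 10.0
print(iso_to_epoch_sketch("not-a-timestamp"))       # None
```

Returning None instead of raising lets a caller skip steps with unusable timestamps rather than fail the whole item.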

class AverageLLMLatencyAtifEvaluator(max_concurrency: int = 8)

Bases: nat.plugins.eval.evaluator.atif_base_evaluator.AtifBaseEvaluator

ATIF-native mean latency between LLM start and end for agent steps with metrics.

Uses step.timestamp as the end time and step.extra.get("span_event_timestamp") as the start time. Steps without span_event_timestamp are skipped (see NEP-008 for ATIF profiling metadata).
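A rough sketch of the latency computation described above, under stated assumptions: `StepSketch` is a hypothetical stand-in for an ATIF step, an "agent step" is taken to mean `source == "agent"`, `span_event_timestamp` is assumed to hold epoch seconds, and returning 0.0 when no step qualifies is an illustrative choice rather than documented behavior.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class StepSketch:
    """Hypothetical ATIF step; field names follow the docstring above."""
    source: str
    timestamp: str
    metrics: Optional[dict] = None
    extra: dict = field(default_factory=dict)


def mean_llm_latency(steps: List[StepSketch]) -> float:
    latencies = []
    for step in steps:
        if step.source != "agent" or step.metrics is None:
            continue  # only agent steps with metrics are treated as LLM calls
        start = step.extra.get("span_event_timestamp")
        if start is None:
            continue  # no start time recorded, so the step is skipped
        end = datetime.fromisoformat(step.timestamp.replace("Z", "+00:00")).timestamp()
        latencies.append(end - float(start))  # assumption: start is epoch seconds
    # Assumption: report 0.0 when no step carries both timestamps.
    return sum(latencies) / len(latencies) if latencies else 0.0


tokens = {"prompt_tokens": 10, "completion_tokens": 5}
steps = [
    StepSketch("user", "1970-01-01T00:00:00Z"),
    StepSketch("agent", "1970-01-01T00:00:05Z", tokens, {"span_event_timestamp": 3.0}),
    StepSketch("agent", "1970-01-01T00:00:09Z", tokens, {"span_event_timestamp": 8.0}),
    StepSketch("agent", "1970-01-01T00:00:12Z", tokens),  # skipped: no start time
]
print(mean_llm_latency(steps))  # 1.5
```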

async evaluate_atif_item(
sample: nat.plugins.eval.evaluator.atif_evaluator.AtifEvalSample,
) → nat.plugins.eval.data_models.evaluator_io.EvalOutputItem

Evaluate one ATIF sample and return a single output item.

class AverageWorkflowRuntimeAtifEvaluator(max_concurrency: int = 8)

Bases: nat.plugins.eval.evaluator.atif_base_evaluator.AtifBaseEvaluator

ATIF-native workflow runtime per item: max(step.timestamp) - min(step.timestamp) across all steps.
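The max-minus-min formula above can be illustrated with a short sketch. The helper names `to_epoch` and `workflow_runtime_seconds` are hypothetical, and ignoring unparseable timestamps, reading naive timestamps as UTC, and returning 0.0 for an empty input are all assumptions made for the example.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional


def to_epoch(ts: Optional[str]) -> Optional[float]:
    """Hypothetical ISO 8601 parser; returns None for missing or invalid input."""
    if not ts:
        return None
    try:
        parsed = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    except ValueError:
        return None
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return parsed.timestamp()


def workflow_runtime_seconds(timestamps: Iterable[Optional[str]]) -> float:
    epochs = [e for e in map(to_epoch, timestamps) if e is not None]
    # Assumption: with no valid timestamps the runtime is reported as 0.0.
    return max(epochs) - min(epochs) if epochs else 0.0


print(workflow_runtime_seconds([
    "1970-01-01T00:00:02Z",
    "1970-01-01T00:00:00Z",
    "bad-value",
    "1970-01-01T00:00:04Z",
]))  # 4.0
```

Because the formula takes the extremes rather than the first and last entries, the result does not depend on the order in which steps are stored.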

async evaluate_atif_item(
sample: nat.plugins.eval.evaluator.atif_evaluator.AtifEvalSample,
) → nat.plugins.eval.data_models.evaluator_io.EvalOutputItem

Evaluate one ATIF sample and return a single output item.

class AverageNumberOfLLMCallsAtifEvaluator(max_concurrency: int = 8)

Bases: nat.plugins.eval.evaluator.atif_base_evaluator.AtifBaseEvaluator

ATIF-native count of LLM calls per item: agent steps with metrics.
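As a hedged illustration of the counting rule above, the sketch below treats every step with `source == "agent"` and non-empty `metrics` as one LLM call. `StepSketch` and `count_llm_calls` are hypothetical names, and the `source` field value is an assumption about the ATIF step schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StepSketch:
    """Hypothetical ATIF step reduced to the fields the count needs."""
    source: str
    metrics: Optional[dict] = None


def count_llm_calls(steps: List[StepSketch]) -> int:
    # Assumption: an agent step that carries metrics corresponds to one LLM call.
    return sum(1 for s in steps if s.source == "agent" and s.metrics is not None)


steps = [
    StepSketch("user"),
    StepSketch("agent", {"prompt_tokens": 10, "completion_tokens": 5}),
    StepSketch("agent"),  # agent step without metrics is not counted
    StepSketch("agent", {"prompt_tokens": 20, "completion_tokens": 10}),
]
print(count_llm_calls(steps))  # 2
```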

async evaluate_atif_item(
sample: nat.plugins.eval.evaluator.atif_evaluator.AtifEvalSample,
) → nat.plugins.eval.data_models.evaluator_io.EvalOutputItem

Evaluate one ATIF sample and return a single output item.

class AverageTokensPerLLMEndAtifEvaluator(max_concurrency: int = 8)

Bases: nat.plugins.eval.evaluator.atif_base_evaluator.AtifBaseEvaluator

ATIF-native average total tokens per LLM call: (prompt_tokens + completion_tokens) from step.metrics.
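The token average above might be computed along these lines. This is a sketch only: `StepSketch` and `average_tokens_per_llm_call` are hypothetical names, `metrics` is modeled as a plain dict, and counting a missing token field as 0 and returning 0.0 when there are no LLM calls are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StepSketch:
    """Hypothetical ATIF step reduced to the fields the average needs."""
    source: str
    metrics: Optional[dict] = None


def average_tokens_per_llm_call(steps: List[StepSketch]) -> float:
    totals = []
    for step in steps:
        if step.source != "agent" or step.metrics is None:
            continue  # only agent steps with metrics are treated as LLM calls
        # Assumption: a missing or null token count contributes 0.
        prompt = step.metrics.get("prompt_tokens") or 0
        completion = step.metrics.get("completion_tokens") or 0
        totals.append(prompt + completion)
    # Assumption: report 0.0 when the item contains no LLM calls.
    return sum(totals) / len(totals) if totals else 0.0


steps = [
    StepSketch("user"),
    StepSketch("agent", {"prompt_tokens": 10, "completion_tokens": 5}),
    StepSketch("agent", {"prompt_tokens": 20, "completion_tokens": 10}),
]
print(average_tokens_per_llm_call(steps))  # 22.5
```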

async evaluate_atif_item(
sample: nat.plugins.eval.evaluator.atif_evaluator.AtifEvalSample,
) → nat.plugins.eval.data_models.evaluator_io.EvalOutputItem

Evaluate one ATIF sample and return a single output item.