nat.data_models.evaluate_runtime#

Runtime-only evaluation models used by the nat eval command and by programmatic evaluation runs.

Classes#

EndpointRetryConfig

Configuration for HTTP retry behavior on remote workflow endpoints.

EvaluationRunConfig

Parameters used for a single evaluation run. This is used by the nat eval command. It can also be used for programmatic evaluation.

UsageStatsLLM

Token usage counters aggregated for one LLM.

UsageStatsItem

Usage metrics for one evaluated input item.

UsageStats

Aggregated usage metrics across an evaluation run.

InferenceMetricsModel

Confidence intervals and percentiles for a sampled profiler metric.

WorkflowRuntimeMetrics

p90/p95/p99 workflow runtimes across evaluation examples.

ProfilerResults

High-level profiler output attached to an evaluation run.

EvaluationRunOutput

Output of a single evaluation run.

Module Contents#

class EndpointRetryConfig(/, **data: Any)#

Bases: pydantic.BaseModel

Configuration for HTTP retry behavior on remote workflow endpoints.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

do_auto_retry: bool = None#
max_retries: int = None#
retry_status_codes: list[int] = None#
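A minimal sketch of how a config like this could drive retry decisions against a remote workflow endpoint. The dataclass below is an illustrative stand-in for the real pydantic model (same field names; the defaults and the should_retry helper are assumptions of this sketch, not the library's implementation):

```python
from dataclasses import dataclass, field


# Illustrative stand-in for EndpointRetryConfig; the real class is a
# pydantic.BaseModel with these field names. Defaults here are assumptions.
@dataclass
class EndpointRetryConfig:
    do_auto_retry: bool = True
    max_retries: int = 3
    retry_status_codes: list[int] = field(
        default_factory=lambda: [429, 500, 502, 503])


def should_retry(cfg: EndpointRetryConfig, status_code: int, attempt: int) -> bool:
    """Decide whether a failed HTTP request should be retried
    under this configuration (attempt is zero-based)."""
    return (
        cfg.do_auto_retry
        and attempt < cfg.max_retries
        and status_code in cfg.retry_status_codes
    )


cfg = EndpointRetryConfig()
print(should_retry(cfg, 429, attempt=0))  # True: retryable status, budget left
print(should_retry(cfg, 404, attempt=0))  # False: 404 is not retryable here
```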
class EvaluationRunConfig(/, **data: Any)#

Bases: pydantic.BaseModel

Parameters used for a single evaluation run. This is used by the nat eval command. It can also be used for programmatic evaluation.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

config_file: pathlib.Path | pydantic.BaseModel = None#
dataset: str | None = None#
result_json_path: str = None#
skip_workflow: bool = None#
skip_completed_entries: bool = None#
endpoint: str | None = None#
endpoint_timeout: int = None#
endpoint_retry: EndpointRetryConfig = None#
reps: int = None#
override: tuple[tuple[str, str], ...] = None#
write_output: bool = None#
adjust_dataset_size: bool = None#
num_passes: int = None#
export_timeout: float = None#
user_id: str = None#
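The override field holds (dotted_path, value) string pairs that patch entries of the loaded workflow config. The helper and dotted paths below are hypothetical and only illustrate that shape; they are not the library's implementation:

```python
def apply_overrides(config: dict,
                    overrides: tuple[tuple[str, str], ...]) -> dict:
    """Apply dotted-path string overrides to a nested config dict
    (illustrative sketch of what the `override` tuples express)."""
    for path, value in overrides:
        node = config
        *parents, leaf = path.split(".")
        for key in parents:
            # Walk (or create) intermediate mappings along the dotted path.
            node = node.setdefault(key, {})
        node[leaf] = value
    return config


# Hypothetical dotted path into a workflow config:
cfg = {"llms": {"my_llm": {"temperature": "0.0"}}}
apply_overrides(cfg, (("llms.my_llm.temperature", "0.7"),))
print(cfg["llms"]["my_llm"]["temperature"])  # 0.7
```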
class UsageStatsLLM(/, **data: Any)#

Bases: pydantic.BaseModel

Token usage counters aggregated for one LLM.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

prompt_tokens: int = 0#
completion_tokens: int = 0#
cached_tokens: int = 0#
reasoning_tokens: int = 0#
total_tokens: int = 0#
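A sketch of accumulating these counters across multiple calls to one LLM. The dataclass mirrors the documented field names (all counters default to 0); treating total_tokens as prompt plus completion tokens is an assumption of this sketch:

```python
from dataclasses import dataclass


# Illustrative stand-in for UsageStatsLLM; the real class is a
# pydantic.BaseModel with the same field names and zero defaults.
@dataclass
class UsageStatsLLM:
    prompt_tokens: int = 0
    completion_tokens: int = 0
    cached_tokens: int = 0
    reasoning_tokens: int = 0
    total_tokens: int = 0


def accumulate(stats: UsageStatsLLM, prompt: int, completion: int) -> None:
    """Fold one LLM call's token counts into the running counters.
    Counting total as prompt + completion is this sketch's assumption."""
    stats.prompt_tokens += prompt
    stats.completion_tokens += completion
    stats.total_tokens += prompt + completion


s = UsageStatsLLM()
accumulate(s, prompt=120, completion=30)
accumulate(s, prompt=80, completion=20)
print(s.total_tokens)  # 250
```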
class UsageStatsItem(/, **data: Any)#

Bases: pydantic.BaseModel

Usage metrics for one evaluated input item.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

usage_stats_per_llm: dict[str, UsageStatsLLM]#
total_tokens: int | None = None#
runtime: float = 0.0#
min_timestamp: float = 0.0#
max_timestamp: float = 0.0#
llm_latency: float = 0.0#
class UsageStats(/, **data: Any)#

Bases: pydantic.BaseModel

Aggregated usage metrics across an evaluation run.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

min_timestamp: float = 0.0#
max_timestamp: float = 0.0#
total_runtime: float = 0.0#
usage_stats_items: dict[object, UsageStatsItem]#
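One plausible way run-level values could be derived from the per-item timestamps: the run window spans the earliest min_timestamp to the latest max_timestamp. The field names come from the docs above; the aggregation rule itself is an assumption of this sketch:

```python
# Per-item timestamps, as plain dicts standing in for UsageStatsItem.
items = {
    "item-1": {"min_timestamp": 10.0, "max_timestamp": 14.5},
    "item-2": {"min_timestamp": 11.0, "max_timestamp": 18.0},
}

# Run-level window: earliest start to latest end across all items.
min_timestamp = min(i["min_timestamp"] for i in items.values())
max_timestamp = max(i["max_timestamp"] for i in items.values())
total_runtime = max_timestamp - min_timestamp

print(min_timestamp, max_timestamp, total_runtime)  # 10.0 18.0 8.0
```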
class InferenceMetricsModel(/, **data: Any)#

Bases: pydantic.BaseModel

Confidence intervals and percentiles for a sampled profiler metric.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

n: int = None#
mean: float = None#
ninetieth_interval: tuple[float, float] = None#
ninety_fifth_interval: tuple[float, float] = None#
ninety_ninth_interval: tuple[float, float] = None#
p90: float = None#
p95: float = None#
p99: float = None#
class WorkflowRuntimeMetrics(/, **data: Any)#

Bases: pydantic.BaseModel

p90/p95/p99 workflow runtimes across evaluation examples.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

p90: float#
p95: float#
p99: float#
class ProfilerResults(/, **data: Any)#

Bases: pydantic.BaseModel

High-level profiler output attached to an evaluation run.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

workflow_runtime_metrics: WorkflowRuntimeMetrics | None = None#
llm_latency_ci: InferenceMetricsModel | None = None#
class EvaluationRunOutput(/, **data: Any)#

Bases: pydantic.BaseModel

Output of a single evaluation run.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

workflow_output_file: pathlib.Path | None = None#
evaluator_output_files: list[pathlib.Path] = None#
workflow_interrupted: bool = None#
eval_input: nat.data_models.evaluator.EvalInput = None#
evaluation_results: list[tuple[str, nat.data_models.evaluator.EvalOutput]] = None#
usage_stats: UsageStats | None = None#
profiler_results: ProfilerResults = None#
config_original_file: pathlib.Path | None = None#
config_effective_file: pathlib.Path | None = None#
config_metadata_file: pathlib.Path | None = None#