nat.data_models.evaluate_runtime
Runtime-only evaluation models used by nat eval programmatic execution.
Classes
| EndpointRetryConfig | Configuration for HTTP retry behavior on remote workflow endpoints. |
| EvaluationRunConfig | Parameters used for a single evaluation run. This is used by the nat eval command. |
| UsageStatsLLM | Token usage counters aggregated for one LLM. |
| UsageStatsItem | Usage metrics for one evaluated input item. |
| UsageStats | Aggregated usage metrics across an evaluation run. |
| InferenceMetricsModel | Confidence intervals and percentiles for a sampled profiler metric. |
| WorkflowRuntimeMetrics | p90/p95/p99 workflow runtimes across evaluation examples. |
| ProfilerResults | High-level profiler output attached to an evaluation run. |
| EvaluationRunOutput | Output of a single evaluation run. |
Module Contents
- class EndpointRetryConfig(/, **data: Any)
Bases: pydantic.BaseModel
Configuration for HTTP retry behavior on remote workflow endpoints.
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError (pydantic_core.ValidationError) if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
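As a rough illustration of the kind of retry behavior such a model configures, the sketch below retries a call while it keeps returning a retryable HTTP status. This page does not list the fields of EndpointRetryConfig, so `RetrySettings`, `max_retries`, `retry_status_codes`, `base_delay_s`, `call_with_retry`, and the backoff schedule are all assumptions made for the example, not the library's API.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RetrySettings:
    """Hypothetical stand-in; these are NOT the real EndpointRetryConfig fields."""
    max_retries: int = 3
    retry_status_codes: set[int] = field(default_factory=lambda: {429, 500, 502, 503, 504})
    base_delay_s: float = 0.0  # kept at 0 so the example runs instantly


def call_with_retry(send: Callable[[], int], settings: RetrySettings) -> tuple[int, int]:
    """Call send() until it returns a non-retryable status or retries run out.

    Returns (final_status, attempts_made).
    """
    attempts = 0
    while True:
        status = send()
        attempts += 1
        if status not in settings.retry_status_codes or attempts > settings.max_retries:
            return status, attempts
        # Exponential backoff: base, 2*base, 4*base, ...
        time.sleep(settings.base_delay_s * (2 ** (attempts - 1)))


# Simulated endpoint: two transient failures, then success.
responses = iter([503, 429, 200])
final_status, attempts = call_with_retry(lambda: next(responses), RetrySettings())
```

In this sketch the first call is not counted as a retry, so `max_retries=3` allows up to four attempts in total.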
- class EvaluationRunConfig(/, **data: Any)
Bases: pydantic.BaseModel
Parameters used for a single evaluation run. This is used by the nat eval command. It can also be used for programmatic evaluation.
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError (pydantic_core.ValidationError) if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- config_file: pathlib.Path | pydantic.BaseModel = None
- endpoint_retry: EndpointRetryConfig = None
- class UsageStatsLLM(/, **data: Any)
Bases: pydantic.BaseModel
Token usage counters aggregated for one LLM.
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError (pydantic_core.ValidationError) if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- class UsageStatsItem(/, **data: Any)
Bases: pydantic.BaseModel
Usage metrics for one evaluated input item.
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError (pydantic_core.ValidationError) if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- usage_stats_per_llm: dict[str, UsageStatsLLM]
- class UsageStats(/, **data: Any)
Bases: pydantic.BaseModel
Aggregated usage metrics across an evaluation run.
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError (pydantic_core.ValidationError) if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- usage_stats_items: dict[object, UsageStatsItem]
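The nesting of usage_stats_items (per item) and usage_stats_per_llm (per LLM within an item) can be walked as in the following minimal sketch. It uses plain dataclasses as stand-ins so that it runs without the library installed; `LLMUsage`, `ItemUsage`, `total_tokens_per_llm`, and the counter name `total_tokens` are assumptions for illustration, since this page does not list the counter fields of UsageStatsLLM.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class LLMUsage:  # stand-in for UsageStatsLLM
    total_tokens: int = 0  # hypothetical counter name, assumed for this example


@dataclass
class ItemUsage:  # stand-in for UsageStatsItem
    usage_stats_per_llm: dict[str, LLMUsage] = field(default_factory=dict)


def total_tokens_per_llm(usage_stats_items: dict[object, ItemUsage]) -> dict[str, int]:
    """Sum the token counter per LLM name across all evaluated items."""
    totals: dict[str, int] = defaultdict(int)
    for item in usage_stats_items.values():
        for llm_name, llm_usage in item.usage_stats_per_llm.items():
            totals[llm_name] += llm_usage.total_tokens
    return dict(totals)


# Two evaluated items; the second one only used "llm_a".
items = {
    1: ItemUsage({"llm_a": LLMUsage(120), "llm_b": LLMUsage(30)}),
    2: ItemUsage({"llm_a": LLMUsage(80)}),
}
totals = total_tokens_per_llm(items)
```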
- class InferenceMetricsModel(/, **data: Any)
Bases: pydantic.BaseModel
Confidence intervals and percentiles for a sampled profiler metric.
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError (pydantic_core.ValidationError) if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
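For readers unfamiliar with the statistic, one common way to obtain a confidence interval for a sampled metric is the normal approximation, sketched below. This is an illustration only: the page does not say how the profiler computes its intervals, and `mean_confidence_interval` and the sample data are hypothetical.

```python
import math
import statistics


def mean_confidence_interval(samples: list[float], z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation interval for the mean: mean ± z * s / sqrt(n).

    z = 1.96 corresponds to roughly 95% coverage.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = statistics.fmean(samples)
    half_width = z * statistics.stdev(samples) / math.sqrt(n)
    return mean - half_width, mean + half_width


# Made-up latency samples, in seconds.
latencies_s = [0.8, 1.1, 0.9, 1.4, 1.0, 1.2]
low, high = mean_confidence_interval(latencies_s)
```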
- class WorkflowRuntimeMetrics(/, **data: Any)
Bases: pydantic.BaseModel
p90/p95/p99 workflow runtimes across evaluation examples.
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError (pydantic_core.ValidationError) if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
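To make the p90/p95/p99 terminology concrete, the sketch below derives the three percentiles from a list of per-example runtimes using the standard library. The helper `runtime_percentiles`, the dictionary keys, and the interpolation method are assumptions for the example; the page does not state how the library computes these values.

```python
import statistics


def runtime_percentiles(runtimes_s: list[float]) -> dict[str, float]:
    """Return p90/p95/p99 of per-example runtimes (needs at least two samples)."""
    # quantiles(n=100) returns 99 cut points; index k-1 holds the k-th percentile.
    cuts = statistics.quantiles(runtimes_s, n=100, method="inclusive")
    return {"p90": cuts[89], "p95": cuts[94], "p99": cuts[98]}


# Made-up runtimes: 1.0, 2.0, ..., 100.0 seconds.
runtimes_s = [float(i) for i in range(1, 101)]
metrics = runtime_percentiles(runtimes_s)
```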
- class ProfilerResults(/, **data: Any)
Bases: pydantic.BaseModel
High-level profiler output attached to an evaluation run.
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError (pydantic_core.ValidationError) if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- workflow_runtime_metrics: WorkflowRuntimeMetrics | None = None
- llm_latency_ci: InferenceMetricsModel | None = None
- class EvaluationRunOutput(/, **data: Any)
Bases: pydantic.BaseModel
Output of a single evaluation run.
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError (pydantic_core.ValidationError) if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- workflow_output_file: pathlib.Path | None = None
- evaluator_output_files: list[pathlib.Path] = None
- eval_input: nat.data_models.evaluator.EvalInput = None
- evaluation_results: list[tuple[str, nat.data_models.evaluator.EvalOutput]] = None
- usage_stats: UsageStats | None = None
- profiler_results: ProfilerResults = None
- config_original_file: pathlib.Path | None = None
- config_effective_file: pathlib.Path | None = None
- config_metadata_file: pathlib.Path | None = None
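Since evaluation_results is a list of (evaluator name, EvalOutput) pairs and several file fields may be None, consuming code might look roughly like the sketch below. It uses dataclass stand-ins so it runs without the library; `FakeEvalOutput`, `FakeRunOutput`, `summarize`, and the `average_score` attribute are assumptions for illustration and are not documented on this page.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Optional


@dataclass
class FakeEvalOutput:  # stand-in for nat.data_models.evaluator.EvalOutput
    average_score: Any  # hypothetical attribute, assumed for this example


@dataclass
class FakeRunOutput:  # stand-in exposing two EvaluationRunOutput fields
    evaluation_results: list[tuple[str, FakeEvalOutput]]
    workflow_output_file: Optional[Path] = None


def summarize(output: FakeRunOutput) -> dict[str, Any]:
    """Map each evaluator name to its score."""
    return {name: result.average_score for name, result in output.evaluation_results}


run_output = FakeRunOutput(
    evaluation_results=[
        ("accuracy", FakeEvalOutput(0.75)),
        ("relevance", FakeEvalOutput(0.5)),
    ]
)
scores = summarize(run_output)
# Optional fields default to None, so check before using them as paths.
has_workflow_output = run_output.workflow_output_file is not None
```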