nemo_evaluator.api.api_dataclasses#

NeMo Evaluator Core operates on strictly defined inputs and outputs, which are modeled with pydantic dataclasses. Whether you use the Python API or the CLI, the reference below serves as a map of the configuration options and the output format.

Modeling Target#

ApiEndpoint

API endpoint configuration containing information on endpoint placement, the targeted model name, and the adapter used before prompting the endpoint.

EndpointType

EndpointType is used to determine the appropriate URL, payload structure, or native harness inference class.

EvaluationTarget

Target configuration for API endpoints.

Modeling Evaluation#

EvaluationConfig

Configuration for evaluation runs.

ConfigParams

Parameters for evaluation execution.

Modeling Result#

EvaluationResult

EvaluationResult bundles per-task and per-group results.

GroupResult

Results at the group level, for tasks that can be grouped or logically split.

MetricResult

Defines a mapping from metric name to its scores.

Score

Atomic class that contains the value of a particular metric and its corresponding stats.

ScoreStats

Stats for a score.

TaskResult

Defines the set of metrics calculated for a particular task.

pydantic model nemo_evaluator.api.api_dataclasses.ApiEndpoint[source]#

Bases: BaseModel

API endpoint configuration containing information on endpoint placement, the targeted model name, and the adapter used before prompting the endpoint.

Config:
  • use_enum_values: bool = True

field adapter_config: AdapterConfig | None = None#

Adapter configuration

field api_key: str | None = None#

Name of the environment variable that stores the API key for the model

field model_id: str | None = None#

Name of the model

field stream: bool | None = None#

Whether responses should be streamed

field type: EndpointType | None = None#

The type of the target

field url: str | None = None#

URL of the model
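Since ApiEndpoint is a pydantic model, it accepts and serializes to a plain mapping over the fields above. A minimal sketch of that serialized shape, with the URL, model name, and environment-variable name as illustrative placeholders:

```python
# Serialized form of an ApiEndpoint, mirroring the documented fields.
# The URL, model name, and env-var name are illustrative placeholders.
endpoint = {
    "url": "http://localhost:8000/v1/chat/completions",  # hypothetical endpoint
    "model_id": "my-model",        # name of the model to target
    "type": "chat",                # one of the EndpointType values
    "api_key": "MY_API_KEY_ENV",   # env variable holding the key, not the key itself
    "stream": False,               # whether responses should be streamed
    "adapter_config": None,        # optional AdapterConfig
}
```

Note that `api_key` names an environment variable rather than containing the secret itself, so configurations like this can be stored or logged safely.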

pydantic model nemo_evaluator.api.api_dataclasses.ConfigParams[source]#

Bases: BaseModel

Parameters for evaluation execution.

field extra: Dict[str, Any] | None [Optional]#

Framework-specific parameters to be used for evaluation

field limit_samples: int | float | None = None#

Limit the number of evaluation samples

field max_new_tokens: int | None = None#

Max tokens to generate

field max_retries: int | None = None#

Number of REST request retries

field parallelism: int | None = None#

Parallelism to be used

field request_timeout: int | None = None#

REST response timeout

field task: str | None = None#

Name of the task

field temperature: float | None = None#

Float value between 0 and 1. A temperature of 0 indicates greedy decoding, where the token with the highest probability is chosen. Temperature cannot currently be set to 0.0.

field top_p: float | None = None#

Float value between 0 and 1; limits sampling to the top tokens within a cumulative probability mass. top_p=0 means the model will only consider the single most likely token for the next prediction.
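The sampling constraints documented above can be summarized in a small check. This helper is not part of nemo_evaluator; it is only a sketch of the stated rules (temperature in (0, 1], since 0.0 is not currently allowed, and top_p in [0, 1]):

```python
from typing import Optional

def check_sampling(temperature: Optional[float], top_p: Optional[float]) -> None:
    """Illustrative validation of the documented sampling constraints."""
    if temperature is not None:
        # Temperature must lie in (0, 1]; 0.0 is currently not allowed.
        if not (0.0 < temperature <= 1.0):
            raise ValueError(f"temperature must be in (0, 1], got {temperature}")
    if top_p is not None:
        # top_p lies in [0, 1]; 0 means only the single most likely token.
        if not (0.0 <= top_p <= 1.0):
            raise ValueError(f"top_p must be in [0, 1], got {top_p}")

check_sampling(temperature=0.7, top_p=0.95)  # a valid combination
```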

enum nemo_evaluator.api.api_dataclasses.EndpointType(value)[source]#

Bases: str, Enum

EndpointType is used to determine the appropriate URL, payload structure, or native harness inference class.

Member Type:

str

Valid values are as follows:

UNDEFINED = <EndpointType.UNDEFINED: 'undefined'>#
CHAT = <EndpointType.CHAT: 'chat'>#
COMPLETIONS = <EndpointType.COMPLETIONS: 'completions'>#
VLM = <EndpointType.VLM: 'vlm'>#
EMBEDDING = <EndpointType.EMBEDDING: 'embedding'>#
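Because EndpointType mixes in str, its members are also plain strings: they compare equal to their raw values and can be looked up by value. A self-contained sketch of the same pattern with the stdlib enum module:

```python
from enum import Enum

# Sketch of the str + Enum pattern EndpointType uses: members double as
# plain strings, so they compare equal to their values and serialize cleanly.
class EndpointType(str, Enum):
    UNDEFINED = "undefined"
    CHAT = "chat"
    COMPLETIONS = "completions"
    VLM = "vlm"
    EMBEDDING = "embedding"

assert EndpointType.CHAT == "chat"               # str mixin: equality with raw value
assert EndpointType("vlm") is EndpointType.VLM   # lookup by value
```

This is why configuration files can spell the endpoint type as a bare string such as `"chat"` and still round-trip through the enum.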
pydantic model nemo_evaluator.api.api_dataclasses.Evaluation[source]#

Bases: BaseModel

field command: str [Required]#

Jinja template of the command to be executed

field config: EvaluationConfig [Required]#
field framework_name: str [Required]#

Name of the framework

field pkg_name: str [Required]#

Name of the package

field target: EvaluationTarget [Required]#
render_command()[source]#
pydantic model nemo_evaluator.api.api_dataclasses.EvaluationConfig[source]#

Bases: BaseModel

Configuration for evaluation runs.

field output_dir: str | None = None#

Directory to output the results

field params: ConfigParams | None = None#

Parameters to be used for evaluation

field supported_endpoint_types: list[str] | None = None#

Supported endpoint types like chat or completions

field type: str | None = None#

Type of the task
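Putting the fields above together, a serialized EvaluationConfig is a mapping with an embedded ConfigParams mapping. The task type, output directory, and parameter values below are illustrative placeholders, not defaults:

```python
# A serialized EvaluationConfig, mirroring the documented fields.
# Task type, output directory, and parameter values are placeholders.
config = {
    "type": "my_task",                    # hypothetical task type
    "output_dir": "/tmp/eval_results",
    "supported_endpoint_types": ["chat", "completions"],
    "params": {                           # ConfigParams, documented above
        "limit_samples": 100,
        "parallelism": 4,
        "max_new_tokens": 512,
        "temperature": 0.7,
        "top_p": 0.95,
        "extra": {},                      # framework-specific parameters
    },
}
```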

class nemo_evaluator.api.api_dataclasses.EvaluationMetadata[source]#

Bases: dict

Holds various evaluation metadata that does not influence the evaluation itself.

pydantic model nemo_evaluator.api.api_dataclasses.EvaluationResult[source]#

Bases: BaseModel

EvaluationResult bundles per-task and per-group results.

field groups: Dict[str, GroupResult] | None [Optional]#

The results at the group-level

field tasks: Dict[str, TaskResult] | None [Optional]#

The results at the task-level

pydantic model nemo_evaluator.api.api_dataclasses.EvaluationTarget[source]#

Bases: BaseModel

Target configuration for API endpoints.

field api_endpoint: ApiEndpoint | None = None#

API endpoint to be used for evaluation
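EvaluationTarget simply wraps an ApiEndpoint. A minimal serialized form, with the URL and model name as placeholders:

```python
# An EvaluationTarget is a thin wrapper around an ApiEndpoint.
# URL and model name are illustrative placeholders.
target = {
    "api_endpoint": {
        "url": "http://localhost:8000/v1/chat/completions",
        "model_id": "my-model",
        "type": "chat",
    }
}
```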

pydantic model nemo_evaluator.api.api_dataclasses.GroupResult[source]#

Bases: BaseModel

Some tasks can be grouped or logically split. This class defines results at the group level.

field groups: Dict[str, GroupResult] | None = None#

The results for the subgroups.

field metrics: Dict[str, MetricResult] [Optional]#

The value for all the metrics computed for the group.

pydantic model nemo_evaluator.api.api_dataclasses.MetricResult[source]#

Bases: BaseModel

Defines mapping from metric name to its scores.

field scores: Dict[str, Score] [Optional]#

Mapping from metric name to scores.

pydantic model nemo_evaluator.api.api_dataclasses.Score[source]#

Bases: BaseModel

Atomic class that contains the value of a particular metric and its corresponding stats.

field stats: ScoreStats [Required]#

Statistics associated with this metric

field value: float [Required]#

The value/score produced on this metric

pydantic model nemo_evaluator.api.api_dataclasses.ScoreStats[source]#

Bases: BaseModel

Stats for a score.

field count: int | None = None#

The number of values used for computing the score.

field max: float | None = None#

The maximum of all values used for computing the score.

field mean: float | None = None#

The mean of all values used for computing the score.

field min: float | None = None#

The minimum of all values used for computing the score.

field stddev: float | None = None#

This is the population standard deviation, not the sample standard deviation.

field stderr: float | None = None#

The standard error.

field sum: float | None = None#

The sum of all values used for computing the score.

field sum_squared: float | None = None#

The sum of the square of all values used for computing the score.

field variance: float | None = None#

This is the population variance, not the sample variance.

pydantic model nemo_evaluator.api.api_dataclasses.TaskResult[source]#

Bases: BaseModel

Defines the set of metrics calculated for a particular task.

field metrics: Dict[str, MetricResult] [Optional]#

The value for all the metrics computed for the task
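The result models nest as tasks → metrics → scores, with each Score carrying a value and a ScoreStats. A sketch of computing the documented ScoreStats fields from per-sample values with the stdlib statistics module (note the docs specify population variance and standard deviation), then assembling the nested result shape; the task and metric names are placeholders:

```python
import statistics

# Per-sample metric values for one hypothetical task.
values = [0.8, 0.6, 1.0, 0.4, 0.7]

# The documented ScoreStats fields, computed with stdlib statistics.
stats = {
    "count": len(values),
    "sum": sum(values),
    "sum_squared": sum(v * v for v in values),
    "min": min(values),
    "max": max(values),
    "mean": statistics.mean(values),
    "variance": statistics.pvariance(values),  # population variance, per the docs
    "stddev": statistics.pstdev(values),       # population stddev, per the docs
    # One common definition of standard error: population stddev / sqrt(n).
    "stderr": statistics.pstdev(values) / len(values) ** 0.5,
}

# The nested result shape: EvaluationResult -> TaskResult -> MetricResult -> Score.
result = {
    "tasks": {
        "my_task": {                      # TaskResult (task name is a placeholder)
            "metrics": {
                "accuracy": {             # MetricResult (metric name is a placeholder)
                    "scores": {
                        "accuracy": {"value": stats["mean"], "stats": stats},
                    },
                },
            },
        },
    },
    "groups": {},                         # optional GroupResult entries
}
```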