nemo_evaluator.api.api_dataclasses#
NeMo Evaluator Core operates on strictly defined inputs and outputs, which are modeled through Pydantic dataclasses. Whether you use the Python API or the CLI, the reference below serves as a map of the configuration options and the output format.
Modeling Target#
| Class | Description |
| --- | --- |
| ApiEndpoint | API endpoint configuration containing information on endpoint placement, the targeted model name, and the adapter used before prompting the endpoint. |
| EndpointType | Used to determine the appropriate URL, payload structure, or native harness inference class. |
| EvaluationTarget | Target configuration for API endpoints. |
Modeling Evaluation#
| Class | Description |
| --- | --- |
| EvaluationConfig | Configuration for evaluation runs. |
| ConfigParams | Parameters for evaluation execution. |
Modeling Result#
| Class | Description |
| --- | --- |
| EvaluationResult | Bundles per-task and per-group results. |
| GroupResult | Some tasks can be grouped or logically split; this class defines results at the group level. |
| MetricResult | Defines a mapping from metric name to its scores. |
| Score | Atomic class that contains the value of a particular metric and the corresponding stats. |
| ScoreStats | Stats for a score. |
| TaskResult | Defines the set of metrics that were calculated for a particular task. |
- pydantic model nemo_evaluator.api.api_dataclasses.ApiEndpoint[source]#
Bases: BaseModel
API endpoint configuration containing information on endpoint placement, the targeted model name, and the adapter used before prompting the endpoint.
- Config:
use_enum_values: bool = True
- field adapter_config: AdapterConfig | None = None#
Adapter configuration
- field api_key: str | None = None#
Name of the environment variable that stores the API key for the model
- field model_id: str | None = None#
Name of the model
- field stream: bool | None = None#
Whether responses should be streamed
- field type: EndpointType | None = None#
The type of the target
- field url: str | None = None#
URL of the model
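A minimal sketch of constructing an ApiEndpoint from the fields above; the URL, model name, and environment-variable name are placeholders rather than values required by the library:

```python
from nemo_evaluator.api.api_dataclasses import ApiEndpoint, EndpointType

# All values below are illustrative placeholders.
endpoint = ApiEndpoint(
    url="http://localhost:8000/v1/chat/completions",
    model_id="my-model",
    type=EndpointType.CHAT,
    api_key="MY_API_KEY",  # name of the env variable holding the key, not the key itself
    stream=False,
)

# Because the model config sets use_enum_values=True, the type is stored as its string value.
assert endpoint.type == "chat"
```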
- pydantic model nemo_evaluator.api.api_dataclasses.ConfigParams[source]#
Bases: BaseModel
Parameters for evaluation execution.
- field extra: Dict[str, Any] | None [Optional]#
Framework-specific parameters to be used for evaluation
- field limit_samples: int | float | None = None#
Limit number of evaluation samples
- field max_new_tokens: int | None = None#
Max tokens to generate
- field max_retries: int | None = None#
Number of REST request retries
- field parallelism: int | None = None#
Parallelism to be used
- field request_timeout: int | None = None#
REST response timeout
- field task: str | None = None#
Name of the task
- field temperature: float | None = None#
Float value between 0 and 1. A temperature of 0 indicates greedy decoding, where the token with the highest probability is chosen. Temperature cannot currently be set to 0.0.
- field top_p: float | None = None#
Float value between 0 and 1; limits sampling to the top tokens within a certain cumulative probability. top_p=0 means the model will only consider the single most likely token for the next prediction.
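As a sketch, the parameters above might be bundled like this; the task name and the key inside extra are hypothetical, and every field may be omitted:

```python
from nemo_evaluator.api.api_dataclasses import ConfigParams

params = ConfigParams(
    task="my_task",            # hypothetical task name
    limit_samples=100,         # evaluate only the first 100 samples
    max_new_tokens=512,
    parallelism=8,
    max_retries=3,
    request_timeout=120,
    temperature=0.7,           # must be > 0.0 (see the field description above)
    top_p=0.95,
    extra={"num_fewshot": 5},  # framework-specific knob; key name is hypothetical
)
```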
- enum nemo_evaluator.api.api_dataclasses.EndpointType(value)[source]#
Bases: str, Enum
EndpointType is used to determine the appropriate URL, payload structure, or native harness inference class.
- Member Type:
str
Valid values are as follows:
- UNDEFINED = <EndpointType.UNDEFINED: 'undefined'>#
- CHAT = <EndpointType.CHAT: 'chat'>#
- COMPLETIONS = <EndpointType.COMPLETIONS: 'completions'>#
- VLM = <EndpointType.VLM: 'vlm'>#
- EMBEDDING = <EndpointType.EMBEDDING: 'embedding'>#
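Since EndpointType mixes in str, its members behave like plain strings, which is convenient when comparing against configuration values. A small illustration of that behavior:

```python
from nemo_evaluator.api.api_dataclasses import EndpointType

assert EndpointType.CHAT.value == "chat"
assert EndpointType("completions") is EndpointType.COMPLETIONS
# str-mixin enum members compare equal to their underlying string values.
assert EndpointType.CHAT == "chat"
```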
- pydantic model nemo_evaluator.api.api_dataclasses.Evaluation[source]#
Bases: BaseModel
- field command: str [Required]#
Jinja template of the command to be executed
- field config: EvaluationConfig [Required]#
- field framework_name: str [Required]#
Name of the framework
- field pkg_name: str [Required]#
Name of the package
- field target: EvaluationTarget [Required]#
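A sketch of assembling a complete Evaluation from the pieces documented on this page. The command template, framework and package names, task type, and endpoint details are all placeholders; the real values come from the framework definition you are running:

```python
from nemo_evaluator.api.api_dataclasses import (
    ApiEndpoint,
    ConfigParams,
    EndpointType,
    Evaluation,
    EvaluationConfig,
    EvaluationTarget,
)

evaluation = Evaluation(
    # Hypothetical Jinja template; the actual template is supplied by the framework.
    command="run_eval --task {{config.params.task}} --model {{target.api_endpoint.model_id}}",
    framework_name="example-framework",   # placeholder
    pkg_name="example_pkg",               # placeholder
    config=EvaluationConfig(
        type="example_task",              # placeholder task type
        output_dir="/tmp/results",
        params=ConfigParams(limit_samples=10, parallelism=4),
    ),
    target=EvaluationTarget(
        api_endpoint=ApiEndpoint(
            url="http://localhost:8000/v1/chat/completions",
            model_id="my-model",
            type=EndpointType.CHAT,
        )
    ),
)
```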
- pydantic model nemo_evaluator.api.api_dataclasses.EvaluationConfig[source]#
Bases: BaseModel
Configuration for evaluation runs.
- field output_dir: str | None = None#
Directory to output the results
- field params: ConfigParams | None = None#
Parameters to be used for evaluation
- field supported_endpoint_types: list[str] | None = None#
Supported endpoint types like chat or completions
- field type: str | None = None#
Type of the task
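For example, an EvaluationConfig can be built and serialized back to a plain dictionary. This sketch assumes Pydantic v2 (model_dump); under Pydantic v1 the equivalent call is .dict():

```python
from nemo_evaluator.api.api_dataclasses import ConfigParams, EvaluationConfig

config = EvaluationConfig(
    type="example_task",                  # placeholder task type
    output_dir="./eval_results",
    supported_endpoint_types=["chat", "completions"],
    params=ConfigParams(limit_samples=50),
)

# Serialize, dropping unset optional fields.
print(config.model_dump(exclude_none=True))
```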
- class nemo_evaluator.api.api_dataclasses.EvaluationMetadata[source]#
Bases: dict
Holds various evaluation metadata that does not influence the evaluation itself.
- pydantic model nemo_evaluator.api.api_dataclasses.EvaluationResult[source]#
Bases: BaseModel
EvaluationResult bundles per-task and per-group results.
- field groups: Dict[str, GroupResult] | None [Optional]#
The results at the group level
- field tasks: Dict[str, TaskResult] | None [Optional]#
The results at the task level
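A sketch of walking an EvaluationResult without assuming MetricResult internals; model_dump (Pydantic v2) is used so the metric contents are printed generically:

```python
from nemo_evaluator.api.api_dataclasses import EvaluationResult

def print_results(result: EvaluationResult) -> None:
    # Per-task results: task name -> TaskResult -> metric name -> MetricResult.
    for task_name, task_result in (result.tasks or {}).items():
        for metric_name, metric in (task_result.metrics or {}).items():
            print(task_name, metric_name, metric.model_dump(exclude_none=True))
    # Per-group results mirror the task structure and may nest further subgroups.
    for group_name, group_result in (result.groups or {}).items():
        print(group_name, sorted((group_result.metrics or {}).keys()))
```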
- pydantic model nemo_evaluator.api.api_dataclasses.EvaluationTarget[source]#
Bases: BaseModel
Target configuration for API endpoints.
- field api_endpoint: ApiEndpoint | None = None#
API endpoint to be used for evaluation
- pydantic model nemo_evaluator.api.api_dataclasses.GroupResult[source]#
Bases: BaseModel
Some tasks can be grouped or logically split; this class defines results at the group level.
- field groups: Dict[str, GroupResult] | None = None#
The results for the subgroups.
- field metrics: Dict[str, MetricResult] [Optional]#
The value for all the metrics computed for the group.
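Because groups can themselves contain groups, GroupResult is recursive; a tiny sketch of that nesting, with metrics left at their defaults:

```python
from nemo_evaluator.api.api_dataclasses import GroupResult

outer = GroupResult(groups={"subgroup_a": GroupResult(), "subgroup_b": GroupResult()})
print(list(outer.groups))  # ['subgroup_a', 'subgroup_b']
```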
- pydantic model nemo_evaluator.api.api_dataclasses.MetricResult[source]#
Bases: BaseModel
Defines a mapping from metric name to its scores.
- pydantic model nemo_evaluator.api.api_dataclasses.Score[source]#
Bases: BaseModel
Atomic class that contains the value of a particular metric and the corresponding stats.
- field stats: ScoreStats [Required]#
Statistics associated with this metric
- field value: float [Required]#
The value/score produced on this metric
- pydantic model nemo_evaluator.api.api_dataclasses.ScoreStats[source]#
Bases: BaseModel
Stats for a score.
- field count: int | None = None#
The number of values used for computing the score.
- field max: float | None = None#
The maximum of all values used for computing the score.
- field mean: float | None = None#
The mean of all values used for computing the score.
- field min: float | None = None#
The minimum of all values used for computing the score.
- field stddev: float | None = None#
This is the population standard deviation, not the sample standard deviation.
- field stderr: float | None = None#
The standard error.
- field sum: float | None = None#
The sum of all values used for computing the score.
- field sum_squared: float | None = None#
The sum of the square of all values used for computing the score.
- field variance: float | None = None#
This is the population variance, not the sample variance.
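A worked sketch of filling ScoreStats from a list of per-sample values and wrapping it in a Score. Using the mean as the score value and stddev / sqrt(count) for the standard error are assumptions, not guarantees about how harnesses populate these fields; the population variants match the field descriptions above:

```python
import math
import statistics

from nemo_evaluator.api.api_dataclasses import Score, ScoreStats

values = [0.8, 0.6, 1.0, 0.7]  # illustrative per-sample scores

stats = ScoreStats(
    count=len(values),
    sum=sum(values),
    sum_squared=sum(v * v for v in values),
    mean=statistics.mean(values),
    min=min(values),
    max=max(values),
    variance=statistics.pvariance(values),  # population variance, per the field description
    stddev=statistics.pstdev(values),       # population standard deviation
    stderr=statistics.pstdev(values) / math.sqrt(len(values)),  # assumed definition
)

score = Score(value=stats.mean, stats=stats)
```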
- pydantic model nemo_evaluator.api.api_dataclasses.TaskResult[source]#
Bases: BaseModel
Defines the set of metrics that were calculated for a particular task.
- field metrics: Dict[str, MetricResult] [Optional]#
The value for all the metrics computed for the task