# Evaluation Config Schema
When you create an evaluation configuration, you send a JSON data structure that describes the configuration.

**Important:** Each configuration is uniquely identified by the combination of `namespace` and `name`, for example, `my-organization/my-configuration`.

The following table is a selected field reference for the JSON data. For the full API reference, refer to Job JSON Schema Reference.
| Name | Description | Type | Valid Values or Child Objects |
|---|---|---|---|
| `access_policies` | The policies that control who can use the configuration. This field is for sharing configurations across organizations. | Object | — |
| `api_endpoint` | The endpoint for a model. | Object | — |
| `api_key` | The key to access an API endpoint. | String | — |
| `created_by` | The ID of the user that created the configuration. This field is for sharing configurations across organizations. | String | — |
| `custom_fields` | An optional object that you can use to store additional information. | Object | — |
| `dataset` | A dataset to use for the evaluation. | Object | — |
| `description` | A description of the configuration. | String | — |
| `extra` | Additional parameters for academic benchmarks. | Object | — |
| `files_url` | The URL of a file that contains pre-generated data. | String | — |
| `format` | The format of a data file. For format information, refer to Custom Data. | String | — |
| `groups` | A dictionary of evaluation tasks to run in a group. | Object | — |
| `hf_token` | A Hugging Face account token. For some benchmark datasets, a valid Hugging Face token is required to access the data. | String | — |
| `id` | The ID of the configuration. The ID is returned in the response when you create a configuration. | String | — |
| `judge_llm` | The model to use to judge the answer. | Object | — |
| `limit_samples` | The number of samples to evaluate. | Integer | — |
| `max_tokens` | The maximum number of tokens to generate during inference. | Integer | — |
| `max_retries` | The number of times an evaluation job retries a request to a model after a failure. | Integer | — |
| `metrics` | A dictionary of metric objects for the evaluation. | Object | — |
| `model_id` | The ID of the NIM model, as specified in Models. | String | — |
| `name` | An arbitrary name to identify the configuration. If you don’t specify a name, the default is the ID associated with the configuration. | String | — |
| `namespace` | An arbitrary organization name, a vendor name, or any other text. If you don’t specify a namespace, a default namespace is used. | String | — |
| `ownership` | Information about the creator of the configuration, and who can use it. This field is for sharing configurations across organizations. | Object | — |
| `parallelism` | The parallelism of the job that runs the benchmark. | Integer | — |
| `params` | A set of parameters to apply to the evaluation. | Object | — |
| `project` | The ID of a project to associate with the configuration. | String | — |
| `request_timeout` | The time in milliseconds that the evaluation job waits for a response from the model before it fails. | Integer | — |
| `stop` | Up to 4 sequences where the API will stop generating further tokens. | String or List | — |
| `tasks` | A dictionary of evaluation tasks to run. | Object | — |
| `temperature` | Adjusts the randomness of token selection. Higher values increase randomness and creativity; lower values promote deterministic and conservative output. | Number | — |
| `top_p` | A threshold that selects from the most probable tokens until the cumulative probability exceeds p. | Number | — |
| `type` | The type of evaluation that the configuration is for. | String | — |
| `type` (task) | The type of a task. | String | — |
| `url` | The URL for a model endpoint. | String | — |
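To see how these fields can fit together, the following Python sketch assembles a hypothetical configuration and prints it as JSON. Every concrete value in it (the task, metric, and type names, the file URL, and the parameter settings) is an illustrative placeholder rather than a documented default, and the exact nesting of fields such as `params`, `tasks`, `dataset`, and `metrics` can differ by evaluation type; refer to the Job JSON Schema Reference for the authoritative structure.

```python
import json

# Hypothetical evaluation configuration assembled from the fields in the table
# above. All concrete values (names, types, URL, parameter settings) are
# placeholders, not documented defaults.
config = {
    "namespace": "my-organization",      # arbitrary organization or vendor name
    "name": "my-configuration",          # arbitrary name for the configuration
    "description": "Example evaluation configuration",
    "type": "example-evaluation-type",   # placeholder; use the evaluation type you intend to run
    "params": {                          # parameters applied to the evaluation
        "temperature": 0.2,              # lower values give more deterministic output
        "top_p": 0.9,                    # cumulative-probability sampling threshold
        "max_tokens": 512,               # cap on tokens generated during inference
        "limit_samples": 100,            # number of samples to evaluate
        "max_retries": 3,                # retries after a failed request to the model
        "request_timeout": 30000,        # milliseconds to wait for a model response
        "parallelism": 4,                # parallelism of the job running the benchmark
    },
    "tasks": {                           # dictionary of evaluation tasks to run
        "example-task": {
            "type": "example-task-type",  # placeholder task type
            "dataset": {
                "files_url": "file://path/to/pre-generated-data.jsonl",  # placeholder file URL
                "format": "example-format",  # data file format; refer to Custom Data
            },
            "metrics": {
                "example-metric": {"type": "example-metric-type"},  # placeholder metric
            },
        },
    },
    "custom_fields": {"notes": "optional free-form metadata"},  # optional extra information
}

print(json.dumps(config, indent=2))
```

Treat this only as a map from the table rows to JSON keys; which fields are required and which values are allowed depend on the evaluation `type`.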