Create Evaluation Job#
To create an evaluation job, send a POST request to the evaluation jobs endpoint. The URL of the Evaluator API depends on where you deploy the evaluator and how you configure it. For more information, refer to the NeMo Evaluator Deployment Guide.
Prerequisites#
- Set your EVALUATOR_BASE_URL environment variable to your evaluator service endpoint:

  export EVALUATOR_BASE_URL="https://your-evaluator-service-endpoint"

- Ensure that you have created both an evaluation target and an evaluation configuration.
v2 (Preview)#
Warning
v2 API Preview: The v2 API is available for testing and feedback but is not yet recommended for production use. Breaking changes may occur before the stable release.
The v2 API introduces a spec envelope at the top level.
import os
from nemo_microservices import NeMoMicroservices

# Initialize the client
client = NeMoMicroservices(
    base_url=os.environ['EVALUATOR_BASE_URL']
)

# Create an evaluation job (v2 API)
job = client.v2.evaluation.jobs.create(
    spec={
        "target": {
            # example target
            "name": "<my-target-name>",
            "namespace": "<my-target-namespace>",
            "type": "<my-target-type>",
        },
        "config": {
            # example config
            "name": "<my-config-name>",
            "namespace": "<my-config-namespace>",
            "type": "<my-config-type>",
            "params": {},
        },
    }
)

# Get the job ID and status
job_id = job.id
print(f"Job ID: {job_id}")
print(f"Job status: {job.status}")
curl -X "POST" "${EVALUATOR_BASE_URL}/v2/evaluation/jobs" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "spec": {
      "target": {
        "name": "<my-target-name>",
        "namespace": "<my-target-namespace>",
        "type": "model",
        "model": {
          "api_endpoint": {
            "url": "http://nemo-nim-proxy:8000/v1/chat/completions",
            "model_id": "meta/llama-3.1-8b-instruct"
          }
        }
      },
      "config": {
        "name": "<my-config-name>",
        "namespace": "<my-config-namespace>",
        "type": "bfclv3",
        "params": {
          "limit_samples": 10
        },
        "tasks": {
          "task1": {
            "type": "simple"
          }
        }
      }
    }
  }'
v2 Example Response
{
  "id": "job-dq1pjj6vj5p64xaeqgvuk4",
  "created_at": "2025-09-08T19:20:32.655254",
  "updated_at": "2025-09-08T19:20:32.655256",
  "spec": {
    "config": {
      "type": "bfclv3",
      "params": {
        "limit_samples": 10
      },
      "tasks": {
        "task1": {
          "type": "simple"
        }
      }
    },
    "target": {
      "type": "model",
      "model": {
        "api_endpoint": {
          "url": "https://nim.int.aire.nvidia.com/v1/chat/completions",
          "model_id": "meta/llama-3.1-8b-instruct",
          "format": "nim"
        }
      }
    }
  },
  "status": "created",
  "status_details": {},
  "error_details": null,
  "ownership": null,
  "custom_fields": null
}
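A newly created job starts in the "created" status, so a common next step is to poll it until it reaches a terminal state. The following is a minimal sketch, assuming the service exposes GET ${EVALUATOR_BASE_URL}/v2/evaluation/jobs/{job_id} and that the terminal statuses include "completed", "failed", and "cancelled" (confirm the exact endpoint and status set in your deployment's API reference):

```python
import json
import time
import urllib.request

# Assumed terminal statuses; confirm the exact set in your deployment's API reference.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}

def job_is_done(job: dict) -> bool:
    """Return True once the job has reached an (assumed) terminal status."""
    return job.get("status") in TERMINAL_STATUSES

def wait_for_job(base_url: str, job_id: str, interval_s: float = 10.0) -> dict:
    """Poll GET /v2/evaluation/jobs/{job_id} (assumed endpoint) until the job finishes."""
    while True:
        with urllib.request.urlopen(f"{base_url}/v2/evaluation/jobs/{job_id}") as resp:
            job = json.load(resp)
        if job_is_done(job):
            return job
        time.sleep(interval_s)

# Example (requires a running evaluator service):
# final = wait_for_job("https://your-evaluator-service-endpoint", "job-dq1pjj6vj5p64xaeqgvuk4")
```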
Key v2 Differences:

- Spec envelope: Target and config are wrapped in a required spec object.
- Endpoint: Uses /v2/evaluation/jobs instead of /v1/evaluation/jobs.
- Response structure: Includes the new fields and the spec envelope in the response.
- Secrets: To securely use API keys for jobs with the v2 API, the secrets must be defined in-line with the job definition, not referenced from v1 targets or configs. Refer to V2 Secrets.
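Because the only structural change on the request side is the envelope, a v2 request body can be assembled programmatically from v1-style target and config objects. This is a sketch; the helper and the placeholder values are hypothetical, not part of the SDK:

```python
# Hypothetical helper that wraps a target and config in the v2 "spec" envelope.
# The target/config field names follow the curl example above.

def build_v2_job_body(target: dict, config: dict) -> dict:
    """Assemble a request body for POST /v2/evaluation/jobs."""
    return {"spec": {"target": target, "config": config}}

body = build_v2_job_body(
    target={"name": "my-target", "namespace": "my-namespace", "type": "model"},
    config={
        "name": "my-config",
        "namespace": "my-namespace",
        "type": "bfclv3",
        "params": {"limit_samples": 10},
    },
)

# The envelope is required in v2; target and config keep their existing shapes.
assert set(body["spec"]) == {"target", "config"}
```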
Note
The target and config formats remain the same. Refer to Eval Flows for evaluation configuration examples.
v1 (Current)#
import os
from nemo_microservices import NeMoMicroservices

# Initialize the client
client = NeMoMicroservices(
    base_url=os.environ['EVALUATOR_BASE_URL']
)

# Create an evaluation job (v1 API)
job = client.evaluation.jobs.create(
    namespace="my-organization",
    target="<my-target-namespace>/<my-target-name>",
    config="<my-config-namespace>/<my-config-name>"
)

# Get the job ID and status
job_id = job.id
print(f"Job ID: {job_id}")
print(f"Job status: {job.status}")
curl -X "POST" "${EVALUATOR_BASE_URL}/v1/evaluation/jobs" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "namespace": "my-organization",
    "target": "<my-target-namespace>/<my-target-name>",
    "config": "<my-config-namespace>/<my-config-name>"
  }'
v1 Example Response
{
  "created_at": "2025-03-19T22:50:15.684382",
  "updated_at": "2025-03-19T22:50:15.684385",
  "id": "eval-UVW123XYZ456",
  "namespace": "my-organization",
  "description": null,
  "target": {
    // target details
  },
  "config": {
    // config details
  },
  "result": null,
  "output_files_url": null,
  "status_details": {
    "message": null,
    "task_status": {},
    "progress": null
  },
  "status": "created",
  "project": null,
  "custom_fields": {},
  "ownership": null
}
For the full response reference, refer to Evaluator API.
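When you read a v1 job back, the status, status_details.progress, and status_details.task_status fields shown in the example response are the ones to inspect. As a small sketch, a one-line summary can be built from those fields (the helper is hypothetical, not part of the SDK):

```python
# Hypothetical helper: summarize a v1 evaluation job response dict.
# Uses only the fields shown in the v1 example response above.

def summarize_job(job: dict) -> str:
    """Build a one-line status summary from a v1 evaluation job response."""
    details = job.get("status_details") or {}
    progress = details.get("progress")
    progress_str = "n/a" if progress is None else f"{progress}%"
    return f"{job['id']}: {job['status']} (progress: {progress_str})"

job = {
    "id": "eval-UVW123XYZ456",
    "status": "created",
    "status_details": {"message": None, "task_status": {}, "progress": None},
}
print(summarize_job(job))  # eval-UVW123XYZ456: created (progress: n/a)
```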