Create Evaluation Job#
To create an evaluation job, send a POST request to the evaluation jobs endpoint. The base URL of the Evaluator API depends on where you deploy the Evaluator service and how you configure it. For more information, refer to the NeMo Evaluator Deployment Guide.
Prerequisites#
Set your EVALUATOR_BASE_URL environment variable to your evaluator service endpoint:

export EVALUATOR_BASE_URL="https://your-evaluator-service-endpoint"

Ensure that you have created both an evaluation target and an evaluation configuration.
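Before creating a job, you can optionally confirm that the target and configuration exist by fetching them. The sketch below is illustrative, not confirmed API surface: the retrieve method and argument names are assumptions patterned on the jobs.create call used later on this page.

import os
from nemo_microservices import NeMoMicroservices

client = NeMoMicroservices(
    base_url=os.environ['EVALUATOR_BASE_URL']
)

# Fetch the target and config by namespace and name; each call should fail
# if the resource does not exist. Method and argument names are assumptions.
target = client.evaluation.targets.retrieve(
    target_name="<my-target-name>",
    namespace="<my-target-namespace>",
)
config = client.evaluation.configs.retrieve(
    config_name="<my-config-name>",
    namespace="<my-config-namespace>",
)
print(f"Target: {target.namespace}/{target.name}")
print(f"Config: {config.namespace}/{config.name}")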
v2 (Preview)#
Warning
v2 API Preview: The v2 API is available for testing and feedback but is not yet recommended for production use. Breaking changes may occur before the stable release.
The v2 API introduces a spec envelope at the top level.
import os
from nemo_microservices import NeMoMicroservices

# Initialize the client
client = NeMoMicroservices(
    base_url=os.environ['EVALUATOR_BASE_URL']
)

# Create an evaluation job (v2 API)
job = client.v2.evaluation.jobs.create(
    spec={
        "target": {
            # example target
            "name": "<my-target-name>",
            "namespace": "<my-target-namespace>",
            "type": "<my-target-type>",
        },
        "config": {
            # example config
            "name": "<my-config-name>",
            "namespace": "<my-config-namespace>",
            "type": "<my-config-type>",
            "params": {},
        }
    }
)

# Get the job ID and status
job_id = job.id
print(f"Job ID: {job_id}")
print(f"Job status: {job.status}")
curl -X "POST" "${EVALUATOR_BASE_URL}/v2/evaluation/jobs" \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"spec": {
"target": {
"name": "<my-target-name>",
"namespace": "<my-target-namespace>",
"type": "model",
"model": {
"api_endpoint": {
"url": "http://nemo-nim-proxy:8000/v1/chat/completions",
"model_id": "meta/llama-3.1-8b-instruct"
}
}
},
"config": {
"name": "<my-config-name>",
"namespace": "<my-config-namespace>",
"type": "bfclv3",
"params": {
"limit_samples": 10
},
"tasks": {
"task1": {
"type": "simple"
}
}
}
}
}'
v2 Example Response
{
  "id": "job-dq1pjj6vj5p64xaeqgvuk4",
  "created_at": "2025-09-08T19:20:32.655254",
  "updated_at": "2025-09-08T19:20:32.655256",
  "spec": {
    "config": {
      "type": "bfclv3",
      "params": {
        "limit_samples": 10
      },
      "tasks": {
        "task1": {
          "type": "simple"
        }
      }
    },
    "target": {
      "type": "model",
      "model": {
        "api_endpoint": {
          "url": "https://nim.int.aire.nvidia.com/v1/chat/completions",
          "model_id": "meta/llama-3.1-8b-instruct",
          "format": "nim"
        }
      }
    }
  },
  "status": "created",
  "status_details": {},
  "error_details": null,
  "ownership": null,
  "custom_fields": null
}
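A newly created job starts in the created status and runs asynchronously. The following is a minimal polling sketch; it assumes the v2 client exposes a jobs.retrieve method that mirrors jobs.create, and the in-flight status values shown are assumptions:

import os
import time
from nemo_microservices import NeMoMicroservices

client = NeMoMicroservices(
    base_url=os.environ['EVALUATOR_BASE_URL']
)

job_id = "job-dq1pjj6vj5p64xaeqgvuk4"  # ID returned when you created the job

# Poll until the job leaves its in-flight states. The retrieve() method and
# the exact set of status strings are assumptions, not confirmed API surface.
while True:
    job = client.v2.evaluation.jobs.retrieve(job_id)
    if job.status not in ("created", "pending", "running"):
        break
    time.sleep(10)

print(f"Final status: {job.status}")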
Key v2 Differences:

Spec envelope: Target and config are wrapped in a required spec object.

Endpoint: Uses /v2/evaluation/jobs instead of /v1/evaluation/jobs.

Response structure: Includes the new fields and the spec envelope in the response.

Secrets: To securely use API keys for jobs with the v2 API, the secrets must be defined in-line with the job definition, not referenced from v1 targets or configs. Refer to V2 Secrets.
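For illustration only, the hypothetical sketch below shows what an in-line API key might look like inside the target portion of a v2 job spec. The api_key field name and its placement under api_endpoint are assumptions; refer to V2 Secrets for the actual schema.

# Hypothetical sketch of an in-line secret in a v2 job spec. The "api_key"
# field and its placement are assumptions; see the V2 Secrets reference.
spec = {
    "target": {
        "type": "model",
        "model": {
            "api_endpoint": {
                "url": "http://nemo-nim-proxy:8000/v1/chat/completions",
                "model_id": "meta/llama-3.1-8b-instruct",
                "api_key": "<your-api-key>",  # defined in-line with the job, not referenced from a v1 target
            }
        }
    },
    "config": {
        "type": "bfclv3",
        "params": {"limit_samples": 10},
    },
}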
v1 (Current)#
import os
from nemo_microservices import NeMoMicroservices

# Initialize the client
client = NeMoMicroservices(
    base_url=os.environ['EVALUATOR_BASE_URL']
)

# Create an evaluation job (v1 API)
job = client.evaluation.jobs.create(
    namespace="my-organization",
    target="<my-target-namespace>/<my-target-name>",
    config="<my-config-namespace>/<my-config-name>"
)

# Get the job ID and status
job_id = job.id
print(f"Job ID: {job_id}")
print(f"Job status: {job.status}")
curl -X "POST" "${EVALUATOR_BASE_URL}/v1/evaluation/jobs" \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"namespace": "my-organization",
"target": "<my-target-namespace>/<my-target-name>",
"config": "<my-config-namespace>/<my-config-name>"
}'
v1 Example Response
{
  "created_at": "2025-03-19T22:50:15.684382",
  "updated_at": "2025-03-19T22:50:15.684385",
  "id": "eval-UVW123XYZ456",
  "namespace": "my-organization",
  "description": null,
  "target": {
    // target details
  },
  "config": {
    // config details
  },
  "result": null,
  "output_files_url": null,
  "status_details": {
    "message": null,
    "task_status": {},
    "progress": null
  },
  "status": "created",
  "project": null,
  "custom_fields": {},
  "ownership": null
}
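To check on the job after creation, fetch it by ID. A minimal sketch, assuming the client exposes a v1 jobs.retrieve method that mirrors jobs.create:

import os
from nemo_microservices import NeMoMicroservices

client = NeMoMicroservices(
    base_url=os.environ['EVALUATOR_BASE_URL']
)

# Fetch the job by ID; retrieve() is assumed to mirror jobs.create().
job = client.evaluation.jobs.retrieve("eval-UVW123XYZ456")
print(f"Status: {job.status}")
print(f"Status details: {job.status_details}")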
For the full response reference, refer to Evaluator API.