Create Evaluation Job#

To create an evaluation job, send a POST request to the evaluation/jobs API. The URL of the evaluator API depends on where you deploy the evaluator service and how you configure it. For more information, refer to the NeMo Evaluator Deployment Guide.

Prerequisites#

  • Set your EVALUATOR_BASE_URL environment variable to your evaluator service endpoint:

    export EVALUATOR_BASE_URL="https://your-evaluator-service-endpoint"
    
  • Ensure you have created both an evaluation target and an evaluation configuration
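
If you script these steps, you can fail fast when the endpoint variable is missing. A minimal sketch (the helper name and error message are illustrative, not part of the SDK):

```python
import os

def require_base_url() -> str:
    """Return EVALUATOR_BASE_URL, or raise a clear error if it is not set."""
    base_url = os.environ.get("EVALUATOR_BASE_URL")
    if not base_url:
        raise RuntimeError(
            "EVALUATOR_BASE_URL is not set; export it before creating evaluation jobs"
        )
    return base_url
```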

v2 (Preview)#

Warning

v2 API Preview: The v2 API is available for testing and feedback but is not yet recommended for production use. Breaking changes may occur before the stable release.

The v2 API introduces a spec envelope at the top level: the target and config definitions are nested inside a required spec object.

import os
from nemo_microservices import NeMoMicroservices

# Initialize the client
client = NeMoMicroservices(
    base_url=os.environ['EVALUATOR_BASE_URL']
)

# Create an evaluation job (v2 API)
job = client.v2.evaluation.jobs.create(
    spec={
        "target": {
            # example target
            "name": "<my-target-name>",
            "namespace": "<my-target-namespace>",
            "type": "<my-target-type>",
        },
        "config": {
            # example config
            "name": "<my-config-name>",
            "namespace": "<my-config-namespace>",
            "type": "<my-config-type>",
            "params": {},
        },
    }
)

# Get the job ID and status
job_id = job.id
print(f"Job ID: {job_id}")
print(f"Job status: {job.status}")

curl -X "POST" "${EVALUATOR_BASE_URL}/v2/evaluation/jobs" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "spec": {
      "target": {
        "name": "<my-target-name>",
        "namespace": "<my-target-namespace>",
        "type": "model",
        "model": {
          "api_endpoint": {
            "url": "http://nemo-nim-proxy:8000/v1/chat/completions",
            "model_id": "meta/llama-3.1-8b-instruct"
          }
        }
      },
      "config": {
        "name": "<my-config-name>",
        "namespace": "<my-config-namespace>",
        "type": "bfclv3",
        "params": {
          "limit_samples": 10
        },
        "tasks": {
          "task1": {
            "type": "simple"
          }
        }
      }
    }
  }'
v2 Example Response
{
  "id": "job-dq1pjj6vj5p64xaeqgvuk4",
  "created_at": "2025-09-08T19:20:32.655254",
  "updated_at": "2025-09-08T19:20:32.655256",
  "spec": {
    "config": {
      "type": "bfclv3",
      "params": {
        "limit_samples": 10
      },
      "tasks": {
        "task1": {
          "type": "simple"
        }
      }
    },
    "target": {
      "type": "model",
      "model": {
        "api_endpoint": {
          "url": "https://nim.int.aire.nvidia.com/v1/chat/completions",
          "model_id": "meta/llama-3.1-8b-instruct",
          "format": "nim"
        }
      }
    }
  },
  "status": "created",
  "status_details": {},
  "error_details": null,
  "ownership": null,
  "custom_fields": null
}
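
A new job starts in the created status and runs asynchronously. The sketch below polls a caller-supplied get_status callable (a hypothetical stand-in for retrieving the job from the service) until a terminal status is reached; the terminal status names other than created are assumptions for illustration, not the service's documented state machine:

```python
import time
from typing import Callable

def wait_for_job(
    get_status: Callable[[], str],
    terminal: frozenset = frozenset({"completed", "failed", "cancelled"}),  # assumed names
    interval: float = 5.0,
    timeout: float = 3600.0,
) -> str:
    """Poll get_status() until it returns a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout
    status = get_status()
    while status not in terminal:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still '{status}' after {timeout}s")
        time.sleep(interval)
        status = get_status()
    return status
```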

Key v2 Differences:

  • Spec envelope: Target and config are wrapped in a required spec object

  • Endpoint: Uses /v2/evaluation/jobs instead of /v1/evaluation/jobs

  • Response structure: Includes the new fields and spec envelope in the response

  • Secrets: To securely use API keys for jobs with the v2 API, secrets must be defined inline in the job definition, not referenced from v1 targets or configs. Refer to V2 Secrets.
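
The envelope change above can be sketched as a small helper that wraps v1-style target and config bodies into a v2 request payload (a shape illustration only, not an official migration utility; it assumes any referenced resources and secrets have already been in-lined):

```python
def to_v2_payload(target: dict, config: dict) -> dict:
    """Wrap fully resolved target and config bodies in the v2 spec envelope."""
    return {"spec": {"target": target, "config": config}}
```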

v1 (Current)#

import os
from nemo_microservices import NeMoMicroservices

# Initialize the client
client = NeMoMicroservices(
    base_url=os.environ['EVALUATOR_BASE_URL']
)

# Create an evaluation job (v1 API)
job = client.evaluation.jobs.create(
    namespace="my-organization",
    target="<my-target-namespace>/<my-target-name>",
    config="<my-config-namespace>/<my-config-name>"
)

# Get the job ID and status
job_id = job.id
print(f"Job ID: {job_id}")
print(f"Job status: {job.status}")

curl -X "POST" "${EVALUATOR_BASE_URL}/v1/evaluation/jobs" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "namespace": "my-organization",
    "target": "<my-target-namespace>/<my-target-name>",
    "config": "<my-config-namespace>/<my-config-name>"
  }'
v1 Example Response
{
    "created_at": "2025-03-19T22:50:15.684382",
    "updated_at": "2025-03-19T22:50:15.684385",
    "id": "eval-UVW123XYZ456",
    "namespace": "my-organization",
    "description": null,
    "target": {
        // target details
    },
    "config": {
        // config details
    },
    "result": null,
    "output_files_url": null,
    "status_details": {
        "message": null,
        "task_status": {},
        "progress": null
    },
    "status": "created",
    "project": null,
    "custom_fields": {},
    "ownership": null
}
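
When you call the endpoint with curl rather than the SDK, you parse the response body yourself. This sketch (the helper name is hypothetical) pulls out the two fields you typically act on, assuming a JSON body shaped like the example above:

```python
import json

def summarize_job(response_body: str) -> tuple:
    """Extract (id, status) from an evaluation-job response body."""
    job = json.loads(response_body)
    return job["id"], job["status"]
```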

For the full response reference, refer to Evaluator API.