Download Evaluation Results#
To download the results of an evaluation job, send a GET request to the evaluation job results API.
The response is an archive containing the configuration files, logs, and evaluation results for the specified evaluation job.
v2 (Preview)#
Warning
v2 API Preview: The v2 API is available for testing and feedback but is not yet recommended for production use. Breaking changes may occur before the stable release.
The v2 API provides structured access to different result types through separate endpoints.
import os
from nemo_microservices import NeMoMicroservices
# Initialize the client
client = NeMoMicroservices(
    base_url=os.environ['EVALUATOR_BASE_URL']
)
# Download job artifacts (v2 API)
job_id = "job-id"
artifacts_archive = client.v2.evaluation.jobs.results.artifacts.retrieve(job_id)
# Save the downloaded archive to a file
artifacts_archive.write_to_file('artifacts.tar.gz')
# Alternatively, download evaluation results separately
eval_results = client.v2.evaluation.jobs.results.evaluation_results.retrieve(job_id)
with open("evaluation_results.json", "w") as f:
    f.write(eval_results.model_dump_json(indent=2, exclude_none=True))
print("Download completed.")
# Download job artifacts (logs, intermediate files, etc.)
curl -X "GET" "${EVALUATOR_BASE_URL}/v2/evaluation/jobs/<job_id>/results/artifacts/download" \
-o artifacts.tar.gz
# Download evaluation results (structured results only)
curl -X "GET" "${EVALUATOR_BASE_URL}/v2/evaluation/jobs/<job_id>/results/evaluation-results/download" \
-H 'accept: application/json'
v2 Result Types#
The v2 API distinguishes between different result types:
artifacts: Complete job artifacts including logs, intermediate files, configuration files, and all outputs
evaluation-results: Structured evaluation metrics and scores only
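After downloading the artifacts archive, you typically want to unpack it. Below is a minimal sketch using only the Python standard library, assuming the tarball was saved as `artifacts.tar.gz` as in the example above; the helper name `extract_artifacts` is ours, not part of the SDK.

```python
import tarfile
from pathlib import Path

def extract_artifacts(archive_path: str, extract_dir: str) -> list:
    """Extract a downloaded artifacts tarball and list the extracted files."""
    dest = Path(extract_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        # On Python 3.12+, you can pass filter="data" to extractall
        # to guard against path-traversal entries.
        tar.extractall(dest)
    return sorted(
        str(p.relative_to(dest)) for p in dest.rglob("*") if p.is_file()
    )
```

The returned list of relative paths is convenient for verifying that the expected logs and configuration files were included.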
v2 Available Results
You can list available results first:
curl -X "GET" "${EVALUATOR_BASE_URL}/v2/evaluation/jobs/<job_id>/results" \
-H 'accept: application/json'
Response:
[
  {
    "result_name": "evaluation-results",
    "job_id": "job-dq1pjj6vj5p64xaeqgvuk4",
    "created_at": "2025-09-08T19:21:43.078131",
    "artifact_url": "hf://default/job-results-job-dq1pjj6vj5p64xaeqgvuk4/evaluation-results",
    "artifact_storage_type": "nds"
  },
  {
    "result_name": "artifacts",
    "job_id": "job-dq1pjj6vj5p64xaeqgvuk4",
    "created_at": "2025-09-08T19:21:39.665664",
    "artifact_url": "hf://default/job-results-job-dq1pjj6vj5p64xaeqgvuk4/artifacts",
    "artifact_storage_type": "nds"
  }
]
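If you want to check which result types exist before downloading, a small helper for parsing that listing can be useful. This is a sketch under the assumption that the response has the shape shown above; the function name `available_result_names` is ours, and only the `result_name` field is taken from the sample payload.

```python
import json

def available_result_names(listing: list) -> set:
    """Return the set of result_name values from a results-listing response."""
    return {entry["result_name"] for entry in listing}

# Parse a trimmed-down version of the sample listing above.
sample = json.loads(
    '[{"result_name": "evaluation-results"}, {"result_name": "artifacts"}]'
)
names = available_result_names(sample)
```

You could then download `evaluation-results` only when it appears in the returned set, rather than handling a failed download.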
v1 (Current)#
Choose one of the following options to download evaluation results.
import os
from nemo_microservices import NeMoMicroservices
# Initialize the client
client = NeMoMicroservices(
    base_url=os.environ['EVALUATOR_BASE_URL']
)
# Download evaluation results (v1 API)
results_zip = client.evaluation.jobs.download_results("job-id")
# Save to file
results_zip.write_to_file('result.zip')
print("Download completed.")
curl -X "GET" "${EVALUATOR_BASE_URL}/v1/evaluation/jobs/<job_id>/download-results" \
-H 'accept: application/zip' \
-o result.zip
Results#
After the download completes, the results are available in the result.zip file. To extract result.zip on Ubuntu, macOS, or other Linux distributions, run the following command.
unzip result.zip -d result
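If the `unzip` utility is not available (for example on Windows), the same extraction can be done with Python's standard library. This is a cross-platform sketch; the helper name `unzip_results` is ours.

```python
import zipfile
from pathlib import Path

def unzip_results(zip_path: str, dest: str) -> list:
    """Cross-platform equivalent of `unzip result.zip -d result`."""
    out = Path(dest)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out)
        return sorted(zf.namelist())
```

Calling `unzip_results("result.zip", "result")` reproduces the shell command above and returns the archive's member names for inspection.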
You can find the result files in the results/ folder. For example, if you run an lm-harness evaluation, the results are in automatic/lm_eval_harness/results.
The directory structure will look like this:
.
├── automatic
│   └── lm_eval_harness
│       ├── model_config_meta-llama-3_1-8b-instruct.yaml
│       ├── model_config_meta-llama-3_1-8b-instruct_inference_params.yaml
│       └── results
│           ├── README.md
│           ├── lm-harness-mmlu_str.json
│           ├── lm-harness.json
│           ├── lmharness_meta-llama-3_1-8b-instruct_aggregateresults-run.log
│           └── lmharness_meta-llama-3_1-8b-instruct_mmlu_str-run.log
└── metadata.json
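To get a quick overview of the JSON files in an extracted result directory, you can inspect their top-level structure without assuming any particular schema. This is a sketch; `summarize_result_files` is a hypothetical helper, and the actual keys inside each file vary by evaluation type.

```python
import json
from pathlib import Path

def summarize_result_files(result_dir: str) -> dict:
    """Map each JSON file under result_dir to its top-level keys.

    Only the structure is inspected; no specific metric names are assumed.
    """
    summary = {}
    for json_file in sorted(Path(result_dir).rglob("*.json")):
        data = json.loads(json_file.read_text())
        # For JSON objects record the sorted keys; for arrays, the length.
        summary[json_file.name] = (
            sorted(data) if isinstance(data, dict) else len(data)
        )
    return summary
```

Running this against the extracted `result/` directory gives a compact map of which metrics files were produced before you dig into individual scores.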