Download Evaluation Job Logs

To download the logs for an evaluation job, send a GET request to the /evaluation/jobs/<job_id>/logs endpoint. The response is a ZIP file containing the logs that the evaluation job generated.

Prerequisites

Create a job.
Get the job status and confirm that the job has reached a terminal state, either COMPLETED or FAILED.
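
The status check in the second prerequisite can be automated with a small polling loop. The following is a minimal sketch, not part of the SDK: `wait_for_terminal_status` and `get_status` are hypothetical names, and `get_status` stands for any callable you supply that returns the job's current status string. Only the COMPLETED and FAILED values come from this page; the PENDING and RUNNING values in the demo are illustrative placeholders.

```python
import time
from typing import Callable

# Terminal statuses named in the prerequisites.
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def wait_for_terminal_status(
    get_status: Callable[[], str],
    poll_seconds: float = 10.0,
    timeout_seconds: float = 3600.0,
) -> str:
    """Poll get_status() until it returns a terminal status, then return it.

    get_status is a hypothetical, caller-supplied callable; it is not an SDK method.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status().upper()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Job still in status {status!r} after {timeout_seconds} seconds"
            )
        time.sleep(poll_seconds)


# Demo with a fake status source (illustrative values, no service required).
fake_statuses = iter(["PENDING", "RUNNING", "COMPLETED"])
final = wait_for_terminal_status(lambda: next(fake_statuses), poll_seconds=0.0)
print(final)  # COMPLETED
```

In practice, `get_status` would wrap whichever status call your client exposes for the job; download the logs only after this function returns.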

To Download Evaluation Job Logs

Choose one of the following options to download evaluation job logs.

Python SDK

```
import os
from nemo_microservices import NeMoMicroservices

# Initialize the client
client = NeMoMicroservices(
    base_url=os.environ['EVALUATOR_BASE_URL']
)

# Download job logs
job_id = "job-id"
logs_zip = client.evaluation.jobs.download_logs(job_id)

# Save to file
with open(f"{job_id}_logs.zip", 'wb') as file:
    file.write(logs_zip)

print("Download completed.")
```

cURL

```
curl -X "GET" "${EVALUATOR_BASE_URL}/evaluation/jobs/<job_id>/logs" \
  -o <job_id>_logs.zip
```
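
If neither the SDK nor cURL is available, the same GET request can be sketched with the Python standard library. This is an illustration under assumptions, not an official client: `logs_url` and `download_job_logs` are hypothetical helper names, the `localhost` address is a placeholder, and the sketch assumes the endpoint needs no authentication headers.

```python
import shutil
import urllib.request
from urllib.parse import quote


def logs_url(base_url: str, job_id: str) -> str:
    """Build the URL of the logs endpoint for one job."""
    return f"{base_url.rstrip('/')}/evaluation/jobs/{quote(job_id, safe='')}/logs"


def download_job_logs(base_url: str, job_id: str, dest_path: str) -> str:
    """Stream the ZIP response body to dest_path and return the path.

    urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    """
    url = logs_url(base_url, job_id)
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path


# Placeholder base URL, used only to show the resulting endpoint.
print(logs_url("http://localhost:8080/", "job-id"))
# http://localhost:8080/evaluation/jobs/job-id/logs
```

A call such as `download_job_logs(os.environ["EVALUATOR_BASE_URL"], "job-id", "job-id_logs.zip")` would then write the archive next to your script.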

Job Logs

After the download completes, the logs are available in the <job_id>_logs.zip file.

Unzip the file on Linux or macOS using the following command:
```
unzip <job_id>_logs.zip -d logs
```
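
On systems without the `unzip` command, the archive can be extracted with Python's standard `zipfile` module instead. The sketch below is one way to do it; `extract_logs` is a hypothetical helper name, and the `logs` destination simply mirrors the `-d logs` option above.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_logs(zip_path: str, dest_dir: str = "logs") -> List[str]:
    """Extract every member of the archive into dest_dir and return the member names."""
    if not zipfile.is_zipfile(zip_path):
        # Covers missing files and downloads that are not ZIP archives.
        raise ValueError(f"{zip_path} is not a valid ZIP file")
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()
```

For example, `extract_logs("job-id_logs.zip")` would unpack the archive saved by the Python SDK example into a `logs` directory.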

The following example shows the log output from an evaluation job.

Example Response

```
INFO:nveval.utils.process_custom_data:Validation passed for /jobs/datasets/eval-2rDA3TM9Kt4rozrn8B6mrg/default/kngo-eval-test-datasets/bfcl-custom-evalfactory/nativeformat/BFCL_v3_simple.json and /jobs/datasets/eval-2rDA3TM9Kt4rozrn8B6mrg/default/kngo-eval-test-datasets/bfcl-custom-evalfactory/nativeformat/possible_answer/BFCL_v3_simple.json
INFO:nveval.adapter:Running command: ['export BFCL_DATA_DIR=/jobs/datasets/eval-2rDA3TM9Kt4rozrn8B6mrg/default/kngo-eval-test-datasets/bfcl-custom-evalfactory/nativeformat', 'bfcl generate --skip-server-setup --result-dir /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/task1/results --model meta/llama-3.1-8b-instruct --model-handler openai-apicompatible --test-category simple --limit 5', 'bfcl evaluate --result-dir /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/task1/results --model meta/llama-3.1-8b-instruct --model-handler openai-apicompatible --test-category simple --score-dir /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/task1/scores --limit 5', 'unset BFCL_DATA_DIR']
environment variable BFCL_DATA_DIR is set as /jobs/datasets/eval-2rDA3TM9Kt4rozrn8B6mrg/default/kngo-eval-test-datasets/bfcl-custom-evalfactory/nativeformat. Custom data directory is used.
Generating results for ['meta/llama-3.1-8b-instruct']
Running full test cases for categories: ['simple'].
Generating results for meta/llama-3.1-8b-instruct:   0%|          | 0/2 [00:00<?, ?it/s]
Generating results for meta/llama-3.1-8b-instruct:  50%|█████     | 1/2 [00:00<00:00,  3.09it/s]
Generating results for meta/llama-3.1-8b-instruct: 100%|██████████| 2/2 [00:00<00:00,  4.37it/s]
Generating results for meta/llama-3.1-8b-instruct: 100%|██████████| 2/2 [00:00<00:00,  4.12it/s]
environment variable BFCL_DATA_DIR is set as /jobs/datasets/eval-2rDA3TM9Kt4rozrn8B6mrg/default/kngo-eval-test-datasets/bfcl-custom-evalfactory/nativeformat. Custom data directory is used.
Number of models evaluated:   0%|          | 0/1 [00:00<?, ?it/s]
Number of models evaluated: 100%|██████████| 1/1 [00:00<00:00, 41.87it/s]
🦍 Model: meta_llama-3.1-8b-instruct
🔍 Running test: simple
✅ Test completed: simple. 🎯 Accuracy: 1.0
📈 Aggregating data to generate leaderboard score table...
🏁 Evaluation completed. See /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/task1/scores/data_overall.csv for overall evaluation results on BFCL V3.
See /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/task1/scores/data_live.csv, /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/task1/scores/data_non_live.csv and /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/task1/scores/data_multi_turn.csv for detailed evaluation results on each sub-section categories respectively.
2025-05-28T20:54:07Z (eval_factory_lib.execution.infra) - INFO: Job script /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/job_commands.sh executed successfully.
INFO:eval_factory_lib.execution.infra:Job script /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/job_commands.sh executed successfully.
```

The contents of the ZIP file depend on the evaluation job and its configuration. Typically, the archive includes the log files generated during the evaluation process.
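
Because the archive layout varies from job to job, it can help to inspect the ZIP file before extracting it. The following is a small sketch using Python's standard `zipfile` module; `list_log_archive` is a hypothetical helper name, and no particular file names inside the archive are assumed.

```python
import zipfile
from typing import List, Tuple


def list_log_archive(zip_path: str) -> List[Tuple[str, int]]:
    """Return (member name, uncompressed size in bytes) for each file in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if not info.is_dir()  # skip directory entries
        ]
```

Looping over the result, for example `for name, size in list_log_archive("job-id_logs.zip"): print(name, size)`, shows which log files a given job produced.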