Download Evaluation Job Logs
To download the logs for an evaluation job, send a GET request to the /v1/evaluation/jobs/<job_id>/logs API endpoint. The response is a ZIP file containing the logs generated by the evaluation job.
Prerequisites
Get the job status and confirm that the job has terminated with a status of either COMPLETED or FAILED.
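The prerequisite check above can be scripted. The following is a minimal sketch, not an official client: `fetch_status` is a hypothetical callable that you supply to return the job's current status string (for example, by querying the service for the job status), and the polling interval, attempt limit, and case normalization are illustrative assumptions.

```python
import time
from typing import Callable

# Terminal states named in this section.
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def is_terminated(status: str) -> bool:
    """Return True if the status is one of the terminal states.

    The comparison is case-insensitive (an assumption, in case the
    service reports statuses in lowercase).
    """
    return status.strip().upper() in TERMINAL_STATUSES


def wait_until_terminated(
    fetch_status: Callable[[], str],
    poll_seconds: float = 10.0,
    max_polls: int = 360,
) -> str:
    """Poll fetch_status() until the job terminates; return the final status.

    fetch_status is a hypothetical, caller-supplied function that
    returns the current job status as a string.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if is_terminated(status):
            return status.strip().upper()
        time.sleep(poll_seconds)
    raise TimeoutError("Job did not reach a terminal status in time.")
```

Only request the logs once `wait_until_terminated` returns; a `FAILED` result still qualifies, because logs are available for failed jobs as well.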
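Options
API
Submit a GET request to /v1/evaluation/jobs/<job_id>/logs.

Using curl:

curl -X "GET" "${EVALUATOR_SERVICE_URL}/v1/evaluation/jobs/<job_id>/logs" \
  -o <job_id>_logs.zip

Using Python:

import os
import requests

# Assumption: the service base URL is set in the environment, as in the curl example.
EVALUATOR_SERVICE_URL = os.environ["EVALUATOR_SERVICE_URL"]
job_id = "<job_id>"  # replace with your evaluation job ID

url = f"{EVALUATOR_SERVICE_URL}/v1/evaluation/jobs/{job_id}/logs"
response = requests.get(url, stream=True)
response.raise_for_status()  # fail on an HTTP error instead of saving the error body

# Stream the ZIP file to disk in 8 KiB chunks.
with open(f"{job_id}_logs.zip", "wb") as file:
    for chunk in response.iter_content(chunk_size=8192):
        file.write(chunk)
print("Download completed.")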
After the download completes, the logs are available in the <job_id>_logs.zip file. On Linux (for example, Ubuntu) or macOS, unzip the file with the following command:
unzip <job_id>_logs.zip -d logs
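If the `unzip` utility is not available, the same step can be done with Python's standard `zipfile` module. This is a minimal sketch: the function name `extract_logs` and the default `logs` destination directory are illustrative choices, not part of the service.

```python
import zipfile
from pathlib import Path


def extract_logs(zip_path: str, dest_dir: str = "logs") -> list[str]:
    """Extract the archive into dest_dir and return the member names."""
    if not zipfile.is_zipfile(zip_path):
        # For example, an error response body that was saved under a .zip name.
        raise ValueError(f"{zip_path} is not a valid ZIP archive")
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()
```

The `is_zipfile` check catches the case where the download failed and the saved file is not actually a ZIP archive.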
Review the following example log output.
Example Response
INFO:nveval.utils.process_custom_data:Validation passed for /jobs/datasets/eval-2rDA3TM9Kt4rozrn8B6mrg/default/kngo-eval-test-datasets/bfcl-custom-evalfactory/nativeformat/BFCL_v3_simple.json and /jobs/datasets/eval-2rDA3TM9Kt4rozrn8B6mrg/default/kngo-eval-test-datasets/bfcl-custom-evalfactory/nativeformat/possible_answer/BFCL_v3_simple.json
INFO:nveval.adapter:Running command: ['export BFCL_DATA_DIR=/jobs/datasets/eval-2rDA3TM9Kt4rozrn8B6mrg/default/kngo-eval-test-datasets/bfcl-custom-evalfactory/nativeformat', 'bfcl generate --skip-server-setup --result-dir /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/task1/results --model meta/llama-3.1-8b-instruct --model-handler openai-apicompatible --test-category simple --limit 5', 'bfcl evaluate --result-dir /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/task1/results --model meta/llama-3.1-8b-instruct --model-handler openai-apicompatible --test-category simple --score-dir /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/task1/scores --limit 5', 'unset BFCL_DATA_DIR']
environment variable BFCL_DATA_DIR is set as /jobs/datasets/eval-2rDA3TM9Kt4rozrn8B6mrg/default/kngo-eval-test-datasets/bfcl-custom-evalfactory/nativeformat. Custom data directory is used.
Generating results for ['meta/llama-3.1-8b-instruct']
Running full test cases for categories: ['simple'].
Generating results for meta/llama-3.1-8b-instruct: 0%| | 0/2 [00:00<?, ?it/s]
Generating results for meta/llama-3.1-8b-instruct: 50%|█████ | 1/2 [00:00<00:00, 3.09it/s]
Generating results for meta/llama-3.1-8b-instruct: 100%|██████████| 2/2 [00:00<00:00, 4.37it/s]
Generating results for meta/llama-3.1-8b-instruct: 100%|██████████| 2/2 [00:00<00:00, 4.12it/s]
environment variable BFCL_DATA_DIR is set as /jobs/datasets/eval-2rDA3TM9Kt4rozrn8B6mrg/default/kngo-eval-test-datasets/bfcl-custom-evalfactory/nativeformat. Custom data directory is used.
Number of models evaluated: 0%| | 0/1 [00:00<?, ?it/s]
Number of models evaluated: 100%|██████████| 1/1 [00:00<00:00, 41.87it/s]
🦍 Model: meta_llama-3.1-8b-instruct
🔍 Running test: simple
✅ Test completed: simple. 🎯 Accuracy: 1.0
📈 Aggregating data to generate leaderboard score table...
🏁 Evaluation completed. See /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/task1/scores/data_overall.csv for overall evaluation results on BFCL V3.
See /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/task1/scores/data_live.csv, /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/task1/scores/data_non_live.csv and /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/task1/scores/data_multi_turn.csv for detailed evaluation results on each sub-section categories respectively.
2025-05-28T20:54:07Z (eval_factory_lib.execution.infra) - INFO: Job script /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/job_commands.sh executed successfully.
INFO:eval_factory_lib.execution.infra:Job script /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/job_commands.sh executed successfully.
The contents of the ZIP file depend on the evaluation job and configuration. Typically, it includes log files generated during the evaluation process.
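Because the archive contents vary from job to job, it can help to list them before extracting. The sketch below uses only Python's standard `zipfile` module; the function name `list_archive` is an illustrative choice, and nothing here assumes a particular file layout inside the archive.

```python
import zipfile


def list_archive(zip_path: str) -> list[tuple[str, int]]:
    """Return (member name, uncompressed size in bytes) for each file in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if not info.is_dir()  # skip directory entries
        ]
```

For example, calling `list_archive` on the downloaded <job_id>_logs.zip file and printing each pair shows which log files a given job produced and how large they are.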