Download Evaluation Job Logs#
To download the logs for an evaluation job, send a GET request to the evaluation/jobs/<job_id>/logs endpoint. With the v1 API this downloads a ZIP file containing the logs generated by the evaluation job; with the v2 API the same endpoint returns paginated JSON log entries.
Prerequisites#
v2 (Preview)#
Warning
v2 API Preview: The v2 API is available for testing and feedback but is not yet recommended for production use. Breaking changes may occur before the stable release.
The v2 API provides real-time log access with JSON responses and pagination support.
import os
from nemo_microservices import NeMoMicroservices

# Initialize the client
client = NeMoMicroservices(
    base_url=os.environ['EVALUATOR_BASE_URL']
)

# Get logs with pagination (v2 API)
job_id = "job-id"
all_messages = []
logs = client.v2.evaluation.jobs.logs.list(job_id=job_id)
while logs.next_page is not None:
    # Process the current page of logs
    all_messages.extend(log.message for log in logs.data)
    # Handle pagination: fetch the next page
    logs = client.v2.evaluation.jobs.logs.list(job_id=job_id, page_cursor=logs.next_page)

# Process the final page
all_messages.extend(log.message for log in logs.data)
print("".join(all_messages))
# Get logs (first page)
curl -X "GET" "${EVALUATOR_BASE_URL}/v2/evaluation/jobs/<job_id>/logs" \
-H 'accept: application/json'
# Get logs with pagination
curl -X "GET" "${EVALUATOR_BASE_URL}/v2/evaluation/jobs/<job_id>/logs?page_cursor=<cursor>" \
-H 'accept: application/json'
# Parse log messages only (simulate log file)
curl -X "GET" "${EVALUATOR_BASE_URL}/v2/evaluation/jobs/<job_id>/logs" \
-H 'accept: application/json' | jq -r '.data[].message'
v2 Example Response
Danger
Breaking Change: v2 logs return JSON with pagination instead of a ZIP file. Update your log processing code accordingly.
{
  "data": [
    {
      "id": 4362,
      "job_id": "job-dq1pjj6vj5p64xaeqgvuk4",
      "job_step": "evaluation",
      "job_task": "64976129addc499f88071823fcc48f30",
      "message": "INFO:nveval.utils.process_custom_data:Validation passed for dataset\n",
      "timestamp": "2025-09-08T19:20:44.799146"
    },
    {
      "id": 4363,
      "job_id": "job-dq1pjj6vj5p64xaeqgvuk4",
      "job_step": "evaluation",
      "job_task": "64976129addc499f88071823fcc48f30",
      "message": "INFO:nveval.adapter:Running command: ['export BFCL_DATA_DIR=...]\n",
      "timestamp": "2025-09-08T19:20:46.249423"
    }
  ],
  "next_page": "2ALHpsvTaqATe",
  "prev_page": null,
  "total": 373
}
v2 Log Processing#
Since v2 returns structured JSON instead of log files, you can process logs programmatically:
# Extract just the log messages to simulate a log file
curl -s "${EVALUATOR_BASE_URL}/v2/evaluation/jobs/${JOB_ID}/logs" | \
  jq -r '.data[] | "\(.timestamp) \(.message)"' > job_logs.txt
# Get all logs with pagination
JOB_ID="job-dq1pjj6vj5p64xaeqgvuk4"
CURSOR=""
while true; do
  if [ -z "$CURSOR" ]; then
    RESPONSE=$(curl -s "${EVALUATOR_BASE_URL}/v2/evaluation/jobs/${JOB_ID}/logs")
  else
    RESPONSE=$(curl -s "${EVALUATOR_BASE_URL}/v2/evaluation/jobs/${JOB_ID}/logs?page_cursor=${CURSOR}")
  fi

  # Extract messages and append to file
  echo "$RESPONSE" | jq -r '.data[] | "\(.timestamp) \(.message)"' >> all_logs.txt

  # Check for next page
  CURSOR=$(echo "$RESPONSE" | jq -r '.next_page // empty')
  [ -z "$CURSOR" ] && break
done
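The same processing can be done from Python. The sketch below mirrors the shell loop using the v2 client calls and response fields shown earlier (data, next_page, timestamp, message); it assumes client and job_id are defined as in the first example.

# Write every log entry as "timestamp message" lines, walking all pages.
# Assumes the v2 client calls and response fields shown in the examples above.
with open("all_logs.txt", "w") as log_file:
    cursor = None
    while True:
        if cursor is None:
            page = client.v2.evaluation.jobs.logs.list(job_id=job_id)
        else:
            page = client.v2.evaluation.jobs.logs.list(job_id=job_id, page_cursor=cursor)
        for entry in page.data:
            log_file.write(f"{entry.timestamp} {entry.message}")
        cursor = page.next_page
        if cursor is None:
            break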
Real-Time Log Monitoring#
With v2, you can monitor logs in real-time as they are generated:
import time

def monitor_job_logs(client, job_id, poll_interval=5):
    """Monitor job logs in real-time."""
    last_log_id = 0
    while True:
        # Get the latest logs (first page of the v2 logs endpoint)
        logs_response = client.v2.evaluation.jobs.logs.list(job_id=job_id)

        # Print only the log entries we have not seen yet
        new_logs = [log for log in logs_response.data if log.id > last_log_id]
        for log_entry in new_logs:
            print(f"[{log_entry.timestamp}] {log_entry.message.strip()}")
            last_log_id = max(last_log_id, log_entry.id)

        # Stop polling once the job reaches a terminal state
        job = client.evaluation.jobs.retrieve(job_id)
        if job.status in ["completed", "failed"]:
            break

        time.sleep(poll_interval)
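For example, to follow a running job with the client initialized earlier (the job ID below is the one from the example response):

# Stream new log entries every 10 seconds until the job completes or fails
monitor_job_logs(client, "job-dq1pjj6vj5p64xaeqgvuk4", poll_interval=10)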
v1 (Current)#
Additional Prerequisites
Get the job status and confirm it has terminated, either in COMPLETED or FAILED status.
The v1 API downloads logs as a ZIP file containing log files.
import os
from nemo_microservices import NeMoMicroservices

# Initialize the client
client = NeMoMicroservices(
    base_url=os.environ['EVALUATOR_BASE_URL']
)

# Download job logs (v1 API)
job_id = "job-id"
logs_zip = client.evaluation.jobs.download_logs(job_id)

# Save to file
with open(f"{job_id}_logs.zip", 'wb') as file:
    file.write(logs_zip)
print("Download completed.")
curl -X "GET" "${EVALUATOR_BASE_URL}/v1/evaluation/jobs/<job_id>/logs" \
-o <job_id>_logs.zip
Job Logs#
After the download completes, the logs are available in the <job_id>_logs.zip file.
Unzip the file on Ubuntu, macOS, or Linux using the following command:
unzip <job_id>_logs.zip -d logs
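Alternatively, you can extract the archive from Python using the standard-library zipfile module. This is a minimal sketch that assumes job_id and the downloaded <job_id>_logs.zip file from the Python example above; the file names inside the archive depend on your job.

import zipfile

# Extract the downloaded archive into a "logs" directory and list its contents
with zipfile.ZipFile(f"{job_id}_logs.zip") as archive:
    archive.extractall("logs")
    print(archive.namelist())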
Review the following example log result.
Example Response
INFO:nveval.utils.process_custom_data:Validation passed for /jobs/datasets/eval-2rDA3TM9Kt4rozrn8B6mrg/default/kngo-eval-test-datasets/bfcl-custom-evalfactory/nativeformat/BFCL_v3_simple.json and /jobs/datasets/eval-2rDA3TM9Kt4rozrn8B6mrg/default/kngo-eval-test-datasets/bfcl-custom-evalfactory/nativeformat/possible_answer/BFCL_v3_simple.json
INFO:nveval.adapter:Running command: ['export BFCL_DATA_DIR=/jobs/datasets/eval-2rDA3TM9Kt4rozrn8B6mrg/default/kngo-eval-test-datasets/bfcl-custom-evalfactory/nativeformat', 'bfcl generate --skip-server-setup --result-dir /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/task1/results --model meta/llama-3.1-8b-instruct --model-handler openai-apicompatible --test-category simple --limit 5', 'bfcl evaluate --result-dir /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/task1/results --model meta/llama-3.1-8b-instruct --model-handler openai-apicompatible --test-category simple --score-dir /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/task1/scores --limit 5', 'unset BFCL_DATA_DIR']
environment variable BFCL_DATA_DIR is set as /jobs/datasets/eval-2rDA3TM9Kt4rozrn8B6mrg/default/kngo-eval-test-datasets/bfcl-custom-evalfactory/nativeformat. Custom data directory is used.
Generating results for ['meta/llama-3.1-8b-instruct']
Running full test cases for categories: ['simple'].
Generating results for meta/llama-3.1-8b-instruct:   0%|          | 0/2 [00:00<?, ?it/s]
Generating results for meta/llama-3.1-8b-instruct:  50%|█████     | 1/2 [00:00<00:00,  3.09it/s]
Generating results for meta/llama-3.1-8b-instruct: 100%|██████████| 2/2 [00:00<00:00,  4.37it/s]
Generating results for meta/llama-3.1-8b-instruct: 100%|██████████| 2/2 [00:00<00:00,  4.12it/s]
environment variable BFCL_DATA_DIR is set as /jobs/datasets/eval-2rDA3TM9Kt4rozrn8B6mrg/default/kngo-eval-test-datasets/bfcl-custom-evalfactory/nativeformat. Custom data directory is used.
Number of models evaluated:   0%|          | 0/1 [00:00<?, ?it/s]
Number of models evaluated: 100%|██████████| 1/1 [00:00<00:00, 41.87it/s]
🦍 Model: meta_llama-3.1-8b-instruct
🔍 Running test: simple
✅ Test completed: simple. 🎯 Accuracy: 1.0
📈 Aggregating data to generate leaderboard score table...
🏁 Evaluation completed. See /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/task1/scores/data_overall.csv for overall evaluation results on BFCL V3.
See /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/task1/scores/data_live.csv, /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/task1/scores/data_non_live.csv and /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/task1/scores/data_multi_turn.csv for detailed evaluation results on each sub-section categories respectively.
2025-05-28T20:54:07Z (eval_factory_lib.execution.infra) - INFO: Job script /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/job_commands.sh executed successfully.
INFO:eval_factory_lib.execution.infra:Job script /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/job_commands.sh executed successfully.
The contents of the ZIP file depend on the evaluation job and configuration; typically, the archive includes the log files generated during the evaluation process.