Download Evaluation Job Logs#

To download the logs for an evaluation job, send a GET request to the evaluation/jobs/<job_id>/logs endpoint. With the v1 API this returns a ZIP file containing the logs generated by the evaluation job; with the v2 API it returns the same logs as paginated JSON.

v2 (Preview)#

Warning

v2 API Preview: The v2 API is available for testing and feedback but is not yet recommended for production use. Breaking changes may occur before the stable release.

The v2 API provides real-time log access with JSON responses and pagination support.

import os
from nemo_microservices import NeMoMicroservices

# Initialize the client
client = NeMoMicroservices(
    base_url=os.environ['EVALUATOR_BASE_URL']
)

# Get logs with pagination (v2 API)
job_id = "job-id"
all_messages = []
logs = client.v2.evaluation.jobs.logs.list(job_id=job_id)
while logs.next_page is not None:
    # Process logs
    all_messages.extend(log.message for log in logs.data)
    # Handle pagination
    logs = client.v2.evaluation.jobs.logs.list(job_id=job_id, page_cursor=logs.next_page)
all_messages.extend(log.message for log in logs.data)
print("".join(all_messages))
# Get logs (first page)
curl -X "GET" "${EVALUATOR_BASE_URL}/v2/evaluation/jobs/<job_id>/logs" \
  -H 'accept: application/json'

# Get logs with pagination
curl -X "GET" "${EVALUATOR_BASE_URL}/v2/evaluation/jobs/<job_id>/logs?page_cursor=<cursor>" \
  -H 'accept: application/json'

# Parse log messages only (simulate log file)
curl -X "GET" "${EVALUATOR_BASE_URL}/v2/evaluation/jobs/<job_id>/logs" \
  -H 'accept: application/json' | jq -r '.data[].message'

v2 Example Response

Danger

Breaking Change: v2 logs return JSON with pagination instead of a ZIP file. Update your log processing code accordingly.

{
  "data": [
    {
      "id": 4362,
      "job_id": "job-dq1pjj6vj5p64xaeqgvuk4",
      "job_step": "evaluation",
      "job_task": "64976129addc499f88071823fcc48f30",
      "message": "INFO:nveval.utils.process_custom_data:Validation passed for dataset\n",
      "timestamp": "2025-09-08T19:20:44.799146"
    },
    {
      "id": 4363,
      "job_id": "job-dq1pjj6vj5p64xaeqgvuk4",
      "job_step": "evaluation",
      "job_task": "64976129addc499f88071823fcc48f30",
      "message": "INFO:nveval.adapter:Running command: ['export BFCL_DATA_DIR=...]\n",
      "timestamp": "2025-09-08T19:20:46.249423"
    }
  ],
  "next_page": "2ALHpsvTaqATe",
  "prev_page": null,
  "total": 373
}

v2 Log Processing#

Since v2 returns structured JSON instead of log files, you can process logs programmatically. Pass the next_page cursor from each response back as the page_cursor query parameter to walk through every page:

# Job to fetch logs for
JOB_ID="job-dq1pjj6vj5p64xaeqgvuk4"

# Extract just the log messages to simulate a log file
curl -s "${EVALUATOR_BASE_URL}/v2/evaluation/jobs/${JOB_ID}/logs" | \
  jq -r '.data[] | "\(.timestamp) \(.message)"' > job_logs.txt

# Get all logs with pagination
CURSOR=""
while true; do
  if [ -z "$CURSOR" ]; then
    RESPONSE=$(curl -s "${EVALUATOR_BASE_URL}/v2/evaluation/jobs/${JOB_ID}/logs")
  else
    RESPONSE=$(curl -s "${EVALUATOR_BASE_URL}/v2/evaluation/jobs/${JOB_ID}/logs?page_cursor=${CURSOR}")
  fi
  
  # Extract messages and append to file
  echo "$RESPONSE" | jq -r '.logs[] | "\(.timestamp) \(.message)"' >> all_logs.txt
  
  # Check for next page
  CURSOR=$(echo "$RESPONSE" | jq -r '.next_page // empty')
  [ -z "$CURSOR" ] && break
done
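
The same flow works in Python, reusing the v2 client calls shown earlier; a minimal sketch (the output filename is arbitrary):

import os
from nemo_microservices import NeMoMicroservices

client = NeMoMicroservices(base_url=os.environ['EVALUATOR_BASE_URL'])
job_id = "job-id"

# Write every log message to a local file, following next_page cursors
with open("job_logs.txt", "w") as f:
    logs = client.v2.evaluation.jobs.logs.list(job_id=job_id)
    while True:
        for log in logs.data:
            f.write(f"{log.timestamp} {log.message}")
        if logs.next_page is None:
            break
        logs = client.v2.evaluation.jobs.logs.list(job_id=job_id, page_cursor=logs.next_page)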

Real-Time Log Monitoring#

With v2, you can monitor logs in real time as they are generated:

import time

def monitor_job_logs(client, job_id, poll_interval=5):
    """Monitor job logs in real time."""
    last_log_id = 0

    while True:
        # Fetch the latest logs, following pagination cursors (v2 API)
        logs = client.v2.evaluation.jobs.logs.list(job_id=job_id)
        entries = list(logs.data)
        while logs.next_page is not None:
            logs = client.v2.evaluation.jobs.logs.list(job_id=job_id, page_cursor=logs.next_page)
            entries.extend(logs.data)

        # Print only entries that have not been seen yet
        for log_entry in entries:
            if log_entry.id > last_log_id:
                print(f"[{log_entry.timestamp}] {log_entry.message.strip()}")
                last_log_id = max(last_log_id, log_entry.id)

        # Stop polling once the job reaches a terminal state
        job = client.evaluation.jobs.retrieve(job_id)
        if job.status in ["completed", "failed"]:
            break

        time.sleep(poll_interval)
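
For example, to follow a running job until it finishes (the job ID is a placeholder):

client = NeMoMicroservices(base_url=os.environ['EVALUATOR_BASE_URL'])
monitor_job_logs(client, "job-id", poll_interval=10)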

v1 (Current)#

The v1 API returns the logs as a ZIP file for download.

import os
from nemo_microservices import NeMoMicroservices

# Initialize the client
client = NeMoMicroservices(
    base_url=os.environ['EVALUATOR_BASE_URL']
)

# Download job logs (v1 API)
job_id = "job-id"
logs_zip = client.evaluation.jobs.download_logs(job_id)

# Save to file
with open(f"{job_id}_logs.zip", 'wb') as file:
    file.write(logs_zip)
    
print("Download completed.")
curl -X "GET" "${EVALUATOR_BASE_URL}/v1/evaluation/jobs/<job_id>/logs" \
  -o <job_id>_logs.zip

Job Logs#

After the download completes, the logs are available in the <job_id>_logs.zip file.

  1. Unzip the file on Linux or macOS using the following command (or use the Python sketch at the end of this section):

    unzip <job_id>_logs.zip -d logs
    
  2. Review the example log output.

    Example Response
    INFO:nveval.utils.process_custom_data:Validation passed for /jobs/datasets/eval-2rDA3TM9Kt4rozrn8B6mrg/default/kngo-eval-test-datasets/bfcl-custom-evalfactory/nativeformat/BFCL_v3_simple.json and /jobs/datasets/eval-2rDA3TM9Kt4rozrn8B6mrg/default/kngo-eval-test-datasets/bfcl-custom-evalfactory/nativeformat/possible_answer/BFCL_v3_simple.json
    INFO:nveval.adapter:Running command: ['export BFCL_DATA_DIR=/jobs/datasets/eval-2rDA3TM9Kt4rozrn8B6mrg/default/kngo-eval-test-datasets/bfcl-custom-evalfactory/nativeformat', 'bfcl generate --skip-server-setup --result-dir /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/task1/results --model meta/llama-3.1-8b-instruct --model-handler openai-apicompatible --test-category simple --limit 5', 'bfcl evaluate --result-dir /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/task1/results --model meta/llama-3.1-8b-instruct --model-handler openai-apicompatible --test-category simple --score-dir /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/task1/scores --limit 5', 'unset BFCL_DATA_DIR']
    environment variable BFCL_DATA_DIR is set as /jobs/datasets/eval-2rDA3TM9Kt4rozrn8B6mrg/default/kngo-eval-test-datasets/bfcl-custom-evalfactory/nativeformat. Custom data directory is used.
    Generating results for ['meta/llama-3.1-8b-instruct']
    Running full test cases for categories: ['simple'].
    Generating results for meta/llama-3.1-8b-instruct:   0%|          | 0/2 [00:00<?, ?it/s]
    Generating results for meta/llama-3.1-8b-instruct:  50%|█████     | 1/2 [00:00<00:00,  3.09it/s]
    Generating results for meta/llama-3.1-8b-instruct: 100%|██████████| 2/2 [00:00<00:00,  4.37it/s]
    Generating results for meta/llama-3.1-8b-instruct: 100%|██████████| 2/2 [00:00<00:00,  4.12it/s]
    environment variable BFCL_DATA_DIR is set as /jobs/datasets/eval-2rDA3TM9Kt4rozrn8B6mrg/default/kngo-eval-test-datasets/bfcl-custom-evalfactory/nativeformat. Custom data directory is used.
    Number of models evaluated:   0%|          | 0/1 [00:00<?, ?it/s]
    Number of models evaluated: 100%|██████████| 1/1 [00:00<00:00, 41.87it/s]
    🦍 Model: meta_llama-3.1-8b-instruct
    🔍 Running test: simple
    ✅ Test completed: simple. 🎯 Accuracy: 1.0
    📈 Aggregating data to generate leaderboard score table...
    🏁 Evaluation completed. See /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/task1/scores/data_overall.csv for overall evaluation results on BFCL V3.
    See /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/task1/scores/data_live.csv, /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/task1/scores/data_non_live.csv and /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/task1/scores/data_multi_turn.csv for detailed evaluation results on each sub-section categories respectively.
    2025-05-28T20:54:07Z (eval_factory_lib.execution.infra) - INFO: Job script /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/job_commands.sh executed successfully.
    INFO:eval_factory_lib.execution.infra:Job script /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/job_commands.sh executed successfully.
    

The contents of the ZIP file depend on the evaluation job and its configuration; typically, the archive includes the log files generated during the evaluation process.
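
If you prefer to stay in Python, the standard-library zipfile module can extract and inspect the archive; a minimal sketch, assuming the <job_id>_logs.zip file saved earlier:

import zipfile

job_id = "job-id"

# Extract the downloaded archive into a logs directory and list its contents
with zipfile.ZipFile(f"{job_id}_logs.zip") as archive:
    archive.extractall("logs")
    for name in archive.namelist():
        print(name)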