Download Evaluation Job Logs

To download the logs for an evaluation job, send a GET request to the /v1/evaluation/jobs/<job_id>/logs endpoint. The response is a ZIP file containing the logs generated by the evaluation job.

Prerequisites


Options

API

  1. Submit a GET request to /v1/evaluation/jobs/<job_id>/logs.

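    Using cURL:

    curl -X "GET" "${EVALUATOR_SERVICE_URL}/v1/evaluation/jobs/<job_id>/logs" \
      -o "<job_id>_logs.zip"

    Using Python:

    import os

    import requests

    # Base URL of the Evaluator service, read from the same environment variable as the cURL example.
    EVALUATOR_SERVICE_URL = os.environ["EVALUATOR_SERVICE_URL"]
    job_id = "<job_id>"  # Replace with the ID of your evaluation job.

    url = f"{EVALUATOR_SERVICE_URL}/v1/evaluation/jobs/{job_id}/logs"

    # Stream the response so the ZIP file is written in chunks rather than held in memory.
    response = requests.get(url, stream=True)
    response.raise_for_status()  # Stop on HTTP errors instead of saving an error body as a ZIP file.

    with open(f"{job_id}_logs.zip", "wb") as file:
        for chunk in response.iter_content(chunk_size=8192):
            file.write(chunk)

    print("Download completed.")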
    

    After the download completes, the logs are available in the <job_id>_logs.zip file.

  2. Unzip the file on Linux or macOS using the following command, which extracts the contents into a logs directory:

    unzip <job_id>_logs.zip -d logs
    
  3. Review the extracted logs. The following example shows the log output of one evaluation job.

    Example Log Output
    INFO:nveval.utils.process_custom_data:Validation passed for /jobs/datasets/eval-2rDA3TM9Kt4rozrn8B6mrg/default/kngo-eval-test-datasets/bfcl-custom-evalfactory/nativeformat/BFCL_v3_simple.json and /jobs/datasets/eval-2rDA3TM9Kt4rozrn8B6mrg/default/kngo-eval-test-datasets/bfcl-custom-evalfactory/nativeformat/possible_answer/BFCL_v3_simple.json
    INFO:nveval.adapter:Running command: ['export BFCL_DATA_DIR=/jobs/datasets/eval-2rDA3TM9Kt4rozrn8B6mrg/default/kngo-eval-test-datasets/bfcl-custom-evalfactory/nativeformat', 'bfcl generate --skip-server-setup --result-dir /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/task1/results --model meta/llama-3.1-8b-instruct --model-handler openai-apicompatible --test-category simple --limit 5', 'bfcl evaluate --result-dir /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/task1/results --model meta/llama-3.1-8b-instruct --model-handler openai-apicompatible --test-category simple --score-dir /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/task1/scores --limit 5', 'unset BFCL_DATA_DIR']
    environment variable BFCL_DATA_DIR is set as /jobs/datasets/eval-2rDA3TM9Kt4rozrn8B6mrg/default/kngo-eval-test-datasets/bfcl-custom-evalfactory/nativeformat. Custom data directory is used.
    Generating results for ['meta/llama-3.1-8b-instruct']
    Running full test cases for categories: ['simple'].
    Generating results for meta/llama-3.1-8b-instruct:   0%|          | 0/2 [00:00<?, ?it/s]
    Generating results for meta/llama-3.1-8b-instruct:  50%|█████     | 1/2 [00:00<00:00,  3.09it/s]
    Generating results for meta/llama-3.1-8b-instruct: 100%|██████████| 2/2 [00:00<00:00,  4.37it/s]
    Generating results for meta/llama-3.1-8b-instruct: 100%|██████████| 2/2 [00:00<00:00,  4.12it/s]
    environment variable BFCL_DATA_DIR is set as /jobs/datasets/eval-2rDA3TM9Kt4rozrn8B6mrg/default/kngo-eval-test-datasets/bfcl-custom-evalfactory/nativeformat. Custom data directory is used.
    Number of models evaluated:   0%|          | 0/1 [00:00<?, ?it/s]
    Number of models evaluated: 100%|██████████| 1/1 [00:00<00:00, 41.87it/s]
    🦍 Model: meta_llama-3.1-8b-instruct
    🔍 Running test: simple
    ✅ Test completed: simple. 🎯 Accuracy: 1.0
    📈 Aggregating data to generate leaderboard score table...
    🏁 Evaluation completed. See /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/task1/scores/data_overall.csv for overall evaluation results on BFCL V3.
    See /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/task1/scores/data_live.csv, /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/task1/scores/data_non_live.csv and /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/task1/scores/data_multi_turn.csv for detailed evaluation results on each sub-section categories respectively.
    2025-05-28T20:54:07Z (eval_factory_lib.execution.infra) - INFO: Job script /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/job_commands.sh executed successfully.
    INFO:eval_factory_lib.execution.infra:Job script /jobs/results/eval-2rDA3TM9Kt4rozrn8B6mrg/job_commands.sh executed successfully.
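
When reviewing many log files, it can help to pull the per-test accuracy lines out programmatically. The following is a minimal sketch that assumes the "Test completed: <test>. ... Accuracy: <value>" line format shown in the example output above. The function name find_accuracies is hypothetical, and other evaluation types may log results in a different format.

```python
import re

# Matches lines such as "✅ Test completed: simple. 🎯 Accuracy: 1.0".
# The pattern is derived from the example log output only (an assumption).
ACCURACY_RE = re.compile(
    r"Test completed: (?P<test>\S+)\.\s.*?Accuracy: (?P<accuracy>[0-9.]+)"
)


def find_accuracies(lines):
    """Return a dict mapping test name to the accuracy reported in the log lines."""
    results = {}
    for line in lines:
        match = ACCURACY_RE.search(line)
        if match:
            results[match.group("test")] = float(match.group("accuracy"))
    return results


# Example usage with a hypothetical path to an extracted log file:
# with open("logs/<log_file>", encoding="utf-8") as f:
#     print(find_accuracies(f))
```

Lines that do not match the pattern are skipped, so the function can be run over a whole log file without filtering it first.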
    

The contents of the ZIP file depend on the evaluation job and its configuration. Typically, the archive includes the log files generated during the evaluation process.
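
Because the archive contents vary, it can be useful to extract and list them in one step, including on systems where the unzip command is not available. The following is a minimal sketch that uses Python's standard zipfile module. The function name extract_logs and the default logs directory are illustrative assumptions, not part of the Evaluator API.

```python
import zipfile
from pathlib import Path


def extract_logs(zip_path, dest_dir="logs"):
    """Extract a downloaded log archive and return the paths of the extracted files."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # Entries whose names end in "/" are directories; report only files.
        names = [name for name in archive.namelist() if not name.endswith("/")]
        archive.extractall(dest)
    return [dest / name for name in names]


# Example usage, assuming the archive was saved as in the download step:
# for path in extract_logs("<job_id>_logs.zip"):
#     print(path)
```

Returning the file paths makes it easy to see which log files a particular job produced before opening them.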