Local Executor#

The Local executor runs evaluations on your machine using Docker. If you have Docker installed, it provides a fast way to iterate on evaluations of existing endpoints.

See common concepts and commands in Executors.

Prerequisites#

  • Docker installed on your machine

Quick Start#

For detailed step-by-step instructions on evaluating existing endpoints, refer to the NeMo Evaluator Launcher Quickstart guide, which covers:

  • Choosing models and tasks

  • Setting up API keys (for NVIDIA APIs, see Setting up API Keys)

  • Creating configuration files

  • Running evaluations

Here’s a quick overview for the Local executor:

Run evaluation for existing endpoint#

# Run evaluation
nemo-evaluator-launcher run --config-dir examples --config-name local_llama_3_1_8b_instruct \
  -o target.api_endpoint.api_key_name=API_KEY

Environment Variables#

The Local executor supports passing environment variables from your local machine to evaluation containers:

How It Works#

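The executor passes environment variables to Docker containers using docker run -e KEY=VALUE flags. In the env_vars configuration, each key is the variable name set inside the container, and each value is the name of an environment variable on your machine. The executor automatically prefixes that name with $ (for example, OPENAI_API_KEY becomes $OPENAI_API_KEY), so the value is read from your local environment when the container starts.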
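The substitution described above can be illustrated with a minimal shell sketch. It is not the executor's actual implementation: the host variable name MY_HOST_API_KEY and its value are hypothetical, and the sketch only shows how a $-prefixed name turns into a resolved -e flag.

```shell
# Hypothetical host variable holding a secret (name and value are assumptions).
export MY_HOST_API_KEY="example-secret"

# One env_vars entry: container variable name -> host variable name.
container_var="API_KEY"
host_var="MY_HOST_API_KEY"

# Prefixing the host variable name with $ gives the flag text as it would
# appear in a script, before the shell expands it.
flag_text="-e ${container_var}=\$${host_var}"
echo "$flag_text"    # prints: -e API_KEY=$MY_HOST_API_KEY

# When the shell evaluates that text, the host value is substituted.
resolved=$(eval "printf '%s\n' \"$flag_text\"")
echo "$resolved"     # prints: -e API_KEY=example-secret
```

Because the value is resolved from the host environment, the secret itself never needs to appear in the configuration file.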

Configuration#

evaluation:
  env_vars:
    API_KEY: YOUR_API_KEY_ENV_VAR_NAME
    CUSTOM_VAR: YOUR_CUSTOM_ENV_VAR_NAME
  tasks:
    - name: my_task
      env_vars:
        TASK_SPECIFIC_VAR: TASK_ENV_VAR_NAME

Secrets and API Keys#

The executor handles API keys the same way as other environment variables: store them as environment variables on your machine and reference them by name in the env_vars configuration.
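Since keys are read from your local environment, it can help to confirm they are set before launching. The following is a minimal sketch of such a check. The helper require_env is hypothetical and not part of the launcher, and the key value is a placeholder.

```shell
# Hypothetical helper: fail if any named variable is unset or empty.
require_env() {
  for name in "$@"; do
    # Indirectly read the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      return 1
    fi
  done
  return 0
}

# Placeholder value; in practice this comes from your shell profile or a secret store.
export API_KEY="example-secret"

if require_env API_KEY; then
  echo "environment looks ready"
fi
```

A check like this could precede the nemo-evaluator-launcher run command shown in the Quick Start, so a missing key fails fast on the host rather than inside the container.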

Mounting and Storage#

The Local executor uses Docker volume mounts for data persistence:

Docker Volumes#

  • Results Mount: Each task’s artifacts directory is mounted as /results in the evaluation container

  • No Custom Mounts: The Local executor doesn’t support custom volume mounts
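As a rough illustration of the results mount, the sketch below assembles the kind of -v flag Docker uses for a bind mount. The directory layout (an artifacts folder inside the task directory) is an assumption, the image and command are left as placeholders, and the generated scripts may construct the command differently.

```shell
# Example task directory, following the path pattern used in this guide.
task_dir="/path/to/output_dir/2024-01-15-10-30-45-abc12345/task1"

# Assumed location of the task's artifacts directory on the host.
artifacts_dir="${task_dir}/artifacts"

# Bind-mount the host directory at /results inside the container.
mount_flag="-v ${artifacts_dir}:/results"

# Print the command shape only; nothing is executed here.
echo "docker run --rm ${mount_flag} <image> <command>"
```

Anything the evaluation writes under /results inside the container then persists in the task's artifacts directory on the host after the container exits.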

Rerunning Evaluations#

The Local executor generates reusable scripts for rerunning evaluations:

Script Generation#

The Local executor automatically generates scripts:

  • run_all.sequential.sh: Runs all evaluation tasks sequentially (located in the output directory)

  • run.sh: Runs a single task (one script in each task subdirectory)

  • Reproducible: Scripts contain all necessary commands and configurations

Manual Rerun#

# Rerun all tasks
cd /path/to/output_dir/2024-01-15-10-30-45-abc12345/
bash run_all.sequential.sh

# Rerun specific task
cd /path/to/output_dir/2024-01-15-10-30-45-abc12345/task1/
bash run.sh
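To rerun only a subset of tasks, the per-task scripts can be invoked in a loop. The sketch below is illustrative: it builds a temporary directory with stub run.sh files so it can run without Docker, and the names task2 and task3 are hypothetical. With a real run, point run_dir at your output directory and skip the stub creation.

```shell
# Stand-in for a real run directory; stubs replace the generated run.sh files.
run_dir=$(mktemp -d)
for task in task1 task2 task3; do
  mkdir -p "$run_dir/$task"
  printf '#!/bin/sh\necho "ran %s"\n' "$task" > "$run_dir/$task/run.sh"
done

# Rerun only the chosen tasks, one after another, stopping on the first failure.
rerun_log=""
for task in task1 task3; do
  out=$(cd "$run_dir/$task" && bash run.sh) || {
    echo "rerun failed: $task" >&2
    break
  }
  rerun_log="${rerun_log}${out};"
done

echo "$rerun_log"    # prints: ran task1;ran task3;
rm -rf "$run_dir"
```

Each script runs from its own task subdirectory, matching the manual commands above.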

Key Features#

  • Docker-based execution: Isolated, reproducible runs

  • OpenAI-compatible endpoint support: Works with any OpenAI-compatible endpoint

  • Script generation: Reusable scripts for rerunning evaluations

  • Real-time logs: Status tracking via log files

Monitoring and Job Management#

For monitoring jobs, checking status, and managing evaluations, see Executors.