CLI Workflows#

This document explains how to use evaluation containers within NeMo Evaluator workflows, focusing on command execution and configuration.

Overview#

Evaluation containers provide consistent, reproducible environments for running AI model evaluations. For a comprehensive list of all available containers, refer to NeMo Evaluator Containers.

Basic CLI#

Using YAML Configuration#

Define your configuration in a YAML file (for example, /workspace/my_config.yml). Note that the api_key field takes the name of the environment variable holding your key, not the key itself:

config:
  type: mmlu_pro
  output_dir: /workspace/results
  params:
    limit_samples: 10
target:
  api_endpoint:
    url: https://integrate.api.nvidia.com/v1/chat/completions
    model_id: meta/llama-3.1-8b-instruct
    type: chat
    api_key: NGC_API_KEY

Run evaluation:

export HF_TOKEN=hf_xxx
export NGC_API_KEY=nvapi-xxx

nemo-evaluator run_eval \
  --run_config /workspace/my_config.yml

Using CLI overrides#

Alternatively, provide all arguments directly through the CLI:

export HF_TOKEN=hf_xxx
export NGC_API_KEY=nvapi-xxx

nemo-evaluator run_eval \
    --eval_type mmlu_pro \
    --model_id meta/llama-3.1-8b-instruct \
    --model_url https://integrate.api.nvidia.com/v1/chat/completions \
    --model_type chat \
    --api_key_name NGC_API_KEY \
    --output_dir /workspace/results \
    --overrides 'config.params.limit_samples=10'
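
You can pass multiple overrides in a single flag by separating them with commas. A sketch combining override keys that appear elsewhere in this document:

nemo-evaluator run_eval \
    --eval_type mmlu_pro \
    --model_id meta/llama-3.1-8b-instruct \
    --model_url https://integrate.api.nvidia.com/v1/chat/completions \
    --model_type chat \
    --api_key_name NGC_API_KEY \
    --output_dir /workspace/results \
    --overrides 'config.params.limit_samples=10,target.api_endpoint.adapter_config.use_caching=True'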

Interceptor Configuration#

The adapter system uses interceptors to modify requests and responses. Configure interceptors in the adapter_config section of your YAML file, or through the --overrides parameter.

For detailed interceptor configuration, refer to Interceptors.

Note

Always include the endpoint interceptor at the end of your custom interceptor chain.

Enable Request Logging#
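
The request_logging interceptor records the requests sent to the endpoint, with max_requests limiting how many are saved: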

config:
  type: mmlu_pro
  output_dir: /workspace/results
  params:
    limit_samples: 10
target:
  api_endpoint:
    url: https://integrate.api.nvidia.com/v1/chat/completions
    model_id: meta/llama-3.1-8b-instruct
    type: chat
    api_key: NGC_API_KEY
    adapter_config:
      interceptors:
        - name: "request_logging"
            enabled: true
            config:
              max_requests: 1000
        - name: "endpoint"
          enabled: true
          config: {}

Run evaluation:

export HF_TOKEN=hf_xxx
export NGC_API_KEY=nvapi-xxx

nemo-evaluator run_eval \
  --run_config /workspace/my_config.yml

Enable Caching#
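
The caching interceptor saves requests and responses under cache_dir; with reuse_cached_responses enabled, repeated runs return cached responses instead of re-querying the model: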

config:
  type: mmlu_pro
  output_dir: /workspace/results
  params:
    limit_samples: 10
target:
  api_endpoint:
    url: https://integrate.api.nvidia.com/v1/chat/completions
    model_id: meta/llama-3.1-8b-instruct
    type: chat
    api_key: NGC_API_KEY
    adapter_config:
      interceptors:
        - name: "caching"
          enabled: true
          config:
            cache_dir: "./evaluation_cache"
            reuse_cached_responses: true
            save_requests: true
            save_responses: true
            max_saved_requests: 1000
            max_saved_responses: 1000
        - name: "endpoint"
          enabled: true
          config: {}

Run evaluation:

export HF_TOKEN=hf_xxx
export NGC_API_KEY=nvapi-xxx

nemo-evaluator run_eval \
  --run_config /workspace/my_config.yml

Multiple Interceptors#
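
Interceptors can be chained and run in the order listed; keep the endpoint interceptor last: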

config:
  type: mmlu_pro
  output_dir: /workspace/results
  params:
    limit_samples: 10
target:
  api_endpoint:
    url: https://integrate.api.nvidia.com/v1/chat/completions
    model_id: meta/llama-3.1-8b-instruct
    type: chat
    api_key: NGC_API_KEY
    adapter_config:
      interceptors:
        - name: "caching"
          enabled: true
          config:
            cache_dir: "./evaluation_cache"
            reuse_cached_responses: true
            save_requests: true
            save_responses: true
            max_saved_requests: 1000
            max_saved_responses: 1000
        - name: "request_logging"
            enabled: true
            config:
              max_requests: 1000
        - name: "reasoning"
          config:
            start_reasoning_token: "<think>"
            end_reasoning_token: "</think>"
            add_reasoning: true
            enable_reasoning_tracking: true
        - name: "endpoint"
          enabled: true
          config: {}

Run evaluation:

export HF_TOKEN=hf_xxx
export NGC_API_KEY=nvapi-xxx

nemo-evaluator run_eval \
  --run_config /workspace/my_config.yml

Legacy Configuration Support#

Provide legacy interceptor configuration through the --overrides flag:

nemo-evaluator run_eval \
    --eval_type mmlu_pro \
    --model_id meta/llama-3.1-8b-instruct \
    --model_url https://integrate.api.nvidia.com/v1/chat/completions \
    --model_type chat \
    --api_key_name NGC_API_KEY \
    --output_dir ./results \
    --overrides 'target.api_endpoint.adapter_config.use_request_logging=True,target.api_endpoint.adapter_config.max_saved_requests=1000,target.api_endpoint.adapter_config.use_caching=True,target.api_endpoint.adapter_config.caching_dir=./cache,target.api_endpoint.adapter_config.reuse_cached_responses=True'

Note

Legacy parameters are automatically converted to the modern interceptor-based configuration. For new projects, use the YAML interceptor configuration shown above.

Troubleshooting#

Port Conflicts#

If you manually specify the adapter server port, you may encounter port conflicts. Try selecting a different port:

export ADAPTER_PORT=3828
export ADAPTER_HOST=localhost
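
To check whether a port is already in use before assigning it, a quick sketch (assuming lsof is available on your system):

lsof -i :3828 || echo "port 3828 is free"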

Note

You can also rely on NeMo Evaluator’s dynamic port binding feature.

API Key Issues#

Verify that your API key environment variable is set:

echo $MY_API_KEY
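
If the variable is set but requests still fail with authentication errors, you can test the endpoint directly. A minimal sketch, assuming an OpenAI-compatible chat endpoint and that NGC_API_KEY holds your key:

curl -s https://integrate.api.nvidia.com/v1/chat/completions \
  -H "Authorization: Bearer $NGC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "meta/llama-3.1-8b-instruct", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 8}'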

Environment Variables#

Adapter Server Configuration#

export ADAPTER_PORT=3828  # Default: 3825
export ADAPTER_HOST=localhost

API Key Management#

export MY_API_KEY=your_api_key_here
export HF_TOKEN=your_hf_token_here
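
The variable name you export must match the value passed to --api_key_name (or the api_key field in YAML) so that NeMo Evaluator can resolve the key from your environment.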