# Evaluation Utilities Reference

Complete reference for evaluation discovery and utility functions in NeMo Evaluator.

## nemo_evaluator.show_available_tasks()

Discovers and displays all available evaluation tasks across installed evaluation frameworks.

### Function Signature

```python
def show_available_tasks() -> None
```

### Returns

| Type | Description |
|------|-------------|
| `None` | Prints available tasks to stdout |

### Description

This function scans all installed `core_evals` packages and prints a hierarchical list of available evaluation tasks, organized by framework. Use it to discover which benchmarks and tasks are available in your environment.

The function automatically detects:

- **Installed frameworks**: lm-evaluation-harness, simple-evals, bigcode, BFCL
- **Available tasks**: all tasks defined in each framework's configuration
- **Installation status**: displays a message if no evaluation packages are installed
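As a rough illustration of that detection step, you can probe for the shared `core_evals` namespace yourself with the standard library. This is a sketch only, assuming (per the description above) that installed frameworks expose a `core_evals` package:

```python
import importlib.util

def core_evals_installed() -> bool:
    # The frameworks register their tasks under a shared "core_evals"
    # package; if it is absent, no evaluation frameworks are installed.
    return importlib.util.find_spec("core_evals") is not None

print("core_evals present:", core_evals_installed())
```

`find_spec` returns `None` for a missing package rather than raising, which makes it a safe pre-check before importing anything.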

### Usage Examples

#### Basic Task Discovery

```python
from nemo_evaluator import show_available_tasks

# Display all available evaluations
show_available_tasks()

# Example output:
# lm-evaluation-harness:
#   * mmlu
#   * gsm8k
#   * arc_challenge
#   * hellaswag
# simple-evals:
#   * AIME_2025
#   * humaneval
#   * drop
# bigcode:
#   * mbpp
#   * humaneval
#   * apps
```

#### Programmatic Task Discovery

For programmatic access to task information, use the launcher API:

```python
from nemo_evaluator_launcher.api.functional import get_tasks_list

# Get structured task information
tasks = get_tasks_list()
for task in tasks:
    task_name, endpoint_type, harness, container = task
    print(f"Task: {task_name}, Type: {endpoint_type}, Framework: {harness}")
```

To filter tasks using the CLI:

```shell
# List all tasks
nemo-evaluator-launcher ls tasks

# Filter for specific tasks
nemo-evaluator-launcher ls tasks | grep mmlu
```

#### Check Installation Status

```python
from nemo_evaluator import show_available_tasks

# Check whether evaluation packages are installed
print("Available evaluation frameworks:")
show_available_tasks()

# If no packages are installed, you'll see:
# NO evaluation packages are installed.
```
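Because `show_available_tasks()` prints rather than returns, you may want to capture its output as a string for inspection. The sketch below demonstrates the capture pattern with a hypothetical stand-in printer; swap in the real function in your environment:

```python
import contextlib
import io

def fake_show_available_tasks() -> None:
    # Hypothetical stand-in for nemo_evaluator.show_available_tasks(),
    # which writes its task listing to stdout
    print("lm-evaluation-harness:")
    print("  * mmlu")

buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    fake_show_available_tasks()  # replace with the real function in practice

listing = buf.getvalue()
print("mmlu" in listing)  # True
```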

### Installation Requirements

To use this function, install evaluation framework packages:

```shell
# Install all frameworks
pip install nvidia-lm-eval nvidia-simple-evals nvidia-bigcode-eval nvidia-bfcl

# Or install selectively
pip install nvidia-lm-eval        # LM Evaluation Harness
pip install nvidia-simple-evals   # Simple Evals
pip install nvidia-bigcode-eval   # BigCode benchmarks
pip install nvidia-bfcl           # Berkeley Function Calling Leaderboard
```

### Error Handling

The function handles missing packages gracefully, but you can still guard the call:

```python
from nemo_evaluator import show_available_tasks

# Safely check for available tasks
try:
    show_available_tasks()
except ImportError as e:
    print(f"Error: {e}")
    print("Install evaluation frameworks: pip install nvidia-lm-eval")
```

### Integration with Evaluation Workflows

#### Pre-Flight Task Verification

Verify task availability before running evaluations:

```python
from nemo_evaluator_launcher.api.functional import get_tasks_list

def verify_task_available(task_name: str) -> bool:
    """Check if a specific task is available."""
    tasks = get_tasks_list()
    return any(task[0] == task_name for task in tasks)

# Usage
if verify_task_available("mmlu"):
    print("✓ MMLU is available")
else:
    print("✗ MMLU not found. Install evaluation framework packages")
```

#### Filter Tasks by Endpoint Type

Use task discovery to filter by endpoint type:

```python
from nemo_evaluator_launcher.api.functional import get_tasks_list

# Partition tasks by endpoint type
tasks = get_tasks_list()
chat_tasks = [task[0] for task in tasks if task[1] == "chat"]
completions_tasks = [task[0] for task in tasks if task[1] == "completions"]

print(f"Chat tasks: {chat_tasks[:5]}")  # Show the first five
print(f"Completions tasks: {completions_tasks[:5]}")
```

#### Framework Selection

When a task is provided by more than one framework, specify the framework explicitly in your configuration:

```python
from nemo_evaluator.api.api_dataclasses import EvaluationConfig, ConfigParams

# Explicit framework specification
config = EvaluationConfig(
    type="lm-evaluation-harness.mmlu",  # Instead of just "mmlu"
    params=ConfigParams(task="mmlu")
)
```
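To see which task names actually need qualifying, you can count duplicates in the `(task_name, endpoint_type, harness, container)` tuples that `get_tasks_list()` returns. The sample data below is illustrative, not real launcher output:

```python
from collections import Counter

# Illustrative tuples in the (task_name, endpoint_type, harness, container)
# shape returned by get_tasks_list(); substitute the real call in practice.
tasks = [
    ("mmlu", "completions", "lm-evaluation-harness", "lm-eval-image"),
    ("mmlu", "chat", "simple-evals", "simple-evals-image"),
    ("gsm8k", "completions", "lm-evaluation-harness", "lm-eval-image"),
]

# A task name is ambiguous when more than one framework provides it
counts = Counter(name for name, _, _, _ in tasks)
ambiguous = sorted(name for name, n in counts.items() if n > 1)
print(ambiguous)  # ['mmlu'] -> qualify these as "<framework>.<task>"
```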

### Troubleshooting

#### Problem: "NO evaluation packages are installed"

Solution:

```shell
# Install evaluation frameworks
pip install nvidia-lm-eval nvidia-simple-evals nvidia-bigcode-eval nvidia-bfcl

# Verify installation
python -c "from nemo_evaluator import show_available_tasks; show_available_tasks()"
```

#### Problem: Task not appearing in the list

Solution:

```shell
# Install the framework package that provides the task
pip install nvidia-lm-eval

# Verify installation
python -c "from nemo_evaluator import show_available_tasks; show_available_tasks()"
```

#### Problem: Task conflicts between frameworks

When a task name is provided by more than one framework (for example, both lm-evaluation-harness and simple-evals provide mmlu), specify the framework explicitly.

Solution:

```shell
# Use the explicit framework.task format in your configuration overrides
nemo-evaluator-launcher run --config packages/nemo-evaluator-launcher/examples/local_llama_3_1_8b_instruct.yaml \
    -o 'evaluation.tasks=["lm-evaluation-harness.mmlu"]'
```