MLflow Exporter (mlflow)#

Exports accuracy metrics and artifacts to an MLflow Tracking Server.

  • Purpose: Centralize metrics, parameters, and artifacts in MLflow for experiment tracking

  • Requirements: mlflow package installed and a reachable MLflow tracking server
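
Before configuring the exporter, you can optionally confirm that the mlflow package is installed and that the tracking server is reachable. A minimal sketch, assuming MLflow 2.x and a placeholder server URL:

import mlflow
from mlflow import MlflowClient

# Point the client at the tracking server (placeholder URL).
mlflow.set_tracking_uri("http://mlflow.example.com:5000")

# Listing experiments fails fast if the server is unreachable.
client = MlflowClient()
print([e.name for e in client.search_experiments()])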

Usage#

Export evaluation results to an MLflow Tracking Server for centralized experiment management.

Configure MLflow export to run automatically after evaluation completes. Add MLflow configuration to your run config YAML file:

execution:
  auto_export:
    destinations: ["mlflow"]
    configs:
      mlflow:
        tracking_uri: "http://mlflow.example.com:5000"
        experiment_name: "llm-evaluation"
        description: "Llama 3.1 8B evaluation"
        log_metrics: ["accuracy", "f1"]
        tags:
          model_family: "llama"
          version: "3.1"
        extra_metadata:
          hardware: "A100"
          batch_size: 32
        log_artifacts: true

target:
  api_endpoint:
    model_id: meta/llama-3.1-8b-instruct
    url: https://integrate.api.nvidia.com/v1/chat/completions

evaluation:
  tasks:
    - name: simple_evals.mmlu

Run the evaluation with auto-export enabled:

nemo-evaluator-launcher run --config-dir . --config-name my_config

Export results programmatically after evaluation completes:

from nemo_evaluator_launcher.api.functional import export_results

# Basic MLflow export
export_results(
    invocation_ids=["8abcd123"], 
    dest="mlflow", 
    config={
        "tracking_uri": "http://mlflow:5000",
        "experiment_name": "model-evaluation"
    }
)

# Export with metadata and tags
export_results(
    invocation_ids=["8abcd123"], 
    dest="mlflow", 
    config={
        "tracking_uri": "http://mlflow:5000",
        "experiment_name": "llm-benchmarks",
        "run_name": "llama-3.1-8b-mmlu",
        "description": "Evaluation of Llama 3.1 8B on MMLU",
        "tags": {
            "model_family": "llama",
            "model_version": "3.1",
            "benchmark": "mmlu"
        },
        "log_metrics": ["accuracy"],
        "extra_metadata": {
            "hardware": "A100-80GB",
            "batch_size": 32
        }
    }
)

# Export with artifacts disabled
export_results(
    invocation_ids=["8abcd123"], 
    dest="mlflow", 
    config={
        "tracking_uri": "http://mlflow:5000",
        "experiment_name": "model-comparison",
        "log_artifacts": False
    }
)

# Skip if run already exists
export_results(
    invocation_ids=["8abcd123"], 
    dest="mlflow", 
    config={
        "tracking_uri": "http://mlflow:5000",
        "experiment_name": "nightly-evals",
        "skip_existing": True
    }
)
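
Exported runs can then be inspected with the standard MLflow client. A minimal sketch, assuming the tracking URI and experiment name used above; the columns shown are part of MLflow's default pandas output:

import mlflow

mlflow.set_tracking_uri("http://mlflow:5000")

# search_runs returns a pandas DataFrame with params, metrics, and tags
# for every run in the listed experiments.
runs = mlflow.search_runs(experiment_names=["llm-benchmarks"])
print(runs[["run_id", "status", "start_time"]].head())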

Configuration Parameters#

| Parameter | Type | Description | Default |
|---|---|---|---|
| tracking_uri | str | MLflow tracking server URI | Required |
| experiment_name | str | MLflow experiment name | "nemo-evaluator-launcher" |
| run_name | str | Run display name | Auto-generated |
| description | str | Run description | None |
| tags | dict[str, str] | Custom tags for the run | None |
| extra_metadata | dict | Additional parameters logged to MLflow | None |
| skip_existing | bool | Skip export if run exists for invocation | false |
| log_metrics | list[str] | Filter metrics by substring match | All metrics |
| log_artifacts | bool | Upload evaluation artifacts | true |
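
The log_metrics option filters by substring match against metric names. The snippet below illustrates that matching behavior; it is not the exporter's implementation, the metric names are hypothetical, and case-sensitive matching is assumed:

# Hypothetical metric names produced by an evaluation run.
metric_names = ["mmlu_accuracy", "gsm8k_accuracy", "squad_f1", "bleu"]

# Documented behavior: keep metrics whose names contain any listed substring.
log_metrics = ["accuracy", "f1"]
kept = [name for name in metric_names if any(s in name for s in log_metrics)]
print(kept)  # ['mmlu_accuracy', 'gsm8k_accuracy', 'squad_f1']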