MLflow Exporter (mlflow)#
Exports accuracy metrics and artifacts to an MLflow Tracking Server.
Purpose: Centralize metrics, parameters, and artifacts in MLflow for experiment tracking.
Requirements: The mlflow package installed and a reachable MLflow tracking server (see the connectivity check below).
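Before wiring the exporter into a run, it can help to confirm that the package is installed and the tracking server is reachable. The snippet below is a minimal sketch, not part of the launcher itself; it assumes the tracking URI used in the examples on this page.
import mlflow
from mlflow import MlflowClient

# Point the MLflow client at your tracking server
mlflow.set_tracking_uri("http://mlflow.example.com:5000")

# A successful call confirms the server is reachable and the package works
client = MlflowClient()
experiments = client.search_experiments(max_results=5)
print([e.name for e in experiments])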
Usage#
Export evaluation results to an MLflow Tracking Server for centralized experiment management.
Configure MLflow export to run automatically after evaluation completes. Add MLflow configuration to your run config YAML file:
execution:
  auto_export:
    destinations: ["mlflow"]
  # Export-related env vars (placeholders expanded at runtime)
  env_vars:
    export:
      MLFLOW_TRACKING_URI: MLFLOW_TRACKING_URI  # or set tracking_uri under export.mlflow
      PATH: "/path/to/conda/env/bin:$PATH"

export:
  mlflow:
    tracking_uri: "http://mlflow.example.com:5000"
    experiment_name: "llm-evaluation"
    description: "Llama 3.1 8B evaluation"
    log_metrics: ["accuracy", "f1"]
    tags:
      model_family: "llama"
      version: "3.1"
    extra_metadata:
      hardware: "A100"
      batch_size: 32
    log_artifacts: true

target:
  api_endpoint:
    model_id: meta/llama-3.1-8b-instruct
    url: https://integrate.api.nvidia.com/v1/chat/completions

evaluation:
  tasks:
    - name: simple_evals.mmlu
Run the evaluation with auto-export enabled:
nemo-evaluator-launcher run --config-dir . --config-name my_config
Export results programmatically after evaluation completes:
from nemo_evaluator_launcher.api.functional import export_results

# Basic MLflow export
export_results(
    invocation_ids=["8abcd123"],
    dest="mlflow",
    config={
        "tracking_uri": "http://mlflow:5000",
        "experiment_name": "model-evaluation"
    }
)

# Export with metadata and tags
export_results(
    invocation_ids=["8abcd123"],
    dest="mlflow",
    config={
        "tracking_uri": "http://mlflow:5000",
        "experiment_name": "llm-benchmarks",
        "run_name": "llama-3.1-8b-mmlu",
        "description": "Evaluation of Llama 3.1 8B on MMLU",
        "tags": {
            "model_family": "llama",
            "model_version": "3.1",
            "benchmark": "mmlu"
        },
        "log_metrics": ["accuracy"],
        "extra_metadata": {
            "hardware": "A100-80GB",
            "batch_size": 32
        }
    }
)

# Export with artifacts disabled
export_results(
    invocation_ids=["8abcd123"],
    dest="mlflow",
    config={
        "tracking_uri": "http://mlflow:5000",
        "experiment_name": "model-comparison",
        "log_artifacts": False
    }
)

# Skip if run already exists
export_results(
    invocation_ids=["8abcd123"],
    dest="mlflow",
    config={
        "tracking_uri": "http://mlflow:5000",
        "experiment_name": "nightly-evals",
        "skip_existing": True
    }
)
Alternatively, export results from the command line after evaluation completes:
# Default export
nemo-evaluator-launcher export 8abcd123 --dest mlflow
# With overrides
nemo-evaluator-launcher export 8abcd123 --dest mlflow \
-o export.mlflow.tracking_uri=http://mlflow:5000 \
-o export.mlflow.experiment_name=my-exp
# With metric filtering
nemo-evaluator-launcher export 8abcd123 --dest mlflow --log-metrics accuracy pass@1
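After exporting (through auto-export, the Python API, or the CLI), you can confirm that the run and its metrics arrived by querying the tracking server directly. The snippet below is a minimal sketch using the MLflow client; the tracking URI and experiment name are taken from the examples above and should be adjusted to your setup.
import mlflow

# Use the same tracking server the exporter wrote to
mlflow.set_tracking_uri("http://mlflow:5000")

# search_runs returns a pandas DataFrame; metric columns are prefixed with "metrics."
runs = mlflow.search_runs(experiment_names=["llm-benchmarks"], max_results=10)
print(runs[["run_id", "status", "start_time"]])

# Inspect the logged metrics of the most recent run
if not runs.empty:
    latest = mlflow.get_run(runs.iloc[0]["run_id"])
    print(latest.data.metrics)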
Configuration Parameters#
| Parameter | Type | Description | Default |
|---|---|---|---|
| tracking_uri | str | MLflow tracking server URI | Required if the MLFLOW_TRACKING_URI env var is not set |
| experiment_name | str | MLflow experiment name | |
| run_name | str | Run display name | Auto-generated |
| description | str | Run description | None |
| tags | dict[str, str] | Custom tags for the run | None |
| extra_metadata | dict | Additional parameters logged to MLflow | None |
| skip_existing | bool | Skip export if a run already exists for the invocation; avoids creating duplicate runs when re-exporting | |
| log_metrics | list[str] | Filter metrics by substring match | All metrics |
| log_artifacts | bool | Upload evaluation artifacts | |
| | bool | Upload execution logs | |
| | bool | Copy only required artifacts | |