OTel and MLflow Telemetry
What You Will Learn
This tutorial walks through AIPerf’s telemetry integrations:
- Live OTel streaming — push GenAI-spec metrics to an OpenTelemetry Collector in real time during a benchmark run.
- Live MLflow logging — record per-request scalars to an MLflow tracking server as the run executes.
- Post-run artifact upload — automatically upload the JSON/CSV exports, metadata, and plots to the same MLflow run after profiling completes.
By the end you will have a single aiperf profile command that streams metrics to both sinks and a follow-up aiperf plot command that attaches visualizations to the MLflow run.
Prerequisites
Install AIPerf with the optional telemetry extras:
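For example, with pip (the extras names `otel` and `mlflow` are an assumption here; check the project's installation docs for the exact names):

```bash
pip install "aiperf[otel,mlflow]"
```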
You also need:
- A running OpenTelemetry Collector with an OTLP receiver enabled (gRPC on port 4317 or HTTP on port 4318 by default).
- A running MLflow tracking server (this tutorial assumes http://localhost:5000).
Verify both are reachable before continuing:
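A quick check, assuming the default local ports (adjust the URLs to match your deployment):

```bash
# OTel Collector: the OTLP/HTTP port should answer on 4318
# (a 405 response still proves the endpoint is reachable).
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:4318/v1/metrics

# MLflow tracking server: the health endpoint returns "OK".
curl -s http://localhost:5000/health
```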
Run a Profile with Telemetry Enabled
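A sketch of a combined run. The telemetry flag names (`--otlp-endpoint`, `--mlflow-tracking-uri`, `--mlflow-experiment`) are illustrative; confirm the exact spellings with `aiperf profile --help`:

```bash
# Telemetry flag names below are illustrative, not authoritative.
aiperf profile \
  --model my-model \
  --url http://localhost:8000 \
  --concurrency 4 \
  --otlp-endpoint http://localhost:4318 \
  --mlflow-tracking-uri http://localhost:5000 \
  --mlflow-experiment telemetry-demo
```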
Flag breakdown
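Assuming the illustrative names from the sketch above:
- `--otlp-endpoint`: where live OTel metrics are pushed while the benchmark runs.
- `--mlflow-tracking-uri`: the MLflow server that receives live scalars during the run and artifacts afterwards.
- `--mlflow-experiment`: the experiment under which the MLflow run is created.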
Inspect Live OTel Data
While the benchmark runs, metrics flow to your OTel Collector and from there to any configured backend (Prometheus, Grafana, Jaeger, etc.).
AIPerf emits metrics using OTel GenAI semantic conventions:
- `gen_ai.client.operation.duration`: end-to-end request latency, in seconds
- `gen_ai.client.token.usage`: input and output token counts
- `gen_ai.client.operation.time_to_first_chunk`: time until the first streamed chunk arrives
- `gen_ai.client.operation.time_per_output_chunk`: time between consecutive streamed chunks
Metrics carry standard GenAI attributes (gen_ai.operation.name, gen_ai.provider.name, gen_ai.request.model) plus AIPerf-specific dimensions prefixed with aiperf.*.
Example Grafana query
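Assuming the Collector exports to Prometheus, the duration histogram typically surfaces as `gen_ai_client_operation_duration_seconds`; a p95 latency panel per model might then look like:

```promql
histogram_quantile(
  0.95,
  sum by (le, gen_ai_request_model) (
    rate(gen_ai_client_operation_duration_seconds_bucket[5m])
  )
)
```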
Inspect Live MLflow Data
Open the MLflow UI at http://localhost:5000. Navigate to the experiment you specified and select the active run.
During profiling you will see live scalars logged under the live.* namespace:
- `live.gen_ai.client.operation.duration`
- `live.gen_ai.client.token.usage`
- `live.gen_ai.client.operation.time_per_output_chunk`
- `live.gen_ai.client.operation.time_to_first_chunk`
These update in near real time as each request completes. Refresh the MLflow metrics tab to see the curves build up.
Post-Run Artifact Upload
When the benchmark finishes, AIPerf performs a deferred export:
- Local exporters write JSON and CSV files to the output directory.
- The MLflow data exporter detects the live run via `mlflow_export.json` (written during the run).
- All artifacts (JSON export, CSV export, GPU telemetry, metadata) are uploaded to the same MLflow run.
The mlflow_export.json file records the mapping between the local run and the MLflow run:
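A sketch of its shape (the exact keys are version-specific, but it must capture at least the tracking server and run identity so the exporter can re-attach to the run):

```json
{
  "tracking_uri": "http://localhost:5000",
  "experiment_id": "1",
  "run_id": "f3a2c9d8e7b64f21a0c5d4e3b2a19876"
}
```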
Grouping benchmarks under a parent run
Use --mlflow-parent-run-id to organize multiple benchmarks as child runs under a single parent. This is useful for parameter sweeps or A/B comparisons.
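A sketch of a two-point concurrency sweep. Creating the parent run up front via the MLflow Python API is one way to obtain an ID to pass; add the same telemetry flags as in the earlier example:

```bash
# Create a parent run and capture its ID (MLflow Python API).
PARENT=$(python - <<'EOF'
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")
run = mlflow.start_run(run_name="concurrency-sweep")
mlflow.end_run()
print(run.info.run_id)
EOF
)

# Two child benchmarks nested under the same parent.
aiperf profile --model my-model --url http://localhost:8000 \
  --concurrency 4 --mlflow-parent-run-id "$PARENT"
aiperf profile --model my-model --url http://localhost:8000 \
  --concurrency 8 --mlflow-parent-run-id "$PARENT"
```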
In the MLflow UI the parent run shows both child runs nested beneath it, making it straightforward to compare concurrency=4 vs concurrency=8 side by side.
Attach Plots
After profiling, generate and upload plots to the same MLflow run:
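For example, pointing `aiperf plot` at the profile's output directory (the positional-argument form is an assumption; see `aiperf plot --help`):

```bash
aiperf plot ./artifacts/my-run --mlflow-upload
```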
The `--mlflow-upload` flag reads `mlflow_export.json` from the input directory and uploads the generated PNG files as artifacts on the existing run. The plots appear under the run's Artifacts tab in MLflow.
Customizing gen_ai.provider.name
AIPerf infers the `gen_ai.provider.name` attribute from the endpoint URL hostname. For example, requests to api.openai.com resolve to openai.
When auto-inference doesn’t match your setup (e.g. you’re running vLLM on localhost), override it explicitly:
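A sketch; `--provider-name` is an assumed flag name, so verify it against `aiperf profile --help`:

```bash
# --provider-name is an assumed flag name, not confirmed AIPerf CLI syntax.
aiperf profile --model my-model --url http://localhost:8000 \
  --provider-name vllm
```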
The value you pass appears as gen_ai.provider.name on every emitted metric and as the gen_ai.provider.name tag in MLflow.
Troubleshooting
Metric name migration
If you previously relied on `aiperf.*` metric names (`aiperf.request_latency_ns`, etc.), note that AIPerf now emits OTel GenAI spec names (`gen_ai.client.operation.duration`, etc.) with durations reported in seconds rather than nanoseconds. Dashboards querying the old names must be updated; see the mapping table in docs/metrics-reference.md.
Collector unreachable
AIPerf logs a warning and continues the benchmark without streaming. The run itself is not affected. Check that your collector is running and the port is correct. If using Docker, ensure the container port is mapped to the host.
MLflow timeout
Verify the tracking server is running. For remote servers, check network connectivity and firewall rules. You can also set MLFLOW_HTTP_REQUEST_TIMEOUT to increase the timeout.
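For example:

```bash
# Raise the MLflow HTTP client timeout (in seconds) before launching the run.
export MLFLOW_HTTP_REQUEST_TIMEOUT=300
```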
Missing optional dependencies
Install the required extras:
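For example (extras names assumed, as in the Prerequisites section):

```bash
pip install "aiperf[otel,mlflow]"
```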