AIPerf Server Metrics Parquet Export Schema


Schema reference for the server_metrics_export.parquet file. Optimized for SQL analytics with DuckDB, pandas, and Polars.

Overview

The Parquet export provides raw time-series data with cumulative delta calculations applied at each timestamp. It uses a normalized schema in which histogram buckets are stored as separate rows (not wide columns), producing files roughly 50% smaller than a wide layout.

Enable Parquet Export

```shell
aiperf profile --model MODEL ... --server-metrics-formats json csv parquet
```

Delta Calculations

All values are deltas from a reference point (the last sample taken before the profiling period):

Metric Type | Value Semantics
------------|----------------
Gauge       | Raw value at timestamp (no delta)
Counter     | Cumulative delta from reference (value[t] - value[ref])
Histogram   | Cumulative deltas for sum, count, and each bucket_count

Negative deltas (counter resets) are clamped to 0.
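The counter rule above reduces to a one-liner. A minimal illustrative sketch (the exporter applies this internally; the function name here is hypothetical):

```python
def counter_delta(value_at_t, value_at_ref):
    """Cumulative delta from the reference sample, clamped at 0 on counter reset."""
    return max(value_at_t - value_at_ref, 0.0)

# Normal monotonic counter: 150 requests at t, 20 at the reference sample
counter_delta(150.0, 20.0)  # 130.0
# Counter reset (e.g., server restart): raw difference is negative, clamped to 0
counter_delta(5.0, 20.0)    # 0.0
```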


Schema Definition

Fixed Columns

Column       | Type   | Nullable | Description
-------------|--------|----------|------------
endpoint_url | string | No       | Prometheus endpoint URL (e.g., http://localhost:8000/metrics)
metric_name  | string | No       | Metric name (e.g., vllm:kv_cache_usage_perc)
metric_type  | string | No       | gauge, counter, or histogram
unit         | string | Yes      | Inferred unit (seconds, tokens, requests, ratio, etc.)
description  | string | Yes      | Metric HELP text from Prometheus
timestamp_ns | int64  | No       | Collection timestamp in nanoseconds since epoch
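Because timestamp_ns is nanoseconds since the epoch, it converts directly to a datetime. A small pandas sketch (the sample value is taken from the example rows later in this document):

```python
import pandas as pd

df = pd.DataFrame({"timestamp_ns": [1765793061967310848]})
# pandas interprets integer input to to_datetime as nanoseconds since epoch by default
df["timestamp"] = pd.to_datetime(df["timestamp_ns"])
```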

Value Columns

Column       | Type    | Nullable | Used By        | Description
-------------|---------|----------|----------------|------------
value        | float64 | Yes      | Gauge, Counter | Metric value (raw for gauge, delta for counter)
sum          | float64 | Yes      | Histogram      | Cumulative sum delta from reference
count        | float64 | Yes      | Histogram      | Cumulative count delta from reference
bucket_le    | string  | Yes      | Histogram      | Bucket upper bound (e.g., 0.1, +Inf)
bucket_count | float64 | Yes      | Histogram      | Cumulative bucket count delta (observations <= bucket_le)

Dynamic Label Columns

Prometheus labels become individual columns (alphabetically sorted):

Column           | Type   | Nullable | Description
-----------------|--------|----------|------------
engine           | string | Yes      | vLLM engine ID
engine_type      | string | Yes      | Engine type (trtllm, unified)
finished_reason  | string | Yes      | Request completion reason
model_name       | string | Yes      | Model identifier
dynamo_component | string | Yes      | Dynamo worker component
tp_rank          | string | Yes      | Tensor parallel rank
pp_rank          | string | Yes      | Pipeline parallel rank
stage            | string | Yes      | SGLang processing stage
(others)         | string | Yes      | Any additional Prometheus labels

Label columns vary by endpoint/model. Use union_by_name=true for cross-file queries.

Note: Prometheus labels that conflict with reserved column names (endpoint_url, metric_name, metric_type, unit, description, timestamp_ns, value, sum, count, bucket_le, bucket_count) are silently excluded.


Row Structure by Metric Type

Column order: fixed columns → label columns (alphabetically) → value columns.

Gauge/Counter: One Row per Timestamp

endpoint_url | metric_name | metric_type | unit | description | timestamp_ns | model_name | value | sum | count | bucket_le | bucket_count
-------------|--------------------------|-------------|-------|-------------|---------------------|--------------|-------|------|-------|-----------|-------------
http://... | vllm:kv_cache_usage_perc | gauge | ratio | KV-cache... | 1765793061967310848 | Qwen/Qwen3-0.6B | 0.72 | null | null | null | null
http://... | vllm:request_success | counter | null | Count of... | 1765793061967310848 | Qwen/Qwen3-0.6B | 150.0 | null | null | null | null

Histogram: N Rows per Timestamp (One per Bucket)

endpoint_url | metric_name | metric_type | unit | description | timestamp_ns | model_name | value | sum | count | bucket_le | bucket_count
-------------|----------------------------------|-------------|---------|--------------|---------------------|-----------------|-------|--------|-------|-----------|-------------
http://... | vllm:e2e_request_latency_seconds | histogram | seconds | Histogram... | 1765793061967310848 | Qwen/Qwen3-0.6B | null | 259.87 | 19.0 | 0.3 | 0.0
http://... | vllm:e2e_request_latency_seconds | histogram | seconds | Histogram... | 1765793061967310848 | Qwen/Qwen3-0.6B | null | 259.87 | 19.0 | 1.0 | 1.0
http://... | vllm:e2e_request_latency_seconds | histogram | seconds | Histogram... | 1765793061967310848 | Qwen/Qwen3-0.6B | null | 259.87 | 19.0 | 5.0 | 3.0
http://... | vllm:e2e_request_latency_seconds | histogram | seconds | Histogram... | 1765793061967310848 | Qwen/Qwen3-0.6B | null | 259.87 | 19.0 | +Inf | 19.0
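Since bucket_count is cumulative across bucket bounds, per-bucket observation counts can be recovered by differencing in bound order. A pandas sketch using the example histogram rows above:

```python
import pandas as pd

hist = pd.DataFrame({
    "bucket_le": ["0.3", "1.0", "5.0", "+Inf"],
    "bucket_count": [0.0, 1.0, 3.0, 19.0],
})
# Sort by the numeric bound so the diff runs in bucket order (+Inf sorts last)
hist["le_num"] = hist["bucket_le"].replace("+Inf", "inf").astype(float)
hist = hist.sort_values("le_num")
# First bucket keeps its own count; each later bucket subtracts the previous cumulative count
hist["per_bucket"] = hist["bucket_count"].diff().fillna(hist["bucket_count"])
# per_bucket: 0.0, 1.0, 2.0, 16.0
```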

File Metadata

Parquet file metadata (accessible via pq.read_metadata()) includes:

Key | Description
----|------------
aiperf.schema_version | Schema version (1.0)
aiperf.version | AIPerf version
aiperf.benchmark_id | Unique benchmark UUID
aiperf.exporter | Exporter class name (ServerMetricsParquetExporter)
aiperf.export_timestamp_utc | Export timestamp (ISO 8601)
aiperf.time_filter_start_ns | Profiling period start (nanoseconds)
aiperf.time_filter_end_ns | Profiling period end (nanoseconds)
aiperf.profiling_duration_ns | Profiling duration (nanoseconds)
aiperf.profiling_duration_seconds | Profiling duration (seconds)
aiperf.endpoint_urls | JSON array of endpoint URLs
aiperf.endpoint_count | Number of endpoints
aiperf.label_columns | JSON array of label column names
aiperf.label_count | Number of label columns
aiperf.metric_count | Total unique metrics
aiperf.metric_type_counts | JSON object: {"gauge": N, "counter": N, "histogram": N}
aiperf.model_names | JSON array of model names
aiperf.concurrency | Benchmark concurrency setting
aiperf.request_rate | Benchmark request rate (if set)
aiperf.input_config | Full user configuration (JSON)
aiperf.hostname | Collection host
aiperf.python_version | Python version
aiperf.pyarrow_version | PyArrow version
aiperf.schema_note | Cross-file query hint

Compression: Snappy (good compression ratio with fast decompression)


Example Queries

DuckDB

```sql
-- Time-series for a specific metric
SELECT timestamp_ns, value
FROM 'server_metrics_export.parquet'
WHERE metric_name = 'vllm:kv_cache_usage_perc'
ORDER BY timestamp_ns;

-- Filter by label
SELECT timestamp_ns, value
FROM 'server_metrics_export.parquet'
WHERE metric_name = 'vllm:request_success'
  AND model_name = 'Qwen/Qwen3-0.6B'
ORDER BY timestamp_ns;

-- Histogram bucket distribution at final timestamp
SELECT bucket_le, bucket_count
FROM 'server_metrics_export.parquet'
WHERE metric_name = 'vllm:e2e_request_latency_seconds'
  AND timestamp_ns = (SELECT MAX(timestamp_ns) FROM 'server_metrics_export.parquet'
                      WHERE metric_name = 'vllm:e2e_request_latency_seconds')
ORDER BY CAST(REPLACE(bucket_le, '+Inf', '999999') AS DOUBLE);

-- Aggregate across multiple runs (handles schema differences)
SELECT metric_name, AVG(value) as avg_value
FROM read_parquet('artifacts/*/server_metrics_export.parquet', union_by_name=true)
WHERE metric_type = 'gauge'
GROUP BY metric_name;

-- Compare endpoints
SELECT endpoint_url, metric_name, AVG(value) as avg_value
FROM 'server_metrics_export.parquet'
WHERE metric_type = 'gauge'
GROUP BY endpoint_url, metric_name;
```

pandas

```python
import pandas as pd

df = pd.read_parquet('server_metrics_export.parquet')

# Filter to gauge metrics
gauges = df[df['metric_type'] == 'gauge']

# Time-series plot
kv_usage = df[df['metric_name'] == 'vllm:kv_cache_usage_perc']
kv_usage.plot(x='timestamp_ns', y='value', title='KV Cache Usage')

# Pivot histogram buckets
hist = df[df['metric_name'] == 'vllm:e2e_request_latency_seconds']
pivot = hist.pivot(index='timestamp_ns', columns='bucket_le', values='bucket_count')
```

Polars

```python
import polars as pl

df = pl.read_parquet('server_metrics_export.parquet')

# Filter and aggregate
result = (
    df.filter(pl.col('metric_type') == 'gauge')
    .group_by('metric_name')
    .agg([
        pl.col('value').mean().alias('avg'),
        pl.col('value').max().alias('max'),
    ])
)

# Lazy scan for large files
lazy = pl.scan_parquet('artifacts/*/server_metrics_export.parquet')
result = lazy.filter(pl.col('metric_name') == 'vllm:kv_cache_usage_perc').collect()
```

Reading Metadata

```python
import pyarrow.parquet as pq
import json

metadata = pq.read_metadata('server_metrics_export.parquet')
schema_metadata = metadata.schema.to_arrow_schema().metadata

# Access specific fields
benchmark_id = schema_metadata[b'aiperf.benchmark_id'].decode()
config = json.loads(schema_metadata[b'aiperf.input_config'])
label_columns = json.loads(schema_metadata[b'aiperf.label_columns'])
```

Best Practices

Cross-File Analysis

Label columns vary by endpoint and model. Always use union_by_name:

```sql
-- DuckDB
SELECT * FROM read_parquet('run_*/server_metrics_export.parquet', union_by_name=true);
```

```python
# pandas (concat aligns columns by name, filling missing labels with NaN)
import pandas as pd
from pathlib import Path

dfs = [pd.read_parquet(p) for p in Path('.').glob('run_*/server_metrics_export.parquet')]
combined = pd.concat(dfs, ignore_index=True)
```

Histogram Percentile Estimation

Reconstruct percentiles from bucket data. Note that bucket_count values are cumulative (each bucket includes all observations with value <= bucket_le), matching Prometheus histogram semantics:

```python
import numpy as np

def estimate_percentile(bucket_les, bucket_counts, percentile):
    """Estimate a percentile from cumulative histogram buckets via linear interpolation."""
    # Convert bucket_le strings to floats (handle +Inf)
    bounds = [float(b) if b != '+Inf' else np.inf for b in bucket_les]
    counts = np.array(bucket_counts)

    total = counts[-1]  # +Inf bucket holds the cumulative total
    target = total * (percentile / 100)

    for i, (le, count) in enumerate(zip(bounds, counts)):
        if count >= target:
            if i == 0:
                return le
            if np.isinf(le):
                # Target falls in the +Inf bucket: return the last finite bound,
                # matching Prometheus histogram_quantile behavior
                return bounds[i - 1]
            prev_le = bounds[i - 1]
            prev_count = counts[i - 1]
            # Linear interpolation within the bucket
            fraction = (target - prev_count) / (count - prev_count) if count > prev_count else 0
            return prev_le + fraction * (le - prev_le)
    return bounds[-2]  # Unreachable for well-formed counts, kept as a safeguard
```
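As a sanity check on the interpolation, the p10 for the example histogram buckets can be computed by hand with searchsorted (this shortcut assumes the target falls in a finite bucket):

```python
import numpy as np

# Bounds and cumulative counts from the example histogram rows
bounds = np.array([0.3, 1.0, 5.0, np.inf])
cum = np.array([0.0, 1.0, 3.0, 19.0])

target = cum[-1] * 0.10                  # p10 -> 1.9 observations
i = int(np.searchsorted(cum, target))    # first bucket whose cumulative count >= target
frac = (target - cum[i - 1]) / (cum[i] - cum[i - 1])
p10 = bounds[i - 1] + frac * (bounds[i] - bounds[i - 1])
# p10 is approximately 2.8
```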

Memory-Efficient Processing

For large files, use lazy evaluation:

```python
# Polars lazy scan
import polars as pl
df = (
    pl.scan_parquet('server_metrics_export.parquet')
    .filter(pl.col('metric_name') == 'vllm:kv_cache_usage_perc')
    .collect()
)

# DuckDB direct query (doesn't load the entire file)
import duckdb
result = duckdb.query("""
    SELECT AVG(value) FROM 'server_metrics_export.parquet'
    WHERE metric_name = 'vllm:kv_cache_usage_perc'
""").fetchone()
```

Schema Version History

Version | Changes
--------|--------
1.0     | Initial schema with normalized histogram buckets

For aggregated statistics, see JSON Schema. For metric definitions, see Server Metrics Reference. For usage examples, see the Server Metrics Tutorial.