Working with Profile Export Files

This guide demonstrates how to programmatically work with AIPerf benchmark output files using the native Pydantic data models.

Overview

AIPerf generates multiple output files after each benchmark run, each optimized for a different analysis workflow. The individual formats are described under Output File Formats below.

Data Models

AIPerf uses Pydantic models for type-safe parsing and validation of all benchmark output files. These models ensure data integrity and provide IDE autocompletion support.

Core Models

```python
from aiperf.common.models import (
    MetricRecordInfo,
    MetricRecordMetadata,
    MetricValue,
    ErrorDetails,
    InputsFile,
    SessionPayloads,
)
```

| Model | Description | Source |
| --- | --- | --- |
| MetricRecordInfo | Complete per-request record including metadata, metrics, and error information | record_models.py |
| MetricRecordMetadata | Request metadata: timestamps, IDs, worker identifiers, and phase information | record_models.py |
| MetricValue | Individual metric value with associated unit of measurement | record_models.py |
| ErrorDetails | Error information including HTTP code, error type, and descriptive message | error_models.py |
| InputsFile | Container for all input dataset sessions with formatted payloads for each turn | dataset_models.py |
| SessionPayloads | Single conversation session with session ID and list of formatted request payloads | dataset_models.py |

Output File Formats

Input Dataset (JSON)

File: artifacts/my-run/inputs.json

A structured representation of all input datasets converted to the payload format used by the endpoint.

Structure:

```json
{
  "data": [
    {
      "session_id": "a5cdb1fe-19a3-4ed0-9e54-ed5ed6dc5578",
      "payloads": [
        { ... } // formatted payload based on the endpoint type.
      ]
    }
  ]
}
```

Key fields:

  • session_id: Unique identifier for the conversation. This can be used to correlate inputs with results.
  • payloads: Array of formatted request payloads (one per turn in multi-turn conversations).
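
Because the file is plain JSON, it can also be inspected without the aiperf models. A minimal sketch using only the standard library (the inline sample below is illustrative, not real benchmark data):

```python
import json

def summarize_inputs(inputs_text: str) -> dict[str, int]:
    """Map each session_id to its number of payloads (turns)."""
    data = json.loads(inputs_text)
    return {s["session_id"]: len(s["payloads"]) for s in data["data"]}

# Illustrative sample mirroring the inputs.json structure shown above.
sample = '{"data": [{"session_id": "a5cdb1fe-19a3-4ed0-9e54-ed5ed6dc5578", "payloads": [{}, {}]}]}'
print(summarize_inputs(sample))  # {'a5cdb1fe-19a3-4ed0-9e54-ed5ed6dc5578': 2}
```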

Per-Request Records (JSONL)

File: artifacts/my-run/profile_export.jsonl

The JSONL output contains one record per line, one for each request sent during the benchmark. Each record includes request metadata, computed metrics, and error information if the request failed.

Successful Request Record

```json
{
  "metadata": {
    "session_num": 45,
    "x_request_id": "7609a2e7-aa53-4ab1-98f4-f35ecafefd25",
    "x_correlation_id": "32ee4f33-cfca-4cfc-988f-79b45408b909",
    "conversation_id": "77aa5b0e-b305-423f-88d5-c00da1892599",
    "turn_index": 0,
    "request_start_ns": 1759813207532900363,
    "request_ack_ns": 1759813207650730976,
    "request_end_ns": 1759813207838764604,
    "worker_id": "worker_359d423a",
    "record_processor_id": "record_processor_1fa47cd7",
    "benchmark_phase": "profiling",
    "was_cancelled": false,
    "cancellation_time_ns": null
  },
  "metrics": {
    "input_sequence_length": {"value": 550, "unit": "tokens"},
    "time_to_first_token": {"value": 255.88656799999998, "unit": "ms"},
    "request_latency": {"value": 297.52522799999997, "unit": "ms"},
    "output_token_count": {"value": 9, "unit": "tokens"},
    "time_to_second_token": {"value": 4.8984369999999995, "unit": "ms"},
    "inter_chunk_latency": {"value": [4.898437, 5.316006, 4.801489, 5.674918, 4.811467, 5.097998, 5.504797, 5.533548], "unit": "ms"},
    "output_sequence_length": {"value": 9, "unit": "tokens"},
    "inter_token_latency": {"value": 5.2048325, "unit": "ms"},
    "output_token_throughput_per_user": {"value": 192.1291415237666, "unit": "tokens/sec/user"}
  },
  "error": null
}
```

Metadata Fields:

  • session_num: Sequential request number across the entire benchmark (0-indexed).
    • For single-turn conversations, this will be the request index across all requests in the benchmark.
    • For multi-turn conversations, this will be the index of the user session across all sessions in the benchmark.
  • x_request_id: Unique identifier for this specific request. This is sent to the endpoint as the X-Request-ID header.
  • x_correlation_id: Unique identifier for the user session. This is the same for all requests in the same user session for multi-turn conversations. This is sent to the endpoint as the X-Correlation-ID header.
  • conversation_id: ID of the input dataset conversation. This can be used to correlate inputs with results.
  • turn_index: Position within a multi-turn conversation (0-indexed), or 0 for single-turn conversations.
  • request_start_ns: Epoch time in nanoseconds when the request was initiated by AIPerf.
  • request_ack_ns: Epoch time in nanoseconds when server acknowledged the request. This is only applicable to streaming requests.
  • request_end_ns: Epoch time in nanoseconds when the last response was received from the endpoint.
  • worker_id: ID of the AIPerf worker that executed the request against the endpoint.
  • record_processor_id: ID of the AIPerf record processor that processed the results from the server.
  • benchmark_phase: Phase of the benchmark. Currently only profiling is supported.
  • was_cancelled: Whether the request was cancelled during execution (such as when --request-cancellation-rate is enabled).
  • cancellation_time_ns: Epoch time in nanoseconds when the request was cancelled (if applicable).
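
The nanosecond epoch timestamps can be combined directly. As a sketch, computing the end-to-end wall time for the successful record above (note this need not exactly equal the request_latency metric, which AIPerf computes internally):

```python
# Timestamps taken from the sample successful record above.
request_start_ns = 1759813207532900363
request_end_ns = 1759813207838764604

# Convert the nanosecond delta to milliseconds.
latency_ms = (request_end_ns - request_start_ns) / 1e6
print(f"{latency_ms:.3f} ms")  # 305.864 ms
```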

Metrics: See the Complete Metrics Reference page for a list of all metrics and their descriptions. For failed requests, standard metrics are omitted; only error-specific metrics (such as error_isl) are recorded.

Failed Request Record

```json
{
  "metadata": {
    "session_num": 80,
    "x_request_id": "c35e4b1b-6775-4750-b875-94cd68e5ec15",
    "x_correlation_id": "77ecf78d-b848-4efc-9579-cd695c6e89c4",
    "conversation_id": "9526b41d-5dbc-41a5-a353-99ae06a53bc5",
    "turn_index": 0,
    "request_start_ns": 1759879161119147826,
    "request_ack_ns": null,
    "request_end_ns": 1759879161119772754,
    "worker_id": "worker_6006099d",
    "record_processor_id": "record_processor_fdeeec8f",
    "benchmark_phase": "profiling",
    "was_cancelled": true,
    "cancellation_time_ns": 1759879161119772754
  },
  "metrics": {
    "error_isl": {"value": 550, "unit": "tokens"}
  },
  "error": {
    "code": 499,
    "type": "RequestCancellationError",
    "message": "Request was cancelled after 0.000 seconds"
  }
}
```

Error Fields:

  • code: HTTP status code or custom error code.
  • type: Classification of the error (e.g., timeout, cancellation, server error). Typically the Python exception class name.
  • message: Human-readable error description.
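
These fields make failure analysis straightforward. A minimal sketch that tallies error types across a profile_export.jsonl stream, using only the standard library (the inline records are illustrative):

```python
import json
from collections import Counter

def count_error_types(jsonl_lines) -> Counter:
    """Tally error types across failed records in a JSONL record stream."""
    counts = Counter()
    for line in jsonl_lines:
        if not line.strip():
            continue
        record = json.loads(line)
        error = record.get("error")
        if error is not None:
            counts[error["type"]] += 1
    return counts

# Illustrative records; real data comes from profile_export.jsonl.
lines = [
    '{"metadata": {}, "metrics": {}, "error": null}',
    '{"metadata": {}, "metrics": {}, "error": {"code": 499, "type": "RequestCancellationError", "message": "cancelled"}}',
    '{"metadata": {}, "metrics": {}, "error": {"code": 499, "type": "RequestCancellationError", "message": "cancelled"}}',
]
print(count_error_types(lines))  # Counter({'RequestCancellationError': 2})
```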

Aggregated Statistics (JSON)

File: artifacts/my-run/profile_export_aiperf.json

A single JSON object containing statistical summaries (min, max, mean, percentiles) for all metrics across the entire benchmark run, as well as the user configuration used for the benchmark.

Aggregated Statistics (CSV)

File: artifacts/my-run/profile_export_aiperf.csv

Contains the same aggregated statistics as the JSON format, but in a spreadsheet-friendly structure with one metric per row.
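
The one-metric-per-row layout suits the standard csv module. A sketch with csv.DictReader (the column names in the inline sample are illustrative; inspect your own file's header before relying on them):

```python
import csv
import io

# Illustrative rows -- the actual columns in profile_export_aiperf.csv may differ.
sample = "Metric,avg,p50,p99\nrequest_latency,297.5,295.1,310.2\n"
rows = list(csv.DictReader(io.StringIO(sample)))

# Index the rows by metric name for easy lookup.
by_metric = {row["Metric"]: row for row in rows}
print(by_metric["request_latency"]["p99"])  # 310.2
```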

Working with Output Data

AIPerf output files can be parsed using the native Pydantic models for type-safe data handling and analysis.

Synchronous Loading

```python
from pathlib import Path

from aiperf.common.models import MetricRecordInfo

def load_records(file_path: Path) -> list[MetricRecordInfo]:
    """Load a profile_export.jsonl file into structured Pydantic models in sync mode."""
    records = []
    with open(file_path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                record = MetricRecordInfo.model_validate_json(line)
                records.append(record)
    return records
```

Asynchronous Loading

For large benchmark runs with thousands of requests, use async file I/O for better performance:

```python
from pathlib import Path

import aiofiles
from aiperf.common.models import MetricRecordInfo

async def process_streaming_records_async(file_path: Path) -> None:
    """Load a profile_export.jsonl file into structured Pydantic models in async mode and process the streaming records."""
    async with aiofiles.open(file_path, encoding="utf-8") as f:
        async for line in f:
            if line.strip():
                record = MetricRecordInfo.model_validate_json(line)
                # ... Process the streaming records here ...
```
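
If aiofiles is unavailable, a similar effect can be achieved by offloading the blocking read to a worker thread with the standard library. A sketch using plain json parsing so the snippet stands alone (substitute MetricRecordInfo.model_validate_json when using the aiperf models):

```python
import asyncio
import json
from pathlib import Path

async def load_records_async(file_path: Path) -> list[dict]:
    """Read and parse a JSONL file off the event loop using a worker thread."""
    def _read() -> list[dict]:
        with open(file_path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]
    # asyncio.to_thread keeps the event loop responsive during file I/O.
    return await asyncio.to_thread(_read)
```

Usage: `records = asyncio.run(load_records_async(Path("artifacts/my-run/profile_export.jsonl")))`.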

Working with Input Datasets

Load and analyze the inputs.json file to understand what data was sent during the benchmark:

```python
from pathlib import Path

from aiperf.common.models import InputsFile

def load_inputs_file(file_path: Path) -> InputsFile:
    """Load an inputs.json file into a structured Pydantic model."""
    with open(file_path, encoding="utf-8") as f:
        return InputsFile.model_validate_json(f.read())

inputs = load_inputs_file(Path("artifacts/my-run/inputs.json"))
```

Correlating Inputs with Results

Combine artifacts/my-run/inputs.json with artifacts/my-run/profile_export.jsonl for deeper analysis:

```python
from pathlib import Path

from aiperf.common.models import InputsFile, MetricRecordInfo

def correlate_inputs_and_results(inputs_path: Path, results_path: Path):
    """Correlate input prompts with performance metrics."""
    # Load inputs
    with open(inputs_path, encoding="utf-8") as f:
        inputs = InputsFile.model_validate_json(f.read())

    # Create session lookup
    session_inputs = {session.session_id: session for session in inputs.data}

    # Process results and correlate
    with open(results_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue

            record = MetricRecordInfo.model_validate_json(line)

            # Find corresponding input
            conv_id = record.metadata.conversation_id
            if conv_id not in session_inputs:
                raise ValueError(f"Conversation ID {conv_id} not found in inputs")

            session = session_inputs[conv_id]
            turn_idx = record.metadata.turn_index

            if turn_idx >= len(session.payloads):
                raise ValueError(f"Turn index {turn_idx} is out of range for session {conv_id}")

            # Assign the raw request payload to the record, and print it out.
            # This works because AIPerf models allow extra fields to be added to the model.
            payload = session.payloads[turn_idx]
            record.payload = payload
            print(record.model_dump_json(indent=2))

correlate_inputs_and_results(
    Path("artifacts/my-run/inputs.json"),
    Path("artifacts/my-run/profile_export.jsonl")
)
```