Creating Your First AIPerf Plugin

This tutorial walks you through creating a custom AIPerf endpoint plugin from scratch. By the end, you’ll have a working plugin package that can benchmark any custom API.

Contributing directly to AIPerf? The endpoint class (Step 2) and manifest format (Step 3) are the same, but you can skip the external packaging:

  • Add your class under src/aiperf/ instead of a separate package
  • Register it in the existing src/aiperf/plugin/plugins.yaml instead of creating a new one
  • Skip: Project Structure, Step 1 (pyproject.toml/entry points), Step 4 (install)

What You’ll Build

We’ll create a plugin for a hypothetical “Echo API” that returns the input text with some metadata. This simple example demonstrates all the core concepts you need to build more complex plugins.

Prerequisites

  • Python 3.10+
  • AIPerf installed (pip install aiperf)
  • Basic understanding of Python async/await and Pydantic

Key Concepts

Before diving in, understand the plugin system terminology:

Term      What It Is
Package   Your Python package that provides plugins (e.g., my-aiperf-plugins)
Manifest  The plugins.yaml file declaring your plugins
Category  A type of plugin (e.g., endpoint, transport, timing_strategy)
Entry     A single registered plugin within a category
Class     The Python class implementing your plugin
Metadata  Configuration describing your plugin’s capabilities

What you’re building:

Package (my-aiperf-plugins)
└── Manifest (plugins.yaml)
    └── Category (endpoint)
        └── Entry (echo)
            ├── Class (EchoEndpoint)
            └── Metadata (supports_streaming: true, ...)

For complete plugin system documentation, see the Plugin System Reference.

Project Structure

Create a new directory for your plugin package:

$PKG=my-aiperf-plugins
$SRC=$PKG/src/my_plugins
$
$mkdir -p $SRC/endpoints $PKG/tests
$touch $PKG/pyproject.toml \
> $PKG/echo_server.py \
> $SRC/__init__.py \
> $SRC/plugins.yaml \
> $SRC/endpoints/__init__.py \
> $SRC/endpoints/echo_endpoint.py \
> $PKG/tests/test_echo_endpoint.py
$tree $PKG
$cd $PKG

You should see:

my-aiperf-plugins/
├── echo_server.py
├── pyproject.toml
├── src/
│   └── my_plugins/
│       ├── __init__.py
│       ├── plugins.yaml
│       └── endpoints/
│           ├── __init__.py
│           └── echo_endpoint.py
└── tests/
    └── test_echo_endpoint.py

Now fill in each file in the steps below.

Step 1: Create the Project Files

pyproject.toml

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "my-aiperf-plugins"
version = "0.1.0"
description = "Custom AIPerf plugins for my use case"
requires-python = ">=3.10"
dependencies = [
    "aiperf",
]

[project.entry-points."aiperf.plugins"]
my-plugins = "my_plugins:plugins.yaml"

[tool.hatch.build.targets.wheel]
packages = ["src/my_plugins"]

The key part is the [project.entry-points."aiperf.plugins"] section - this tells AIPerf where to find your plugin manifest.

src/my_plugins/__init__.py

"""My custom AIPerf plugins."""

src/my_plugins/endpoints/__init__.py

"""Custom endpoint implementations."""

from my_plugins.endpoints.echo_endpoint import EchoEndpoint

__all__ = ["EchoEndpoint"]

Step 2: Create the Endpoint Class

src/my_plugins/endpoints/echo_endpoint.py

Your endpoint needs two methods: format_payload() and parse_response().

"""Echo endpoint for demonstration purposes."""
from __future__ import annotations
from typing import Any

from aiperf.common.models import ParsedResponse, RequestInfo, TextResponseData, InferenceServerResponse
from aiperf.endpoints.base_endpoint import BaseEndpoint


class EchoEndpoint(BaseEndpoint):
    """Echo endpoint that sends text and receives it back."""

    # ─────────────────────────────────────────────────────────────────────
    # REQUIRED: Format outgoing request
    # ─────────────────────────────────────────────────────────────────────
    def format_payload(self, request_info: RequestInfo) -> dict[str, Any]:
        turn = request_info.turns[-1]
        model_endpoint = request_info.model_endpoint
        texts = [content for text in turn.texts for content in text.contents if content]
        return {
            "text": texts[0] if texts else "",
            "model": turn.model or model_endpoint.primary_model_name,
            "max_tokens": turn.max_tokens,
            "stream": model_endpoint.endpoint.streaming,
        }

    # ─────────────────────────────────────────────────────────────────────
    # REQUIRED: Parse incoming response
    # ─────────────────────────────────────────────────────────────────────
    def parse_response(self, response: InferenceServerResponse) -> ParsedResponse | None:
        if json_obj := response.get_json():
            if text := json_obj.get("echo") or json_obj.get("text"):
                return ParsedResponse(perf_ns=response.perf_ns, data=TextResponseData(text=text))
            # Fallback: auto-detect common response formats
            if data := self.auto_detect_and_extract(json_obj):
                return ParsedResponse(perf_ns=response.perf_ns, data=data)
        if text := response.get_text():
            return ParsedResponse(perf_ns=response.perf_ns, data=TextResponseData(text=text))
        return None

What’s happening: format_payload() converts AIPerf’s RequestInfo into your API’s format. parse_response() extracts the response text into a ParsedResponse.
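
To see the mapping concretely, the payload-building step can be exercised in isolation. Here `SimpleNamespace` objects stand in for the real AIPerf models (their attribute shapes are assumptions that mirror what the code above reads), and the dict-building logic is inlined from format_payload():

```python
from types import SimpleNamespace

# Stand-ins for RequestInfo / ModelEndpoint (hypothetical shapes mirroring
# the attributes format_payload() reads; not the real AIPerf classes).
turn = SimpleNamespace(
    texts=[SimpleNamespace(contents=["hello"])],
    model=None,
    max_tokens=64,
)
model_endpoint = SimpleNamespace(
    primary_model_name="echo-model",
    endpoint=SimpleNamespace(streaming=False),
)
request_info = SimpleNamespace(turns=[turn], model_endpoint=model_endpoint)

# Same logic as format_payload(), inlined for illustration.
last = request_info.turns[-1]
texts = [c for t in last.texts for c in t.contents if c]
payload = {
    "text": texts[0] if texts else "",
    "model": last.model or model_endpoint.primary_model_name,
    "max_tokens": last.max_tokens,
    "stream": model_endpoint.endpoint.streaming,
}
print(payload)
# {'text': 'hello', 'model': 'echo-model', 'max_tokens': 64, 'stream': False}
```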

Step 3: Create the Plugin Manifest

src/my_plugins/plugins.yaml

# yaml-language-server: $schema=https://raw.githubusercontent.com/ai-dynamo/aiperf/refs/heads/main/src/aiperf/plugin/schema/plugins.schema.json
schema_version: "1.0"

# Register your endpoint
# Note: Package metadata (name, version, author) comes from pyproject.toml,
# not from this file. AIPerf reads it via importlib.metadata.
endpoint:
  echo:
    class: my_plugins.endpoints.echo_endpoint:EchoEndpoint
    description: |
      Echo endpoint for testing. Sends text to an Echo API and receives it back.
      Useful for testing connectivity and basic benchmarking.
    metadata:
      endpoint_path: /echo
      supports_streaming: true
      produces_tokens: true
      tokenizes_input: true
      metrics_title: Echo Metrics

Step 4: Install Your Plugin

From your plugin directory, install into the same Python environment where AIPerf is installed. AIPerf discovers plugins via entry points, which only works when both packages share the same environment.

$pip install -e .

You should see:

Successfully installed my-aiperf-plugins-0.1.0

Important: If you use uv, virtual environments, or conda, make sure you activate the environment where AIPerf is installed before running pip install.
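
A quick way to confirm which environment a shell is using is to print the interpreter's prefix (a generic Python one-liner, not an AIPerf command); run it in the shell where you install the plugin and again where you run aiperf, and the two prefixes should match:

```shell
# Print the environment the current python resolves to.
python3 -c "import sys; print(sys.prefix)"
```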

Step 5: Verify Installation

Confirm both packages are installed in the same environment:

$pip show aiperf my-aiperf-plugins

You should see both packages listed in the same environment:

Name: aiperf
Version: 0.7.0
Location: ...
Requires: ...
Required-by: my-aiperf-plugins
---
Name: my-aiperf-plugins
Version: 0.1.0
Location: ...
Requires: aiperf
Required-by:

Check that AIPerf discovers your plugin:

$# List all plugins - your echo endpoint should appear
$aiperf plugins endpoint

You should see your plugin in the table:

Endpoint Types
┌──────┬──────────────────────────────────────────────────────────┐
│ Type │ Description                                              │
├──────┼──────────────────────────────────────────────────────────┤
│ chat │ OpenAI Chat Completions endpoint...                      │
│ ...  │ ...                                                      │
│ echo │ Echo endpoint for testing. Sends text to an Echo API...  │
└──────┴──────────────────────────────────────────────────────────┘
$# View details about your endpoint
$aiperf plugins endpoint echo

You should see:

╭──────────────────────────── endpoint:echo ─────────────────────────────╮
│ Type: echo                                                             │
│ Category: endpoint                                                     │
│ Package: my-plugins                                                    │
│ Class: my_plugins.endpoints.echo_endpoint:EchoEndpoint                 │
│                                                                        │
│ Echo endpoint for testing. Sends text to an Echo API and receives it   │
│ back. Useful for testing connectivity and basic benchmarking.          │
╰────────────────────────────────────────────────────────────────────────╯
$# Validate your plugin
$aiperf plugins --validate

You should see:

Validating plugins...
✓ Class paths
All checks passed

Step 6: Create a Test Server

To test your plugin end-to-end, create a minimal Echo API server. Save this as echo_server.py in your project root:

"""Minimal Echo API server for testing the EchoEndpoint plugin."""
from __future__ import annotations

import asyncio

import cyclopts
import orjson
import uvicorn
from fastapi import FastAPI
from fastapi.responses import ORJSONResponse, StreamingResponse

app = FastAPI()
cli = cyclopts.App()

@app.post("/echo", response_model=None)
async def echo(body: dict) -> ORJSONResponse | StreamingResponse:
    echo_text = f"[echo] {body.get('text', '')}"
    model = body.get("model", "echo-model")

    if not body.get("stream"):
        return ORJSONResponse({"echo": echo_text, "model": model})

    async def sse():
        for i, word in enumerate(echo_text.split()):
            chunk = orjson.dumps({"echo": word if i == 0 else f" {word}", "model": model})
            yield b"data: " + chunk + b"\n\n"
            await asyncio.sleep(0.02)
        yield b"data: [DONE]\n\n"

    return StreamingResponse(sse(), media_type="text/event-stream")


@cli.default
def main(host: str = "127.0.0.1", port: int = 8000) -> None:
    uvicorn.run(app, host=host, port=port)


if __name__ == "__main__":
    cli()
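
The streaming branch emits Server-Sent Events: each chunk is a `data: <json>` line terminated by a blank line, with `data: [DONE]` as the end-of-stream sentinel. A minimal client-side parse of such a stream, sketched over a captured byte string rather than a live connection:

```python
import json

# A captured SSE stream in the shape the server above produces.
raw = b'data: {"echo": "hello"}\n\ndata: {"echo": " world"}\n\ndata: [DONE]\n\n'

words = []
for event in raw.split(b"\n\n"):
    if not event.startswith(b"data: "):
        continue
    body = event[len(b"data: "):]
    if body == b"[DONE]":  # end-of-stream sentinel
        break
    words.append(json.loads(body)["echo"])

print("".join(words))  # hello world
```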

Start the server:

$pip install fastapi uvicorn orjson cyclopts
$python echo_server.py &

You should see:

INFO: Started server process
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)

Step 7: Use Your Plugin

With the test server running, use your endpoint with AIPerf:

$# Basic usage (endpoint_path: /echo from metadata is appended automatically)
$aiperf profile \
> --model echo-model \
> --url http://localhost:8000 \
> --endpoint-type echo \
> --tokenizer gpt2 \
> --synthetic-input-tokens-mean 100 \
> --request-count 10
$
$# With custom configuration
$aiperf profile \
> --model echo-model \
> --url http://localhost:8000 \
> --endpoint-type echo \
> --tokenizer gpt2 \
> --extra-inputs echo_prefix:"[ECHO] " \
> --synthetic-input-tokens-mean 100 \
> --concurrency 4 \
> --request-count 100

You should see:

NVIDIA AIPerf | Echo Metrics
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━┓
┃ Metric                           ┃       avg ┃    min ┃    max ┃    p99 ┃  std ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━┩
│ Request Latency (ms)             │      2.05 │   0.29 │  15.42 │  14.18 │ 4.47 │
│ Output Sequence Length (tokens)  │    104.00 │ 104.00 │ 104.00 │ 104.00 │ 0.00 │
│ Input Sequence Length (tokens)   │    100.00 │ 100.00 │ 100.00 │ 100.00 │ 0.00 │
│ Output Token Throughput          │ 40,850.61 │    N/A │    N/A │    N/A │  N/A │
│ (tokens/sec)                     │           │        │        │        │      │
│ Request Throughput               │    392.79 │    N/A │    N/A │    N/A │  N/A │
│ (requests/sec)                   │           │        │        │        │      │
│ Request Count (requests)         │     10.00 │    N/A │    N/A │    N/A │  N/A │
└──────────────────────────────────┴───────────┴────────┴────────┴────────┴──────┘

Step 8: Add Tests

tests/test_echo_endpoint.py

"""Tests for the Echo endpoint."""
import pytest
from my_plugins.endpoints.echo_endpoint import EchoEndpoint


class TestEchoEndpoint:
    def test_format_payload(self, mock_model_endpoint, mock_request_info):
        endpoint = EchoEndpoint(model_endpoint=mock_model_endpoint)
        payload = endpoint.format_payload(mock_request_info)
        assert "text" in payload and "model" in payload

    def test_parse_response(self, mock_model_endpoint, mock_response):
        endpoint = EchoEndpoint(model_endpoint=mock_model_endpoint)
        result = endpoint.parse_response(mock_response)
        assert result is not None and result.data.text

Fixtures: Create conftest.py with mock_model_endpoint, mock_request_info, and mock_response fixtures. See AIPerf’s test utilities for examples.
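
One possible conftest.py sketch builds those fixtures from unittest.mock stand-ins; the attribute shapes (turns, texts, contents, get_json(), perf_ns) mirror what echo_endpoint.py reads and are assumptions about the real AIPerf models, not their actual definitions:

```python
"""Sketch of tests/conftest.py; MagicMock stands in for AIPerf's models."""
from unittest.mock import MagicMock

import pytest


def make_model_endpoint():
    # Minimal stand-in exposing the attributes format_payload() uses.
    me = MagicMock()
    me.primary_model_name = "echo-model"
    me.endpoint.streaming = False
    return me


def make_request_info(model_endpoint):
    # One turn containing one text content, as format_payload() expects.
    text = MagicMock()
    text.contents = ["hello"]
    turn = MagicMock()
    turn.texts = [text]
    turn.model = None
    turn.max_tokens = 64
    info = MagicMock()
    info.turns = [turn]
    info.model_endpoint = model_endpoint
    return info


def make_response():
    # A non-streaming JSON body in the Echo API's shape.
    resp = MagicMock()
    resp.perf_ns = 123
    resp.get_json.return_value = {"echo": "[echo] hello"}
    return resp


@pytest.fixture
def mock_model_endpoint():
    return make_model_endpoint()


@pytest.fixture
def mock_request_info(mock_model_endpoint):
    return make_request_info(mock_model_endpoint)


@pytest.fixture
def mock_response():
    return make_response()
```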

Understanding the Code

Component Summary

Component         What It Does                                        You Provide
BaseEndpoint      Logging, auto_detect_and_extract(), config access   Inherit from it
format_payload()  Converts RequestInfo → API request                  Your API format
parse_response()  Converts API response → ParsedResponse              Your parsing logic

Data Flow

RequestInfo.turns[-1] → format_payload() → HTTP Request → Your API
ParsedResponse ← parse_response() ← HTTP Response ←────┘

Response Types

Type                   Use Case         Key Field
TextResponseData       LLM completions  text: str
EmbeddingResponseData  Embeddings       embeddings: list[list[float]]
RankingsResponseData   Reranking        rankings: list[dict[str, Any]]

Metadata Fields

Field               Required        Purpose
endpoint_path       Yes (nullable)  Default API path (e.g., /v1/chat/completions)
supports_streaming  Yes             SSE streaming support
produces_tokens     Yes             Enables token metrics
tokenizes_input     Yes             Enables input tokenization
metrics_title       Yes             Dashboard display name (nullable)

Next Steps

Goal                Action
Multiple endpoints  Add more entries under endpoint: in plugins.yaml
Other plugin types  Use the same pattern for timing_strategy, data_exporter, dataset_composer
Publish             python -m build && twine upload dist/* to PyPI

Troubleshooting

Plugin not found

TypeNotFoundError: Type 'echo' not found for category 'endpoint'.

Solutions:

  1. Ensure pip install -e . completed successfully
  2. Check the entry point in pyproject.toml matches your package structure
  3. Run aiperf plugins --validate to check for errors

Import errors

ImportError: Failed to import module for endpoint:echo from 'my_plugins.endpoints.echo_endpoint:EchoEndpoint'
Reason: ...
Tip: Check that the module is installed and importable

Solutions:

  1. Verify the class path format: module.path:ClassName
  2. Check all imports in your endpoint file work: python -c "from my_plugins.endpoints.echo_endpoint import EchoEndpoint"
  3. Ensure all dependencies are installed
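
The module.path:ClassName form can also be checked by hand: split on the colon and resolve with importlib. This is a sketch of how that convention is typically resolved, not AIPerf's actual loader:

```python
import importlib


def resolve_class_path(path: str):
    """Resolve a 'module.path:ClassName' string to the class it names."""
    module_name, _, class_name = path.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Works for any importable class, e.g. one from the standard library:
print(resolve_class_path("json:JSONDecoder"))  # <class 'json.decoder.JSONDecoder'>
```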

Response parsing fails

Solutions:

  1. Use -vv flag to see raw responses in debug logs
  2. Check that your parse_response handles your API’s actual response format
  3. Use auto_detect_and_extract() as a fallback for unknown formats

Reference