Creating Your First AIPerf Plugin

This tutorial walks you through creating a custom AIPerf endpoint plugin from scratch. By the end, you’ll have a working plugin package that can benchmark any custom API.

Contributing directly to AIPerf? The endpoint class (Step 2) and manifest format (Step 3) are the same, but you can skip the external packaging:

  • Add your class under src/aiperf/ instead of a separate package
  • Register it in the existing src/aiperf/plugin/plugins.yaml instead of creating a new one
  • Skip: Project Structure, Step 1 (pyproject.toml/entry points), Step 4 (install)

What You’ll Build

We’ll create a plugin for a hypothetical “Echo API” that returns the input text with some metadata. This simple example demonstrates all the core concepts you need to build more complex plugins.

Prerequisites

  • Python 3.10+
  • AIPerf installed (pip install aiperf)
  • Basic understanding of Python async/await and Pydantic

Key Concepts

Before diving in, understand the plugin system terminology:

Term      What It Is
Package   Your Python package that provides plugins (e.g., my-aiperf-plugins)
Manifest  The plugins.yaml file declaring your plugins
Category  A type of plugin (e.g., endpoint, transport, timing_strategy)
Entry     A single registered plugin within a category
Class     The Python class implementing your plugin
Metadata  Configuration describing your plugin’s capabilities

What you’re building:

Package (my-aiperf-plugins)
└── Manifest (plugins.yaml)
    └── Category (endpoint)
        └── Entry (echo)
            ├── Class (EchoEndpoint)
            └── Metadata (supports_streaming: true, ...)

For complete plugin system documentation, see the Plugin System Reference.

Project Structure

Create a new directory for your plugin package:

$PKG=my-aiperf-plugins
$SRC=$PKG/src/my_plugins
$
$mkdir -p $SRC/endpoints $PKG/tests
$touch $PKG/pyproject.toml \
> $PKG/echo_server.py \
> $SRC/__init__.py \
> $SRC/plugins.yaml \
> $SRC/endpoints/__init__.py \
> $SRC/endpoints/echo_endpoint.py \
> $PKG/tests/test_echo_endpoint.py
$tree $PKG
$cd $PKG

You should see:

my-aiperf-plugins/
├── echo_server.py
├── pyproject.toml
├── src/
│   └── my_plugins/
│       ├── __init__.py
│       ├── plugins.yaml
│       └── endpoints/
│           ├── __init__.py
│           └── echo_endpoint.py
└── tests/
    └── test_echo_endpoint.py

Now fill in each file in the steps below.

Step 1: Create the Project Files

pyproject.toml

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "my-aiperf-plugins"
version = "0.1.0"
description = "Custom AIPerf plugins for my use case"
requires-python = ">=3.10"
dependencies = [
    "aiperf",
]

[project.entry-points."aiperf.plugins"]
my-plugins = "my_plugins:plugins.yaml"

[tool.hatch.build.targets.wheel]
packages = ["src/my_plugins"]

The key part is the [project.entry-points."aiperf.plugins"] section - this tells AIPerf where to find your plugin manifest.

src/my_plugins/__init__.py

"""My custom AIPerf plugins."""

src/my_plugins/endpoints/__init__.py

"""Custom endpoint implementations."""

from my_plugins.endpoints.echo_endpoint import EchoEndpoint

__all__ = ["EchoEndpoint"]

Step 2: Create the Endpoint Class

src/my_plugins/endpoints/echo_endpoint.py

Your endpoint needs two methods: format_payload() and parse_response().

"""Echo endpoint for demonstration purposes."""
from __future__ import annotations
from typing import Any

from aiperf.common.models import ParsedResponse, RequestInfo, TextResponseData, InferenceServerResponse
from aiperf.endpoints.base_endpoint import BaseEndpoint


class EchoEndpoint(BaseEndpoint):
    """Echo endpoint that sends text and receives it back."""

    # ─────────────────────────────────────────────────────────────────────
    # REQUIRED: Format outgoing request
    # ─────────────────────────────────────────────────────────────────────
    def format_payload(self, request_info: RequestInfo) -> dict[str, Any]:
        turn = request_info.turns[-1]
        model_endpoint = request_info.model_endpoint
        texts = [content for text in turn.texts for content in text.contents if content]
        return {
            "text": texts[0] if texts else "",
            "model": turn.model or model_endpoint.primary_model_name,
            "max_tokens": turn.max_tokens,
            "stream": model_endpoint.endpoint.streaming,
        }

    # ─────────────────────────────────────────────────────────────────────
    # REQUIRED: Parse incoming response
    # ─────────────────────────────────────────────────────────────────────
    def parse_response(self, response: InferenceServerResponse) -> ParsedResponse | None:
        if json_obj := response.get_json():
            if text := json_obj.get("echo") or json_obj.get("text"):
                return ParsedResponse(perf_ns=response.perf_ns, data=TextResponseData(text=text))
            # Fallback: auto-detect common response formats
            if data := self.auto_detect_and_extract(json_obj):
                return ParsedResponse(perf_ns=response.perf_ns, data=data)
        if text := response.get_text():
            return ParsedResponse(perf_ns=response.perf_ns, data=TextResponseData(text=text))
        return None

What’s happening: format_payload() converts AIPerf’s RequestInfo into your API’s format. parse_response() extracts the response text into a ParsedResponse.
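
To see the mapping concretely, the payload-building step can be exercised in isolation. Here `SimpleNamespace` objects stand in for the real AIPerf models (their attribute shapes are assumptions that mirror what the code above reads), and the dict-building logic is inlined from format_payload():

```python
from types import SimpleNamespace

# Stand-ins for RequestInfo / ModelEndpoint (hypothetical shapes mirroring
# the attributes format_payload() reads; not the real AIPerf classes).
turn = SimpleNamespace(
    texts=[SimpleNamespace(contents=["hello"])],
    model=None,
    max_tokens=64,
)
model_endpoint = SimpleNamespace(
    primary_model_name="echo-model",
    endpoint=SimpleNamespace(streaming=False),
)
request_info = SimpleNamespace(turns=[turn], model_endpoint=model_endpoint)

# Same logic as format_payload(), inlined for illustration.
last = request_info.turns[-1]
texts = [c for t in last.texts for c in t.contents if c]
payload = {
    "text": texts[0] if texts else "",
    "model": last.model or model_endpoint.primary_model_name,
    "max_tokens": last.max_tokens,
    "stream": model_endpoint.endpoint.streaming,
}
print(payload)
# {'text': 'hello', 'model': 'echo-model', 'max_tokens': 64, 'stream': False}
```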

Step 3: Create the Plugin Manifest

src/my_plugins/plugins.yaml

# yaml-language-server: $schema=https://raw.githubusercontent.com/ai-dynamo/aiperf/refs/heads/main/src/aiperf/plugin/schema/plugins.schema.json
schema_version: "1.0"

# Register your endpoint
# Note: Package metadata (name, version, author) comes from pyproject.toml,
# not from this file. AIPerf reads it via importlib.metadata.
endpoint:
  echo:
    class: my_plugins.endpoints.echo_endpoint:EchoEndpoint
    description: |
      Echo endpoint for testing. Sends text to an Echo API and receives it back.
      Useful for testing connectivity and basic benchmarking.
    metadata:
      endpoint_path: /echo
      supports_streaming: true
      produces_tokens: true
      tokenizes_input: true
      metrics_title: Echo Metrics

Step 4: Install Your Plugin

From your plugin directory, install into the same Python environment where AIPerf is installed. AIPerf discovers plugins via entry points, which only works when both packages share the same environment.

$pip install -e .

You should see:

Successfully installed my-aiperf-plugins-0.1.0

Important: If you use uv, virtual environments, or conda, make sure you activate the environment where AIPerf is installed before running pip install.
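
A quick way to confirm which environment a shell is using is to print the interpreter's prefix (a generic Python one-liner, not an AIPerf command); run it in the shell where you install the plugin and again where you run aiperf, and the two prefixes should match:

```shell
# Print the environment the current python resolves to.
python3 -c "import sys; print(sys.prefix)"
```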

Step 5: Verify Installation

Confirm both packages are installed in the same environment:

$pip show aiperf my-aiperf-plugins

You should see both packages listed in the same environment:

Name: aiperf
Version: 0.7.0
Location: ...
Requires: ...
Required-by: my-aiperf-plugins
---
Name: my-aiperf-plugins
Version: 0.1.0
Location: ...
Requires: aiperf
Required-by:

Check that AIPerf discovers your plugin:

$# List all plugins - your echo endpoint should appear
$aiperf plugins endpoint

You should see your plugin in the table:

Endpoint Types
┌──────┬──────────────────────────────────────────────────────────┐
│ Type │ Description                                              │
├──────┼──────────────────────────────────────────────────────────┤
│ chat │ OpenAI Chat Completions endpoint...                      │
│ ...  │ ...                                                      │
│ echo │ Echo endpoint for testing. Sends text to an Echo API...  │
└──────┴──────────────────────────────────────────────────────────┘
$# View details about your endpoint
$aiperf plugins endpoint echo

You should see:

╭──────────────────────────── endpoint:echo ─────────────────────────────╮
│ Type: echo                                                             │
│ Category: endpoint                                                     │
│ Package: my-plugins                                                    │
│ Class: my_plugins.endpoints.echo_endpoint:EchoEndpoint                 │
│                                                                        │
│ Echo endpoint for testing. Sends text to an Echo API and receives it   │
│ back. Useful for testing connectivity and basic benchmarking.          │
╰────────────────────────────────────────────────────────────────────────╯
$# Validate your plugin
$aiperf plugins --validate

You should see:

Validating plugins...
✓ Class paths
All checks passed

Step 6: Create a Test Server

To test your plugin end-to-end, create a minimal Echo API server. Save this as echo_server.py in your project root:

"""Minimal Echo API server for testing the EchoEndpoint plugin."""
from __future__ import annotations

import asyncio

import cyclopts
import orjson
import uvicorn
from fastapi import FastAPI
from fastapi.responses import ORJSONResponse, StreamingResponse

app = FastAPI()
cli = cyclopts.App()

@app.post("/echo", response_model=None)
async def echo(body: dict) -> ORJSONResponse | StreamingResponse:
    echo_text = f"[echo] {body.get('text', '')}"
    model = body.get("model", "echo-model")

    if not body.get("stream"):
        return ORJSONResponse({"echo": echo_text, "model": model})

    async def sse():
        for i, word in enumerate(echo_text.split()):
            chunk = orjson.dumps({"echo": word if i == 0 else f" {word}", "model": model})
            yield b"data: " + chunk + b"\n\n"
            await asyncio.sleep(0.02)
        yield b"data: [DONE]\n\n"

    return StreamingResponse(sse(), media_type="text/event-stream")


@cli.default
def main(host: str = "127.0.0.1", port: int = 8000) -> None:
    uvicorn.run(app, host=host, port=port)


if __name__ == "__main__":
    cli()
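
The streaming branch emits Server-Sent Events: each chunk is a `data: <json>` line terminated by a blank line, with `data: [DONE]` as the end-of-stream sentinel. A minimal client-side parse of such a stream, sketched over a captured byte string rather than a live connection:

```python
import json

# A captured SSE stream in the shape the server above produces.
raw = b'data: {"echo": "hello"}\n\ndata: {"echo": " world"}\n\ndata: [DONE]\n\n'

words = []
for event in raw.split(b"\n\n"):
    if not event.startswith(b"data: "):
        continue
    body = event[len(b"data: "):]
    if body == b"[DONE]":  # end-of-stream sentinel
        break
    words.append(json.loads(body)["echo"])

print("".join(words))  # hello world
```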

Start the server:

$pip install fastapi uvicorn orjson cyclopts
$python echo_server.py &

You should see:

INFO: Started server process
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)

Step 7: Use Your Plugin

With the test server running, use your endpoint with AIPerf:

$# Basic usage (endpoint_path: /echo from metadata is appended automatically)
$aiperf profile \
> --model echo-model \
> --url http://localhost:8000 \
> --endpoint-type echo \
> --tokenizer gpt2 \
> --synthetic-input-tokens-mean 100 \
> --request-count 10
$
$# With custom configuration
$aiperf profile \
> --model echo-model \
> --url http://localhost:8000 \
> --endpoint-type echo \
> --tokenizer gpt2 \
> --extra-inputs echo_prefix:"[ECHO] " \
> --synthetic-input-tokens-mean 100 \
> --concurrency 4 \
> --request-count 100

You should see:

NVIDIA AIPerf | Echo Metrics
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━┓
┃ Metric                           ┃       avg ┃    min ┃    max ┃    p99 ┃  std ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━┩
│ Request Latency (ms)             │      2.05 │   0.29 │  15.42 │  14.18 │ 4.47 │
│ Output Sequence Length (tokens)  │    104.00 │ 104.00 │ 104.00 │ 104.00 │ 0.00 │
│ Input Sequence Length (tokens)   │    100.00 │ 100.00 │ 100.00 │ 100.00 │ 0.00 │
│ Output Token Throughput          │ 40,850.61 │    N/A │    N/A │    N/A │  N/A │
│ (tokens/sec)                     │           │        │        │        │      │
│ Request Throughput               │    392.79 │    N/A │    N/A │    N/A │  N/A │
│ (requests/sec)                   │           │        │        │        │      │
│ Request Count (requests)         │     10.00 │    N/A │    N/A │    N/A │  N/A │
└──────────────────────────────────┴───────────┴────────┴────────┴────────┴──────┘

Step 8: Add Tests

tests/test_echo_endpoint.py

"""Tests for the Echo endpoint."""
import pytest
from my_plugins.endpoints.echo_endpoint import EchoEndpoint


class TestEchoEndpoint:
    def test_format_payload(self, mock_model_endpoint, mock_request_info):
        endpoint = EchoEndpoint(model_endpoint=mock_model_endpoint)
        payload = endpoint.format_payload(mock_request_info)
        assert "text" in payload and "model" in payload

    def test_parse_response(self, mock_model_endpoint, mock_response):
        endpoint = EchoEndpoint(model_endpoint=mock_model_endpoint)
        result = endpoint.parse_response(mock_response)
        assert result is not None and result.data.text

Fixtures: Create conftest.py with mock_model_endpoint, mock_request_info, and mock_response fixtures. See AIPerf’s test utilities for examples.
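
One possible conftest.py sketch builds those fixtures from unittest.mock stand-ins; the attribute shapes (turns, texts, contents, get_json(), perf_ns) mirror what echo_endpoint.py reads and are assumptions about the real AIPerf models, not their actual definitions:

```python
"""Sketch of tests/conftest.py; MagicMock stands in for AIPerf's models."""
from unittest.mock import MagicMock

import pytest


def make_model_endpoint():
    # Minimal stand-in exposing the attributes format_payload() uses.
    me = MagicMock()
    me.primary_model_name = "echo-model"
    me.endpoint.streaming = False
    return me


def make_request_info(model_endpoint):
    # One turn containing one text content, as format_payload() expects.
    text = MagicMock()
    text.contents = ["hello"]
    turn = MagicMock()
    turn.texts = [text]
    turn.model = None
    turn.max_tokens = 64
    info = MagicMock()
    info.turns = [turn]
    info.model_endpoint = model_endpoint
    return info


def make_response():
    # A non-streaming JSON body in the Echo API's shape.
    resp = MagicMock()
    resp.perf_ns = 123
    resp.get_json.return_value = {"echo": "[echo] hello"}
    return resp


@pytest.fixture
def mock_model_endpoint():
    return make_model_endpoint()


@pytest.fixture
def mock_request_info(mock_model_endpoint):
    return make_request_info(mock_model_endpoint)


@pytest.fixture
def mock_response():
    return make_response()
```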

Understanding the Code

Component Summary

Component         What It Does                                        You Provide
BaseEndpoint      Logging, auto_detect_and_extract(), config access   Inherit from it
format_payload()  Converts RequestInfo → API request                  Your API format
parse_response()  Converts API response → ParsedResponse              Your parsing logic

Data Flow

RequestInfo.turns[-1] → format_payload() → HTTP Request → Your API
ParsedResponse ← parse_response() ← HTTP Response ←────┘

Response Types

Type                   Use Case         Key Field
TextResponseData       LLM completions  text: str
EmbeddingResponseData  Embeddings       embeddings: list[list[float]]
RankingsResponseData   Reranking        rankings: list[dict[str, Any]]

Metadata Fields

Field               Required        Purpose
endpoint_path       Yes (nullable)  Default API path (e.g., /v1/chat/completions)
supports_streaming  Yes             SSE streaming support
produces_tokens     Yes             Enables token metrics
tokenizes_input     Yes             Enables input tokenization
metrics_title       Yes             Dashboard display name (nullable)

Next Steps

Goal                Action
Multiple endpoints  Add more entries under endpoint: in plugins.yaml
Other plugin types  Use the same pattern for timing_strategy, data_exporter, dataset_composer
Publish             python -m build && twine upload dist/* to PyPI

Troubleshooting

Plugin not found

TypeNotFoundError: Type 'echo' not found for category 'endpoint'.

Solutions:

  1. Ensure pip install -e . completed successfully
  2. Check the entry point in pyproject.toml matches your package structure
  3. Run aiperf plugins --validate to check for errors

Import errors

ImportError: Failed to import module for endpoint:echo from 'my_plugins.endpoints.echo_endpoint:EchoEndpoint'
Reason: ...
Tip: Check that the module is installed and importable

Solutions:

  1. Verify the class path format: module.path:ClassName
  2. Check all imports in your endpoint file work: python -c "from my_plugins.endpoints.echo_endpoint import EchoEndpoint"
  3. Ensure all dependencies are installed
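
The module.path:ClassName form can also be checked by hand: split on the colon and resolve with importlib. This is a sketch of how that convention is typically resolved, not AIPerf's actual loader:

```python
import importlib


def resolve_class_path(path: str):
    """Resolve a 'module.path:ClassName' string to the class it names."""
    module_name, _, class_name = path.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Works for any importable class, e.g. one from the standard library:
print(resolve_class_path("json:JSONDecoder"))  # <class 'json.decoder.JSONDecoder'>
```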

Response parsing fails

Solutions:

  1. Use -vv flag to see raw responses in debug logs
  2. Check that your parse_response handles your API’s actual response format
  3. Use auto_detect_and_extract() as a fallback for unknown formats

Reference