Copyright © 2026, NVIDIA Corporation.

AIPerf Plugin System

The AIPerf plugin system provides a flexible, extensible architecture for customizing benchmark behavior. It uses YAML-based configuration with lazy loading, priority-based conflict resolution, and dynamic enum generation.

Table of Contents

  • Overview
    • Terminology
    • Key Components
  • Architecture
  • Plugin Categories
  • Using Plugins
  • Creating Custom Plugins
  • Plugin Configuration
  • CLI Commands
  • Advanced Topics

Overview

The plugin system enables:

  • Extensibility: Add custom endpoints, exporters, and timing strategies without modifying core code
  • Lazy Loading: Classes load on first access, avoiding circular imports
  • Conflict Resolution: Higher priority plugins override lower priority ones
  • Type Safety: Auto-generated enums provide IDE autocomplete
  • Validation: Validate plugins without importing them
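
As a rough illustration of the lazy-loading idea above, the sketch below resolves a `module.path:ClassName` string only when the class is first requested and then caches it. `LazyEntry` is a hypothetical stand-in written for this example; it is not AIPerf's `PluginEntry`, and the real implementation may differ.

```python
import importlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LazyEntry:
    """Hypothetical stand-in for a registry entry (not AIPerf's PluginEntry)."""

    name: str
    class_path: str  # "module.path:ClassName"
    _cls: Optional[type] = field(default=None, repr=False)

    def load(self) -> type:
        # Import the module only on first access, then reuse the cached class.
        if self._cls is None:
            module_path, _, class_name = self.class_path.partition(":")
            module = importlib.import_module(module_path)
            self._cls = getattr(module, class_name)
        return self._cls


# A standard-library class is used here so the sketch runs without AIPerf.
entry = LazyEntry(name="decoder", class_path="json.decoder:JSONDecoder")
decoder_cls = entry.load()  # the first call performs the import
same_cls = entry.load()     # later calls return the cached class
```

Deferring the import this way means that creating an entry never pulls in the plugin's module, which is what keeps registration from triggering circular imports.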

Terminology

| Term | Description | Code Type |
| --- | --- | --- |
| Registry | Global singleton holding all plugins | _PluginRegistry |
| Package | Python package providing plugins | PackageInfo |
| Manifest | plugins.yaml declaring plugins | PluginsManifest |
| Category | Plugin type (e.g., endpoint, transport) | PluginType enum |
| Entry | Single registered plugin (name, class_path, priority, metadata) | PluginEntry |
| Class | Python class implementing a plugin (lazy-loaded) | type |
| Metadata | Typed configuration (e.g., EndpointMetadata) | Pydantic model |

Hierarchy:

    Registry (singleton)
    └── Package (1+) ─── discovered via entry points
        └── Manifest (1+ per package) ─── plugins.yaml files
            └── Category (1+)
                └── Entry (1+) ─── PluginEntry
                    ├── Class ─── lazy-loaded Python class
                    └── Metadata ─── optional typed config

Key Components

| Component | File | Purpose |
| --- | --- | --- |
| Plugin Registry | src/aiperf/plugin/plugins.py | Singleton managing discovery and loading |
| Plugin Entry | src/aiperf/plugin/types.py | Lazy-loading entry with metadata |
| Categories | src/aiperf/plugin/categories.yaml | Category definitions with protocols |
| Built-in Plugins | src/aiperf/plugin/plugins.yaml | Built-in plugin registrations |
| Schemas | src/aiperf/plugin/schema/schemas.py | Pydantic models for validation |
| Enums | src/aiperf/plugin/enums.py | Auto-generated enums from registry |
| CLI | src/aiperf/cli_commands/plugins.py | Plugin exploration commands |

Architecture

Discovery Flow

    Entry Points → plugins.yaml → Pydantic Validation → Registry
                                                            ↓
                                  get_class() → Import Module → Cache

| Phase | Action |
| --- | --- |
| 1. Discovery | Scan aiperf.plugins entry points for plugins.yaml files |
| 2. Loading | Parse YAML, validate with Pydantic, register with conflict resolution |
| 3. Access | get_class() imports module, caches class for reuse |

Registry Singleton Pattern

The plugin registry follows the singleton pattern with module-level exports:

    from aiperf.plugin import plugins
    from aiperf.plugin.enums import PluginType

    # Get a plugin class by name
    EndpointClass = plugins.get_class(PluginType.ENDPOINT, "chat")

    # Iterate all plugins in a category
    for entry, cls in plugins.iter_all(PluginType.ENDPOINT):
        print(f"{entry.name}: {entry.description}")
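
For readers unfamiliar with the pattern, here is a minimal sketch of a singleton exposed through module-level functions. Everything in it (`_Registry`, `EchoEndpoint`, the tuple-keyed dictionary) is hypothetical and far simpler than AIPerf's `_PluginRegistry`; it only shows why callers can write `plugins.get_class(...)` without ever instantiating a class.

```python
class _Registry:
    """Hypothetical miniature registry (not AIPerf's _PluginRegistry)."""

    def __init__(self) -> None:
        self._classes: dict[tuple[str, str], type] = {}

    def register(self, category: str, name: str, cls: type) -> None:
        self._classes[(category, name)] = cls

    def get_class(self, category: str, name: str) -> type:
        try:
            return self._classes[(category, name)]
        except KeyError:
            raise KeyError(f"Type {name!r} not found for category {category!r}") from None


# One shared instance, created when the module is first imported.
_registry = _Registry()

# Module-level exports: callers use these bound methods, not the class itself.
register = _registry.register
get_class = _registry.get_class


class EchoEndpoint:
    """Placeholder plugin class for the example."""


register("endpoint", "echo", EchoEndpoint)
found = get_class("endpoint", "echo")
```

Because Python caches imported modules, every importer shares the same `_registry` instance.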

Plugin Categories

AIPerf supports 33 plugin categories organized by function, including api_router and public_dataset_loader:

Timing Categories

| Category | Enum | Description |
| --- | --- | --- |
| timing_strategy | TimingMode | Request scheduling strategies (fixed schedule, request rate, user-centric) |
| arrival_pattern | ArrivalPattern | Inter-arrival time distributions (constant, Poisson, gamma, concurrency burst) |
| ramp | RampType | Value ramping strategies (linear, exponential, Poisson) |

Dataset Categories

| Category | Enum | Description |
| --- | --- | --- |
| dataset_backing_store | DatasetBackingStoreType | Server-side dataset storage |
| dataset_client_store | DatasetClientStoreType | Worker-side dataset access |
| dataset_sampler | DatasetSamplingStrategy | Sampling strategies (random, sequential, shuffle) |
| dataset_composer | ComposerType | Dataset generation (synthetic, custom, rankings) |
| custom_dataset_loader | CustomDatasetType | JSONL format loaders |
| public_dataset_loader | PublicDatasetType | Shared benchmark dataset fetchers (HTTP, HuggingFace) |

Endpoint and Transport Categories

| Category | Enum | Description |
| --- | --- | --- |
| api_router | APIRouterType | Lifecycle-managed HTTP/WebSocket routers exposed via BaseRouter |
| endpoint | EndpointType | API endpoint implementations (chat, completions, embeddings, etc.) |
| transport | TransportType | Network transport (HTTP via aiohttp) |

Processing Categories

| Category | Enum | Description |
| --- | --- | --- |
| record_processor | RecordProcessorType | Per-record metric computation |
| results_processor | ResultsProcessorType | Aggregated results computation |
| gpu_telemetry_processor | GPUTelemetryProcessorType | Side-channel GPU telemetry aggregation/export within GPUTelemetryManager |
| server_metrics_processor | ServerMetricsProcessorType | Side-channel Prometheus server metrics aggregation/export within ServerMetricsManager |
| data_exporter | DataExporterType | File format exporters (CSV, JSON, Parquet) |
| console_exporter | ConsoleExporterType | Terminal output exporters |

Accuracy Categories

| Category | Enum | Description |
| --- | --- | --- |
| accuracy_benchmark | AccuracyBenchmarkType | Accuracy benchmark problem sets (MMLU, AIME, HellaSwag, BigBench, etc.) |
| accuracy_grader | AccuracyGraderType | Grading strategies for accuracy evaluation (exact match, math, multiple choice, code execution) |

UI and Selection Categories

| Category | Enum | Description |
| --- | --- | --- |
| ui | UIType | UI implementations (dashboard, simple, none) |
| url_selection_strategy | URLSelectionStrategy | Request distribution (round-robin) |

Service Categories

| Category | Enum | Description |
| --- | --- | --- |
| service | ServiceType | Core AIPerf services |
| service_manager | ServiceRunType | Service orchestration. The built-in multiprocessing service-manager plugin is registered; Kubernetes execution is referenced by future-facing code paths but is not a registered service-manager plugin in this checkout. |

Visualization and Telemetry Categories

| Category | Enum | Description |
| --- | --- | --- |
| plot | PlotType | Chart types (scatter, histogram, timeline, etc.) |
| gpu_telemetry_collector | GPUTelemetryCollectorType | GPU metric collection (DCGM, pynvml) |

Infrastructure Categories (Internal)

| Category | Enum | Description |
| --- | --- | --- |
| communication | CommunicationBackend | ZMQ backends (IPC, TCP, dual-bind) |
| communication_client | CommClientType | Socket patterns (PUB, SUB, PUSH, PULL) |
| zmq_proxy | ZMQProxyType | Message routing proxies |

Sweep / Adaptive Search Categories

| Category | Enum | Description |
| --- | --- | --- |
| search_recipe | SearchRecipeType | Named presets that compile to AdaptiveSearchSweep or grid sweep parameters; selected via --search-recipe |
| search_recipe_post_process | SearchRecipePostProcessType | Stateless handlers emitting derived artifacts (curves, knee points) into sweep_aggregate/ after SweepAnalyzer.compute() |
| convergence_criterion | ConvergenceCriterionType | Decides when metrics have stabilized across repeated runs; selected via --convergence-mode |
| search_planner | SearchPlannerType | Drives the adaptive outer loop via ask()/tell(); selected via --search-planner |

Using Plugins

    from aiperf.plugin import plugins
    from aiperf.plugin.enums import PluginType, EndpointType

    # Get class by name, enum, or full path
    ChatEndpoint = plugins.get_class(PluginType.ENDPOINT, "chat")
    ChatEndpoint = plugins.get_class(PluginType.ENDPOINT, EndpointType.CHAT)
    ChatEndpoint = plugins.get_class(PluginType.ENDPOINT, "aiperf.endpoints.openai_chat:ChatEndpoint")

    # Iterate plugins
    for entry, cls in plugins.iter_all(PluginType.ENDPOINT):
        print(f"{entry.name}: {entry.class_path}")

    # Get metadata (raw dict or typed)
    metadata = plugins.get_metadata("endpoint", "chat")
    endpoint_meta = plugins.get_endpoint_metadata("chat")  # Returns EndpointMetadata

| Function | Returns | Use Case |
| --- | --- | --- |
| get_class(category, name) | type | Get plugin class |
| iter_all(category) | Iterator[tuple[PluginEntry, type]] | List all plugins |
| get_metadata(category, name) | dict | Raw metadata |
| get_endpoint_metadata(name) | EndpointMetadata | Typed endpoint config |
| get_transport_metadata(name) | TransportMetadata | Typed transport config |
| get_plot_metadata(name) | PlotMetadata | Typed plot config |
| get_service_metadata(name) | ServiceMetadata | Typed service config |
| get_gpu_telemetry_collector_metadata(name) | GPUTelemetryCollectorMetadata | Typed GPU collector config |

Creating Custom Plugins

Contributing directly to AIPerf? You only need two things:

  1. Add your class under src/aiperf/
  2. Register it in src/aiperf/plugin/plugins.yaml

The pyproject.toml entry points and separate package install below are only needed for external/third-party plugins.

Quick Start (4 steps):

| Step | File | Action |
| --- | --- | --- |
| 1 | my_endpoint.py | Create class extending BaseEndpoint |
| 2 | plugins.yaml | Register with class path, description, and metadata |
| 3 | pyproject.toml | Add entry point: my-package = "my_package:plugins.yaml" |
| 4 | Terminal | uv pip install -e . && aiperf plugins endpoint my_custom |

Minimal Endpoint Example

    # my_package/endpoints/custom_endpoint.py
    from typing import Any

    # BaseEndpoint, RequestInfo, InferenceServerResponse, ParsedResponse, and
    # TextResponseData come from AIPerf; their import lines are not shown here.


    class MyCustomEndpoint(BaseEndpoint):
        def format_payload(self, request_info: RequestInfo) -> dict[str, Any]:
            # Use the first non-empty text content of the most recent turn as the prompt.
            turn = request_info.turns[-1]
            texts = [content for text in turn.texts for content in text.contents if content]
            return {"prompt": texts[0] if texts else ""}

        def parse_response(self, response: InferenceServerResponse) -> ParsedResponse | None:
            # Return None when the response carries no JSON body.
            if json_obj := response.get_json():
                return ParsedResponse(perf_ns=response.perf_ns, data=TextResponseData(text=json_obj.get("text", "")))
            return None

Register the class in the package's plugins.yaml:

    # yaml-language-server: $schema=https://raw.githubusercontent.com/ai-dynamo/aiperf/refs/heads/main/src/aiperf/plugin/schema/plugins.schema.json
    # my_package/plugins.yaml
    schema_version: "1.0"
    endpoint:
      my_custom:
        class: my_package.endpoints.custom_endpoint:MyCustomEndpoint
        description: Custom endpoint for my API.
        metadata: { endpoint_path: /v1/generate, supports_streaming: true, produces_tokens: true, tokenizes_input: true, metrics_title: My Custom Metrics }

Extend base classes (BaseEndpoint, etc.) to get logging, helpers, and default implementations. Only implement core methods.

Plugin Configuration

categories.yaml Schema

Defines plugin categories with their protocols and metadata schemas:

    # yaml-language-server: $schema=https://raw.githubusercontent.com/ai-dynamo/aiperf/refs/heads/main/src/aiperf/plugin/schema/categories.schema.json
    schema_version: "1.0"

    endpoint:
      protocol: aiperf.endpoints.protocols:EndpointProtocol
      metadata_class: aiperf.plugin.schema.schemas:EndpointMetadata
      enum: EndpointType
      description: |
        Endpoints define how to format requests and parse responses for different APIs.
      internal: false  # Set to true for infrastructure categories

plugins.yaml Schema

Registers plugin implementations:

    # yaml-language-server: $schema=https://raw.githubusercontent.com/ai-dynamo/aiperf/refs/heads/main/src/aiperf/plugin/schema/plugins.schema.json
    schema_version: "1.0"

    endpoint:
      chat:
        class: aiperf.endpoints.openai_chat:ChatEndpoint
        description: OpenAI Chat Completions endpoint.
        priority: 0  # Higher priority wins conflicts
        metadata:
          endpoint_path: /v1/chat/completions
          supports_streaming: true
          produces_tokens: true
          tokenizes_input: true
          metrics_title: LLM Metrics

Metadata Schemas

Category-specific metadata is validated against Pydantic models in aiperf.plugin.schema.schemas:

| Model | Key Fields |
| --- | --- |
| EndpointMetadata | endpoint_path, supports_streaming, produces_tokens, tokenizes_input, metrics_title + optional streaming/service/multimodal/polling fields |
| TransportMetadata | transport_type, url_schemes |
| PlotMetadata | display_name, category |
| ServiceMetadata | required, auto_start, disable_gc, replicable |
| GPUTelemetryCollectorMetadata | is_local |

CLI Commands

| Command | Output |
| --- | --- |
| aiperf plugins | Installed packages with versions and plugin counts |
| aiperf plugins --all | All categories with registered plugins |
| aiperf plugins endpoint | All endpoint types with descriptions |
| aiperf plugins endpoint chat | Details: class path, package, metadata |
| aiperf plugins --validate | Validates class paths and existence |

    $ aiperf plugins endpoint chat
    ╭───────────────── endpoint:chat ─────────────────╮
    │ Type: chat                                      │
    │ Category: endpoint                              │
    │ Package: aiperf                                 │
    │ Class: aiperf.endpoints.openai_chat:ChatEndpoint│
    │                                                 │
    │ OpenAI Chat Completions endpoint. Supports      │
    │ multi-modal inputs and streaming responses.     │
    ╰─────────────────────────────────────────────────╯

Advanced Topics

Conflict Resolution

| Priority | Rule |
| --- | --- |
| 1 | Higher priority value wins |
| 2 | External packages beat built-in (equal priority) |
| 3 | First registered wins (with warning) |

Shadowed plugins remain accessible via full class path: plugins.get_class("endpoint", "my_pkg.endpoints:MyEndpoint")
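
The three rules can be read as a cascade of tie-breakers. The sketch below encodes them in that order. `Candidate` and `resolve` are hypothetical names invented for this illustration, and the `builtin` flag is an assumed way of marking built-in registrations; AIPerf's actual data structures may look different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical registration record used only for this sketch."""

    name: str
    class_path: str
    priority: int = 0
    builtin: bool = False


def resolve(existing: Candidate, incoming: Candidate) -> Candidate:
    """Pick the winner between an already-registered plugin and a new one."""
    # Rule 1: the higher priority value wins.
    if incoming.priority != existing.priority:
        return incoming if incoming.priority > existing.priority else existing
    # Rule 2: at equal priority, an external package beats a built-in.
    if incoming.builtin != existing.builtin:
        return existing if incoming.builtin else incoming
    # Rule 3: otherwise the first registered plugin stays (AIPerf also logs a warning).
    return existing


builtin_chat = Candidate("chat", "aiperf.endpoints.openai_chat:ChatEndpoint", builtin=True)
external_chat = Candidate("chat", "my_pkg.endpoints:MyEndpoint")
winner = resolve(builtin_chat, external_chat)  # external wins at equal priority
```

Raising the built-in entry's priority above the external one would flip the outcome, since rule 1 is checked before rule 2.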

API Reference

    from aiperf.plugin import plugins
    from aiperf.plugin.enums import PluginType

    # TestEndpoint and ChatEndpoint are assumed to be defined or imported elsewhere.

    # Runtime registration (testing)
    plugins.register("endpoint", "test", TestEndpoint, priority=10)
    plugins.reset_registry()  # Reset to initial state

    # Dynamic enum generation
    MyEndpointType = plugins.create_enum(PluginType.ENDPOINT, "MyEndpointType", module=__name__)

    # Validation without importing
    errors = plugins.validate_all(check_class=True)  # {category: [(name, error), ...]}

    # Reverse lookup
    name = plugins.find_registered_name(PluginType.ENDPOINT, ChatEndpoint)  # "chat"

    # Package metadata
    pkg = plugins.get_package_metadata("aiperf")  # PackageInfo(version, author, ...)

Type Safety: get_class() returns typed results (e.g., type[EndpointProtocol]) with IDE autocomplete.
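
To give a feel for dynamic enum generation, the sketch below builds a string-valued enum from a list of plugin names using the standard library's functional `Enum` API. `make_enum` is a hypothetical helper and the upper-casing of member names is an assumption; this is not the implementation of `plugins.create_enum`.

```python
from enum import Enum


def make_enum(enum_name: str, plugin_names: list[str]) -> type[Enum]:
    """Build a str-valued Enum with one member per plugin name."""
    members = {name.upper(): name for name in plugin_names}
    # The functional Enum API accepts a mapping of member name -> value.
    return Enum(enum_name, members, type=str)


EndpointKind = make_enum("EndpointKind", ["chat", "completions", "embeddings"])
chat_member = EndpointKind.CHAT                 # attribute access by member name
completions_member = EndpointKind("completions")  # lookup by registered name
```

Mixing in `str` lets a member be compared directly with the plain name it was built from.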

Built-in Plugins Reference

Endpoints

| Name | Class | Description |
| --- | --- | --- |
| chat | ChatEndpoint | OpenAI Chat Completions API |
| chat_embeddings | ChatEmbeddingsEndpoint | vLLM multimodal embeddings via chat API |
| completions | CompletionsEndpoint | OpenAI Completions API |
| cohere_rankings | CohereRankingsEndpoint | Cohere Reranking API |
| embeddings | EmbeddingsEndpoint | OpenAI Embeddings API |
| hf_tei_rankings | HFTeiRankingsEndpoint | HuggingFace TEI Rankings |
| huggingface_generate | HuggingFaceGenerateEndpoint | HuggingFace TGI |
| image_edit | ImageEditEndpoint | OpenAI Image Edit (image-to-image) API; multipart upload of reference image + prompt to /v1/images/edits. Compatible with SGLang FLUX.2 unified diffusion serving. |
| image_generation | ImageGenerationEndpoint | OpenAI Image Generation API |
| image_retrieval | ImageRetrievalEndpoint | Image retrieval API |
| nim_embeddings | NIMEmbeddingsEndpoint | NVIDIA NIM Embeddings |
| nim_rankings | NIMRankingsEndpoint | NVIDIA NIM Rankings |
| responses | ResponsesEndpoint | OpenAI Responses API |
| solido_rag | SolidoEndpoint | Solido RAG Pipeline |
| template | TemplateEndpoint | Template for custom endpoints |
| video_generation | VideoGenerationEndpoint | Text-to-video generation API |

Timing Strategies

| Name | Class | Description |
| --- | --- | --- |
| fixed_schedule | FixedScheduleStrategy | Send requests at exact timestamps |
| request_rate | RequestRateStrategy | Send requests at specified rate |
| user_centric_rate | UserCentricStrategy | Each session acts as separate user |

Arrival Patterns

| Name | Class | Description |
| --- | --- | --- |
| constant | ConstantIntervalGenerator | Fixed intervals between requests |
| poisson | PoissonIntervalGenerator | Poisson process arrivals |
| gamma | GammaIntervalGenerator | Gamma distribution with tunable smoothness |
| concurrency_burst | ConcurrencyBurstIntervalGenerator | Send ASAP up to concurrency limit |

Dataset Composers

| Name | Class | Description |
| --- | --- | --- |
| synthetic | SyntheticDatasetComposer | Generate synthetic conversations |
| custom | CustomDatasetComposer | Load from JSONL files |
| synthetic_rankings | SyntheticRankingsDatasetComposer | Generate ranking tasks |

UI Types

| Name | Class | Description |
| --- | --- | --- |
| dashboard | AIPerfDashboardUI | Rich terminal dashboard |
| simple | TQDMProgressUI | Simple tqdm progress bar |
| none | NoUI | Headless execution |

Accuracy Benchmarks

| Name | Class | Description |
| --- | --- | --- |
| mmlu | MMLUBenchmark | Massive Multitask Language Understanding |
| aime | AIMEBenchmark | American Invitational Mathematics Examination |
| aime24 | AIME24Benchmark | AIME 2024 competition problems |
| aime25 | AIME25Benchmark | AIME 2025 competition problems |
| hellaswag | HellaSwagBenchmark | HellaSwag commonsense reasoning |
| bigbench | BigBenchBenchmark | BIG-Bench benchmark tasks |
| math_500 | Math500Benchmark | MATH-500 problem set |
| gpqa_diamond | GPQADiamondBenchmark | GPQA Diamond graduate-level science |
| lcb_codegeneration | LCBCodeGenerationBenchmark | LiveCodeBench code generation |

Accuracy Graders

| Name | Class | Description |
| --- | --- | --- |
| exact_match | ExactMatchGrader | Exact string matching |
| math | MathGrader | Mathematical expression evaluation |
| multiple_choice | MultipleChoiceGrader | Multiple choice answer extraction |
| code_execution | CodeExecutionGrader | Code execution and output comparison |

Troubleshooting

Plugin Not Found

TypeNotFoundError: Type 'my_plugin' not found for category 'endpoint'.

Solutions:

  1. Verify the plugin is registered in plugins.yaml
  2. Check the entry point is defined in pyproject.toml
  3. Reinstall the package in the active environment: uv pip install -e .
  4. Run aiperf plugins --validate to check for errors

Module Import Errors

ImportError: Failed to import module for endpoint:my_plugin

Solutions:

  1. Verify the class path format: module.path:ClassName
  2. Check all dependencies are installed
  3. Verify the module is importable: python -c "import module.path"
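
When the module has import-time side effects, a static check can be a useful first step before trying `python -c "import module.path"`. The sketch below verifies the `module.path:ClassName` format, locates the module with `importlib.util.find_spec`, and looks for a top-level class definition by parsing the source. `check_class_path` is a hypothetical helper written for this page; it is not how `aiperf plugins --validate` is implemented.

```python
import ast
import importlib.util
from pathlib import Path
from typing import Optional


def check_class_path(class_path: str) -> Optional[str]:
    """Return an error message, or None if the class path looks valid."""
    module_path, sep, class_name = class_path.partition(":")
    if not sep or not module_path or not class_name:
        return f"expected 'module.path:ClassName', got {class_path!r}"
    try:
        # Locates the module; parent packages are imported, the module itself is not.
        spec = importlib.util.find_spec(module_path)
    except (ImportError, ValueError) as exc:
        return f"cannot locate module {module_path!r}: {exc}"
    if spec is None:
        return f"module {module_path!r} not found"
    if not spec.has_location or not spec.origin or not spec.origin.endswith(".py"):
        return f"module {module_path!r} has no Python source to inspect"
    # Parse the source instead of executing it.
    tree = ast.parse(Path(spec.origin).read_text(encoding="utf-8"))
    defined = {node.name for node in tree.body if isinstance(node, ast.ClassDef)}
    if class_name not in defined:
        return f"class {class_name!r} is not defined at the top level of {module_path!r}"
    return None


# Standard-library paths are used so the sketch runs without AIPerf installed.
for path in ("json.decoder:JSONDecoder", "json.decoder:Missing", "json.decoder.JSONDecoder"):
    print(path, "->", check_class_path(path) or "ok")
```

One limitation: a class that is only re-exported from another module (imported rather than defined in the file) is reported as missing, so treat a failure here as a hint to investigate rather than as proof.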

Class Not Found

AttributeError: Class 'MyClass' not found

Solutions:

  1. Verify the class name matches exactly (case-sensitive)
  2. Ensure the class is exported from the module
  3. Run aiperf plugins --validate for a detailed error message

Conflict Resolution Issues

If your plugin is being shadowed by another:

  1. Use higher priority: priority: 10 in plugins.yaml
  2. Access by full class path: plugins.get_class("endpoint", "my_pkg.endpoints:MyEndpoint")
  3. Check aiperf plugins to see which packages are loaded