AIPerf Code Patterns
Code examples for common development tasks. Referenced from CLAUDE.md.
CLI Command Pattern
Commands live in src/aiperf/cli_commands/, one file per command. They are
lazily loaded via import strings in aiperf.cli — modules are only imported
when their command is invoked:
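As a rough illustration of the lazy-loading idea, the sketch below resolves a command from an import string only when it is requested. The registry name LAZY_COMMANDS, the helper load_command, and the "module:attribute" string format are assumptions made for this example; standard-library targets stand in for real command modules, and this is not the actual aiperf.cli or cyclopts wiring.

```python
import importlib

# Hypothetical registry: command name -> "module:attribute" import string.
# Standard-library targets stand in for real command modules here.
LAZY_COMMANDS = {
    "dump-json": "json:dumps",
    "parse-csv": "csv:reader",
}


def load_command(name: str):
    """Import the module behind `name` only when the command is requested."""
    module_path, _, attr = LAZY_COMMANDS[name].partition(":")
    module = importlib.import_module(module_path)  # the import happens here
    return getattr(module, attr)


# Nothing is imported for "parse-csv" until someone asks for it.
dump_json = load_command("dump-json")
rendered = dump_json({"a": 1})
```

The point of the design is startup cost: registering a command costs one string, and the heavy import is paid only by the command that actually runs.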
Conventions:
- Export a single App named app.
- Hyphenate multi-word commands: App(name="analyze-trace").
- Keep module-level imports minimal; heavy deps go inside the function body.
- Heavy implementation logic lives in a cli.py inside the owning domain package (e.g. aiperf/plugin/cli.py), lazily imported at call time.
Adding a New CLI Flag
CLIConfig is a flat DTO — every CLI flag is a top-level field on
CLIConfig with an Annotated[...] annotation that carries Pydantic
metadata + the cyclopts CLI binding. Never add a new nested config
class. Disambiguate collisions with a section prefix
(e.g. image_batch_size vs audio_batch_size).
Pick a Groups.X from src/aiperf/config/cli_parameter.py:
ENDPOINT, INPUT, FIXED_SCHEDULE, GOODPUT, OUTPUT, HTTP_TRACE,
TOKENIZER, LOAD_GENERATOR, WARMUP, USER_CENTRIC,
REQUEST_CANCELLATION, CONVERSATION_INPUT, ISL, OSL, PROMPT,
PREFIX_PROMPT, RANKINGS, SYNTHESIS, AUDIO_INPUT, IMAGE_INPUT,
VIDEO_INPUT, SERVICE, SERVER_METRICS, GPU_TELEMETRY, UI,
WORKERS, ZMQ_COMMUNICATION, ACCURACY, MULTI_RUN.
If none fit, prefer adding a new Groups.X constant in
src/aiperf/config/cli_parameter.py over reusing an unrelated group.
Then:
- Add the attr name to the appropriate <SECTION>_FIELDS frozenset in src/aiperf/config/flags/_section_fields.py so the resolver/converter can scope cli.model_fields_set & <SECTION>_FIELDS queries.
- If the flag maps to an existing AIPerfConfig key, add an entry to that section’s field map (e.g. _ENDPOINT_FIELD_MAP in _converter_endpoint.py). Otherwise, read it directly in the relevant _converter_*.py builder.
- Run make generate-cli-docs to regenerate docs/cli-options.md. Run make generate-env-vars-docs if you also added a corresponding env var.
- Add a unit test under tests/unit/config/ that constructs CLIConfig(my_new_flag=...) and asserts the converter emits the right AIPerfConfig shape.
- The disjointness invariant in tests/unit/config/v1/test_section_fields.py will catch any cross-section name collision automatically.
CLI flag DTO charter (enforced):
- No validators on CLIConfig fields. BeforeValidator(parse_str_or_list) for CLI input coercion is fine; domain validation (range checks across fields, cross-field constraints) lives on AIPerfConfig, not CLIConfig.
- The CLI-to-envelope converter is the only module outside cli_commands/ that may read CLIConfig attributes.
Service Pattern
Services run in separate processes via bootstrap.py:
Register in plugins.yaml:
Config types:
CLIConfig: unified CLI input DTO carrying both benchmark params (endpoints, loadgen) and service-runtime knobs (ZMQ ports, logging level)
Model Pattern
Use AIPerfBaseModel for data, BaseConfig for configuration:
Message Pattern
Messages require message_type field and handler decorator:
Auto-subscription happens during @on_init phase.
Plugin System Pattern
YAML-based registry with lazy-loading:
Local GPU telemetry collectors declare themselves via is_local. Each collector class implements validate_environment() to surface missing native bindings before the benchmark starts; DCGM is a passthrough no-op.
Error Handling Pattern
Log errors and publish ErrorDetails in messages:
Logging Pattern
Use lambda for expensive log messages:
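A minimal standard-library sketch of why a lambda helps: the callable is invoked only when the level is enabled, so a disabled debug line costs nothing. The helper log_lazy and the logger name are hypothetical; AIPerf's own logger may expose this differently.

```python
import logging
from typing import Callable, Union

Message = Union[str, Callable[[], str]]


def log_lazy(logger: logging.Logger, level: int, msg: Message) -> bool:
    """Log `msg`, calling it first if it is callable, only when enabled.

    Returns True when the message was handed to the logger.
    """
    if not logger.isEnabledFor(level):
        return False  # the callable is never invoked, so no work is done
    logger.log(level, msg() if callable(msg) else msg)
    return True


calls = []


def expensive_summary() -> str:
    calls.append(1)  # records that the expensive path actually ran
    return "records=" + ",".join(str(i) for i in range(5))


logger = logging.getLogger("aiperf.sketch")  # hypothetical logger name
logger.setLevel(logging.INFO)

log_lazy(logger, logging.DEBUG, lambda: expensive_summary())  # skipped
log_lazy(logger, logging.INFO, lambda: expensive_summary())  # evaluated
```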
NaN/Inf Discipline Pattern
NaN/+inf/-inf in metric data corrupts downstream artifacts in three ways:
orjson.dumps (and Pydantic model_dump_json) silently coerce them to JSON
null, which is indistinguishable from “metric was missing”; CSV writers
emit literal "nan"/"inf" strings that pandas/duckdb parse
inconsistently; and np.mean/np.std/polyfit poison downstream decision
logic (Pareto fronts, BO acquisition maxima, plateau detectors) without
raising.
The aiperf.common.finite module centralizes the discipline as four
primitives. Use them at every numeric boundary.
FiniteFloat for Pydantic metric fields
The AfterValidator rejects NaN/+inf/-inf at config-load and
model_validate time with a debuggable message. For
finite-or-explicitly-missing semantics, use FiniteFloat | None — the
validator only fires when a non-None value is provided.
scrub_non_finite before every JSON exporter
scrub_non_finite recursively walks dict/list/tuple containers and
rewrites non-finite numeric values to None. It leaves str/bytes/bool
alone and handles numpy scalar types correctly (numpy.float32,
numpy.float64).
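The behaviour described above can be sketched with the standard library alone. The function name scrub_non_finite_sketch is hypothetical and this is not the aiperf.common.finite implementation; numpy scalars are not exercised here.

```python
import math
from numbers import Real
from typing import Any


def scrub_non_finite_sketch(value: Any) -> Any:
    """Return a copy of `value` with NaN/+inf/-inf replaced by None."""
    if isinstance(value, (str, bytes, bool)):
        return value  # left untouched, matching the description above
    if isinstance(value, dict):
        return {k: scrub_non_finite_sketch(v) for k, v in value.items()}
    if isinstance(value, list):
        return [scrub_non_finite_sketch(v) for v in value]
    if isinstance(value, tuple):
        return tuple(scrub_non_finite_sketch(v) for v in value)
    if isinstance(value, int):
        return value  # Python ints are always finite
    if isinstance(value, Real):
        return value if math.isfinite(value) else None
    return value  # anything else passes through unchanged


payload = {"p50": float("nan"), "samples": [1.5, float("inf")], "tags": (True, "nan", -math.inf)}
clean = scrub_non_finite_sketch(payload)
```

Note that the string "nan" survives: only numeric values are rewritten, so labels and identifiers are never altered.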
is_finite_value for the canonical finiteness check
Use is_finite_value instead of math.isfinite or not math.isnan:
isinstance(x, float) misses numpy scalar types on some numpy versions,
and math.isfinite raises on non-numeric inputs.
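One way such a check could look, as a sketch under stated assumptions: the name is_finite_value_sketch is hypothetical, and treating every non-numeric input as "not finite" is a choice made for this example, not a statement about the real helper.

```python
import math
from numbers import Real
from typing import Any


def is_finite_value_sketch(x: Any) -> bool:
    """True only for real numbers that are neither NaN nor infinite."""
    if not isinstance(x, Real):
        return False  # strings, None, containers: no exception, just False
    try:
        return math.isfinite(x)
    except OverflowError:
        return True  # an int too large for float is still a finite number


checks = [is_finite_value_sketch(v) for v in (1.0, float("nan"), float("inf"), "abc", None)]
```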
nan_safe_mean / nan_safe_std for aggregation
Both functions return None (not NaN) when the input has too few finite
values, so callers can distinguish “no data” from “data averaged to NaN”.
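The None-instead-of-NaN contract can be illustrated as follows. The function names are hypothetical, and using the sample standard deviation (which needs at least two finite points) is an assumption of this sketch rather than a documented property of the real helpers.

```python
import math
import statistics
from typing import Iterable, List, Optional


def _finite_only(values: Iterable[float]) -> List[float]:
    """Keep only int/float entries that are neither NaN nor infinite."""
    return [v for v in values if isinstance(v, (int, float)) and math.isfinite(v)]


def nan_safe_mean_sketch(values: Iterable[float]) -> Optional[float]:
    finite = _finite_only(values)
    return statistics.fmean(finite) if finite else None


def nan_safe_std_sketch(values: Iterable[float]) -> Optional[float]:
    finite = _finite_only(values)
    # Assumption: sample standard deviation, so fewer than two points -> None.
    return statistics.stdev(finite) if len(finite) >= 2 else None


mean_ok = nan_safe_mean_sketch([1.0, float("nan"), 3.0])
mean_empty = nan_safe_mean_sketch([float("nan")])
std_short = nan_safe_std_sketch([1.0, float("inf")])
```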
Don’t: the bug pattern these primitives prevent
Mechanical CI invariants in tests/unit/property/test_finite_invariants.py
reject all three patterns for new code; see
/aiperf/architecture-internals/global-property-test-invariants for the full contract and
the baseline-ratchet mechanism.
Safe Filesystem Reads Pattern
User-supplied filesystem paths reaching AIPerf (e.g. --extra-inputs payload_template=<path>, endpoint.template.body in a YAML config) must
go through aiperf.common.path_safety.safe_read_template_path rather than
inline Path(...).read_text() / open(...).read(). The helper is the
canonical CWE-22 path-traversal sanitizer recognized by SAST tools — every
inline read regenerates that finding.
What the helper does
Sanitizer chain (in the order SAST engines walk it):
- Path(ts).expanduser(): catches TypeError/ValueError/RuntimeError (the last fires on unresolvable ~user prefixes).
- Reject if path or any component in path.parents is a symlink. resolve() alone is insufficient because it follows symlinked parent directories silently.
- path.resolve(strict=True): the canonical sanitizer that Snyk/CodeQL/Semgrep recognize; raises on missing paths.
- Require resolved.is_file(): rejects directories, devices, fifos.
- read_text(encoding="utf-8"): explicit decode; no platform default. Catches UnicodeError alongside OSError so non-UTF-8 files fall back to the literal-string branch rather than crashing config conversion.
Returning None on any failure preserves the existing “treat as a literal
value” fallback that both call sites (_converter_endpoint and
TemplateEndpoint.__init__) already implement.
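The chain above can be sketched with pathlib as shown below. The function name safe_read_text_sketch is hypothetical and the code is an approximation of the described steps, not the body of safe_read_template_path.

```python
import tempfile
from pathlib import Path
from typing import Optional


def safe_read_text_sketch(ts: str) -> Optional[str]:
    """Return the file's UTF-8 text, or None if any check fails."""
    try:
        path = Path(ts).expanduser()
    except (TypeError, ValueError, RuntimeError):
        return None
    try:
        # Reject a symlink at the path itself or at any parent component.
        if any(p.is_symlink() for p in (path, *path.parents)):
            return None
        resolved = path.resolve(strict=True)  # raises if the path is missing
        if not resolved.is_file():
            return None  # directories, devices, fifos
        return resolved.read_text(encoding="utf-8")
    except (OSError, UnicodeError, ValueError):
        return None


# Usage: a real file is read; a missing file falls back to None.
base = Path(tempfile.mkdtemp()).resolve()  # resolve so no parent is a symlink
template = base / "body.txt"
template.write_text("hello", encoding="utf-8")
text = safe_read_text_sketch(str(template))
missing = safe_read_text_sketch(str(base / "missing.txt"))
```

A caller that wants the literal-value fallback can write `safe_read_text_sketch(value) or value`; a caller that wants a hard failure checks for None and raises.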
When this pattern does NOT apply
- Path joining of trusted strings: Path(__file__).parent / "data.yaml", artifact_dir / "inputs.json". These never resolve untrusted input; no sanitizer needed.
- Binary reads: open(p, "rb") for parquet/orjson/etc. The helper is UTF-8-text only. If a hardened binary variant is needed, add it to aiperf.common.path_safety alongside the existing helper rather than inlining read_bytes().
- Reads where missing-file should hard-fail rather than fall back: the helper still works (returns None); the caller is responsible for raising instead of substituting a literal.
Testing Pattern
Auto-fixtures (always active): asyncio.sleep runs instantly, the RNG is seeded with 42, and singletons are reset.
Console Exporter Pattern
Console exporters subclass ConsoleMetricsExporter and configure rendering via class attributes — no method overrides required for the common case. The base class handles filtering, grouping, table construction, and printing; subclasses just declare what to show and when to run.
Override _check_enabled(self, exporter_config) to raise ConsoleExporterDisabled when the exporter shouldn’t run (env var, user-config flag, dev mode). The base class no-ops (always-enabled). The flag-driven sibling exporters (ConsoleInternalMetricsExporter, ConsoleExperimentalMetricsExporter, HttpTraceConsoleExporter) follow this pattern verbatim — copy one of them as a starting point.
Uncertainty Plot Pattern
The latency-throughput uncertainty plot uses a one-data-contract, three-renderers architecture.
Data Contract
Multi-Series Data Contract
Plotly Renderer (interactive + Kaleido PNG)
Matplotlib Renderer (code-gen reports)
Ellipse Geometry Utility
Plot Envelope (plot:)
AIPerfConfig accepts an optional top-level plot: key that fully describes
which plots are rendered after the run. Two forms are supported:
When plot: is set, ~/.aiperf/plot_config.yaml is ignored and
artifacts.auto_plot flips to True unless explicitly false. The auto-plot
callback writes the resolved envelope to <artifact_dir>/.aiperf-plot-config.yaml
as a reproducibility receipt, so aiperf plot <run> later picks it up
automatically without needing the original AIPerf YAML. Pydantic models live in
src/aiperf/config/plot.py.
Validator Pattern
Per-feature load-time validators (e.g. BranchOrchestrator v1) run from the
end of dataset loaders. Unsupported constructs raise NotImplementedError
with a <loc>: <reason> prefix where <loc> identifies the offending
conversation/turn so misconfigurations surface before any credit is issued:
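A minimal sketch of the error-message convention, with entirely hypothetical names: TurnSketch, its branches field, and the rule being enforced are invented for illustration and do not describe the real BranchOrchestrator validator.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TurnSketch:
    """Hypothetical stand-in for a dataset turn."""
    text: str
    branches: List[str] = field(default_factory=list)


def validate_conversations_sketch(conversations: Dict[str, List[TurnSketch]]) -> None:
    """Raise on the first unsupported construct, prefixed with its location."""
    for conv_id, turns in conversations.items():
        for idx, turn in enumerate(turns):
            if len(turn.branches) > 1:  # hypothetical unsupported construct
                loc = f"conversation {conv_id!r} turn {idx}"
                raise NotImplementedError(
                    f"{loc}: more than one branch per turn is not supported"
                )


dataset = {"c1": [TurnSketch("hi"), TurnSketch("fan out", branches=["a", "b"])]}
try:
    validate_conversations_sketch(dataset)
    message = ""
except NotImplementedError as exc:
    message = str(exc)
```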
Endpoint Mixin Pattern
Reusable response-parsing behavior lives in mixins applied to endpoint classes:
The mixin in src/aiperf/endpoints/response_mixin.py compiles an optional
endpoint.extra.response_field JMESPath query at construction time, with
auto-detect fallback when the query fails or no JSON body is present.
Per-turn dataset extra
Custom dataset rows use extra for non-native request-body fields. Loaders map that user-facing field into internal Turn.extra_body. Every endpoint formatter that builds a JSON request body shallow-merges Turn.extra_body into the wire body at the very end of payload construction, AFTER model_endpoint.endpoint.extra. The merge is shallow dict.update; user-provided keys win on collision.
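The merge order described above can be illustrated with plain dicts. The function build_body_sketch and the sample keys are hypothetical; only the ordering (endpoint extras first, per-turn extra_body last, both via shallow update) comes from the text.

```python
from typing import Any, Dict, Optional


def build_body_sketch(
    base: Dict[str, Any],
    endpoint_extra: Optional[Dict[str, Any]],
    turn_extra_body: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Build a wire body: base fields, then endpoint extras, then per-turn extras."""
    body = dict(base)
    body.update(endpoint_extra or {})
    body.update(turn_extra_body or {})  # applied last, so per-turn keys win
    return body


body = build_body_sketch(
    {"model": "m", "messages": []},
    {"temperature": 0.2, "sampling": {"top_k": 5, "seed": 1}},
    {"sampling": {"top_k": 40}},
)
```

Because the merge is shallow, the per-turn "sampling" dict replaces the endpoint-level one wholesale: "seed" is gone from the result rather than merged in.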
Rules new formatters and loaders must follow:
- Dispatch-turn scoping. Endpoint formatters read turn.extra_body, turn.max_tokens, and turn.model from request_info.turns[-1] only. Parent turns earlier in the conversation history must never leak these request-control fields into a child payload, so DAG/FORK children stay clean of parent vendor knobs, limits, or model overrides.
- Tools-as-system-prompt. Only raw_tools walks request_info.turns from the end via BaseEndpoint._latest_turn_attr. Tool definitions behave like a system prompt and persist across a multi-turn or FORK conversation when the dispatching turn does not redeclare them.
- Dataset user-facing field is extra. Custom dataset row schemas (SingleTurn, inner MultiTurn turns, MooncakeTrace, DagTurn) declare a per-turn extra: dict[str, Any] | None. Loaders translate row.extra into Turn.extra_body at construction time. DagTurn uses Pydantic’s extra="forbid" so a typo’d extra_body is rejected at load time; the other dataset schemas are extra="allow" so an unrecognized extra_body is silently ignored, so author the supported field instead.
Coverage:
- Chat-style formatters with full history flattening (openai_chat, chat_embeddings via inheritance, openai_responses).
- Single-turn formatters (openai_completions, openai_embeddings and nim_embeddings, openai_image_generation, openai_video_generation, openai_image_edit, nim_image_retrieval, huggingface_generate, solido_rag, the rankings family via BaseRankingsEndpoint, and template_endpoint).
huggingface_generate deliberately merges extra_body at the TOP level of the wire body (not nested under parameters).
openai_image_edit filters reserved keys (prompt, image, url, mask) out of both endpoint extras and extra_body to protect the multipart upload contract.
raw_endpoint intentionally skips this merge — it ships the user-authored Turn.raw_payload verbatim.
Strategy Protocol Pattern
The OTel results processor uses a strategy protocol to dispatch incoming data to specialised handlers. Each strategy declares what data it supports and processes matching records independently:
Concrete strategies accept a context object at construction time and implement the two-method interface:
The processor iterates registered strategies on each incoming record:
Conventions:
- One strategy class per file under post_processors/strategies/.
- supports() uses isinstance checks; no dynamic dispatch tables.
- OTelStrategyContextProtocol exposes instrument factories (get_or_create_histogram, etc.) so strategies never construct OTel instruments directly.
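The shape of this pattern can be sketched with typing.Protocol. Only supports() is named in the text; the second method name (process), the record classes, and the context object below are assumptions for illustration, and the context here merely records observations instead of creating OTel instruments.

```python
from dataclasses import dataclass
from typing import Any, List, Protocol, Tuple


class StrategySketch(Protocol):
    def supports(self, record: Any) -> bool: ...
    def process(self, record: Any) -> None: ...  # hypothetical method name


@dataclass
class LatencyRecord:  # hypothetical record type
    value_ms: float


@dataclass
class ErrorRecord:  # hypothetical record type
    message: str


class ContextSketch:
    """Stand-in for the context object; it just collects observations."""

    def __init__(self) -> None:
        self.observed: List[Tuple[str, Any]] = []


class LatencyStrategy:
    def __init__(self, context: ContextSketch) -> None:
        self._context = context

    def supports(self, record: Any) -> bool:
        return isinstance(record, LatencyRecord)

    def process(self, record: LatencyRecord) -> None:
        self._context.observed.append(("latency_ms", record.value_ms))


class ErrorStrategy:
    def __init__(self, context: ContextSketch) -> None:
        self._context = context

    def supports(self, record: Any) -> bool:
        return isinstance(record, ErrorRecord)

    def process(self, record: ErrorRecord) -> None:
        self._context.observed.append(("error", record.message))


def dispatch(strategies: List[StrategySketch], record: Any) -> None:
    """Hand the record to every strategy that declares support for it."""
    for strategy in strategies:
        if strategy.supports(record):
            strategy.process(record)


context = ContextSketch()
strategies: List[StrategySketch] = [LatencyStrategy(context), ErrorStrategy(context)]
for rec in (LatencyRecord(12.5), ErrorRecord("timeout"), "unsupported"):
    dispatch(strategies, rec)
```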
Drop-Oldest Fanout Queue
OTelMetricsResultsProcessor fans out metric events to a dedicated child
process via a bounded multiprocessing.Queue. The queue uses drop-oldest
semantics so the hot path (the main benchmark loop) is never blocked by a slow
downstream consumer.
Queue sizing:
Backpressure algorithm:
- Attempt queue.put_nowait(event).
- On queue.Full, call queue.get_nowait() to discard the oldest event.
- Retry queue.put_nowait(event) once.
- If the retry also fails, increment _fanout_dropped_events and log at thresholds (1, 100, 1 000 drops).
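The steps above can be sketched as follows. For a deterministic, single-process example this uses the thread-safe queue.Queue instead of multiprocessing.Queue; the class name DropOldestFanout and its methods are hypothetical and not the processor's real interface.

```python
import logging
import queue
from typing import Any, List

logger = logging.getLogger("fanout.sketch")  # hypothetical logger name


class DropOldestFanout:
    """Bounded queue whose producer never blocks: on overflow, evict the oldest."""

    def __init__(self, maxsize: int) -> None:
        self._queue: "queue.Queue[Any]" = queue.Queue(maxsize=maxsize)
        self.dropped_events = 0

    def publish(self, event: Any) -> bool:
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            pass
        try:
            self._queue.get_nowait()  # discard the oldest event
        except queue.Empty:
            pass  # a consumer drained the queue in the meantime
        try:
            self._queue.put_nowait(event)  # single retry
            return True
        except queue.Full:
            self.dropped_events += 1
            if self.dropped_events in (1, 100, 1000):
                logger.warning("fanout dropped %d events", self.dropped_events)
            return False

    def drain(self) -> List[Any]:
        """Remove and return everything currently buffered, oldest first."""
        items: List[Any] = []
        while True:
            try:
                items.append(self._queue.get_nowait())
            except queue.Empty:
                return items


fanout = DropOldestFanout(maxsize=2)
for event in (1, 2, 3):
    fanout.publish(event)
remaining = fanout.drain()
```

In this single-producer run the retry always succeeds, so the drop counter stays at zero even though event 1 was evicted; following the algorithm as listed, the counter only moves when the retry itself fails.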
Design rationale:
- The benchmark hot path must never block on telemetry I/O.
- Dropping the oldest event (rather than the newest) preserves the most recent state, which is more useful for live dashboards.
- The counter _fanout_dropped_events is reported at shutdown so operators can tune AIPERF_OTEL_MAX_BUFFERED_RECORDS if drops are frequent.