Code examples for common development tasks. Referenced from CLAUDE.md.
Commands live in src/aiperf/cli_commands/, one file per command. They are
lazily loaded via import strings in aiperf.cli — modules are only imported
when their command is invoked:
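A minimal, stdlib-only sketch of the lazy-loading idea. The registry contents and the `load_command` helper are hypothetical and are not the actual wiring in aiperf.cli; standard-library modules stand in for command modules so the snippet runs anywhere.

```python
import importlib

# Hypothetical registry: command name -> "module.path:attribute" import string.
# Standard-library targets stand in for real command modules.
LAZY_COMMANDS = {
    "dumps": "json:dumps",
    "sqrt": "math:sqrt",
}


def load_command(name: str):
    """Import the owning module only when the command is requested."""
    module_path, _, attr = LAZY_COMMANDS[name].partition(":")
    module = importlib.import_module(module_path)  # the import happens here
    return getattr(module, attr)


sqrt = load_command("sqrt")
print(sqrt(9.0))
```

Registering strings instead of imported objects keeps startup cheap, because a module's import cost is paid only by the command that needs it.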
Conventions:
App named app.
App(name="analyze-trace").
cli.py inside the owning domain
package (e.g. aiperf/plugin/cli.py), lazily imported at call time.
CLIConfig is a flat DTO: every CLI flag is a top-level field on
CLIConfig with an Annotated[...] annotation that carries Pydantic
metadata plus the cyclopts CLI binding. Never add a new nested config
class. Disambiguate collisions with a section prefix
(e.g. image_batch_size vs audio_batch_size).
Pick a Groups.X from src/aiperf/config/cli_parameter.py:
ENDPOINT, INPUT, FIXED_SCHEDULE, GOODPUT, OUTPUT, HTTP_TRACE,
TOKENIZER, LOAD_GENERATOR, WARMUP, USER_CENTRIC,
REQUEST_CANCELLATION, CONVERSATION_INPUT, ISL, OSL, PROMPT,
PREFIX_PROMPT, RANKINGS, SYNTHESIS, AUDIO_INPUT, IMAGE_INPUT,
VIDEO_INPUT, SERVICE, SERVER_METRICS, GPU_TELEMETRY, UI,
WORKERS, ZMQ_COMMUNICATION, ACCURACY, MULTI_RUN.
If none fit, prefer adding a new Groups.X constant in
src/aiperf/config/cli_parameter.py over reusing an unrelated group.
Then:
Add the field name to the <SECTION>_FIELDS frozenset in
src/aiperf/config/flags/_section_fields.py so the resolver/converter
can scope cli.model_fields_set & <SECTION>_FIELDS queries.
If the flag maps to an AIPerfConfig key, add an entry to that section’s
field map (e.g. _ENDPOINT_FIELD_MAP in _converter_endpoint.py).
Otherwise, read it directly in the relevant _converter_*.py builder.
Run make generate-cli-docs to regenerate docs/cli-options.md. Run
make generate-env-vars-docs if you also added a corresponding env var.
Add a test under tests/unit/config/ constructing
CLIConfig(my_new_flag=...) and asserting the converter emits the
right AIPerfConfig shape.
tests/unit/config/v1/test_section_fields.py will catch any
cross-section name collision automatically.
CLI flag DTO charter (enforced):
BeforeValidator(parse_str_or_list) for
CLI input coercion is fine; domain validation (range checks across fields,
cross-field constraints) lives on AIPerfConfig, not CLIConfig.
cli_commands/ that may
read CLIConfig attributes.
Services run in separate processes via bootstrap.py:
Register in plugins.yaml:
Config types:
CLIConfig: unified CLI input DTO carrying both benchmark params (endpoints, loadgen) and service-runtime knobs (ZMQ ports, logging level)
Use AIPerfBaseModel for data, BaseConfig for configuration:
Messages require message_type field and handler decorator:
Auto-subscription happens during @on_init phase.
YAML-based registry with lazy-loading:
Local GPU telemetry collectors declare themselves via is_local. Each collector class implements validate_environment() to surface missing native bindings before the benchmark starts; DCGM is a passthrough no-op.
Log errors and publish ErrorDetails in messages:
Use lambda for expensive log messages:
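With the standard logging module the same idea can be approximated by a small helper. This is a sketch only: `debug_lazy` and `expensive_summary` are hypothetical names and do not describe the AIPerf logger API.

```python
import logging

logger = logging.getLogger("lazy_log_sketch")
logger.setLevel(logging.INFO)

calls = []


def expensive_summary() -> str:
    """Stand-in for a costly formatting step; records each invocation."""
    calls.append(1)
    return "summary"


def debug_lazy(log: logging.Logger, make_message) -> None:
    """Call make_message() only if DEBUG is enabled for this logger."""
    if log.isEnabledFor(logging.DEBUG):
        log.debug(make_message())


# At INFO level the lambda is never invoked, so the expensive work is skipped.
debug_lazy(logger, lambda: f"state: {expensive_summary()}")
```

The message is built only when it will actually be emitted, which matters on hot paths where debug logging is usually off.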
NaN/+inf/-inf in metric data corrupts downstream artifacts in three ways:
orjson.dumps (and Pydantic model_dump_json) silently coerce them to JSON
null, which is indistinguishable from “metric was missing”; CSV writers
emit literal "nan"/"inf" strings that pandas/duckdb parse
inconsistently; and np.mean/np.std/polyfit poison downstream decision
logic (Pareto fronts, BO acquisition maxima, plateau detectors) without
raising.
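Two of these failure modes can be reproduced with the standard library alone. The sketch below does not exercise orjson or numpy; it only shows that the csv module writes non-finite floats as literal text and that a plain arithmetic mean propagates NaN without raising.

```python
import csv
import io
import math

values = [1.0, float("nan"), float("inf")]

# csv.writer stringifies floats, so non-finite values become literal text.
buffer = io.StringIO()
csv.writer(buffer).writerow(values)
csv_line = buffer.getvalue().strip()
print(csv_line)

# A plain arithmetic mean silently propagates NaN instead of raising.
mean = sum([1.0, float("nan")]) / 2
print(math.isnan(mean))
```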
The aiperf.common.finite module centralizes the discipline as four
primitives. Use them at every numeric boundary.
FiniteFloat for Pydantic metric fields
The AfterValidator rejects NaN/+inf/-inf at config-load and
model_validate time with a debuggable message. For
finite-or-explicitly-missing semantics, use FiniteFloat | None — the
validator only fires when a non-None value is provided.
scrub_non_finite before every JSON exporter
scrub_non_finite recursively walks dict/list/tuple containers and
rewrites non-finite numeric values to None. It leaves str/bytes/bool
alone and handles numpy scalar types correctly (numpy.float32,
numpy.float64).
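The recursive walk can be pictured with a rough stdlib-only approximation. `scrub_sketch` is a hypothetical stand-in, not the implementation in aiperf.common.finite; in particular it only recognizes Python floats and does not cover numpy scalar types such as numpy.float32.

```python
import math
from typing import Any


def scrub_sketch(value: Any) -> Any:
    """Recursively replace non-finite floats with None (illustrative only)."""
    if isinstance(value, dict):
        return {key: scrub_sketch(item) for key, item in value.items()}
    if isinstance(value, list):
        return [scrub_sketch(item) for item in value]
    if isinstance(value, tuple):
        return tuple(scrub_sketch(item) for item in value)
    # bool, str and bytes are not floats, so they pass through untouched.
    if isinstance(value, float) and not math.isfinite(value):
        return None
    return value


payload = {"latency": [1.0, float("nan")], "meta": (float("inf"), "inf", True)}
print(scrub_sketch(payload))
```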
is_finite_value for the canonical finiteness check
Use is_finite_value instead of math.isfinite or not math.isnan:
isinstance(x, float) misses numpy scalar types on some numpy versions,
and math.isfinite raises on non-numeric inputs.
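A total (never-raising) finiteness check might look like the sketch below. `is_finite_sketch` is hypothetical, and its treatment of bool as non-numeric is an assumption rather than documented behaviour of is_finite_value.

```python
import math
from numbers import Real


def is_finite_sketch(value: object) -> bool:
    """Return True only for real numbers that are neither NaN nor infinite."""
    # Assumption: bools are treated as non-numeric in this sketch.
    if isinstance(value, bool) or not isinstance(value, Real):
        return False
    if isinstance(value, int):
        return True  # ints are always finite; also avoids float overflow
    return math.isfinite(value)


print(is_finite_sketch(1.5), is_finite_sketch(float("nan")), is_finite_sketch("1.0"))
```

Checking against numbers.Real rather than float is what lets numeric types that register with that ABC pass the type gate, while strings and None return False instead of raising.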
nan_safe_mean / nan_safe_std for aggregation
Both functions return None (not NaN) when the input has too few finite
values, so callers can distinguish “no data” from “data averaged to NaN”.
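The "None instead of NaN" contract can be sketched with the statistics module. Both function names are hypothetical, and the choice of sample standard deviation (which needs at least two finite points) is an assumption, not a statement about the real helpers.

```python
from __future__ import annotations

import math
import statistics
from collections.abc import Iterable


def nan_safe_mean_sketch(values: Iterable[float]) -> float | None:
    """Mean of the finite values, or None when there are none."""
    finite = [v for v in values if math.isfinite(v)]
    return statistics.fmean(finite) if finite else None


def nan_safe_std_sketch(values: Iterable[float]) -> float | None:
    """Sample standard deviation of the finite values, or None if fewer than two."""
    finite = [v for v in values if math.isfinite(v)]
    # Assumption: sample (n - 1) standard deviation.
    return statistics.stdev(finite) if len(finite) >= 2 else None


print(nan_safe_mean_sketch([1.0, float("nan"), 3.0]))
print(nan_safe_std_sketch([1.0, float("inf")]))
```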
Mechanical CI invariants in tests/unit/property/test_finite_invariants.py
reject all three patterns for new code; see
/aiperf/dev/architecture-internals/global-property-test-invariants for the full contract and
the baseline-ratchet mechanism.
User-supplied filesystem paths reaching AIPerf (e.g. --extra-inputs payload_template=<path>, endpoint.template.body in a YAML config) must
go through aiperf.common.path_safety.safe_read_template_path rather than
inline Path(...).read_text() / open(...).read(). The helper is the
canonical CWE-22 path-traversal sanitizer recognized by SAST tools — every
inline read regenerates that finding.
Sanitizer chain (in the order SAST engines walk it):
1. Path(ts).expanduser(): catches TypeError / ValueError /
RuntimeError (the last fires on unresolvable ~user prefixes).
2. Reject when path or any component in path.parents is a symlink.
resolve() alone is insufficient because it follows symlinked parent
directories silently.
3. path.resolve(strict=True): the canonical sanitizer that
Snyk/CodeQL/Semgrep recognize; raises on missing paths.
4. resolved.is_file(): rejects directories, devices, fifos.
5. read_text(encoding="utf-8"): explicit decode; no platform default. Catches UnicodeError alongside OSError so non-UTF-8 files fall back to the literal-string branch rather than crashing config conversion.
Returning None on any failure preserves the existing “treat as a literal
value” fallback that both call sites (_converter_endpoint and
TemplateEndpoint.__init__) already implement.
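The chain can be sketched with pathlib alone. `safe_read_text_sketch` is a hypothetical illustration of the order of checks described above; it is not the body of safe_read_template_path, and new code should call the real helper.

```python
from __future__ import annotations

from pathlib import Path


def safe_read_text_sketch(ts: str) -> str | None:
    """Return the file's UTF-8 text, or None if any check fails."""
    try:
        path = Path(ts).expanduser()
    except (TypeError, ValueError, RuntimeError):
        return None
    try:
        # Refuse symlinks anywhere along the path as written.
        if path.is_symlink() or any(p.is_symlink() for p in path.parents):
            return None
        resolved = path.resolve(strict=True)  # raises if the path is missing
        if not resolved.is_file():
            return None  # directories, devices, fifos
        return resolved.read_text(encoding="utf-8")
    except (OSError, UnicodeError, ValueError, RuntimeError):
        return None
```

A caller that receives None keeps the original string as a literal value, which matches the fallback described above.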
Path(__file__).parent / "data.yaml",
artifact_dir / "inputs.json". These never resolve untrusted input; no
sanitizer needed.
open(p, "rb") for parquet/orjson/etc. The helper is
UTF-8-text only. If a hardened binary variant is needed, add it to
aiperf.common.path_safety alongside the existing helper rather than
inlining read_bytes().
None); the caller is responsible for
raising instead of substituting a literal.
Auto-fixtures (always active): asyncio.sleep runs instantly, RNG=42, singletons reset.
Console exporters subclass ConsoleMetricsExporter and configure rendering via class attributes — no method overrides required for the common case. The base class handles filtering, grouping, table construction, and printing; subclasses just declare what to show and when to run.
Override _check_enabled(self, exporter_config) to raise ConsoleExporterDisabled when the exporter shouldn’t run (env var, user-config flag, dev mode). The base class no-ops (always-enabled). The flag-driven sibling exporters (ConsoleInternalMetricsExporter, ConsoleExperimentalMetricsExporter, HttpTraceConsoleExporter) follow this pattern verbatim — copy one of them as a starting point.
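The shape of the pattern, using stand-in classes so the snippet is self-contained. Everything here is an assumption made for illustration: the `title` attribute, the dict-shaped config, the `show_debug_metrics` key, and the choice to call `_check_enabled` from the constructor are not taken from the real base class.

```python
class ConsoleExporterDisabled(Exception):
    """Stand-in for the real exception; signals 'skip this exporter'."""


class ConsoleMetricsExporterSketch:
    """Stand-in base class: always enabled unless a subclass objects."""

    title = "Metrics"  # hypothetical class attribute

    def __init__(self, exporter_config: dict) -> None:
        # Assumption: the enablement check runs at construction time.
        self._check_enabled(exporter_config)

    def _check_enabled(self, exporter_config: dict) -> None:
        return None  # base class no-op: always enabled


class DebugMetricsExporterSketch(ConsoleMetricsExporterSketch):
    title = "Debug Metrics"

    def _check_enabled(self, exporter_config: dict) -> None:
        # Hypothetical flag name, used only for this illustration.
        if not exporter_config.get("show_debug_metrics", False):
            raise ConsoleExporterDisabled("debug metrics not requested")
```

For real code, start from one of the sibling exporters named above rather than from this sketch.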
The latency-throughput uncertainty plot uses a one-data-contract, three-renderers architecture.
plot:)
AIPerfConfig accepts an optional top-level plot: key that fully describes
which plots are rendered after the run. Two forms are supported:
When plot: is set, ~/.aiperf/plot_config.yaml is ignored and
artifacts.auto_plot flips to True unless explicitly false. The auto-plot
callback writes the resolved envelope to <artifact_dir>/.aiperf-plot-config.yaml
as a reproducibility receipt, so aiperf plot <run> later picks it up
automatically without needing the original AIPerf YAML. Pydantic models live in
src/aiperf/config/plot.py.
Per-feature load-time validators (e.g. BranchOrchestrator v1) run from the
end of dataset loaders. Unsupported constructs raise NotImplementedError
with a <loc>: <reason> prefix where <loc> identifies the offending
conversation/turn so misconfigurations surface before any credit is issued:
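A minimal sketch of such a validator. The dict-based conversation shape, the `branches` key, and the exact wording of `<loc>` are hypothetical; only the `<loc>: <reason>` prefix and the NotImplementedError type come from the description above.

```python
def validate_conversations_sketch(conversations: list[dict]) -> None:
    """Raise NotImplementedError on the first unsupported construct found."""
    for conv in conversations:
        for index, turn in enumerate(conv.get("turns", [])):
            if turn.get("branches"):  # hypothetical unsupported construct
                loc = f"conversation {conv['id']!r} turn {index}"
                raise NotImplementedError(f"{loc}: nested branches are not supported")


try:
    validate_conversations_sketch([{"id": "c1", "turns": [{}, {"branches": [1]}]}])
except NotImplementedError as exc:
    print(exc)
```

Running the check at the end of the loader means a bad dataset fails once, with a pointer to the offending turn, instead of failing per request during the run.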
Reusable response-parsing behavior lives in mixins applied to endpoint classes:
The mixin in src/aiperf/endpoints/response_mixin.py compiles an optional
endpoint.extra.response_field JMESPath query at construction time, with
auto-detect fallback when the query fails or no JSON body is present.
extra
Custom dataset rows use extra for non-native request-body fields. Loaders map that user-facing field into internal Turn.extra_body. Every endpoint formatter that builds a JSON request body shallow-merges Turn.extra_body into the wire body at the very end of payload construction, AFTER model_endpoint.endpoint.extra. The merge is shallow dict.update; user-provided keys win on collision.
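The merge order can be illustrated with plain dicts. `build_wire_body` is a hypothetical helper written for this example, not a function from the codebase.

```python
from __future__ import annotations


def build_wire_body(
    base: dict,
    endpoint_extra: dict | None,
    turn_extra_body: dict | None,
) -> dict:
    """Shallow-merge endpoint extras, then per-turn extras, into the body."""
    body = dict(base)
    body.update(endpoint_extra or {})
    body.update(turn_extra_body or {})  # applied last: per-turn keys win
    return body


body = build_wire_body(
    {"model": "m", "temperature": 0.2},
    {"temperature": 0.5, "opts": {"a": 1, "b": 2}},
    {"temperature": 0.9, "opts": {"a": 9}},
)
print(body)
```

Because the merge is shallow, a nested dict supplied per turn replaces the endpoint-level dict wholesale; in the example `opts` ends up as `{"a": 9}` with no `b` key.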
Rules new formatters and loaders must follow:
Read turn.extra_body, turn.max_tokens, and turn.model from request_info.turns[-1] only. Parent turns earlier in the conversation history must never leak these request-control fields into a child payload, so DAG/FORK children stay clean of parent vendor knobs, limits, or model overrides.
raw_tools walks request_info.turns from the end via BaseEndpoint._latest_turn_attr. Tool definitions behave like a system prompt and persist across a multi-turn or FORK conversation when the dispatching turn does not redeclare them.
extra. Custom dataset row schemas (SingleTurn, inner MultiTurn turns, MooncakeTrace, DagTurn) declare a per-turn extra: dict[str, Any] | None. Loaders translate row.extra into Turn.extra_body at construction time. DagTurn uses Pydantic’s extra="forbid" so a typo’d extra_body is rejected at load time; the other dataset schemas are extra="allow" so an unrecognized extra_body is silently ignored. Author the supported field instead.
Coverage:
openai_chat, chat_embeddings via inheritance, openai_responses).
openai_completions, openai_embeddings and nim_embeddings, openai_image_generation, openai_video_generation, openai_image_edit, nim_image_retrieval, huggingface_generate, solido_rag, the rankings family via BaseRankingsEndpoint, and template_endpoint).
huggingface_generate deliberately merges extra_body at the TOP level of the wire body (not nested under parameters).
openai_image_edit filters reserved keys (prompt, image, url, mask) out of both endpoint extras and extra_body to protect the multipart upload contract.
raw_endpoint intentionally skips this merge — it ships the user-authored Turn.raw_payload verbatim.
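The difference between the last-turn-only fields and the walk-back lookup used for raw_tools can be sketched as follows. `TurnSketch` and `latest_turn_attr` are hypothetical stand-ins; the real signature of BaseEndpoint._latest_turn_attr is not shown in this guide and may differ.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class TurnSketch:
    """Stand-in for a turn carrying only the two fields used below."""

    extra_body: dict[str, Any] | None = None
    raw_tools: list[dict[str, Any]] | None = None


def latest_turn_attr(turns: list[TurnSketch], name: str) -> Any:
    """Return the most recent non-None value of `name`, scanning from the end."""
    for turn in reversed(turns):
        value = getattr(turn, name)
        if value is not None:
            return value
    return None


turns = [
    TurnSketch(extra_body={"top_k": 5}, raw_tools=[{"name": "search"}]),
    TurnSketch(),  # dispatching turn: declares neither field
]
request_extra = turns[-1].extra_body  # last turn only: parent value must not leak
request_tools = latest_turn_attr(turns, "raw_tools")  # persists from the parent
print(request_extra, request_tools)
```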
The OTel results processor uses a strategy protocol to dispatch incoming data to specialised handlers. Each strategy declares what data it supports and processes matching records independently:
Concrete strategies accept a context object at construction time and implement the two-method interface:
The processor iterates registered strategies on each incoming record:
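A self-contained sketch of the dispatch loop. Only `supports()` is named in this guide; the `process` method name, the record type, and the dict used as a context object are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Protocol


@dataclass
class MetricRecordSketch:
    name: str
    value: float


class StrategySketch(Protocol):
    def supports(self, data: Any) -> bool: ...
    def process(self, data: Any) -> None: ...  # hypothetical method name


class MetricStrategySketch:
    def __init__(self, context: dict[str, list[float]]) -> None:
        self._context = context  # stand-in for the real context object

    def supports(self, data: Any) -> bool:
        return isinstance(data, MetricRecordSketch)

    def process(self, data: Any) -> None:
        self._context.setdefault(data.name, []).append(data.value)


def dispatch(strategies: list[StrategySketch], data: Any) -> int:
    """Offer the record to every strategy; return how many handled it."""
    handled = 0
    for strategy in strategies:
        if strategy.supports(data):
            strategy.process(data)
            handled += 1
    return handled


context: dict[str, list[float]] = {}
strategies = [MetricStrategySketch(context)]
print(dispatch(strategies, MetricRecordSketch("ttft", 1.5)), context)
```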
Conventions:
post_processors/strategies/.
supports() uses isinstance checks, with no dynamic dispatch tables.
OTelStrategyContextProtocol exposes instrument factories (get_or_create_histogram, etc.) so strategies never construct OTel instruments directly.
OTelMetricsResultsProcessor fans out metric events to a dedicated child
process via a bounded multiprocessing.Queue. The queue uses drop-oldest
semantics so the hot path (the main benchmark loop) is never blocked by a slow
downstream consumer.
Queue sizing:
Backpressure algorithm:
1. Try queue.put_nowait(event).
2. On queue.Full, call queue.get_nowait() to discard the oldest event.
3. Retry queue.put_nowait(event) once.
4. Increment _fanout_dropped_events and log at
thresholds (1, 100, 1 000 drops).
Design rationale:
_fanout_dropped_events is reported at shutdown so operators can
tune AIPERF_OTEL_MAX_BUFFERED_RECORDS if drops are frequent.
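The drop-oldest backpressure described above can be sketched with the standard library. This is an illustration under stated assumptions: `DropOldestFanout` is a hypothetical class, and a thread-safe queue.Queue stands in for multiprocessing.Queue, which exposes the same put_nowait/get_nowait calls and raises the same queue.Full and queue.Empty exceptions.

```python
import logging
import queue

logger = logging.getLogger("fanout_sketch")
LOG_THRESHOLDS = {1, 100, 1000}


class DropOldestFanout:
    def __init__(self, maxsize: int) -> None:
        self.queue: queue.Queue = queue.Queue(maxsize=maxsize)
        self.dropped_events = 0

    def publish(self, event: object) -> None:
        """Enqueue without blocking; evict the oldest event when full."""
        try:
            self.queue.put_nowait(event)
            return
        except queue.Full:
            pass
        try:
            self.queue.get_nowait()  # discard the oldest event
        except queue.Empty:
            pass  # a consumer drained the queue in the meantime
        try:
            self.queue.put_nowait(event)  # single retry
        except queue.Full:
            pass  # still full: give up rather than block the caller
        self.dropped_events += 1
        if self.dropped_events in LOG_THRESHOLDS:
            logger.warning("dropped %d events", self.dropped_events)


fanout = DropOldestFanout(maxsize=2)
for event in (1, 2, 3):
    fanout.publish(event)
print(fanout.dropped_events)
```

After the three publishes the queue holds the two newest events and one drop has been counted, so the producer never waited on the consumer.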