GenAI-Perf vs AIPerf CLI Feature Comparison Matrix


This matrix compares the CLI options supported by GenAI-Perf and AIPerf.

This is a living document and will be updated as new features are added to AIPerf.

Legend:

  • ✅ Fully Supported - Feature available with same/similar functionality
  • ⭐ Enhanced - Feature available in both tools, with broader capabilities or better ergonomics on the marked side
  • 🟡 Partial Support - Feature available but with different parameters or limitations
  • N/A Not Applicable - Feature not applicable
  • ❌ Not Supported - Feature not currently supported

AIPerf is the successor to GenAI-Perf, so most ⭐ marks fall in AIPerf’s column. They flag rows where AIPerf doesn’t merely match the GenAI-Perf surface but expands on it. ❌ vs ✅ rows are already self-explanatory and are left unannotated.


Core Subcommands

| Subcommand | Description | GenAI-Perf | AIPerf | Notes |
| --- | --- | --- | --- | --- |
| analyze-trace | Analyze mooncake trace for prefix statistics | | | |
| profile | Profile LLMs and GenAI models | | | AIPerf accepts a YAML config via profile -f config.yaml (CLI flags override) |
| plot | Generate visualizations from profiling data | | | Auto-detects multi-run comparison vs single-run analysis; renders Pareto overlays when multiple artifact dirs are passed; supports dashboard mode |
| analyze | Sweep through multiple scenarios | | | AIPerf folds sweeps into profile via magic-list CLI flags, --variant, or YAML sweep: blocks (grid, zip, Sobol, Latin Hypercube). See Parameter Sweeping |
| config | Run a YAML config end-to-end | ✅ (separate config subcommand) | 🟡 | AIPerf has no aiperf config <yaml> run shortcut; pass -f config.yaml to aiperf profile instead |
| create-template / config init | Scaffold a template config | ✅ (GenAI-Perf: create-template) | | AIPerf: aiperf config init -t <template>; supports --list, --search, --category for discovery |
| config expand | Preview a sweep without running it | | | Prints every variation the orchestrator would iterate; --full/--index/--format controls verbosity |
| config validate | Pre-flight validate a config file | | | Runs the same load pipeline as profile; non-zero exit on fatal errors, warnings to stderr |
| plugins | List/inspect registered plugins | | | aiperf plugins enumerates planners, recipes, exporters, dataset loaders, and more |
| synthesize | Materialize a synthetic dataset to disk | | | Useful for caching dataset generation between repeated sweep cells |
| process-export-files | Multi-node result aggregation | | N/A | AIPerf aggregates results in real-time |
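
The "CLI flags override" behavior noted for profile -f config.yaml can be pictured as a simple two-layer merge. The following is a minimal sketch of that precedence rule, not AIPerf's actual loader; the function name and the dictionary keys are hypothetical and chosen only for illustration.

```python
def merge_config(file_values: dict, cli_overrides: dict) -> dict:
    """Return effective settings: file values first, CLI flags layered on top."""
    merged = dict(file_values)
    # Only flags the user actually passed (non-None) replace file values.
    merged.update({k: v for k, v in cli_overrides.items() if v is not None})
    return merged


# Hypothetical values: what a parsed config.yaml and a parsed command line might hold.
file_values = {"concurrency": 8, "streaming": True}
cli_overrides = {"concurrency": 32, "streaming": None}  # only --concurrency was passed

effective = merge_config(file_values, cli_overrides)
print(effective)
```

Treating an unset flag as None keeps a flag that was never typed from clobbering a value the YAML file set deliberately.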

Endpoint Types Support Matrix

--endpoint-type

| Endpoint Type | Description | GenAI-Perf | AIPerf | Notes |
| --- | --- | --- | --- | --- |
| chat | Standard chat completion API (OpenAI-compatible) | | | |
| completions | Text completion API for prompt completion | | | |
| embeddings | Text embedding generation for similarity/search | | | |
| rankings | Text ranking/re-ranking for search relevance | | ✅ ⭐ | GenAI-Perf has a single generic rankings endpoint (/v1/ranking, HF-TEI-compatible). AIPerf splits it into dedicated nim_rankings, hf_tei_rankings, and cohere_rankings endpoints. |
| hf_tei_rankings | HuggingFace TEI re-ranker API | 🟡 | | GenAI-Perf has only generic rankings; AIPerf has a dedicated endpoint at /rerank |
| nim_rankings | NVIDIA NIM re-ranker API | | | |
| cohere_rankings | Cohere re-ranker API | | | |
| chat_embeddings | Chat-style multimodal embeddings (vLLM VLM2Vec) | | | |
| embeddings (NIM) | NVIDIA NIM embeddings endpoint | | | AIPerf nim_embeddings; supports text and image inputs |
| responses | OpenAI Responses API endpoint | | | Multi-modal (text, image, audio) with streaming |
| dynamic_grpc | Dynamic gRPC service calls | | | |
| huggingface_generate | HuggingFace TGI generate API | | | /generate and /generate_stream supported |
| image_generation | OpenAI-compatible image generation (/v1/images/generations) | | | DALL-E-style text-to-image; supports raw export for image extraction |
| video_generation | OpenAI/SGLang text-to-video (/v1/videos) | | | Async polling; Sora / Wan2.1 / HunyuanVideo compatible; multipart-form requests |
| image_retrieval | Image search and retrieval endpoints | | | AIPerf serves NIM image retrieval / bounding-box detection at /v1/infer |
| nvclip | NVIDIA CLIP model endpoints | | | |
| multimodal | Multi-modal (text + image/audio) endpoints | | | AIPerf uses chat endpoint with multimodal content |
| generate | Generic text generation endpoints | | | |
| kserve | KServe model serving endpoints | | | |
| template | Template-based inference endpoints | 🟡 | | AIPerf supports multimodal and multi-turn templates |
| tensorrtllm_engine | TensorRT-LLM engine direct access | | | |
| vision | Computer vision model endpoints | | | AIPerf uses chat endpoint for VLMs |
| solido_rag | SOLIDO RAG endpoint | | | |

Endpoint Configuration

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
| --- | --- | --- | --- | --- |
| Model Names | -m | | | |
| Model Selection Strategy | --model-selection-strategy {round_robin,random} | | | |
| Backend Selection | --backend {tensorrtllm,vllm} | | | |
| Custom Endpoint | --endpoint | | | |
| Endpoint Type | --endpoint-type | | ✅ ⭐ | AIPerf supports 15+ endpoint types vs. GenAI-Perf’s 12; see detailed comparison |
| Server Metrics URL | --server-metrics-url | | | AIPerf uses --server-metrics (enabled by default, auto-collects Prometheus metrics from the inference endpoint at base_url + /metrics). See the note below on GenAI-Perf’s flag name. |
| Streaming | --streaming | | | |
| URL | -u URL --url | | | |
| Request Timeout | --request-timeout-seconds | | | |
| API Key | --api-key | | ✅ ⭐ | GenAI-Perf has no dedicated flag; users must pass -H 'Authorization: Bearer ...' manually |
| Request Content Type | --request-content-type {application/json,multipart/form-data} | | | Switch between JSON and multipart-form encoding (required by some video-gen servers) |

GenAI-Perf’s --server-metrics-url is misleadingly named. Despite the “server metrics” label, the flag points GenAI-Perf at a Triton / DCGM telemetry endpoint (GPU power, utilization, memory) — it is not a general Prometheus inference-server metrics scraper. AIPerf splits this into two clearly-scoped flags:

  • --server-metrics — Prometheus inference-server metrics from the model endpoint (base_url + /metrics). Enabled by default; pass additional endpoint URLs to scrape extra targets.
  • --gpu-telemetry — GPU telemetry collection. Supports both the DCGM exporter HTTP endpoint (default; localhost:9400 + localhost:9401) and the local pynvml library (pass pynvml). Custom DCGM exporter URLs and a dashboard realtime view are also accepted.

If you’re porting a GenAI-Perf invocation, --server-metrics-url http://node:9400 maps to AIPerf’s --gpu-telemetry http://node:9400, not to --server-metrics.
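
When porting many saved GenAI-Perf command lines, the flag mapping described above can be applied mechanically. The helper below is a minimal sketch under that assumption; the function name is hypothetical, and it rewrites only this one flag, leaving every other argument untouched.

```python
def port_server_metrics_url(argv: list[str]) -> list[str]:
    """Rewrite GenAI-Perf's --server-metrics-url into AIPerf's --gpu-telemetry."""
    ported = []
    for arg in argv:
        if arg == "--server-metrics-url":
            # The URL value follows as the next token and is copied unchanged.
            ported.append("--gpu-telemetry")
        elif arg.startswith("--server-metrics-url="):
            # Normalize the "=" form to the space-separated form shown in this document.
            ported.extend(["--gpu-telemetry", arg.split("=", 1)[1]])
        else:
            ported.append(arg)
    return ported


old_args = ["profile", "-m", "my-model", "--server-metrics-url", "http://node:9400"]
new_args = port_server_metrics_url(old_args)
print(" ".join(new_args))
```

The model name my-model is a placeholder. Other GenAI-Perf flags may also need translation, so the result should be reviewed against the tables in this document rather than run blindly.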


Input Configuration

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
| --- | --- | --- | --- | --- |
| Extra Inputs | --extra-inputs | | | |
| Custom Headers | --header -H | | | |
| YAML Config File | -f --config | ✅ (separate config subcommand) | | AIPerf passes YAML to profile -f; CLI flags override file values |
| Input File | --input-file | | | |
| Inline Records in YAML | dataset.records: (YAML only) | | | Embed dataset rows directly in the YAML config; >500 records emits a warning |
| Dataset Entries | --num-dataset-entries --num-prompts | | | GenAI-Perf and AIPerf both accept this flag. In AIPerf it is collapsed with --num-sessions / --conversation-num / --num-conversations into a single conversation count; GenAI-Perf keeps --num-dataset-entries and --num-sessions distinct (see Session Configuration). |
| Public Dataset | --public-dataset | | | sharegpt, aimo, mmstar, vision_arena, llava_onevision, speed_bench_* (50+ subsets), librispeech, voxpopuli, gigaspeech, ami, spgispeech, instruct_coder, blazedit_5k, blazedit_10k, … |
| HuggingFace Subset Override | --hf-subset | | | Override the HF subset/config for HF-backed public datasets |
| Custom Dataset Type | --custom-dataset-type {single_turn,multi_turn,random_pool,mooncake_trace,bailian_trace,burst_gpt_trace,sagemaker_data_capture} | | | GenAI-Perf infers dataset type from input file format |
| Dataset Sampling Strategy | --dataset-sampling-strategy {sequential,random,shuffle} | | | Controls how entries are drawn during benchmarking |
| Fixed Schedule | --fixed-schedule | | | |
| Fixed Schedule Auto Offset | --fixed-schedule-auto-offset | | | |
| Fixed Schedule Start/End Offset | --fixed-schedule-start-offset --fixed-schedule-end-offset | | | |
| Random Seed | --random-seed | | | |
| GRPC Method | --grpc-method | | | |

Output Configuration

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
| --- | --- | --- | --- | --- |
| Artifact Directory | --artifact-dir | | | |
| Checkpoint Directory | --checkpoint-dir | | | |
| Generate Plots | --generate-plots | | ✅ ⭐ | AIPerf replaces inline plot generation with the dedicated aiperf plot subcommand: dashboard mode, Pareto overlays across runs, configurable plot envelope, auto-plot hook |
| Auto-Plot After Profile | --auto-plot --no-auto-plot | | | Auto-runs aiperf plot on the artifact dir after the benchmark completes; honored by recipe defaults |
| Plot Required (Strict) | --plot-required | | | Treat auto-plot failures as fatal (non-zero exit) |
| Export Level | --export-level --profile-export-level {summary,records,raw} | | | Controls whether per-record and raw request/response files are emitted alongside the summary |
| Time-Sliced Metrics | --slice-duration | | | Window the benchmark timeline into fixed slices and compute metrics per slice |
| Enable Checkpointing | --enable-checkpointing | | | |
| Profile Export File | --profile-export-file | | | In AIPerf, the value is used as a prefix for the profile export file names. |

Tokenizer Configuration

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
| --- | --- | --- | --- | --- |
| Tokenizer | --tokenizer | | | |
| Tokenizer Revision | --tokenizer-revision | | | |
| Tokenizer Trust Remote Code | --tokenizer-trust-remote-code | | | |

Load Generator Configuration

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
| --- | --- | --- | --- | --- |
| Concurrency | --concurrency | | | |
| Request Rate | --request-rate | | | |
| Request Count | --request-count --num-requests | | | |
| Request Rate w/ Max Concurrency | --request-rate with --concurrency | | | Dual control of rate and concurrency ceiling |
| Measurement Interval | --measurement-interval -p | | N/A | Not applicable to AIPerf |
| Stability Percentage | --stability-percentage -s | | N/A | Not applicable to AIPerf |

Arrival Pattern Configuration

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
| --- | --- | --- | --- | --- |
| Arrival Pattern | --arrival-pattern {constant,poisson,gamma} | | | Controls inter-arrival time distribution |
| Arrival Smoothness | --arrival-smoothness --vllm-burstiness | | | Gamma distribution shape: <1=bursty, 1=Poisson, >1=smooth |
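
The effect of the gamma shape parameter can be checked numerically. The sketch below assumes the common parameterization in which the shape equals the smoothness value and the scale is chosen so the mean gap stays at 1 / rate; whether AIPerf parameterizes it exactly this way is an assumption, and the function names are hypothetical.

```python
import random
import statistics


def gamma_interarrivals(rate: float, smoothness: float, n: int, seed: int = 0) -> list[float]:
    """Draw n inter-arrival gaps (seconds) whose mean is 1 / rate."""
    rng = random.Random(seed)
    # shape = smoothness, scale = 1 / (rate * shape), so mean = shape * scale = 1 / rate
    scale = 1.0 / (rate * smoothness)
    return [rng.gammavariate(smoothness, scale) for _ in range(n)]


def coefficient_of_variation(gaps: list[float]) -> float:
    """Spread of the gaps relative to their mean; higher means burstier traffic."""
    return statistics.pstdev(gaps) / statistics.fmean(gaps)


for shape in (0.5, 1.0, 4.0):
    gaps = gamma_interarrivals(rate=10.0, smoothness=shape, n=20000)
    print(shape, round(statistics.fmean(gaps), 3), round(coefficient_of_variation(gaps), 2))
```

For a gamma distribution the coefficient of variation is 1 / sqrt(shape), so a shape below 1 gives more variable (bursty) gaps, a shape of 1 reduces to the exponential gaps of a Poisson process, and a shape above 1 gives more regular gaps, all at the same average rate.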

Duration-Based Benchmarking

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
| --- | --- | --- | --- | --- |
| Benchmark Duration | --benchmark-duration | | | Stop after N seconds |
| Benchmark Grace Period | --benchmark-grace-period | | | Wait for in-flight requests after duration (default: 30s, supports inf) |

Concurrency Control

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
| --- | --- | --- | --- | --- |
| Session Concurrency | --concurrency | | | Max concurrent sessions |
| Prefill Concurrency | --prefill-concurrency | | | Limit concurrent prefill operations (requires --streaming) |

Gradual Ramping

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
| --- | --- | --- | --- | --- |
| Concurrency Ramp | --concurrency-ramp-duration | | | Ramp concurrency from 1 to target over N seconds |
| Prefill Concurrency Ramp | --prefill-concurrency-ramp-duration | | | Ramp prefill concurrency over N seconds |
| Request Rate Ramp | --request-rate-ramp-duration | | | Ramp request rate over N seconds |
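
To make "ramp concurrency from 1 to target over N seconds" concrete, here is a minimal sketch that assumes a linear ramp. The source does not state the ramp shape AIPerf uses, so the linear interpolation and the function name are assumptions made for illustration.

```python
def ramped_concurrency(elapsed_s: float, target: int, ramp_duration_s: float) -> int:
    """Allowed concurrency at a given time, assuming a linear ramp from 1 to target."""
    if ramp_duration_s <= 0 or elapsed_s >= ramp_duration_s:
        return target  # no ramp configured, or the ramp has finished
    fraction = max(elapsed_s, 0.0) / ramp_duration_s
    # Interpolate between 1 and target, rounding down, never below 1.
    return max(1, int(1 + fraction * (target - 1)))


# Sample the ramp every 2 seconds for a 10-second ramp to 101 concurrent requests.
schedule = [ramped_concurrency(t, target=101, ramp_duration_s=10.0) for t in range(0, 12, 2)]
print(schedule)
```

A gradual ramp like this avoids hitting a cold server with the full target load in the first second, which would otherwise distort early latency samples.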

User-Centric Timing (KV Cache Benchmarking)

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
| --- | --- | --- | --- | --- |
| User-Centric Rate | --user-centric-rate | | | Per-user rate limiting with consistent turn gaps |
| Number of Users | --num-users | | | Number of simulated users (required with --user-centric-rate) |
| Shared System Prompt | --shared-system-prompt-length | | | System prompt shared across all users (KV cache prefix) |
| User Context Prompt | --user-context-prompt-length | | | Per-user unique context padding |

Warmup Phase Configuration

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
| --- | --- | --- | --- | --- |
| Warmup Request Count | --warmup-request-count | | | |
| Warmup Duration | --warmup-duration | | | Duration-based warmup stop condition |
| Warmup Session Count | --num-warmup-sessions | | | Session-based warmup stop condition |
| Warmup Concurrency | --warmup-concurrency | | | Override concurrency during warmup |
| Warmup Prefill Concurrency | --warmup-prefill-concurrency | | | Override prefill concurrency during warmup |
| Warmup Request Rate | --warmup-request-rate | | | Override request rate during warmup |
| Warmup Arrival Pattern | --warmup-arrival-pattern | | | Override arrival pattern during warmup |
| Warmup Grace Period | --warmup-grace-period | | | Grace period for warmup responses |
| Warmup Concurrency Ramp | --warmup-concurrency-ramp-duration | | | Ramp warmup concurrency |
| Warmup Prefill Ramp | --warmup-prefill-concurrency-ramp-duration | | | Ramp warmup prefill concurrency |
| Warmup Rate Ramp | --warmup-request-rate-ramp-duration | | | Ramp warmup request rate |

Session/Conversation Configuration (Multi-turn)

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
| --- | --- | --- | --- | --- |
| Number of Sessions | --num-sessions | | | |
| Session Concurrency | --session-concurrency | | | Use --concurrency for AIPerf |
| Session Delay Ratio | --session-delay-ratio | | | |
| Session Turn Delay Mean | --session-turn-delay-mean | | | |
| Session Turn Delay Stddev | --session-turn-delay-stddev | | | |
| Session Turns Mean | --session-turns-mean | | | |
| Session Turns Stddev | --session-turns-stddev | | | |

Input Sequence Length (ISL) Configuration

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
| --- | --- | --- | --- | --- |
| Input Tokens Mean | --synthetic-input-tokens-mean --isl | | | |
| Input Tokens Stddev | --synthetic-input-tokens-stddev | | | |
| Input Tokens Block Size | --prompt-input-tokens-block-size --isl-block-size | | | Used for mooncake_trace hash_id blocks |

Output Sequence Length (OSL) Configuration

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
| --- | --- | --- | --- | --- |
| Output Tokens Mean | --output-tokens-mean --osl | | | |
| Output Tokens Stddev | --output-tokens-stddev | | | |
| Output Tokens Mean Deterministic | --output-tokens-mean-deterministic | | | Only applicable to Triton |

Batch Size Configuration

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
| --- | --- | --- | --- | --- |
| Text Batch Size | --batch-size-text --batch-size -b | | | |
| Audio Batch Size | --batch-size-audio | | | |
| Image Batch Size | --batch-size-image | | | |

Prefix Prompt Configuration

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
| --- | --- | --- | --- | --- |
| Number of Prefix Prompts | --num-prefix-prompts | | | |
| Prefix Prompt Length | --prefix-prompt-length | | | |

Audio Input Configuration

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
| --- | --- | --- | --- | --- |
| Audio Length Mean | --audio-length-mean | | | |
| Audio Length Stddev | --audio-length-stddev | | | |
| Audio Format | --audio-format {wav,mp3,random} | 🟡 | | GenAI-Perf supports {wav, mp3} only; AIPerf adds random |
| Audio Depths | --audio-depths | | | |
| Audio Sample Rates | --audio-sample-rates | | | |
| Audio Number of Channels | --audio-num-channels | | | GenAI-Perf accepts {1, 2}; AIPerf accepts {0, 1, 2} (0 disables) |

Image Input Configuration

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
| --- | --- | --- | --- | --- |
| Image Width Mean | --image-width-mean | | | |
| Image Width Stddev | --image-width-stddev | | | |
| Image Height Mean | --image-height-mean | | | |
| Image Height Stddev | --image-height-stddev | | | |
| Image Format | --image-format {png,jpeg,random} | 🟡 | | GenAI-Perf supports {png, jpeg} only; AIPerf adds random |

Video Input Configuration

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
| --- | --- | --- | --- | --- |
| Video Batch Size | --video-batch-size --batch-size-video | | | Set to 0 to disable video inputs |
| Video Duration | --video-duration | | | Seconds per clip; requires FFmpeg |
| Video FPS | --video-fps | | | Frames per second |
| Video Width/Height | --video-width --video-height | | | Resolution in pixels (both or neither) |
| Video Synth Type | --video-synth-type {moving_shapes,grid_clock,noise} | | | Synthetic content generator |
| Video Format | --video-format {webm,mp4} | | | Container format |
| Video Codec | --video-codec | | | Any FFmpeg-supported codec (libvpx-vp9, libx264, h264_nvenc, …) |
| Embedded Audio Track | --video-audio-num-channels --video-audio-sample-rate --video-audio-codec --video-audio-depth | | | Optional audio mux for video clips |
| Download Video Content | --download-video-content | | | Include video download time in request latency |

Multi-Run / Confidence Reporting

AIPerf can repeat the same benchmark N times and report mean / std / confidence-interval / coefficient-of-variation across runs, optionally stopping early once a target metric stabilizes.

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
| --- | --- | --- | --- | --- |
| Number of Profile Runs | --num-profile-runs | | | 1-10 runs; >1 enables aggregate statistics |
| Profile Run Cooldown | --profile-run-cooldown-seconds | | | Stabilization gap between runs |
| Confidence Level | --confidence-level | | | 0.90 / 0.95 (default) / 0.99 CI width |
| Disable Warmup After First | --profile-run-disable-warmup-after-first | | | First run warms, rest measure steady state |
| Consistent Seed Across Runs | --set-consistent-seed | | | Auto-pin --random-seed=42 for valid statistics |
| Vary Seed Per Trial | --vary-seed-per-trial | | | Capture input-noise + runtime-noise variance |
| Adaptive Convergence Stopping | --convergence-metric --convergence-stat --convergence-threshold --convergence-mode | | | Stop early when CI width, CV, or KS-distribution stabilizes |
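
The aggregate statistics named above (mean, standard deviation, confidence interval, coefficient of variation) can be computed from per-run results as in the sketch below. It uses a normal approximation for the interval; the source does not say which interval AIPerf computes, and with only a handful of runs a Student-t critical value would give a wider interval. The function name and the sample throughput values are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def summarize_runs(values: list[float], confidence: float = 0.95) -> dict:
    """Aggregate one metric across repeated runs (requires at least two runs)."""
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation
    # Two-sided critical value under a normal approximation (an assumption).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * std / math.sqrt(n)
    return {
        "mean": mean,
        "std": std,
        "ci": (mean - half_width, mean + half_width),
        "cv": std / mean,  # relative run-to-run variation
    }


# Hypothetical throughput (requests/s) from five runs of the same benchmark.
runs = [100.0, 102.0, 98.0, 101.0, 99.0]
summary = summarize_runs(runs, confidence=0.95)
print(summary)
```

A small coefficient of variation together with a narrow interval is the kind of signal an early-stopping rule can act on: once more runs stop changing the estimate, further repetitions add little.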

Parameter Sweeping

AIPerf folds GenAI-Perf’s analyze subcommand into profile via three composable mechanisms: magic-list CLI flags, --variant scenarios, and YAML sweep: blocks. Multi-cell sweeps stream a per-cell results table; QMC sweeps additionally write a sampling_design.json for reproducibility.

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
| --- | --- | --- | --- | --- |
| Magic-List CLI Flags | --concurrency 1,10,100 --request-rate 50,100,200 --isl 128,512,2048 --osl ... --isl-stddev ... --osl-stddev ... --conversation-turn-mean ... | 🟡 | | Any CLI flag in the allowlist accepts a comma list and triggers a sweep |
| Grid Sweep | --sweep-type grid (default) | ✅ (via analyze) | | Cartesian product of all magic-list flags |
| Zip Sweep | --sweep-type zip | | | Lockstep element-wise pairing; YAML form: sweep: {type: zip} |
| Scenario Sweep | --variant --sweep-variant | | | Repeatable [name:] key=value, ... per occurrence; emits a ScenarioSweep. GenAI-Perf’s analyze sweeps one stimulus at a time ({batch_size, concurrency, num_dataset_entries, input_sequence_length, request_rate}), so multi-parameter scenarios are not expressible. |
| Quasi-Monte-Carlo (Sobol) | YAML sweep: {type: sobol} | | | Low-discrepancy quasi-random sampling over continuous + integer dimensions |
| Latin Hypercube Sampling | YAML sweep: {type: latin_hypercube} | | | Stratified sampling alternative to Sobol |
| Sweep Variation Cooldown | --parameter-sweep-cooldown-seconds | | | Inter-variation pause |
| Same Seed Across Variations | --parameter-sweep-same-seed | | | Correlated comparisons vs. independent draws |
| Sweep Order | --parameter-sweep-mode {repeated,independent} | | | Outer loop = trials or variations |
| Live Sweep Table | (auto) / --no-sweep-table | | | Per-cell streaming results table; auto-suppressed for non-TTY, dashboard UI, or single-cell sweeps |
| Sampling Design Artifact | sweep_aggregate/sampling_design.json | | | Emitted only for Sobol / Latin Hypercube (QMC) sweeps |
| YAML Config Driven | dataset: phases: sweep: multi_run: blocks | ✅ (separate config cmd) | | Single AIPerf YAML drives sweep + multi-run + plot envelope; aiperf config expand previews variations without running |
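
The difference between a grid sweep (Cartesian product) and a zip sweep (lockstep pairing) is easiest to see by enumerating the cells each one produces. This is a minimal sketch of the two expansion rules, not AIPerf's orchestrator; the function names are hypothetical, and the parameter values reuse the magic-list examples from the table.

```python
from itertools import product


def grid_cells(params: dict[str, list]) -> list[dict]:
    """Cartesian product: every value of each parameter with every value of the others."""
    names = list(params)
    return [dict(zip(names, combo)) for combo in product(*params.values())]


def zip_cells(params: dict[str, list]) -> list[dict]:
    """Lockstep pairing: the i-th value of each parameter forms the i-th cell."""
    names = list(params)
    if len({len(values) for values in params.values()}) != 1:
        raise ValueError("zip sweeps need lists of equal length")
    return [dict(zip(names, combo)) for combo in zip(*params.values())]


params = {"concurrency": [1, 10, 100], "isl": [128, 512, 2048]}
print(len(grid_cells(params)), "grid cells")
print(zip_cells(params))
```

With three values per flag, the grid sweep runs nine cells while the zip sweep runs three, which is why zip is the cheaper choice when parameters are meant to scale together.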

Adaptive Search / Bayesian Optimization

AIPerf ships a native Bayesian-optimization search planner (Optuna + BoTorch preset, Hvarfner-DSP Matern-5/2 kernel), with native multi-objective Pareto support (qLogNEHVI), outcome constraints, posterior-regret stopping, and a curated set of preset “search recipes”. GenAI-Perf does not offer adaptive search.

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
| --- | --- | --- | --- | --- |
| Search Space | --search-space 'path:lo,hi[:kind]' | | | Repeatable; CLI grammar supports int and real dimensions. YAML sweep.search_space[] also supports prior: log-uniform. |
| Search Metric | --search-metric | | | Tag from RunResult.summary_metrics |
| Search Stat | --search-stat {avg,p50,p90,p95,p99} | | | |
| Search Direction | --search-direction | | | Maximize / minimize |
| Search Iterations | --search-max-iterations --search-initial-points --search-random-seed | | | Sobol seed phase + GP fit + stopping |
| Search Planner Plugin | --search-planner {bayesian,monotonic_sla,smooth_isotonic,optuna} | | | bayesian is curated Optuna+BoTorch preset; third-party planners registerable |
| Optuna Sampler | --optuna-sampler {tpe,gp,botorch} | | | --search-planner=optuna expert mode |
| Optuna Acquisition | --optuna-acquisition {logei,qlogei,qnei,qlognei,qehvi,qnehvi,qlognehvi} | | | Modern noisy-EI defaults; multi-objective variants gated on len(objectives) > 1 |
| Posterior-Regret Stopping | --optuna-terminator {regret,emmr,none} | | | RegretBoundEvaluator (Makarova 2022) / EMMR (Ishibashi 2023) |
| Percentile Pooling | --search-percentile-pooling {mean,pooled} | | | Pool raw samples across trials for tail-correct percentile objectives |
| SLA Filter | --search-sla 'metric:stat:op:threshold' | | | Repeatable; outcome-constraint or hard filter |
| Multi-Objective Pareto | objectives: [...] (YAML) | | | qNEHVI / qLogNEHVI; emits Pareto front in search_history.json |
| Pareto Overlay Rendering | aiperf plot <dir1> <dir2> ... | | | Multi-directory invocation triggers the Pareto overlay handler |
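
The --search-space grammar 'path:lo,hi[:kind]' can be read as three colon-separated fields with an optional trailing kind. The parser below is a minimal sketch of that reading, not AIPerf's own parser. The default kind when it is omitted, the assumption that the path contains no colon, and the example path load.concurrency are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    path: str
    low: float
    high: float
    kind: str  # "int" or "real", the two kinds the CLI grammar is said to support


def parse_search_space(spec: str, default_kind: str = "int") -> Dimension:
    """Parse 'path:lo,hi[:kind]' into a Dimension (default_kind is an assumption)."""
    parts = spec.split(":")
    if len(parts) == 2:
        path, bounds = parts
        kind = default_kind
    elif len(parts) == 3:
        path, bounds, kind = parts
    else:
        raise ValueError(f"expected 'path:lo,hi[:kind]', got {spec!r}")
    if kind not in ("int", "real"):
        raise ValueError(f"unsupported kind {kind!r}")
    low_text, high_text = bounds.split(",")  # raises ValueError unless exactly two bounds
    cast = int if kind == "int" else float
    low, high = cast(low_text), cast(high_text)
    if low >= high:
        raise ValueError("lower bound must be below upper bound")
    return Dimension(path, low, high, kind)


dim = parse_search_space("load.concurrency:1,256:int")  # hypothetical path
print(dim)
```

Validating bounds at parse time surfaces a mistyped range before any benchmark run is spent on it.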

Search Recipes (Preset Experiments)

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
| --- | --- | --- | --- | --- |
| Named Recipe | --search-recipe | | | Expands to a search-space + SLA filter + post-process pipeline |
| Pareto Sweep | --search-recipe pareto-sweep --isl-osl-pairs '128/128,512/256,...' | | | Multi-shape throughput/latency Pareto with paired ISL/OSL workloads |
| Max Throughput under TTFT SLA | --search-recipe max-throughput-ttft-sla --ttft-sla-ms | | | Log-uniform concurrency prior with TTFT constraint |
| Max Throughput under ITL SLA | --search-recipe max-throughput-itl-sla --itl-sla-ms | | | Streaming required |
| Max Concurrency under SLA | --search-recipe max-concurrency-under-sla --tpot-sla-ms --e2e-sla-ms --error-rate-sla --search-style {smooth_isotonic,monotonic,bo,optuna,grid} | | | Selectable 1D SLA-saturation strategy |
| Max Goodput under SLO | --search-recipe max-goodput-under-slo --slo-attainment-fraction | | | DistServe-style per-request SLO attainment (default 0.95) |
| Concurrency Ramp / Degradation Knee | --search-recipe concurrency-ramp --degradation-threshold --degradation-metric-tag --degradation-stat --concurrency-min --concurrency-max --concurrency-steps | | | Reports first concurrency where stat exceeds baseline × (1 + threshold) |
| Prefill TTFT Curve | --search-recipe prefill-ttft-curve --isl-min --isl-max --isl-steps | | | Log-spaced ISL ramp for prefill characterization |
| Decode ITL Curve | --search-recipe decode-itl-curve --osl-min --osl-max --osl-steps | | | Log-spaced OSL ramp for decode characterization |
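
The degradation-knee rule for the concurrency-ramp recipe, "first concurrency where stat exceeds baseline × (1 + threshold)", can be sketched in a few lines. This example assumes the baseline is the stat measured at the lowest concurrency, which the source does not state; the function name and the latency figures are hypothetical.

```python
from typing import Optional


def degradation_knee(points: list[tuple[int, float]], threshold: float) -> Optional[int]:
    """Return the first concurrency whose stat exceeds baseline * (1 + threshold).

    points holds (concurrency, stat) pairs; the baseline is assumed to be the
    stat at the lowest concurrency.
    """
    ordered = sorted(points)
    baseline = ordered[0][1]
    limit = baseline * (1 + threshold)
    for concurrency, stat in ordered[1:]:
        if stat > limit:
            return concurrency
    return None  # no degradation beyond the threshold was observed


# Hypothetical p99 latency (ms) measured at increasing concurrency levels.
measurements = [(1, 100.0), (2, 105.0), (4, 118.0), (8, 160.0)]
knee = degradation_knee(measurements, threshold=0.2)
print(knee)
```

With a 20% threshold the limit is 120 ms, so concurrency 4 (118 ms) still passes and concurrency 8 (160 ms) is reported as the knee.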

Service Configuration

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
| --- | --- | --- | --- | --- |
| Record Processor Service Count | --record-processor-service-count --record-processors | | | |
| Maximum Workers | --workers-max --max-workers | | | |
| ZMQ Host | --zmq-host | | | |
| ZMQ IPC Path | --zmq-ipc-path | | | |

Request Cancellation

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
| --- | --- | --- | --- | --- |
| Request Cancellation Rate | --request-cancellation-rate | | | Percentage of requests to cancel (0-100) |
| Request Cancellation Delay | --request-cancellation-delay | | | Seconds to wait before cancelling |

Additional Features

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
| --- | --- | --- | --- | --- |
| Goodput Constraints | --goodput -g | | | |
| Verbose | -v --verbose | | | |
| Extra Verbose | -vv | | | |
| Log Level | --log-level | | | {TRACE,DEBUG,INFO,NOTICE,WARNING,SUCCESS,ERROR,CRITICAL} (Loguru; case-insensitive in practice) |
| UI Type | --ui-type --ui {dashboard,simple,none} | | | |
| Help | -h --help | | | |

Perf-Analyzer Passthrough Arguments

GenAI-Perf supports passing through arguments to the Perf-Analyzer CLI. AIPerf does not support this, as it does not use Perf-Analyzer under the hood.

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
| --- | --- | --- | --- | --- |
| Perf-Analyzer Passthrough Arguments | -- | | N/A | Only applicable to GenAI-Perf |

Data Exporters

| Feature | GenAI-Perf | AIPerf | Notes |
| --- | --- | --- | --- |
| Console output | | | |
| JSON output | | | See discrepancies below |
| CSV output | | | |
| API Error Summary | | | |
| profile_export.json | | | Use --export-level raw in AIPerf to get raw input/output payloads |
| Per-Record Metrics | | | |
| inputs.json | | | AIPerf format is slightly different |

Discrepancies

JSON Output

  • Fields in the input_config section may differ between GenAI-Perf and AIPerf.

Advanced Features Comparison

| Feature | GenAI-Perf | AIPerf | Notes |
| --- | --- | --- | --- |
| Multi-modal support | | | |
| GPU Telemetry | | ✅ ⭐ | AIPerf supports dual backends: DCGM exporter HTTP endpoints (default; localhost:9400 + localhost:9401, custom URLs accepted) and local pynvml. GenAI-Perf is DCGM-only via --server-metrics-url. |
| Streaming API support | | | |
| Multi-turn conversations | | | Full multi-turn benchmarking with session tracking |
| Payload scheduling | | | Fixed schedule workloads |
| Distributed testing | | ✅ ⭐ | GenAI-Perf has post-hoc multi-node result aggregation via process-export-files. AIPerf runs a single federated benchmark across nodes via ZMQ-TCP service-to-service communication. |
| Custom endpoints | | ✅ ⭐ | AIPerf ships 15+ endpoint types incl. responses, chat_embeddings, nim_embeddings, nim_rankings, cohere_rankings, image_generation, video_generation, image_retrieval, solido_rag. See Endpoint Types Support Matrix. |
| Synthetic data generation | | | |
| Bring Your Own Data (BYOD) | | | Custom dataset support |
| Audio input support | | | Both tools synthesize audio inputs (WAV/MP3). Neither computes audio-specific metrics (e.g. WER, audio-token-rate) |
| Vision metrics | | | Image-specific performance metrics |
| Image generation benchmarking | | | Text-to-image with raw export for image extraction |
| Video input benchmarking | | | Synthetic video generation (FFmpeg) for VLM endpoints with configurable codec, resolution, FPS, audio |
| Live Metrics | | | Live metrics display |
| Dashboard UI | | | Dashboard UI |
| Reasoning token parsing | | | Parsing of reasoning tokens |
| Arrival pattern control | | | Constant, Poisson, Gamma distributions with tunable burstiness |
| Prefill concurrency limiting | | | Fine-grained prefill queueing control for TTFT behavior |
| Gradual ramping | | | Smooth ramp-up for concurrency and rate |
| Duration-based benchmarking | | | Time-based stop conditions with grace periods |
| User-centric timing | | | Per-user rate limiting for KV cache benchmarking |
| Configurable warmup phase | | ✅ ⭐ | GenAI-Perf has only --warmup-request-count. AIPerf adds duration-based, session-based, and full per-phase overrides (rate, concurrency, prefill, arrival pattern, grace period, ramping). |
| HTTP trace metrics | | | Detailed HTTP lifecycle timing (DNS, TCP, TLS, TTFB) |
| Request cancellation | | | Test timeout behavior and service resilience |
| Timeslice metrics | | | Per-timeslice metric breakdown |
| Interactive plot dashboard | | | Web-based exploration with dynamic metric selection and filtering |
| Multi-run comparison plots | | | Auto-detected Pareto curves and throughput analysis |
| YAML-first configuration | | ✅ ⭐ | GenAI-Perf has a separate config subcommand to run a YAML file. AIPerf threads a single YAML through dataset, phases, sweep, multi-run, and plot envelope with a config init / expand / validate lifecycle. |
| Inline datasets in YAML | | | Embed dataset records directly in the config file (no sidecar .jsonl) |
| Plot config envelope | | | YAML carries the visualization spec; auto-plot materializes .aiperf-plot-config.yaml for reproducibility |
| Parameter sweeps | ✅ (analyze subcommand) | ✅ ⭐ | GenAI-Perf’s analyze sweeps one stimulus at a time. AIPerf folds sweeps into profile with magic-list CLI flags, multi-key scenario --variant, and YAML sweep: blocks (grid / zip / Sobol / Latin Hypercube). |
| Confidence reporting | | | Multi-run aggregation with CI / CV / KS-distribution convergence stopping |
| Bayesian optimization | | | Native Optuna+BoTorch search planner with Hvarfner-DSP kernel |
| Multi-objective Pareto search | | | qLogNEHVI acquisition, hypervolume stopping, Pareto front in search_history.json |
| Search recipes (preset experiments) | | | pareto-sweep, max-throughput-under-SLA, max-concurrency-under-SLA, max-goodput-under-SLO, concurrency-ramp, prefill-ttft-curve, decode-itl-curve |
| Live sweep table with Pareto marking | | | Per-cell streaming sweep table inline-marks Pareto-dominant cells (★) for recipes that declare pareto_axes |
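
The Pareto marking mentioned in the last row can be illustrated with a small non-dominated filter over two axes. This is a minimal sketch assuming the axes are throughput (higher is better) and latency (lower is better); the axis names, function name, and sample cells are hypothetical and are not taken from AIPerf's pareto_axes implementation.

```python
def pareto_front(cells: list[dict]) -> list[dict]:
    """Keep cells that no other cell beats on both throughput and latency."""
    front = []
    for a in cells:
        dominated = any(
            b["throughput"] >= a["throughput"]
            and b["latency"] <= a["latency"]
            and (b["throughput"] > a["throughput"] or b["latency"] < a["latency"])
            for b in cells
        )
        if not dominated:
            front.append(a)
    return front


# Hypothetical sweep results: one cell per concurrency level.
cells = [
    {"concurrency": 1, "throughput": 10.0, "latency": 50.0},
    {"concurrency": 8, "throughput": 60.0, "latency": 80.0},
    {"concurrency": 16, "throughput": 55.0, "latency": 120.0},  # worse than concurrency 8 on both axes
    {"concurrency": 32, "throughput": 90.0, "latency": 200.0},
]
starred = [cell["concurrency"] for cell in pareto_front(cells)]
print(starred)
```

Cells left out of the front are never the best choice under any trade-off between the two axes, which is why a table only needs to highlight the remaining ones.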