GenAI-Perf vs AIPerf CLI Feature Comparison Matrix
This matrix compares the CLI options supported by GenAI-Perf and AIPerf.
Legend:
N/A Not Applicable - Feature not applicable

AIPerf is the successor to GenAI-Perf, so most ⭐ marks fall in AIPerf’s column. They flag rows where AIPerf doesn’t merely match the GenAI-Perf surface but expands on it. ❌ vs ✅ rows are already self-explanatory and are left unannotated.
| Subcommand | Description | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| analyze-trace | Analyze mooncake trace for prefix statistics | ❌ | ✅ | |
| profile | Profile LLMs and GenAI models | ✅ | ✅ | AIPerf accepts a YAML config via profile -f config.yaml (CLI flags override) |
| plot | Generate visualizations from profiling data | ❌ | ✅ | Auto-detects multi-run comparison vs single-run analysis; renders Pareto overlays when multiple artifact dirs are passed; supports dashboard mode |
| analyze | Sweep through multiple scenarios | ✅ | ✅ | AIPerf folds sweeps into profile via magic-list CLI flags, --variant, or YAML sweep: blocks (grid, zip, Sobol, Latin Hypercube). See Parameter Sweeping |
| config | Run a YAML config end-to-end | ✅ (separate config subcommand) | 🟡 | AIPerf has no aiperf config <yaml> run shortcut — pass -f config.yaml to aiperf profile instead |
| create-template / config init | Scaffold a template config | ✅ (GenAI-Perf: create-template) | ✅ | AIPerf: aiperf config init -t <template>; supports --list, --search, --category for discovery |
| config expand | Preview a sweep without running it | ❌ | ✅ | Prints every variation the orchestrator would iterate; --full/--index/--format controls verbosity |
| config validate | Pre-flight validate a config file | ❌ | ✅ | Runs the same load pipeline as profile; non-zero exit on fatal errors, warnings to stderr |
| plugins | List/inspect registered plugins | ❌ | ✅ | aiperf plugins enumerates planners, recipes, exporters, dataset loaders, and more |
| synthesize | Materialize a synthetic dataset to disk | ❌ | ✅ | Useful for caching dataset generation between repeated sweep cells |
| process-export-files | Multi-node result aggregation | ✅ | N/A | AIPerf aggregates results in real-time |
--endpoint-type
| Endpoint Type | Description | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| chat | Standard chat completion API (OpenAI-compatible) | ✅ | ✅ | |
| completions | Text completion API for prompt completion | ✅ | ✅ | |
| embeddings | Text embedding generation for similarity/search | ✅ | ✅ | |
| rankings | Text ranking/re-ranking for search relevance | ✅ | ✅ ⭐ | GenAI-Perf has a single generic rankings endpoint (/v1/ranking, HF-TEI-compatible). AIPerf splits it into dedicated nim_rankings, hf_tei_rankings, and cohere_rankings endpoints. |
| hf_tei_rankings | HuggingFace TEI re-ranker API | 🟡 | ✅ | GenAI-Perf has only generic rankings; AIPerf has a dedicated endpoint at /rerank |
| nim_rankings | NVIDIA NIM re-ranker API | ❌ | ✅ | |
| cohere_rankings | Cohere re-ranker API | ❌ | ✅ | |
| chat_embeddings | Chat-style multimodal embeddings (vLLM VLM2Vec) | ❌ | ✅ | |
| embeddings (NIM) | NVIDIA NIM embeddings endpoint | ❌ | ✅ | AIPerf nim_embeddings; supports text and image inputs |
| responses | OpenAI Responses API endpoint | ❌ | ✅ | Multi-modal (text, image, audio) with streaming |
| dynamic_grpc | Dynamic gRPC service calls | ✅ | ❌ | |
| huggingface_generate | HuggingFace TGI generate API | ✅ | ✅ | /generate and /generate_stream supported |
| image_generation | OpenAI-compatible image generation (/v1/images/generations) | ❌ | ✅ | DALL-E-style text-to-image; supports raw export for image extraction |
| video_generation | OpenAI/SGLang text-to-video (/v1/videos) | ❌ | ✅ | Async polling; Sora / Wan2.1 / HunyuanVideo compatible; multipart-form requests |
| image_retrieval | Image search and retrieval endpoints | ✅ | ✅ | AIPerf serves NIM image retrieval / bounding-box detection at /v1/infer |
| nvclip | NVIDIA CLIP model endpoints | ✅ | ❌ | |
| multimodal | Multi-modal (text + image/audio) endpoints | ✅ | ✅ | AIPerf uses chat endpoint with multimodal content |
| generate | Generic text generation endpoints | ✅ | ❌ | |
| kserve | KServe model serving endpoints | ✅ | ❌ | |
| template | Template-based inference endpoints | 🟡 | ✅ | AIPerf supports multimodal and multi-turn templates |
| tensorrtllm_engine | TensorRT-LLM engine direct access | ✅ | ❌ | |
| vision | Computer vision model endpoints | ✅ | ✅ | AIPerf uses chat endpoint for VLMs |
| solido_rag | SOLIDO RAG endpoint | ❌ | ✅ | |
| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Model Names | -m | ✅ | ✅ | |
| Model Selection Strategy | --model-selection-strategy {round_robin,random} | ✅ | ✅ | |
| Backend Selection | --backend {tensorrtllm,vllm} | ✅ | ❌ | |
| Custom Endpoint | --endpoint | ✅ | ✅ | |
| Endpoint Type | --endpoint-type | ✅ | ✅ ⭐ | AIPerf supports 15+ endpoint types vs. GenAI-Perf’s 12; see detailed comparison |
| Server Metrics URL | --server-metrics-url | ❌ | ✅ | AIPerf uses --server-metrics (enabled by default, auto-collects Prometheus metrics from the inference endpoint at base_url + /metrics). See the note below on GenAI-Perf’s flag name. |
| Streaming | --streaming | ✅ | ✅ | |
| URL | -u URL --url | ✅ | ✅ | |
| Request Timeout | --request-timeout-seconds | ❌ | ✅ | |
| API Key | --api-key | ✅ | ✅ ⭐ | GenAI-Perf has no dedicated flag — users must pass -H 'Authorization: Bearer ...' manually |
| Request Content Type | --request-content-type {application/json,multipart/form-data} | ❌ | ✅ | Switch between JSON and multipart-form encoding (required by some video-gen servers) |
GenAI-Perf’s --server-metrics-url is misleadingly named. Despite the “server metrics” label, the flag points GenAI-Perf at a Triton / DCGM telemetry endpoint (GPU power, utilization, memory) — it is not a general Prometheus inference-server metrics scraper. AIPerf splits this into two clearly-scoped flags:
- --server-metrics: Prometheus inference-server metrics from the model endpoint (base_url + /metrics). Enabled by default; pass additional endpoint URLs to scrape extra targets.
- --gpu-telemetry: GPU telemetry collection. Supports both the DCGM exporter HTTP endpoint (default; localhost:9400 + localhost:9401) and the local pynvml library (pass pynvml). Custom DCGM exporter URLs and a dashboard realtime view are also accepted.

If you’re porting a GenAI-Perf invocation, --server-metrics-url http://node:9400 maps to AIPerf’s --gpu-telemetry http://node:9400, not to --server-metrics.
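
The flag mapping described above can be sketched as a small argument rewriter. This is only an illustration: `port_server_metrics_flag` is a hypothetical helper, not part of either tool, and it handles only the form where the flag and its URL are separate tokens.

```python
def port_server_metrics_flag(argv):
    """Rewrite GenAI-Perf's --server-metrics-url to AIPerf's --gpu-telemetry.

    Hypothetical porting helper: only the flag token is renamed, so the URL
    that follows it is carried over unchanged.
    """
    old_flag, new_flag = "--server-metrics-url", "--gpu-telemetry"
    return [new_flag if arg == old_flag else arg for arg in argv]


# Example GenAI-Perf style arguments (the model name is a placeholder).
genai_perf_args = ["profile", "-m", "my-model", "--server-metrics-url", "http://node:9400"]
print(port_server_metrics_flag(genai_perf_args))
```

The helper deliberately does not map anything to --server-metrics, since that flag scrapes Prometheus inference-server metrics rather than GPU telemetry.
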
| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Extra Inputs | --extra-inputs | ✅ | ✅ | |
| Custom Headers | --header -H | ✅ | ✅ | |
| YAML Config File | -f --config | ✅ (separate config subcommand) | ✅ | AIPerf passes YAML to profile -f; CLI flags override file values |
| Input File | --input-file | ✅ | ✅ | |
| Inline Records in YAML | dataset.records: (YAML only) | ❌ | ✅ | Embed dataset rows directly in the YAML config; >500 records emits a warning |
| Dataset Entries | --num-dataset-entries --num-prompts | ✅ | ✅ | GenAI-Perf and AIPerf both accept this flag. In AIPerf it is collapsed with --num-sessions / --conversation-num / --num-conversations into a single conversation count; GenAI-Perf keeps --num-dataset-entries and --num-sessions distinct (see Session Configuration). |
| Public Dataset | --public-dataset | ❌ | ✅ | sharegpt, aimo, mmstar, vision_arena, llava_onevision, speed_bench_* (50+ subsets), librispeech, voxpopuli, gigaspeech, ami, spgispeech, instruct_coder, blazedit_5k, blazedit_10k, … |
| HuggingFace Subset Override | --hf-subset | ❌ | ✅ | Override the HF subset/config for HF-backed public datasets |
| Custom Dataset Type | --custom-dataset-type {single_turn,multi_turn,random_pool,mooncake_trace,bailian_trace,burst_gpt_trace,sagemaker_data_capture} | ❌ | ✅ | GenAI-Perf infers dataset type from input file format |
| Dataset Sampling Strategy | --dataset-sampling-strategy {sequential,random,shuffle} | ❌ | ✅ | Controls how entries are drawn during benchmarking |
| Fixed Schedule | --fixed-schedule | ✅ | ✅ | |
| Fixed Schedule Auto Offset | --fixed-schedule-auto-offset | ❌ | ✅ | |
| Fixed Schedule Start/End Offset | --fixed-schedule-start-offset --fixed-schedule-end-offset | ❌ | ✅ | |
| Random Seed | --random-seed | ✅ | ✅ | |
| GRPC Method | --grpc-method | ✅ | ❌ | |
| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Artifact Directory | --artifact-dir | ✅ | ✅ | |
| Checkpoint Directory | --checkpoint-dir | ✅ | ❌ | |
| Generate Plots | --generate-plots | ✅ | ✅ ⭐ | AIPerf replaces inline plot generation with the dedicated aiperf plot subcommand: dashboard mode, Pareto overlays across runs, configurable plot envelope, auto-plot hook |
| Auto-Plot After Profile | --auto-plot --no-auto-plot | ❌ | ✅ | Auto-runs aiperf plot on the artifact dir after the benchmark completes; honored by recipe defaults |
| Plot Required (Strict) | --plot-required | ❌ | ✅ | Treat auto-plot failures as fatal (non-zero exit) |
| Export Level | --export-level --profile-export-level {summary,records,raw} | ❌ | ✅ | Controls whether per-record and raw request/response files are emitted alongside the summary |
| Time-Sliced Metrics | --slice-duration | ❌ | ✅ | Window the benchmark timeline into fixed slices and compute metrics per slice |
| Enable Checkpointing | --enable-checkpointing | ✅ | ❌ | |
| Profile Export File | --profile-export-file | ✅ | ✅ | In AIPerf, the value is used as a prefix for the profile export file names. |
| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Tokenizer | --tokenizer | ✅ | ✅ | |
| Tokenizer Revision | --tokenizer-revision | ✅ | ✅ | |
| Tokenizer Trust Remote Code | --tokenizer-trust-remote-code | ✅ | ✅ | |
| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Concurrency | --concurrency | ✅ | ✅ | |
| Request Rate | --request-rate | ✅ | ✅ | |
| Request Count | --request-count --num-requests | ✅ | ✅ | |
| Request Rate w/ Max Concurrency | --request-rate with --concurrency | ❌ | ✅ | Dual control of rate and concurrency ceiling |
| Measurement Interval | --measurement-interval -p | ✅ | N/A | Not applicable to AIPerf |
| Stability Percentage | --stability-percentage -s | ✅ | N/A | Not applicable to AIPerf |
| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Arrival Pattern | --arrival-pattern {constant,poisson,gamma} | ❌ | ✅ | Controls inter-arrival time distribution |
| Arrival Smoothness | --arrival-smoothness --vllm-burstiness | ❌ | ✅ | Gamma distribution shape: <1=bursty, 1=Poisson, >1=smooth |
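
To make the shape parameter concrete, here is a minimal sketch of gamma-distributed inter-arrival gaps using only the Python standard library. It is not AIPerf's implementation; the function names and the choice to hold the mean gap at 1/rate are assumptions made for illustration.

```python
import random
import statistics


def gamma_interarrival_times(rate, smoothness, n, seed=0):
    """Draw n inter-arrival gaps (seconds) whose mean is 1/rate.

    The gamma shape parameter plays the role of "smoothness":
    shape < 1 is bursty, shape == 1 is exponential (Poisson arrivals),
    shape > 1 is smoother than Poisson.
    """
    rng = random.Random(seed)
    scale = 1.0 / (rate * smoothness)  # shape * scale == 1 / rate
    return [rng.gammavariate(smoothness, scale) for _ in range(n)]


def coefficient_of_variation(samples):
    """Relative spread of the gaps; higher means burstier traffic."""
    return statistics.pstdev(samples) / statistics.fmean(samples)


for shape in (0.5, 1.0, 4.0):
    gaps = gamma_interarrival_times(rate=10.0, smoothness=shape, n=20_000)
    print(f"shape={shape}: mean gap={statistics.fmean(gaps):.3f}s, "
          f"CV={coefficient_of_variation(gaps):.2f}")
```

For a gamma distribution the coefficient of variation is 1/sqrt(shape), so the average request rate stays the same across the three settings while the spread of the gaps shrinks as the shape grows.
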
| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Benchmark Duration | --benchmark-duration | ❌ | ✅ | Stop after N seconds |
| Benchmark Grace Period | --benchmark-grace-period | ❌ | ✅ | Wait for in-flight requests after duration (default: 30s, supports inf) |
| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Session Concurrency | --concurrency | ✅ | ✅ | Max concurrent sessions |
| Prefill Concurrency | --prefill-concurrency | ❌ | ✅ | Limit concurrent prefill operations (requires --streaming) |
| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Concurrency Ramp | --concurrency-ramp-duration | ❌ | ✅ | Ramp concurrency from 1 to target over N seconds |
| Prefill Concurrency Ramp | --prefill-concurrency-ramp-duration | ❌ | ✅ | Ramp prefill concurrency over N seconds |
| Request Rate Ramp | --request-rate-ramp-duration | ❌ | ✅ | Ramp request rate over N seconds |
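
As a rough picture of what ramping from 1 to the target over N seconds means, the sketch below computes a concurrency limit at a given elapsed time. The linear shape and the function name `ramped_concurrency` are assumptions; the table only states the start value, the end value, and the duration.

```python
def ramped_concurrency(elapsed_s, target, ramp_duration_s):
    """Concurrency limit at elapsed_s, assuming a linear ramp from 1 to target."""
    if ramp_duration_s <= 0 or elapsed_s >= ramp_duration_s:
        return target  # ramp finished (or disabled): run at the full target
    fraction = max(elapsed_s, 0.0) / ramp_duration_s
    return 1 + int((target - 1) * fraction)


# Sample the ramp every 10 seconds for a 30-second ramp to 64.
for t in (0, 10, 20, 30):
    print(t, ramped_concurrency(t, target=64, ramp_duration_s=30))
```
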
| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| User-Centric Rate | --user-centric-rate | ❌ | ✅ | Per-user rate limiting with consistent turn gaps |
| Number of Users | --num-users | ❌ | ✅ | Number of simulated users (required with --user-centric-rate) |
| Shared System Prompt | --shared-system-prompt-length | ❌ | ✅ | System prompt shared across all users (KV cache prefix) |
| User Context Prompt | --user-context-prompt-length | ❌ | ✅ | Per-user unique context padding |
| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Warmup Request Count | --warmup-request-count | ✅ | ✅ | |
| Warmup Duration | --warmup-duration | ❌ | ✅ | Duration-based warmup stop condition |
| Warmup Session Count | --num-warmup-sessions | ❌ | ✅ | Session-based warmup stop condition |
| Warmup Concurrency | --warmup-concurrency | ❌ | ✅ | Override concurrency during warmup |
| Warmup Prefill Concurrency | --warmup-prefill-concurrency | ❌ | ✅ | Override prefill concurrency during warmup |
| Warmup Request Rate | --warmup-request-rate | ❌ | ✅ | Override request rate during warmup |
| Warmup Arrival Pattern | --warmup-arrival-pattern | ❌ | ✅ | Override arrival pattern during warmup |
| Warmup Grace Period | --warmup-grace-period | ❌ | ✅ | Grace period for warmup responses |
| Warmup Concurrency Ramp | --warmup-concurrency-ramp-duration | ❌ | ✅ | Ramp warmup concurrency |
| Warmup Prefill Ramp | --warmup-prefill-concurrency-ramp-duration | ❌ | ✅ | Ramp warmup prefill concurrency |
| Warmup Rate Ramp | --warmup-request-rate-ramp-duration | ❌ | ✅ | Ramp warmup request rate |
| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Number of Sessions | --num-sessions | ✅ | ✅ | |
| Session Concurrency | --session-concurrency | ✅ | ✅ | Use --concurrency for AIPerf |
| Session Delay Ratio | --session-delay-ratio | ✅ | ✅ | |
| Session Turn Delay Mean | --session-turn-delay-mean | ✅ | ✅ | |
| Session Turn Delay Stddev | --session-turn-delay-stddev | ✅ | ✅ | |
| Session Turns Mean | --session-turns-mean | ✅ | ✅ | |
| Session Turns Stddev | --session-turns-stddev | ✅ | ✅ | |
| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Input Tokens Mean | --synthetic-input-tokens-mean --isl | ✅ | ✅ | |
| Input Tokens Stddev | --synthetic-input-tokens-stddev | ✅ | ✅ | |
| Input Tokens Block Size | --prompt-input-tokens-block-size --isl-block-size | ❌ | ✅ | Used for mooncake_trace hash_id blocks |
| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Output Tokens Mean | --output-tokens-mean --osl | ✅ | ✅ | |
| Output Tokens Stddev | --output-tokens-stddev | ✅ | ✅ | |
| Output Tokens Mean Deterministic | --output-tokens-mean-deterministic | ✅ | ❌ | Only applicable to Triton |
| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Text Batch Size | --batch-size-text --batch-size -b | ✅ | ✅ | |
| Audio Batch Size | --batch-size-audio | ✅ | ✅ | |
| Image Batch Size | --batch-size-image | ✅ | ✅ | |
| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Number of Prefix Prompts | --num-prefix-prompts | ✅ | ✅ | |
| Prefix Prompt Length | --prefix-prompt-length | ✅ | ✅ | |
| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Audio Length Mean | --audio-length-mean | ✅ | ✅ | |
| Audio Length Stddev | --audio-length-stddev | ✅ | ✅ | |
| Audio Format | --audio-format {wav,mp3,random} | 🟡 | ✅ | GenAI-Perf supports {wav, mp3} only; AIPerf adds random |
| Audio Depths | --audio-depths | ✅ | ✅ | |
| Audio Sample Rates | --audio-sample-rates | ✅ | ✅ | |
| Audio Number of Channels | --audio-num-channels | ✅ | ✅ | GenAI-Perf accepts {1, 2}; AIPerf accepts {0, 1, 2} (0 disables) |
| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Image Width Mean | --image-width-mean | ✅ | ✅ | |
| Image Width Stddev | --image-width-stddev | ✅ | ✅ | |
| Image Height Mean | --image-height-mean | ✅ | ✅ | |
| Image Height Stddev | --image-height-stddev | ✅ | ✅ | |
| Image Format | --image-format {png,jpeg,random} | 🟡 | ✅ | GenAI-Perf supports {png, jpeg} only; AIPerf adds random |
| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Video Batch Size | --video-batch-size --batch-size-video | ❌ | ✅ | Set to 0 to disable video inputs |
| Video Duration | --video-duration | ❌ | ✅ | Seconds per clip; requires FFmpeg |
| Video FPS | --video-fps | ❌ | ✅ | Frames per second |
| Video Width/Height | --video-width --video-height | ❌ | ✅ | Resolution in pixels (both or neither) |
| Video Synth Type | --video-synth-type {moving_shapes,grid_clock,noise} | ❌ | ✅ | Synthetic content generator |
| Video Format | --video-format {webm,mp4} | ❌ | ✅ | Container format |
| Video Codec | --video-codec | ❌ | ✅ | Any FFmpeg-supported codec (libvpx-vp9, libx264, h264_nvenc, …) |
| Embedded Audio Track | --video-audio-num-channels --video-audio-sample-rate --video-audio-codec --video-audio-depth | ❌ | ✅ | Optional audio mux for video clips |
| Download Video Content | --download-video-content | ❌ | ✅ | Include video download time in request latency |
AIPerf can repeat the same benchmark N times and report mean / std / confidence-interval / coefficient-of-variation across runs, optionally stopping early once a target metric stabilizes.
| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Number of Profile Runs | --num-profile-runs | ❌ | ✅ | 1-10 runs; >1 enables aggregate statistics |
| Profile Run Cooldown | --profile-run-cooldown-seconds | ❌ | ✅ | Stabilization gap between runs |
| Confidence Level | --confidence-level | ❌ | ✅ | 0.90 / 0.95 (default) / 0.99 CI width |
| Disable Warmup After First | --profile-run-disable-warmup-after-first | ❌ | ✅ | First run warms, rest measure steady state |
| Consistent Seed Across Runs | --set-consistent-seed | ❌ | ✅ | Auto-pin --random-seed=42 for valid statistics |
| Vary Seed Per Trial | --vary-seed-per-trial | ❌ | ✅ | Capture input-noise + runtime-noise variance |
| Adaptive Convergence Stopping | --convergence-metric --convergence-stat --convergence-threshold --convergence-mode | ❌ | ✅ | Stop early when CI width, CV, or KS-distribution stabilizes |
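
The aggregate statistics named above (mean, standard deviation, confidence interval, coefficient of variation) can be sketched as follows. The Student-t interval is an assumption, since this page does not say which interval AIPerf computes, and `aggregate_runs` and `T_CRITICAL` are hypothetical names.

```python
import math
import statistics

# Two-sided Student-t critical values for 1..9 degrees of freedom,
# which covers 2 to 10 runs at the three listed confidence levels.
T_CRITICAL = {
    0.90: [6.314, 2.920, 2.353, 2.132, 2.015, 1.943, 1.895, 1.860, 1.833],
    0.95: [12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262],
    0.99: [63.657, 9.925, 5.841, 4.604, 4.032, 3.707, 3.499, 3.355, 3.250],
}


def aggregate_runs(values, confidence_level=0.95):
    """Summarize one metric measured once per profile run."""
    n = len(values)
    if not 2 <= n <= 10:
        raise ValueError("this sketch covers 2 to 10 runs")
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1)
    half_width = T_CRITICAL[confidence_level][n - 2] * std / math.sqrt(n)
    return {
        "mean": mean,
        "std": std,
        "ci": (mean - half_width, mean + half_width),
        "cv": std / mean,
    }


# Hypothetical throughput values from five repeated runs.
print(aggregate_runs([100.0, 102.0, 98.0, 101.0, 99.0]))
```

A convergence check of the kind described in the table could then compare the interval width or the CV against a threshold after each additional run.
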
AIPerf folds GenAI-Perf’s analyze subcommand into profile via three composable mechanisms: magic-list CLI flags, --variant scenarios, and YAML sweep: blocks. Multi-cell sweeps stream a per-cell results table; QMC sweeps additionally write a sampling_design.json for reproducibility.
| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Magic-List CLI Flags | --concurrency 1,10,100 --request-rate 50,100,200 --isl 128,512,2048 --osl ... --isl-stddev ... --osl-stddev ... --conversation-turn-mean ... | 🟡 | ✅ | Any CLI flag in the allowlist accepts a comma list and triggers a sweep |
| Grid Sweep | --sweep-type grid (default) | ✅ (via analyze) | ✅ | Cartesian product of all magic-list flags |
| Zip Sweep | --sweep-type zip | ❌ | ✅ | Lockstep element-wise pairing; YAML form: sweep: {type: zip} |
| Scenario Sweep | --variant --sweep-variant | ❌ | ✅ | Repeatable [name:] key=value, ... per occurrence; emits a ScenarioSweep. GenAI-Perf’s analyze sweeps one stimulus at a time ({batch_size, concurrency, num_dataset_entries, input_sequence_length, request_rate}), so multi-parameter scenarios are not expressible. |
| Quasi-Monte-Carlo (Sobol) | YAML sweep: {type: sobol} | ❌ | ✅ | Low-discrepancy quasi-random sampling over continuous + integer dimensions |
| Latin Hypercube Sampling | YAML sweep: {type: latin_hypercube} | ❌ | ✅ | Stratified sampling alternative to Sobol |
| Sweep Variation Cooldown | --parameter-sweep-cooldown-seconds | ❌ | ✅ | Inter-variation pause |
| Same Seed Across Variations | --parameter-sweep-same-seed | ❌ | ✅ | Correlated comparisons vs. independent draws |
| Sweep Order | --parameter-sweep-mode {repeated,independent} | ❌ | ✅ | Outer loop = trials or variations |
| Live Sweep Table | (auto) / --no-sweep-table | ❌ | ✅ | Per-cell streaming results table; auto-suppressed for non-TTY, dashboard UI, or single-cell sweeps |
| Sampling Design Artifact | sweep_aggregate/sampling_design.json | ❌ | ✅ | Emitted only for Sobol / Latin Hypercube (QMC) sweeps |
| YAML Config Driven | dataset: phases: sweep: multi_run: blocks | ✅ (separate config cmd) | ✅ | Single AIPerf YAML drives sweep + multi-run + plot envelope; aiperf config expand previews variations without running |
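
The difference between grid and zip sweeps is easiest to see by expanding two comma lists both ways. This is an illustrative sketch rather than AIPerf's code; `expand_sweep` is a hypothetical function, and rejecting unequal-length lists in zip mode is an assumption.

```python
from itertools import product


def parse_magic_list(value):
    """Split a comma list such as '1,10,100' into integers."""
    return [int(item) for item in value.split(",")]


def expand_sweep(flags, sweep_type="grid"):
    """Expand comma-list flags into one dict of settings per sweep cell."""
    names = list(flags)
    columns = [parse_magic_list(flags[name]) for name in names]
    if sweep_type == "grid":
        rows = product(*columns)  # Cartesian product of all lists
    elif sweep_type == "zip":
        if len({len(column) for column in columns}) != 1:
            raise ValueError("zip sweeps need lists of equal length")
        rows = zip(*columns)  # lockstep, element-wise pairing
    else:
        raise ValueError(f"unsupported sweep type: {sweep_type}")
    return [dict(zip(names, row)) for row in rows]


flags = {"concurrency": "1,10,100", "isl": "128,512,2048"}
print(len(expand_sweep(flags, "grid")), "grid cells")
print(expand_sweep(flags, "zip"))
```

With three values per flag, grid mode yields nine cells while zip mode yields three, pairing the first values together, then the second, then the third.
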
AIPerf ships a native Bayesian-optimization search planner (Optuna + BoTorch preset, Hvarfner-DSP Matern-5/2 kernel), with native multi-objective Pareto support (qLogNEHVI), outcome constraints, posterior-regret stopping, and a curated set of preset “search recipes”. GenAI-Perf does not offer adaptive search.
| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Search Space | --search-space 'path:lo,hi[:kind]' | ❌ | ✅ | Repeatable; CLI grammar supports int and real dimensions. YAML sweep.search_space[] also supports prior: log-uniform. |
| Search Metric | --search-metric | ❌ | ✅ | Tag from RunResult.summary_metrics |
| Search Stat | --search-stat {avg,p50,p90,p95,p99} | ❌ | ✅ | |
| Search Direction | --search-direction | ❌ | ✅ | Maximize / minimize |
| Search Iterations | --search-max-iterations --search-initial-points --search-random-seed | ❌ | ✅ | Sobol seed phase + GP fit + stopping |
| Search Planner Plugin | --search-planner {bayesian,monotonic_sla,smooth_isotonic,optuna} | ❌ | ✅ | bayesian is curated Optuna+BoTorch preset; third-party planners registerable |
| Optuna Sampler | --optuna-sampler {tpe,gp,botorch} | ❌ | ✅ | --search-planner=optuna expert mode |
| Optuna Acquisition | --optuna-acquisition {logei,qlogei,qnei,qlognei,qehvi,qnehvi,qlognehvi} | ❌ | ✅ | Modern noisy-EI defaults; multi-objective variants gated on len(objectives) > 1 |
| Posterior-Regret Stopping | --optuna-terminator {regret,emmr,none} | ❌ | ✅ | RegretBoundEvaluator (Makarova 2022) / EMMR (Ishibashi 2023) |
| Percentile Pooling | --search-percentile-pooling {mean,pooled} | ❌ | ✅ | Pool raw samples across trials for tail-correct percentile objectives |
| SLA Filter | --search-sla 'metric:stat:op:threshold' | ❌ | ✅ | Repeatable; outcome-constraint or hard filter |
| Multi-Objective Pareto | objectives: [...] (YAML) | ❌ | ✅ | qNEHVI / qLogNEHVI; emits Pareto front in search_history.json |
| Pareto Overlay Rendering | aiperf plot <dir1> <dir2> ... | ❌ | ✅ | Multi-directory invocation triggers the Pareto overlay handler |
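
For readers unfamiliar with the term, a Pareto front is the set of results that no other result beats on every objective at once. The sketch below filters such a front from a list of results; the objective names (throughput to maximize, latency to minimize) and the function name are illustrative assumptions, not the schema of search_history.json.

```python
def pareto_front(points):
    """Return the points that no other point dominates.

    A point q dominates p when q is at least as good on both objectives
    (higher throughput, lower latency) and strictly better on at least one.
    """
    front = []
    for p in points:
        dominated = any(
            q["throughput"] >= p["throughput"]
            and q["latency"] <= p["latency"]
            and (q["throughput"] > p["throughput"] or q["latency"] < p["latency"])
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


# Hypothetical results from four benchmark configurations.
results = [
    {"throughput": 100, "latency": 50},
    {"throughput": 200, "latency": 80},
    {"throughput": 150, "latency": 90},   # beaten by the 200/80 result
    {"throughput": 300, "latency": 200},
]
print(pareto_front(results))
```
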
| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Named Recipe | --search-recipe | ❌ | ✅ | Expands to a search-space + SLA filter + post-process pipeline |
| Pareto Sweep | --search-recipe pareto-sweep --isl-osl-pairs '128/128,512/256,...' | ❌ | ✅ | Multi-shape throughput/latency Pareto with paired ISL/OSL workloads |
| Max Throughput under TTFT SLA | --search-recipe max-throughput-ttft-sla --ttft-sla-ms | ❌ | ✅ | Log-uniform concurrency prior with TTFT constraint |
| Max Throughput under ITL SLA | --search-recipe max-throughput-itl-sla --itl-sla-ms | ❌ | ✅ | Streaming required |
| Max Concurrency under SLA | --search-recipe max-concurrency-under-sla --tpot-sla-ms --e2e-sla-ms --error-rate-sla --search-style {smooth_isotonic,monotonic,bo,optuna,grid} | ❌ | ✅ | Selectable 1D SLA-saturation strategy |
| Max Goodput under SLO | --search-recipe max-goodput-under-slo --slo-attainment-fraction | ❌ | ✅ | DistServe-style per-request SLO attainment (default 0.95) |
| Concurrency Ramp / Degradation Knee | --search-recipe concurrency-ramp --degradation-threshold --degradation-metric-tag --degradation-stat --concurrency-min --concurrency-max --concurrency-steps | ❌ | ✅ | Reports first concurrency where stat exceeds baseline × (1 + threshold) |
| Prefill TTFT Curve | --search-recipe prefill-ttft-curve --isl-min --isl-max --isl-steps | ❌ | ✅ | Log-spaced ISL ramp for prefill characterization |
| Decode ITL Curve | --search-recipe decode-itl-curve --osl-min --osl-max --osl-steps | ❌ | ✅ | Log-spaced OSL ramp for decode characterization |
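
The degradation rule quoted for the concurrency-ramp recipe (the first concurrency where the stat exceeds baseline × (1 + threshold)) can be written out directly. In this sketch the baseline is taken to be the measurement at the lowest concurrency, which is an assumption, and `degradation_knee` is a hypothetical name.

```python
def degradation_knee(measurements, threshold):
    """Return the first concurrency whose stat exceeds baseline * (1 + threshold).

    measurements is a list of (concurrency, stat) pairs. The lowest-concurrency
    measurement is used as the baseline (an assumption of this sketch).
    Returns None when no measurement crosses the limit.
    """
    ordered = sorted(measurements)
    baseline = ordered[0][1]
    limit = baseline * (1 + threshold)
    for concurrency, stat in ordered[1:]:
        if stat > limit:
            return concurrency
    return None


# Hypothetical latency readings (ms) at increasing concurrency.
latency_by_concurrency = [(1, 100.0), (2, 105.0), (4, 118.0), (8, 160.0), (16, 400.0)]
print(degradation_knee(latency_by_concurrency, threshold=0.2))
```
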
| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Record Processor Service Count | --record-processor-service-count --record-processors | ❌ | ✅ | |
| Maximum Workers | --workers-max --max-workers | ❌ | ✅ | |
| ZMQ Host | --zmq-host | ❌ | ✅ | |
| ZMQ IPC Path | --zmq-ipc-path | ❌ | ✅ | |
| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Request Cancellation Rate | --request-cancellation-rate | ❌ | ✅ | Percentage of requests to cancel (0-100) |
| Request Cancellation Delay | --request-cancellation-delay | ❌ | ✅ | Seconds to wait before cancelling |
| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Goodput Constraints | --goodput -g | ✅ | ✅ | |
| Verbose | -v --verbose | ✅ | ✅ | |
| Extra Verbose | -vv | ✅ | ✅ | |
| Log Level | --log-level | ❌ | ✅ | {TRACE,DEBUG,INFO,NOTICE,WARNING,SUCCESS,ERROR,CRITICAL} (Loguru; case-insensitive in practice) |
| UI Type | --ui-type --ui {dashboard,simple,none} | ❌ | ✅ | |
| Help | -h --help | ✅ | ✅ | |
GenAI-Perf supports passing through arguments to the Perf-Analyzer CLI. AIPerf does not support this, as it does not use Perf-Analyzer under the hood.
| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Perf-Analyzer Passthrough Arguments | -- | ✅ | N/A | Only applicable to GenAI-Perf |
| Feature | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|
| Console output | ✅ | ✅ | |
| JSON output | ✅ | ✅ | See discrepancies below |
| CSV output | ✅ | ✅ | |
| API Error Summary | ❌ | ✅ | |
| profile_export.json | ✅ | ✅ | Use --export-level raw in AIPerf to get raw input/output payloads |
| Per-Record Metrics | ❌ | ✅ | |
| inputs.json | ✅ | ✅ | AIPerf format is slightly different |

The input_config section may differ between GenAI-Perf and AIPerf.

| Feature | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|
| Multi-modal support | ✅ | ✅ | |
| GPU Telemetry | ✅ | ✅ ⭐ | AIPerf supports dual backends: DCGM exporter HTTP endpoints (default; localhost:9400 + localhost:9401, custom URLs accepted) and local pynvml. GenAI-Perf is DCGM-only via --server-metrics-url. |
| Streaming API support | ✅ | ✅ | |
| Multi-turn conversations | ✅ | ✅ | Full multi-turn benchmarking with session tracking |
| Payload scheduling | ✅ | ✅ | Fixed schedule workloads |
| Distributed testing | ✅ | ✅ ⭐ | GenAI-Perf has post-hoc multi-node result aggregation via process-export-files. AIPerf runs a single federated benchmark across nodes via ZMQ-TCP service-to-service communication. |
| Custom endpoints | ✅ | ✅ ⭐ | AIPerf ships 15+ endpoint types incl. responses, chat_embeddings, nim_embeddings, nim_rankings, cohere_rankings, image_generation, video_generation, image_retrieval, solido_rag. See Endpoint Types Support Matrix. |
| Synthetic data generation | ✅ | ✅ | |
| Bring Your Own Data (BYOD) | ✅ | ✅ | Custom dataset support |
| Audio input support | ✅ | ✅ | Both tools synthesize audio inputs (WAV/MP3). Neither computes audio-specific metrics (e.g. WER, audio-token-rate) |
| Vision metrics | ✅ | ✅ | Image-specific performance metrics |
| Image generation benchmarking | ❌ | ✅ | Text-to-image with raw export for image extraction |
| Video input benchmarking | ❌ | ✅ | Synthetic video generation (FFmpeg) for VLM endpoints with configurable codec, resolution, FPS, audio |
| Live Metrics | ❌ | ✅ | Live metrics display |
| Dashboard UI | ❌ | ✅ | Dashboard UI |
| Reasoning token parsing | ❌ | ✅ | Parsing of reasoning tokens |
| Arrival pattern control | ❌ | ✅ | Constant, Poisson, Gamma distributions with tunable burstiness |
| Prefill concurrency limiting | ❌ | ✅ | Fine-grained prefill queueing control for TTFT behavior |
| Gradual ramping | ❌ | ✅ | Smooth ramp-up for concurrency and rate |
| Duration-based benchmarking | ❌ | ✅ | Time-based stop conditions with grace periods |
| User-centric timing | ❌ | ✅ | Per-user rate limiting for KV cache benchmarking |
| Configurable warmup phase | ✅ | ✅ ⭐ | GenAI-Perf has only --warmup-request-count. AIPerf adds duration-based, session-based, and full per-phase overrides (rate, concurrency, prefill, arrival pattern, grace period, ramping). |
| HTTP trace metrics | ❌ | ✅ | Detailed HTTP lifecycle timing (DNS, TCP, TLS, TTFB) |
| Request cancellation | ❌ | ✅ | Test timeout behavior and service resilience |
| Timeslice metrics | ❌ | ✅ | Per-timeslice metric breakdown |
| Interactive plot dashboard | ❌ | ✅ | Web-based exploration with dynamic metric selection and filtering |
| Multi-run comparison plots | ❌ | ✅ | Auto-detected Pareto curves and throughput analysis |
| YAML-first configuration | ✅ | ✅ ⭐ | GenAI-Perf has a separate config subcommand to run a YAML file. AIPerf threads a single YAML through dataset, phases, sweep, multi-run, and plot envelope with a config init / expand / validate lifecycle. |
| Inline datasets in YAML | ❌ | ✅ | Embed dataset records directly in the config file (no sidecar .jsonl) |
| Plot config envelope | ❌ | ✅ | YAML carries the visualization spec; auto-plot materializes .aiperf-plot-config.yaml for reproducibility |
| Parameter sweeps | ✅ (analyze subcommand) | ✅ ⭐ | GenAI-Perf’s analyze sweeps one stimulus at a time. AIPerf folds sweeps into profile with magic-list CLI flags, multi-key scenario --variant, and YAML sweep: blocks (grid / zip / Sobol / Latin Hypercube). |
| Confidence reporting | ❌ | ✅ | Multi-run aggregation with CI / CV / KS-distribution convergence stopping |
| Bayesian optimization | ❌ | ✅ | Native Optuna+BoTorch search planner with Hvarfner-DSP kernel |
| Multi-objective Pareto search | ❌ | ✅ | qLogNEHVI acquisition, hypervolume stopping, Pareto front in search_history.json |
| Search recipes (preset experiments) | ❌ | ✅ | pareto-sweep, max-throughput-under-SLA, max-concurrency-under-SLA, max-goodput-under-SLO, concurrency-ramp, prefill-ttft-curve, decode-itl-curve |
| Live sweep table with Pareto marking | ❌ | ✅ | Per-cell streaming sweep table inline-marks Pareto-dominant cells (★) for recipes that declare pareto_axes |