# GenAI-Perf vs AIPerf CLI Feature Comparison Matrix


This comparison matrix shows which CLI options are supported by GenAI-Perf and by AIPerf.

This is a living document and will be updated as new features are added to AIPerf.

Legend:

- ✅ Fully Supported - Feature available with same/similar functionality
- 🟡 Partial Support - Feature available but with different parameters or limitations
- N/A Not Applicable - Feature not applicable
- ❌ Not Supported - Feature not currently supported

## Core Subcommands

| Subcommand | Description | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| analyze-trace | Analyze mooncake trace for prefix statistics | | | |
| profile | Profile LLMs and GenAI models | | | |
| plot | Generate visualizations from profiling data | | | Auto-detects multi-run comparison vs single-run analysis; supports dashboard mode |
| analyze | Sweep through multiple scenarios | | | |
| config | Run using YAML configuration files | | | |
| create-template | Generate template configs | | | |
| process-export-files | Multi-node result aggregation | | N/A | AIPerf will aggregate results in real-time |

## Endpoint Types Support Matrix

`--endpoint-type`

| Endpoint Type | Description | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| chat | Standard chat completion API (OpenAI-compatible) | | | |
| completions | Text completion API for prompt completion | | | |
| embeddings | Text embedding generation for similarity/search | | | |
| rankings | Text ranking/re-ranking for search relevance | | | GenAI-Perf’s generic rankings is HF TEI compatible; AIPerf has separate nim_rankings, hf_tei_rankings and cohere_rankings |
| hf_tei_rankings | HuggingFace TEI re-ranker API | | | GenAI-Perf uses generic rankings endpoint |
| nim_rankings | NVIDIA NIM re-ranker API | | | |
| cohere_rankings | Cohere re-ranker API | | | |
| responses | OpenAI responses endpoint | | | |
| dynamic_grpc | Dynamic gRPC service calls | | | |
| huggingface_generate | HuggingFace transformers generate API | | | /generate and /generate_stream supported |
| image_generation | OpenAI-compatible image generation (/v1/images/generations) | | | Text-to-image benchmarking with SGLang, supports raw export for image extraction |
| image_retrieval | Image search and retrieval endpoints | | | |
| nvclip | NVIDIA CLIP model endpoints | | | |
| multimodal | Multi-modal (text + image/audio) endpoints | | | AIPerf uses chat endpoint with multimodal content |
| generate | Generic text generation endpoints | | | |
| kserve | KServe model serving endpoints | | | |
| template | Template-based inference endpoints | 🟡 | | AIPerf supports multimodal and multi-turn templates |
| tensorrtllm_engine | TensorRT-LLM engine direct access | | | |
| vision | Computer vision model endpoints | | | AIPerf uses chat endpoint for VLMs |
| solido_rag | SOLIDO RAG endpoint | | | |
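To make the comparison concrete, a chat benchmark takes roughly the same shape in both tools. The model name and URL below are placeholders, and the flags used are the ones listed in this document:

```shell
# GenAI-Perf: placeholder model and server URL
genai-perf profile -m my-model --endpoint-type chat --streaming -u localhost:8000

# AIPerf equivalent
aiperf profile -m my-model --endpoint-type chat --streaming --url localhost:8000
```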

## Endpoint Configuration

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Model Names | -m | | | |
| Model Selection Strategy | --model-selection-strategy {round_robin,random} | | | |
| Backend Selection | --backend {tensorrtllm,vllm} | | | |
| Custom Endpoint | --endpoint | | | |
| Endpoint Type | --endpoint-type | | | See detailed comparison above |
| Server Metrics URL | --server-metrics-url | | | AIPerf uses --server-metrics (enabled by default, auto-collects Prometheus metrics from endpoint). GenAI-Perf’s --server-metrics-url is for GPU telemetry only. |
| Streaming | --streaming | | | |
| URL | -u URL<br>--url | | | |
| Request Timeout | --request-timeout-seconds | | | |
| API Key | --api-key | 🟡 | | For GenAI-Perf, use -H instead |
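For example, authenticating against a protected endpoint differs slightly between the two tools. The token value and model name below are placeholders:

```shell
# GenAI-Perf: pass the key as a raw header with -H
genai-perf profile -m my-model --endpoint-type chat \
  -H "Authorization: Bearer $API_KEY"

# AIPerf: a dedicated --api-key flag
aiperf profile -m my-model --endpoint-type chat --api-key "$API_KEY"
```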

## Input Configuration

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Extra Inputs | --extra-inputs | | | |
| Custom Headers | --header -H | | | |
| Input File | --input-file | | | |
| Dataset Entries/Conversations | --num-dataset-entries | | | |
| Public Dataset | --public-dataset {sharegpt} | | | |
| Custom Dataset Type | --custom-dataset-type {single_turn,multi_turn,random_pool,mooncake_trace} | | | GenAI-Perf infers dataset type from input file format |
| Fixed Schedule | --fixed-schedule | | | |
| Fixed Schedule Auto Offset | --fixed-schedule-auto-offset | | | |
| Fixed Schedule Start/End Offset | --fixed-schedule-start-offset<br>--fixed-schedule-end-offset | | | |
| Random Seed | --random-seed | | | |
| GRPC Method | --grpc-method | | | |

## Output Configuration

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Artifact Directory | --artifact-dir | | | |
| Checkpoint Directory | --checkpoint-dir | | | |
| Generate Plots | --generate-plots | | 🟡 | AIPerf uses a separate aiperf plot subcommand with more features |
| Enable Checkpointing | --enable-checkpointing | | | |
| Profile Export File | --profile-export-file | | | In AIPerf, this value is used as a prefix for the profile export file names |

## Tokenizer Configuration

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Tokenizer | --tokenizer | | | |
| Tokenizer Revision | --tokenizer-revision | | | |
| Tokenizer Trust Remote Code | --tokenizer-trust-remote-code | | | |

## Load Generator Configuration

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Concurrency | --concurrency | | | |
| Request Rate | --request-rate | | | |
| Request Count | --request-count<br>--num-requests | | | |
| Request Rate w/ Max Concurrency | --request-rate with --concurrency | | | Dual control of rate and concurrency ceiling |
| Measurement Interval | --measurement-interval -p | | N/A | Not applicable to AIPerf |
| Stability Percentage | --stability-percentage -s | | N/A | Not applicable to AIPerf |

## Arrival Pattern Configuration

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Arrival Pattern | --arrival-pattern {constant,poisson,gamma} | | | Controls inter-arrival time distribution |
| Arrival Smoothness | --arrival-smoothness<br>--vllm-burstiness | | | Gamma distribution shape: < 1 = bursty, 1 = Poisson, > 1 = smooth |
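For example, a burstier-than-Poisson workload at a fixed average rate could be requested like this (model name and values are illustrative; the flags are the ones listed above):

```shell
# Gamma arrivals with shape 0.5: same average rate as Poisson, but bursty.
aiperf profile -m my-model --endpoint-type chat --streaming \
  --request-rate 10 --arrival-pattern gamma --arrival-smoothness 0.5
```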

## Duration-Based Benchmarking

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Benchmark Duration | --benchmark-duration | | | Stop after N seconds |
| Benchmark Grace Period | --benchmark-grace-period | | | Wait for in-flight requests after duration (default: 30s, supports inf) |

## Concurrency Control

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Session Concurrency | --concurrency | | | Max concurrent sessions |
| Prefill Concurrency | --prefill-concurrency | | | Limit concurrent prefill operations (requires --streaming) |

## Gradual Ramping

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Concurrency Ramp | --concurrency-ramp-duration | | | Ramp concurrency from 1 to target over N seconds |
| Prefill Concurrency Ramp | --prefill-concurrency-ramp-duration | | | Ramp prefill concurrency over N seconds |
| Request Rate Ramp | --request-rate-ramp-duration | | | Ramp request rate over N seconds |

## User-Centric Timing (KV Cache Benchmarking)

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| User-Centric Rate | --user-centric-rate | | | Per-user rate limiting with consistent turn gaps |
| Number of Users | --num-users | | | Number of simulated users (required with --user-centric-rate) |
| Shared System Prompt | --shared-system-prompt-length | | | System prompt shared across all users (KV cache prefix) |
| User Context Prompt | --user-context-prompt-length | | | Per-user unique context padding |
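Putting the four options together, a KV-cache-oriented run might look like the following. The model name and all numeric values are illustrative placeholders:

```shell
# 32 simulated users at 0.5 requests/s each, sharing a 2000-token system
# prompt (common KV cache prefix) plus 500 unique context tokens per user.
aiperf profile -m my-model --endpoint-type chat --streaming \
  --user-centric-rate 0.5 --num-users 32 \
  --shared-system-prompt-length 2000 --user-context-prompt-length 500
```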

## Warmup Phase Configuration

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Warmup Request Count | --warmup-request-count | | | |
| Warmup Duration | --warmup-duration | | | Duration-based warmup stop condition |
| Warmup Session Count | --num-warmup-sessions | | | Session-based warmup stop condition |
| Warmup Concurrency | --warmup-concurrency | | | Override concurrency during warmup |
| Warmup Prefill Concurrency | --warmup-prefill-concurrency | | | Override prefill concurrency during warmup |
| Warmup Request Rate | --warmup-request-rate | | | Override request rate during warmup |
| Warmup Arrival Pattern | --warmup-arrival-pattern | | | Override arrival pattern during warmup |
| Warmup Grace Period | --warmup-grace-period | | | Grace period for warmup responses |
| Warmup Concurrency Ramp | --warmup-concurrency-ramp-duration | | | Ramp warmup concurrency |
| Warmup Prefill Ramp | --warmup-prefill-concurrency-ramp-duration | | | Ramp warmup prefill concurrency |
| Warmup Rate Ramp | --warmup-request-rate-ramp-duration | | | Ramp warmup request rate |
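The warmup phase can be configured independently of the measured run; for instance (model name and values are illustrative):

```shell
# Warm caches at low, gently ramped concurrency for 60 s before the
# measured run at the full target concurrency of 64.
aiperf profile -m my-model --endpoint-type chat --concurrency 64 \
  --warmup-duration 60 --warmup-concurrency 8 \
  --warmup-concurrency-ramp-duration 30
```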

## Session/Conversation Configuration (Multi-turn)

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Number of Sessions | --num-sessions | | | |
| Session Concurrency | --session-concurrency | | | Use --concurrency for AIPerf |
| Session Delay Ratio | --session-delay-ratio | | | |
| Session Turn Delay Mean | --session-turn-delay-mean | | | |
| Session Turn Delay Stddev | --session-turn-delay-stddev | | | |
| Session Turns Mean | --session-turns-mean | | | |
| Session Turns Stddev | --session-turns-stddev | | | |

## Input Sequence Length (ISL) Configuration

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Input Tokens Mean | --synthetic-input-tokens-mean<br>--isl | | | |
| Input Tokens Stddev | --synthetic-input-tokens-stddev | | | |
| Input Tokens Block Size | --prompt-input-tokens-block-size<br>--isl-block-size | | | Used for mooncake_trace hash_id blocks |

## Output Sequence Length (OSL) Configuration

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Output Tokens Mean | --output-tokens-mean<br>--osl | | | |
| Output Tokens Stddev | --output-tokens-stddev | | | |
| Output Tokens Mean Deterministic | --output-tokens-mean-deterministic | | | Only applicable to Triton |

## Batch Size Configuration

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Text Batch Size | --batch-size-text<br>--batch-size -b | | | |
| Audio Batch Size | --batch-size-audio | | | |
| Image Batch Size | --batch-size-image | | | |

## Prefix Prompt Configuration

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Number of Prefix Prompts | --num-prefix-prompts | | | |
| Prefix Prompt Length | --prefix-prompt-length | | | |

## Audio Input Configuration

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Audio Length Mean | --audio-length-mean | | | |
| Audio Length Stddev | --audio-length-stddev | | | |
| Audio Format | --audio-format {wav,mp3,random} | | | |
| Audio Depths | --audio-depths | | | |
| Audio Sample Rates | --audio-sample-rates | | | |
| Audio Number of Channels | --audio-num-channels | | | |

## Image Input Configuration

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Image Width Mean | --image-width-mean | | | |
| Image Width Stddev | --image-width-stddev | | | |
| Image Height Mean | --image-height-mean | | | |
| Image Height Stddev | --image-height-stddev | | | |
| Image Format | --image-format {png,jpeg,random} | | | |

## Service Configuration

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Record Processor Service Count | --record-processor-service-count<br>--record-processors | | | |
| Maximum Workers | --workers-max<br>--max-workers | | | |
| ZMQ Host | --zmq-host | | | |
| ZMQ IPC Path | --zmq-ipc-path | | | |

## Request Cancellation

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Request Cancellation Rate | --request-cancellation-rate | | | Percentage of requests to cancel (0-100) |
| Request Cancellation Delay | --request-cancellation-delay | | | Seconds to wait before cancelling |
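Combining the two options, a resilience test might look like this (model name and values are illustrative):

```shell
# Cancel roughly 10% of requests 2 seconds after they are sent, to probe
# timeout behavior and service resilience.
aiperf profile -m my-model --endpoint-type chat --streaming \
  --request-cancellation-rate 10 --request-cancellation-delay 2
```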

## Additional Features

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Goodput Constraints | --goodput -g | | | |
| Verbose | -v --verbose | | | |
| Extra Verbose | -vv | | | |
| Log Level | --log-level {trace,debug,info,notice,warning,success,error,critical} | | | |
| UI Type | --ui-type --ui {dashboard,simple,none} | | | |
| Help | -h --help | | | |

## Perf-Analyzer Passthrough Arguments

GenAI-Perf supports passing through arguments to the Perf-Analyzer CLI. AIPerf does not support this, as it does not use Perf-Analyzer under the hood.

| Feature | CLI Option | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|---|
| Perf-Analyzer Passthrough Arguments | -- | | N/A | Only applicable to GenAI-Perf |

## Data Exporters

| Feature | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|
| Console output | | | |
| JSON output | | | See discrepancies below |
| CSV output | | | |
| API Error Summary | | | |
| profile_export.json | | | Use --export-level raw in AIPerf to get raw input/output payloads |
| Per-Record Metrics | | | |
| inputs.json | | | AIPerf format is slightly different |

## Discrepancies

### JSON Output

- Fields in the input_config section may differ between GenAI-Perf and AIPerf.

## Advanced Features Comparison

| Feature | GenAI-Perf | AIPerf | Notes |
|---|---|---|---|
| Multi-modal support | | | |
| GPU Telemetry | | | |
| Streaming API support | | | |
| Multi-turn conversations | | | Full multi-turn benchmarking with session tracking |
| Payload scheduling | | | Fixed schedule workloads |
| Distributed testing | 🟡 | | Multi-node result aggregation |
| Custom endpoints | | | |
| Synthetic data generation | | | |
| Bring Your Own Data (BYOD) | | | Custom dataset support |
| Audio metrics | | | Audio-specific performance metrics |
| Vision metrics | | | Image-specific performance metrics |
| Image generation benchmarking | | | Text-to-image with raw export for image extraction |
| Live Metrics | | | Live metrics display |
| Dashboard UI | | | Dashboard UI |
| Reasoning token parsing | | | Parsing of reasoning tokens |
| Arrival pattern control | | | Constant, Poisson, Gamma distributions with tunable burstiness |
| Prefill concurrency limiting | | | Fine-grained prefill queueing control for TTFT behavior |
| Gradual ramping | | | Smooth ramp-up for concurrency and rate |
| Duration-based benchmarking | | | Time-based stop conditions with grace periods |
| User-centric timing | | | Per-user rate limiting for KV cache benchmarking |
| Configurable warmup phase | 🟡 | | AIPerf supports full warmup configuration (rate, concurrency, duration, ramping) |
| HTTP trace metrics | | | Detailed HTTP lifecycle timing (DNS, TCP, TLS, TTFB) |
| Request cancellation | | | Test timeout behavior and service resilience |
| Timeslice metrics | | | Per-timeslice metric breakdown |
| Interactive plot dashboard | | | Web-based exploration with dynamic metric selection and filtering |
| Multi-run comparison plots | | | Auto-detected Pareto curves and throughput analysis |