AIPerf can be configured using environment variables with the AIPERF_ prefix.
All settings are organized into logical subsystems for better discoverability.
Pattern: AIPERF_{SUBSYSTEM}_{SETTING_NAME}
Example: AIPERF_HTTP_VIDEO_POLL_INTERVAL (subsystem HTTP, setting VIDEO_POLL_INTERVAL)
Environment variable names, default values, and definitions are subject to change. These settings may be modified, renamed, or removed in future releases.
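The naming pattern above can be sketched in a few lines of Python. This is only an illustration of how a name is composed and looked up; the helper names `env_var_name` and `read_setting` are hypothetical and are not part of AIPerf, and the sketch makes no claim about how AIPerf itself parses its settings.

```python
import os
from typing import Mapping, Optional

PREFIX = "AIPERF"


def env_var_name(subsystem: str, setting: str) -> str:
    """Compose a name following AIPERF_{SUBSYSTEM}_{SETTING_NAME}."""
    return f"{PREFIX}_{subsystem.upper()}_{setting.upper()}"


def read_setting(
    subsystem: str,
    setting: str,
    default: str,
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the raw string value of a setting, or `default` if it is unset.

    `environ` defaults to the process environment; passing a plain dict
    makes the lookup easy to test without touching os.environ.
    """
    env = os.environ if environ is None else environ
    return env.get(env_var_name(subsystem, setting), default)


print(env_var_name("http", "video_poll_interval"))  # AIPERF_HTTP_VIDEO_POLL_INTERVAL
```

Because names may be renamed or removed between releases, a lookup like this should always supply a fallback rather than assume the variable exists.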
CLI runner post-run callback behavior. Controls how exceptions raised by OnComplete callbacks are handled: either they abort the run once every callback has been attempted, or they are isolated and logged. The default is isolated, so that a single misbehaving callback (e.g. auto-plot in strict mode or a third-party hook) cannot bypass the deliberate os._exit hang protection that guards against multiprocessing/ZMQ teardown hangs in the parent process.
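The two behaviors described above can be sketched as follows. This is a minimal illustration under stated assumptions, not AIPerf's runner code: the function name `run_on_complete` and the `isolate` flag are hypothetical stand-ins for whatever the real setting toggles.

```python
import logging
from typing import Callable, Iterable, List

logger = logging.getLogger("callbacks_sketch")


def run_on_complete(
    callbacks: Iterable[Callable[[], None]],
    isolate: bool = True,
) -> List[Exception]:
    """Attempt every callback, then decide what to do with any failures.

    isolate=True  -> failures are logged and returned; the caller continues.
    isolate=False -> failures are logged, and a RuntimeError is raised only
                     after all callbacks have been attempted.
    """
    errors: List[Exception] = []
    for callback in callbacks:
        try:
            callback()
        except Exception as exc:  # deliberately broad: hooks may raise anything
            logger.exception("OnComplete callback %r failed", callback)
            errors.append(exc)

    if errors and not isolate:
        raise RuntimeError(f"{len(errors)} OnComplete callback(s) failed") from errors[0]
    return errors
```

In both modes no callback is skipped because an earlier one failed; the modes differ only in whether the failure propagates afterwards.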
API server settings. Controls the host and port of the API server.
Compression settings for streaming file transfers. Controls chunk size and compression levels for zstd and gzip encodings used in dataset and results file transfers.
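To show how chunk size and compression level interact in a streaming transfer, here is a small sketch that gzip-compresses data chunk by chunk using only the Python standard library. The function names and the values 4096 and 6 are illustrative assumptions, not AIPerf defaults; zstd is left out to keep the sketch to modules available across Python versions.

```python
import gzip
import zlib
from typing import Iterable, Iterator


def read_in_chunks(data: bytes, chunk_size: int) -> Iterator[bytes]:
    """Yield successive slices of `data`, each at most `chunk_size` bytes."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


def gzip_stream(chunks: Iterable[bytes], level: int = 6) -> Iterator[bytes]:
    """Compress an iterable of byte chunks into one gzip stream, incrementally."""
    # wbits=31 (16 + 15) tells zlib to emit gzip framing instead of raw zlib.
    compressor = zlib.compressobj(level, zlib.DEFLATED, 31)
    for chunk in chunks:
        out = compressor.compress(chunk)
        if out:  # the compressor may buffer input and return nothing yet
            yield out
    yield compressor.flush()  # emit any buffered data plus the gzip trailer


payload = b"benchmark-record\n" * 10_000
compressed = b"".join(gzip_stream(read_in_chunks(payload, 4096), level=6))
assert gzip.decompress(compressed) == payload
```

A larger chunk size means fewer calls into the compressor per file, while a higher level trades CPU time for a smaller output.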
Settings for DAG benchmark mode (dag_jsonl input type).
Dataset loading and configuration. Controls timeouts and behavior for dataset loading operations, as well as memory-mapped dataset storage settings.
GPU telemetry collection configuration. Controls GPU metrics collection frequency, endpoint detection, and shutdown behavior. Metrics are collected from DCGM endpoints at the specified interval.
HTTP client socket and connection configuration. Controls low-level socket options, keepalive settings, DNS caching, and connection pooling for HTTP clients. These settings optimize performance for high-throughput streaming workloads.
Video Generation Polling: For async video generation APIs that use job polling (e.g., SGLang /v1/videos), the poll interval is controlled by AIPERF_HTTP_VIDEO_POLL_INTERVAL. The maximum poll time is taken from the --request-timeout-seconds CLI argument.
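The relationship between a poll interval and a maximum poll time can be sketched with a generic polling loop. This is not AIPerf's client code: `poll_until_done`, its parameters, and the convention that `check()` returns None while the job is still running are all assumptions made for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until_done(
    check: Callable[[], Optional[T]],
    poll_interval: float,
    max_wait: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call check() every `poll_interval` seconds until it returns a result.

    Raises TimeoutError if waiting one more interval would pass `max_wait`.
    `sleep` and `clock` are injectable so the loop can be tested without
    real delays.
    """
    deadline = clock() + max_wait
    while True:
        result = check()
        if result is not None:
            return result
        if clock() + poll_interval > deadline:
            raise TimeoutError(f"job not finished within {max_wait} seconds")
        sleep(poll_interval)
```

A shorter interval detects completion sooner at the cost of more status requests; the timeout bounds the total wait regardless of the interval chosen.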
Logging system configuration. Controls multiprocessing log queue size and other logging behavior.
Metrics collection and storage configuration. Controls metrics storage allocation and collection behavior.
MLflow export configuration. Controls timeout behavior for post-run MLflow artifact uploads.
OpenTelemetry metrics streaming configuration. Controls buffering and flush behavior for OTLP metric streaming.
Record processing and export configuration. Controls batch sizes, processor scaling, and progress reporting for record processing.
Adaptive-search planner tunables. Controls precision targets, warmup-phase injection, and request-count presets for the smooth-isotonic and monotonic SLA-saturation search planners. All values are read at planner-construction or iteration-mutate time, so changes take effect on the next search run.
Server metrics collection configuration. Controls server metrics collection frequency, endpoint detection, and shutdown behavior. Metrics are collected from Prometheus-compatible endpoints at the specified interval. Use the --no-server-metrics CLI flag to disable collection.
Service lifecycle and inter-service communication configuration. Controls timeouts for service registration, startup, shutdown, command handling, connection probing, heartbeats, and profile operations.
Timing manager configuration. Controls timing-related settings for credit phase execution and scheduling.
Tokenizer pre-warm and loading configuration. Controls how the CLI parent pre-warms tokenizer caches before spawning AIPerf services. Pre-warming runs in subprocesses so the parent never imports the heavy native libraries (transformers, Rust-backed tokenizers, tiktoken).
User interface and dashboard configuration. Controls refresh rates, update thresholds, and notification behavior for the various UI modes (dashboard, tqdm, etc.).
Worker management and auto-scaling configuration. Controls worker pool sizing, health monitoring, load detection, and recovery behavior. CPU_UTILIZATION_FACTOR is used in the auto-scaling formula:
    max_workers = max(1, min(int(cpu_count * factor) - 1, MAX_WORKERS_CAP))
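The auto-scaling formula can be evaluated directly to see how the floor of 1 and the cap behave. The function below transcribes the formula as written; the inputs 0.75 and 32 are illustrative values chosen for the arithmetic, not AIPerf's defaults.

```python
def max_workers(cpu_count: int, factor: float, max_workers_cap: int) -> int:
    """max(1, min(int(cpu_count * factor) - 1, MAX_WORKERS_CAP))"""
    return max(1, min(int(cpu_count * factor) - 1, max_workers_cap))


# Illustrative inputs only; factor and cap are hypothetical values.
print(max_workers(16, 0.75, 32))   # int(12.0) - 1 = 11
print(max_workers(1, 0.75, 32))    # int(0.75) - 1 = -1, raised to the floor of 1
print(max_workers(128, 0.75, 32))  # int(96.0) - 1 = 95, limited by the cap to 32
```

The subtraction of 1 leaves one CPU's worth of headroom after scaling by the factor, the outer max guarantees at least one worker on small machines, and the cap bounds the pool on large ones.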
ZMQ socket and communication configuration. Controls ZMQ socket timeouts, keepalive settings, retry behavior, and concurrency limits. These settings affect reliability and performance of the internal message bus.
Development and debugging configuration. Controls developer-focused features like debug logging, profiling, and internal metrics. These settings are typically disabled in production environments.