Environment Variables
AIPerf can be configured using environment variables with the AIPERF_ prefix.
All settings are organized into logical subsystems for better discoverability.
Pattern: AIPERF_{SUBSYSTEM}_{SETTING_NAME}
Example: AIPERF_HTTP_VIDEO_POLL_INTERVAL is the VIDEO_POLL_INTERVAL setting of the HTTP subsystem.
Environment variable names, default values, and definitions are subject to change. These settings may be modified, renamed, or removed in future releases.
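The naming pattern can be illustrated with a short sketch. It builds a variable name from its subsystem and setting parts and reads it with a fallback. The helpers `env_var_name` and `read_setting` are hypothetical and are not part of AIPerf's API. The fallback value used below is illustrative only and is not AIPerf's real default.

```python
import os
from typing import Callable, TypeVar

T = TypeVar("T")

# Prefix and naming pattern as documented; the helper functions are hypothetical.
PREFIX = "AIPERF"


def env_var_name(subsystem: str, setting_name: str) -> str:
    """Build AIPERF_{SUBSYSTEM}_{SETTING_NAME}, upper-casing both parts."""
    return f"{PREFIX}_{subsystem.upper()}_{setting_name.upper()}"


def read_setting(subsystem: str, setting_name: str, default: T, cast: Callable[[str], T]) -> T:
    """Return the variable's value converted with `cast`, or `default` when unset."""
    raw = os.environ.get(env_var_name(subsystem, setting_name))
    return default if raw is None else cast(raw)


if __name__ == "__main__":
    name = env_var_name("http", "video_poll_interval")
    print(name)  # AIPERF_HTTP_VIDEO_POLL_INTERVAL
    os.environ[name] = "2.5"
    # The 1.0 fallback is an illustrative value, not AIPerf's real default.
    print(read_setting("http", "video_poll_interval", 1.0, float))  # 2.5
```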
CLI RUNNER
CLI runner post-run callback behavior. Controls whether exceptions raised by OnComplete callbacks abort the run (after every callback has been attempted) or are isolated and logged. The default is isolated, so that a single misbehaving callback (e.g., auto-plot in strict mode, or a third-party hook) cannot bypass the deliberate os._exit hang protection that guards the parent process against multiprocessing/ZMQ teardown hangs.
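The following is a minimal sketch of the two modes, assuming callbacks are zero-argument callables. Every callback is attempted in both modes. In "isolate" mode, failures are only logged and returned. In "abort" mode, a combined error is raised after the loop finishes. The function `run_on_complete_callbacks`, the `isolate` flag, and the `CallbackErrors` exception are hypothetical names and do not come from AIPerf.

```python
import logging
from typing import Callable, Iterable, List

logger = logging.getLogger("cli_runner_sketch")


class CallbackErrors(Exception):
    """Hypothetical error raised in 'abort' mode after every callback has run."""

    def __init__(self, errors: List[Exception]) -> None:
        super().__init__(f"{len(errors)} OnComplete callback(s) failed")
        self.errors = errors


def run_on_complete_callbacks(callbacks: Iterable[Callable[[], None]], isolate: bool = True) -> List[Exception]:
    errors: List[Exception] = []
    for callback in callbacks:
        try:
            callback()
        except Exception as exc:  # one failure must not stop the remaining callbacks
            errors.append(exc)
            logger.exception("OnComplete callback %r failed", callback)
    if errors and not isolate:
        # Abort mode: surface the failures only after all callbacks were attempted.
        raise CallbackErrors(errors)
    # Isolated mode: failures are logged, and the caller continues its shutdown path.
    return errors
```

In isolated mode, the caller always regains control. In a runner like the one described, that is the point where the parent could proceed to its os._exit call instead of unwinding through an exception.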
APISERVER
API server settings. Controls the host and port of the API server.
COMPRESSION
Compression settings for streaming file transfers. Controls chunk size and compression levels for zstd and gzip encodings used in dataset and results file transfers.
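The role of the chunk size and the compression level can be shown with a sketch that uses only the Python standard library. Input is read in fixed-size chunks and passed through one streaming compressor, and output is yielded as it becomes available. Only gzip is shown, because zstd needs a third-party package. The function name and the default values (64 KiB chunks, level 6) are illustrative assumptions and are not AIPerf's settings.

```python
import gzip
import io
import zlib
from typing import BinaryIO, Iterator


def gzip_stream(src: BinaryIO, chunk_size: int = 64 * 1024, level: int = 6) -> Iterator[bytes]:
    """Yield gzip-compressed bytes for `src`, reading `chunk_size` bytes at a time."""
    # wbits=31 (16 + 15) makes zlib emit a gzip header and trailer.
    compressor = zlib.compressobj(level, zlib.DEFLATED, 31)
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        out = compressor.compress(chunk)
        if out:  # the compressor may buffer internally and return b""
            yield out
    yield compressor.flush()


if __name__ == "__main__":
    data = b"token " * 50_000
    compressed = b"".join(gzip_stream(io.BytesIO(data), chunk_size=4096, level=9))
    print(len(data), len(compressed), gzip.decompress(compressed) == data)
```

Smaller chunks keep memory usage flat but add more per-call overhead. Higher levels trade CPU time for smaller transfers.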
DAG
Settings for DAG benchmark mode (dag_jsonl input type).
DATASET
Dataset loading and configuration. Controls timeouts and behavior for dataset loading operations, as well as memory-mapped dataset storage settings.
GPU
GPU telemetry collection configuration. Controls GPU metrics collection frequency, endpoint detection, and shutdown behavior. Metrics are collected from DCGM endpoints at the specified interval.
HTTP
HTTP client socket and connection configuration. Controls low-level socket options, keepalive settings, DNS caching, and connection pooling for HTTP clients. These settings optimize performance for high-throughput streaming workloads.
Video generation polling: for async video generation APIs that use job polling (e.g., SGLang /v1/videos), the poll interval is controlled by AIPERF_HTTP_VIDEO_POLL_INTERVAL, and the maximum polling time is set by the --request-timeout-seconds CLI argument.
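How a poll interval and an overall request timeout interact can be shown with an asyncio sketch. Here `fetch_status` stands in for an HTTP status request. The function name and the status strings "completed" and "failed" are hypothetical assumptions and are not the SGLang or AIPerf API. The loop polls at the configured interval and never sleeps past the deadline.

```python
import asyncio
import time
from typing import Awaitable, Callable, Collection


async def poll_until_done(
    fetch_status: Callable[[], Awaitable[str]],
    poll_interval: float,
    request_timeout: float,
    terminal_states: Collection[str] = ("completed", "failed"),  # hypothetical statuses
) -> str:
    """Poll a job until it reaches a terminal state or the request timeout elapses."""
    deadline = time.monotonic() + request_timeout
    while True:
        status = await fetch_status()
        if status in terminal_states:
            return status
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"job still {status!r} after {request_timeout}s")
        # Wait one poll interval, but never past the overall deadline.
        await asyncio.sleep(min(poll_interval, remaining))


if __name__ == "__main__":
    states = iter(["queued", "running", "completed"])

    async def fake_status() -> str:
        return next(states)

    print(asyncio.run(poll_until_done(fake_status, poll_interval=0.01, request_timeout=1.0)))
```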
LOGGING
Logging system configuration. Controls multiprocessing log queue size and other logging behavior.
METRICS
Metrics collection and storage configuration. Controls metrics storage allocation and collection behavior.
MLFLOW
MLflow export configuration. Controls timeout behavior for post-run MLflow artifact uploads.
OTEL
OpenTelemetry metrics streaming configuration. Controls buffering and flush behavior for OTLP metric streaming.
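Buffering with a size threshold and a time threshold can be illustrated with a small sketch. It uses an injectable clock so that it can be tested. The class `MetricBuffer` and its parameters are hypothetical. They are not the OpenTelemetry SDK API and not AIPerf's implementation. A real exporter would also flush on a timer and at shutdown, not only when a new point arrives.

```python
import time
from typing import Any, Callable, List


class MetricBuffer:
    """Hypothetical buffer: flush when full or when the flush interval has elapsed."""

    def __init__(
        self,
        export: Callable[[List[Any]], None],
        max_batch_size: int,
        flush_interval_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._export = export
        self._max_batch_size = max_batch_size
        self._flush_interval_s = flush_interval_s
        self._clock = clock
        self._items: List[Any] = []
        self._last_flush = clock()

    def add(self, point: Any) -> None:
        self._items.append(point)
        full = len(self._items) >= self._max_batch_size
        stale = self._clock() - self._last_flush >= self._flush_interval_s
        if full or stale:
            self.flush()

    def flush(self) -> None:
        if self._items:
            self._export(self._items)  # hand off the batch, then start a new list
            self._items = []
        self._last_flush = self._clock()
```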
RECORD
Record processing and export configuration. Controls batch sizes, processor scaling, and progress reporting for record processing.
SEARCHPLANNER
Adaptive-search planner tunables. Controls precision targets, warmup-phase injection, and request-count presets for the smooth-isotonic and monotonic SLA-saturation search planners. All values are read at planner-construction or iteration-mutate time, so changes take effect on the next search run.
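The smooth-isotonic and monotonic planners are not reproduced here. The sketch below is a swapped-in illustration that uses plain bisection. It shows how a precision target bounds a monotonic SLA-saturation search, assuming the SLA holds up to some load and fails beyond it. The function name and the `meets_sla` probe are hypothetical. In a real search, each probe would be a full benchmark run.

```python
from typing import Callable


def max_load_meeting_sla(
    meets_sla: Callable[[float], bool], low: float, high: float, precision: float
) -> float:
    """Bisection for the largest load that meets the SLA, to within `precision`.

    Assumes `meets_sla` is monotone: True up to some threshold, False beyond it.
    """
    if not meets_sla(low):
        raise ValueError("SLA is not met even at the lower bound")
    if meets_sla(high):
        return high
    # Invariant: low meets the SLA, high does not.
    while high - low > precision:
        mid = (low + high) / 2
        if meets_sla(mid):
            low = mid
        else:
            high = mid
    return low


if __name__ == "__main__":
    print(max_load_meeting_sla(lambda load: load <= 37.3, 1.0, 128.0, 0.5))
```

A tighter precision target costs more probes. Bisection needs about log2((high - low) / precision) probes after the two bound checks, which is why precision is worth exposing as a tunable.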
SERVERMETRICS
Server metrics collection configuration. Controls server metrics collection frequency, endpoint detection, and shutdown behavior. Metrics are collected from Prometheus-compatible endpoints at the specified interval. Use the --no-server-metrics CLI flag to disable collection.
SERVICE
Service lifecycle and inter-service communication configuration. Controls timeouts for service registration, startup, shutdown, command handling, connection probing, heartbeats, and profile operations.
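A heartbeat timeout can be sketched as follows: any service whose last heartbeat is older than the timeout is treated as unresponsive. The clock is injectable so that the sketch is testable. `HeartbeatTracker` and the service IDs in the usage example are hypothetical names and are not AIPerf's service registry.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class HeartbeatTracker:
    """Hypothetical tracker that flags services whose last heartbeat is too old."""

    timeout_s: float
    clock: Callable[[], float] = time.monotonic
    _last_seen: Dict[str, float] = field(default_factory=dict)

    def record(self, service_id: str) -> None:
        self._last_seen[service_id] = self.clock()

    def stale_services(self) -> List[str]:
        now = self.clock()
        return sorted(sid for sid, seen in self._last_seen.items() if now - seen > self.timeout_s)


if __name__ == "__main__":
    t = [0.0]
    tracker = HeartbeatTracker(timeout_s=5.0, clock=lambda: t[0])
    tracker.record("worker_1")
    t[0] = 3.0
    tracker.record("worker_2")
    t[0] = 6.0
    print(tracker.stale_services())  # ['worker_1']
```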
TIMING
Timing manager configuration. Controls timing-related settings for credit phase execution and scheduling.
TOKENIZER
Tokenizer pre-warm and loading configuration. Controls how the CLI parent pre-warms tokenizer caches before spawning AIPerf services. Pre-warming runs in subprocesses so the parent never imports the heavy native libraries (transformers, Rust-backed tokenizers, tiktoken).
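The subprocess approach can be sketched as follows. The parent launches a fresh interpreter that performs the heavy import, so the parent's own module table stays clean. To keep the sketch dependency-free, importing a named module stands in for loading a tokenizer. The function `prewarm_in_subprocess` and its timeout default are hypothetical and are not AIPerf's implementation.

```python
import subprocess
import sys

# Runs only in the child interpreter. A real pre-warm would load a tokenizer here
# (for example with transformers' AutoTokenizer.from_pretrained) to populate caches;
# importing a module is a stand-in so the sketch has no third-party dependencies.
PREWARM_SCRIPT = "import importlib, sys; importlib.import_module(sys.argv[1])"


def prewarm_in_subprocess(module_name: str, timeout_s: float = 60.0) -> bool:
    """Import `module_name` in a separate interpreter; the parent never imports it."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", PREWARM_SCRIPT, module_name],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print(prewarm_in_subprocess("colorsys"), "colorsys" in sys.modules)
```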
UI
User interface and dashboard configuration. Controls refresh rates, update thresholds, and notification behavior for the various UI modes (dashboard, tqdm, etc.).
WORKER
Worker management and auto-scaling configuration. Controls worker pool sizing, health monitoring, load detection, and recovery behavior. The CPU_UTILIZATION_FACTOR is used in the auto-scaling formula:
max_workers = max(1, min(int(cpu_count * factor) - 1, MAX_WORKERS_CAP))
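The auto-scaling formula translates directly into code. The factor and cap values below are illustrative only and are not AIPerf's defaults. The examples show the three regimes: the scaled CPU count, clamping to the cap, and the floor of one worker.

```python
import os


def max_workers(cpu_count: int, factor: float, max_workers_cap: int) -> int:
    """max(1, min(int(cpu_count * factor) - 1, MAX_WORKERS_CAP)), per the documented formula."""
    return max(1, min(int(cpu_count * factor) - 1, max_workers_cap))


if __name__ == "__main__":
    # factor=0.75 and cap=32 are illustrative values, not AIPerf's defaults.
    print(max_workers(os.cpu_count() or 1, 0.75, 32))
    print(max_workers(16, 0.75, 32))   # int(12.0) - 1 = 11
    print(max_workers(128, 0.75, 32))  # 95, clamped to the cap: 32
    print(max_workers(2, 0.5, 32))     # int(1.0) - 1 = 0, floored to 1
```

The "- 1" reserves capacity for other processes, and the outer max(1, ...) guarantees at least one worker on very small machines.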
ZMQ
ZMQ socket and communication configuration. Controls ZMQ socket timeouts, keepalive settings, retry behavior, and concurrency limits. These settings affect reliability and performance of the internal message bus.
DEV
Development and debugging configuration. Controls developer-focused features like debug logging, profiling, and internal metrics. These settings are typically disabled in production environments.