Random Number Generation & Reproducibility
Quick Links:
Overview • What Is Reproducible • User Guide • Developer Guide • Reference
Overview
TL;DR: Use --random-seed 42 to get identical dataset content across runs. Performance metrics and worker assignment vary due to distributed system architecture.
AIPerf provides deterministic reproducibility for all seed-controlled randomness using hash-based RNG derivation. This enables reproducible dataset generation while maintaining realistic load testing performance.
Default behavior: Without --random-seed, AIPerf produces non-deterministic results. Set --random-seed <integer> for reproducibility.
Distributed System Constraints: Even with --random-seed, performance metrics and worker assignment are NOT reproducible due to system non-determinism (network timing, async I/O, ZMQ load balancing).
Reproducible (with --random-seed):
- ✅ Dataset content (prompts, images, audio)
- ✅ Dataset sampling order (random/shuffle strategies)
- ✅ Request timing intervals (Poisson values)
- ✅ Model selection (random strategy)
- ✅ Session IDs (`session_000000`, `session_000001`, …)
NOT Reproducible (system-dependent):
- ❌ Worker assignment / request execution order
- ❌ Performance metrics (TTFT, ITL, throughput)
- ❌ Server responses / absolute timestamps
Testing: Reproducibility is enforced by integration canary tests and CI/CD validation on every commit. See Testing & Validation.
What Is Reproducible, What Is Not
Key Principle: Seeds control WHAT you ask, not WHEN it completes or WHAT the server answers.
✅ Reproducible with --random-seed
Dataset: Prompt text/tokens, image dimensions/formats, audio duration/formats, session IDs
Sampling: Random selection, shuffle order, conversation selection
Timing Decisions: Poisson interval values, cancellation decisions
❌ NOT Reproducible
Worker/Execution: Which worker handles which request, request start/completion order, async I/O timing
Performance: TTFT, ITL, latency, throughput
System: Timestamps, process IDs, request IDs (ZMQ routing)
Server: LLM output text, output token counts, errors/failures
Why This Architecture?
AIPerf achieves its high throughput through parallel workers, ZMQ load balancing, and async I/O. Full determinism would require single-worker synchronous execution, destroying performance.
How It Works
Phase 1 (Startup - PROFILE_CONFIGURE):
- DatasetManager pre-generates complete dataset using derived RNGs and stores in memory
- TimingManager creates credit issuing strategy with RNG-based interval generator
- Workers set global seed (defensive measure) but don’t derive/use RNGs
Phase 2 (Runtime - PROFILE_START):
- TimingManager generates intervals on-the-fly using RNG, sleeps, then drops credits
- Workers receive credits via ZMQ load balancing
- Workers request conversations from DatasetManager’s pre-generated pool
- DatasetManager returns conversations (using sampler RNG or specific ID)
- Workers send API requests with pre-generated content
- Result: Same dataset and interval values, but actual timing/worker assignment vary per run
Analogy: Like a deterministic deck of cards (same 52 cards, same shuffle) dealt to players who play at different speeds. The deck is reproducible; card distribution to players varies based on who finishes hands first.
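The deck analogy can be sketched in miniature. Below, a hypothetical `poisson_intervals` helper (not AIPerf's actual API) uses a dedicated `random.Random` instance so the interval *values* are a pure function of the seed, while *when* each request actually fires remains system-dependent:

```python
import random

def poisson_intervals(seed: int, rate: float, n: int) -> list[float]:
    # A dedicated Random instance (never the global module state) makes
    # the interval sequence a pure function of the seed.
    rng = random.Random(seed)
    # Poisson arrivals have exponentially distributed inter-arrival times.
    return [rng.expovariate(rate) for _ in range(n)]

# Same seed -> identical interval values on every run. When each request
# actually completes still varies with worker scheduling and async I/O.
assert poisson_intervals(42, rate=10.0, n=5) == poisson_intervals(42, rate=10.0, n=5)
assert poisson_intervals(42, rate=10.0, n=5) != poisson_intervals(7, rate=10.0, n=5)
```

This is the "same shuffle, different play speed" split: the sequence of decisions is fixed by the seed; everything downstream of the network is not.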
Testing & Validation
Reproducibility is enforced by automated tests on every commit:
- test_random_generator_canary.py: Compares payloads against reference snapshots to detect regressions
- test_deterministic_behavior.py: Verifies byte-for-byte identical outputs with same seed, different outputs with different seeds, tested with 5+ parallel workers
User Guide
Basic Usage
Same seed + same config = identical dataset content. Performance metrics always vary.
Use Cases
Debugging: Reproduce exact prompts across runs to isolate prompt-related vs. network/timing issues
Performance Testing: Compare metrics with same dataset
Stress Testing: Vary patterns by omitting seed
Developer Guide
System Architecture
Where RNGs Are Used:
- DatasetManager: Pre-generates all dataset content at startup using derived RNGs
- TimingManager: Generates Poisson timing intervals and cancellation decisions
- Workers: Set global seed (defensive) but do NOT derive RNGs—they only execute API requests with pre-generated content
Process Flow:
- `bootstrap.py` initializes RNG with `rng.init(seed)` in each process
  - Sets Python's `random.seed()` and NumPy's `np.random.seed()` globally (defensive measure)
  - Protects against third-party code inadvertently using global random state
- DatasetManager creates generators (PromptGenerator, ImageGenerator, etc.) that derive RNGs in `__init__`
- TimingManager creates interval generator that derives RNG in `__init__`
- Workers initialize global seed but don't derive any RNGs (they only execute API requests)
- All dataset content is generated before any requests are sent
- Workers pull from the pre-generated pool at runtime
How to Use RNGs in Your Code
Workers do NOT use RNGs. Only use RNGs in DatasetManager (content generation) or TimingManager (request timing) components.
Rules:
- Derive in `__init__`, not in methods (or you'll get the same first value every call)
- Store the RNG as an instance variable
- Use a unique dotted identifier: `<module>.<component>.<aspect>`
- Never use Python's `random` module (technically seeded, but fragile: any code using it affects your sequence)
Hash-Based Seed Derivation
Uses SHA-256 to derive independent seeds: SHA-256(root_seed:identifier) → child seed
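A minimal sketch of this scheme (illustrative only; see `random_generator.py` for the actual `RandomGenerator` implementation):

```python
import hashlib

def derive_seed(root_seed: int, identifier: str) -> int:
    # SHA-256 over "root_seed:identifier" gives each component an
    # independent, deterministic child seed.
    digest = hashlib.sha256(f"{root_seed}:{identifier}".encode()).digest()
    # Fold the 32-byte digest into a 64-bit integer seed.
    return int.from_bytes(digest[:8], "big")

# Same inputs always produce the same child seed...
assert derive_seed(42, "dataset.prompt") == derive_seed(42, "dataset.prompt")
# ...and different identifiers produce independent seeds.
assert derive_seed(42, "dataset.prompt") != derive_seed(42, "timing.intervals")
```

Because each child seed depends only on the root seed and its identifier, adding or removing one component's RNG cannot shift any other component's sequence.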
Benefits:
- Deterministic: Same identifier always gets same seed
- Independent: Changing one RNG doesn’t affect others
- Fast: ~1-2 microseconds per derivation (happens once at init)
Common Mistakes
❌ Deriving in methods → Returns same first value every call.
✅ Derive in `__init__`.
❌ Using Python's `random` → Fragile (global state affected by any code).
✅ Use `rng.derive()`.
❌ Adding operations to an existing RNG → Shifts all subsequent values.
✅ Derive a new RNG for the new feature.
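Putting the rules together, a correct component might look like the sketch below. The component name, identifier, and `_derive_seed` helper are made up for illustration (in AIPerf itself you would call `rng.derive()` instead):

```python
import hashlib
import random

def _derive_seed(root_seed: int, identifier: str) -> int:
    # Stand-in for the library's derivation; SHA-256 keeps each
    # identifier's stream independent of every other RNG.
    digest = hashlib.sha256(f"{root_seed}:{identifier}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

class PromptSampler:
    """Hypothetical DatasetManager component following the rules above."""

    def __init__(self, root_seed: int) -> None:
        # Derive once in __init__ and store as an instance variable;
        # deriving inside sample() would replay the same first value.
        self._rng = random.Random(_derive_seed(root_seed, "dataset.prompt.sampler"))

    def sample(self, items: list[str]) -> str:
        return self._rng.choice(items)
```

Two `PromptSampler(42)` instances yield identical sequences, and a new RNG-consuming feature gets its own derived identifier rather than extra draws from this one.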
FAQ
Q: Performance metrics still vary with same seed. Why?
A: Expected. Seeds control dataset content, not network timing or worker scheduling. See What Is Reproducible.
Q: Same seed across different configs?
A: Yes. Same seed + different config = different but reproducible results.
Q: Multiple workers—how does this work?
A: Workers set global seed (defensive) but don’t derive RNGs. DatasetManager pre-generates content, workers pull from this fixed pool. Validated with 5+ workers.
Q: Are RNGs thread-safe?
A: No, but not an issue—each process uses RNGs in its own space. If adding multi-threaded RNG usage, derive per-thread.
Q: Session IDs reproducible?
A: Yes. With seed: sequential (session_000000, session_000001). Without: UUIDs.
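The seeded format is just a zero-padded counter (format inferred from the examples in this document):

```python
# Seeded runs: sequential, zero-padded session IDs.
session_ids = [f"session_{i:06d}" for i in range(3)]
assert session_ids == ["session_000000", "session_000001", "session_000002"]
```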
Q: Performance impact?
A: None measurable. Network I/O dominates by 1000×.
Reference
All Component-Specific RNG Identifiers
Dataset
Timing
Composer
Models
Module API
See random_generator.py for the RandomGenerator class and full API details.