Random Number Generation & Reproducibility

Quick Links:
Overview • What Is Reproducible • User Guide • Developer Guide • Reference

Overview

TL;DR: Use --random-seed 42 to get identical dataset content across runs. Performance metrics and worker assignment vary due to distributed system architecture.

AIPerf provides deterministic reproducibility for all seed-controlled randomness using hash-based RNG derivation. This enables reproducible dataset generation while maintaining realistic load testing performance.

Default behavior: Without --random-seed, AIPerf produces non-deterministic results. Set --random-seed <integer> for reproducibility.

Distributed System Constraints: Even with --random-seed, performance metrics and worker assignment are NOT reproducible due to system non-determinism (network timing, async I/O, ZMQ load balancing).

Reproducible (with --random-seed):

✅ Dataset content (prompts, images, audio)
✅ Dataset sampling order (random/shuffle strategies)
✅ Request timing intervals (Poisson values)
✅ Model selection (random strategy)
✅ Session IDs (session_000000, session_000001, …)

NOT Reproducible (system-dependent):

❌ Worker assignment / request execution order
❌ Performance metrics (TTFT, ITL, throughput)
❌ Server responses / absolute timestamps

Testing: Reproducibility is enforced by integration canary tests and CI/CD validation on every commit. See Testing & Validation.

What Is Reproducible, What Is Not

Key Principle: Seeds control WHAT you ask, not WHEN it completes or WHAT the server answers.

✅ Reproducible with --random-seed

Dataset: Prompt text/tokens, image dimensions/formats, audio duration/formats, session IDs
Sampling: Random selection, shuffle order, conversation selection
Timing Decisions: Poisson interval values, cancellation decisions

❌ NOT Reproducible

Worker/Execution: Which worker handles which request, request start/completion order, async I/O timing
Performance: TTFT, ITL, latency, throughput
System: Timestamps, process IDs, request IDs (ZMQ routing)
Server: LLM output text, output token counts, errors/failures

Why This Architecture?

AIPerf achieves high throughput through parallel workers, ZMQ load balancing, and async I/O. Full determinism would require single-worker synchronous execution, which would eliminate the concurrency that realistic load testing depends on.

How It Works

Phase 1 (Startup - PROFILE_CONFIGURE):

DatasetManager pre-generates the complete dataset using derived RNGs and stores it in memory
TimingManager creates the credit issuing strategy with an RNG-based interval generator
Workers set the global seed (defensive measure) but don’t derive or use RNGs

Phase 2 (Runtime - PROFILE_START):

TimingManager generates intervals on the fly using its RNG, sleeps for each interval, then drops a credit
Workers receive credits via ZMQ load balancing
Workers request conversations from DatasetManager’s pre-generated pool
DatasetManager returns a conversation (chosen by the sampler RNG or by a specific ID)
Workers send API requests with the pre-generated content
Result: the dataset and interval values are the same on every run, but actual timing and worker assignment vary

Analogy: Like a deterministic deck of cards (same 52 cards, same shuffle) dealt to players who play at different speeds. The deck is reproducible; which player gets which card depends on who finishes a hand first.
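The two phases can be modeled with a small asyncio simulation. This is a sketch under stated assumptions, not AIPerf’s code: `random.Random(seed)` stands in for the derived dataset RNGs, an `asyncio.Queue` stands in for ZMQ load balancing, and jitter from the unseeded global `random` module stands in for network and async I/O timing. The helper names (`pre_generate_dataset`, `worker`, `run`) are hypothetical.

```python
from __future__ import annotations

import asyncio
import random


def pre_generate_dataset(seed: int, size: int) -> list[str]:
    """Phase 1: build the whole dataset up front from a seeded RNG."""
    dataset_rng = random.Random(seed)  # stand-in for a derived dataset RNG
    return [f"prompt-{dataset_rng.randint(0, 9999)}" for _ in range(size)]


async def worker(name: str, queue: asyncio.Queue, assignments: dict[int, str]) -> None:
    """Phase 2: pull pre-generated prompts; no RNG is derived here."""
    while True:
        index, _prompt = await queue.get()
        # Jitter from the unseeded global random module models network/async I/O timing.
        await asyncio.sleep(random.uniform(0, 0.01))
        assignments[index] = name
        queue.task_done()


async def run(seed: int, size: int = 8, num_workers: int = 3) -> tuple[list[str], dict[int, str]]:
    dataset = pre_generate_dataset(seed, size)  # complete before any "request" is sent
    queue: asyncio.Queue = asyncio.Queue()
    for item in enumerate(dataset):
        queue.put_nowait(item)
    assignments: dict[int, str] = {}
    tasks = [
        asyncio.create_task(worker(f"worker-{i}", queue, assignments))
        for i in range(num_workers)
    ]
    await queue.join()  # wait until every prompt has been handled
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return dataset, assignments


if __name__ == "__main__":
    data_a, assign_a = asyncio.run(run(seed=42))
    data_b, assign_b = asyncio.run(run(seed=42))
    print(data_a == data_b)  # True: the dataset is reproducible
    print(assign_a == assign_b)  # may print either value: assignment is not
```

Running `run(seed=42)` twice yields the same dataset, while the worker-to-prompt mapping in `assignments` can change between runs, which is the split described above.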

Testing & Validation

Reproducibility is enforced by automated tests on every commit:

test_random_generator_canary.py: Compares payloads against reference snapshots to detect regressions
test_deterministic_behavior.py: Verifies byte-for-byte identical outputs with the same seed and different outputs with different seeds, using 5+ parallel workers

User Guide

Basic Usage

$ # Reproducible dataset
$ aiperf --random-seed 42 [options...]
$ 
$ # Non-reproducible (default)
$ aiperf [options...]

Same seed + same config = identical dataset content. Performance metrics always vary.

Use Cases

Debugging: Reproduce the exact prompts across runs to separate prompt-related issues from network or timing issues

$ aiperf --random-seed 42 [...] --profile-export-file run1.json
$ aiperf --random-seed 42 [...] --profile-export-file run2.json
$ # Prompts identical; metrics may vary

Performance Testing: Compare metrics with same dataset

$ aiperf --random-seed 42 [...] --profile-export-file baseline.json
$ # After optimization...
$ aiperf --random-seed 42 [...] --profile-export-file optimized.json
$ # Use statistical analysis (median, p95, p99)
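For the baseline versus optimized comparison, a minimal sketch with Python’s standard `statistics` module. The latency lists are placeholder values rather than AIPerf output, and reading them from the export files is omitted because the export schema is not described on this page; `summarize` and `compare` are hypothetical helper names.

```python
import statistics


def summarize(latencies_ms: list) -> dict:
    """Return median, p95, and p99 for a list of per-request latencies."""
    # quantiles(n=100) returns 99 cut points; index k - 1 is the k-th percentile.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "median": statistics.median(latencies_ms),
        "p95": cuts[94],
        "p99": cuts[98],
    }


def compare(baseline: list, optimized: list) -> dict:
    """Relative change per statistic (negative means the optimized run is faster)."""
    before, after = summarize(baseline), summarize(optimized)
    return {key: (after[key] - before[key]) / before[key] for key in before}


if __name__ == "__main__":
    # Placeholder latencies; substitute values extracted from your own exports.
    baseline = [100.0 + i for i in range(100)]
    optimized = [90.0 + i for i in range(100)]
    print(compare(baseline, optimized))
```

Looking at p95 and p99 alongside the median matters here because two runs with the same seed send identical prompts but still produce different metrics.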

Stress Testing: Vary patterns by omitting seed

$ for i in {1..10}; do
$   aiperf [...] --profile-export-file run_$i.json
$ done

Developer Guide

System Architecture

Where RNGs Are Used:

DatasetManager: Pre-generates all dataset content at startup using derived RNGs
TimingManager: Generates Poisson timing intervals and cancellation decisions
Workers: Set the global seed (defensive) but do NOT derive RNGs; they only execute API requests with pre-generated content

Process Flow:

bootstrap.py initializes the RNG module with rng.init(seed) in each process
- Sets Python’s random.seed() and NumPy’s np.random.seed() globally (defensive measure)
- Protects against third-party code inadvertently using global random state
DatasetManager creates generators (PromptGenerator, ImageGenerator, etc.) that derive their RNGs in __init__
TimingManager creates an interval generator that derives its RNG in __init__
Workers initialize the global seed but don’t derive any RNGs (they only execute API requests)
All dataset content is generated before any requests are sent
Workers pull from the pre-generated pool at runtime

How to Use RNGs in Your Code

Workers do NOT use RNGs. Only use RNGs in DatasetManager (content generation) or TimingManager (request timing) components.

from aiperf.common import random_generator as rng

class MyGenerator:
    def __init__(self, config):
        # Derive once in __init__ with unique identifier
        self._rng = rng.derive("dataset.mycomponent.feature")

    def generate(self):
        # Use stored RNG instance
        return self._rng.choice([1, 2, 3, 4, 5])

Rules:

Derive in __init__, not in methods (re-deriving creates a fresh RNG with the same seed, so every call returns the same first value)
Store as instance variable
Use unique dotted identifier: <module>.<component>.<aspect>
Never use Python’s random module (it is seeded, but fragile: any other code that draws from it shifts your sequence)

Hash-Based Seed Derivation

Uses SHA-256 to derive independent seeds: SHA-256(root_seed:identifier) → child seed

Benefits:

Deterministic: Same identifier always gets same seed
Independent: Changing one RNG doesn’t affect others
Fast: ~1-2 microseconds per derivation (happens once at init)
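A minimal sketch of the SHA-256(root_seed:identifier) scheme, assuming the child seed is the first 8 bytes of the digest of the UTF-8 string `"{root_seed}:{identifier}"`, read as a big-endian integer. The truncation, byte order, helper names, and the use of `random.Random` as the generator are assumptions for illustration; see random_generator.py for the actual RandomGenerator.

```python
import hashlib
import random


def derive_seed(root_seed: int, identifier: str) -> int:
    """Hash "root_seed:identifier" with SHA-256 and keep 64 bits as the child seed."""
    digest = hashlib.sha256(f"{root_seed}:{identifier}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")  # assumption: first 8 bytes, big-endian


def derive(root_seed: int, identifier: str) -> random.Random:
    """Return an independent generator for one component (hypothetical helper)."""
    return random.Random(derive_seed(root_seed, identifier))


if __name__ == "__main__":
    a = derive(42, "dataset.prompt.length")
    b = derive(42, "dataset.prompt.length")
    print(a.random() == b.random())  # True: same identifier, same stream
    # Different identifiers hash to unrelated seeds, so their streams are independent.
    print(derive_seed(42, "dataset.prompt.length"), derive_seed(42, "dataset.image.format"))
```

Because each seed depends only on the root seed and the identifier string, drawing more values from one component’s generator cannot change what another component sees.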

Common Mistakes

❌ Deriving in methods → Returns same first value every call.
✅ Derive in __init__.

❌ Using Python’s random → Fragile (global state affected by any code).
✅ Use rng.derive().

❌ Adding new draws to an existing RNG → Shifts all subsequent values, changing output for the same seed.
✅ Derive new RNG for new feature.
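The first and third mistakes can be reproduced with a small self-contained sketch. `derive` here is a hypothetical stand-in that seeds `random.Random` from a SHA-256 hash of `"root_seed:identifier"`; it is not AIPerf’s implementation, and the class and function names are illustrative.

```python
import hashlib
import random


def derive(root_seed: int, identifier: str) -> random.Random:
    """Hypothetical stand-in for rng.derive(): seed from SHA-256(root_seed:identifier)."""
    digest = hashlib.sha256(f"{root_seed}:{identifier}".encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


class DeriveInMethod:
    """Mistake: a fresh RNG with the same seed is created on every call."""

    def generate(self) -> int:
        return derive(42, "dataset.demo.value").randint(1, 100)


class DeriveInInit:
    """Correct: derive once and keep drawing from the same stream."""

    def __init__(self) -> None:
        self._rng = derive(42, "dataset.demo.value")

    def generate(self) -> int:
        return self._rng.randint(1, 100)


def draw_lengths(extra_draw: bool) -> list:
    """Mistake: inserting an extra draw shifts every value that follows it."""
    rng = derive(42, "dataset.demo.length")
    if extra_draw:
        rng.random()  # a new feature reusing the existing RNG
    return [rng.randint(1, 1000) for _ in range(5)]


if __name__ == "__main__":
    bad = DeriveInMethod()
    print([bad.generate() for _ in range(3)])  # the same value three times
    good = DeriveInInit()
    print([good.generate() for _ in range(3)])
    print(draw_lengths(False), draw_lengths(True))
```

Giving the new feature its own identifier (for example, a separate `derive(42, "dataset.demo.feature")`) leaves the existing `dataset.demo.length` stream untouched.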

FAQ

Q: Performance metrics still vary with same seed. Why?
A: Expected. Seeds control dataset content, not network timing or worker scheduling. See What Is Reproducible.

Q: Same seed across different configs?
A: Yes. Same seed + different config = different but reproducible results.

Q: How does this work with multiple workers?
A: Workers set global seed (defensive) but don’t derive RNGs. DatasetManager pre-generates content, workers pull from this fixed pool. Validated with 5+ workers.

Q: Are RNGs thread-safe?
A: No, but this is not an issue: each process uses its own RNG instances in its own memory space. If you add multi-threaded RNG usage, derive a separate RNG per thread.

Q: Session IDs reproducible?
A: Yes. With seed: sequential (session_000000, session_000001). Without: UUIDs.
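A minimal sketch of the two session ID modes in this answer. The function name and the `seeded` flag are hypothetical; only the `session_000000` format and the UUID fallback come from this page.

```python
import uuid
from itertools import count, islice
from typing import Iterator


def session_ids(seeded: bool) -> Iterator[str]:
    """Yield session IDs: sequential when a seed is set, random UUIDs otherwise."""
    if seeded:
        # Zero-padded counter: identical across runs with --random-seed.
        for index in count():
            yield f"session_{index:06d}"
    else:
        # uuid4 draws from os.urandom, so IDs differ on every run.
        while True:
            yield str(uuid.uuid4())


if __name__ == "__main__":
    print(list(islice(session_ids(seeded=True), 3)))
    # ['session_000000', 'session_000001', 'session_000002']
```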

Q: Performance impact?
A: None measurable. Each RNG is derived once at initialization, and network I/O dominates request time by roughly 1000×.

Reference

All Component-Specific RNG Identifiers

Dataset

# Prompts (3)
"dataset.prompt.length"        # Token count distribution
"dataset.prompt.corpus"        # Content position selection
"dataset.prompt.prefix"        # Prefix selection

# Images (4)
"dataset.image.dimensions"     # Width + height (coupled for aspect ratio)
"dataset.image.format"         # PNG/JPEG/etc. selection
"dataset.image.source"         # Source image selection (assets and directory modes only)
"dataset.image.noise"          # Random-noise pixel generation (noise mode, default)

# Audio (3)
"dataset.audio.duration"       # Length distribution
"dataset.audio.format"         # Sample rate + bit depth
"dataset.audio.data"           # Audio sample generation

# Samplers (2)
"dataset.sampler.random"       # Random sampling strategy
"dataset.sampler.shuffle"      # Shuffle sampling strategy

# Loaders (2)
"dataset.loader.random_pool"   # Random pool loader
"dataset.loader.sharegpt"      # ShareGPT loader

Timing

"timing.request.cancellation"      # Cancellation decisions (probabilistic)
"timing.request.poisson_interval"  # Exponential inter-arrival times (Poisson process)
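The two timing identifiers correspond to simple draws. This sketch assumes a fixed target request rate in requests per second and a cancellation probability between 0 and 1; the class and parameter names are hypothetical, and two `random.Random` instances seeded with integers stand in for the RNGs derived from `timing.request.poisson_interval` and `timing.request.cancellation`.

```python
import random


class IntervalGenerator:
    """Hypothetical sketch of seeded Poisson arrivals and cancellation decisions."""

    def __init__(
        self, request_rate: float, cancel_rate: float, interval_seed: int, cancel_seed: int
    ) -> None:
        # Separate streams, so cancellation draws never shift interval values.
        self._interval_rng = random.Random(interval_seed)
        self._cancel_rng = random.Random(cancel_seed)
        self._request_rate = request_rate
        self._cancel_rate = cancel_rate

    def next_interval(self) -> float:
        """Seconds until the next credit: exponential with mean 1 / request_rate."""
        return self._interval_rng.expovariate(self._request_rate)

    def should_cancel(self) -> bool:
        """True with probability cancel_rate."""
        return self._cancel_rng.random() < self._cancel_rate
```

Two instances built with the same seeds return identical interval and cancellation sequences; the wall-clock send times still vary because each sleep and credit delivery is subject to scheduling.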

Composer

"composer.turn.model_selection"    # Model selection per turn
"composer.turn.max_tokens"         # max_tokens sampling
"composer.conversation.turn_count" # Number of turns per conversation
"composer.conversation.turn_delay" # Delay between turns

Models

"models.sequence.distribution"     # ISL/OSL distribution sampling

Module API

from aiperf.common import random_generator as rng

# Initialize (called automatically in bootstrap.py)
# Signature: rng.init(seed: int | None)
#   seed: any integer for deterministic output, None for random
#   Also sets global random.seed() and np.random.seed() defensively
rng.init(42)

# Derive component RNGs (call in __init__)
# Signature: rng.derive(identifier: str) -> RandomGenerator
#   Returns an independent RNG with a SHA-256 derived seed
my_rng = rng.derive("dataset.mycomponent.feature")

# Reset (for testing only)
rng.reset()

See random_generator.py for the RandomGenerator class and full API details.
