Sampling Distributions in YAML Configs

Several fields in an AIPerf YAML config — input/output token lengths, conversation turn counts, turn delays, image dimensions, audio length, and ranking passage counts — accept a sampling distribution instead of a single number. This tutorial covers all five distribution shapes AIPerf supports, the auto-detection rules that pick between them, and the optional min:/max: clamps that compose with any of them.

If you only ever write isl: 512, you’ve already used a distribution — that scalar is the shorthand for a FixedDistribution. Everything below extends from there.

Where distributions show up

Any field in a YAML config typed as a sampling distribution accepts the full set of shapes described in this tutorial. The current list:

| Field | Section | What it controls |
|---|---|---|
| `isl` | `dataset.prompts` (and shorthand at `dataset.isl`) | Input sequence length, in tokens |
| `osl` | `dataset.prompts` and `dataset.osl` shorthand; also on file datasets | Output sequence length, in tokens |
| `turns` | `dataset` | Number of request/response turns per conversation |
| `turn_delay` | `dataset` | Delay between turns, in milliseconds |
| `width`, `height` | `dataset.images` | Synthetic image dimensions, in pixels |
| `length` | `dataset.audio` | Synthetic audio duration, in seconds |
| `passages`, `passage_tokens`, `query_tokens` | `dataset.rankings` | Rankings/reranking endpoint shapes |

Wherever you see {mean: ..., stddev: ...} in a template, you can swap in any other shape from this page.

The five distribution types

AIPerf supports five distribution shapes, and figures out which one you mean from the keys you wrote — you don’t have to add a type: key. The discriminator is purely structural:

| What you wrote | Type | Why |
|---|---|---|
| `isl: 512` | Fixed | Bare scalar |
| `isl: {mean: 512, stddev: 50}` | Normal | `stddev` present |
| `isl: {mean: 512, median: 400}` | Log-normal | `median` present |
| `isl: {peaks: [...]}` | Multimodal | `peaks` present |
| `isl: {points: [...]}` | Empirical | `points` present |

You can override the inference with an explicit type: key if you’d rather state the shape outright:

```yaml
isl: {type: normal, mean: 512, stddev: 50}
```

type: accepts one of fixed, normal, lognormal, multimodal, empirical. AIPerf strips it after dispatch, so the rest of the dict is parsed normally.
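The structural dispatch can be sketched in a few lines of Python. This is a hypothetical `detect_shape` helper for illustration only; AIPerf's internal dispatcher may differ (for instance, the real parser also rejects mixed keys like `{value: 512, mean: 600}`, which this sketch does not):

```python
def detect_shape(spec):
    """Infer the distribution shape from a YAML-parsed value.

    Mirrors the rules in the table above: an explicit `type` key wins;
    otherwise the shape is inferred purely from which keys are present.
    Hypothetical helper, not AIPerf's actual code.
    """
    if not isinstance(spec, dict):
        return "fixed"          # bare scalar shorthand
    if "type" in spec:
        return spec["type"]     # explicit override
    if "peaks" in spec:
        return "multimodal"
    if "points" in spec:
        return "empirical"
    if "median" in spec:
        return "lognormal"      # checked before `mean`: log-normal has both
    if "mean" in spec:
        return "normal"         # includes bare {mean: ...} -> stddev 0
    if "value" in spec:
        return "fixed"
    raise ValueError(f"unrecognised distribution keys: {sorted(spec)}")
```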

Fixed — a constant

The simplest case. Every sample returns the same value.

```yaml
prompts:
  isl: 512            # scalar shorthand
  osl: {value: 128}   # explicit object form (rarely needed)
```

Use a fixed distribution when you want a deterministic input or output size — e.g. reproducing a sizing study or feeding a controlled stress test.

Normal — Gaussian around a mean

A truncated Gaussian implemented via rejection sampling (samples below 0 are redrawn; falls back to clamped-mean if 10k iterations fail to land in range). Parameterised by mean and stddev.

```yaml
prompts:
  isl: {mean: 512, stddev: 50}
  osl: {mean: 128, stddev: 25}
```

This is the workhorse for “vary around a target.” If stddev is 0 or omitted, the distribution collapses to deterministic — equivalent to fixed.

A few details worth knowing:

  • mean must be >= 0. Zero is allowed (e.g. osl: {mean: 0} disables output, turn_delay: {mean: 0} disables inter-turn delay).
  • stddev must be >= 0. Default is 0.
  • A bare {mean: 512} (no stddev, no median) is still treated as Normal — a Normal with zero stddev. This is intentional: it keeps the rule “set mean and you get a Normal” simple. If you want a log-normal with no skew, write {mean: 512, median: 512}.
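The rejection-sampling scheme described above can be sketched as follows. This is a minimal illustration of the documented behaviour (redraw negative samples, fall back to the clamped mean after 10k failed tries), not AIPerf's actual implementation:

```python
import random

def sample_truncated_normal(mean, stddev, max_tries=10_000):
    """Draw from a Gaussian truncated at zero via rejection sampling.

    Hypothetical helper sketching the behaviour described above:
    negative draws are redrawn; if max_tries draws all fail, fall back
    to the mean clamped to >= 0.
    """
    if stddev == 0:
        return mean                  # collapses to Fixed
    for _ in range(max_tries):
        x = random.gauss(mean, stddev)
        if x >= 0:
            return x
    return max(mean, 0.0)            # clamped-mean fallback
```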

Log-normal — right-skewed, always positive

A log-normal distribution parameterised by mean and median. Skew is controlled by the mean / median ratio: the larger the ratio, the heavier the right tail. When mean == median it collapses to deterministic.

```yaml
prompts:
  isl: {mean: 1024, median: 512}   # heavy right tail
  osl: {mean: 200, median: 180}    # mild skew
```

Constraints:

  • Both mean and median must be > 0.
  • median must be <= mean. (A log-normal with median > mean is mathematically impossible.)

Use log-normal when modelling sizes that are bounded below by zero and have a long right tail — chat prompt lengths, retrieval-augmented context windows, “most requests are small but some are huge” workloads.
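The mean/median parameterisation maps onto the standard log-normal parameters in a way that also explains the median <= mean constraint. A sketch under stated assumptions (standard log-normal identities; this is not AIPerf's code):

```python
import math
import random

def sample_lognormal(mean, median):
    """Draw from a log-normal parameterised by its mean and median.

    For a log-normal: median = e^mu and mean = e^(mu + sigma^2 / 2), so
        mu    = ln(median)
        sigma = sqrt(2 * ln(mean / median))
    sigma is only real when mean >= median, which is why median > mean
    is rejected. Hypothetical helper for illustration.
    """
    if not (0 < median <= mean):
        raise ValueError("need 0 < median <= mean")
    mu = math.log(median)
    sigma = math.sqrt(2 * math.log(mean / median))
    return random.lognormvariate(mu, sigma)
```

When `mean == median`, sigma is 0 and every draw equals the median, matching the “collapses to deterministic” behaviour above.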

Multimodal — a mixture of N peaks

A weighted mixture of two or more sub-distributions. Each peak is itself a distribution, written inline, with an optional weight.

```yaml
prompts:
  isl:
    peaks:
      - {mean: 128, stddev: 20, weight: 60}      # 60% — short queries
      - {mean: 2048, median: 1800, weight: 30}   # 30% — long contexts (log-normal)
      - {value: 8192, weight: 10}                # 10% — exact 8K stress

  # Equal-weight peaks: omit `weight` and they're split evenly.
  osl:
    peaks:
      - {mean: 64, stddev: 10}
      - {mean: 256, stddev: 40}
      - {mean: 1024, stddev: 100}
```

Notes:

  • Requires at least 2 peaks.
  • Each peak follows the same auto-detection rules — write {stddev: ...} for Normal peaks, {median: ...} for log-normal peaks, {value: N} for fixed peaks.
  • Weights are relative — they’re normalised internally, so [60, 30, 10] and [6, 3, 1] produce the same mixture.
  • weight is optional and defaults to 1.0. Omit it on every peak to get an equal split.

Use multimodal when your real workload is a mix of distinct request shapes — e.g. a chat product where 70% of traffic is one-shot Q&A and 30% is long document summarisation. A single Normal can’t capture that.
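Mixture sampling itself is simple: pick a peak in proportion to its weight, then sample from that peak. A sketch (hypothetical helper; peaks are represented here as `(weight, sampler)` pairs rather than AIPerf's real objects):

```python
import random

def sample_multimodal(peaks):
    """Draw one sample from a weighted mixture.

    `peaks` is a list of (weight, sample_fn) pairs. Weights are relative:
    random.choices normalises them internally, so [60, 30, 10] and
    [6, 3, 1] behave identically, matching the note above.
    Hypothetical helper for illustration.
    """
    if len(peaks) < 2:
        raise ValueError("multimodal requires at least 2 peaks")
    weights = [w for w, _ in peaks]
    _, sampler = random.choices(peaks, weights=weights, k=1)[0]
    return sampler()
```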

Empirical — discrete weighted values

A discrete distribution sampled from a set of weighted values. No interpolation, no Gaussian — each draw returns one of the values you listed.

```yaml
prompts:
  isl:
    points:
      - {value: 128, weight: 40}
      - {value: 512, weight: 35}
      - {value: 2048, weight: 20}
      - {value: 8192, weight: 5}
```

Notes:

  • Requires at least one point. Weights must be > 0 and are normalised internally.
  • weight defaults to 1.0 — omit it for an equal-probability sampler over the listed values.

Use empirical when you have measured frequencies from production traces and want to reproduce them exactly without smoothing into a parametric shape.
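The discrete sampler is a one-liner over the listed points. A sketch (hypothetical helper operating on YAML-parsed dicts; not AIPerf's code):

```python
import random

def sample_empirical(points):
    """Draw one value from a list of {value, weight} dicts.

    weight defaults to 1.0 (equal probability when omitted everywhere);
    random.choices normalises the weights internally.
    Hypothetical helper for illustration.
    """
    if not points:
        raise ValueError("empirical requires at least one point")
    values = [p["value"] for p in points]
    weights = [p.get("weight", 1.0) for p in points]
    return random.choices(values, weights=weights, k=1)[0]
```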

Clamping with min: / max:

Every distribution shape — including the scalar shorthand — accepts optional min: and max: bounds. Samples outside the range are clamped (not resampled), so the bounds are hard limits, not statistical guarantees.

```yaml
prompts:
  isl:
    mean: 512
    stddev: 200
    min: 32     # never below 32 tokens
    max: 4096   # never above 4096 tokens

  osl:
    peaks:
      - {mean: 64, stddev: 30}
      - {mean: 1024, stddev: 200}
    min: 16
    max: 2048
```

A few rules:

  • Bounds are inclusive: min: 32 means values down to and including 32 are kept; below 32 is clamped up to 32.
  • min: and max: must be finite. NaN/inf are rejected at config-validation time so they can’t silently disable clamping.
  • If both are set, min <= max is enforced.
  • Bounds compose with every shape — Fixed, Normal, Log-normal, Multimodal, and Empirical.

For multimodal distributions, a top-level min/max applies to the output of the mixture. If you want different bounds per peak, set min/max on each peak’s sub-distribution instead.
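Clamping, as opposed to resampling, means out-of-range samples pile up at the bounds rather than being redistributed. A sketch of that behaviour (hypothetical helper, not AIPerf's code):

```python
def clamp(sample, lo=None, hi=None):
    """Pin a sample to optional inclusive [lo, hi] bounds.

    Out-of-range values are pinned to the nearest bound rather than
    redrawn, which is why min/max are hard limits, not statistical
    guarantees. Hypothetical helper for illustration.
    """
    if lo is not None and hi is not None and lo > hi:
        raise ValueError("min must be <= max")
    if lo is not None:
        sample = max(sample, lo)
    if hi is not None:
        sample = min(sample, hi)
    return sample
```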

Disambiguation cheat-sheet

If AIPerf can’t figure out what shape you meant, it errors at config-load time with a message that names the keys it saw. The most common causes:

| Mistake | What AIPerf does |
|---|---|
| `isl: {mean: 512}` (no `stddev`, no `median`) | Treated as Normal with `stddev: 0` (deterministic). |
| `isl: {stddev: 50}` (no `mean`) | Error — Normal requires `mean`. |
| `isl: {peaks: [...one entry...]}` | Error — Multimodal requires at least 2 peaks. |
| `isl: {value: 512, mean: 600}` | Error — `value` selects Fixed, but `mean` is unknown to Fixed. |
| Passing a string like `"128,64:50;512,128:50"` | Error — that’s the legacy `sequence_distribution` string format (semicolon-separated `ISL,OSL:prob` pairs summing to 100), not a sampling distribution. See Sequence Length Distributions. |

When in doubt, run:

```shell
aiperf config validate my-config.yaml
```

The validator runs the same load pipeline aiperf profile does, so any distribution-shape problem surfaces here before you spend compute.

Combining with sweeps

Sweep parameters (sweep.parameters) can replace a distribution wholesale. The right-hand side of a sweep entry is the value that gets substituted into the body, so you can sweep across distribution shapes the same way you sweep across scalars:

```yaml
sweep:
  type: grid
  parameters:
    # Sweep across three different ISL distributions.
    datasets.default.prompts.isl:
      - 512
      - {mean: 512, stddev: 100}
      - {peaks: [{mean: 128, stddev: 20}, {mean: 2048, stddev: 200}]}
```

That gives you three benchmark variations, each with a different ISL shape, while the rest of the body stays constant. Pair with multi_run for confidence intervals per shape — see Multi-Run Confidence Reporting.

Worked example — a realistic chat workload

Putting it all together: a synthetic dataset that mixes short and long queries, with a log-normal output shape and clamped bounds.

```yaml
schemaVersion: "2.0"

benchmark:
  model: meta-llama/Llama-3.1-8B-Instruct
  endpoint:
    url: http://localhost:8000/v1/chat/completions
    type: chat
    streaming: true

  dataset:
    type: synthetic
    entries: 500
    prompts:
      # Bimodal ISL — most traffic is short, but 20% is a long context.
      isl:
        peaks:
          - {mean: 200, stddev: 50, weight: 80}
          - {mean: 4096, median: 3500, weight: 20}
        min: 32
        max: 8192

      # OSL has a long right tail — a few responses are unusually long.
      osl:
        mean: 256
        median: 200
        max: 1024

    # Multi-turn chat: most conversations are 2-3 turns, some run longer.
    turns:
      mean: 3
      stddev: 1
      min: 1
      max: 8

    # User think-time between turns, in milliseconds.
    turn_delay:
      mean: 1500
      stddev: 800
      min: 100

  phases:
    - name: warmup
      type: concurrency
      concurrency: 4
      requests: 50
      exclude_from_results: true
    - name: profiling
      type: poisson
      rate: 30.0
      duration: 120
      concurrency: 64
```

Run it with:

```shell
aiperf profile --config chat-mixed.yaml
```

Where to go next