Copyright © 2026, NVIDIA Corporation.

Inline Datasets

Embed your benchmark dataset directly in the YAML config, with no separate JSONL file required.

When to inline vs. when to keep a file

| Scenario | Inline (records:) | File (path:) |
|---|---|---|
| Few records (< ~100) | Recommended | OK |
| Many records (> ~500) | Discouraged (warning emitted) | Recommended |
| Single-file deployment unit (k8s ConfigMap) | Recommended | Requires sidecar mount |
| Shareable repro for a colleague | Recommended | Two files to ship |
| Records updated independently of the config | Discouraged | Recommended |

The schema is the same: each inline record matches one line of the equivalent JSONL file.
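
Because the schema is shared, moving an existing JSONL file inline is mostly a mechanical step. The following is a minimal sketch, not part of AIPerf: the helper name `jsonl_to_records` and the sample data are hypothetical. It parses each JSONL line and prints it as a YAML list item that can be pasted under `records:`.

```python
import json


def jsonl_to_records(jsonl_text: str) -> list[dict]:
    """Parse JSONL text into a list of dicts, one per non-blank line."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


# Hypothetical sample standing in for the contents of a JSONL dataset file.
jsonl_text = """\
{"text": "What is machine learning?"}
{"text": "Explain GANs in two sentences.", "output_length": 200}
"""

records = jsonl_to_records(jsonl_text)

# A JSON object is also a valid YAML flow mapping, so each serialized record
# can be used directly as a list item under `records:`.
for record in records:
    print(f"- {json.dumps(record)}")
```

The printed lines still need to be indented to match the `records:` key in your config.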

Single-turn

    schemaVersion: "2.0"
    benchmark:
      model: meta-llama/Llama-3.1-8B-Instruct
      endpoint:
        url: http://localhost:8000/v1/chat/completions
      dataset:
        type: file
        format: single_turn
        records:
          - {text: "What is machine learning?"}
          - {text: "Explain GANs in two sentences.", output_length: 200}
          - {text: "Define reinforcement learning."}
      phases:
        type: concurrency
        concurrency: 2
        requests: 100

Multi-turn

    benchmark:
      dataset:
        type: file
        format: multi_turn
        records:
          - session_id: chat_1
            turns:
              - {text: "What is machine learning?"}
              - {text: "Can you give me an example?"}
          - session_id: chat_2
            turns:
              - {text: "Explain neural networks."}
              - {text: "How do they differ from traditional algorithms?"}
              - {text: "Which architecture for image classification?"}

Random pool

Single-pool inline:

    benchmark:
      dataset:
        type: file
        format: random_pool
        sampling: random
        records:
          - {text: "Common query", type: random_pool}
          - {text: "Less common query", type: random_pool}
          - {text: "Rare query", type: random_pool}

Multi-pool inline (mirrors a directory-of-JSONLs file layout):

    benchmark:
      dataset:
        type: file
        format: random_pool
        records:
          queries:
            - {text: "What is your refund policy?", type: random_pool}
            - {text: "How do I reset my password?", type: random_pool}
          passages:
            - {text: "Refunds are processed within 5 business days.", type: random_pool}
            - {text: "Click 'Forgot password' on the login page.", type: random_pool}
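
To see how a directory of JSONL files could map onto the multi-pool shape, here is a rough sketch. It is not AIPerf's loader: the function `load_pools` is hypothetical, and it assumes each pool is named after its file stem (`queries.jsonl` becomes `queries`), which is an assumption rather than documented behavior.

```python
import json
import tempfile
from pathlib import Path


def load_pools(directory: Path) -> dict[str, list[dict]]:
    """Build {pool_name: [records]} from every *.jsonl file in a directory."""
    pools = {}
    for jsonl_file in sorted(directory.glob("*.jsonl")):
        with jsonl_file.open(encoding="utf-8") as fh:
            # Assumption: the pool name is the file name without its extension.
            pools[jsonl_file.stem] = [json.loads(line) for line in fh if line.strip()]
    return pools


# Build a throwaway directory so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "queries.jsonl").write_text(
        '{"text": "What is your refund policy?", "type": "random_pool"}\n',
        encoding="utf-8",
    )
    (root / "passages.jsonl").write_text(
        '{"text": "Refunds are processed within 5 business days.", "type": "random_pool"}\n',
        encoding="utf-8",
    )
    pools = load_pools(root)

print(sorted(pools))  # ['passages', 'queries']
```

The resulting dictionary has the same outline as the `records:` mapping above: one key per pool, each holding a list of records.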

Trace replay (mooncake_trace)

    benchmark:
      dataset:
        type: file
        format: mooncake_trace
        synthesis:
          speedup_ratio: 2.0  # replay 2x faster (1.0 = real-time, 0.5 = 2x slower)
        records:
          - {timestamp: 0, input_length: 512, output_length: 128, hash_ids: [1, 2, 3]}
          - {timestamp: 100, input_length: 1024, output_length: 256, hash_ids: [4, 5]}
          - {timestamp: 250, input_length: 256, output_length: 64, hash_ids: [1, 2]}
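
The effect of `speedup_ratio` on the schedule can be illustrated with a little arithmetic. This sketch assumes the ratio simply divides each record's `timestamp`, which matches the comment in the config (2.0 is twice as fast, 0.5 is twice as slow) but is not taken from AIPerf's source. The function `scale_timestamps` is hypothetical, and timestamps stay in whatever unit the trace uses.

```python
def scale_timestamps(timestamps, speedup_ratio):
    """Return replay offsets after applying a speedup ratio.

    Assumption: a ratio of 2.0 halves every offset, 0.5 doubles it.
    """
    if speedup_ratio <= 0:
        raise ValueError("speedup_ratio must be positive")
    return [t / speedup_ratio for t in timestamps]


# Timestamps from the three inline trace records above.
timestamps = [0, 100, 250]

print(scale_timestamps(timestamps, 2.0))  # [0.0, 50.0, 125.0]
print(scale_timestamps(timestamps, 1.0))  # [0.0, 100.0, 250.0]
print(scale_timestamps(timestamps, 0.5))  # [0.0, 200.0, 500.0]
```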

Mutual exclusion

path: and records: are mutually exclusive. Setting both, or neither, raises a Pydantic ValidationError at config load with this message:

FileDataset requires exactly one source: set either `path:` (load from disk) or `records:` (embed in YAML), not both. Got path=<...>, records=<...>.
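
The "exactly one source" rule can be sketched with a plain dataclass. This is an illustration only: AIPerf raises a Pydantic ValidationError, whereas the hypothetical `FileDatasetSketch` below uses the standard library and raises ValueError, and its message wording is not AIPerf's.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileDatasetSketch:
    """Hypothetical stand-in for the real config model."""

    path: Optional[str] = None
    records: Optional[list] = None

    def __post_init__(self):
        # Both set or both unset means the dataset source is ambiguous or missing.
        if (self.path is None) == (self.records is None):
            raise ValueError(
                "exactly one of `path:` or `records:` must be set, "
                f"got path={self.path!r}, records={self.records!r}"
            )


def is_valid(**kwargs) -> bool:
    """Return True when the keyword arguments form a valid dataset source."""
    try:
        FileDatasetSketch(**kwargs)
    except ValueError:
        return False
    return True


print(is_valid(path="prompts.jsonl"))                               # True
print(is_valid(records=[{"text": "hello"}]))                        # True
print(is_valid())                                                   # False
print(is_valid(path="prompts.jsonl", records=[{"text": "hello"}]))  # False
```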

Soft size limit

If you inline more than 500 records, AIPerf logs a warning recommending a file. There is no hard cap, so you can keep going if you have a good reason, but a 5,000-line YAML file is hard to read in code review. The threshold is configurable via AIPERF_DATASET_INLINE_RECORDS_WARN_THRESHOLD.
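
A soft limit of this kind might look like the sketch below. It assumes that AIPERF_DATASET_INLINE_RECORDS_WARN_THRESHOLD is read as an environment variable holding an integer; the function names, the logger name, and the warning text are hypothetical and are not AIPerf's.

```python
import logging
import os

logger = logging.getLogger("inline_dataset_sketch")

DEFAULT_WARN_THRESHOLD = 500  # default stated in the text above
ENV_VAR = "AIPERF_DATASET_INLINE_RECORDS_WARN_THRESHOLD"


def warn_threshold(env=None) -> int:
    """Read the threshold from the environment, falling back to the default."""
    env = os.environ if env is None else env
    return int(env.get(ENV_VAR, DEFAULT_WARN_THRESHOLD))


def check_inline_size(records, env=None) -> bool:
    """Log a warning and return True when records exceed the threshold.

    This is a soft limit: nothing is raised and the records are still usable.
    """
    limit = warn_threshold(env)
    if len(records) > limit:
        logger.warning(
            "%d inline records exceed the threshold of %d; consider a file",
            len(records),
            limit,
        )
        return True
    return False


print(check_inline_size([{"text": "q"}] * 501, env={}))        # True
print(check_inline_size([{"text": "q"}] * 500, env={}))        # False
print(check_inline_size([{"text": "q"}] * 3, env={ENV_VAR: "2"}))  # True
```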

Tutorial template

A bundled template demonstrates all three formats:

    $ aiperf config init --template inline_dataset --output bench.yaml