
Copyright © 2026, NVIDIA Corporation.

Raw Payload Replay

Benchmark LLM servers by replaying pre-built API request bodies verbatim.

Overview

The raw_payload dataset type replays complete API request bodies exactly as written in your JSONL files. Unlike other dataset types where AIPerf constructs the request payload from structured fields, raw payload replay sends each JSON object directly to the server with no transformation.

This is useful when you:

  • Have captured production traffic and want to replay it exactly
  • Need full control over every field in the request body (model, temperature, tools, system prompts, etc.)
  • Are testing non-standard APIs where AIPerf’s built-in endpoint formatters do not apply
  • Want to benchmark with pre-built payloads exported from another tool or logging pipeline

Input Modes

The loader supports two input modes, selected automatically based on whether --input-file points to a file or a directory.

Single File Mode

Each line in the JSONL file is a complete API request payload. Each line becomes a separate single-turn conversation.

payloads.jsonl
line 1 -> conversation 1 (single turn)
line 2 -> conversation 2 (single turn)
line 3 -> conversation 3 (single turn)

Directory Mode

Each .jsonl file in the directory is one multi-turn conversation, and the lines within a file are its turns, in order. Files are processed in alphabetical order by filename.

payloads/
session_001.jsonl -> conversation 1 (lines = turns)
session_002.jsonl -> conversation 2 (lines = turns)
session_003.jsonl -> conversation 3 (lines = turns)
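
The mapping from files to conversations in both modes can be sketched in Python. This is only an illustration of the rules described here, not AIPerf's loader: `read_payloads` and `load_conversations` are hypothetical helper names, and the sketch assumes that alphabetical order means plain string ordering of filenames.

```python
import json
from pathlib import Path


def read_payloads(path: Path) -> list:
    """Parse one JSONL file into a list of payload dicts, skipping empty lines."""
    payloads = []
    with path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                payloads.append(json.loads(line))
    return payloads


def load_conversations(input_path: str) -> list:
    """Return a list of conversations; each conversation is a list of turns."""
    path = Path(input_path)
    if path.is_dir():
        # Directory mode: one conversation per .jsonl file, in sorted name order.
        # Files with any other extension are left out.
        files = sorted(p for p in path.iterdir() if p.suffix == ".jsonl")
        return [read_payloads(p) for p in files]
    # Single file mode: every line is its own single-turn conversation.
    return [[payload] for payload in read_payloads(path)]
```

Calling `load_conversations("payloads.jsonl")` would yield one single-turn conversation per line, while `load_conversations("payloads/")` would yield one conversation per session file.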

File Format

Each line must be a valid JSON object containing at minimum a messages key with a list value. Any additional fields (model, temperature, max_tokens, tools, stream, etc.) are preserved and sent verbatim.
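
Because payloads are replayed as written, it can help to check a file against this rule before starting a run. The following is a minimal pre-flight sketch, not part of AIPerf; `check_payload_line` and `check_file` are hypothetical names, and the error strings are made up for illustration.

```python
import json
from typing import Optional


def check_payload_line(line: str) -> Optional[str]:
    """Return None if the line meets the format rule, else a short reason."""
    try:
        obj = json.loads(line)
    except json.JSONDecodeError as exc:
        return f"invalid JSON: {exc.msg}"
    if not isinstance(obj, dict):
        return "not a JSON object"
    if not isinstance(obj.get("messages"), list):
        return "missing 'messages' key with a list value"
    return None


def check_file(path: str) -> list:
    """Collect (line_number, reason) for every non-empty line that fails."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # empty lines are skipped, so they are not errors
            reason = check_payload_line(line)
            if reason:
                problems.append((number, reason))
    return problems
```

An empty list from `check_file("payloads.jsonl")` means every non-empty line is a JSON object with a `messages` list; all other fields are left unchecked because they are passed through untouched.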

Single-Turn Example

{"messages": [{"role": "user", "content": "What is machine learning?"}], "model": "Qwen/Qwen3-0.6B", "max_tokens": 100}
{"messages": [{"role": "user", "content": "Explain neural networks."}], "model": "Qwen/Qwen3-0.6B", "max_tokens": 200}
{"messages": [{"role": "user", "content": "How does backpropagation work?"}], "model": "Qwen/Qwen3-0.6B", "temperature": 0.7}

Multi-Turn Example (Directory Mode)

Each file represents a conversation. Each line carries the full message history for that point in the conversation:

session_001.jsonl:

{"messages": [{"role": "user", "content": "Hello"}], "model": "Qwen/Qwen3-0.6B", "max_tokens": 100}
{"messages": [{"role": "user", "content": "Hello"}, {"role": "assistant", "content": "Hi"}, {"role": "user", "content": "How are you?"}], "model": "Qwen/Qwen3-0.6B", "temperature": 0.7}
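
If you start from a complete transcript rather than captured requests, session lines with cumulative history can be generated mechanically. The sketch below assumes the transcript alternates user and assistant messages; `build_session_lines` is a hypothetical helper, and the assistant replies are whatever you supply, since they are replayed as written rather than taken from the server.

```python
import json


def build_session_lines(messages: list, **extra_fields) -> list:
    """Emit one JSONL line per user turn, each carrying the history so far.

    `messages` is the full transcript; `extra_fields` (model, max_tokens, ...)
    are copied into every payload unchanged.
    """
    lines = []
    for i, message in enumerate(messages):
        if message["role"] == "user":
            # The request for this turn includes everything up to this message.
            payload = {"messages": messages[: i + 1], **extra_fields}
            lines.append(json.dumps(payload))
    return lines


transcript = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi"},
    {"role": "user", "content": "How are you?"},
]
for line in build_session_lines(transcript, model="Qwen/Qwen3-0.6B", max_tokens=100):
    print(line)
```

This prints two lines, one per user turn, which can be written to a file such as session_001.jsonl.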

Auto-Detection

When --custom-dataset-type is not specified, AIPerf auto-detects raw payload format by checking the first non-empty line for a messages key with a list value. In directory mode, it checks the first .jsonl file found.

Auto-detection rejects records that contain a conversation_id key or a data key with a list value (to avoid conflicts with other dataset formats). If your payloads include these keys, use --custom-dataset-type raw_payload explicitly.
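
The detection rule can be expressed as a small predicate. This is a sketch of the documented behavior only: `looks_like_raw_payload` is a hypothetical name, and the order of the checks and the handling of invalid JSON are assumptions.

```python
import json


def looks_like_raw_payload(first_line: str) -> bool:
    """Apply the documented detection rule to the first non-empty line."""
    try:
        record = json.loads(first_line)
    except json.JSONDecodeError:
        return False  # assumption: unparseable lines never match
    if not isinstance(record, dict):
        return False
    if not isinstance(record.get("messages"), list):
        return False
    # Keys that belong to other dataset formats cause rejection.
    if "conversation_id" in record:
        return False
    if isinstance(record.get("data"), list):
        return False
    return True
```

A payload that legitimately carries a `conversation_id` field would fail this predicate, which is why such files need --custom-dataset-type raw_payload set explicitly.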


Basic Usage

Single File

cat > payloads.jsonl << 'EOF'
{"messages": [{"role": "user", "content": "What is machine learning?"}], "model": "Qwen/Qwen3-0.6B", "max_tokens": 100}
{"messages": [{"role": "user", "content": "Explain neural networks."}], "model": "Qwen/Qwen3-0.6B", "max_tokens": 200}
{"messages": [{"role": "user", "content": "How does backpropagation work?"}], "model": "Qwen/Qwen3-0.6B", "max_tokens": 150}
EOF

aiperf profile \
  --input-file payloads.jsonl \
  --model Qwen/Qwen3-0.6B \
  --custom-dataset-type raw_payload \
  --streaming \
  --url localhost:8000 \
  --concurrency 2

Since auto-detection recognizes files with messages arrays, you can omit --custom-dataset-type:

aiperf profile \
  --input-file payloads.jsonl \
  --model Qwen/Qwen3-0.6B \
  --streaming \
  --url localhost:8000 \
  --concurrency 2

Directory for Multi-Turn Conversations

mkdir -p conversations/

cat > conversations/session_001.jsonl << 'EOF'
{"messages": [{"role": "user", "content": "What is Python?"}], "model": "Qwen/Qwen3-0.6B", "max_tokens": 200}
{"messages": [{"role": "user", "content": "What is Python?"}, {"role": "assistant", "content": "Python is a programming language."}, {"role": "user", "content": "Show me a hello world example."}], "model": "Qwen/Qwen3-0.6B", "max_tokens": 200}
EOF

cat > conversations/session_002.jsonl << 'EOF'
{"messages": [{"role": "user", "content": "Explain REST APIs."}], "model": "Qwen/Qwen3-0.6B", "max_tokens": 300}
{"messages": [{"role": "user", "content": "Explain REST APIs."}, {"role": "assistant", "content": "REST is an architectural style..."}, {"role": "user", "content": "What about GraphQL?"}], "model": "Qwen/Qwen3-0.6B", "max_tokens": 300}
EOF

aiperf profile \
  --input-file conversations/ \
  --model Qwen/Qwen3-0.6B \
  --custom-dataset-type raw_payload \
  --streaming \
  --url localhost:8000 \
  --concurrency 2

Endpoint Type

Raw payloads work with any endpoint type. The endpoint controls only response parsing and URL path — payload formatting is always bypassed when raw payloads are present.

Using a regular endpoint type (e.g., the default chat) is recommended because it provides structured response parsing (token counts, finish reasons, choices) instead of generic auto-detection.

For non-standard APIs where no built-in endpoint matches, use --endpoint-type raw. The raw endpoint does not append a URL path (you must include the full path in --url) and parses responses using auto-detection. For non-standard response formats, you can specify a JMESPath expression via --extra-inputs response_field:<expression> to extract the relevant field.


Configuration Reference

| Option | Required | Default | Description |
|---|---|---|---|
| --input-file | Yes | none | Path to a JSONL file or directory of JSONL files |
| --model | Yes | none | Model name (e.g., Qwen/Qwen3-0.6B) |
| --endpoint-type | No | chat | Any endpoint type works; raw is available for non-standard APIs |
| --custom-dataset-type | No | Auto-detected | Set to raw_payload to force this loader |
| --streaming | No | false | Enable streaming responses |
| --url | No | localhost:8000 | Server base URL (the endpoint type appends the API path) |
| --concurrency | No | none | Number of concurrent users |
| --dataset-sampling-strategy | No | sequential | sequential, random, or shuffle |

Context Mode

Raw payload conversations use message_array_with_responses context mode by default. Each turn is sent exactly as written — AIPerf does not accumulate prior turns or inject server responses into subsequent requests.

This is the correct behavior because raw payloads already contain the complete message history for each turn. In directory mode, each line in a session file should include all prior context needed for that point in the conversation (see the multi-turn examples above).
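
Since AIPerf does not add history for you, a quick check that each turn repeats the previous turn's messages can catch session files that were written as if context accumulated automatically. The sketch below is a heuristic under that assumption, not an AIPerf feature; `history_gaps` is a hypothetical name, and a reported gap is not necessarily an error if your workload truncates or rewrites history on purpose.

```python
def history_gaps(turns: list) -> list:
    """Return 1-based turn numbers whose messages do not extend the previous turn's."""
    gaps = []
    for index in range(1, len(turns)):
        previous = turns[index - 1]["messages"]
        current = turns[index]["messages"]
        # A self-contained turn starts with the earlier messages and adds
        # at least one new message after them.
        if len(current) <= len(previous) or current[: len(previous)] != previous:
            gaps.append(index + 1)
    return gaps


session = [
    {"messages": [{"role": "user", "content": "Hello"}]},
    {"messages": [{"role": "user", "content": "How are you?"}]},  # history missing
]
print(history_gaps(session))
```

Here the second turn omits the first exchange, so the check reports turn 2.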


Tips

  • Include the full API path in --url only when using --endpoint-type raw. Other endpoint types append the path automatically.
  • Every line must have a messages key with a list value.
  • Empty lines are skipped in both modes.
  • Directory files are sorted alphabetically. Name files with zero-padded numbers (e.g., session_001.jsonl) for predictable ordering.
  • Non-.jsonl files are ignored in directory mode.
  • Payloads are sent verbatim. Apart from the messages requirement above, AIPerf does not modify, validate, or reformat them.
  • Default sampling is sequential. Use --dataset-sampling-strategy shuffle or random for varied ordering.
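
The zero-padding tip is easy to see with a quick sort. This sketch assumes alphabetical ordering behaves like plain string comparison, as Python's `sorted` does on filenames; the filenames are illustrative.

```python
unpadded = [f"session_{n}.jsonl" for n in (1, 2, 10)]
padded = [f"session_{n:03d}.jsonl" for n in (1, 2, 10)]

# Without padding, "10" sorts before "2" because strings compare character by character.
print(sorted(unpadded))

# With zero-padding, string order matches numeric order.
print(sorted(padded))
```

The first list sorts as session_1, session_10, session_2, while the padded list keeps sessions 1, 2 and 10 in numeric order.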