For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
    • Welcome to AIPerf Documentation
  • Getting Started
    • Profiling with AIPerf
    • Comprehensive LLM Benchmarking
    • Migrating from GenAI-Perf
    • GenAI-Perf vs AIPerf CLI Feature Comparison Matrix
  • Tutorials
      • Custom Dataset Guide
      • Inline Datasets
      • Custom Prompt Benchmarking
      • Profile with ShareGPT Dataset
      • Synthetic Dataset Generation
      • Profile with InstructCoder Dataset
      • Profile with AIMO Dataset
      • Profile with MMStar Dataset
      • Profile with MMVU Dataset
      • Profile with LLaVA-OneVision Dataset
      • Profile with VisionArena Dataset
      • Profile with Blazedit Dataset
      • Profile with SpecBench Dataset
      • Profile with SPEED-Bench Dataset
      • Profile with Bailian Traces
      • Profile with BurstGPT Traces
      • Replay SageMaker Data Capture Traces
      • Raw Payload Replay
      • Inputs JSON Replay
      • Multi-Turn Conversations
      • Sequence Length Distributions for Advanced Benchmarking
      • Prefix Data Synthesis Tutorial
      • Agentic Code Dataset Generator
NVIDIANVIDIA
Developer-friendly docs for your API
Privacy Policy | Your Privacy Choices | Terms of Service | Accessibility | Corporate Policies | Product Security | Contact

Copyright © 2026, NVIDIA Corporation.

LogoLogoDocumentation
On this page
  • Prerequisites
  • SageMaker Data Capture Format
  • Download and Replay
  • Replay a Time Window
  • Enabling Data Capture on Your Endpoint
  • Known Limitations
  • Related Tutorials
TutorialsDatasets & Inputs

Replay SageMaker Data Capture Traces

||View as Markdown|
Previous

Profile with BurstGPT Traces

Next

Raw Payload Replay

AIPerf supports replaying production traffic captured by Amazon SageMaker Data Capture. This enables benchmarking inference servers using real request patterns and prompts recorded from SageMaker real-time endpoints.

The loader sends the exact captured prompts (literal replay via the messages array) with original request timing, enabling accurate A/B comparisons when migrating models, changing instance types, or upgrading serving frameworks.


Prerequisites

  • A SageMaker real-time endpoint with Data Capture enabled (captures both input and output)
  • Captured data synced from S3 to local disk
  • The captured endpoint must use the OpenAI-compatible chat completions API (messages array in the request payload)

SageMaker Data Capture Format

Data Capture writes JSONL files to S3, partitioned by hour:

s3://<bucket>/<prefix>/<endpoint-name>/<variant-name>/yyyy/mm/dd/hh/<uuid>.jsonl

Each JSONL line contains the full request and response payloads with timing metadata:

1{
2 "captureData": {
3 "endpointInput": {
4 "observedContentType": "application/json",
5 "mode": "INPUT",
6 "data": "{\"messages\":[{\"role\":\"user\",\"content\":\"What is AI?\"}],\"max_tokens\":50}",
7 "encoding": "JSON"
8 },
9 "endpointOutput": {
10 "observedContentType": "application/json",
11 "mode": "OUTPUT",
12 "data": "{\"usage\":{\"prompt_tokens\":12,\"completion_tokens\":30,\"total_tokens\":42},...}",
13 "encoding": "JSON"
14 }
15 },
16 "eventMetadata": {
17 "eventId": "e4378ff2-2b43-4031-a21f-401bb3c3e038",
18 "inferenceTime": "2026-04-29T00:03:18Z"
19 },
20 "eventVersion": "0"
21}

Download and Replay

Sync captured data from S3 and point AIPerf at the directory:

$# Sync all capture files (preserves hourly directory structure)
$aws s3 sync \
> s3://my-bucket/datacapture/my-endpoint/primary/ \
> ./captured_data/
$
$# Replay against a target server
$aiperf profile \
> --model my-model \
> --endpoint-type chat \
> --url localhost:8000 \
> --input-file ./captured_data/ \
> --custom-dataset-type sagemaker_data_capture \
> --fixed-schedule \
> --fixed-schedule-auto-offset

The loader recursively finds all .jsonl files in the directory, parses them, and sorts records by timestamp. No manual file concatenation is needed.

Single-file input also works:

$# Concatenate if preferred
$find captured_data/ -name "*.jsonl" -exec cat {} + > all_captures.jsonl
$
$aiperf profile \
> --model my-model \
> --endpoint-type chat \
> --url localhost:8000 \
> --input-file all_captures.jsonl \
> --custom-dataset-type sagemaker_data_capture \
> --fixed-schedule \
> --fixed-schedule-auto-offset

Replay a Time Window

Use timestamp offsets to replay a subset of the captured traffic:

$aiperf profile \
> --model my-model \
> --endpoint-type chat \
> --url localhost:8000 \
> --input-file ./captured_data/ \
> --custom-dataset-type sagemaker_data_capture \
> --fixed-schedule \
> --fixed-schedule-auto-offset \
> --fixed-schedule-end-offset 300000

This replays only the first 5 minutes (300,000 ms) of captured traffic.


Enabling Data Capture on Your Endpoint

When creating the endpoint configuration, include DataCaptureConfig with JsonContentTypes to store payloads as raw JSON (not base64):

1import boto3
2
3client = boto3.client("sagemaker")
4
5client.create_endpoint_config(
6 EndpointConfigName="my-endpoint-config-with-capture",
7 ProductionVariants=[{
8 "VariantName": "primary",
9 "ModelName": "my-model",
10 "InitialInstanceCount": 1,
11 "InstanceType": "ml.g5.xlarge",
12 "InitialVariantWeight": 1.0,
13 }],
14 DataCaptureConfig={
15 "EnableCapture": True,
16 "InitialSamplingPercentage": 100,
17 "DestinationS3Uri": "s3://my-bucket/datacapture",
18 "CaptureOptions": [
19 {"CaptureMode": "Input"},
20 {"CaptureMode": "Output"},
21 ],
22 "CaptureContentTypeHeader": {
23 "JsonContentTypes": ["application/json"],
24 },
25 },
26)

Setting JsonContentTypes ensures payloads are stored as raw JSON. Without it, SageMaker base64-encodes the data by default. The AIPerf loader handles both encodings.


Known Limitations

  • Second-level timestamp precision: inferenceTime has no fractional seconds. At high QPS, requests sharing the same second fire in rapid succession.
  • No streaming capture: InvokeEndpointWithResponseStream responses are not captured by SageMaker. Output token counts may be missing for streaming endpoints.
  • Single-turn only: Each captured record is an independent request. No multi-turn session linking.
  • OpenAI-compatible only: The captured payload must contain a messages array. Non-chat endpoints are not supported.

Related Tutorials

  • Trace Replay with Mooncake Traces - Mooncake FAST’25 trace replay
  • Bailian Traces - Bailian production trace replay
  • Fixed Schedule - Precise timestamp-based execution for any dataset