Benchmark Datasets | NVIDIA AIPerf Documentation

This document describes datasets that AIPerf can use to generate stimulus. Additional support is under development, so check back often.

Dataset Options

Dataset	Support	Data Source
Synthetic Text	✅	Synthetically generated text prompts pulled from Shakespeare
Synthetic Audio	✅	Synthetically generated audio samples
Synthetic Images	✅	Synthetically generated image samples
Custom Data	✅	`--input-file your_file.jsonl --custom-dataset-type single_turn`
Mooncake	✅	Mooncake trace file `--input-file your_trace_file.jsonl --custom-dataset-type mooncake_trace`
Baseten Trace	✅	Baseten completion trace parquet `--input-file your_trace.parquet --custom-dataset-type baseten_trace`
ShareGPT	✅	Conversations from `--public-dataset sharegpt`
Exgentic	✅	Recorded agent sessions from `--public-dataset exgentic`
Exgentic v2	✅	Expanded recorded agent sessions from `--public-dataset exgentic_v2`
Agentic Code	✅	Synthetic multi-turn coding-agent traces with shared prompt layers, repository context, and cache-aware turn growth. Generated via `aiperf synthesize agentic-code` and replayed as a Mooncake trace.

Exgentic Agent Trace Replay

The Exgentic loaders stream recorded agent sessions directly from Hugging Face. The exgentic loader is pinned to v1 revision 70036b93a04e61b0ea2706a68b962f4f26774587; the exgentic_v2 loader is independently pinned to v2 revision 4b8ad4ab198438e5a170f9171c19c6a2cf7c1814. Each loader replays snapshots of successful, positive-token chat calls. Recorded messages, system instructions, tool definitions, output-token limits, request controls, and call start times are preserved. Tools are not executed, and live responses are not added to later requests. Every request carries the source session ID as x-dynamo-session-id for Dynamo agentic tracing, while AIPerf retains its own request correlation ID.

Provide a finite materialization bound through --num-conversations, --num-dataset-entries, or --request-count. --benchmark-duration limits request issuance, not dataset setup.

Select a source harness and source model independently from the target model served by the endpoint:

$ aiperf profile \
>   --model TARGET_MODEL \
>   --url http://localhost:8000/v1/chat/completions \
>   --endpoint-type chat \
>   --public-dataset exgentic_v2 \
>   --dataset-filter benchmark=swebench \
>   --dataset-filter harness=tool_calling \
>   --dataset-filter source_model=Kimi-K2.5 \
>   --num-conversations 1 \
>   --fixed-schedule

The source_model filter selects the model that produced the trace; --model selects the target model receiving the replay. The benchmark filter selects an Exgentic v2 workload. Invalid filters report the available harness/model combinations. The v1 dataset contains 22 combinations across five harnesses and six canonical source models.

Fixed-schedule mode emits each recorded call as an independently scheduled one-turn request using its start offset from the source session. Calls that overlapped in the trace therefore overlap during replay. Selected source sessions start together at offset zero. Without --fixed-schedule, each source session remains a closed-loop multi-turn conversation: AIPerf waits for one live response before applying the recorded residual delay and sending the next request.
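The difference between the two replay modes can be sketched with a small timing model. This is an illustration, not AIPerf's scheduler: `RecordedCall`, its fields, and both functions are hypothetical names, and the residual delay is assumed to be the recorded gap between the previous call's end and the next call's start, clamped at zero.

```python
from dataclasses import dataclass


@dataclass
class RecordedCall:
    start_offset_s: float  # call start relative to session start in the trace
    duration_s: float      # recorded latency of the call in the trace


def fixed_schedule_send_times(calls):
    """Fixed schedule: every call is sent at its recorded start offset."""
    return [call.start_offset_s for call in calls]


def closed_loop_send_times(calls, live_latencies_s):
    """Closed loop: wait for the live response, then the residual delay.

    Assumption: residual delay = recorded gap between the previous call's
    end and this call's start, never negative.
    """
    send_times = []
    clock = 0.0
    for i, (call, live_latency) in enumerate(zip(calls, live_latencies_s)):
        if i > 0:
            prev = calls[i - 1]
            recorded_end = prev.start_offset_s + prev.duration_s
            clock += max(0.0, call.start_offset_s - recorded_end)
        send_times.append(clock)
        clock += live_latency  # block until the live response arrives
    return send_times


# Calls 0 and 1 overlap in the trace (call 1 starts before call 0 ends).
calls = [RecordedCall(0.0, 2.0), RecordedCall(1.0, 1.0), RecordedCall(4.0, 1.0)]
live = [3.0, 3.0, 3.0]  # hypothetical latencies of the target endpoint

print(fixed_schedule_send_times(calls))     # [0.0, 1.0, 4.0]
print(closed_loop_send_times(calls, live))  # [0.0, 3.0, 8.0]
```

In this model the fixed schedule keeps the recorded overlap between the first two calls, while the closed loop serializes them and stretches the session when the target is slower than the source.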

Size the target context window for the selected trace plus the target model’s chat-template overhead. Recorded contexts reach about 178K tokens, and some Gemini tool-calling sessions exceed 64K tokens before target formatting. The tool_calling_with_shortlisting harness alternates a selector request containing the full tool catalog with executor requests containing a changing subset of tool schemas. Low executor prefix-cache reuse is therefore expected for that harness and is not a loader error.
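As a rough sizing check, the arithmetic can be written out. This sketch assumes the serving stack counts prompt tokens, chat-template overhead, and the output-token limit against a single context window, which is common but not confirmed here for any specific server. The function name and the overhead and output values are hypothetical.

```python
def required_context_tokens(recorded_prompt_tokens: int,
                            template_overhead_tokens: int,
                            max_output_tokens: int) -> int:
    """Tokens the target must accept for one replayed call (assumed additive)."""
    return recorded_prompt_tokens + template_overhead_tokens + max_output_tokens


# About 178K is the largest recorded context mentioned above; the other two
# values are illustrative placeholders, not measurements.
needed = required_context_tokens(178_000, 512, 4_096)

print(needed)             # 182608
print(needed <= 131_072)  # False: a 128K window would be too small
print(needed <= 262_144)  # True
```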