
Copyright © 2026, NVIDIA Corporation.


Benchmark Datasets


This document describes datasets that AIPerf can use to generate stimulus. Additional support is under development, so check back often.

Dataset Options

| Dataset | Support | Data Source |
|---|---|---|
| Synthetic Text | ✅ | Synthetically generated text prompts pulled from Shakespeare |
| Synthetic Audio | ✅ | Synthetically generated audio samples |
| Synthetic Images | ✅ | Synthetically generated image samples |
| Custom Data | ✅ | `--input-file your_file.jsonl --custom-dataset-type single_turn` |
| Mooncake | ✅ | Mooncake trace file `--input-file your_trace_file.jsonl --custom-dataset-type mooncake_trace` |
| ShareGPT | ✅ | Conversations from `--public-dataset sharegpt` |
| Agentic Code | ✅ | Synthetic multi-turn coding-agent traces with shared prompt layers, repository context, and cache-aware turn growth. Generated via `aiperf synthesize agentic-code` and replayed as a Mooncake trace. |
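
As an illustration of the Custom Data row, the sketch below writes a small JSONL file and assembles the two dataset flags from the table. It is a minimal sketch, not an official recipe: the `text` field name for a single-turn record is an assumption that this page does not confirm, and `write_jsonl` and `dataset_args` are hypothetical helpers introduced only for this example. The remaining options a full AIPerf run needs are not covered on this page and are left out.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(path: Path, records: list[dict]) -> None:
    """Write one JSON object per line (the JSONL convention)."""
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def dataset_args(input_file: Path, dataset_type: str) -> list[str]:
    """Return only the dataset-related flags listed in the table."""
    return [
        "--input-file", str(input_file),
        "--custom-dataset-type", dataset_type,
    ]


# ASSUMPTION: each single-turn record carries its prompt in a "text" field.
prompts = [
    {"text": "What is deep learning?"},
    {"text": "Summarize Hamlet in one sentence."},
]

out_dir = Path(tempfile.mkdtemp())
prompt_file = out_dir / "your_file.jsonl"
write_jsonl(prompt_file, prompts)

# Dataset flags only; append these to the rest of your AIPerf command line.
args = dataset_args(prompt_file, "single_turn")
print(" ".join(args))
```

Keeping the flags as a list rather than a single string makes them easy to pass to `subprocess.run` alongside whatever other options a run requires, without shell-quoting issues in file paths.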