Copyright © 2026, NVIDIA Corporation.

Profile with InstructCoder Dataset

AIPerf supports benchmarking using the InstructCoder dataset (likaixin/InstructCoder), which contains code editing and generation instructions. This dataset is useful for measuring model throughput and latency under code generation workloads.

This guide covers profiling OpenAI-compatible chat completions endpoints using the InstructCoder public dataset.


Start a vLLM Server

Launch a vLLM server with a chat model:

$ docker pull vllm/vllm-openai:latest
$ docker run --gpus all -p 8000:8000 vllm/vllm-openai:latest \
>   --model Qwen/Qwen3-0.6B

Verify the server is ready:

$ curl -s localhost:8000/v1/chat/completions \
>   -H "Content-Type: application/json" \
>   -d '{"model":"Qwen/Qwen3-0.6B","messages":[{"role":"user","content":"test"}],"max_tokens":1}'
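A single curl call fails if the model is still loading, so scripted runs usually need to retry. The following is a minimal sketch of such a readiness poll, assuming the server address and model name used above. `probe_chat_endpoint` and `wait_until_ready` are hypothetical helper names (they are not part of vLLM or AIPerf), and the retry count and delay are arbitrary choices.

```python
import json
import time
import urllib.request


def probe_chat_endpoint(base_url: str, model: str, timeout_s: float = 10.0) -> bool:
    """Send the same one-token chat request as the curl check; True on HTTP 200."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": "test"}],
        "max_tokens": 1,
    }).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, timeouts, and refused connections are all
        # OSError subclasses: treat any of them as "not ready yet".
        return False


def wait_until_ready(probe, attempts: int = 60, delay_s: float = 5.0, sleep=time.sleep) -> bool:
    """Call probe() up to `attempts` times, sleeping between failed tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay_s)
    return False


# Example call (assumed values, matching the commands in this guide):
# wait_until_ready(lambda: probe_chat_endpoint("http://localhost:8000", "Qwen/Qwen3-0.6B"))
```

The probe is passed in as a callable so the retry loop can be exercised without a running server.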

Profile with InstructCoder Dataset

AIPerf loads the InstructCoder dataset from Hugging Face and sends each instruction as a single-turn prompt.

$ aiperf profile \
>   --model Qwen/Qwen3-0.6B \
>   --endpoint-type chat \
>   --streaming \
>   --url localhost:8000 \
>   --public-dataset instruct_coder \
>   --request-count 10 \
>   --concurrency 4
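To make "single-turn prompt" concrete, the sketch below builds the kind of chat completions request body that would carry one instruction as a lone user message. This is an illustration only: the exact payload AIPerf sends is not shown in this guide, the shape is mirrored from the curl check above, `build_single_turn_request` is a hypothetical helper, and the sample instruction is invented rather than taken from the dataset.

```python
import json


def build_single_turn_request(instruction: str, model: str, stream: bool = True) -> dict:
    """Wrap one instruction as a single user message (no prior turns)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": instruction}],
        # Assumed to correspond to the --streaming flag used above.
        "stream": stream,
    }


# Invented example instruction, for illustration only.
example = build_single_turn_request(
    "Add type hints to this function.", "Qwen/Qwen3-0.6B"
)
print(json.dumps(example, indent=2))
```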

Sample Output (Successful Run):

NVIDIA AIPerf | LLM Metrics
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┓
┃              Metric ┃       avg ┃       min ┃       max ┃       p99 ┃       p90 ┃       p50 ┃       std ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Time to First Token │    467.66 │    276.26 │    727.41 │    727.40 │    727.36 │    320.26 │    202.34 │
│                (ms) │           │           │           │           │           │           │           │
│      Time to Second │     58.66 │     51.91 │     70.72 │     70.37 │     67.24 │     54.80 │      7.30 │
│          Token (ms) │           │           │           │           │           │           │           │
│       Time to First │ 35,962.41 │ 18,843.60 │ 82,092.30 │ 79,669.08 │ 57,860.07 │ 30,302.17 │ 18,641.22 │
│   Output Token (ms) │           │           │           │           │           │           │           │
│     Request Latency │ 49,546.08 │ 23,752.73 │ 99,184.28 │ 95,994.62 │ 67,287.65 │ 47,369.35 │ 20,292.88 │
│                (ms) │           │           │           │           │           │           │           │
│ Inter Token Latency │     60.88 │     43.09 │     67.81 │     67.80 │     67.68 │     64.09 │      7.40 │
│                (ms) │           │           │           │           │           │           │           │
│        Output Token │     16.73 │     14.75 │     23.21 │     22.85 │     19.64 │     15.60 │      2.49 │
│ Throughput Per User │           │           │           │           │           │           │           │
│   (tokens/sec/user) │           │           │           │           │           │           │           │
│     Output Sequence │    849.60 │    383.00 │  2,295.00 │  2,184.12 │  1,186.20 │    709.50 │    512.81 │
│     Length (tokens) │           │           │           │           │           │           │           │
│      Input Sequence │     15.20 │     11.00 │     21.00 │     20.82 │     19.20 │     14.50 │      3.31 │
│     Length (tokens) │           │           │           │           │           │           │           │
│        Output Token │     47.46 │       N/A │       N/A │       N/A │       N/A │       N/A │       N/A │
│          Throughput │           │           │           │           │           │           │           │
│        (tokens/sec) │           │           │           │           │           │           │           │
│  Request Throughput │      0.06 │       N/A │       N/A │       N/A │       N/A │       N/A │       N/A │
│      (requests/sec) │           │           │           │           │           │           │           │
│       Request Count │     10.00 │       N/A │       N/A │       N/A │       N/A │       N/A │       N/A │
│          (requests) │           │           │           │           │           │           │           │
└─────────────────────┴───────────┴───────────┴───────────┴───────────┴───────────┴───────────┴───────────┘
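The aggregate rows in the sample output can be cross-checked against each other with a little arithmetic. The sketch below assumes that Output Token Throughput is total output tokens divided by the benchmark duration, and that Request Throughput is request count divided by the same duration. These definitions are assumptions made for the check, not statements taken from this guide.

```python
# Values copied from the "avg" column of the sample output above.
avg_output_tokens = 849.60          # Output Sequence Length (tokens)
request_count = 10                  # Request Count (requests)
output_token_throughput = 47.46     # Output Token Throughput (tokens/sec)
avg_inter_token_latency_ms = 60.88  # Inter Token Latency (ms)

# Total tokens generated across all requests.
total_output_tokens = avg_output_tokens * request_count

# Benchmark duration implied by the assumed throughput definition.
implied_duration_s = total_output_tokens / output_token_throughput

# Request throughput implied by that duration.
implied_request_throughput = request_count / implied_duration_s

# Per-user token rate derived from the *mean* inter-token latency.
per_user_from_mean_itl = 1000.0 / avg_inter_token_latency_ms

print(f"total output tokens:        {total_output_tokens:.0f}")
print(f"implied duration (s):       {implied_duration_s:.1f}")
print(f"implied request throughput: {implied_request_throughput:.3f}")
print(f"1000 / mean ITL:            {per_user_from_mean_itl:.2f}")
```

Under these assumptions the run lasted roughly 179 seconds, and 10 requests over that time is about 0.056 requests/sec, which rounds to the reported 0.06. The value 1000 / 60.88 ≈ 16.43 tokens/sec is slightly below the reported per-user average of 16.73. That gap is expected if the per-user figure is averaged per request, because the mean of reciprocals is at least the reciprocal of the mean.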