Profile with InstructCoder Dataset

AIPerf supports benchmarking using the InstructCoder dataset (likaixin/InstructCoder), which contains code editing and generation instructions. This dataset is useful for measuring model throughput and latency under code generation workloads.

This guide covers profiling OpenAI-compatible chat completions endpoints using the InstructCoder public dataset.

Start a vLLM Server

Launch a vLLM server with a chat model:

$ docker pull vllm/vllm-openai:latest
$ docker run --gpus all -p 8000:8000 vllm/vllm-openai:latest \
>   --model Qwen/Qwen3-0.6B

Verify the server is ready:

$ curl -s localhost:8000/v1/chat/completions \
>   -H "Content-Type: application/json" \
>   -d '{"model":"Qwen/Qwen3-0.6B","messages":[{"role":"user","content":"test"}],"max_tokens":1}'
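The model can take a while to load after the container starts, so the first request may be refused. Below is a minimal sketch of a polling loop around the same request. The helper name `wait_for_server`, its argument order, and its default retry values are assumptions made for illustration; they are not part of vLLM or AIPerf.

```shell
# Hypothetical helper: poll the chat completions endpoint until it answers.
# Arguments (all optional): base URL, model name, max attempts, delay in seconds.
wait_for_server() {
  local url="${1:-localhost:8000}"
  local model="${2:-Qwen/Qwen3-0.6B}"
  local max_attempts="${3:-60}"
  local delay="${4:-5}"
  local payload
  local attempt=1

  # Same request body as the manual check; the model name is not JSON-escaped.
  payload=$(printf '{"model":"%s","messages":[{"role":"user","content":"test"}],"max_tokens":1}' "$model")

  while [ "$attempt" -le "$max_attempts" ]; do
    # -f makes curl exit non-zero on HTTP errors, -s silences progress output.
    if curl -sf -o /dev/null "$url/v1/chat/completions" \
        -H "Content-Type: application/json" \
        -d "$payload"; then
      echo "server ready after $attempt attempt(s)"
      return 0
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
  done

  echo "server not ready after $max_attempts attempts" >&2
  return 1
}
```

Calling `wait_for_server localhost:8000 Qwen/Qwen3-0.6B` before starting a profile run returns once a one-token completion succeeds, or exits non-zero if the retry budget runs out.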

Run the Profile

AIPerf loads the InstructCoder dataset from Hugging Face and sends each instruction as a single-turn prompt.

$ aiperf profile \
>     --model Qwen/Qwen3-0.6B \
>     --endpoint-type chat \
>     --streaming \
>     --url localhost:8000 \
>     --public-dataset instruct_coder \
>     --request-count 10 \
>     --concurrency 4

Sample Output (Successful Run):

                                        NVIDIA AIPerf | LLM Metrics
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┓
┃              Metric ┃       avg ┃       min ┃       max ┃       p99 ┃       p90 ┃       p50 ┃       std ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Time to First Token │    467.66 │    276.26 │    727.41 │    727.40 │    727.36 │    320.26 │    202.34 │
│                (ms) │           │           │           │           │           │           │           │
│      Time to Second │     58.66 │     51.91 │     70.72 │     70.37 │     67.24 │     54.80 │      7.30 │
│          Token (ms) │           │           │           │           │           │           │           │
│       Time to First │ 35,962.41 │ 18,843.60 │ 82,092.30 │ 79,669.08 │ 57,860.07 │ 30,302.17 │ 18,641.22 │
│   Output Token (ms) │           │           │           │           │           │           │           │
│     Request Latency │ 49,546.08 │ 23,752.73 │ 99,184.28 │ 95,994.62 │ 67,287.65 │ 47,369.35 │ 20,292.88 │
│                (ms) │           │           │           │           │           │           │           │
│ Inter Token Latency │     60.88 │     43.09 │     67.81 │     67.80 │     67.68 │     64.09 │      7.40 │
│                (ms) │           │           │           │           │           │           │           │
│        Output Token │     16.73 │     14.75 │     23.21 │     22.85 │     19.64 │     15.60 │      2.49 │
│ Throughput Per User │           │           │           │           │           │           │           │
│   (tokens/sec/user) │           │           │           │           │           │           │           │
│     Output Sequence │    849.60 │    383.00 │  2,295.00 │  2,184.12 │  1,186.20 │    709.50 │    512.81 │
│     Length (tokens) │           │           │           │           │           │           │           │
│      Input Sequence │     15.20 │     11.00 │     21.00 │     20.82 │     19.20 │     14.50 │      3.31 │
│     Length (tokens) │           │           │           │           │           │           │           │
│        Output Token │     47.46 │       N/A │       N/A │       N/A │       N/A │       N/A │       N/A │
│          Throughput │           │           │           │           │           │           │           │
│        (tokens/sec) │           │           │           │           │           │           │           │
│  Request Throughput │      0.06 │       N/A │       N/A │       N/A │       N/A │       N/A │       N/A │
│      (requests/sec) │           │           │           │           │           │           │           │
│       Request Count │     10.00 │       N/A │       N/A │       N/A │       N/A │       N/A │       N/A │
│          (requests) │           │           │           │           │           │           │           │
└─────────────────────┴───────────┴───────────┴───────────┴───────────┴───────────┴───────────┴───────────┘
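The aggregate rows can be cross-checked against each other. The sketch below assumes that output token throughput is the total number of output tokens divided by the benchmark duration; that definition is an assumption here, not something the table states, and `check_throughput` is a hypothetical helper name. Under that assumption it recovers the implied run duration and the request throughput from the request count, the average output sequence length, and the output token throughput.

```shell
# Hypothetical helper: derive the implied duration and request throughput.
# Usage: check_throughput REQUEST_COUNT AVG_OUTPUT_TOKENS OUTPUT_TOKENS_PER_SEC
check_throughput() {
  awk -v n="$1" -v osl="$2" -v tps="$3" 'BEGIN {
    total = n * osl           # total output tokens across all requests
    duration = total / tps    # implied benchmark duration in seconds
    printf "total_output_tokens=%.0f\n", total
    printf "duration_sec=%.1f\n", duration
    printf "requests_per_sec=%.2f\n", n / duration
  }'
}
```

With the values from the sample table, `check_throughput 10 849.60 47.46` reports 8496 total output tokens over roughly 179 seconds, which gives 0.06 requests/sec and matches the Request Throughput row.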