Copyright © 2026, NVIDIA Corporation.


Template Endpoint

The template endpoint provides a flexible way to benchmark custom APIs that don’t match standard OpenAI formats. You define request payloads using Jinja2 templates and optionally specify how to extract responses using JMESPath queries.

When to Use

Use the template endpoint when:

  • Your API has a custom request/response format
  • Standard endpoints (chat, completions, embeddings, rankings) don’t fit your use case

Basic Example

Benchmark an API that accepts text in a custom format:

aiperf profile \
  --model your-model \
  --url http://localhost:8000/custom-endpoint \
  --endpoint-type template \
  --extra-inputs payload_template:'
{
  "text": {{ text|tojson }}
}' \
  --synthetic-input-tokens-mean 100 \
  --output-tokens-mean 50 \
  --concurrency 4 \
  --request-count 20

Sample Output (Successful Run):

INFO Starting AIPerf System
INFO Using template endpoint with custom payload
INFO AIPerf System is PROFILING
Profiling: 20/20 |████████████████████████| 100% [00:28<00:00]
INFO Benchmark completed successfully
INFO Results saved to: artifacts/your-model-template-concurrency4/
NVIDIA AIPerf | LLM Metrics
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ Metric                      ┃    avg ┃    min ┃    max ┃    p99 ┃    p50 ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ Request Latency (ms)        │ 456.78 │ 389.23 │ 567.45 │ 554.32 │ 452.34 │
│ Time to First Token (ms)    │  89.34 │  67.45 │ 112.34 │ 109.23 │  87.56 │
│ Inter Token Latency (ms)    │  11.23 │   9.45 │  14.56 │  14.12 │  11.01 │
│ Output Token Count (tokens) │  50.00 │  48.00 │  52.00 │  51.89 │  50.00 │
│ Request Throughput (req/s)  │   8.78 │      - │      - │      - │      - │
└─────────────────────────────┴────────┴────────┴────────┴────────┴────────┘
JSON Export: artifacts/your-model-template-concurrency4/profile_export_aiperf.json

Configuration

Configure the template endpoint using --extra-inputs:

Required

  • payload_template: Jinja2 template defining the request payload format
    • Named template: nv-embedqa
    • File path: /path/to/template.json
    • Inline string: '{"text": {{ text|tojson }}}'

Optional

  • response_field: JMESPath query to extract data from responses
    • Auto-detection is used if not provided
    • Example: data[0].embedding

Any other --extra-inputs fields are merged into every request payload:

--extra-inputs temperature:0.7 top_p:0.9
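
The merge described above can be pictured with plain Python dictionaries. This is only a sketch: the helper name merge_extra_inputs is hypothetical, and the rule that an extra field overwrites a key already produced by the template is an assumption made for illustration, not documented AIPerf behavior.

```python
import json

# Keys that configure the template endpoint itself rather than the payload.
RESERVED_KEYS = {"payload_template", "response_field"}


def merge_extra_inputs(rendered_payload: dict, extra_inputs: dict) -> dict:
    """Hypothetical helper: add non-reserved extra inputs to a rendered payload."""
    merged = dict(rendered_payload)  # copy so the rendered payload is not mutated
    for key, value in extra_inputs.items():
        if key in RESERVED_KEYS:
            continue  # configuration keys are not sent to the server
        merged[key] = value  # assumption: extra inputs win on key collisions
    return merged


rendered = {"text": "Hello, world"}
extra = {
    "payload_template": '{"text": {{ text|tojson }}}',
    "temperature": 0.7,
    "top_p": 0.9,
}
payload = merge_extra_inputs(rendered, extra)
print(json.dumps(payload))
# {"text": "Hello, world", "temperature": 0.7, "top_p": 0.9}
```

To confirm what is actually sent, inspect the formatted payloads that AIPerf writes to inputs.json.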

Template Variables

Content Variables

  • text: First text content (or None)
  • texts: List of all text contents
  • image, audio, video: First media content (or None)
  • images, audios, videos: Lists of all media contents

Named Content Variables

  • query: First query text
  • queries: All query texts
  • passage: First passage text
  • passages: All passage texts
  • texts_by_name: Dict mapping content names to text lists
  • images_by_name, audios_by_name, videos_by_name: Dicts for media

Request Metadata

  • model: Model name
  • max_tokens: Output token limit
  • stream: Whether streaming is enabled
  • role: Message role
  • turn: Current turn object
  • turns: List of all turns
  • request_info: Full request context

Response Parsing

When response_field is not set, auto-detection tries each response type in this order: embeddings, rankings, then text.

Text Responses

  • Fields: text, content, response, output, result
  • OpenAI: choices[0].text, choices[0].message.content

Embedding Responses

  • OpenAI: data[].embedding
  • Simple: embeddings, embedding

Ranking Responses

  • Lists: rankings, results
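
The detection order and field names listed above can be sketched in Python as follows. The function names are hypothetical and the type checks are assumptions; the real detection logic in AIPerf may differ in detail. The point of the sketch is the precedence: a response that contains both an embedding field and a text field is treated as an embedding response.

```python
from typing import Any, Optional, Tuple

TEXT_FIELDS = ("text", "content", "response", "output", "result")


def detect_embeddings(resp: dict) -> Optional[list]:
    # OpenAI style: data[].embedding
    data = resp.get("data")
    if isinstance(data, list) and data and all(
        isinstance(item, dict) and "embedding" in item for item in data
    ):
        return [item["embedding"] for item in data]
    # Simple style: top-level "embeddings" or "embedding"
    if "embeddings" in resp:
        return resp["embeddings"]
    if "embedding" in resp:
        return [resp["embedding"]]
    return None


def detect_rankings(resp: dict) -> Optional[list]:
    for key in ("rankings", "results"):
        if isinstance(resp.get(key), list):
            return resp[key]
    return None


def detect_text(resp: dict) -> Optional[str]:
    for key in TEXT_FIELDS:
        if isinstance(resp.get(key), str):
            return resp[key]
    # OpenAI style: choices[0].text or choices[0].message.content
    choices = resp.get("choices")
    if isinstance(choices, list) and choices and isinstance(choices[0], dict):
        first = choices[0]
        if isinstance(first.get("text"), str):
            return first["text"]
        message = first.get("message")
        if isinstance(message, dict) and isinstance(message.get("content"), str):
            return message["content"]
    return None


def auto_detect(resp: dict) -> Tuple[str, Any]:
    """Try embeddings, then rankings, then text; the first match wins."""
    detectors = (
        ("embeddings", detect_embeddings),
        ("rankings", detect_rankings),
        ("text", detect_text),
    )
    for kind, detector in detectors:
        value = detector(resp)
        if value is not None:
            return kind, value
    return "unknown", None


print(auto_detect({"choices": [{"message": {"content": "hi"}}]}))
print(auto_detect({"text": "ignored", "embeddings": [[1.0, 2.0]]}))
```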

Custom Extraction

Specify a JMESPath query to extract specific fields:

--extra-inputs response_field:'data[0].vector'
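
To see what a query such as data[0].vector selects, the sketch below walks a sample response by hand. It is not the JMESPath library and not AIPerf code: the extract function is a hypothetical stand-in that supports only dotted keys and numeric indexes, a small subset of JMESPath. Like JMESPath, it returns a null value (None) when a step is missing.

```python
import re

# One path step: either an identifier or a bracketed integer index.
_STEP = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)|\[(-?\d+)\]")


def extract(path: str, data):
    """Resolve a path like 'data[0].vector'; return None if any step is missing."""
    current = data
    for part in path.split("."):
        pos = 0
        while pos < len(part):
            match = _STEP.match(part, pos)
            if match is None:
                raise ValueError(f"unsupported path syntax: {path!r}")
            key, index = match.group(1), match.group(2)
            if key is not None:
                # Key lookup only makes sense on a JSON object.
                current = current.get(key) if isinstance(current, dict) else None
            else:
                i = int(index)
                in_range = isinstance(current, list) and -len(current) <= i < len(current)
                current = current[i] if in_range else None
            if current is None:
                return None
            pos = match.end()
    return current


response = {"data": [{"id": "a", "vector": [0.1, 0.2, 0.3]}]}
print(extract("data[0].vector", response))  # [0.1, 0.2, 0.3]
print(extract("data[1].vector", response))  # None
```

For real queries, test the expression against a captured response with a full JMESPath implementation, since filters, projections, and functions are outside this subset.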

Examples

Custom Embedding API

aiperf profile \
  --model embedding-model \
  --url http://localhost:8000/embed \
  --endpoint-type template \
  --extra-inputs payload_template:'
{
  "input": {{ texts|tojson }},
  "model": {{ model|tojson }}
}' \
  --extra-inputs response_field:'embeddings' \
  --synthetic-input-tokens-mean 50 \
  --concurrency 8 \
  --request-count 100

Named Template

Using the built-in nv-embedqa template:

aiperf profile \
  --model nv-embed-v2 \
  --url http://localhost:8000/embeddings \
  --endpoint-type template \
  --extra-inputs payload_template:nv-embedqa \
  --synthetic-input-tokens-mean 100 \
  --concurrency 4 \
  --request-count 50

Note: The nv-embedqa template expands to {"text": {{ texts|tojson }}}.

Template from File

Create chat_template.json:

{
  "model": {{ model|tojson }},
  "prompt": {{ text|tojson }},
  "max_new_tokens": {{ max_tokens|tojson }},
  "stream": {{ stream|lower }}
}

Use it:

aiperf profile \
  --model custom-llm \
  --url http://localhost:8000/generate \
  --endpoint-type template \
  --extra-inputs payload_template:./chat_template.json \
  --extra-inputs response_field:'generated_text' \
  --streaming \
  --synthetic-input-tokens-mean 200 \
  --output-tokens-mean 100 \
  --concurrency 10

Multi-Modal Request

aiperf profile \
  --model vision-model \
  --url http://localhost:8000/analyze \
  --endpoint-type template \
  --extra-inputs payload_template:'
{
  "text": {{ text|tojson }},
  "image": {{ image|tojson }}
}' \
  --input-file ./multimodal_dataset.jsonl \
  --concurrency 2

Tips

  • Always use |tojson for string/list values to properly escape JSON
  • Use -v or -vv to see debug logs with formatted payloads
  • Check artifacts/<run-name>/inputs.json to see all formatted request payloads
  • Try auto-detection first; specify response_field only if the response is not parsed correctly

Troubleshooting

Template didn’t render valid JSON

  • Use |tojson filter for string or nullable values
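
The effect of the filter can be reproduced with the standard library. In this sketch json.dumps stands in for Jinja2's tojson filter (which additionally escapes a few HTML-sensitive characters), and the two render functions are hypothetical helpers that mimic writing "{{ text }}" inside hand-typed quotes versus writing {{ text|tojson }}.

```python
import json


def render_naive(text) -> str:
    # Mimics "{{ text }}": the value is pasted between hand-written quotes.
    return '{"text": "' + str(text) + '"}'


def render_with_tojson(text) -> str:
    # Mimics {{ text|tojson }}: the value is serialized as a JSON literal.
    return '{"text": ' + json.dumps(text) + "}"


def is_valid_json(payload: str) -> bool:
    try:
        json.loads(payload)
        return True
    except json.JSONDecodeError:
        return False


prompt = 'She said "hi"\nthen left'

# Unescaped quotes and a raw newline break the naive payload.
print(is_valid_json(render_naive(prompt)))        # False
print(is_valid_json(render_with_tojson(prompt)))  # True

# A missing value becomes the string "None" when pasted naively,
# but a proper JSON null when serialized.
print(render_naive(None))        # {"text": "None"}
print(render_with_tojson(None))  # {"text": null}
```

The nullable case is easy to miss because the naive payload is still valid JSON; it just sends the wrong value.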

Response not parsed correctly

  • Use -vv to see raw responses in logs
  • Specify response_field with a JMESPath query

Variables not available

  • Verify your input dataset includes the required fields
  • Use request_info and turn objects for nested data