AIPerf supports benchmarking NVIDIA NIM image retrieval endpoints that detect objects and layout elements (tables, charts, titles, etc.) in images and return bounding box coordinates.
This guide covers profiling NIM for Object Detection models such as nemoretriever-page-elements and nemoretriever-graphic-elements using the /v1/infer API.
Launch the NIM for Object Detection (page elements):
Wait for the server to start, then verify it is ready:
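The readiness wait can also be scripted. The following is a minimal sketch, not part of AIPerf: `http_ready` and `wait_until_ready` are hypothetical helper names, and the health URL in the closing comment is an assumption that should be replaced with whatever readiness endpoint and port your NIM deployment exposes.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ready(url: str, timeout_s: float = 2.0) -> bool:
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_ready(
    check: Callable[[], bool],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
) -> bool:
    """Call `check` repeatedly until it returns True or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Example call (the URL is an assumption, adjust host, port, and path):
# wait_until_ready(lambda: http_ready("http://localhost:8000/v1/health/ready"))
```

Passing the check as a callable keeps the polling loop independent of how readiness is actually probed, so the same loop works with a different health path or a container status check.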
Sample Response:
Bounding box coordinates are normalized to the 0-1 range, with the origin at the top-left corner of the image.
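To draw or crop a detected region, the normalized values have to be scaled back to pixels. The sketch below shows one way to do that. The names `x_min`, `y_min`, `x_max`, and `y_max` are assumptions used for illustration, so map them to the field names that appear in the actual response.

```python
from typing import NamedTuple


class PixelBox(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int


def to_pixel_box(
    x_min: float,
    y_min: float,
    x_max: float,
    y_max: float,
    width: int,
    height: int,
) -> PixelBox:
    """Scale a normalized (0-1) box to pixel coordinates.

    The origin is the top-left corner of the image, so x grows to the
    right and y grows downward. Values are clamped to the image bounds.
    """

    def clamp(value: float) -> float:
        return min(max(value, 0.0), 1.0)

    return PixelBox(
        left=round(clamp(x_min) * width),
        top=round(clamp(y_min) * height),
        right=round(clamp(x_max) * width),
        bottom=round(clamp(y_max) * height),
    )


# A box covering the lower-middle half of an 800x600 image.
box = to_pixel_box(0.25, 0.5, 0.75, 1.0, width=800, height=600)
print(box)  # PixelBox(left=200, top=300, right=600, bottom=600)
```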
Create a JSONL input file with image paths or URLs:
Each line should contain an image field holding either a local image path or an image URL.
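For larger datasets it may be easier to generate the input file than to write it by hand. The sketch below emits one JSON object per line, each with a single `image` field as described above. The helper names, the output filename, and the example path and URL are all hypothetical.

```python
import json
from pathlib import Path
from typing import Iterable, List


def jsonl_lines(images: Iterable[str]) -> List[str]:
    """Serialize each image path or URL as one JSON object per line."""
    return [json.dumps({"image": image}) for image in images]


def write_image_jsonl(images: Iterable[str], out_path: str) -> Path:
    """Write the serialized entries to `out_path`, one per line."""
    out = Path(out_path)
    out.write_text("\n".join(jsonl_lines(images)) + "\n", encoding="utf-8")
    return out


# Hypothetical inputs: one local file and one remote URL.
example_images = ["images/page_001.png", "https://example.com/chart.png"]
for line in jsonl_lines(example_images):
    print(line)

# To create the file itself (the filename is an assumption):
# write_image_jsonl(example_images, "inputs.jsonl")
```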
Run AIPerf against the image retrieval endpoint:
Sample Output:
Since this endpoint does not produce tokens, no TTFT or ITL metrics are reported. The primary metrics are request latency, image throughput, and image latency.
You can send multiple images in a single request:
When sending multiple images per request, the image throughput metric reflects the total number of images processed per second across all requests.
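To make the relationship between these metrics concrete, the sketch below derives them from a list of per-request records. It is an illustration under stated assumptions, not AIPerf's implementation: `RequestRecord` and `summarize` are hypothetical names, image throughput is taken as total images divided by the benchmark duration, and image latency is taken as total request latency divided by total images.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RequestRecord:
    latency_s: float  # end-to-end latency of one request, in seconds
    num_images: int   # number of images carried by that request


def summarize(records: List[RequestRecord], duration_s: float) -> Dict[str, float]:
    """Aggregate request records into latency and throughput figures.

    Assumed definitions (not taken from AIPerf's source):
      - image throughput = total images / benchmark duration
      - image latency    = total request latency / total images
    """
    if not records or duration_s <= 0:
        raise ValueError("need at least one record and a positive duration")
    total_latency_s = sum(r.latency_s for r in records)
    total_images = sum(r.num_images for r in records)
    if total_images <= 0:
        raise ValueError("records must contain at least one image")
    return {
        "avg_request_latency_ms": 1000.0 * total_latency_s / len(records),
        "image_throughput_per_s": total_images / duration_s,
        "avg_image_latency_ms": 1000.0 * total_latency_s / total_images,
    }


# Two requests with two images each, completed within a one-second run.
stats = summarize([RequestRecord(0.2, 2), RequestRecord(0.4, 2)], duration_s=1.0)
print(stats)
```

Under these definitions, raising the number of images per request increases image throughput even when the request rate stays the same, which is why the two figures should be read together.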
Use --extra-inputs to pass additional parameters to the NIM endpoint:
Extra inputs are merged into the request payload alongside the image data.
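Conceptually, the merge looks like the sketch below. The payload shape (an `input` list of `image_url` entries) and the `confidence_threshold` parameter are assumptions for illustration only, and `build_payload` is a hypothetical helper, so check the NIM API reference for the fields your model accepts. The choice to let extra inputs override existing keys is likewise an assumption about merge order.

```python
from typing import Any, Dict, List, Optional


def build_payload(
    image_urls: List[str],
    extra_inputs: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a request body and merge extra parameters at the top level.

    The "input"/"image_url" structure is an assumed payload shape; one or
    more images can be carried in the same request.
    """
    payload: Dict[str, Any] = {
        "input": [{"type": "image_url", "url": url} for url in image_urls]
    }
    # Extra inputs sit alongside the image data; here they win on key clashes.
    payload.update(extra_inputs or {})
    return payload


# Hypothetical extra parameter merged next to two images.
payload = build_payload(
    ["data:image/png;base64,AAAA", "data:image/png;base64,BBBB"],
    {"confidence_threshold": 0.5},
)
print(payload)
```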
The image_retrieval endpoint works with any NIM that accepts the /v1/infer API format. You can swap models by changing the Docker image:
All models share the same /v1/infer request/response format, so the same AIPerf --endpoint-type image_retrieval command works for each.