Profile NIM Image Retrieval with AIPerf
AIPerf supports benchmarking NVIDIA NIM image retrieval endpoints that detect objects and layout elements (tables, charts, titles, etc.) in images and return bounding box coordinates.
This guide covers profiling NIM for Object Detection models such as nemoretriever-page-elements and nemoretriever-graphic-elements using the /v1/infer API.
Section 1. Deploy the NIM Server
Prerequisites
- NVIDIA GPU (Ampere, Hopper, or Ada Lovelace architecture)
- Docker with NVIDIA runtime
- An NGC API key
Authenticate with NGC
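Export your NGC API key and log in to the NGC container registry (nvcr.io). A minimal sketch; the literal `$oauthtoken` username is NGC's standard token-auth convention:

```shell
# Replace the placeholder with your actual NGC API key
export NGC_API_KEY="<your-ngc-api-key>"

# Log in to nvcr.io; the username is the literal string $oauthtoken
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
```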
Start the NIM Container
Launch the NIM for Object Detection (page elements):
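A sketch of the launch command; the container path and tag (`nvcr.io/nim/nvidia/nemoretriever-page-elements-v2:latest`) are assumptions, so check the NGC catalog for the exact image for your model version:

```shell
# --gpus all requires the NVIDIA container runtime;
# image path and tag are assumptions -- verify against the NGC catalog
docker run -it --rm --gpus all \
  -e NGC_API_KEY \
  -p 8000:8000 \
  nvcr.io/nim/nvidia/nemoretriever-page-elements-v2:latest
```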
Wait for the server to start, then verify it is ready:
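One way to wait is to poll the standard NIM readiness endpoint (path assumed to be `/v1/health/ready`, the common NIM convention) until it returns success:

```shell
# Poll until the model is loaded and the server reports ready;
# the /v1/health/ready path is the usual NIM convention (an assumption here)
until curl -sf http://localhost:8000/v1/health/ready > /dev/null; do
  echo "Waiting for NIM to become ready..."
  sleep 5
done
echo "NIM is ready"
```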
Verify with a Test Request
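A sketch of a test request against `/v1/infer`, base64-encoding a local image inline. The payload field names (`input`, `type`, `url`) follow the common NIM image-input convention but are assumptions; consult your NIM's API reference if the request is rejected:

```shell
# Assumes sample_page.png exists in the current directory.
# base64 -w0 (no line wrapping) is GNU coreutils; on macOS use plain base64.
IMAGE_B64=$(base64 -w0 sample_page.png)

# Payload shape is an assumption based on the /v1/infer image-input convention
curl -s http://localhost:8000/v1/infer \
  -H "Content-Type: application/json" \
  -d '{"input": [{"type": "image_url", "url": "data:image/png;base64,'"$IMAGE_B64"'"}]}'
```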
Sample Response:
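The exact response schema varies by model version; the sketch below is illustrative only, with invented values, showing the general shape of a detection response (per-class bounding boxes with confidence scores):

```
{
  "data": [
    {
      "index": 0,
      "bounding_boxes": {
        "table": [
          {"x_min": 0.08, "y_min": 0.31, "x_max": 0.92, "y_max": 0.58, "confidence": 0.96}
        ],
        "title": [
          {"x_min": 0.10, "y_min": 0.04, "x_max": 0.67, "y_max": 0.09, "confidence": 0.91}
        ]
      }
    }
  ]
}
```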
Bounding box coordinates are normalized to the 0-1 range, measured from the top-left corner of the image.
Section 2. Profile with Custom Images
Create a JSONL input file with image paths or URLs:
Each line should contain an image field holding one of the following:
- A URL to a remote image
- A local file path (automatically encoded to base64)
- A base64 data URL (passed through as-is)
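For example, a three-line input file covering all three forms can be written with a heredoc (the filename `inputs.jsonl` and the example URLs/paths are placeholders; the base64 entry is a real 1x1 transparent PNG):

```shell
# Each line is one JSON object with an "image" field;
# URLs, file paths, and the filename itself are placeholders
cat > inputs.jsonl <<'EOF'
{"image": "https://example.com/sample-page.png"}
{"image": "/data/docs/page-001.jpg"}
{"image": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNkYPhfDwAChwGA60e6kgAAAABJRU5ErkJggg=="}
EOF
```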
Run AIPerf against the image retrieval endpoint:
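A sketch of the profiling command. `--endpoint-type image_retrieval` and `--extra-inputs` are named elsewhere in this guide; the remaining flags (`--model`, `--url`, `--input-file`, `--concurrency`, `--request-count`) are assumptions based on common AIPerf usage, so confirm them against `aiperf profile --help` for your installed version:

```shell
# Model name, input filename, and load-control flags are assumptions;
# adjust to your deployment and AIPerf version
aiperf profile \
  --model nemoretriever-page-elements-v2 \
  --url http://localhost:8000 \
  --endpoint-type image_retrieval \
  --input-file inputs.jsonl \
  --concurrency 4 \
  --request-count 100
```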
Sample Output:
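The console format depends on your AIPerf version; the sketch below is illustrative only (metric names from this guide, values invented) to show the kind of summary to expect:

```
Request Latency (ms):          avg 142.1 | p99 210.4
Image Throughput (images/sec): 27.5
Image Latency (ms):            avg 36.3
```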
Because this endpoint does not generate tokens, no time-to-first-token (TTFT) or inter-token latency (ITL) metrics are reported. The primary metrics are request latency, image throughput, and image latency.
Section 3. Profile with Multiple Images per Request
You can send multiple images in a single request:
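One plausible input format is a JSON array in the image field, sketched below; whether AIPerf accepts an array here (versus a separate batching flag) is an assumption, so consult the AIPerf input-file documentation for the supported syntax:

```shell
# Array-valued "image" fields are an assumption; URLs and the
# filename batched_inputs.jsonl are placeholders
cat > batched_inputs.jsonl <<'EOF'
{"image": ["https://example.com/page1.png", "https://example.com/page2.png"]}
{"image": ["https://example.com/page3.png", "https://example.com/page4.png"]}
EOF
```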
When sending multiple images per request, the image throughput metric reflects the total number of images processed per second across all requests.
Section 4. Pass Extra Parameters
Use --extra-inputs to pass additional parameters to the NIM endpoint:
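A sketch using the key:value form of `--extra-inputs`. The parameter name `min_confidence` is hypothetical, as are the model name and input filename; substitute whatever parameters your NIM actually accepts:

```shell
# min_confidence is a hypothetical parameter for illustration only;
# repeat --extra-inputs to pass more than one parameter
aiperf profile \
  --model nemoretriever-page-elements-v2 \
  --url http://localhost:8000 \
  --endpoint-type image_retrieval \
  --input-file inputs.jsonl \
  --extra-inputs min_confidence:0.5
```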
Extra inputs are merged into the request payload alongside the image data.
Section 5. Using Other NIM Object Detection Models
The image_retrieval endpoint type works with any NIM that accepts the /v1/infer API format. To swap models, change the Docker image:
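For example, to serve nemoretriever-graphic-elements instead, launch its container on the same port (the image path and tag are assumptions; look up the exact container on the NGC catalog):

```shell
# Same launch pattern as Section 1, different model image;
# image path/tag are assumptions -- verify against the NGC catalog
docker run -it --rm --gpus all \
  -e NGC_API_KEY \
  -p 8000:8000 \
  nvcr.io/nim/nvidia/nemoretriever-graphic-elements-v1:latest
```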
All models share the same /v1/infer request/response format, so the same AIPerf --endpoint-type image_retrieval command works for each.