Performance for NVIDIA NIM for Object Detection

To benchmark the performance of NVIDIA NIM for Object Detection, you can use the genai-perf tool, which is pre-installed in the Triton Inference Server SDK container. The remainder of this section uses genai-perf==0.0.11, which is packaged with the Triton Inference Server SDK 25.02 release.

To run a performance benchmark, first create a dataset of image examples that genai-perf can use when making requests to the NIM service. These examples should be representative of the type of data that you expect to receive in a production setting. The dataset should be formatted as a JSONL file where each line contains a {"image": ...} object, as shown in the following example.

Example (images.jsonl):

{"image": "assets/image_01.jpg"}
{"image": "assets/image_02.jpg"}
{"image": "assets/image_n.jpg"}

Use the following commands to run the Triton Inference Server SDK container. Mount the directory that contains your JSONL file and the sample images that it references; this directory appears as datasets/ in the following example.

export RELEASE="25.02"

docker run -it --rm \
  --gpus=all \
  --network="host" \
  --mount type=bind,source=${PWD}/datasets,target=/datasets \
  nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk
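
Before you start the benchmark, you can optionally confirm that the NIM service is reachable from inside the SDK container. The following sketch is an assumption-laden example: it assumes the service is listening on http://localhost:8000 (as in the benchmark command below) and that it exposes a readiness endpoint at /v1/health/ready, which is common across NIM microservices; adjust the URL if your deployment differs.

import urllib.error
import urllib.request

# Assumed service URL and readiness path; adjust for your deployment.
URL = "http://localhost:8000/v1/health/ready"

try:
    with urllib.request.urlopen(URL, timeout=5) as response:
        # An HTTP 200 response indicates the service is ready to accept requests.
        print(f"Service ready (HTTP {response.status})")
except urllib.error.URLError as err:
    print(f"Service not ready: {err}")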

From inside the SDK container, run the following genai-perf command to start the performance benchmark.

genai-perf profile \
    --model nvidia/nemoretriever-page-elements-v2 \
    --service-kind openai \
    --endpoint-type image_retrieval \
    --batch-size-image 1 \
    --input-file /datasets/images.jsonl \
    --concurrency 1 \
    --url http://localhost:8000
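
Throughput and latency typically vary with request concurrency, so it is often useful to repeat the benchmark at several concurrency levels. The following sketch simply wraps the command above in a loop; the concurrency values are arbitrary examples, and all other flags match the command shown earlier.

import subprocess

# Example concurrency levels to sweep; choose values that match your expected load.
for concurrency in (1, 2, 4, 8):
    subprocess.run(
        [
            "genai-perf", "profile",
            "--model", "nvidia/nemoretriever-page-elements-v2",
            "--service-kind", "openai",
            "--endpoint-type", "image_retrieval",
            "--batch-size-image", "1",
            "--input-file", "/datasets/images.jsonl",
            "--concurrency", str(concurrency),
            "--url", "http://localhost:8000",
        ],
        check=True,
    )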

For the full set of command line options for genai-perf, refer to the GenAI-Perf documentation.