Performance for NVIDIA NIM for Object Detection
To benchmark the performance of NVIDIA NIM for Object Detection, you can use the genai-perf tool.
genai-perf is pre-installed in the Triton Server SDK container.
The remainder of this section uses genai-perf==0.0.11, which is packaged with the Triton Server SDK 25.02 release.
To run a performance benchmark, first create a dataset of image examples that genai-perf can use when making requests to the NIM service. These examples should be representative of the data that you expect to receive in a production setting. Format the dataset as a JSONL file in which each line contains an {"image": ...} object that points to an image file, as shown in the following example. Because genai-perf reads the referenced images when it builds requests, the image paths must be resolvable from wherever genai-perf runs, which in this guide is inside the Triton Server SDK container.
Example: (images.jsonl)
{"image": "assets/image_01.jpg"}
{"image": "assets/image_02.jpg"}
{"image": "assets/image_n.jpg"}
Use the following example to run the Triton Inference Server SDK Docker container. Mount the directory where you created your JSONL file, which appears as datasets/ on the host and /datasets inside the container in the following example.
export RELEASE="25.02"
docker run -it --rm \
--gpus=all \
--network="host" \
--mount type=bind,source=${PWD}/datasets,target=/datasets \
nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk
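After the container starts, you can optionally confirm the installed genai-perf version and verify that the mounted dataset is visible. For example, from the container shell:
pip show genai-perf
ls /datasets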
Run the following genai-perf command to start a performance benchmark.
genai-perf profile \
--model nvidia/nemoretriever-page-elements-v2 \
--service-kind openai \
--endpoint-type image_retrieval \
--batch-size-image 1 \
--input-file /datasets/images.jsonl \
--concurrency 1 \
--url http://localhost:8000
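The run above uses a single concurrent request stream, which gives a baseline latency measurement. To see how throughput scales under load, you can repeat the benchmark at higher concurrency levels. The following is a minimal sketch that loops the same command over a set of illustrative --concurrency values; choose values that reflect the load you expect in production.
for c in 1 2 4 8 16; do
  genai-perf profile \
    --model nvidia/nemoretriever-page-elements-v2 \
    --service-kind openai \
    --endpoint-type image_retrieval \
    --batch-size-image 1 \
    --input-file /datasets/images.jsonl \
    --concurrency $c \
    --url http://localhost:8000
done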
For the full set of command line options for genai-perf, refer to the GenAI-Perf documentation.