Performance for NVIDIA NIM for Image OCR

To benchmark the performance of NVIDIA NIM for Image OCR under simulated production load, you can use the genai-perf tool, which comes pre-installed in the Triton Inference Server SDK container.

To run a performance benchmark, first create a dataset of image examples for genai-perf to use when making requests to the NIM service. These examples should be representative of the data that you expect to receive in production. Format the dataset as a JSONL file in which each line contains a {"image": ...} object whose value is a path to an image file, as shown in the following example.

Example (images.jsonl):

{"image": "assets/image_01.jpg"}
{"image": "assets/image_02.jpg"}
{"image": "assets/image_n.jpg"}

Use the following command to run the Triton Inference Server SDK Docker container, mounting the directory that contains your JSONL file (shown as datasets/ in this example).

export RELEASE="yy.mm" # e.g. export RELEASE="25.01"

docker run -it --rm \
  --gpus=all \
  --network="host" \
  --mount type=bind,source=${PWD}/datasets,target=/datasets \
  nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk
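
After the container starts, you can optionally confirm that genai-perf is available and that the dataset directory was mounted as expected.

genai-perf --help
ls /datasets/images.jsonl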

From inside the container, run the following command to start a performance benchmark with the genai-perf command-line tool. Run it from a directory where the relative image paths in the JSONL file resolve, such as /datasets.

genai-perf profile \
    --model baidu/paddleocr \
    --service-kind openai \
    --endpoint-type image_retrieval \
    --batch-size-image 1 \
    --input-file /datasets/images.jsonl \
    --concurrency 1 \
    --url http://localhost:8000
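
A single run at --concurrency 1 measures latency with minimal load. To characterize throughput under heavier load, a common next step is to repeat the benchmark at increasing concurrency levels. The following sketch reuses the command above; the concurrency values are illustrative.

cd /datasets  # so that the relative image paths in the JSONL file resolve
for c in 1 2 4 8; do
    genai-perf profile \
        --model baidu/paddleocr \
        --service-kind openai \
        --endpoint-type image_retrieval \
        --batch-size-image 1 \
        --input-file /datasets/images.jsonl \
        --concurrency "$c" \
        --url http://localhost:8000
done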

For the full set of command line options for genai-perf, refer to the GenAI-Perf documentation.