Performance for NVIDIA NIM for Image OCR
To benchmark the performance of NVIDIA NIM for Image OCR under simulated production load, you can use the genai-perf tool, which is pre-installed in the Triton Inference Server SDK container.
To run a performance benchmark, first create a dataset of image examples that genai-perf can use when making requests to the NIM service. These examples should be representative of the data that you expect to receive in production. Format the dataset as a JSONL file in which each line contains an {"image": ...} object, as shown in the following example.
Example: images.jsonl
{"image": "assets/image_01.jpg"}
{"image": "assets/image_02.jpg"}
{"image": "assets/image_n.jpg"}
Use the following example to run the Triton Inference Server SDK docker container, mounting the directory that contains your JSONL file (shown as datasets/ in the following example).
export RELEASE="yy.mm" # e.g. export RELEASE="25.01"
docker run -it --rm \
--gpus=all \
--network="host" \
--mount type=bind,source=${PWD}/datasets,target=/datasets \
nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk
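Before you start a benchmark, confirm that the NIM service is reachable from inside the container. As a quick check, assuming the service exposes the standard NIM readiness endpoint on port 8000:
curl http://localhost:8000/v1/health/ready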
Run a performance benchmark by using the genai-perf command line tool, as shown in the following command.
genai-perf profile \
--model baidu/paddleocr \
--service-kind openai \
--endpoint-type image_retrieval \
--batch-size-image 1 \
--input-file /datasets/images.jsonl \
--concurrency 1 \
--url http://localhost:8000
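To approximate different production load levels, you can sweep the concurrency setting and compare the latency and throughput that genai-perf reports for each run. The following is a minimal sketch that reuses only the options shown above; the concurrency values are example choices.
for c in 1 2 4 8; do
  # Repeat the benchmark at increasing numbers of concurrent requests.
  genai-perf profile \
    --model baidu/paddleocr \
    --service-kind openai \
    --endpoint-type image_retrieval \
    --batch-size-image 1 \
    --input-file /datasets/images.jsonl \
    --concurrency "$c" \
    --url http://localhost:8000
done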
For the full set of command line options for genai-perf, refer to the GenAI-Perf documentation.