API Reference for NVIDIA NIM for Image OCR#
This documentation contains the API reference for NVIDIA NIM for Image OCR.
OpenAPI Specification#
You can download the complete API spec. The API spec is subject to change while in Early Access (EA). EA participants are encouraged to provide feedback to NVIDIA prior to the General Access (GA) release.
API Examples#
Extract Text Data from Image#
The v1/infer endpoint accepts multiple images and returns a list of text detections, with associated bounding boxes and confidence scores, for each image. The only supported type is image_url. Each image must be base64 encoded and represented in the following JSON format. The supported image formats are png and jpeg.
{
    "type": "image_url",
    "url": "data:image/<IMAGE_FORMAT>;base64,<BASE64_ENCODED_IMAGE>"
}
An inference request has an entry for input. The value for input is an array of dictionaries that contain the fields type and url. For example, a JSON payload with three images looks like the following:
{
    "input": [
        {
            "type": "image_url",
            "url": "data:image/png;base64,<BASE64_ENCODED_IMAGE>"
        },
        {
            "type": "image_url",
            "url": "data:image/png;base64,<BASE64_ENCODED_IMAGE>"
        },
        {
            "type": "image_url",
            "url": "data:image/png;base64,<BASE64_ENCODED_IMAGE>"
        }
    ]
}
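As an illustration, a minimal Python sketch that assembles this payload from local files (the file names are hypothetical):

import base64
import json

def to_entry(path, media_type):
    # Encode one image file as a base64 data-URL entry
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return {"type": "image_url", "url": f"data:{media_type};base64,{encoded}"}

payload = {
    "input": [
        to_entry("page-1.png", "image/png"),
        to_entry("page-2.png", "image/png"),
        to_entry("page-3.jpg", "image/jpeg"),
    ]
}
print(json.dumps(payload)[:120])  # Preview the start of the payload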
cURL Example
API_ENDPOINT="http://localhost:8000"

# Image to process: a remote URL or a local file path
IMAGE_SOURCE="https://assets.ngc.nvidia.com/products/api-catalog/nemo-retriever/image-ocr/example-1.png"
# IMAGE_SOURCE="path/to/your/image.jpg"  # Uncomment to use a local file instead

# Encode the image to base64 (handles both URLs and local files)
if [[ $IMAGE_SOURCE == http* ]]; then
    # Handle URL
    BASE64_IMAGE=$(curl -s "${IMAGE_SOURCE}" | base64 -w 0)
else
    # Handle local file
    BASE64_IMAGE=$(base64 -w 0 "${IMAGE_SOURCE}")
fi

# Construct the full JSON payload
# Note: the media type (image/png or image/jpeg) should match the image format
JSON_PAYLOAD='{
  "input": [{
    "type": "image_url",
    "url": "data:image/png;base64,'"${BASE64_IMAGE}"'"
  }]
}'

# Send a POST request to the inference endpoint
echo "${JSON_PAYLOAD}" | \
curl -X POST "${API_ENDPOINT}/v1/infer" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d @-
The following image is used as the input in the previous example.

Response
The PaddleOCR NIM output provides a confidence score and a bounding box, with float coordinates normalized to [0, 1], for each text detection.
{
"data": [
{
"index": 0,
"text_detections": [
{
"text_prediction": {
"text": "Trunch Parish Council",
"confidence": 0.9192708134651184
},
"bounding_box": {
"points": [
{
"x": 0.33125000000000004,
"y": 0.02227466504263094
},
{
"x": 0.5531250000000001,
"y": 0.01641291108404385
},
{
"x": 0.5531250000000001,
"y": 0.04103227771010962
},
{
"x": 0.33125000000000004,
"y": 0.04689403166869672
}
]
}
},
{
"text_prediction": {
"text": "BANK RECONCILIATION AS AT31STOCTOBER 2019",
"confidence": 0.9528748989105225
},
"bounding_box": {
"points": [
{
"x": 0.18958333333333335,
"y": 0.0984774665042631
},
{
"x": 0.7010416666666667,
"y": 0.0855816077953715
},
{
"x": 0.7010416666666667,
"y": 0.11020097442143728
},
{
"x": 0.18958333333333335,
"y": 0.12309683313032889
}
]
}
},
{
"text_prediction": {
"text": "Account:",
"confidence": 0.940185546875
},
"bounding_box": {
"points": [
{
"x": 0.011458333333333334,
"y": 0.17585261875761266
},
{
"x": 0.10625000000000001,
"y": 0.17350791717417785
},
{
"x": 0.10625000000000001,
"y": 0.20164433617539587
},
{
"x": 0.011458333333333334,
"y": 0.20398903775883073
}
]
}
},
{
"text_prediction": {
"text": "14,389.43",
"confidence": 0.9986979365348816
},
"bounding_box": {
"points": [
{
"x": 0.7645833333333334,
"y": 0.2286084043848965
},
{
"x": 0.8822916666666667,
"y": 0.22626370280146166
},
{
"x": 0.8822916666666667,
"y": 0.2555724725943971
},
{
"x": 0.7645833333333334,
"y": 0.257917174177832
}
]
}
},
{
"text_prediction": {
"text": "BANK STATEMENT BALANCE 30TH SEPTEMBER 2019",
"confidence": 0.945068359375
},
"bounding_box": {
"points": [
{
"x": 0.015625000000000003,
"y": 0.2508830694275274
},
{
"x": 0.5416666666666667,
"y": 0.235642509135201
},
{
"x": 0.5416666666666667,
"y": 0.26026187576126675
},
{
"x": 0.015625000000000003,
"y": 0.2755024360535932
}
]
}
},
{
"text_prediction": {
"text": "83.60",
"confidence": 0.9986327886581421
},
"bounding_box": {
"points": [
{
"x": 0.8083333333333335,
"y": 0.26377892813641907
},
{
"x": 0.8833333333333334,
"y": 0.2614342265529842
},
{
"x": 0.884375,
"y": 0.29074299634591966
},
{
"x": 0.8093750000000001,
"y": 0.29308769792935446
}
]
}
},
{
"text_prediction": {
"text": "PREVIOUS OUTSTANDING CHEQUES",
"confidence": 0.9708949327468872
},
"bounding_box": {
"points": [
{
"x": 0.015625000000000003,
"y": 0.28605359317904994
},
{
"x": 0.3791666666666667,
"y": 0.27667478684531066
},
{
"x": 0.3791666666666667,
"y": 0.3012941534713764
},
{
"x": 0.015625000000000003,
"y": 0.3106729598051157
}
]
}
},
{
"text_prediction": {
"text": "14,305.83",
"confidence": 0.9437391757965088
},
"bounding_box": {
"points": [
{
"x": 0.7645833333333334,
"y": 0.300121802679659
},
{
"x": 0.884375,
"y": 0.29777710109622413
},
{
"x": 0.884375,
"y": 0.3259135200974422
},
{
"x": 0.7645833333333334,
"y": 0.328258221680877
}
]
}
},
{
"text_prediction": {
"text": "CASH BOOK BALANCE31STOCTOBER 2019",
"confidence": 0.9460227489471436
},
"bounding_box": {
"points": [
{
"x": 0.01666666666666667,
"y": 0.3212241169305725
},
{
"x": 0.44895833333333335,
"y": 0.3095006090133983
},
{
"x": 0.44895833333333335,
"y": 0.33411997563946405
},
{
"x": 0.01666666666666667,
"y": 0.34584348355663824
}
]
}
},
{
"text_prediction": {
"text": "ADD CHEQUES OUTSTANDING:",
"confidence": 0.9514973759651184
},
"bounding_box": {
"points": [
{
"x": 0.01666666666666667,
"y": 0.35756699147381243
},
{
"x": 0.3291666666666667,
"y": 0.34936053593179056
},
{
"x": 0.3291666666666667,
"y": 0.3728075517661389
},
{
"x": 0.01666666666666667,
"y": 0.3810140073081609
}
]
}
},
{
"text_prediction": {
"text": "*",
"confidence": 0.98681640625
},
"bounding_box": {
"points": [
{
"x": 0.89375,
"y": 0.3692904993909867
},
{
"x": 0.9062500000000001,
"y": 0.3692904993909867
},
{
"x": 0.9062500000000001,
"y": 0.38804811205846534
},
{
"x": 0.89375,
"y": 0.38804811205846534
}
]
}
},
{
"text_prediction": {
"text": "101719",
"confidence": 0.999755859375
},
"bounding_box": {
"points": [
{
"x": 0.6229166666666667,
"y": 0.41032277710109627
},
{
"x": 0.7041666666666667,
"y": 0.40797807551766146
},
{
"x": 0.7052083333333334,
"y": 0.43845919610231426
},
{
"x": 0.6239583333333334,
"y": 0.4408038976857491
}
]
}
},
{
"text_prediction": {
"text": "83.60",
"confidence": 0.9019531011581421
},
"bounding_box": {
"points": [
{
"x": 0.8125000000000001,
"y": 0.406805724725944
},
{
"x": 0.8958333333333335,
"y": 0.4044610231425092
},
{
"x": 0.8968750000000001,
"y": 0.43376979293544465
},
{
"x": 0.8135416666666667,
"y": 0.4361144945188794
}
]
}
},
{
"text_prediction": {
"text": "*",
"confidence": 0.94482421875
},
"bounding_box": {
"points": [
{
"x": 0.8927083333333334,
"y": 0.406805724725944
},
{
"x": 0.9062500000000001,
"y": 0.406805724725944
},
{
"x": 0.9062500000000001,
"y": 0.4255633373934227
},
{
"x": 0.8927083333333334,
"y": 0.4255633373934227
}
]
}
},
{
"text_prediction": {
"text": "*",
"confidence": 0.9365234375
},
"bounding_box": {
"points": [
{
"x": 0.8927083333333334,
"y": 0.4396315468940317
},
{
"x": 0.909375,
"y": 0.4396315468940317
},
{
"x": 0.909375,
"y": 0.4642509135200975
},
{
"x": 0.8927083333333334,
"y": 0.4642509135200975
}
]
}
},
{
"text_prediction": {
"text": "*",
"confidence": 0.98095703125
},
"bounding_box": {
"points": [
{
"x": 0.8927083333333334,
"y": 0.47714677222898905
},
{
"x": 0.909375,
"y": 0.47714677222898905
},
{
"x": 0.909375,
"y": 0.5005937880633374
},
{
"x": 0.8927083333333334,
"y": 0.5005937880633374
}
]
}
},
{
"text_prediction": {
"text": "*",
"confidence": 0.982421875
},
"bounding_box": {
"points": [
{
"x": 0.8947916666666667,
"y": 0.5146619975639465
},
{
"x": 0.9083333333333334,
"y": 0.5146619975639465
},
{
"x": 0.9083333333333334,
"y": 0.5322472594397076
},
{
"x": 0.8947916666666667,
"y": 0.5322472594397076
}
]
}
},
{
"text_prediction": {
"text": "83.60",
"confidence": 0.9984375238418579
},
"bounding_box": {
"points": [
{
"x": 0.8177083333333335,
"y": 0.6940316686967114
},
{
"x": 0.89375,
"y": 0.6916869671132765
},
{
"x": 0.8947916666666667,
"y": 0.7209957369062119
},
{
"x": 0.8187500000000001,
"y": 0.7233404384896469
}
]
}
},
{
"text_prediction": {
"text": "OUTSTANDING CHEQUES",
"confidence": 0.9662829041481018
},
"bounding_box": {
"points": [
{
"x": 0.025,
"y": 0.7151339829476249
},
{
"x": 0.28020833333333334,
"y": 0.7092722289890377
},
{
"x": 0.28020833333333334,
"y": 0.7327192448233861
},
{
"x": 0.025,
"y": 0.7385809987819734
}
]
}
},
{
"text_prediction": {
"text": "9,148.00",
"confidence": 0.981689453125
},
"bounding_box": {
"points": [
{
"x": 0.7885416666666667,
"y": 0.7678897685749088
},
{
"x": 0.89375,
"y": 0.7643727161997564
},
{
"x": 0.8947916666666667,
"y": 0.793681485992692
},
{
"x": 0.7895833333333334,
"y": 0.7971985383678443
}
]
}
},
{
"text_prediction": {
"text": "RECEIPTS",
"confidence": 0.995849609375
},
"bounding_box": {
"points": [
{
"x": 0.02291666666666667,
"y": 0.7843026796589525
},
{
"x": 0.12187500000000001,
"y": 0.7807856272838004
},
{
"x": 0.12291666666666669,
"y": 0.8089220462850184
},
{
"x": 0.023958333333333335,
"y": 0.8124390986601706
}
]
}
},
{
"text_prediction": {
"text": "4,309.94",
"confidence": 0.96173095703125
},
"bounding_box": {
"points": [
{
"x": 0.7895833333333334,
"y": 0.8054049939098661
},
{
"x": 0.8927083333333334,
"y": 0.8018879415347138
},
{
"x": 0.89375,
"y": 0.8276796589524971
},
{
"x": 0.790625,
"y": 0.8311967113276493
}
]
}
},
{
"text_prediction": {
"text": "PAYMENTS",
"confidence": 0.99652099609375
},
"bounding_box": {
"points": [
{
"x": 0.023958333333333335,
"y": 0.8206455542021925
},
{
"x": 0.14166666666666666,
"y": 0.8171285018270403
},
{
"x": 0.14166666666666666,
"y": 0.8452649208282583
},
{
"x": 0.023958333333333335,
"y": 0.8487819732034105
}
]
}
},
{
"text_prediction": {
"text": "19,227.49",
"confidence": 0.9561631679534912
},
"bounding_box": {
"points": [
{
"x": 0.7781250000000001,
"y": 0.8405755176613886
},
{
"x": 0.8947916666666667,
"y": 0.8394031668696712
},
{
"x": 0.8947916666666667,
"y": 0.8651948842874544
},
{
"x": 0.7781250000000001,
"y": 0.8663672350791718
}
]
}
},
{
"text_prediction": {
"text": "BALANCE 31STOCTOBER2019",
"confidence": 0.9579653739929199
},
"bounding_box": {
"points": [
{
"x": 0.028125,
"y": 0.8569884287454325
},
{
"x": 0.3354166666666667,
"y": 0.8511266747868454
},
{
"x": 0.3354166666666667,
"y": 0.8757460414129111
},
{
"x": 0.028125,
"y": 0.8816077953714982
}
]
}
},
{
"text_prediction": {
"text": "19,227.49*",
"confidence": 0.9873046875
},
"bounding_box": {
"points": [
{
"x": 0.7791666666666667,
"y": 0.9109165651644338
},
{
"x": 0.9156250000000001,
"y": 0.9073995127892814
},
{
"x": 0.9156250000000001,
"y": 0.9367082825822168
},
{
"x": 0.7791666666666667,
"y": 0.9402253349573692
}
]
}
},
{
"text_prediction": {
"text": "BALANCE AS PER BANK STATEMENT",
"confidence": 0.9529061317443848
},
"bounding_box": {
"points": [
{
"x": 0.030208333333333334,
"y": 0.928501827040195
},
{
"x": 0.39062500000000006,
"y": 0.9214677222898905
},
{
"x": 0.39062500000000006,
"y": 0.9460870889159563
},
{
"x": 0.030208333333333334,
"y": 0.9531211936662607
}
]
}
},
{
"text_prediction": {
"text": "0.00",
"confidence": 0.9998779296875
},
"bounding_box": {
"points": [
{
"x": 0.8364583333333334,
"y": 0.9449147381242389
},
{
"x": 0.8989583333333334,
"y": 0.9449147381242389
},
{
"x": 0.8989583333333334,
"y": 0.9753958587088917
},
{
"x": 0.8364583333333334,
"y": 0.9753958587088917
}
]
}
},
{
"text_prediction": {
"text": "DIFFERENCE",
"confidence": 0.9974609613418579
},
"bounding_box": {
"points": [
{
"x": 0.030208333333333334,
"y": 0.9636723507917175
},
{
"x": 0.15520833333333336,
"y": 0.9613276492082827
},
{
"x": 0.15520833333333336,
"y": 0.9859470158343485
},
{
"x": 0.030208333333333334,
"y": 0.9882917174177832
}
]
}
}
]
}
]
}
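If you only need the recognized strings rather than the geometry, a minimal sketch (assuming the response has already been parsed into a dict, as in the Python example later in this section):

def collect_text(result):
    # Gather the recognized strings for each image in the response
    return [
        [d["text_prediction"]["text"] for d in item["text_detections"]]
        for item in result["data"]
    ]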
The following image includes bounding boxes overlaid to visualize the detected text from the response.

Python Example
The following Python code demonstrates how to detect text and visualize the results.
Note
This example requires the requests and Pillow libraries. You can install them by using pip (or your preferred package manager), for example: pip install requests Pillow
import requests
import base64
import json
import io

from PIL import Image, ImageDraw


def encode_image(image_source):
    """
    Encode an image to a base64 data URL.

    Args:
        image_source: A URL or a local file path

    Returns:
        A base64-encoded data URL
    """
    # Check if the source is a URL or a local file
    if image_source.startswith(('http://', 'https://')):
        # Handle remote URL
        response = requests.get(image_source)
        response.raise_for_status()
        image_bytes = response.content
    else:
        # Handle local file
        with open(image_source, 'rb') as f:
            image_bytes = f.read()

    # Encode to base64
    base64_image = base64.b64encode(image_bytes).decode('utf-8')

    # Use the media type that matches the image format
    media_type = "image/png" if image_source.lower().endswith(".png") else "image/jpeg"
    return f"data:{media_type};base64,{base64_image}"


def extract_text(image_data_url, api_endpoint):
    """
    Extract text from images using the PaddleOCR NIM API.

    Args:
        image_data_url: Data URL of the image to process
        api_endpoint: Base URL of the NIM service

    Returns:
        API response dict
    """
    # Prepare the payload according to the PaddleOCR API format
    payload = {
        "input": [
            {
                "type": "image_url",
                "url": image_data_url,
            }
        ]
    }

    # Make the inference request
    url = f"{api_endpoint}/v1/infer"
    headers = {
        'accept': 'application/json',
        'Content-Type': 'application/json'
    }
    response = requests.post(url, headers=headers, json=payload)
    response.raise_for_status()
    return response.json()


def visualize_text_detections(image_data, result, output_path):
    """Draw bounding boxes on the image based on API results."""
    # Load the image from a data URL or a regular URL
    if image_data.startswith('data:'):
        # Extract the base64 data after the comma
        b64_data = image_data.split(',')[1]
        image_bytes = base64.b64decode(b64_data)
        image = Image.open(io.BytesIO(image_bytes))
    else:
        # Download from URL
        response = requests.get(image_data)
        image = Image.open(io.BytesIO(response.content))

    # Convert to RGB so drawing and JPEG output work for any input mode (for example, RGBA PNG)
    if image.mode != "RGB":
        image = image.convert("RGB")

    draw = ImageDraw.Draw(image)

    # Get the image dimensions
    width, height = image.size

    # Draw the detected elements
    for detection in result["data"]:
        for text_detection in detection["text_detections"]:
            box = text_detection["bounding_box"]["points"]

            # Convert normalized coordinates to pixels
            x_min = int(min(point["x"] for point in box) * width)
            y_min = int(min(point["y"] for point in box) * height)
            x_max = int(max(point["x"] for point in box) * width)
            y_max = int(max(point["y"] for point in box) * height)

            # Draw the rectangle
            draw.rectangle([x_min, y_min, x_max, y_max], outline="blue", width=3)

            # Add a label with the confidence score
            label = f"{text_detection['text_prediction']['text']}: {text_detection['text_prediction']['confidence']:.2f}"
            draw.text((x_min, y_min - 15), label, fill="blue")

    # Save the annotated image
    image.save(output_path)
    print(f"Annotated image saved to {output_path}")


# Example usage
if __name__ == "__main__":
    # Process the same sample image used in the cURL example
    image_source = "https://assets.ngc.nvidia.com/products/api-catalog/nemo-retriever/image-ocr/example-1.png"
    # Also works with local files
    # image_source = "path/to/your/image.jpg"
    api_endpoint = "http://localhost:8000"
    output_path = "detected_text_elements.jpg"

    try:
        # Encode the image
        image_data_url = encode_image(image_source)

        # Detect text
        result = extract_text(image_data_url, api_endpoint)
        print(json.dumps(result, indent=2))

        # Visualize the results
        visualize_text_detections(image_data_url, result, output_path)
    except requests.exceptions.RequestException as e:
        print(f"API request failed: {e}")
    except Exception as e:
        print(f"Error: {e}")
Error Handling#
When you use the NVIDIA NIM for Image OCR APIs, you might encounter various errors. Understanding these errors helps you troubleshoot issues in your applications.
Common Error Responses#
| Status Code | Error Type | Description | Resolution |
|---|---|---|---|
| 422 (Unprocessable Entity) | Invalid image URL format | The image URL doesn't follow the required data URL format | Ensure all URLs follow the pattern data:<image-media-type>;base64,<base64-image-data> |
| 422 (Unprocessable Entity) | Invalid base64 content | The base64-encoded data in the URL is invalid | Verify that your base64 encoding process is correct and that the image data is not corrupted |
| 422 (Unprocessable Entity) | Malformed request | The JSON payload structure is incorrect | Verify that your request format matches the API specification |
| 429 (Too Many Requests) | Request queue full | The number of concurrent requests exceeds the configured queue size | Reduce the request rate, or increase the configured queue size |
| 500 (Internal Server Error) | Server error | An unexpected error occurred during processing | Check server logs for details and report the issue if it persists |
| 503 (Service Unavailable) | Service not ready | The service is still initializing or loading models | Check the health endpoints and wait for the service to complete initialization |
Error Response Example#
{
    "error": "One or more images in the request contain an invalid image URL. Ensure that all URLs are data URLs with an image media type and base64-encoded image data. The pattern for this is 'data:<image-media-type>;base64,<base64-image-data>'."
}
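A minimal sketch, assuming the requests library, of surfacing this error message in a Python client:

import requests

def describe_failure(response: requests.Response) -> None:
    # Print the status code and the "error" field from a failed response
    if not response.ok:
        body = response.json()
        print(response.status_code, body.get("error"))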
Troubleshooting Tips#
- Invalid Image Format: The API supports PNG and JPEG formats. Ensure your images are in one of these formats before encoding.
- Image Size Limits: Very large images can cause processing issues. Consider resizing large images before sending them to the API.
- Service Health: Use the health check endpoints (/v1/health/live and /v1/health/ready) to verify that the service is operational before sending inference requests.
- Base64 Encoding: When encoding images, ensure you're using the correct MIME type in the data URL:
  - For JPEG: data:image/jpeg;base64,...
  - For PNG: data:image/png;base64,...
- Request Timeout: If requests are timing out, the model may be processing a large batch or complex images. Consider adjusting the timeout settings in your client application.
- Rate Limiting: If you're receiving 429 errors, implement backoff strategies in your client application to handle rate limiting gracefully, as shown in the sketch after this list.
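For the 429 case, a minimal backoff sketch (the retry counts and delays are illustrative, not prescribed by the API):

import random
import time

import requests

def post_with_backoff(url, payload, max_retries=5):
    # Retry on 429 with exponential backoff plus jitter
    for attempt in range(max_retries):
        response = requests.post(url, json=payload)
        if response.status_code != 429:
            response.raise_for_status()
            return response.json()
        time.sleep((2 ** attempt) + random.random())
    raise RuntimeError("Request still rate limited after retries")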
Health Check#
cURL Request
Use the following commands to query the health endpoints.
HOSTNAME="localhost"
SERVICE_PORT=8000
curl "http://${HOSTNAME}:${SERVICE_PORT}/v1/health/ready" \
-H 'Accept: application/json'
HOSTNAME="localhost"
SERVICE_PORT=8000
curl "http://${HOSTNAME}:${SERVICE_PORT}/v1/health/live" \
-H 'Accept: application/json'
Response
{
    "ready": true
}
{
    "live": true
}
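A minimal sketch that waits for readiness before sending inference requests (the timeout and polling interval are illustrative):

import time

import requests

def wait_until_ready(base_url, timeout_s=120.0, interval_s=2.0):
    # Poll /v1/health/ready until the service reports ready or the timeout expires
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            response = requests.get(f"{base_url}/v1/health/ready", timeout=5)
            if response.ok and response.json().get("ready"):
                return True
        except requests.exceptions.RequestException:
            pass  # The service might still be starting up
        time.sleep(interval_s)
    return False

if __name__ == "__main__":
    print(wait_until_ready("http://localhost:8000"))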
OpenAPI Reference for Image OCR NIM#
The following is the OpenAPI reference for NVIDIA NIM for Image OCR.