API Reference for NVIDIA NIM for Image OCR#
This documentation contains the API reference for NVIDIA NIM for Image OCR.
OpenAPI Specification#
You can download the complete API spec. The API spec is subject to change while in Early Access (EA). EA participants are encouraged to provide feedback to NVIDIA prior to the General Access (GA) release.
API Examples#
Extract Text Data from Image#
The v1/infer endpoint accepts multiple images and returns a list of text detections, with associated bounding boxes and confidence scores, for each image. The only supported type is image_url. Each image must be base64 encoded and represented in the following JSON format. The supported image formats are png and jpeg.
{
    "type": "image_url",
    "url": "data:image/<IMAGE_FORMAT>;base64,<BASE64_ENCODED_IMAGE>"
}
An inference request has an entry for input. The value for input is an array of dictionaries that contain the fields type and url. For example, a JSON payload with three images looks like the following:
{
    "input": [
        {
            "type": "image_url",
            "url": "data:image/png;base64,<BASE64_ENCODED_IMAGE>"
        },
        {
            "type": "image_url",
            "url": "data:image/png;base64,<BASE64_ENCODED_IMAGE>"
        },
        {
            "type": "image_url",
            "url": "data:image/png;base64,<BASE64_ENCODED_IMAGE>"
        }
    ]
}
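As an illustration, a minimal Python sketch that assembles this payload from local files (the file names are hypothetical):

import base64
import json

def to_entry(path, media_type):
    # Encode one image file as a base64 data-URL entry
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return {"type": "image_url", "url": f"data:{media_type};base64,{encoded}"}

payload = {
    "input": [
        to_entry("page-1.png", "image/png"),
        to_entry("page-2.png", "image/png"),
        to_entry("page-3.jpg", "image/jpeg"),
    ]
}
print(json.dumps(payload)[:120])  # Preview the start of the payload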
cURL Example
API_ENDPOINT="http://localhost:8000"

# Image to process: a remote URL or a local file path
IMAGE_SOURCE="https://assets.ngc.nvidia.com/products/api-catalog/nemo-retriever/image-ocr/example-1.png"
# IMAGE_SOURCE="path/to/your/image.jpg"  # Uncomment to use a local file instead

# Encode the image to base64 (handles both URLs and local files)
if [[ $IMAGE_SOURCE == http* ]]; then
    # Handle URL
    BASE64_IMAGE=$(curl -s "${IMAGE_SOURCE}" | base64 -w 0)
else
    # Handle local file
    BASE64_IMAGE=$(base64 -w 0 "${IMAGE_SOURCE}")
fi

# Construct the full JSON payload
# Note: the media type (image/png or image/jpeg) should match the image format
JSON_PAYLOAD='{
  "input": [{
    "type": "image_url",
    "url": "data:image/png;base64,'"${BASE64_IMAGE}"'"
  }]
}'

# Send a POST request to the inference endpoint
echo "${JSON_PAYLOAD}" | \
curl -X POST "${API_ENDPOINT}/v1/infer" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d @-
The following image is used as the input in the previous example.

Response
The PaddleOCR NIM output provides a confidence score and a bounding box, with float coordinates normalized to [0, 1], for each text detection.
{
"data": [
{
"index": 0,
"text_detections": [
{
"text_prediction": {
"text": "Trunch Parish Council",
"confidence": 0.9192708134651184
},
"bounding_box": {
"points": [
{
"x": 0.33125000000000004,
"y": 0.02227466504263094
},
{
"x": 0.5531250000000001,
"y": 0.01641291108404385
},
{
"x": 0.5531250000000001,
"y": 0.04103227771010962
},
{
"x": 0.33125000000000004,
"y": 0.04689403166869672
}
]
}
},
{
"text_prediction": {
"text": "BANK RECONCILIATION AS AT31STOCTOBER 2019",
"confidence": 0.9528748989105225
},
"bounding_box": {
"points": [
{
"x": 0.18958333333333335,
"y": 0.0984774665042631
},
{
"x": 0.7010416666666667,
"y": 0.0855816077953715
},
{
"x": 0.7010416666666667,
"y": 0.11020097442143728
},
{
"x": 0.18958333333333335,
"y": 0.12309683313032889
}
]
}
},
{
"text_prediction": {
"text": "Account:",
"confidence": 0.940185546875
},
"bounding_box": {
"points": [
{
"x": 0.011458333333333334,
"y": 0.17585261875761266
},
{
"x": 0.10625000000000001,
"y": 0.17350791717417785
},
{
"x": 0.10625000000000001,
"y": 0.20164433617539587
},
{
"x": 0.011458333333333334,
"y": 0.20398903775883073
}
]
}
},
{
"text_prediction": {
"text": "14,389.43",
"confidence": 0.9986979365348816
},
"bounding_box": {
"points": [
{
"x": 0.7645833333333334,
"y": 0.2286084043848965
},
{
"x": 0.8822916666666667,
"y": 0.22626370280146166
},
{
"x": 0.8822916666666667,
"y": 0.2555724725943971
},
{
"x": 0.7645833333333334,
"y": 0.257917174177832
}
]
}
},
{
"text_prediction": {
"text": "BANK STATEMENT BALANCE 30TH SEPTEMBER 2019",
"confidence": 0.945068359375
},
"bounding_box": {
"points": [
{
"x": 0.015625000000000003,
"y": 0.2508830694275274
},
{
"x": 0.5416666666666667,
"y": 0.235642509135201
},
{
"x": 0.5416666666666667,
"y": 0.26026187576126675
},
{
"x": 0.015625000000000003,
"y": 0.2755024360535932
}
]
}
},
{
"text_prediction": {
"text": "83.60",
"confidence": 0.9986327886581421
},
"bounding_box": {
"points": [
{
"x": 0.8083333333333335,
"y": 0.26377892813641907
},
{
"x": 0.8833333333333334,
"y": 0.2614342265529842
},
{
"x": 0.884375,
"y": 0.29074299634591966
},
{
"x": 0.8093750000000001,
"y": 0.29308769792935446
}
]
}
},
{
"text_prediction": {
"text": "PREVIOUS OUTSTANDING CHEQUES",
"confidence": 0.9708949327468872
},
"bounding_box": {
"points": [
{
"x": 0.015625000000000003,
"y": 0.28605359317904994
},
{
"x": 0.3791666666666667,
"y": 0.27667478684531066
},
{
"x": 0.3791666666666667,
"y": 0.3012941534713764
},
{
"x": 0.015625000000000003,
"y": 0.3106729598051157
}
]
}
},
{
"text_prediction": {
"text": "14,305.83",
"confidence": 0.9437391757965088
},
"bounding_box": {
"points": [
{
"x": 0.7645833333333334,
"y": 0.300121802679659
},
{
"x": 0.884375,
"y": 0.29777710109622413
},
{
"x": 0.884375,
"y": 0.3259135200974422
},
{
"x": 0.7645833333333334,
"y": 0.328258221680877
}
]
}
},
{
"text_prediction": {
"text": "CASH BOOK BALANCE31STOCTOBER 2019",
"confidence": 0.9460227489471436
},
"bounding_box": {
"points": [
{
"x": 0.01666666666666667,
"y": 0.3212241169305725
},
{
"x": 0.44895833333333335,
"y": 0.3095006090133983
},
{
"x": 0.44895833333333335,
"y": 0.33411997563946405
},
{
"x": 0.01666666666666667,
"y": 0.34584348355663824
}
]
}
},
{
"text_prediction": {
"text": "ADD CHEQUES OUTSTANDING:",
"confidence": 0.9514973759651184
},
"bounding_box": {
"points": [
{
"x": 0.01666666666666667,
"y": 0.35756699147381243
},
{
"x": 0.3291666666666667,
"y": 0.34936053593179056
},
{
"x": 0.3291666666666667,
"y": 0.3728075517661389
},
{
"x": 0.01666666666666667,
"y": 0.3810140073081609
}
]
}
},
{
"text_prediction": {
"text": "*",
"confidence": 0.98681640625
},
"bounding_box": {
"points": [
{
"x": 0.89375,
"y": 0.3692904993909867
},
{
"x": 0.9062500000000001,
"y": 0.3692904993909867
},
{
"x": 0.9062500000000001,
"y": 0.38804811205846534
},
{
"x": 0.89375,
"y": 0.38804811205846534
}
]
}
},
{
"text_prediction": {
"text": "101719",
"confidence": 0.999755859375
},
"bounding_box": {
"points": [
{
"x": 0.6229166666666667,
"y": 0.41032277710109627
},
{
"x": 0.7041666666666667,
"y": 0.40797807551766146
},
{
"x": 0.7052083333333334,
"y": 0.43845919610231426
},
{
"x": 0.6239583333333334,
"y": 0.4408038976857491
}
]
}
},
{
"text_prediction": {
"text": "83.60",
"confidence": 0.9019531011581421
},
"bounding_box": {
"points": [
{
"x": 0.8125000000000001,
"y": 0.406805724725944
},
{
"x": 0.8958333333333335,
"y": 0.4044610231425092
},
{
"x": 0.8968750000000001,
"y": 0.43376979293544465
},
{
"x": 0.8135416666666667,
"y": 0.4361144945188794
}
]
}
},
{
"text_prediction": {
"text": "*",
"confidence": 0.94482421875
},
"bounding_box": {
"points": [
{
"x": 0.8927083333333334,
"y": 0.406805724725944
},
{
"x": 0.9062500000000001,
"y": 0.406805724725944
},
{
"x": 0.9062500000000001,
"y": 0.4255633373934227
},
{
"x": 0.8927083333333334,
"y": 0.4255633373934227
}
]
}
},
{
"text_prediction": {
"text": "*",
"confidence": 0.9365234375
},
"bounding_box": {
"points": [
{
"x": 0.8927083333333334,
"y": 0.4396315468940317
},
{
"x": 0.909375,
"y": 0.4396315468940317
},
{
"x": 0.909375,
"y": 0.4642509135200975
},
{
"x": 0.8927083333333334,
"y": 0.4642509135200975
}
]
}
},
{
"text_prediction": {
"text": "*",
"confidence": 0.98095703125
},
"bounding_box": {
"points": [
{
"x": 0.8927083333333334,
"y": 0.47714677222898905
},
{
"x": 0.909375,
"y": 0.47714677222898905
},
{
"x": 0.909375,
"y": 0.5005937880633374
},
{
"x": 0.8927083333333334,
"y": 0.5005937880633374
}
]
}
},
{
"text_prediction": {
"text": "*",
"confidence": 0.982421875
},
"bounding_box": {
"points": [
{
"x": 0.8947916666666667,
"y": 0.5146619975639465
},
{
"x": 0.9083333333333334,
"y": 0.5146619975639465
},
{
"x": 0.9083333333333334,
"y": 0.5322472594397076
},
{
"x": 0.8947916666666667,
"y": 0.5322472594397076
}
]
}
},
{
"text_prediction": {
"text": "83.60",
"confidence": 0.9984375238418579
},
"bounding_box": {
"points": [
{
"x": 0.8177083333333335,
"y": 0.6940316686967114
},
{
"x": 0.89375,
"y": 0.6916869671132765
},
{
"x": 0.8947916666666667,
"y": 0.7209957369062119
},
{
"x": 0.8187500000000001,
"y": 0.7233404384896469
}
]
}
},
{
"text_prediction": {
"text": "OUTSTANDING CHEQUES",
"confidence": 0.9662829041481018
},
"bounding_box": {
"points": [
{
"x": 0.025,
"y": 0.7151339829476249
},
{
"x": 0.28020833333333334,
"y": 0.7092722289890377
},
{
"x": 0.28020833333333334,
"y": 0.7327192448233861
},
{
"x": 0.025,
"y": 0.7385809987819734
}
]
}
},
{
"text_prediction": {
"text": "9,148.00",
"confidence": 0.981689453125
},
"bounding_box": {
"points": [
{
"x": 0.7885416666666667,
"y": 0.7678897685749088
},
{
"x": 0.89375,
"y": 0.7643727161997564
},
{
"x": 0.8947916666666667,
"y": 0.793681485992692
},
{
"x": 0.7895833333333334,
"y": 0.7971985383678443
}
]
}
},
{
"text_prediction": {
"text": "RECEIPTS",
"confidence": 0.995849609375
},
"bounding_box": {
"points": [
{
"x": 0.02291666666666667,
"y": 0.7843026796589525
},
{
"x": 0.12187500000000001,
"y": 0.7807856272838004
},
{
"x": 0.12291666666666669,
"y": 0.8089220462850184
},
{
"x": 0.023958333333333335,
"y": 0.8124390986601706
}
]
}
},
{
"text_prediction": {
"text": "4,309.94",
"confidence": 0.96173095703125
},
"bounding_box": {
"points": [
{
"x": 0.7895833333333334,
"y": 0.8054049939098661
},
{
"x": 0.8927083333333334,
"y": 0.8018879415347138
},
{
"x": 0.89375,
"y": 0.8276796589524971
},
{
"x": 0.790625,
"y": 0.8311967113276493
}
]
}
},
{
"text_prediction": {
"text": "PAYMENTS",
"confidence": 0.99652099609375
},
"bounding_box": {
"points": [
{
"x": 0.023958333333333335,
"y": 0.8206455542021925
},
{
"x": 0.14166666666666666,
"y": 0.8171285018270403
},
{
"x": 0.14166666666666666,
"y": 0.8452649208282583
},
{
"x": 0.023958333333333335,
"y": 0.8487819732034105
}
]
}
},
{
"text_prediction": {
"text": "19,227.49",
"confidence": 0.9561631679534912
},
"bounding_box": {
"points": [
{
"x": 0.7781250000000001,
"y": 0.8405755176613886
},
{
"x": 0.8947916666666667,
"y": 0.8394031668696712
},
{
"x": 0.8947916666666667,
"y": 0.8651948842874544
},
{
"x": 0.7781250000000001,
"y": 0.8663672350791718
}
]
}
},
{
"text_prediction": {
"text": "BALANCE 31STOCTOBER2019",
"confidence": 0.9579653739929199
},
"bounding_box": {
"points": [
{
"x": 0.028125,
"y": 0.8569884287454325
},
{
"x": 0.3354166666666667,
"y": 0.8511266747868454
},
{
"x": 0.3354166666666667,
"y": 0.8757460414129111
},
{
"x": 0.028125,
"y": 0.8816077953714982
}
]
}
},
{
"text_prediction": {
"text": "19,227.49*",
"confidence": 0.9873046875
},
"bounding_box": {
"points": [
{
"x": 0.7791666666666667,
"y": 0.9109165651644338
},
{
"x": 0.9156250000000001,
"y": 0.9073995127892814
},
{
"x": 0.9156250000000001,
"y": 0.9367082825822168
},
{
"x": 0.7791666666666667,
"y": 0.9402253349573692
}
]
}
},
{
"text_prediction": {
"text": "BALANCE AS PER BANK STATEMENT",
"confidence": 0.9529061317443848
},
"bounding_box": {
"points": [
{
"x": 0.030208333333333334,
"y": 0.928501827040195
},
{
"x": 0.39062500000000006,
"y": 0.9214677222898905
},
{
"x": 0.39062500000000006,
"y": 0.9460870889159563
},
{
"x": 0.030208333333333334,
"y": 0.9531211936662607
}
]
}
},
{
"text_prediction": {
"text": "0.00",
"confidence": 0.9998779296875
},
"bounding_box": {
"points": [
{
"x": 0.8364583333333334,
"y": 0.9449147381242389
},
{
"x": 0.8989583333333334,
"y": 0.9449147381242389
},
{
"x": 0.8989583333333334,
"y": 0.9753958587088917
},
{
"x": 0.8364583333333334,
"y": 0.9753958587088917
}
]
}
},
{
"text_prediction": {
"text": "DIFFERENCE",
"confidence": 0.9974609613418579
},
"bounding_box": {
"points": [
{
"x": 0.030208333333333334,
"y": 0.9636723507917175
},
{
"x": 0.15520833333333336,
"y": 0.9613276492082827
},
{
"x": 0.15520833333333336,
"y": 0.9859470158343485
},
{
"x": 0.030208333333333334,
"y": 0.9882917174177832
}
]
}
}
]
}
]
}
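If you only need the recognized strings rather than the geometry, a minimal sketch (assuming the response has already been parsed into a dict, as in the Python example later in this section):

def collect_text(result):
    # Gather the recognized strings for each image in the response
    return [
        [d["text_prediction"]["text"] for d in item["text_detections"]]
        for item in result["data"]
    ]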
The following image includes bounding boxes overlaid to visualize the detected text from the response.

Python Example
The following Python code demonstrates how to detect text and visualize the results.
Note
This example requires the requests and Pillow libraries. You can install them by using pip (or your preferred package manager), for example: pip install requests Pillow
import requests
import base64
import json
import io

from PIL import Image, ImageDraw


def encode_image(image_source):
    """
    Encode an image to a base64 data URL.

    Args:
        image_source: A URL or a local file path

    Returns:
        A base64-encoded data URL
    """
    # Check if the source is a URL or a local file
    if image_source.startswith(('http://', 'https://')):
        # Handle remote URL
        response = requests.get(image_source)
        response.raise_for_status()
        image_bytes = response.content
    else:
        # Handle local file
        with open(image_source, 'rb') as f:
            image_bytes = f.read()

    # Encode to base64
    base64_image = base64.b64encode(image_bytes).decode('utf-8')

    # Use the media type that matches the image format
    media_type = "image/png" if image_source.lower().endswith(".png") else "image/jpeg"
    return f"data:{media_type};base64,{base64_image}"


def extract_text(image_data_url, api_endpoint):
    """
    Extract text from images using the PaddleOCR NIM API.

    Args:
        image_data_url: Data URL of the image to process
        api_endpoint: Base URL of the NIM service

    Returns:
        API response dict
    """
    # Prepare the payload according to the PaddleOCR API format
    payload = {
        "input": [
            {
                "type": "image_url",
                "url": image_data_url,
            }
        ]
    }

    # Make the inference request
    url = f"{api_endpoint}/v1/infer"
    headers = {
        'accept': 'application/json',
        'Content-Type': 'application/json'
    }
    response = requests.post(url, headers=headers, json=payload)
    response.raise_for_status()
    return response.json()


def visualize_text_detections(image_data, result, output_path):
    """Draw bounding boxes on the image based on API results."""
    # Load the image from a data URL or a regular URL
    if image_data.startswith('data:'):
        # Extract the base64 data after the comma
        b64_data = image_data.split(',')[1]
        image_bytes = base64.b64decode(b64_data)
        image = Image.open(io.BytesIO(image_bytes))
    else:
        # Download from URL
        response = requests.get(image_data)
        image = Image.open(io.BytesIO(response.content))

    # Convert to RGB so drawing and JPEG output work for any input mode (for example, RGBA PNG)
    if image.mode != "RGB":
        image = image.convert("RGB")

    draw = ImageDraw.Draw(image)

    # Get the image dimensions
    width, height = image.size

    # Draw the detected elements
    for detection in result["data"]:
        for text_detection in detection["text_detections"]:
            box = text_detection["bounding_box"]["points"]

            # Convert normalized coordinates to pixels
            x_min = int(min(point["x"] for point in box) * width)
            y_min = int(min(point["y"] for point in box) * height)
            x_max = int(max(point["x"] for point in box) * width)
            y_max = int(max(point["y"] for point in box) * height)

            # Draw the rectangle
            draw.rectangle([x_min, y_min, x_max, y_max], outline="blue", width=3)

            # Add a label with the confidence score
            label = f"{text_detection['text_prediction']['text']}: {text_detection['text_prediction']['confidence']:.2f}"
            draw.text((x_min, y_min - 15), label, fill="blue")

    # Save the annotated image
    image.save(output_path)
    print(f"Annotated image saved to {output_path}")


# Example usage
if __name__ == "__main__":
    # Process the same sample image used in the cURL example
    image_source = "https://assets.ngc.nvidia.com/products/api-catalog/nemo-retriever/image-ocr/example-1.png"
    # Also works with local files
    # image_source = "path/to/your/image.jpg"
    api_endpoint = "http://localhost:8000"
    output_path = "detected_text_elements.jpg"

    try:
        # Encode the image
        image_data_url = encode_image(image_source)

        # Detect text
        result = extract_text(image_data_url, api_endpoint)
        print(json.dumps(result, indent=2))

        # Visualize the results
        visualize_text_detections(image_data_url, result, output_path)
    except requests.exceptions.RequestException as e:
        print(f"API request failed: {e}")
    except Exception as e:
        print(f"Error: {e}")
Error Handling#
When you use the NVIDIA NIM for Image OCR APIs, you might encounter various errors. Understanding these errors helps you troubleshoot issues in your applications.
Common Error Responses#
| Status Code | Error Type | Description | Resolution |
|---|---|---|---|
| 422 (Unprocessable Entity) | Invalid image URL format | The image URL doesn't follow the required data URL format | Ensure all URLs follow the pattern data:<image-media-type>;base64,<base64-image-data> |
| 422 (Unprocessable Entity) | Invalid base64 content | The base64-encoded data in the URL is invalid | Verify that your base64 encoding process is correct and that the image data is not corrupted |
| 422 (Unprocessable Entity) | Malformed request | The JSON payload structure is incorrect | Verify that your request format matches the API specification |
| 429 (Too Many Requests) | Request queue full | The number of concurrent requests exceeds the configured queue size | Reduce the request rate, or increase the configured queue size |
| 500 (Internal Server Error) | Server error | An unexpected error occurred during processing | Check server logs for details and report the issue if it persists |
| 503 (Service Unavailable) | Service not ready | The service is still initializing or loading models | Check the health endpoints and wait for the service to complete initialization |
Error Response Example#
{
    "error": "One or more images in the request contain an invalid image URL. Ensure that all URLs are data URLs with an image media type and base64-encoded image data. The pattern for this is 'data:<image-media-type>;base64,<base64-image-data>'."
}
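A minimal sketch, assuming the requests library, of surfacing this error message in a Python client:

import requests

def describe_failure(response: requests.Response) -> None:
    # Print the status code and the "error" field from a failed response
    if not response.ok:
        body = response.json()
        print(response.status_code, body.get("error"))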
Troubleshooting Tips#
- Invalid Image Format: The API supports PNG and JPEG formats. Ensure your images are in one of these formats before encoding.
- Image Size Limits: Very large images can cause processing issues. Consider resizing large images before sending them to the API.
- Service Health: Use the health check endpoints (/v1/health/live and /v1/health/ready) to verify that the service is operational before sending inference requests.
- Base64 Encoding: When encoding images, ensure you're using the correct MIME type in the data URL:
  - For JPEG: data:image/jpeg;base64,...
  - For PNG: data:image/png;base64,...
- Request Timeout: If requests are timing out, the model may be processing a large batch or complex images. Consider adjusting the timeout settings in your client application.
- Rate Limiting: If you're receiving 429 errors, implement backoff strategies in your client application to handle rate limiting gracefully, as shown in the sketch after this list.
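For the 429 case, a minimal backoff sketch (the retry counts and delays are illustrative, not prescribed by the API):

import random
import time

import requests

def post_with_backoff(url, payload, max_retries=5):
    # Retry on 429 with exponential backoff plus jitter
    for attempt in range(max_retries):
        response = requests.post(url, json=payload)
        if response.status_code != 429:
            response.raise_for_status()
            return response.json()
        time.sleep((2 ** attempt) + random.random())
    raise RuntimeError("Request still rate limited after retries")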
Health Check#
cURL Request
Use the following commands to query the health endpoints.
HOSTNAME="localhost"
SERVICE_PORT=8000
curl "http://${HOSTNAME}:${SERVICE_PORT}/v1/health/ready" \
-H 'Accept: application/json'
HOSTNAME="localhost"
SERVICE_PORT=8000
curl "http://${HOSTNAME}:${SERVICE_PORT}/v1/health/live" \
-H 'Accept: application/json'
Response
{
    "ready": true
}
{
    "live": true
}
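A minimal sketch that waits for readiness before sending inference requests (the timeout and polling interval are illustrative):

import time

import requests

def wait_until_ready(base_url, timeout_s=120.0, interval_s=2.0):
    # Poll /v1/health/ready until the service reports ready or the timeout expires
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            response = requests.get(f"{base_url}/v1/health/ready", timeout=5)
            if response.ok and response.json().get("ready"):
                return True
        except requests.exceptions.RequestException:
            pass  # The service might still be starting up
        time.sleep(interval_s)
    return False

if __name__ == "__main__":
    print(wait_until_ready("http://localhost:8000"))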
OpenAPI Reference for Image OCR NIM#
The following is the OpenAPI reference for NVIDIA NIM for Image OCR.