API Reference for NVIDIA NIM for Object Detection#
This documentation contains the API reference for NVIDIA NIM for Object Detection.
OpenAPI Specification#
You can download the complete page-elements API spec, graphic-elements API spec, and table-structure API spec.
API Examples#
The Object Detection NIM supports multiple models for page element, table structure, and graphic element detection. This section provides examples that use the Page Elements NIM, but the API is the same for all models.
See the Table Structure Example and Graphic Element Example sections for sample images and the output to expect when you use those NIMs instead.
Compute Bounding Boxes#
The v1/infer endpoint accepts multiple images and returns a list of bounding boxes for each image. The bounding box coordinates are defined with respect to the top-left corner of the image. This means that:
x: Represents the horizontal distance from the left edge of the image to the left edge of the bounding box.
y: Represents the vertical distance from the top edge of the image to the top edge of the bounding box.
The only supported type is image_url. Each image must be base64 encoded and represented in the following JSON format. The supported image formats are png and jpeg.
{
    "type": "image_url",
    "url": "data:image/<IMAGE_FORMAT>;base64,<BASE64_ENCODED_IMAGE>"
}
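The following sketch shows one way to build such an entry from raw image bytes. The helper name `to_image_input` is illustrative, not part of the API; only the `type` and `url` fields it produces are defined by the NIM.

```python
import base64

def to_image_input(image_bytes, image_format="png"):
    """Build one "input" entry from raw image bytes.

    image_format must be "png" or "jpeg", the two formats the API accepts.
    """
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "type": "image_url",
        "url": f"data:image/{image_format};base64,{b64}",
    }

# Example: encode the 8-byte PNG signature as a stand-in for real image bytes
entry = to_image_input(b"\x89PNG\r\n\x1a\n", "png")
print(entry["url"][:22])  # data:image/png;base64,
```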
An inference request has an entry for input. The value for input is an array of dictionaries that contain the fields type and url. For example, a JSON payload with three images looks like the following:
{
    "input": [
        {
            "type": "image_url",
            "url": "data:image/png;base64,<BASE64_ENCODED_IMAGE>"
        },
        {
            "type": "image_url",
            "url": "data:image/png;base64,<BASE64_ENCODED_IMAGE>"
        },
        {
            "type": "image_url",
            "url": "data:image/png;base64,<BASE64_ENCODED_IMAGE>"
        }
    ]
}
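A multi-image payload like the one above can be assembled programmatically. This is a minimal sketch, assuming images are available as (bytes, format) pairs; the `build_payload` helper is hypothetical. Each entry in the response's `data` array carries an `index` field that refers back to the position of the image in this `input` array.

```python
import base64

def build_payload(images):
    """Build a /v1/infer payload from (image_bytes, image_format) pairs.

    image_format must be "png" or "jpeg". The order of entries matters:
    the response reports results per image by its index in this array.
    """
    return {
        "input": [
            {
                "type": "image_url",
                "url": "data:image/%s;base64,%s"
                       % (fmt, base64.b64encode(raw).decode("utf-8")),
            }
            for raw, fmt in images
        ]
    }

# Example with placeholder bytes standing in for three real images
payload = build_payload([(b"a", "png"), (b"b", "jpeg"), (b"c", "png")])
print(len(payload["input"]))  # 3
```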
cURL Example
API_ENDPOINT="http://localhost:8000"

# Create JSON payload with base64 encoded image
# Set your image source - can be a URL or a local file path
IMAGE_SOURCE="https://assets.ngc.nvidia.com/products/api-catalog/nemo-retriever/object-detection/page-elements-example-1.jpg"
# IMAGE_SOURCE="path/to/your/image.jpg"  # Uncomment to use a local file instead

# Encode the image to base64 (handles both URLs and local files)
if [[ $IMAGE_SOURCE == http* ]]; then
    # Handle URL
    BASE64_IMAGE=$(curl -s "${IMAGE_SOURCE}" | base64 -w 0)
else
    # Handle local file
    BASE64_IMAGE=$(base64 -w 0 "${IMAGE_SOURCE}")
fi

# Construct the full JSON payload
JSON_PAYLOAD='{
  "input": [{
    "type": "image_url",
    "url": "data:image/jpeg;base64,'"${BASE64_IMAGE}"'"
  }]
}'

# Send POST request to inference endpoint
echo "${JSON_PAYLOAD}" | \
  curl -X POST "${API_ENDPOINT}/v1/infer" \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d @-
The following image is used as the input in the previous example.

The following JSON response shows the output from the inference API. The response contains a bounding box for each element, such as tables and charts, that was detected. For each bounding box the response includes the coordinates x_min, y_min, x_max, and y_max, and a confidence score.
Note
Each detected element includes a confidence score between 0 and 1. In production applications, you might want to filter results based on a minimum confidence threshold (for example, 0.5) to reduce false positives.
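A threshold filter like the one the note describes can be sketched as follows. The `filter_boxes` helper name is illustrative, not part of the API; it assumes the response shape documented in this section.

```python
def filter_boxes(response, threshold=0.5):
    """Keep only detections at or above a confidence threshold.

    `response` has the shape returned by v1/infer: a "data" array whose
    entries map element labels to lists of bounding boxes.
    """
    filtered = []
    for item in response["data"]:
        kept = {
            label: [box for box in boxes if box["confidence"] >= threshold]
            for label, boxes in item["bounding_boxes"].items()
        }
        filtered.append({"index": item["index"], "bounding_boxes": kept})
    return filtered

# Small sample mirroring the response format; only one table clears 0.5
sample = {
    "data": [{
        "index": 0,
        "bounding_boxes": {
            "table": [
                {"x_min": 0.36, "y_min": 0.26, "x_max": 0.49,
                 "y_max": 0.39, "confidence": 0.64},
                {"x_min": 0.22, "y_min": 0.32, "x_max": 0.35,
                 "y_max": 0.44, "confidence": 0.24},
            ]
        }
    }]
}
print(len(filter_boxes(sample)[0]["bounding_boxes"]["table"]))  # 1
```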
{
    "data": [
        {
            "index": 0,
            "bounding_boxes": {
                "table": [
                    {
                        "x_min": 0.36,
                        "y_min": 0.2616,
                        "x_max": 0.4907,
                        "y_max": 0.3881,
                        "confidence": 0.6416
                    },
                    {
                        "x_min": 0.505,
                        "y_min": 0.2287,
                        "x_max": 0.6356,
                        "y_max": 0.3538,
                        "confidence": 0.5757
                    },
                    {
                        "x_min": 0.2437,
                        "y_min": 0.7994,
                        "x_max": 0.7526,
                        "y_max": 0.8382,
                        "confidence": 0.4475
                    },
                    {
                        "x_min": 0.6518,
                        "y_min": 0.1928,
                        "x_max": 0.7821,
                        "y_max": 0.3258,
                        "confidence": 0.4405
                    },
                    {
                        "x_min": 0.2156,
                        "y_min": 0.3202,
                        "x_max": 0.3488,
                        "y_max": 0.438,
                        "confidence": 0.2427
                    }
                ],
                "chart": [
                    {
                        "x_min": 0.2133,
                        "y_min": 0.548,
                        "x_max": 0.7816,
                        "y_max": 0.8542,
                        "confidence": 0.8397
                    }
                ],
                "title": [
                    {
                        "x_min": 0.2384,
                        "y_min": 0.1365,
                        "x_max": 0.7192,
                        "y_max": 0.1926,
                        "confidence": 0.5737
                    }
                ]
            }
        }
    ],
    "usage": {
        "images_size_mb": 0.10183906555175781
    }
}
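The coordinates in the response are normalized to the range 0 to 1; to locate a box on the original image, multiply by the image width and height. A minimal conversion sketch (the `to_pixels` helper is illustrative, using the chart box from the response above):

```python
def to_pixels(box, width, height):
    """Convert a normalized bounding box to integer pixel coordinates."""
    return (int(box["x_min"] * width), int(box["y_min"] * height),
            int(box["x_max"] * width), int(box["y_max"] * height))

# The chart box from the sample response, on a hypothetical 1000x800 image
chart = {"x_min": 0.2133, "y_min": 0.548, "x_max": 0.7816, "y_max": 0.8542}
print(to_pixels(chart, 1000, 800))  # (213, 438, 781, 683)
```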
The following image shows the input image with the bounding boxes overlaid to visualize the detected page elements.

Python Example
The following Python code demonstrates how to detect page elements and visualize the results.
Note
This example requires the requests and Pillow libraries. You can install them by using pip (or your preferred package manager). For example: pip install requests Pillow
import requests
import base64
import json
import io
from PIL import Image, ImageDraw


def encode_image(image_source):
    """
    Encode an image to a base64 data URL.

    Args:
        image_source: A URL or a local file path

    Returns:
        A base64-encoded data URL
    """
    # Check if the source is a URL or a local file
    if image_source.startswith(('http://', 'https://')):
        # Handle remote URL
        response = requests.get(image_source)
        response.raise_for_status()
        image_bytes = response.content
    else:
        # Handle local file
        with open(image_source, 'rb') as f:
            image_bytes = f.read()

    # Infer the media type from the file extension (the API supports png and jpeg)
    media_type = 'png' if image_source.lower().endswith('.png') else 'jpeg'

    # Encode to base64
    base64_image = base64.b64encode(image_bytes).decode('utf-8')
    return f"data:image/{media_type};base64,{base64_image}"


def detect_elements(image_data_url, api_endpoint):
    """
    Detect page elements in an image using the Page Elements NIM API.

    Args:
        image_data_url: Data URL of the image to process
        api_endpoint: Base URL of the NIM service

    Returns:
        API response dict
    """
    # Prepare payload
    payload = {
        "input": [{
            "type": "image_url",
            "url": image_data_url,
        }]
    }

    # Make inference request
    url = f"{api_endpoint}/v1/infer"
    headers = {
        'accept': 'application/json',
        'Content-Type': 'application/json'
    }
    response = requests.post(url, headers=headers, json=payload)
    response.raise_for_status()
    return response.json()


def visualize_detections(image_data, result, output_path):
    """Draw bounding boxes on the image based on API results."""
    # Load image from data URL or URL
    if image_data.startswith('data:'):
        # Extract base64 data after the comma
        b64_data = image_data.split(',')[1]
        image_bytes = base64.b64decode(b64_data)
        image = Image.open(io.BytesIO(image_bytes))
    else:
        # Download from URL
        response = requests.get(image_data)
        image = Image.open(io.BytesIO(response.content))

    draw = ImageDraw.Draw(image)

    # Get image dimensions
    width, height = image.size

    # Define colors for different element types
    colors = {
        "table": "red",
        "chart": "green",
        "title": "blue"
    }

    # Draw detected elements
    for detection in result["data"]:
        for element_type, boxes in detection["bounding_boxes"].items():
            color = colors.get(element_type, "yellow")
            for box in boxes:
                # Convert normalized coordinates to pixels
                x1 = int(box["x_min"] * width)
                y1 = int(box["y_min"] * height)
                x2 = int(box["x_max"] * width)
                y2 = int(box["y_max"] * height)

                # Draw rectangle
                draw.rectangle([x1, y1, x2, y2], outline=color, width=3)

                # Add label with confidence
                label = f"{element_type}: {box['confidence']:.2f}"
                draw.text((x1, y1 - 15), label, fill=color)

    # Save the annotated image
    image.save(output_path)
    print(f"Annotated image saved to {output_path}")


# Example usage
if __name__ == "__main__":
    # Process the sample image
    image_source = "https://assets.ngc.nvidia.com/products/api-catalog/nemo-retriever/object-detection/page-elements-example-1.jpg"
    # Also works with local files
    # image_source = "path/to/your/image.jpg"

    api_endpoint = "http://localhost:8000"
    output_path = "detected_page_elements.jpg"

    try:
        # Encode the image
        image_data_url = encode_image(image_source)

        # Detect elements
        result = detect_elements(image_data_url, api_endpoint)
        print(json.dumps(result, indent=2))

        # Visualize the results
        visualize_detections(image_data_url, result, output_path)
    except requests.exceptions.RequestException as e:
        print(f"API request failed: {e}")
    except Exception as e:
        print(f"Error: {e}")
Table Structure Example#
API_ENDPOINT="http://localhost:8000"

# Create JSON payload with base64 encoded image
IMAGE_SOURCE="https://assets.ngc.nvidia.com/products/api-catalog/nemo-retriever/object-detection/table-structure-example-1.png"
# IMAGE_SOURCE="path/to/your/image.png"  # Uncomment to use a local file instead

# Encode the image to base64 (handles both URLs and local files)
if [[ $IMAGE_SOURCE == http* ]]; then
    # Handle URL
    BASE64_IMAGE=$(curl -s "${IMAGE_SOURCE}" | base64 -w 0)
else
    # Handle local file
    BASE64_IMAGE=$(base64 -w 0 "${IMAGE_SOURCE}")
fi

# Construct the full JSON payload (the sample image is a PNG)
JSON_PAYLOAD='{
  "input": [{
    "type": "image_url",
    "url": "data:image/png;base64,'"${BASE64_IMAGE}"'"
  }]
}'

# Send POST request to inference endpoint
echo "${JSON_PAYLOAD}" | \
  curl -X POST "${API_ENDPOINT}/v1/infer" \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d @-
The following image is used as the input in the previous example.

Using the same visualization steps from earlier, the following images show the bounding boxes overlaid on the input image to visualize the detected table structure by cell, row, and column.



Graphic Element Example#
API_ENDPOINT="http://localhost:8000"

# Create JSON payload with base64 encoded image
IMAGE_SOURCE="https://assets.ngc.nvidia.com/products/api-catalog/nemo-retriever/object-detection/graphic-elements-example-1.jpg"
# IMAGE_SOURCE="path/to/your/image.jpg"  # Uncomment to use a local file instead

# Encode the image to base64 (handles both URLs and local files)
if [[ $IMAGE_SOURCE == http* ]]; then
    # Handle URL
    BASE64_IMAGE=$(curl -s "${IMAGE_SOURCE}" | base64 -w 0)
else
    # Handle local file
    BASE64_IMAGE=$(base64 -w 0 "${IMAGE_SOURCE}")
fi

# Construct the full JSON payload
JSON_PAYLOAD='{
  "input": [{
    "type": "image_url",
    "url": "data:image/jpeg;base64,'"${BASE64_IMAGE}"'"
  }]
}'

# Send POST request to inference endpoint
echo "${JSON_PAYLOAD}" | \
  curl -X POST "${API_ENDPOINT}/v1/infer" \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d @-
The following image is used as the input in the previous example.

Using the same visualization steps from earlier, the following image shows the bounding boxes overlaid to visualize the detected graphic elements.

Error Handling#
When you use the NVIDIA NIM for Object Detection APIs, you might encounter various errors. Understanding these errors can help you troubleshoot issues in your applications.
Common Error Responses#
| Status Code | Error Type | Description | Resolution |
|---|---|---|---|
| 422 (Unprocessable Entity) | Invalid image URL format | The image URL doesn’t follow the required data URL format | Ensure all URLs follow the pattern data:<image-media-type>;base64,<base64-image-data> |
| 422 (Unprocessable Entity) | Invalid base64 content | The base64-encoded data in the URL is invalid | Verify that your base64 encoding process is correct and that the image data is not corrupted |
| 422 (Unprocessable Entity) | Malformed request | The JSON payload structure is incorrect | Verify that your request format matches the API specification |
| 429 (Too Many Requests) | Request queue full | The number of concurrent requests exceeds the configured queue size | Reduce the request rate, or increase the configured queue size |
| 500 (Internal Server Error) | Server error | An unexpected error occurred during processing | Check server logs for details and report the issue if it persists |
| 503 (Service Unavailable) | Service not ready | The service is still initializing or loading models | Check the health endpoints and wait for the service to complete initialization |
Error Response Example#
{
    "error": "One or more images in the request contain an invalid image URL. Ensure that all URLs are data URLs with an image media type and base64-encoded image data. The pattern for this is 'data:<image-media-type>;base64,<base64-image-data>'."
}
Troubleshooting Tips#
Invalid Image Format: The API supports PNG and JPEG formats. Ensure your images are in one of these formats before encoding.
Image Size Limits: Very large images may cause processing issues. Consider resizing large images before sending them to the API.
Service Health: Use the health check endpoints (/v1/health/live and /v1/health/ready) to verify the service is operational before sending inference requests.
Base64 Encoding: When encoding images, ensure you’re using the correct MIME type in the data URL: data:image/jpeg;base64,... for JPEG, and data:image/png;base64,... for PNG.
Request Timeout: If requests are timing out, the model may be processing a large batch or complex images. Consider adjusting timeout settings in your client application.
Rate Limiting: If you’re receiving 429 errors, implement backoff strategies in your client application to handle rate limiting gracefully.
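A backoff strategy for 429 responses can be sketched as follows. The `post_with_backoff` helper is hypothetical; `send` is any zero-argument callable that returns an object with a `status_code` attribute (for example, a functools.partial around requests.post), which keeps the sketch free of a live endpoint.

```python
import random
import time

def post_with_backoff(send, max_retries=5, base_delay=1.0):
    """Retry a request on HTTP 429 with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        response = send()
        if response.status_code != 429:
            return response
        # Exponential backoff: base_delay, 2x, 4x, ... plus random jitter
        time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    # Give up and return the last 429 response for the caller to handle
    return response
```

Jitter spreads out retries from many clients so they don’t all hit the queue again at the same moment.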
Health Check#
cURL Request
Use the following command to query the health endpoints.
HOSTNAME="localhost"
SERVICE_PORT=8000
curl "http://${HOSTNAME}:${SERVICE_PORT}/v1/health/ready" \
-H 'Accept: application/json'
HOSTNAME="localhost"
SERVICE_PORT=8000
curl "http://${HOSTNAME}:${SERVICE_PORT}/v1/health/live" \
-H 'Accept: application/json'
Response
{
    "ready": true
}
{
    "live": true
}
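A client can poll the readiness endpoint before sending inference requests, for example while the service is still loading models after startup. This is a minimal sketch; the `wait_until_ready` helper name and its timeout defaults are illustrative, not part of the API.

```python
import time
import requests

def wait_until_ready(api_endpoint, timeout=120.0, interval=2.0):
    """Poll /v1/health/ready until the service reports ready or `timeout` expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            response = requests.get(f"{api_endpoint}/v1/health/ready", timeout=5)
            if response.status_code == 200 and response.json().get("ready"):
                return True
        except requests.exceptions.RequestException:
            pass  # service may still be starting; keep polling
        time.sleep(interval)
    return False
```

For example, `wait_until_ready("http://localhost:8000")` blocks until the NIM finishes initialization, then returns True.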
OpenAPI Reference for Page Elements#
The following is the OpenAPI reference for NVIDIA NIM for Page Elements.
OpenAPI Reference for Graphic Elements#
The following is the OpenAPI reference for NVIDIA NIM for Graphic Elements.
OpenAPI Reference for Table Structure#
The following is the OpenAPI reference for NVIDIA NIM for Table Structure.