Overview of NVIDIA NIM for Image OCR

NVIDIA NeMo™ Retriever NIM APIs provide easy access to state-of-the-art models designed to be core building blocks for enterprise semantic search applications – delivering accuracy, scalability, and real-time information retrieval. These NIM microservices enable developers to build powerful AI-driven extraction and retrieval pipelines that parse, process and connect multimodal data to generative applications. Built on the NVIDIA software platform, NeMo Retriever NIM microservices leverage NVIDIA® CUDA®, NVIDIA TensorRT™ and NVIDIA Triton™ Inference Server for out-of-the-box GPU acceleration, optimizing performance for large-scale AI workloads.

NeMo Retriever includes NIM microservices for creating advanced, large-scale, multimodal extraction and retrieval pipelines, which are critical for generative AI applications like retrieval-augmented generation (RAG).

Extraction pipelines retrieve documents from external sources beyond the foundational model’s scope. The Image OCR NIM is an optical character recognition (OCR) microservice that extracts text from images. The PaddleOCR NIM microservice is designed to be used in tandem with the NeMo Retriever object detection NIM microservices to extract content from tables, charts, and infographics. With these microservices orchestrated together, downstream retrieval-augmented generation applications can retrieve across text as well as other modalities.

The retrieval pipeline fetches relevant document data and generates responses during inference. The following NeMo Retriever microservices provide superior natural language processing and understanding, boosting retrieval performance.

  • Text Embedding NIM - Boosts text question-answering retrieval performance, providing high-quality embeddings for many downstream NLP tasks.

  • Text Reranking NIM - Enhances the retrieval performance further with a fine-tuned reranking model, finding the most relevant passages to provide as context when querying an LLM.
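
The two-stage flow described above (embed and retrieve candidates, then rerank them before querying an LLM) can be sketched in a few lines. This is only an illustration of the pattern, not the NIM API: `toy_embed` and `toy_rerank_score` are hypothetical stand-ins for the Text Embedding NIM and Text Reranking NIM, implemented here with simple word-overlap math so the example runs without any service.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, passages: list[str], top_k: int) -> list[str]:
    """Stage 1: rank every passage by embedding similarity and keep the top_k."""
    q = toy_embed(query)
    ranked = sorted(passages, key=lambda p: cosine(q, toy_embed(p)), reverse=True)
    return ranked[:top_k]


def toy_rerank_score(query: str, passage: str) -> float:
    """Hypothetical stand-in for a reranking model: share of query terms in the passage."""
    q_terms = set(query.lower().split())
    p_terms = set(passage.lower().split())
    return len(q_terms & p_terms) / len(q_terms) if q_terms else 0.0


def rerank(query: str, candidates: list[str], top_n: int) -> list[str]:
    """Stage 2: rescore only the retrieved candidates and keep the top_n."""
    ranked = sorted(candidates, key=lambda p: toy_rerank_score(query, p), reverse=True)
    return ranked[:top_n]


def build_context(query: str, passages: list[str], top_k: int = 3, top_n: int = 1) -> list[str]:
    """Retrieve a broad candidate set, then rerank to pick the LLM context."""
    return rerank(query, retrieve(query, passages, top_k), top_n)
```

In a real deployment the two stand-in functions would presumably be replaced by calls to the corresponding microservices. The structure stays the same: a fast similarity search over all passages, followed by a more selective rescoring of the short candidate list.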

The following diagram shows how NeMo Retriever NIM microservices are used to create advanced extraction and retrieval pipelines for a question-answering RAG application in enterprise settings.

_images/image2.png