Date: 2026-03-07

Overview of NVIDIA NeMo Retriever Reranking NIM

The NVIDIA NeMo Retriever Reranking API provides easy access to state-of-the-art models that are foundational building blocks for enterprise semantic search applications, delivering accurate answers quickly at scale. Developers can use these APIs to create robust copilots, chatbots, and AI assistants from start to finish. NeMo Retriever Reranking models are built on the NVIDIA software platform, incorporating CUDA, TensorRT, and Triton to offer out-of-the-box GPU acceleration.

The following NeMo Retriever microservices provide superior natural language processing and understanding, boosting retrieval performance:

NeMo Retriever Embedding NIM - Boosts question-answering retrieval performance, providing high-quality embeddings for many downstream NLP tasks.

NeMo Retriever Reranking NIM - Enhances the retrieval performance further with a fine-tuned reranker, finding the most relevant passages to provide as context when querying an LLM.
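The reranking step described above can be sketched in Python. This is a minimal illustration, not the definitive client: the endpoint URL, the model name, and the request and response field names (`query`, `passages`, `rankings`, `index`, `logit`) are assumptions made for this example and should be checked against the API reference for your deployment. The helper names `build_rerank_payload`, `top_passages`, and `rerank` are hypothetical.

```python
import json
import urllib.request

# Assumed values for illustration only; verify against your deployment.
RERANK_URL = "http://localhost:8000/v1/ranking"  # hypothetical local endpoint
MODEL_NAME = "example-reranker-model"            # placeholder model name


def build_rerank_payload(query, passages, model=MODEL_NAME):
    """Build a request body pairing one query with its candidate passages."""
    return {
        "model": model,
        "query": {"text": query},
        "passages": [{"text": p} for p in passages],
    }


def top_passages(passages, rankings, k=3):
    """Return the k highest-scoring passages, best first.

    `rankings` is assumed to be a list of {"index": int, "logit": float}
    entries, where `index` is a position in `passages`.
    """
    ordered = sorted(rankings, key=lambda r: r["logit"], reverse=True)
    return [passages[r["index"]] for r in ordered[:k]]


def rerank(query, passages, k=3, url=RERANK_URL):
    """Send candidates to the reranking service and keep the top k as LLM context."""
    body = json.dumps(build_rerank_payload(query, passages)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        rankings = json.load(response)["rankings"]
    return top_passages(passages, rankings, k)
```

In a RAG pipeline, the candidates would typically come from an embedding-based retrieval step, and the passages returned by `rerank` would be placed in the LLM prompt as context. Keeping `top_passages` separate from the network call makes the selection logic easy to check without a running service.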

The following diagram shows how the NeMo Retriever Reranking API can help a RAG-based application find the enterprise data most relevant to a user's question.