NVIDIA NIM

Accelerate Your AI Deployment With NVIDIA NIM

Part of NVIDIA AI Enterprise, NVIDIA NIM is a set of easy-to-use microservices for accelerating the deployment of foundation models on any cloud or data center while helping keep your data secure. NIM provides production-grade runtimes, including ongoing security updates. Run your business applications with stable APIs backed by enterprise-grade support.

NVIDIA NIM is designed to bridge the gap between the complex world of AI development and the operational needs of enterprise environments, enabling 10-100X more enterprise application developers to contribute to their companies' AI transformations.
Latest Releases
MolMIM is a transformer-based model developed by NVIDIA for controlled small molecule generation. MolMIM optimizes and samples molecules from the latent space guided by user-defined scoring functions, including functions from other models and functions based on experimental data testing for various chemical and biological properties. MolMIM can be deployed in the cloud or on-prem for enterprise-grade inference in computational drug discovery workflows, including virtual screening, lead optimization, and other lab-in-the-loop approaches.
The NVIDIA DiffDock NIM is built for high-performance molecular docking at enterprise scale. It requires protein and molecule 3D structures as input but does not require any information about a binding pocket. Driven by a generative AI model and accelerated 3D equivariant graph neural networks, DiffDock predicts up to 7.6X more poses per second than the baseline published model, reducing the cost of computational drug discovery workflows, including virtual screening and lead optimization.
Latest Release
NIM for LLMs makes it easy for IT and DevOps teams to self-host large language models (LLMs) in their own managed environments while still providing developers with industry standard APIs that enable them to build powerful copilots, chatbots, and AI assistants that can transform their business.
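NIM for LLMs serves models behind an OpenAI-compatible API. The sketch below builds a chat-completion request against a locally hosted NIM; the port, base URL, and model name are illustrative assumptions, not values from this document.

```python
# Minimal sketch: a chat-completion request to a self-hosted NIM for LLMs.
# Assumes a NIM container serving an OpenAI-compatible API on localhost:8000;
# the model name "meta/llama3-8b-instruct" is illustrative.
import json
import urllib.request

def build_chat_request(model: str, prompt: str,
                       base_url: str = "http://localhost:8000/v1") -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for a self-hosted NIM."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("meta/llama3-8b-instruct",
                         "Summarize NVIDIA NIM in one sentence.")
# To actually call the service (requires a running NIM), uncomment:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

Because the API follows the OpenAI schema, existing client libraries and copilot frameworks can point at the self-hosted endpoint with only a base-URL change.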
Resources
This document explains how to benchmark the deployment of large language models (LLMs), covers popular metrics and parameters, and provides a step-by-step guide.
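Two metrics that commonly appear in LLM deployment benchmarks are time to first token (TTFT) and generation throughput. A minimal sketch, with illustrative timings; a real benchmark would record these timestamps while streaming responses from the deployed model.

```python
# Sketch of two common LLM benchmarking metrics: time to first token (TTFT)
# and generation throughput in tokens per second. Timestamps are illustrative.

def ttft(request_time: float, first_token_time: float) -> float:
    """Latency from sending the request to receiving the first token."""
    return first_token_time - request_time

def throughput(num_generated_tokens: int,
               first_token_time: float,
               last_token_time: float) -> float:
    """Tokens generated per second during the decode phase."""
    return num_generated_tokens / (last_token_time - first_token_time)

# Illustrative timings, in seconds since the request was sent.
latency = ttft(0.0, 0.25)
tok_per_sec = throughput(128, 0.25, 4.25)
```

TTFT dominates perceived responsiveness for interactive chatbots, while throughput matters more for batch workloads, so benchmarks typically report both.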
Previous Releases
Earlier release of NVIDIA NIM for Large Language Models.
Latest Releases
NVIDIA NeMo™ Retriever text embedding NIMs bring the power of state-of-the-art text embedding models to your applications, offering unparalleled natural language processing and understanding capabilities. You can use NeMo Retriever embedding NIMs for semantic search, retrieval-augmented generation (RAG), or any application that uses text embeddings. NeMo Retriever text embedding NIMs are built on the NVIDIA software platform, incorporating NVIDIA® CUDA®, TensorRT™, and Triton™ Inference Server to offer out-of-the-box GPU acceleration.
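Semantic search over embeddings reduces to comparing query and document vectors, typically by cosine similarity. A minimal sketch with toy 3-dimensional vectors; in practice the vectors would come from the embedding NIM, and the tiny dimensions here are illustrative only.

```python
# Sketch of semantic search over embeddings. The toy 3-d vectors below stand
# in for real embeddings, which would be produced by a NeMo Retriever text
# embedding NIM in an actual deployment.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def semantic_search(query_vec, doc_vecs, top_k=2):
    """Return indices of the top_k documents most similar to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:top_k]

query = [0.9, 0.1, 0.0]
docs = [[0.8, 0.2, 0.1],   # similar to the query
        [0.0, 1.0, 0.0],   # dissimilar
        [0.95, 0.05, 0.0]] # most similar
hits = semantic_search(query, docs)
```

For RAG, the same ranking step selects which chunks of proprietary data are passed to the LLM as context.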
NVIDIA NeMo™ Retriever text reranking NIMs reorder citations by how well they match a query. This is a key step in the retrieval process, especially when the retrieval pipeline involves citations from different datastores that each have their own algorithms for measuring similarity.
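The reason reranking helps here is that scores from different datastores are not directly comparable, so a reranker re-scores every (query, passage) pair on one scale. A minimal sketch: the word-overlap scorer below is a toy stand-in for the NIM's reranking model, and all candidate data is illustrative.

```python
# Sketch: reranking candidates that arrive from different datastores.
# Each store scores similarity its own way, so a single scoring function
# re-scores every (query, passage) pair on one comparable scale.
# overlap_score is a toy stand-in for a real reranking model.

def rerank(query, candidates, score_fn, top_k=3):
    """Re-score candidates from any source on a single scale, best first."""
    rescored = [(score_fn(query, c["text"]), c) for c in candidates]
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in rescored[:top_k]]

def overlap_score(query, passage):
    """Toy relevance score: fraction of query words present in the passage."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q)

candidates = [
    {"source": "vector_db", "text": "NIM microservices deploy foundation models"},
    {"source": "keyword_index", "text": "quarterly financial report"},
    {"source": "vector_db", "text": "deploy models on any cloud with NIM"},
]
top = rerank("deploy foundation models", candidates, overlap_score, top_k=2)
```

Only the reranked top results are passed to the LLM, regardless of which datastore each one came from.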
Resources
Enterprises are sitting on a goldmine of data waiting to be used to improve efficiency, save money, and ultimately enable higher productivity. With generative AI, developers can build and deploy an agentic flow or a retrieval-augmented generation (RAG) chatbot, while ensuring the insights provided are based on the most accurate and up-to-date information.
Generative AI applications have little, or sometimes negative, value without accuracy — and accuracy is rooted in data.

To help developers efficiently fetch the best proprietary data to generate knowledgeable responses for their AI applications, NVIDIA today announced four new NVIDIA NeMo Retriever NIM inference microservices.
Employing retrieval-augmented generation (RAG) is an effective strategy for ensuring large language model (LLM) responses are up-to-date and not hallucinated.

While various retrieval strategies can improve the recall of documents for generation, there is no one-size-fits-all approach. The retrieval pipeline depends on your data, from hyperparameters like chunk size and the number of documents returned to retrieval algorithms like semantic search or graph retrieval.
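Chunk size, one of the hyperparameters mentioned above, is often tuned per dataset. A minimal sketch of one common baseline, fixed-size chunking with overlap; the sizes and the synthetic document are illustrative.

```python
# Sketch of fixed-size chunking with overlap, a common baseline for the
# chunk-size hyperparameter in a RAG pipeline. Sizes are illustrative and
# would be tuned against the target corpus.

def chunk_text(text, chunk_size=50, overlap=10):
    """Split text into overlapping word-count chunks for indexing."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        piece = words[start:start + chunk_size]
        if piece:
            chunks.append(" ".join(piece))
        if start + chunk_size >= len(words):
            break
    return chunks

# Synthetic 120-word document for demonstration.
doc = " ".join(f"word{i}" for i in range(120))
chunks = chunk_text(doc, chunk_size=50, overlap=10)
```

Smaller chunks give more precise matches but less context per retrieved passage; the overlap keeps sentences that straddle a boundary retrievable from either side.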
Businesses seeking to harness the power of AI need customized models tailored to their specific industry needs.

NVIDIA AI Foundry is a service that enables enterprises to use data, accelerated computing and software tools to create and deploy custom models that can supercharge their generative AI initiatives.