Introduction#

NVIDIA NIM for Vision Language Models (NVIDIA NIM for VLMs) brings the power of state-of-the-art vision language models (VLMs) to enterprise applications, providing unmatched natural language and multimodal understanding capabilities.

NIM makes it easy for IT and DevOps teams to self-host VLMs in their own managed environments while still providing developers with industry-standard APIs for building powerful copilots, chatbots, and AI assistants that can transform their business. Leveraging NVIDIA's cutting-edge GPU acceleration and scalable deployment, NIM offers the fastest path to inference with unparalleled performance.

Visit the Support Matrix for all models supported by NVIDIA NIM for VLMs.

To discover other NIMs and APIs, visit the API catalog.

High-Performance Features#

NIM abstracts away model inference internals such as the execution engine and runtime operations, and delivers the most performant option available, whether backed by TRT-LLM, vLLM, or another engine. NIM offers the following high-performance features:

Scalable Deployment that is performant and can scale quickly and seamlessly from a few users to millions.

Advanced Vision Language Model support with pre-generated optimized engines for a diverse range of cutting-edge VLM architectures.

Flexible Integration to easily incorporate the microservice into existing workflows and applications. Developers are provided an OpenAI API-compatible programming model and custom NVIDIA extensions for additional functionality, as shown in the sketch after this list.

Enterprise-Grade Security through the use of safetensors, continuous monitoring and patching of CVEs in our stack, and internal penetration testing.
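Because the microservice exposes an OpenAI-compatible endpoint, existing OpenAI client code can target a self-hosted NIM by changing only the base URL. The following is a minimal sketch using the official openai Python client; the localhost port, placeholder API key, and model id are illustrative assumptions, not values prescribed by this document.

```python
from openai import OpenAI

# Point the standard OpenAI client at a self-hosted NIM endpoint.
# Base URL, port, and model id below are assumptions; substitute the
# values for your deployed NIM.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used-locally")

response = client.chat.completions.create(
    model="nvidia/example-vlm",  # hypothetical model id
    messages=[{"role": "user", "content": "Summarize what a VLM can do."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```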

Applications#

Image Q&A: Empower bots with visual understanding in addition to human-like language understanding and responsiveness

Image summarization: Generate summaries based on image understanding

Image description: Empower bots to describe the content of an image and engage in multi-turn conversations

Charts and diagram understanding: Generate descriptions of charts, tables, and diagrams present in an image

And many more… The potential applications of NIM are vast, spanning industries and use cases.
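To make the image Q&A use case concrete, here is a hedged sketch of a multimodal request through the OpenAI-compatible chat completions API, mixing text and image content parts in a single user message. The endpoint, model id, and image URL are placeholders for illustration.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used-locally")

# Ask a question about an image by combining a text part and an
# image_url part in one user message (standard OpenAI vision format).
response = client.chat.completions.create(
    model="nvidia/example-vlm",  # hypothetical model id
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What does this chart show?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/chart.png"}},  # placeholder
        ],
    }],
    max_tokens=256,
)
print(response.choices[0].message.content)
```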

Architecture#

NIMs are packaged as container images on a per-model or per-model-family basis. Each NIM is its own Docker container that bundles a model. These containers include a runtime that runs on any NVIDIA GPU with sufficient GPU memory, though only some model/GPU combinations are optimized. NIM automatically downloads the model from NGC, leveraging a local filesystem cache if one is available. Each NIM is built from a common base, so once one NIM has been downloaded, additional NIMs can be downloaded quickly.
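As an illustration of this packaging, the sketch below launches a NIM container with the Docker SDK for Python and mounts a host directory as the model cache so downloaded weights persist across restarts. The image name, cache paths, API-key handling, and port are assumptions for illustration only; consult the NGC catalog page for your specific NIM for the exact values.

```python
import docker

client = docker.from_env()

# Minimal sketch: run a NIM container with GPU access, an NGC key for
# pulling model weights, a persistent local model cache, and a mapped port.
container = client.containers.run(
    "nvcr.io/nim/example-vlm:latest",            # hypothetical image name
    detach=True,
    environment={"NGC_API_KEY": "<your-key>"},   # used to download the model from NGC
    volumes={"/opt/nim-cache": {"bind": "/opt/nim/.cache", "mode": "rw"}},  # assumed cache path
    ports={"8000/tcp": 8000},
    device_requests=[docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])],
)
print(container.logs(tail=10))
```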

When a NIM is first deployed, it inspects the local hardware configuration and the model versions available in the model registry, then automatically chooses the best version of the model for that hardware. For a subset of NVIDIA GPUs (see the Support Matrix), NIM downloads optimized TRT engines and runs inference using the TRT-LLM library. For all other NVIDIA GPUs, NIM downloads a non-optimized model and runs it using the vLLM library.
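The following is a conceptual sketch of that selection behavior, not NIM's actual internal implementation; the GPU names and the shape of the Support Matrix lookup are assumptions made for illustration.

```python
def select_engine(gpu_model: str, optimized_gpus: set[str]) -> str:
    """Conceptual illustration only: mirrors the behavior described above,
    where supported GPUs get pre-built TRT engines and everything else
    falls back to vLLM."""
    if gpu_model in optimized_gpus:
        return "trt-llm"  # download the optimized TRT engine
    return "vllm"         # download a non-optimized model, run with vLLM

# Hypothetical GPU names standing in for the Support Matrix entries.
print(select_engine("H100", {"H100", "A100"}))  # -> trt-llm
print(select_engine("T4", {"H100", "A100"}))    # -> vllm
```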

NIMs are distributed as NGC container images through the NVIDIA NGC Catalog. A security scan report is available for each container in the NGC catalog, which provides a security rating of that image, a breakdown of CVE severity by package, and links to detailed information on the CVEs.

The NVIDIA Developer Program#

Want to learn more about NIMs? Join the NVIDIA Developer Program for free access to self-hosting NVIDIA NIMs and microservices on up to 16 GPUs on any infrastructure: cloud, data center, or personal workstation.

Once you join the free NVIDIA Developer Program, you can access NIMs through the NVIDIA API Catalog at any time. For enterprise-grade security, support, and API stability, select the option to access NIM through our free 90-day NVIDIA AI Enterprise Trial with a business email address.

See the NVIDIA NIM FAQ for additional information.