NVIDIA NIM for NV-CLIP (NV-CLIP NIM) brings the power of a state-of-the-art embedding model to enterprise applications, providing unmatched natural language and multimodal understanding capabilities.

NIM makes it easy for IT and DevOps teams to self-host NV-CLIP NIM in their own managed environments while still providing developers with industry-standard APIs that allow them to build powerful copilots, chatbots, and AI assistants that can transform their business. Leveraging NVIDIA’s cutting-edge GPU acceleration and scalable deployment, NIM offers the fastest path to inference with unparalleled performance.

NV-CLIP NIM brings state-of-the-art text and image embedding models to your applications. You can use NV-CLIP NIM for semantic search, Retrieval Augmented Generation (RAG), or any application that uses text and image embeddings. It is built on the NVIDIA software platform, incorporating CUDA, TensorRT, and Triton to offer out-of-the-box GPU acceleration.
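Semantic search over text and image embeddings usually comes down to comparing vectors. The following Python sketch ranks a collection of image embeddings against a text query embedding using cosine similarity. Cosine similarity is a common choice for CLIP-style embeddings but is an assumption here, since the metric is not specified above. The vectors and file names are toy placeholders, not real NV-CLIP output.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], corpus: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (item_id, score) pairs sorted from most to least similar to the query."""
    scored = [(item_id, cosine_similarity(query, emb)) for item_id, emb in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy 3-dimensional vectors; real embeddings have many more dimensions.
    text_query = [0.9, 0.1, 0.0]
    image_index = {
        "cat.jpg": [0.8, 0.2, 0.1],
        "car.jpg": [0.0, 0.1, 0.9],
    }
    for image_id, score in rank_by_similarity(text_query, image_index):
        print(f"{image_id}: {score:.3f}")
```

In a real application the query and corpus vectors would come from the embedding service, and a vector database would typically replace the linear scan for large collections.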

Enterprise-Ready Features

NIM abstracts away model inference internals such as the execution engine and runtime operations. NIM microservices are also the most performant option available, whether the model runs with TensorRT or ONNX. NIM offers the following high-performance features:

- High Performance: NV-CLIP NIM is optimized for high-performance deep learning inference with NVIDIA TensorRT™ and NVIDIA Triton™ Inference Server.
- Scalable Deployment: Deployments are performant and can quickly and seamlessly scale from a few users to millions.
- Flexible Integration: The microservice is easy to incorporate into existing workflows and applications. Developers are provided an OpenAI API-compatible programming model and custom NVIDIA extensions for additional functionality.
- Enterprise-Grade Security: NIM emphasizes security by using safetensors, constantly monitoring and patching CVEs in our stack, and conducting internal penetration tests.
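Because the programming model is OpenAI API-compatible, a client can follow the shape of an OpenAI-style embeddings request. The following standard-library Python sketch builds such a request and parses the response. It assumes the service listens on http://localhost:8000, exposes an OpenAI-style /v1/embeddings route, accepts the model name nvidia/nvclip-vit-h-14, and takes images as base64 data URLs; none of these details are stated above, so verify them against the API reference for your deployment.

```python
import base64
import json
import urllib.request
from typing import Any, Dict, List, Sequence

# Assumptions: host, port, route, and model name below are illustrative defaults.
ENDPOINT = "http://localhost:8000/v1/embeddings"
MODEL = "nvidia/nvclip-vit-h-14"


def image_to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL (assumed input format for images)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_payload(inputs: Sequence[str], model: str = MODEL) -> Dict[str, Any]:
    """Build an OpenAI-style embeddings request body; inputs may mix text and data URLs."""
    return {"model": model, "input": list(inputs), "encoding_format": "float"}


def parse_embeddings(response: Dict[str, Any]) -> List[List[float]]:
    """Extract vectors from an OpenAI-style response, ordered by each item's index."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(inputs: Sequence[str], endpoint: str = ENDPOINT) -> List[List[float]]:
    """POST the inputs to the embeddings endpoint and return one vector per input."""
    body = json.dumps(build_payload(inputs)).encode("utf-8")
    request = urllib.request.Request(
        endpoint, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=30) as resp:
        return parse_embeddings(json.load(resp))
```

For example, embed(["a photo of a cat", image_to_data_url(pathlib.Path("cat.jpg").read_bytes())]) would return one text vector and one image vector, assuming a running service and a local cat.jpg file. Sorting by index keeps the output aligned with the input order even if the server returns items out of order.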

Architecture

NIMs are packaged as container images on a per-model or per-model-family basis. Each NIM is its own Docker container with a model, such as “nvidia/nvclip-vit-h-14”. These containers include a runtime that runs on any NVIDIA GPU with sufficient GPU memory, but some model/GPU combinations are optimized. NIM automatically downloads the model from NGC, using a local filesystem cache if one is available. Because every NIM is built from a common base, once one NIM has been downloaded, additional NIMs can be downloaded quickly.

For the NVIDIA GPUs listed in the Support Matrix, NIM downloads the optimized TensorRT engine and runs inference using the TensorRT library. For all other NVIDIA GPUs, NIM downloads a non-optimized model.

NIMs are distributed as NGC container images through the NVIDIA NGC Catalog. A security scan report is available for each container within the NGC Catalog; it provides a security rating of that image, a breakdown of CVE severity by package, and links to detailed information on CVEs.
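The download behavior described above (an optimized TensorRT engine for listed GPUs, a non-optimized model otherwise, and a local cache checked before downloading) can be illustrated with a small Python sketch. This is a hypothetical model of the described logic, not NIM's actual implementation: the function names, the profile labels, the cache layout, and the download callback are all invented for illustration, and the real list of optimized GPUs lives in the Support Matrix.

```python
from pathlib import Path
from typing import Callable, Iterable

# Hypothetical sketch of the selection and caching behavior; not NIM's actual code.


def choose_profile(gpu_name: str, optimized_gpus: Iterable[str]) -> str:
    """Pick the TensorRT engine for listed GPUs, otherwise the non-optimized model."""
    return "tensorrt" if gpu_name in set(optimized_gpus) else "non-optimized"


def resolve_model(
    gpu_name: str,
    optimized_gpus: Iterable[str],
    cache_dir: Path,
    download: Callable[[str, Path], None],
) -> Path:
    """Return a local path for the model artifact, downloading only on a cache miss."""
    profile = choose_profile(gpu_name, optimized_gpus)
    target = cache_dir / profile
    if not target.exists():
        # Cache miss: create the cache entry and fetch the artifact into it
        # (the real service downloads from NGC; here the caller supplies the function).
        target.mkdir(parents=True)
        download(profile, target)
    return target
```

Passing the download step in as a callback keeps the sketch testable without network access; a second call with the same GPU finds the cached directory and skips the download.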