Overview - NVIDIA Docs

Enterprises worldwide know they need to invest in AI to secure their future, but many still struggle with finding the strategy and platform that can enable success. NVIDIA NIM is helping the technology industry put generative AI in reach for every application, such as copilots, code assistants and digital human avatars.

What are NVIDIA NIM Agent Blueprints

NVIDIA NIM Agent Blueprints are pre-trained AI models and microservices that simplify the development of AI applications, providing a foundation for real world use cases. These customizable AI workflow examples equip enterprise developers with NIM microservices, reference code, documentation, and a Helm chart for deployment.

How to Access

With a Blueprint, developers can experience a reference AI application in the NVIDIA API catalog, replicate this experience in their preferred deployment environment using Helm charts and code from our NIM Blueprint GitHub repository, and customize models to their specific use cases and domain knowledge. Using AI workflow examples as a starting point, enterprises can reduce development time and costs, improve the accuracy and performance of your application, and empower you to move it from pilot to production with confidence.

Since workflows leverage multiple NVIDIA NIM microservices, NVIDIA AI Enterprise entitlement or NVIDIA Developer Program membership is required. Downloading blueprints locally gives developers ownership of their customizations, infrastructure choices, and full control of their IP and AI applications.

Deployment Scenarios

Enterprise Developers require a development lab environment where they can try out and create new applications with familiar industry APIs. LLM Application developers want flexibility and independence, while IT Administrators want to manage easy and consistent deployments. NVIDIA NIM provides the best of both worlds: easy and flexible for developers, and fast and reproducible for IT. The following sections describe how enterprises will typically set up lab environments with NIM.

Note

NVIDIA-Certified Systems is a hardware certification program that includes systems that have been proven to deliver predictable performance and enable enterprises to quickly deploy optimized platforms for AI, Data Analytics, HPC, high-density VDI, and other accelerated workloads. A full list of NVIDIA-Certified Systems can be found here.

Enterprise Developer Lab Architectures

Typically, developers will start experimenting for free using NIM API endpoints from the NVIDIA API Catalog before deploying a model locally. This allows developers to quickly bootstrap development applications but is not typically a preferred permanent solution.

There are three primary architectures common in Enterprise developer lab environments.

Local Workstation
Shared Central Compute Resources
Dedicated Developer Server

Important

Please refer to the individual blueprint prerequisites section for hardware and GPU requirements. Some blueprints require larger compute resources which may dictate the developer lab architecture.

These methods are not mutually exclusive and will generally be blended. For example, if all developers have their own workstations, few may still need occasional access to larger models for testing. These models can be deployed on a larger remote compute resource using NVIDIA NIM therefore the API endpoints are self-hosted.

Local Workstation

Developers can utilize their own RTX workstation, which is sufficient for working with smaller LLM models. However these models may exhibit reduced reasoning abilities and retrieval precision, yet benefit from reduced response times since they are locally stored. Such an arrangement is advantageous in the early development stages before scaling to remote compute resources typically provided by enterprise IT departments.

In this scenario, every developer has a known amount of resources on their own workstation, but they sacrifice device portability.

Shared Central Compute Resources

In this scenario, Enterprises have the opportunity to adopt MLOps strategies to deploy generative AI successfully. Utilizing a Kubernetes cluster with KServe, they can use LLM NIM API endpoints which are shared by developers. Deploying NIMs locally ensures complete control over IP and AI applications. A self-hosted NIM setup is often ideal for lab environments as it reflects production settings, leading to greater predictability, portability of development setups, and efficient resource use. However, this complexity leads to increased management demands, which are best addressed through MLOps practices.

Dedicated Developer Server

At times, Developers require robust data center capabilities, specificity of dedicated hardware, and connect to this resource from their own device. This need is addressed by providing each developer with their own dedicated server, either physical or virtual. This approach combines the advantages of self-hosted LLMs, where developers can self-host models using NVIDIA NIM and retain control over their modifications. However, this system might not fully optimize GPU usage since developers may not always be active on their dedicated servers simultaneously.

Note

Though not detailed in AI workflows, NVIDIA Base Command Manager offers a solution for efficiently managing these resources.

Note

If you currently do not have GPU-accelerated hardware for testing and prototyping, please check out NVIDIA Launchpad to gain access to ready-to-use infrastructure, which can be available to you for up to two weeks.