Overview
Important
NVIDIA NIM is currently in limited availability. Sign up here to be notified when the latest NIMs are available to download.
NVIDIA NIM, part of NVIDIA AI Enterprise, is a set of easy-to-use microservices designed to speed up generative AI deployment in enterprises. Supporting a wide range of AI models, including NVIDIA AI foundation and custom models, it ensures seamless, scalable AI inferencing, on-premises or in the cloud, leveraging industry-standard APIs.
NIMs are containers that provide interactive APIs for running inference on an AI model. In general, NIMs have:
An API layer
A server layer
A runtime layer
A model “engine”
NIMs have two components: the Docker container and the model (weights and biases). The containers are obtained by pulling from the NVIDIA container registry on NGC, while the models may come from NGC or other sources. Some NIMs with small model files ship the model inside the container itself.
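As a sketch of that flow, pulling and starting a NIM generally looks like the following. The image path here is a hypothetical placeholder, and the exact image name, tag, published port, and expected environment variables are listed on each NIM's NGC page:

```shell
# Hypothetical image path -- substitute the one from your NIM's NGC page.
NIM_IMAGE="nvcr.io/nim/nvidia/example-nim:1.0.0"

# Pull the container (requires `docker login nvcr.io`, covered under Requirements).
docker pull "$NIM_IMAGE"

# Run with GPU access. Passing your NGC key lets the NIM download its model from
# NGC at startup when the weights are not shipped inside the container; the exact
# variable name the container expects is documented per NIM.
docker run --rm --runtime=nvidia --gpus all \
  -e NGC_API_KEY=${NGC_CLI_API_KEY} \
  -p 8000:8000 \
  "$NIM_IMAGE"
```

Once the container is up, the API layer is typically reachable on the published port (here 8000); the route and request schema depend on the individual NIM.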
Requirements
The following requirements apply to all NIMs. Requirements specific to individual NIMs are documented on their respective pages.
Hardware and Operating System
Linux with an x86_64/AMD64 processor. ARM processor support is available for select NIMs. See the individual NIM documentation for details.
At least one NVIDIA GPU. NIMs with large models (e.g., LLMs) are optimized with pre-compiled TensorRT engines and therefore have specific GPU model requirements. See the individual documentation for details.
Prerequisite Software
Install Docker
Install the NVIDIA Container Toolkit
Verify that your container runtime supports NVIDIA GPUs by running:

```shell
docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
```
Example output:
```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.78.01    Driver Version: 525.78.01    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| 41%   30C    P8     1W / 260W |   2244MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
```
For more information on enumerating multi-GPU systems, see the NVIDIA Container Toolkit's GPU Enumeration documentation.
NGC (NVIDIA GPU Cloud) Account
Log in to the NVIDIA container registry with your NGC API key:

```shell
docker login nvcr.io --username='$oauthtoken' --password=${NGC_CLI_API_KEY}
```
NGC CLI Tool
Download the NGC CLI tool for your OS.
Important
Use NGC CLI version 3.41.1 or newer. The following command installs version 3.41.3 on AMD64 Linux in your home directory:

```shell
wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/ngc-apps/ngc_cli/versions/3.41.3/files/ngccli_linux.zip -O ~/ngccli_linux.zip && \
unzip ~/ngccli_linux.zip -d ~/ngc && \
chmod u+x ~/ngc/ngc-cli/ngc && \
echo "export PATH=\"\$PATH:~/ngc/ngc-cli\"" >> ~/.bash_profile && source ~/.bash_profile
```
Set up your NGC CLI tool locally (you'll need your API key for this!):

```shell
ngc config set
```
Note
After you enter your API key, you may see multiple options for the org and team. Select as desired or hit enter to accept the default.
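With the steps above complete, a quick way to confirm the prerequisite tools are on your PATH is the loop below. This is a local sanity check only; it does not validate GPU access or your API key:

```shell
# Check that each prerequisite binary from the steps above is installed.
for tool in docker nvidia-smi ngc; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: NOT found -- revisit the matching step above"
  fi
done
```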
Individual NIM Documentation
| NIM | Domain | Required GPUs | Minimum GPU Memory | Model Source | CPU Architecture Support |
|---|---|---|---|---|---|
| | Text to Image | Single H100 or A100 or L40 | 24 GB | StabilityAI | x86 |
| | Text to Image | Single H100 or A100 or L40 | 16 GB | StabilityAI | x86 |