Overview
Important
NVIDIA NIM is currently in limited availability. Sign up here to be notified when the latest NIMs are available to download.
NVIDIA NIM, part of NVIDIA AI Enterprise, is a set of easy-to-use microservices designed to speed up generative AI deployment in enterprises. Supporting a wide range of AI models, including NVIDIA AI foundation and custom models, it ensures seamless, scalable AI inferencing, on-premises or in the cloud, leveraging industry-standard APIs.
NIMs are containers that provide interactive APIs for running inference on an AI model. In general, NIMs have:
An API layer
A server layer
A runtime layer
A model “engine”
NIMs have two components: the Docker container and the model (weights and biases). The containers are obtained by pulling from the NVIDIA container registry on NGC, while the models may come from NGC or other sources. Some NIMs with small model files ship the model inside the container itself.
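As a sketch of that flow, pulling and starting a NIM generally looks like the following. The image path here is a hypothetical placeholder, and the exact image name, tag, published port, and expected environment variables are listed on each NIM's NGC page:

```shell
# Hypothetical image path -- substitute the one from your NIM's NGC page.
NIM_IMAGE="nvcr.io/nim/nvidia/example-nim:1.0.0"

# Pull the container (requires `docker login nvcr.io`, covered under Requirements).
docker pull "$NIM_IMAGE"

# Run with GPU access. Passing your NGC key lets the NIM download its model from
# NGC at startup when the weights are not shipped inside the container; the exact
# variable name the container expects is documented per NIM.
docker run --rm --runtime=nvidia --gpus all \
  -e NGC_API_KEY=${NGC_CLI_API_KEY} \
  -p 8000:8000 \
  "$NIM_IMAGE"
```

Once the container is up, the API layer is typically reachable on the published port (here 8000); the route and request schema depend on the individual NIM.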
Requirements
The following requirements apply to all NIMs. Requirements specific to individual NIMs are documented on their respective pages.
Hardware and Operating System
Linux with an x86_64/AMD64 processor. ARM processor support is available for select NIMs. See the individual NIM documentation for details.
At least one NVIDIA GPU. NIMs with large models (e.g., LLMs) are optimized with pre-compiled TensorRT engines and therefore have specific GPU model requirements. See the individual documentation for details.
Prerequisite Software
Install Docker
Install the NVIDIA Container Toolkit
Verify that your container runtime supports NVIDIA GPUs by running:

```shell
docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
```
Example output:
```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.78.01    Driver Version: 525.78.01    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| 41%   30C    P8     1W / 260W |   2244MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
```
For more information on enumerating multi-GPU systems, see the NVIDIA Container Toolkit's GPU Enumeration documentation.
NGC (NVIDIA GPU Cloud) Account
Log in to the NVIDIA container registry with your NGC API key:

```shell
docker login nvcr.io --username='$oauthtoken' --password=${NGC_CLI_API_KEY}
```
NGC CLI Tool
Download the NGC CLI tool for your OS.
Important
Use NGC CLI version 3.41.1 or newer. The following command installs version 3.41.3 on AMD64 Linux in your home directory:

```shell
wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/ngc-apps/ngc_cli/versions/3.41.3/files/ngccli_linux.zip -O ~/ngccli_linux.zip && \
unzip ~/ngccli_linux.zip -d ~/ngc && \
chmod u+x ~/ngc/ngc-cli/ngc && \
echo "export PATH=\"\$PATH:~/ngc/ngc-cli\"" >> ~/.bash_profile && source ~/.bash_profile
```
Set up your NGC CLI tool locally (you'll need your API key for this!):

```shell
ngc config set
```
Note
After you enter your API key, you may see multiple options for the org and team. Select as desired or hit enter to accept the default.
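With the steps above complete, a quick way to confirm the prerequisite tools are on your PATH is the loop below. This is a local sanity check only; it does not validate GPU access or your API key:

```shell
# Check that each prerequisite binary from the steps above is installed.
for tool in docker nvidia-smi ngc; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: NOT found -- revisit the matching step above"
  fi
done
```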
Individual NIM Documentation
| NIM | Domain | Required GPUs | Minimum GPU Memory | Model Source | CPU Architecture Support |
|---|---|---|---|---|---|
| | Text to Image | Single H100 or A100 or L40 | 24 GB | StabilityAI | x86 |
| | Text to Image | Single H100 or A100 or L40 | 16 GB | StabilityAI | x86 |