Prerequisites#
Before deploying a NIM LLM container, ensure your environment meets the following requirements:
Hardware Requirements#
Minimum required specifications for supported hardware components.
| Requirement | Specification |
|---|---|
| CPU | AMD64, ARM64 |
| GPU | NVIDIA GPU; see Model GPU Memory Requirements below |
Model GPU Memory Requirements#
Different models require different minimum amounts of GPU memory. For example, Llama 3.1 8B Instruct requires a minimum of 24GB of GPU memory.
Ensure your chosen hardware configuration meets the minimum GPU memory requirement for the model profile you wish to run. For more information, refer to Model Profiles and Support Matrix.
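As a quick sanity check, you can compare a GPU's total memory against a model's minimum. The helper below is a sketch; on a real system the MiB value would come from `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits`, and the 24GB threshold is the Llama 3.1 8B Instruct example above.

```shell
# Sketch: check whether a GPU's total memory (in MiB, as nvidia-smi
# reports it) meets a model's minimum requirement (in GB).
meets_gpu_minimum() {
  local total_mib=$1 min_gb=$2
  # a 1 GB requirement is treated as 1024 MiB here
  [ "$total_mib" -ge $(( min_gb * 1024 )) ]
}

# On a real system:
#   total=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits)
# Example with the 80GB value from the sample nvidia-smi output later on this page:
if meets_gpu_minimum 81559 24; then
  echo "meets the 24GB minimum for Llama 3.1 8B Instruct"
fi
```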
Software Requirements#
Minimum required versions for supported software components.
| Requirement | Specification |
|---|---|
| Operating System | Ubuntu 22.04 LTS or later recommended |
| Container Toolkit | 1.14.0 or later |
| CUDA SDK | 12.9 or later |
| GPU Driver | 580 or later |
| Docker | 24.0 or later |
Operating System#
We recommend Ubuntu 22.04 LTS or later for the best experience.
Other Linux distributions may be compatible with NIM LLM, but they have not been officially validated.
CUDA SDK#
Install CUDA SDK by following the CUDA installation guide for Linux.
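After installation, `nvcc --version` reports the toolkit release. The small filter below is a sketch for extracting that release number so it can be checked against the 12.9 minimum; the sample line it parses is illustrative, matching nvcc's usual output format.

```shell
# Sketch: pull the "release X.Y" number out of `nvcc --version` output.
cuda_release() {
  grep -o 'release [0-9][0-9.]*' | head -n1 | cut -d' ' -f2
}

# On a real system:  nvcc --version | cuda_release
# Illustrative sample line:
echo "Cuda compilation tools, release 12.9, V12.9.41" | cuda_release   # prints 12.9
```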
GPU Drivers#
Install the NVIDIA GPU drivers by following the NVIDIA Driver Installation Guide.
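A quick way to confirm the driver meets the 580 minimum is to compare the major component of the version string. The version literal below is a stand-in; on a real system it would come from `nvidia-smi --query-gpu=driver_version --format=csv,noheader`.

```shell
# Sketch: compare the driver's major version against the documented minimum.
version="580.95.05"   # stand-in; query the real value with nvidia-smi
major=${version%%.*}  # strip everything after the first dot

if [ "$major" -ge 580 ]; then
  echo "driver $version meets the 580-or-later requirement"
else
  echo "driver $version is too old; install 580 or later" >&2
fi
```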
Docker#
Docker is required to run the containerized NIM services.
Install Docker Engine for your Linux distribution by following the Docker Engine installation guide.
Verify that the Docker daemon is running and that your user can execute `docker` commands without `sudo`. Add your user to the `docker` group if needed:

sudo groupadd docker
sudo usermod -aG docker $USER
Log out and back in for the group change to take effect.
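Once you have logged back in, membership can be confirmed without touching the Docker daemon; the check below only inspects the current user's group list.

```shell
# Check whether the current user's groups include "docker".
if id -nG | grep -qw docker; then
  echo "user is in the docker group"
else
  echo "user is not in the docker group yet (log out and back in, or run: newgrp docker)"
fi
```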
Container Toolkit#
The NVIDIA Container Toolkit enables Docker containers to access the host GPU.
Install the toolkit by following the NVIDIA Container Toolkit installation guide.
Configure Docker to use the NVIDIA runtime by following the Docker configuration steps.
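The toolkit ships an `nvidia-ctk` helper that performs this configuration step; the command below follows the toolkit's documented usage and updates `/etc/docker/daemon.json`.

```shell
# Register the NVIDIA runtime with Docker (updates /etc/docker/daemon.json)
sudo nvidia-ctk runtime configure --runtime=docker
```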
Restart the Docker daemon after configuration:
sudo systemctl restart docker
NIM Container Access#
To download and deploy NIM containers, you need one of the following:
A free NVIDIA Developer Program membership.
An NVIDIA AI Enterprise license. To request a free 90-day evaluation license, refer to Ways to Get Started With NVIDIA AI Enterprise and Activate Your NVIDIA AI Enterprise License.
Generate Access Credentials#
NGC Personal API Key
An NGC Personal API Key is required to access NVIDIA NIM containers and models hosted on NGC.
Generate the Personal API Key on the Setup API Keys page.
When creating the Personal API key, select at least NGC Catalog from the Services Included list. You can also include additional services if you want to use the same key for other purposes.
Warning
Legacy API keys are not supported by NIM LLM. Always use a Personal API Key.
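With the key in hand, authenticate Docker against the NGC registry. In this sketch, the environment-variable name `NGC_API_KEY` is a convention of the example; the username for `nvcr.io` is the literal string `$oauthtoken`, not your account name.

```shell
# Placeholder key; Personal API Keys typically begin with "nvapi-"
export NGC_API_KEY="nvapi-REPLACE_WITH_YOUR_KEY"

# Quick format sanity check before logging in
case "$NGC_API_KEY" in
  nvapi-*) echo "key format looks like a Personal API Key" ;;
  *)       echo "warning: expected the key to start with nvapi-" >&2 ;;
esac

# Log in to nvcr.io; the username is the literal string $oauthtoken
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
```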
Hugging Face Access Token
The required credentials depend on your model source.
Complete the steps in the Model-Specific NIM tab to generate your NGC Personal API key. You will need one to pull the NIM container itself.
To download models, create a Hugging Face access token with at least read permission; broader scopes are needed only if your organization or private repositories require them. This token allows a model-free NIM to fetch models directly from Hugging Face.
Note
If you want to serve a pre-downloaded local model or a private cloud model instead of downloading one from Hugging Face, you do not need a Hugging Face access token. Refer to Model Downloads for your workflow.
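When the model does come from Hugging Face, the token is usually supplied to the container as an environment variable. `HF_TOKEN` is the common convention, but confirm the exact variable name in your NIM's documentation; the docker flag shown in the comment is illustrative only.

```shell
# Placeholder token; real Hugging Face access tokens begin with "hf_"
export HF_TOKEN="hf_REPLACE_WITH_YOUR_TOKEN"

# Illustrative only -- pass the variable through to the container, e.g.:
#   docker run -e HF_TOKEN <nim-image>
[ -n "$HF_TOKEN" ] && echo "HF_TOKEN is set"
```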
Verify NVIDIA Runtime Access#
To ensure that your setup is correct, run the following command:
docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
This command should produce output similar to the following, where you can confirm the CUDA driver version and the available GPUs.
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05 Driver Version: 580.95.05 CUDA Version: 12.9 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA H100 80GB HBM3 On | 00000000:1B:00.0 Off | 0 |
| N/A 36C P0 112W / 700W | 78489MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+