Getting Started#
Prerequisites#
Setup#
NVIDIA AI Enterprise License: NVIDIA NIM for VLMs is available for self-hosting under the NVIDIA AI Enterprise (NVAIE) License.
NVIDIA GPU(s): NVIDIA NIM for VLMs (NIM for VLMs) runs on any NVIDIA GPU with sufficient GPU memory, but some model/GPU combinations are optimized. Homogeneous multi-GPU systems with tensor parallelism enabled are also supported. See the Support Matrix for more information.
CPU: x86_64 architecture only for this release
OS: any Linux distribution with glibc >= 2.35 (see the output of ld -v)
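To quickly confirm the glibc version on the host, one option is:
# Print the glibc version (should report 2.35 or newer)
ldd --version | head -n 1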
CUDA Drivers: Follow the installation guide.
We recommend:
Using a network repository as part of a package manager installation and skipping the CUDA toolkit installation, as the libraries are available within the NIM container
Installing the open kernel modules for a specific driver version (see the table below; an illustrative install command follows it)
| Major Version | EOL | Data Center & RTX/Quadro GPUs | GeForce GPUs |
|---|---|---|---|
| > 550 | TBD | X | X |
| 550 | Feb 2025 | X | X |
| 545 | Oct 2023 | X | X |
| 535 | June 2026 | X | |
| 525 | Nov 2023 | X | |
| 470 | Sept 2024 | X | |
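As an illustration only (the package name below is an assumption and varies by distribution and driver branch), a driver-only installation of the open kernel modules on an Ubuntu host that already has the CUDA network repository configured might look like this:
# Illustrative sketch: assumes Ubuntu with the CUDA network repository already configured;
# the exact package name for your distribution and driver branch may differ
sudo apt-get update
sudo apt-get install -y nvidia-open-550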
Install Docker.
Install the NVIDIA Container Toolkit.
After installing the toolkit, follow the instructions in the Configure Docker section in the NVIDIA Container Toolkit documentation.
To ensure that your setup is correct, run the following command:
docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
This command should produce output similar to the following, allowing you to confirm the CUDA driver version and available GPUs.
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14 Driver Version: 550.54.14 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA H100 80GB HBM3 On | 00000000:1B:00.0 Off | 0 |
| N/A 36C P0 112W / 700W | 78489MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Installing WSL2 for Windows#
Certain downloadable NIMs can be used on an RTX Windows system with Windows Subsystem for Linux (WSL). To enable WSL2, perform the following steps.
Be sure your computer can run WSL2 as described in the Prerequisites section of the WSL2 documentation.
Enable WSL2 on your Windows computer by following the steps in Install WSL command. By default, these steps install the Ubuntu distribution of Linux. For alternative installations, see Change the default Linux distribution installed.
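For reference, the default WSL2 installation (which installs the Ubuntu distribution) can typically be started from an elevated PowerShell or Command Prompt with:
wsl --install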
Launch NVIDIA NIM for VLMs#
You can download and run the NIM of your choice from either the API catalog or NGC.
From NGC#
Generate an API key#
An NGC API key is required to access NGC resources. The key can be generated here: https://org.ngc.nvidia.com/setup/api-keys.
When creating an NGC API key, ensure that at least NGC Catalog is selected from the Services Included dropdown. If this key is to be reused for other purposes, more services can be included.

Export the API key#
Pass the value of the API key to the docker run command in the next section as the NGC_API_KEY environment variable to download the appropriate models and resources when starting the NIM.
If you are not familiar with how to create the NGC_API_KEY environment variable, the simplest way is to export it in your terminal:
export NGC_API_KEY=<value>
Run one of the following commands to make the key available at startup:
# If using bash
echo "export NGC_API_KEY=<value>" >> ~/.bashrc
# If using zsh
echo "export NGC_API_KEY=<value>" >> ~/.zshrc
Other more secure options include saving the value in a file, which you can retrieve with cat $NGC_API_KEY_FILE, or using a password manager.
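As an illustrative sketch of the file-based approach (the file path and the NGC_API_KEY_FILE variable name are placeholders, not part of NIM itself), you could keep the key in a file readable only by your user and export it from there:
# Illustrative only: store the key in a user-only file and export it from that file
echo "<value>" > ~/.ngc_api_key
chmod 600 ~/.ngc_api_key
export NGC_API_KEY_FILE=~/.ngc_api_key
export NGC_API_KEY="$(cat "$NGC_API_KEY_FILE")"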
Docker Login to NGC#
To pull the NIM container image from NGC, first authenticate with the NVIDIA Container Registry using the following command:
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
Use $oauthtoken as the username and NGC_API_KEY as the password. The $oauthtoken username is a special name that indicates you will authenticate with an API key, not a username and password.
List Available NIMs#
This documentation uses the NGC CLI tool in several examples. For information on downloading and configuring the tool, see the NGC CLI documentation.
Use the following command to list the available NIMs in CSV format.
ngc registry image list --format_type csv 'nvcr.io/nim/*'
This command should produce output in the following format:
Name,Repository,Latest Tag,Image Size,Updated Date,Permission,Signed Tag?,Access Type,Associated Products
<model-name1>,<repository1>,<latest-tag1>,<image size1>,<updated date1>,<permission1>,<signed tag?1>,<access type1>,<associated products1>
...
<model-nameN>,<repositoryN>,<latest-tagN>,<image sizeN>,<updated dateN>,<permissionN>,<signed tag?N>,<access typeN>,<associated productsN>
Use the Repository and Latest Tag fields when you call the docker run command, as shown in the following section.
Note
The following sections outline how to launch and query a NIM for any model. To see model-specific examples, see Querying the API.
Launch NIM#
The following command launches a Docker container for a specific model. To launch a container for a different NIM, replace the values of Repository and Latest_Tag with values from the previous image list command, and change the value of CONTAINER_NAME to something appropriate.
# Choose a container name for bookkeeping
export CONTAINER_NAME=<container-name>
# The repository and tag from the previous ngc registry image list command
Repository=<repository>
Latest_Tag=<latest-tag>
# Choose a VLM NIM Image from NGC
export IMG_NAME="nvcr.io/${Repository}:${Latest_Tag}"
# Choose a path on your system to cache the downloaded models
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
# Start the VLM NIM
docker run -it --rm --name=$CONTAINER_NAME \
--runtime=nvidia \
--gpus all \
--shm-size=16GB \
-e NGC_API_KEY=$NGC_API_KEY \
-v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
-u $(id -u) \
-p 8000:8000 \
$IMG_NAME
Docker Run Parameters#
| Flags | Description |
|---|---|
| -it | Run the container interactively with a pseudo-terminal attached (--interactive + --tty; see Docker docs). |
| --rm | Delete the container after it stops (see Docker docs). |
| --name=$CONTAINER_NAME | Give a name to the NIM container for bookkeeping. Use any preferred value. |
| --runtime=nvidia | Ensure NVIDIA drivers are accessible in the container. |
| --gpus all | Expose all NVIDIA GPUs inside the container. See the configuration page for mounting specific GPUs. |
| --shm-size=16GB | Allocate host memory for multi-GPU communication. Not required for single-GPU models or GPUs with NVLink enabled. |
| -e NGC_API_KEY=$NGC_API_KEY | Provide the container with the token necessary to download adequate models and resources from NGC. See Export the API key. |
| -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" | Mount a cache directory from your system ($LOCAL_NIM_CACHE) into the container (/opt/nim/.cache) so downloaded models are reused across runs. |
| -u $(id -u) | Use the same user as your system user inside the NIM container to avoid permission mismatches when downloading models in your local cache directory. |
| -p 8000:8000 | Forward the port where the NIM server is published inside the container so it can be accessed from the host system. The left-hand side of the colon is the host port; the right-hand side is the container port (8000). |
| $IMG_NAME | Name and version of the VLM NIM container from NGC. The VLM NIM server automatically starts if no argument is provided after this. |
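For example, to reach the NIM on host port 8080 while the server inside the container keeps listening on port 8000, you could repeat the launch command with only the port mapping changed (illustrative; all other flags stay the same):
# Illustrative: host port 8080 on the left, container port 8000 on the right
docker run -it --rm --name=$CONTAINER_NAME \
  --runtime=nvidia \
  --gpus all \
  --shm-size=16GB \
  -e NGC_API_KEY=$NGC_API_KEY \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -u $(id -u) \
  -p 8080:8000 \
  $IMG_NAME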
Note
See the Configuring a NIM topic for information about additional configuration settings.
Note
If you have an issue with permission mismatches when downloading models in your local cache directory, add the -u $(id -u) option to the docker run call.
Note
NIM automatically selects the most suitable profile based on your system specifications. For details, see Automatic Profile Selection.
Run Inference#
During startup, the NIM container downloads the required resources and serves the model behind an API endpoint. The following message indicates a successful startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
Once you see this message, you can validate the deployment of NIM by executing an inference request. In a new terminal, run the following command to show a list of models available for inference:
curl -X GET 'http://0.0.0.0:8000/v1/models'
Tip
Pipe the results of curl commands into a tool like jq or python -m json.tool to make the output of the API easier to read. For example: curl -s http://0.0.0.0:8000/v1/models | jq
This command should produce output similar to the following:
{
"object": "list",
"data": [
{
"id": "<model-name>",
"object": "model",
"created": 1724796510,
"owned_by": "system",
"root": "<model-name>",
"parent": null,
"max_model_len": 131072,
"permission": [
{
"id": "modelperm-c2e069f426cc43088eb408f388578289",
"object": "model_permission",
"created": 1724796510,
"allow_create_engine": false,
"allow_sampling": true,
"allow_logprobs": true,
"allow_search_indices": false,
"allow_view": true,
"allow_fine_tuning": false,
"organization": "*",
"group": null,
"is_blocking": false
}
]
}
]
}
To check the readiness of the service, run the following command, which returns a 200 status code when the server is ready to accept requests:
curl -X GET 'http://0.0.0.0:8000/v1/health/ready'
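If you script the deployment, a minimal sketch for waiting on the readiness endpoint (assuming curl is available on the host) looks like this:
# Poll the readiness endpoint until the NIM returns HTTP 200
until curl -sf http://0.0.0.0:8000/v1/health/ready > /dev/null; do
  echo "Waiting for NIM to become ready..."
  sleep 10
done
echo "NIM is ready to accept requests."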
Querying the API#
Different models support different features and APIs. For more information, see the following:
Llama 3.2 Vision: Overview, API examples
nemoretriever-parse: Overview, API examples
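VLM NIMs expose an OpenAI-compatible chat completions endpoint. As a generic, hedged sketch (the model name, prompt, and image URL are placeholders, and the exact content schema supported depends on the model you deploy), a request might look like the following:
curl -X POST 'http://0.0.0.0:8000/v1/chat/completions' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "<model-name>",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image."},
          {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
        ]
      }
    ],
    "max_tokens": 128
  }'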
Stopping the Container#
If a Docker container is launched with the --name command line option, you can stop the running container using the following command.
# In the previous sections, the environment variable CONTAINER_NAME was
# defined using `export CONTAINER_NAME=<container-name>`
docker stop $CONTAINER_NAME
Use docker kill if docker stop is not responsive, and follow it with docker rm $CONTAINER_NAME if you do not intend to restart the container as-is (using docker start $CONTAINER_NAME). If you remove the container, you will need to re-use the docker run instructions from the top of this section to start a new container for your NIM.
If you did not start a container with --name, look at the output of docker ps to get a container ID for the image you used.
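For example, one way to find and stop such a container (assuming $IMG_NAME is still set in your shell) is:
# List containers started from the NIM image, then stop the one you want by ID
docker ps --filter "ancestor=$IMG_NAME" --format "{{.ID}}  {{.Names}}  {{.Status}}"
docker stop <container-id>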
Serving models from local assets#
NIM for VLMs provides utilities that let you download models either to a local directory as a model repository or into the NIM cache. See the Utilities section for details.
Use the previous commands to launch a NIM container. From there, you can view and download models locally.
Use the list-model-profiles command to list the available profiles.
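For example, assuming the container launched earlier is still running under $CONTAINER_NAME, one way to invoke the utility is:
# Run the profile listing utility inside the running NIM container
docker exec -it $CONTAINER_NAME list-model-profiles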
You can download any of the profiles to the NIM cache using the download-to-cache command. For example:
download-to-cache --profile 09e2f8e68f78ce94bf79d15b40a21333cea5d09dbe01ede63f6c957f4fcfab7b
You can also let download-to-cache select the most suitable profile for your hardware by providing no profile, as shown in the following example.
download-to-cache
Air Gap Deployment (offline cache route)#
NIM supports serving models in an Air Gap system (also known as air wall, air-gapping or disconnected network).
If NIM detects a previously loaded profile in the cache, it serves that profile from the cache.
After downloading the profiles to the cache using download-to-cache, the cache can be transferred to an air-gapped system to run a NIM without any internet connection and with no connection to the NGC registry.
To see this in action, do NOT provide the NGC_API_KEY, as shown in the following example.
export CONTAINER_NAME=<container-name>
export IMG_NAME=<image-name>
# Create an example air-gapped directory where the downloaded NIM will be deployed
export AIR_GAP_NIM_CACHE=~/.cache/air-gap-nim-cache
mkdir -p "$AIR_GAP_NIM_CACHE"
# Transport the downloaded NIM to an air-gapped directory
cp -r "$LOCAL_NIM_CACHE"/* "$AIR_GAP_NIM_CACHE"
# Assuming the command run prior was `download-to-cache`, downloading the optimal profile
docker run -it --rm --name=$CONTAINER_NAME \
--runtime=nvidia \
--gpus all \
--shm-size=16GB \
-v "$AIR_GAP_NIM_CACHE:/opt/nim/.cache" \
-u $(id -u) \
-p 8000:8000 \
$IMG_NAME
# Assuming the command run prior was `download-to-cache --profile 09e2f8e68f78ce94bf79d15b40a21333cea5d09dbe01ede63f6c957f4fcfab7b`
docker run -it --rm --name=$CONTAINER_NAME \
--runtime=nvidia \
--gpus all \
--shm-size=16GB \
-e NIM_MODEL_PROFILE=09e2f8e68f78ce94bf79d15b40a21333cea5d09dbe01ede63f6c957f4fcfab7b \
-v "$AIR_GAP_NIM_CACHE:/opt/nim/.cache" \
-u $(id -u) \
-p 8000:8000 \
$IMG_NAME
Air Gap Deployment (local model directory route)#
Another option for air-gapped deployment is to serve a model from a local model repository. Use the create-model-store command within the NIM container to create a repository for a single model, as shown in the following example.
create-model-store --profile 09e2f8e68f78ce94bf79d15b40a21333cea5d09dbe01ede63f6c957f4fcfab7b --model-store /path/to/model-repository
export CONTAINER_NAME=<container-name>
export IMG_NAME=<image-name>
# Choose a path on your system to cache the downloaded models
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
export MODEL_REPO=</path/to/model-repository>
export NIM_SERVED_MODEL_NAME=<model-name>
docker run -it --rm --name=$CONTAINER_NAME \
--runtime=nvidia \
--gpus all \
--shm-size=16GB \
-e NIM_MODEL_NAME=/model-repo \
-e NIM_SERVED_MODEL_NAME \
-v $MODEL_REPO:/model-repo \
-u $(id -u) \
-p 8000:8000 \
$IMG_NAME