Getting Started#

Prerequisites#

Setup#

  • NVIDIA AI Enterprise License: NVIDIA NIM for NV-CLIP (NV-CLIP NIM) is available for self-hosting under the NVIDIA AI Enterprise (NVAIE) License.

  • NVIDIA GPU(s): NV-CLIP NIM runs on any NVIDIA GPU with sufficient GPU memory, but some model/GPU combinations are optimized. See the Support Matrix for more information.

  • CPU: x86_64 and aarch64 architectures are supported in this release.

  • OS: any Linux distribution that is supported by the NVIDIA Container Toolkit.

  • CUDA Drivers: Follow the installation guide.

    We recommend:

    • Using a network repository as part of a package manager installation, skipping the CUDA toolkit installation as the libraries are available within the NIM container

    • Installing the open kernel modules for a specific version (a sketch follows the table below):

      | Major Version | EOL       | Data Center & RTX/Quadro GPUs | GeForce GPUs |
      |---------------|-----------|-------------------------------|--------------|
      | > 550         | TBD       | X                             | X            |
      | 550           | Feb 2025  | X                             | X            |
      | 545           | Oct 2023  | X                             | X            |
      | 535           | June 2026 | X                             |              |
      | 525           | Nov 2023  | X                             |              |
      | 470           | Sept 2024 | X                             |              |
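      For example, on Ubuntu with the CUDA network repository already configured, installing the open kernel modules might look like the following sketch; package names vary by distribution and driver branch, so verify against the installation guide.

      # Sketch for Ubuntu with the CUDA network repository configured; package
      # names vary by distribution, so check the driver installation guide
      sudo apt-get update
      sudo apt-get install -y nvidia-open   # or a versioned variant for a specific branch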

  1. Install Docker.

  2. Install the NVIDIA Container Toolkit.

After installing the toolkit, follow the instructions in the Configure Docker section in the NVIDIA Container Toolkit documentation.

To ensure that your setup is correct, run the following command:

docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

This command should produce output similar to the following, where you can confirm the CUDA driver version and available GPUs.

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H100 80GB HBM3          On  |   00000000:1B:00.0 Off |                    0 |
| N/A   36C    P0            112W /  700W |   78489MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Installing WSL2 for Windows#

Certain downloadable NIMs can be used on an RTX Windows system with the Windows Subsystem for Linux (WSL). To enable WSL2, perform the following steps.

  1. Be sure your computer can run WSL2 as described in the Prerequisites section of the WSL2 documentation.

  2. Enable WSL2 on your Windows computer by following the steps in Install WSL command. By default, these steps install the Ubuntu distribution of Linux. For alternative installations, see Change the default Linux distribution installed.
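As a minimal sketch, assuming a recent Windows build, the default installation from an elevated PowerShell prompt looks like the following; see the WSL documentation for the authoritative steps.

# Install WSL2 with the default Ubuntu distribution (elevated PowerShell)
wsl --install

# After rebooting, confirm that installed distributions use WSL version 2
wsl -l -v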

Launch NVIDIA NIM for NV-CLIP#

You can download and run the NIM of your choice from either the API catalog or NGC.

Option 1: From API Catalog#

Check out this video, which illustrates the following steps.

Generate an API Key#

  1. Navigate to the API Catalog.

  2. Select a model.

  3. Select an Input option. The following example shows a model that offers a Docker option. Not all models offer this option, but all include a “Get API Key” link.

  4. Select Get API Key and log in if prompted.

  5. Select Generate Key.

  6. Copy your key and store it in a secure place. Do not share it.


Login to Docker#

Use the docker login command to log in to the NVIDIA container registry. Replace the placeholders for Username and Password with your values.

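The original snippet is not reproduced here. Assuming the same nvcr.io registry and $oauthtoken convention described in the NGC section below, a representative login looks like:

# Log in to nvcr.io; the username is the literal string $oauthtoken and the
# password is the API key generated above
echo "<your-api-key>" | docker login nvcr.io --username '$oauthtoken' --password-stdin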

Download and Launch NVIDIA NIM for NV-CLIP#

Use the following command to pull and run the NIM using Docker.

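The original command block is not reproduced here. A representative pull-and-run, assuming the nvidia/nvclip repository and the 2.0.0 tag used in the NGC section below, looks like:

# Representative example; substitute your API key, and the repository and
# tag for the NIM you selected in the API catalog
export NGC_API_KEY=<your-api-key>
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"

docker run -it --rm --name=nvclip \
  --runtime=nvidia --gpus all \
  -e NGC_API_KEY=$NGC_API_KEY \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -p 8000:8000 \
  nvcr.io/nim/nvidia/nvclip:2.0.0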

To modify the docker run parameters, see Docker Run Parameters.

You can now run inference.

Option 2: From NGC#

Generate an API key#

An NGC API key is required to access NGC resources. Navigate to https://org.ngc.nvidia.com/setup/personal-keys to create a key.

When creating an NGC API key, ensure that at least “NGC Catalog” is selected from the “Services Included” dropdown. Include additional services if you want to use this key for other purposes.


Export the API key#

Pass the value of the API key to the docker run command in the next section as the NGC_API_KEY environment variable to download the appropriate models and resources when starting the NIM.

If you’re not familiar with how to create the NGC_API_KEY environment variable, the simplest way is to export it in your terminal:

export NGC_API_KEY=<value>

Run one of the following commands to make the key available at startup:

# If using bash
echo "export NGC_API_KEY=<value>" >> ~/.bashrc

# If using zsh
echo "export NGC_API_KEY=<value>" >> ~/.zshrc

Other, more secure options include saving the value in a file, so that you can retrieve it with cat $NGC_API_KEY_FILE (as sketched below), or using a password manager.
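A minimal sketch of the file-based option (the path here is illustrative):

# Store the key in a file only you can read, then load it into the environment
NGC_API_KEY_FILE=~/.ngc_api_key   # illustrative path
echo "<value>" > "$NGC_API_KEY_FILE"
chmod 600 "$NGC_API_KEY_FILE"
export NGC_API_KEY="$(cat "$NGC_API_KEY_FILE")"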

Docker Login to NGC#

To pull the NIM container image from NGC, first authenticate with the NVIDIA Container Registry with the following command:

echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin

Use $oauthtoken as the username and NGC_API_KEY as the password. The $oauthtoken username is a special name that indicates that you will authenticate with an API key and not a user name and password.

List Available NIMs#

This documentation uses the ngc CLI tool in a number of examples. See the NGC CLI documentation for information on downloading and configuring the tool.

Previously, the ngc tool used the NGC_API_KEY environment variable, but this variable has since been deprecated in favor of NGC_CLI_API_KEY. In the previous section, you set NGC_API_KEY, and the following sections use it in command examples. If NGC_API_KEY is set when you run an ngc command, the command warns that the variable is deprecated in favor of NGC_CLI_API_KEY. You can safely ignore this warning: it appears whenever NGC_API_KEY is set, even if you also set NGC_CLI_API_KEY.
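If you prefer that the ngc CLI read its own variable explicitly, you can mirror the key into it (note that, per the behavior above, the warning still appears while NGC_API_KEY remains set):

# Mirror the key into the variable the ngc CLI now prefers
export NGC_CLI_API_KEY="$NGC_API_KEY"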

Use the following command to list the available NIMs in CSV format.

ngc registry image list --format_type csv 'nvcr.io/nim/nvidia/nvclip*'

This command should produce output in the following format:

Name,Repository,Latest Tag,Image Size,Updated Date,Permission,Signed Tag?,Access Type,Associated Products
<name1>,<repository1>,<latest tag1>,<image size1>,<updated date1>,<permission1>,<signed tag?1>,<access type1>,<associated products1>
...
<nameN>,<repositoryN>,<latest tagN>,<image sizeN>,<updated dateN>,<permissionN>,<signed tag?N>,<access typeN>,<associated productsN>

Use the Repository and Latest Tag fields when you call the `docker run` command, as shown in the following section.

Launch NIM#

The following command launches a Docker container for the nvidia/nvclip-vit-h-14 model. To launch a container for a different NIM, replace the values of Repository and Latest_Tag with values from the previous image list command and change the value of CONTAINER_NAME to something appropriate.

# Choose a container name for bookkeeping
export CONTAINER_NAME=nvclip

# The repository name and latest tag from the previous ngc registry image list command
Repository=nvclip
Latest_Tag=2.0.0

# Choose a NV-CLIP NIM Image from NGC
export IMG_NAME="nvcr.io/nim/nvidia/${Repository}:${Latest_Tag}"

# Choose a path on your system to cache the downloaded models
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"

# Start the NV-CLIP NIM
docker run -it --rm --name=$CONTAINER_NAME \
  --runtime=nvidia \
  --gpus all \
  -e NGC_API_KEY=$NGC_API_KEY \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -p 8000:8000 \
  $IMG_NAME

Docker Run Parameters#

| Flags | Description |
|-------|-------------|
| `-it` | `--interactive` + `--tty` (see the Docker docs) |
| `--rm` | Delete the container after it stops (see the Docker docs) |
| `--name=nvclip` | Give a name to the NIM container for bookkeeping (here `nvclip`). Use any preferred value. |
| `--runtime=nvidia` | Ensure NVIDIA drivers are accessible in the container. |
| `--gpus all` | Expose all NVIDIA GPUs inside the container. See the configuration page for mounting specific GPUs. |
| `-e NGC_API_KEY` | Provide the container with the token necessary to download the required models and resources from NGC. See Export the API key. |
| `-v "$LOCAL_NIM_CACHE:/opt/nim/.cache"` | Mount a cache directory from your system (`~/.cache/nim` here) inside the container (defaults to `/opt/nim/.cache`), allowing downloaded models and artifacts to be reused by subsequent runs. |
| `-p 8000:8000` | Forward the port where the NIM server is published inside the container so it can be accessed from the host. The left-hand side of `:` is the host port (`8000` here); the right-hand side is the container port where the NIM server is published (defaults to `8000`). |
| `$IMG_NAME` | Name and version of the NV-CLIP NIM container from NGC. The NV-CLIP NIM server starts automatically if no argument is provided after this. |
| `-e NIM_MANIFEST_PROFILE` | Model profile ID of the GPU you are running on. See the profiles page. |

Note

See the Configuring a NIM topic for information about additional configuration settings.

Note

If you have an issue with permission mismatches when downloading models in your local cache directory, add the `-u $(id -u)` option to the `docker run` call, as shown below.
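For example, the launch command from above with the user flag added:

# Same launch command as before, running as the host user to avoid
# permission mismatches in the mounted cache directory
docker run -it --rm --name=$CONTAINER_NAME \
  --runtime=nvidia \
  --gpus all \
  -u $(id -u) \
  -e NGC_API_KEY=$NGC_API_KEY \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -p 8000:8000 \
  $IMG_NAME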

Run Inference#

During startup, the NIM container downloads the required resources and begins serving the model behind an API endpoint. The following message indicates a successful startup.

INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)

Once you see this message, you can validate the deployment of the NIM by executing an inference request. In a new terminal, run the following command to check whether the service is deployed successfully:

curl -X GET 'http://0.0.0.0:8000/v1/health/ready'

Tip

Pipe the results of curl commands into a tool like jq or python -m json.tool to make the output of the API easier to read.

For example: curl -s http://0.0.0.0:8000/v1/health/ready | jq.

This command should produce output similar to the following:

{
"status": "ready"
}

Note

If you are using a Windows host, use localhost instead of 0.0.0.0 in the curl requests above to access the loopback interface.
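If you are scripting the deployment, a small polling loop can wait for readiness (a sketch; adjust the host and port to your setup):

# Poll the readiness endpoint until the NIM reports ready
until curl -sf http://0.0.0.0:8000/v1/health/ready > /dev/null; do
  echo "Waiting for NIM to become ready..."
  sleep 5
done
echo "NIM is ready."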

OpenAI Embeddings Request#

The OpenAI Embeddings API supports only text input. The Embeddings API for NV-CLIP NIM has been extended to also accept images. Image input can be provided as a base64 string in the format data:image/<image_format>;base64,<base64str>.
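For example, a minimal sketch of building that string from a local file (the filename here is hypothetical):

import base64

# Hypothetical local image; build the data URI string the Embeddings API expects
with open("example.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()
image_input = f"data:image/png;base64,{image_b64}"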

Important

Update the model name according to your requirements. For example, for the nvidia/nvclip-vit-h-14 model, you might use the following command:

curl -X 'POST' \
'http://0.0.0.0:8000/v1/embeddings' \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{
        "input": [
            "The quick brown fox jumped over the lazy dog"
        ],
        "model": "nvidia/nvclip-vit-h-14",
        "encoding_format": "float"
    }' | jq .

# Sample Output:
# {
#    "object": "list",
#    "data": [
#        {
#            "index": 0,
#            "embedding": [
#                        < 1024 dimension vector of embedding values >
#                        ],
#            "object": "embedding"
#        }
#    ],
#    "usage": {
#        "num_images": 0,
#        "prompt_tokens": 77,
#        "total_tokens": 77
#    },
#    "model": "nvidia/nvclip-vit-h-14"
# }

You can also use the OpenAI Python API library, and provide the image as a base64 string, as shown in the following example:

import base64
import requests
from io import BytesIO

from PIL import Image
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")

url = "https://raw.githubusercontent.com/vis-nlp/ChartQA/main/ChartQA%20Dataset/val/png/5090.png"
image = Image.open(requests.get(url, stream=True).raw)
buffer = BytesIO()
image.save(buffer, format="PNG")  # save as PNG so the data URI media type matches
image_b64 = base64.b64encode(buffer.getvalue()).decode()

response = client.embeddings.create(
    input=["The quick brown fox jumped over the lazy dog",
            f"data:image/png;base64,{image_b64}"],
    model="nvidia/nvclip-vit-h-14",
    encoding_format="float"
)

print(response.data)

# Sample Output
# [Embedding(embedding=[<1024 dimension vector of embedding values>], index=0, object='embedding'),
#  Embedding(embedding=[<1024 dimension vector of embedding values>], index=1, object='embedding')]

Using LangChain#

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    base_url="http://0.0.0.0:8000/v1",
    model="nvidia/nvclip-vit-h-14",
    api_key="not-used",
    check_embedding_ctx_length=False,
)


vector = embeddings.embed_query("What is langchain?")
print(vector)

# Sample Output
# [<1024 dimension vector of embedding values>]

Encoding Format#

NV-CLIP NIM supports two encoding formats: base64 and float. The encoding_format can be specified in the API call. By default, encoding_format="float". You can provide encoding_format="base64", as shown in the following example:

import base64
import requests
from io import BytesIO

from PIL import Image
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")

url = "https://raw.githubusercontent.com/vis-nlp/ChartQA/main/ChartQA%20Dataset/val/png/5090.png"
image = Image.open(requests.get(url, stream=True).raw)
buffer = BytesIO()
image.save(buffer, format="PNG")  # save as PNG so the data URI media type matches
image_b64 = base64.b64encode(buffer.getvalue()).decode()

response = client.embeddings.create(
    input=["The quick brown fox jumped over the lazy dog",
          f"data:image/png;base64,{image_b64}"],
    model="nvidia/nvclip-vit-h-14",
    encoding_format="base64"
)

print(response.data)

# Sample Output
# [Embedding(embedding=[<1024 dimension vector of embedding values>], index=0, object='embedding'),
#  Embedding(embedding=[<1024 dimension vector of embedding values>], index=1, object='embedding')]
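Depending on the client version, explicitly requesting base64 may return the raw base64 string rather than a decoded list. In that case, a sketch for decoding it (assuming little-endian float32 packing, as used by the OpenAI API) is:

import base64
import numpy as np

# Decode a base64-encoded embedding into float32 values
raw = base64.b64decode(response.data[0].embedding)
vector = np.frombuffer(raw, dtype=np.float32)
print(vector.shape)  # e.g., (1024,)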

Batching Support#

NV-CLIP NIM supports a max_batch_size of 64 for both text and image input. Multiple images and texts can be provided in the input as a list. Output embeddings are indexed in the same order as the input.

import base64
import requests
from io import BytesIO
from PIL import Image
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")
url = "https://raw.githubusercontent.com/vis-nlp/ChartQA/main/ChartQA%20Dataset/val/png/5090.png"
image = Image.open(requests.get(url, stream=True).raw)
buffer = BytesIO()
image.save(buffer, format="PNG")  # save as PNG so the data URI media type matches
image_b64 = base64.b64encode(buffer.getvalue()).decode()


response = client.embeddings.create(
    input=["The quick brown fox jumped over the lazy dog",
            "Nvidia is a great company",
            "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAIAAACQd1PeAAAAEElEQVR4nGK6HcwNCAAA//8DTgE8HuxwEQAAAABJRU5ErkJggg==",
            f"data:image/png;base64,{image_b64}"],
    model="nvidia/nvclip-vit-h-14",
    encoding_format="float"
)

print(response.data)

# Sample Output
# [Embedding(embedding=[<1024 dimension vector of embedding values>], index=0, object='embedding'),
#  Embedding(embedding=[<1024 dimension vector of embedding values>], index=1, object='embedding')
#  Embedding(embedding=[<1024 dimension vector of embedding values>], index=2, object='embedding')
#  Embedding(embedding=[<1024 dimension vector of embedding values>], index=3, object='embedding')]

Cosine Similarity#

You can calculate cosine similarity between images and text, as shown in the example below:

import base64
import requests
from io import BytesIO
from PIL import Image
from openai import OpenAI
import torch

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")
url = "https://raw.githubusercontent.com/vis-nlp/ChartQA/main/ChartQA%20Dataset/val/png/5090.png"
image = Image.open(requests.get(url, stream=True).raw)
buffer = BytesIO()
image.save(buffer, format="PNG")  # save as PNG so the data URI media type matches
image_b64 = base64.b64encode(buffer.getvalue()).decode()

embeddings_data = client.embeddings.create(
    input=["data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAIAAACQd1PeAAAAEElEQVR4nGK6HcwNCAAA//8DTgE8HuxwEQAAAABJRU5ErkJggg==",
           f"data:image/png;base64,{image_b64}",
           "The quick brown fox jumped over the lazy dog"],
    model="nvidia/nvclip-vit-h-14",
    encoding_format="float"
).data

all_embeddings = [data.embedding for data in embeddings_data]
image_embeddings = [torch.tensor(embedding) for embedding in all_embeddings[:-1]]

image_embeddings = torch.stack(image_embeddings)
text_embeddings = torch.tensor(all_embeddings[-1])

image_embeddings /= image_embeddings.norm(dim=-1, keepdim=True)
text_embeddings /= text_embeddings.norm(dim=-1, keepdim=True)
probabilities = (100.0 * text_embeddings @ image_embeddings.T).softmax(dim=-1)

probabilities = {f"Image {i+1}": float(d) for i, d in enumerate(probabilities)}
print(probabilities)

# Sample Output
# {'Image 1': 1.0, 'Image 2': 3.4886383559751266e-08}

Stopping the container#

If a Docker container is launched with the --name command line option, you can use the following command to stop the running container.

# In the previous sections, the environment variable CONTAINER_NAME was
# defined using `export CONTAINER_NAME=nvclip`
docker stop $CONTAINER_NAME

Use docker kill if docker stop is not responsive. Follow with docker rm $CONTAINER_NAME if you do not intend to restart this container as-is (with docker start $CONTAINER_NAME); in that case, you will need to re-use the docker run ... instructions from the beginning of this section to start a new container for your NIM.

If you did not start a container with --name, use the output of the docker ps command to get a container ID for the image you used.
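For example (the filter assumes the IMG_NAME variable from the launch step):

# List running containers started from the NIM image, then stop by ID
docker ps --filter "ancestor=$IMG_NAME" --format "{{.ID}}\t{{.Names}}\t{{.Image}}"
docker stop <container_id>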