Configure the VLM#

VSS is designed to be configurable with many VLMs, such as OpenAI GPT-4o, community models like NVILA, NEVA, and Fuyu, and custom models.

VSS supports integrating custom VLM models. Depending on the model to be integrated, some configuration must be updated or the interface code must be implemented. Note that the model can only be selected at initialization time.

The following sections explain these approaches in detail.

3rd-Party VLM Endpoints#

We provide the option to use externally hosted 3rd-party VLMs that follow the OpenAI API standard. Access to these endpoints is provided through the 3rd party.

Supported Model | Developer
----------------|----------
GPT-4o          | OpenAI

OpenAI (GPT-4o)#

To use GPT-4o as the VLM model in VSS, see Configuration Options and modify the config VLM_MODEL_TO_USE.

Prerequisite: an API key from https://platform.openai.com/api-keys

Steps:

  1. Fetch the Helm Chart following Deploy Using Helm.

  2. Create a new overrides.yaml file.

  3. Copy the example overrides file from Configuration Options.

  4. Edit the overrides.yaml file: change the value of VLM_MODEL_TO_USE to openai-compat and add the OPENAI_API_KEY environment variable as shown below.

    vss:
      applicationSpecs:
        vss-deployment:
          containers:
            vss:
              env:
              - name: VLM_MODEL_TO_USE
                value: openai-compat
              - name: OPENAI_API_KEY
                valueFrom:
                  secretKeyRef:
                    name: openai-api-key-secret
                    key: OPENAI_API_KEY
    
  5. Create the OpenAI API Key secret:

    sudo microk8s kubectl create secret generic openai-api-key-secret --from-literal=OPENAI_API_KEY=$OPENAI_API_KEY

  6. Install the Helm Chart:

    sudo microk8s helm install vss-blueprint nvidia-blueprint-vss-2.3.0.tgz --set global.ngcImagePullSecretName=ngc-docker-reg-secret -f overrides.yaml

  7. Follow the steps in Launch VSS UI to launch the UI.
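Optionally, you can verify that the API key secret was created and that the VSS pods come up after the install. Below is a minimal sanity check using the same microk8s kubectl CLI, assuming the default namespace:

    # Confirm the secret exists
    sudo microk8s kubectl get secret openai-api-key-secret

    # Watch the VSS pods start up
    sudo microk8s kubectl get pods -w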

Community Models#

We support multiple community models that are open source, developed through research, or offered by 3rd parties. If the VLM model provides an OpenAI-compatible REST API, refer to Configuration Options. The following models have been tested with VSS; the steps for each are described below.

Supported Model | Developer | Size (Parameters)
----------------|-----------|------------------
NVILA           | NVIDIA    | 15b
NEVA            | NVIDIA    | 22b
Fuyu            | NVIDIA    | 8b

Local NGC Models (VILA & NVILA)#

Follow the steps below to use VLM weights that have been downloaded to a local file path. This is an alternative way to deploy the VILA 34b model and is required for the NVILA HighRes 15b model.

  1. Download the NGC CLI, which is used to download the models to a specified location.

# Download NGC CLI
wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/ngc-apps/ngc_cli/versions/3.64.2/files/ngccli_linux.zip -O ngccli_linux.zip && unzip ngccli_linux.zip
chmod u+x ngc-cli/ngc
export PATH="$PATH:$(pwd)/ngc-cli"
  2. Download the model weights you wish to store locally.

VILA 34b

# Download the VILA weights
export NGC_API_KEY=<your-legacy-api-key>
export NGC_CLI_ORG=nim
export NGC_CLI_TEAM=nvidia
ngc registry model download-version "nim/nvidia/vila-1.5-40b:vila-yi-34b-siglip-stage3_1003_video_v8"
chmod a+w vila-1.5-40b_vvila-yi-34b-siglip-stage3_1003_video_v8

NVILA HighRes 15b

# Download the NVILA weights
ngc registry model download-version "nvidia/tao/nvila-highres:nvila-lite-15b-highres-lita"
chmod a+w nvila-highres_vnvila-lite-15b-highres-lita

The NVILA weights, for example, will be downloaded to <current-directory>/nvila-highres_vnvila-lite-15b-highres-lita. Use this path to mount the weights as shown in the next step.

  3. Specify the following in your overrides file (see Configuration Options):

vss:
  applicationSpecs:
    vss-deployment:
      containers:
        vss:
          env:
          - name: VLM_MODEL_TO_USE
            value: vila-1.5  # or nvila
          - name: MODEL_PATH
            value: "/tmp/vila"
  extraPodVolumes:
  - name: local-vila-checkpoint
    hostPath:
      path: </path/to/local/vila-checkpoint>
  extraPodVolumeMounts:
  - name: local-vila-checkpoint
    mountPath: /tmp/vila

The contents of the VILA checkpoint directory </path/to/local/vila-checkpoint> should be similar to:

$ ls </path/to/local/vila-checkpoint>
config.json  llm  mm_projector  trainer_state.json  vision_tower
  4. Install the Helm Chart as shown below.
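The command is the same as in the GPT-4o steps above; for example, assuming the same chart file and overrides file names:

sudo microk8s helm install vss-blueprint nvidia-blueprint-vss-2.3.0.tgz --set global.ngcImagePullSecretName=ngc-docker-reg-secret -f overrides.yaml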

OpenAI Compatible REST API#

If the VLM model provides an OpenAI-compatible REST API, refer to Configuration Options.
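For reference, an OpenAI-compatible endpoint accepts requests in the OpenAI Chat Completions format. The sketch below shows the general shape of such a request; the endpoint URL, model name, and API key variable are placeholders, not values defined by VSS:

curl https://<your-vlm-endpoint>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
        "model": "<model-name>",
        "messages": [{"role": "user", "content": "Describe the scene in this video chunk."}]
      }'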

Other Custom Models#

VSS allows you to drop your own models into the model directory by providing the pre-trained weights of the model and implementing an interface to bridge to the VSS pipeline.

The interface includes an inference.py file and a manifest.yaml.

In the inference.py, you must define a class named Inference with the following two methods:

from typing import Dict

import torch


class Inference:
    def get_embeddings(self, tensor: torch.Tensor) -> torch.Tensor:
        # Generate video embeddings for the chunk / file.
        # Do not implement if explicit video embeddings are not supported by the model.
        return tensor

    def generate(self, prompt: str, input: torch.Tensor, configs: Dict) -> str:
        # Generate a summary string from the input prompt and frame/embedding input.
        # configs contains VLM generation parameters like
        # max_new_tokens, seed, top_p, top_k, temperature.
        return summary

The optional get_embeddings method is used to generate embeddings for a given video clip wrapped in a TCHW tensor and must be removed if the model doesn’t support the feature.

The generate method is used to generate the text summary based on the given prompt and the video clip wrapped in the TCHW tensor.

The generate method supports models that need to be executed locally on the system or models with REST APIs.
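For illustration, here is a minimal sketch of an Inference class that forwards the prompt to a hosted OpenAI-compatible REST endpoint instead of running the model locally. The endpoint URL, model name, environment variables, and constructor signature are assumptions made for this example, not part of the VSS interface:

import os
from typing import Dict

import requests
import torch


class Inference:
    """Example only: delegate generation to a hosted OpenAI-compatible VLM endpoint."""

    def __init__(self):
        # Hypothetical endpoint and credentials for this sketch; replace with your provider's values.
        self._url = os.environ.get("VLM_ENDPOINT", "https://example.com/v1/chat/completions")
        self._api_key = os.environ.get("VLM_API_KEY", "")

    # get_embeddings is omitted because this example model does not support
    # explicit video embeddings.

    def generate(self, prompt: str, input: torch.Tensor, configs: Dict) -> str:
        # `input` is the TCHW video tensor; a real implementation would encode
        # selected frames (for example as base64 image content) into the request.
        payload = {
            "model": "example-vlm",
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": configs.get("max_new_tokens", 512),
            "temperature": configs.get("temperature", 0.2),
            "top_p": configs.get("top_p", 1.0),
        }
        response = requests.post(
            self._url,
            headers={"Authorization": f"Bearer {self._api_key}"},
            json=payload,
            timeout=120,
        )
        response.raise_for_status()
        return response.json()["choices"][0]["message"]["content"]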

Some examples, including the fuyu8b and neva models, are available at NVIDIA-AI-Blueprints/video-search-and-summarization.

The VSS container image or the Blueprint Helm Chart may need to be modified to use custom VLMs. Configuration Options mentions how to use a custom VSS container image and how to specify the model path for custom models. If mounting of custom paths is required, the VSS subchart in the Blueprint Helm Chart can be modified to mount the custom paths.

Example:

For fuyu8b, model weights need to be downloaded; refer to the Fuyu8b README for more details.

Once model weights are downloaded using the Fuyu8b README, the directory structure should look like:

ls /path/to/fuyu8b

inference.py      fuyu8b                            model-00002-of-00002.safetensors  skateboard.png
architecture.png  generation_config.json            model.safetensors.index.json      special_tokens_map.json
bus.png           added_tokens.json                 preprocessor_config.json          tokenizer_config.json
chart.png         manifest.yaml                     __pycache__                       tokenizer.json
config.json       model-00001-of-00002.safetensors  README.md                         tokenizer.model

Directory structure for neva looks like:

ls /path/to/neva

inference.py                  manifest.yaml

Next, you can add the following to the Helm overrides file using the instructions in Configuration Options. For NVIDIA_API_KEY, refer to Using NIMs from build.nvidia.com.

vss:
  applicationSpecs:
    vss-deployment:
      containers:
        vss:
          env:
          - name: VLM_MODEL_TO_USE
            value: custom
          - name: MODEL_PATH
            value: "/tmp/custom-model"
          # Needed if using neva.
          #- name: NVIDIA_API_KEY
          #  valueFrom:
          #    secretKeyRef:
          #      name: nvidia-api-key-secret
          #      key: NVIDIA_API_KEY
  extraPodVolumes:
  - name: custom-model
    hostPath:
      path: /path/to/fuyu8b # contains inference.py and manifest.yaml
  extraPodVolumeMounts:
  - name: custom-model
    mountPath: /tmp/custom-model
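If you uncomment the NVIDIA_API_KEY section for neva, create the referenced secret before installing the Helm Chart. The commands below mirror the OpenAI key secret and install steps shown earlier and assume the same chart and overrides file names:

sudo microk8s kubectl create secret generic nvidia-api-key-secret --from-literal=NVIDIA_API_KEY=$NVIDIA_API_KEY

sudo microk8s helm install vss-blueprint nvidia-blueprint-vss-2.3.0.tgz --set global.ngcImagePullSecretName=ngc-docker-reg-secret -f overrides.yaml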

Note

Custom VLM models may not work well with GPU-sharing topology.