Configure the VLM#

VSS is designed to be configurable with many VLMs, such as OpenAI GPT-4o, VILA 1.5, NVILA, models exposing an OpenAI-compatible REST API, and other custom models.

VSS supports integrating custom VLM models. Depending on the model to be integrated, some configuration must be updated or interface code must be implemented. Note that the model can only be selected at initialization time.

The following sections explain these approaches in detail.

Configuring for GPT-4o#

Obtain OpenAI API Key#

VSS does not use OpenAI GPT-4o by default. An OpenAI API key is required only when using GPT-4o as the VLM or as the LLM for tool calling.

  1. Log in at https://platform.openai.com/apps.

  2. Select API.

  3. Create a new API key for your project at https://platform.openai.com/api-keys by clicking Create new secret key.

Make sure you have access to the GPT-4o model at https://platform.openai.com/apps.

Make sure you have enough credits available under Settings > Usage (https://platform.openai.com/settings/organization/usage) and review the rate limits under Settings > Limits.

  4. Store the generated API key securely for future use, for example as shown below.
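
The API key can, for example, be kept in an environment variable on the deployment node so that it can be referenced when creating the Kubernetes secret later. This is only an illustration; use whatever secret-management practice your organization requires.

    # Illustration only: keep the key available as $OPENAI_API_KEY for the later secret-creation step
    export OPENAI_API_KEY=<your-generated-api-key>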

Override the configuration#

To use GPT-4o as the VLM in VSS, see Configuration Options and modify the VLM_MODEL_TO_USE config option.

Overview of the steps to do this:

  1. Fetch the Helm Chart following Deploy Using Helm.

  2. Create a new overrides.yaml file.

  3. Copy the example overrides file from Configuration Options.

  4. Edit the overrides.yaml file: set VLM_MODEL_TO_USE to openai-compat and add the OPENAI_API_KEY environment variable as shown below.

    vss:
      applicationSpecs:
        vss-deployment:
          containers:
            vss:
              env:
              - name: VLM_MODEL_TO_USE
                value: openai-compat
              - name: OPENAI_API_KEY
                valueFrom:
                  secretKeyRef:
                    name: openai-api-key-secret
                    key: OPENAI_API_KEY
    
  5. Obtain the OpenAI API Key as described in Obtain OpenAI API Key.

  6. Create the OpenAI API Key secret:

    sudo microk8s kubectl create secret generic openai-api-key-secret --from-literal=OPENAI_API_KEY=$OPENAI_API_KEY

  7. Install the Helm Chart:

    sudo microk8s helm install vss-blueprint nvidia-blueprint-vss-2.3.0.tgz --set global.ngcImagePullSecretName=ngc-docker-reg-secret -f overrides.yaml

  8. Follow the steps in Launch VSS UI.
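
After installing the Helm Chart, you can optionally confirm that the API key secret exists and that the VSS pods start up before launching the UI. A minimal check, assuming the MicroK8s setup used above:

    # Confirm the OpenAI API key secret was created
    sudo microk8s kubectl get secret openai-api-key-secret

    # Watch the VSS pods come up
    sudo microk8s kubectl get pods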

Configuring for Fine-tuned VILA 1.5 (LoRA)#

Custom fine-tuned Low-Rank Adaptation (LoRA) checkpoints for VILA 1.5 can be used with VSS and have demonstrated improved accuracy compared to the base VILA 1.5 model.

Note

Fine-tuning VILA 1.5 is no longer supported. Move to NVILA 15B HighRes instead; see Fine-tuning NVILA model (LoRA) for details. The section below is for users who already have a LoRA checkpoint fine-tuned with VILA 1.5.

Once you have a fine-tuned checkpoint, follow the steps to configure VSS to use it as the VLM:

  1. Copy the LoRA checkpoint to a directory <LORA_CHECKPOINT_DIR> on the node where the VSS container will be deployed. The contents of the directory should be similar to:

    $ ls <LORA_CHECKPOINT_DIR>
    adapter_config.json  adapter_model.safetensors  config.json  non_lora_trainables.bin  trainer_state.json
    
  2. Make the <LORA_CHECKPOINT_DIR> directory writable, since VSS will generate the TensorRT-LLM weights for the LoRA inside the container:

    chmod -R a+w <LORA_CHECKPOINT_DIR>

  3. Add the VILA_LORA_PATH environment variable, extraPodVolumes, and extraPodVolumeMounts to the overrides file described in Configuration Options as shown below. Make sure VILA 1.5 is being used as the base model.

vss:
  applicationSpecs:
    vss-deployment:
      containers:
        vss:
          env:
          - name: VLM_MODEL_TO_USE
            value: vila-1.5
          - name: MODEL_PATH
            value: "ngc:nim/nvidia/vila-1.5-40b:vila-yi-34b-siglip-stage3_1003_video_v8"
          - name: VILA_LORA_PATH
            value: /models/lora
  extraPodVolumes:
  - name: lora-checkpoint
    hostPath:
      path: <LORA_CHECKPOINT_DIR>   # Path on host
  - name: secret-ngc-api-key-volume
    secret:
      secretName: ngc-api-key-secret
      items:
      - key: NGC_API_KEY
        path: ngc-api-key
  - name: secret-graph-db-username-volume
    secret:
      secretName: graph-db-creds-secret
      items:
      - key: username
        path: graph-db-username
  - name: secret-graph-db-password-volume
    secret:
      secretName: graph-db-creds-secret
      items:
      - key: password
        path: graph-db-password
  extraPodVolumeMounts:
  - name: lora-checkpoint
    mountPath: /models/lora
  - name: secret-ngc-api-key-volume
    mountPath: /secrets/ngc-api-key
    subPath: ngc-api-key
    readOnly: true
  - name: secret-graph-db-username-volume
    mountPath: /secrets/graph-db-username
    subPath: graph-db-username
    readOnly: true
  - name: secret-graph-db-password-volume
    mountPath: /secrets/graph-db-password
    subPath: graph-db-password
    readOnly: true
  4. Install the Helm Chart:

    sudo microk8s helm install vss-blueprint nvidia-blueprint-vss-2.3.0.tgz --set global.ngcImagePullSecretName=ngc-docker-reg-secret -f overrides.yaml

  5. Follow the steps in Launch VSS UI.
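
Optionally, once the VSS pod is running, you can confirm that the LoRA checkpoint is mounted at the expected path inside the container. A sketch, where <vss-deployment-name> is a placeholder for the actual VSS deployment name in your cluster (see kubectl get deployments):

# Should list the LoRA checkpoint files, e.g. adapter_config.json and adapter_model.safetensors
sudo microk8s kubectl exec deploy/<vss-deployment-name> -- ls /models/lora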

Configuring for NVILA model#

To deploy VSS with the NVILA HighRes model, specify the following in your overrides file (see Configuration Options):

vss:
  applicationSpecs:
    vss-deployment:
      containers:
        vss:
          env:
          - name: VLM_MODEL_TO_USE
            value: nvila
          - name: MODEL_PATH
            value: "ngc:nvidia/tao/nvila-highres:nvila-lite-15b-highres-lita"

Note

The NVILA model can also be loaded from the host using a mounted path. To do this, follow the instructions in Configuring for locally downloaded VILA 1.5 / NVILA checkpoint on how to use the overrides file and extraPodVolumeMounts.

Fine-tuning NVILA model (LoRA)#

To fine-tune the NVILA HighRes model, download the fine-tuning microservice container and then follow the steps in the fine-tuning microservice notebook from NGC.

Fuse NVILA base model with a custom LoRA checkpoint#

Note

The fused NVILA model can also be loaded from the host using a mounted path. To do this, follow the instructions in Configuring for locally downloaded VILA 1.5 / NVILA checkpoint on how to use the overrides file and extraPodVolumeMounts.

  1. Download the NVILA base model and the LoRA checkpoint to the local machine:

cd $DIRECTORY_WITH_BASE_MODEL_AND_LORA_CHECKPOINT

  2. Install VILA and its dependencies:

sudo apt install libnccl2 libnccl-dev
git clone https://github.com/NVlabs/VILA.git

Follow the steps in the VILA Installation guide to install conda and set up the VILA environment.

  3. Download or save the following Python script as run_nvila_fuse.py in the folder:

#To run:
#python3 run_nvila_fuse.py lora-llm-v1/ nvila-15b-lite-highres-v1/ fused/fused_with_lora_nvila_15b

import argparse
import sys

# Add the VILA directory to the Python path
sys.path.append('VILA')

# Import the llava package
import llava

def main(lora_checkpoint, model_base, output_path):
    # Load the model using the provided arguments
    model = llava.load(lora_checkpoint, model_base=model_base)
    
    # Save the model to the specified output path
    model.save_pretrained(output_path)
    print(f"Model saved to {output_path}")

if __name__ == "__main__":
    # Set up argument parser
    parser = argparse.ArgumentParser(description="Load and save a llava model with specified LoRA checkpoints fused into the base model.")
    parser.add_argument("lora_checkpoint", type=str, help="Path to the LoRA checkpoint.")
    parser.add_argument("model_base", type=str, help="Path to the model base checkpoint.")
    parser.add_argument("output_path", type=str, help="Path to save the merged model.")

    # Parse arguments
    args = parser.parse_args()

    # Run the main function with parsed arguments
    main(args.lora_checkpoint, args.model_base, args.output_path)

  4. Run the Python script, for example:

#python3 run_nvila_fuse.py lora-checkpoint-folder-path/ nvila-15b-path/ output_directory
#Example:
python3 run_nvila_fuse.py lora-llm-v1/ nvila-15b-lite-highres-v1/ fused_with_lora_nvila_15b

  5. On a successful run, logs similar to the following are printed:

Loading additional LLaVA weights...
Loading LoRA weights...
Merging LoRA weights...
Model is loaded...
saving llm to fused_with_lora_nvila_15b/llm
saving vision_tower to fused_with_lora_nvila_15b/vision_tower
saving mm_projector to fused_with_lora_nvila_15b/mm_projector
Model saved to fused_with_lora_nvila_15b
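
As a quick sanity check after the merge, list the output directory; per the logs above it should contain the saved llm, vision_tower, and mm_projector components:

# Confirm the fused components were written (names taken from the logs above)
ls fused_with_lora_nvila_15b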

Please refer to the section below to use the fused NVILA checkpoint with VSS.

Configuring for locally downloaded VILA 1.5 / NVILA checkpoint#

To deploy VSS with a locally downloaded VILA 1.5 / NVILA checkpoint, specify the following in your overrides file (see Configuration Options):

vss:
  applicationSpecs:
    vss-deployment:
      containers:
        vss:
          env:
          - name: VLM_MODEL_TO_USE
            value: vila-1.5  # or nvila
          - name: MODEL_PATH
            value: "/tmp/vila"
  extraPodVolumes:
  - name: local-vila-checkpoint
    hostPath:
      path: </path/to/local/vila-checkpoint>
  extraPodVolumeMounts:
  - name: local-vila-checkpoint
    mountPath: /tmp/vila

The contents of the checkpoint directory </path/to/local/vila-checkpoint> should be similar to:

$ ls </path/to/local/vila-checkpoint>
config.json  llm  mm_projector  trainer_state.json  vision_tower
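
If you do not yet have a checkpoint locally, one possible way to fetch it is with the NGC CLI, using the model path from the earlier overrides examples with the ngc: prefix removed. This is only a sketch; it assumes the NGC CLI is installed and configured and that your NGC account has access to the model.

# Example for the VILA 1.5 checkpoint referenced earlier; adjust the path/version for other checkpoints
ngc registry model download-version "nim/nvidia/vila-1.5-40b:vila-yi-34b-siglip-stage3_1003_video_v8"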

OpenAI Compatible REST API#

If the VLM model provides an OpenAI compatible REST API, refer to Configuration Options.

Other Custom Models#

VSS allows you to drop your own models into the model directory by providing the pre-trained weights of the model and implementing an interface that bridges it to the VSS pipeline.

The interface includes an inference.py file and a manifest.yaml.

In the inference.py, you must define a class named Inference with the following two methods:

def get_embeddings(self, tensor: torch.Tensor) -> torch.Tensor:
    # Generate video embeddings for the chunk / file.
    # Do not implement if explicit video embeddings are not supported by the model.
    return tensor

def generate(self, prompt: str, input: torch.Tensor, configs: Dict) -> str:
    # Generate a summary string from the input prompt and frame/embedding input.
    # configs contains VLM generation parameters such as
    # max_new_tokens, seed, top_p, top_k, and temperature.
    return summary

The optional get_embeddings method is used to generate embeddings for a given video clip wrapped in a TCHW tensor and must be removed if the model doesn’t support the feature.

The generate method is used to generate the text summary based on the given prompt and the video clip wrapped in the TCHW tensor.

The generate method supports models that need to be executed locally on the system or models with REST APIs.
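
To illustrate the shape of this interface, below is a minimal, hypothetical inference.py sketch for a model served behind an OpenAI-style REST API. The generate signature matches the one above, and get_embeddings is omitted because it is optional. The constructor, the CUSTOM_VLM_ENDPOINT environment variable, the model name, the request/response payload format, and the frame cap are all assumptions for illustration only; model a real implementation on the examples referenced below.

# inference.py -- minimal illustrative sketch, not an official implementation.
# Assumes a hypothetical OpenAI-style REST endpoint and that torchvision is available.
import base64
import io
import os
from typing import Dict

import requests
import torch
from torchvision.transforms.functional import to_pil_image


class Inference:
    def __init__(self):
        # Hypothetical: read the model endpoint from an environment variable.
        self.endpoint = os.environ.get(
            "CUSTOM_VLM_ENDPOINT", "http://localhost:8000/v1/chat/completions"
        )

    # get_embeddings is optional and omitted here because this sketch does not
    # support explicit video embeddings.

    def generate(self, prompt: str, input: torch.Tensor, configs: Dict) -> str:
        # 'input' is a TCHW tensor of video frames for the chunk
        # (assumed uint8 or float in [0, 1]).
        images = []
        for frame in input[:8]:  # illustrative cap on the number of frames sent
            pil_img = to_pil_image(frame.cpu())
            buf = io.BytesIO()
            pil_img.save(buf, format="JPEG")
            images.append(base64.b64encode(buf.getvalue()).decode("utf-8"))

        # Hypothetical OpenAI-style request body; the real payload depends on the model server.
        content = [{"type": "text", "text": prompt}] + [
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{img}"}}
            for img in images
        ]
        payload = {
            "model": "my-custom-vlm",
            "messages": [{"role": "user", "content": content}],
            "max_tokens": configs.get("max_new_tokens", 512),
            "temperature": configs.get("temperature", 0.2),
            "top_p": configs.get("top_p", 1.0),
        }
        response = requests.post(self.endpoint, json=payload, timeout=120)
        response.raise_for_status()
        return response.json()["choices"][0]["message"]["content"]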

Some examples are available at NVIDIA-AI-Blueprints/video-search-and-summarization; these include the fuyu8b and neva models.

The VSS container image or the Blueprint Helm Chart may need to be modified to use custom VLMs. Configuration Options mentions how to use a custom VSS container image and how to specify the model path for custom models. If mounting of custom paths is required, the VSS subchart in the Blueprint Helm Chart can be modified to mount the custom paths.

Example:

For fuyu8b, model weights need to be downloaded; refer to the Fuyu8b README for more details.

Once the model weights are downloaded following the Fuyu8b README, the directory structure should look like:

$ ls /path/to/fuyu8b

inference.py      fuyu8b                            model-00002-of-00002.safetensors  skateboard.png
architecture.png  generation_config.json            model.safetensors.index.json      special_tokens_map.json
bus.png           added_tokens.json                 preprocessor_config.json          tokenizer_config.json
chart.png         manifest.yaml                     __pycache__                       tokenizer.json
config.json       model-00001-of-00002.safetensors  README.md                         tokenizer.model

The directory structure for neva looks like:

$ ls /path/to/neva

inference.py                  manifest.yaml

Next, you can add the following to the Helm overrides file using the instructions in Configuration Options. For NVIDIA_API_KEY, refer to Using NIMs from build.nvidia.com.

vss:
  applicationSpecs:
    vss-deployment:
      containers:
        vss:
          env:
          - name: VLM_MODEL_TO_USE
            value: custom
          - name: MODEL_PATH
            value: "/tmp/custom-model"
          # Needed when using neva.
          #- name: NVIDIA_API_KEY
          #  valueFrom:
          #    secretKeyRef:
          #      name: nvidia-api-key-secret
          #      key: NVIDIA_API_KEY
  extraPodVolumes:
  - name: custom-model
    hostPath:
      path: /path/to/fuyu8b # contains inference.py and manifest.yaml
  extraPodVolumeMounts:
  - name: custom-model
    mountPath: /tmp/custom-model
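
If you are using neva and uncomment the NVIDIA_API_KEY entry above, the referenced secret can be created in the same way as the OpenAI API key secret earlier, for example:

sudo microk8s kubectl create secret generic nvidia-api-key-secret --from-literal=NVIDIA_API_KEY=$NVIDIA_API_KEY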

Note

Custom VLM models may not work well with GPU-sharing topology.