Configure the VLM#
VSS is designed to be configurable with many VLMs, such as GPT-4o, VILA 1.5, NVILA HighRes, and any model exposing an OpenAI-compatible REST API.
VSS also supports integrating custom VLM models. Depending on the model to be integrated, some configuration must be updated or interface code must be implemented. The model can only be selected at initialization time.
The following sections explain these approaches in detail.
Configuring for GPT-4o#
Obtain OpenAI API Key#
VSS does not use OpenAI GPT-4o by default. An OpenAI API key is required only when using GPT-4o as the VLM or as the LLM for tool calling.
Log in at: https://platform.openai.com/apps.
Select API.
Create a new API key for your project at: https://platform.openai.com/api-keys.
Make sure you have access to GPT-4o model at https://platform.openai.com/apps.
Make sure you have enough credits available at Settings > Usage (https://platform.openai.com/settings/organization/usage) and review the rate limits at Settings > Limits.
Store the generated API Key securely for future use.
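Optionally, before configuring VSS, you can verify that the key is valid and that your account can access GPT-4o. Below is a minimal sketch using the official openai Python package (an assumption; any OpenAI client works):
# verify_gpt4o_access.py -- hypothetical helper, not part of VSS
import os
from openai import OpenAI

# Reads the key stored in the previous step.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Raises an error (e.g. NotFoundError) if the account cannot access the model.
model = client.models.retrieve("gpt-4o")
print(f"GPT-4o is accessible: {model.id}")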
Override the configuration#
To use GPT-4o as the VLM model in VSS, see VLM Model to Use and modify the VLM_MODEL_TO_USE env variable in the .env file.
Obtain the OpenAI API Key as described in Obtain OpenAI API Key.
export VLM_MODEL_TO_USE=openai-compat
export OPENAI_API_KEY=<your-openai-api-key>
Configuring for Fine-tuned VILA 1.5 (LoRA)#
Note
Fine-tuning VILA 1.5 is no longer supported. Please move to NVILA 15B HighRes instead; see Fine-tuning NVILA model (LoRA) for details. The section below is for users who already have a LoRA fine-tuned checkpoint for VILA 1.5.
Custom fine-tuned Low-Rank Adaptation (LoRA) checkpoints for VILA 1.5 can be used with VSS and have demonstrated improved accuracy compared to the base VILA 1.5 model. Once you have a fine-tuned checkpoint, follow these steps to configure VSS to use it as the VLM:
Copy the LoRA checkpoint to a directory <LORA_CHECKPOINT_DIR>. The contents of the directory should be similar to:
$ ls <LORA_CHECKPOINT_DIR>
adapter_config.json  adapter_model.safetensors  config.json  non_lora_trainables.bin  trainer_state.json
Make the <LORA_CHECKPOINT_DIR> directory writable since VSS will generate the TensorRT-LLM weights for the LoRA.
chmod -R a+w <LORA_CHECKPOINT_DIR>
Set the following env variables in the .env file and make sure VILA 1.5 is used as the base model.
export VILA_LORA_PATH=<LORA_CHECKPOINT_DIR>
export VLM_MODEL_TO_USE=vila-1.5
export MODEL_PATH="ngc:nim/nvidia/vila-1.5-40b:vila-yi-34b-siglip-stage3_1003_video_v8"
export MODEL_ROOT_DIR=<MODEL_ROOT_DIR_ON_HOST>
Note
Make sure <LORA_CHECKPOINT_DIR> is a directory under <MODEL_ROOT_DIR_ON_HOST>. <MODEL_ROOT_DIR_ON_HOST> is a parent directory on the host machine for all the models.
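Optionally, you can confirm that the checkpoint directory has the expected contents and is writable before starting VSS. A minimal sketch (the expected file list mirrors the example above and may differ for your checkpoint):
# check_lora_dir.py -- hypothetical helper, not part of VSS
import os
import sys

lora_dir = "<LORA_CHECKPOINT_DIR>"  # replace with the actual path
expected_files = [
    "adapter_config.json",
    "adapter_model.safetensors",
    "config.json",
    "non_lora_trainables.bin",
    "trainer_state.json",
]

missing = [f for f in expected_files if not os.path.exists(os.path.join(lora_dir, f))]
if missing:
    sys.exit(f"Missing files in {lora_dir}: {missing}")
if not os.access(lora_dir, os.W_OK):
    sys.exit(f"{lora_dir} is not writable; run: chmod -R a+w {lora_dir}")
print("LoRA checkpoint directory looks ready.")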
Configuring for NVILA model#
To deploy VSS with the NVILA HighRes model, set the following env variables in the .env file:
export VLM_MODEL_TO_USE=nvila
export MODEL_PATH="ngc:nvidia/tao/nvila-highres:nvila-lite-15b-highres-lita"
Note
The NVILA model can also be loaded from the host using a mounted path. To do this, follow the instructions in Configuring for locally downloaded VILA 1.5 / NVILA checkpoint.
Fine-tuning NVILA model (LoRA)#
To fine-tune the NVILA HighRes model, download the fine-tuning microservice container and then follow the steps in the fine-tuning microservice notebook from NGC.
Fuse NVILA base model with a custom LoRA checkpoint#
Note
The fused NVILA model can also be loaded from the host using a mounted path. To do this, follow the instructions in Configuring for locally downloaded VILA 1.5 / NVILA checkpoint.
Download the NVILA base model and the LoRA checkpoint to the local machine.
cd $DIRECTORY_WITH_BASE_MODEL_AND_LORA_CHECKPOINT
Install VILA and dependencies.
sudo apt install libnccl2 libnccl-dev
git clone https://github.com/NVlabs/VILA.git
Follow the steps in the VILA Installation guide to install conda and set up the VILA environment.
Download the following Python script into the folder:
#To run:
#python3 run_nvila_fuse.py lora-llm-v1/ nvila-15b-lite-highres-v1/ fused/fused_with_lora_nvila_15b
import argparse
import sys
# Add the VILA directory to the Python path
sys.path.append('VILA')
# Import the llava package
import llava
def main(lora_checkpoint, model_base, output_path):
# Load the model using the provided arguments
model = llava.load(lora_checkpoint, model_base=model_base)
# Save the model to the specified output path
model.save_pretrained(output_path)
print(f"Model saved to {output_path}")
if __name__ == "__main__":
# Set up argument parser
parser = argparse.ArgumentParser(description="Load and save a llava model with specified LoRA checkpoints fused into the base model.")
parser.add_argument("lora_checkpoint", type=str, help="Path to the LoRA checkpoint.")
parser.add_argument("model_base", type=str, help="Path to the model base checkpoint.")
parser.add_argument("output_path", type=str, help="Path to save the merged model.")
# Parse arguments
args = parser.parse_args()
# Run the main function with parsed arguments
main(args.lora_checkpoint, args.model_base, args.output_path)
Run the Python script as follows:
#python3 run_nvila_fuse.py lora-checkpoint-folder-path/ nvila-15b-path/ output_directory
#Example:
python3 run_nvila_fuse.py lora-llm-v1/ nvila-15b-lite-highres-v1/ fused_with_lora_nvila_15b
On a successful run, the following logs will be printed:
Loading additional LLaVA weights...
Loading LoRA weights...
Merging LoRA weights...
Model is loaded...
saving llm to fused_with_lora_nvila_15b/llm
saving vision_tower to fused_with_lora_nvila_15b/vision_tower
saving mm_projector to fused_with_lora_nvila_15b/mm_projector
Model saved to fused_with_lora_nvila_15b
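Optionally, before deploying the fused checkpoint, you can sanity-check that it loads on its own. A minimal sketch, assuming llava.load() (the same call used in run_nvila_fuse.py above) accepts a fused checkpoint path without a separate model_base:
# check_fused_model.py -- hypothetical helper, run from the same folder as run_nvila_fuse.py
import sys

# Add the VILA directory to the Python path
sys.path.append('VILA')
import llava

# Load the fused checkpoint produced in the previous step.
model = llava.load("fused_with_lora_nvila_15b")
print("Fused NVILA checkpoint loaded successfully")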
Please refer to the section below to use the fused NVILA checkpoint with VSS.
Configuring for locally downloaded VILA 1.5 / NVILA checkpoint#
To deploy VSS with a locally downloaded VILA 1.5 / NVILA checkpoint, set the following env variables in the .env file:
export VLM_MODEL_TO_USE=vila-1.5 # or nvila
export MODEL_PATH=</path/to/local/vila-checkpoint>
export MODEL_ROOT_DIR=<MODEL_ROOT_DIR_ON_HOST>
The contents of the VILA checkpoint directory </path/to/local/vila-checkpoint> should be similar to:
$ ls </path/to/local/vila-checkpoint>
config.json  llm  mm_projector  trainer_state.json  vision_tower
Note
Make sure </path/to/local/vila-checkpoint> is a directory under <MODEL_ROOT_DIR_ON_HOST>. <MODEL_ROOT_DIR_ON_HOST> is a parent directory on the host machine for all the models.
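Optionally, you can verify the checkpoint layout and that it sits under <MODEL_ROOT_DIR_ON_HOST> before deploying. A minimal sketch (the expected entries mirror the listing above; paths are placeholders):
# check_local_checkpoint.py -- hypothetical helper, not part of VSS
import os
import sys

model_root = "<MODEL_ROOT_DIR_ON_HOST>"        # replace with the actual path
ckpt_dir = "</path/to/local/vila-checkpoint>"  # replace with the actual path

expected = ["config.json", "llm", "mm_projector", "trainer_state.json", "vision_tower"]
missing = [e for e in expected if not os.path.exists(os.path.join(ckpt_dir, e))]
if missing:
    sys.exit(f"Checkpoint at {ckpt_dir} is missing: {missing}")

# The checkpoint must live under the model root directory mounted into VSS.
root = os.path.realpath(model_root)
if os.path.commonpath([os.path.realpath(ckpt_dir), root]) != root:
    sys.exit(f"{ckpt_dir} must be a directory under {model_root}")
print("Local checkpoint directory looks ready.")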
OpenAI Compatible REST API#
If the VLM model provides an OpenAI compatible REST API, set the following env variables in the .env file:
export VLM_MODEL_TO_USE=openai-compat
export OPENAI_API_KEY=<openai-api-key> # Optional. Can be set if using OpenAI endpoint
export VIA_VLM_API_KEY=<> # Optional. Can be set if using a custom endpoint
export VIA_VLM_OPENAI_MODEL_DEPLOYMENT_NAME=<>
export VIA_VLM_ENDPOINT=<>
export OPENAI_API_VERSION=<> # Optional
For more details on the above environment variables, please refer to VLM Configuration.
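To confirm the endpoint and credentials work outside of VSS, you can send a single test request. A minimal sketch using the openai Python client (an assumption; any HTTP client works), with a placeholder image URL:
# test_vlm_endpoint.py -- hypothetical helper, not part of VSS
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ.get("VIA_VLM_ENDPOINT") or None,  # None falls back to the default OpenAI endpoint
    api_key=os.environ.get("VIA_VLM_API_KEY") or os.environ.get("OPENAI_API_KEY"),
)

response = client.chat.completions.create(
    model=os.environ.get("VIA_VLM_OPENAI_MODEL_DEPLOYMENT_NAME", "gpt-4o"),
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url", "image_url": {"url": "https://example.com/sample.jpg"}},  # placeholder
        ],
    }],
    max_tokens=64,
)
print(response.choices[0].message.content)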
Other Custom Models#
VSS allows you to drop your own models into the model directory by providing the pre-trained weights of the model and implementing an interface to bridge to the VSS pipeline.
The interface includes an inference.py file and a manifest.yaml. In inference.py, you must define a class named Inference with the following two methods:
def get_embeddings(self, tensor: torch.Tensor) -> torch.Tensor:
    # Generate video embeddings for the chunk / file.
    # Do not implement if explicit video embeddings are not supported by the model.
    return tensor

def generate(self, prompt: str, input: torch.Tensor, configs: Dict) -> str:
    # Generate a summary string from the input prompt and frame/embedding input.
    # configs contains VLM generation parameters like
    # max_new_tokens, seed, top_p, top_k, temperature.
    return summary
The optional get_embeddings method is used to generate embeddings for a given video clip wrapped in a TCHW tensor and must be removed if the model does not support the feature. The generate method is used to generate the text summary based on the given prompt and the video clip wrapped in the TCHW tensor. The generate method supports models that need to be executed locally on the system as well as models with REST APIs.
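For illustration, a minimal REST-backed inference.py might look like the sketch below. It assumes an OpenAI-compatible chat endpoint and hypothetical VLM_ENDPOINT / VLM_API_KEY variables; how your model consumes frames and what goes into manifest.yaml will differ, so treat this as a starting point rather than the fuyu8b or neva implementation.
# inference.py -- illustrative sketch only
import base64
import io
import os
from typing import Dict

import requests
import torch
from PIL import Image

# Hypothetical settings; not VSS-defined environment variables.
VLM_ENDPOINT = os.environ.get("VLM_ENDPOINT", "")
VLM_API_KEY = os.environ.get("VLM_API_KEY", "")


class Inference:
    # get_embeddings is omitted because this example model does not produce
    # explicit video embeddings.

    def generate(self, prompt: str, input: torch.Tensor, configs: Dict) -> str:
        # input is a TCHW tensor; encode the first frame as a base64 JPEG.
        frame = input[0]
        if frame.dtype != torch.uint8:
            frame = (frame.clamp(0, 1) * 255).to(torch.uint8)
        image = Image.fromarray(frame.permute(1, 2, 0).cpu().numpy())
        buffer = io.BytesIO()
        image.save(buffer, format="JPEG")
        b64_frame = base64.b64encode(buffer.getvalue()).decode()

        payload = {
            "model": "my-vlm",  # hypothetical deployment name
            "messages": [{
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/jpeg;base64,{b64_frame}"}},
                ],
            }],
            # Map VSS generation parameters onto the REST API.
            "max_tokens": configs.get("max_new_tokens", 512),
            "temperature": configs.get("temperature", 0.2),
            "top_p": configs.get("top_p", 0.7),
        }
        response = requests.post(
            f"{VLM_ENDPOINT}/chat/completions",
            headers={"Authorization": f"Bearer {VLM_API_KEY}"},
            json=payload,
            timeout=120,
        )
        response.raise_for_status()
        return response.json()["choices"][0]["message"]["content"]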
Some examples are available at NVIDIA-AI-Blueprints/video-search-and-summarization. Examples include the fuyu8b and neva models.
The VSS container image or the Blueprint Helm Chart may need to be modified to use custom VLMs.
Example:
For fuyu8b, you can export the following env variables. The fuyu8b model weights need to be downloaded first; refer to the Fuyu8b README for more details.
export VLM_MODEL_TO_USE=custom
export MODEL_PATH="</path/to/directory/with/inference.py>"
export MODEL_ROOT_DIR=<MODEL_ROOT_DIR_ON_HOST>
Once model weights are downloaded using the Fuyu8b README, the directory structure should look like:
ls /path/to/fuyu8b
inference.py fuyu8b model-00002-of-00002.safetensors skateboard.png
architecture.png generation_config.json model.safetensors.index.json special_tokens_map.json
bus.png added_tokens.json preprocessor_config.json tokenizer_config.json
chart.png manifest.yaml __pycache__ tokenizer.json
config.json model-00001-of-00002.safetensors README.md tokenizer.model
For neva, you can export the following env variables. For NVIDIA_API_KEY, refer to Using NIMs from build.nvidia.com.
export NVIDIA_API_KEY=<nvidia-api-key>
export VLM_MODEL_TO_USE=custom
export MODEL_PATH="</path/to/directory/with/inference.py>"
export MODEL_ROOT_DIR=<MODEL_ROOT_DIR_ON_HOST>
The directory structure for neva looks like:
ls /path/to/neva
inference.py manifest.yaml
Note
Make sure </path/to/directory/with/inference.py> is a directory under <MODEL_ROOT_DIR_ON_HOST>. <MODEL_ROOT_DIR_ON_HOST> is a parent directory on the host machine for all the models.