Model Details#

VSS uses the following models:

  • VLM Models

  • CA-RAG Models

VSS VLM Models#

VILA 1.5: VILA 1.5 is a Video Language Model (VLM) developed by NVIDIA. This model is deployed locally as part of the blueprint.

VILA 1.5 provides users with the following benefits over proprietary models:

  • Customizability: Fine-tunable model for improved accuracy on your specific use cases.

  • Data Privacy: Deploy on-premises where your data stays protected; it is not shared for inference or training.

  • Flexible deployment: Deploy anywhere and maintain control and scalability of your model.

  • Lower Latency: Deploy near the source of data for faster inference.

  • Lower Cost: Reduced cost of inference when compared to proprietary AI services.

This is the default model used in VSS deployment.

Note

You need an NGC account to access this model.

GPT-4o: VSS supports using OpenAI models such as GPT-4o as the VLM. GPT-4o is accessed as a remote endpoint.

To use the GPT-4o model in VSS, see Configuring for GPT-4o.

Custom VLM Models: VSS supports integrating custom VLM models; refer to OpenAI Compatible REST API. Depending on the implementation, the model can be deployed locally or used as a remote endpoint.
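As a minimal sketch of what an OpenAI-compatible integration involves, the snippet below builds a standard chat-completions payload with an image part. The model name, prompt, and image URL are illustrative placeholders, not VSS defaults; any endpoint implementing the OpenAI chat-completions schema can accept such a request.

```python
import json


def build_vlm_request(model: str, prompt: str, image_url: str) -> dict:
    """Build an OpenAI-compatible chat-completions payload for a custom VLM.

    The model name and image URL are placeholders for illustration; the
    payload shape follows the standard OpenAI chat-completions schema
    for multimodal (text + image) input.
    """
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": 256,
    }


# Example usage: serialize the payload for an HTTP POST to
# <your-endpoint>/v1/chat/completions (endpoint path is an assumption
# based on the OpenAI-compatible convention).
payload = build_vlm_request(
    "my-custom-vlm", "Describe this frame.", "https://example.com/frame.jpg"
)
body = json.dumps(payload)
```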

VSS CA-RAG Models#

LLaMA 3.1 70b Instruct: The LLaMA 3.1 70b Instruct NIM is used for Guardrails and by CA-RAG for summarization. This model is deployed locally as part of the blueprint.

NVIDIA Retrieval QA Llama3.2 1b v2 Embedding: The NVIDIA Retrieval QA Llama3.2 1b Embedding NIM is used as a text embedding model for text captions and queries. This model is deployed locally as part of the blueprint.

NVIDIA Retrieval QA Llama3.2 1b v2 Reranking: The NVIDIA Retrieval QA Llama3.2 1b Reranking NIM is used as a reranking model for Q&A. This model is deployed locally as part of the blueprint.

GPT-4o: The GPT-4o API is used for tool calling as part of GraphRAG for Q&A. GPT-4o is accessed as a remote endpoint.
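For context on what "tool calling" means here, the sketch below builds an OpenAI chat-completions payload that declares a function tool the model may invoke. The `retrieve_graph` tool name and its parameters are hypothetical illustrations, not the actual tool set VSS registers with GraphRAG; only the overall `tools` schema follows the OpenAI API.

```python
def build_tool_call_request(question: str) -> dict:
    # Standard OpenAI chat-completions payload with a function tool.
    # The tool definition here is a made-up example of the pattern:
    # the model can respond with a call to "retrieve_graph" instead of
    # (or before) answering in plain text.
    return {
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": question}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "retrieve_graph",  # hypothetical tool name
                    "description": "Fetch graph entities relevant to the question.",
                    "parameters": {
                        "type": "object",
                        "properties": {"entity": {"type": "string"}},
                        "required": ["entity"],
                    },
                },
            }
        ],
    }


# Example usage with a placeholder question.
req = build_tool_call_request("Which vehicles entered the loading dock?")
```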