Release Notes

These Release Notes describe the key features, software enhancements, and known issues for each release of the VSS product package.

VSS 3.1.0

These are the VSS 3.1.0 Release Notes. This is an early access release that includes a refactored architecture and new features. Some features are in an alpha state and should not be used in production.

Key Features and Enhancements

Updated the Search Agent Workflow to introduce attribute search, multi-embedding fusion search, and a critic agent to review search results.
Updated the Real-Time Computer Vision (RT-CV) microservice to support embedding generation for detected objects. This release supports two embedding models: RADIO-CLIP and SigLIP2.
Updated Brev Launchable deployment to support the 3.X architecture and deploy all of the agent workflows.
Added support for AGX Thor and DGX Spark with hybrid deployment (remote LLM) of the Base and Alerts profiles.
Added additional deployment options for undefined hardware profiles.
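
These notes do not say how the multi-embedding fusion search combines results. As a rough illustration only, the sketch below fuses per-model cosine similarities with a weighted sum, which is one common way to combine rankings from several embedding models. The function names, model keys, weights, and toy vectors are all hypothetical and are not part of the VSS API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fused_search(query, index, weights, top_k=3):
    """Rank items by a weighted sum of per-model cosine similarities.

    query:   {model_name: query_vector}
    index:   {item_id: {model_name: item_vector}}
    weights: {model_name: weight}, assumed here to sum to 1
    """
    scored = []
    for item_id, vectors in index.items():
        score = sum(w * cosine(query[m], vectors[m]) for m, w in weights.items())
        scored.append((score, item_id))
    # Highest fused score first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(item_id, round(score, 4)) for score, item_id in scored[:top_k]]


# Toy 2-D vectors; the model keys are labels only (hypothetical).
toy_index = {
    "obj_a": {"radio_clip": [1.0, 0.0], "siglip2": [1.0, 0.0]},
    "obj_b": {"radio_clip": [0.0, 1.0], "siglip2": [1.0, 0.0]},
    "obj_c": {"radio_clip": [0.0, 1.0], "siglip2": [0.0, 1.0]},
}
toy_query = {"radio_clip": [1.0, 0.0], "siglip2": [1.0, 0.0]}
print(fused_search(toy_query, toy_index, {"radio_clip": 0.5, "siglip2": 0.5}))
```

An item that matches the query under both models ranks above one that matches under only one of them, which is the basic motivation for fusing scores.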

VSS 3.0.0

These are the VSS 3.0.0 Release Notes. This is an early access release that includes a refactored architecture and new features. Some features are in an alpha state and should not be used in production.

Key Features and Enhancements

Updated the out-of-the-box experience, which now launches a minimal vision agent and lets developers add agent workflows using a combination of microservices. Agent workflows available in this release are:
- Report generation and Q&A: The agent can generate templated reports and answer questions using the VLM. This is part of the base agent profile in Quickstart.
- Video summarization: The agent can generate long video summaries with time-stamped highlights.
- Alert verification: Augment existing CV pipelines with VLMs to verify events and extract additional insights.
- Real-Time VLM alerts: Generate tail-end alerts using a VLM.
- Search: Open vocabulary search for actions and events. This is an alpha feature.
Introduced two industry-specific, large-scale blueprint examples for smart cities and warehouses.
Modularized the VSS architecture, introducing new microservices and APIs.
Introduced a top-level agent, capable of planning and executing vision-based workflows leveraging the new microservices.
Introduced Real-Time Video Intelligence (RTVI) microservices for accelerated feature extraction from stored and streamed video. Three microservices are included in this release:
- Real-Time VLM (RT-VLM): Generates captions and alerts for live streams using Vision Language Models.
- Real-Time Embedding (RT-Embedding): Generates embeddings for live streams and video files.
- Real-Time Computer Vision (RT-CV): Detects and tracks objects in live streams and video files.
Refactored video summarization workflow into a new microservice, Long Video Summarization (LVS).
Introduced Video IO and Storage (VIOS) microservices to manage video (stored and streamed), recording, and playback.
Introduced the Behavior Analytics microservice to set up heuristics for event creation based on computer vision metadata.
Introduced calibration microservices to calibrate the camera position and orientation for 3D and multi-view applications.
Integrated a new API Gateway / MCP (Model Context Protocol) server to route requests to the appropriate microservices.

VSS 2.4.1

These are the VSS 2.4.1 Release Notes.

Key Features and Enhancements

Support for the NVIDIA Cosmos-Reason2 VLM.
Support for Qwen3-VL models, including the Qwen3-VL-30B-A3B-Instruct and Qwen3-VL-8B-Instruct VLMs.
Support for GH200 and GB200 platforms.
Removed support for VILA-1.5 and NVILA models.

VSS 2.4.0

These are the VSS 2.4.0 Release Notes.

Key Features and Enhancements

Support for the NVIDIA Cosmos-Reason1 VLM.
Two new APIs:
- /generate_vlm_captions to generate VLM captions for a video without summarization.
- /reviewAlert to review an externally generated alert using VLM.
New reference deployment, Event Reviewer, to demonstrate review of an externally generated alert using a VLM.
VSS accuracy evaluation framework to evaluate accuracy on your own videos.
New parameters in the /summarize API:
- system_prompt - System prompt for the VLM.
New retrieval strategies for CA-RAG.
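
As a hedged illustration of the new system_prompt parameter, the sketch below builds (but does not send) a request for the /summarize API. Only the endpoint path and the system_prompt name come from these notes. The base URL, the POST method, the JSON body format, and the build_summarize_request helper are assumptions made for the example, and any other fields the API requires are left to the caller.

```python
import json
from urllib.parse import urljoin
from urllib.request import Request


def build_summarize_request(base_url, system_prompt, **other_fields):
    """Build a JSON request object for /summarize without sending it.

    Assumptions (not confirmed by the release notes): the endpoint accepts
    a POST with a JSON body, and system_prompt is a top-level field.
    """
    body = dict(other_fields)  # caller-supplied fields, passed through as-is
    body["system_prompt"] = system_prompt
    return Request(
        urljoin(base_url, "/summarize"),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host and prompt, for illustration only.
req = build_summarize_request(
    "http://localhost:8000",
    "You are a safety inspector. Describe notable events.",
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the resulting object to urllib.request.urlopen would send it. Check the API reference for the deployed version for the actual request schema before doing so.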

VSS 2.3.1

These are the VSS 2.3.1 Release Notes.

Key Features and Enhancements

Support for NVIDIA Blackwell B200 GPU
OneClick script support for GCP deployments
Performance improvements for file burst mode

VSS 2.3.0

These are the VSS 2.3.0 Release Notes.

Key Features and Enhancements

Support for Audio in Summarization and Q&A
Support for preprocessing a video to generate Set of Marks (SOM) prompting and additional CV metadata for better accuracy
Multi-stream support for Q&A
Gradio UI Improvements

Additional runtime parameters that can be configured through the /summarize API:
- summarize_top_p, summarize_temperature, summarize_max_tokens - LLM sampling parameters for summarization.
- chat_top_p, chat_temperature, chat_max_tokens - LLM sampling parameters for Q&A.
- notification_top_p, notification_temperature, notification_max_tokens - LLM sampling parameters for alerts/event detection.

New API /alerts/recent to get recent alerts for all live streams.
Stability improvements
Single GPU Deployment
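
The sampling parameters listed for the /summarize API follow a task-prefix pattern (summarize_, chat_, notification_). The sketch below shows one way a client might assemble them. The sampling_params helper and its validation ranges are assumptions based on typical LLM sampling conventions, not documented VSS limits.

```python
TASK_PREFIXES = ("summarize", "chat", "notification")


def sampling_params(task, top_p=None, temperature=None, max_tokens=None):
    """Return the prefixed sampling parameters for one task.

    The range checks are generic assumptions, not VSS-documented limits.
    """
    if task not in TASK_PREFIXES:
        raise ValueError(f"unknown task: {task!r}")
    if top_p is not None and not 0.0 < top_p <= 1.0:
        raise ValueError("top_p is expected to be in (0, 1]")
    if temperature is not None and temperature < 0.0:
        raise ValueError("temperature is expected to be non-negative")
    if max_tokens is not None and max_tokens <= 0:
        raise ValueError("max_tokens is expected to be positive")
    values = {"top_p": top_p, "temperature": temperature, "max_tokens": max_tokens}
    # Only include parameters the caller actually set.
    return {f"{task}_{name}": v for name, v in values.items() if v is not None}


# Merge per-task settings into one dictionary of request fields.
params = {
    **sampling_params("summarize", top_p=0.7, temperature=0.2, max_tokens=2048),
    **sampling_params("chat", temperature=0.4),
}
print(params)
```

Omitting unset values leaves the server-side defaults in place for anything the client does not override.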

VSS 2.2.0

These are the VSS 2.2.0 Release Notes. This is an Engineering Release that introduces some new features. It also includes several fixes for previous VSS releases and additional changes.

Key Features and Enhancements

Enhanced multi-stream / concurrent mode support
GraphRAG performance improvements.
Support for NVILA research model.

Additional runtime parameters that can be configured through the /summarize API:
- vlm_input_width, vlm_input_height - Configure the input resolution of the frames to the VLM.
- num_frames_per_chunk - Configure the number of frames to sample from each chunk.
- summarize_batch_size - LLM batch size for summarization.
- rag_top_k - Number of top rerank results to use during Q&A.
- rag_batch_size - Number of VLM captions to be batched together for creating the graph.
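
To make num_frames_per_chunk concrete, the sketch below picks evenly spaced timestamps inside one chunk. These notes do not describe the sampling strategy VSS actually uses, so even spacing, the sample_timestamps helper, and the chunk boundaries are assumptions for illustration.

```python
def sample_timestamps(chunk_start, chunk_end, num_frames_per_chunk):
    """Return evenly spaced timestamps (in seconds) within one chunk.

    Each timestamp sits at the centre of an equal-width sub-interval, so
    samples never land exactly on a chunk boundary. Even spacing is an
    assumption; the actual VSS sampling strategy may differ.
    """
    if num_frames_per_chunk <= 0:
        raise ValueError("num_frames_per_chunk must be positive")
    if chunk_end <= chunk_start:
        raise ValueError("chunk_end must be greater than chunk_start")
    step = (chunk_end - chunk_start) / num_frames_per_chunk
    return [chunk_start + step * (i + 0.5) for i in range(num_frames_per_chunk)]


# A hypothetical 10-second chunk sampled with num_frames_per_chunk=5.
print(sample_timestamps(0.0, 10.0, 5))
```

Raising num_frames_per_chunk gives the VLM more temporal coverage of each chunk at the cost of more input frames per request.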
