Release Notes#
These Release Notes describe the key features, software enhancements and improvements, and known issues for each VSS release.
VSS 3.2.0#
These are the VSS 3.2.0 Release Notes. This release hardens the 3.X architecture and documents the developer profiles.
Key Features and Enhancements#
Agent ready: Agent Skills are now available for agent-driven deployment and operations.
Updated the Video Summarization developer profile to integrate the current Video Summarization microservice and API flow.
Updated Video Summarization to use RTVI-VLM as the default VLM path for video understanding and summarization.
Added support for generating one report across multiple uploaded videos.
Added Helm chart support for the Video Summarization developer profile.
Clarified RTSP delete behavior for the Search profile: live stream registrations are removed while VST storage and indexed data remain until lifecycle cleanup.
Added optional audio-in-video support on the base agent profile: remote Nemotron 3 Nano Omni VLM with audio-enabled prompts and configurations.
Added MV3DT-based 3D multi-camera detection and tracking to the RTVI-CV vision microservice, and integrated it into a new Warehouse Blueprint application.
Early Access (EA) Features#
Early Access (EA) features are available for evaluation but are not recommended for production use.
Video Summarization (live streams): Stream caption generation, stream summaries, stream report generation, and stored-caption Q&A.
Search: Natural language search across video archives, including embed search, attribute search, fusion search, Search by Image (bounding box selection), and critic-agent verification. This release adds image-based search using selected bounding boxes on paused video frames, enables critic-agent support by default with a per-query Enable Critic toggle, updates Search profile GPU guidance for two base GPUs plus an optional local VLM GPU, and updates RTVI-CV and Search defaults to SigLIP v2 object embeddings with 1152 dimensions.
NemoClaw + VSS: The deploy_nemoclaw_vss.ipynb notebook for deploying and operating VSS from the NemoClaw / OpenClaw chat UI, bundled with the VSS OpenClaw plugin, VSS Agent Skills, a VSS policy preset, and the host-side VSS Orchestrator MCP server.
Known Issues and Limitations#
Video Summarization live-stream caption generation, stream summary, stream report generation, and Q&A based on captions in Elasticsearch are Early Access (EA) features.
Requests for an RTSP stream summary return empty results if no captions are available in Elasticsearch for the requested time period. This can happen if:
The requested time period is before caption generation started.
The caption generation prompt causes no captions to be stored in Elasticsearch for the requested time period.
If multiple agent sessions or agent instances connect to the same backend, either simultaneously or at different times, the caption generation prompt is overwritten by the latest query.
The agent has no visibility into caption prompts set by other agent instances.
The OpenClaw chat UI can display a duplicate assistant response for a single user query. The duplicate is a rendering issue only; the underlying agent processes the query once and the tool invocations and results are unaffected. Refresh the chat or ignore the duplicated bubble.
The OpenClaw UI can stop rendering agent responses after running for some time and appear stale, for example while periodic VSS deployment status polling continues in the background. Workaround: refresh the UI to see the latest responses.
Docker Engine 29.5.0 and later can fail to pull some NGC-hosted image tags after the image layers download, with an "error from registry: Incorrect Repository Format" message. Use a supported Docker Engine version earlier than 29.5.0. If you must use Docker Engine 29.5.0 or later, add or merge the following daemon-side override in /etc/docker/daemon.json and restart Docker:
{ "features": { "containerd-snapshotter": false } }
This disables the containerd snapshotter image store path for the Docker daemon and uses the legacy Docker graphdriver image store. Preserve any existing daemon settings, such as the required exec-opts cgroup driver configuration.
The agent with the audio-enabled configuration for the Nemotron 3 Nano Omni model is known to encounter repeated retries with videos that have missing or incompatible audio streams.
After deploying with the Brev Launchable, Brev can occasionally report "services unhealthy" even though Jupyter Notebook, VSS deployment, and VSS skills continue to function. This appears to be an intermittent Brev status issue and is harmless for VSS operations.
(Search profile, Brev 2-GPU environments): When using local VLM, the Search critic is automatically disabled with a warning to avoid invalid local VLM GPU assignment. As a result, critic-based verification is unavailable unless you use a remote VLM or a host with more than 2 GPUs.
[WARN] Brev environment has 2 GPU(s). Disabling Search critic to avoid starting the local VLM on GPU 2.
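The Docker Engine workaround in the list above asks you to merge the override into /etc/docker/daemon.json while preserving existing settings. The following is a minimal sketch of such a merge, not an official VSS tool. The helper name merge_daemon_config and the sample existing settings are assumptions for illustration; only the override itself comes from the note above. The sketch works on in-memory dictionaries and does not touch the real file or restart Docker.

```python
import copy
import json

# Override taken from the known issue above.
OVERRIDE = {"features": {"containerd-snapshotter": False}}


def merge_daemon_config(existing: dict, override: dict = OVERRIDE) -> dict:
    """Return a copy of `existing` with `override` merged in recursively.

    Hypothetical helper: nested dictionaries are merged key by key, so
    unrelated settings (for example exec-opts) are left unchanged.
    """
    merged = copy.deepcopy(existing)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_daemon_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Illustrative existing settings (assumed, not read from a real host).
existing = {
    "exec-opts": ["native.cgroupdriver=cgroupfs"],
    "features": {"buildkit": True},
}

merged = merge_daemon_config(existing)
print(json.dumps(merged, indent=2))
```

To apply the result on a host, you would load the real file with json.load, write the merged dictionary back with json.dump, and then restart Docker as described above.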
VSS 3.1.0#
These are the VSS 3.1.0 Release Notes. This is an early access release that includes a refactored architecture and new features. Some features are in an alpha state and should not be used in production.
Key Features and Enhancements#
Updated the Search Agent Workflow to introduce attribute search, multi-embedding fusion search, and a critic agent to review search results.
Updated the Real-Time Computer Vision (RT-CV) microservice to support embedding generation for detected objects. This release supports two embedding models: RADIO-CLIP and SigLIP2.
Updated Brev Launchable deployment to support the 3.X architecture and deploy all of the agent workflows.
Added support for AGX Thor and DGX Spark with hybrid deployment (remote LLM) of the Base and Alerts profiles.
Added additional deployment options for undefined hardware profiles.
VSS 3.0.0#
These are the VSS 3.0.0 Release Notes. This is an early access release that includes a refactored architecture and new features. Some features are in an alpha state and should not be used in production.
Key Features and Enhancements#
Updated the out-of-the-box experience, which now launches a minimal vision agent and lets developers add agent workflows using a combination of microservices. Agent workflows available in this release are:
Report generation and Q&A: The agent can generate templated reports and answer questions using the VLM. This is part of the base agent profile in Quickstart.
Video summarization: The agent can generate long video summaries with time-stamped highlights.
Alert verification: Augment existing CV pipelines with VLMs to verify events and extract additional insights.
Real-Time VLM alerts: Generate tail-end alerts using VLM.
Search: Open vocabulary search for actions and events. This is an alpha feature.
Introduced two industry-specific, large-scale blueprint examples for smart cities and warehouses.
Modularized the VSS architecture, introducing new microservices and APIs.
Introduced a top-level agent, capable of planning and executing vision-based workflows leveraging the new microservices.
Introduced Real-Time Video Intelligence (RTVI) microservices for accelerated feature extraction from stored and streamed video. Three microservices are included in this release:
Real-Time VLM (RT-VLM): Generates captions and alerts for live streams using Vision Language Models.
Real-Time Embedding (RT-Embedding): Generates embeddings for live streams and video files.
Real-Time Computer Vision (RT-CV): Detects and tracks objects in live streams and video files.
Refactored video summarization workflow into the Video Summarization microservice.
Introduced Video IO and Storage (VIOS) microservices to manage video (stored and streamed), recording, and playback.
Introduced the Behavior Analytics microservice to set up heuristics for event creation based on computer vision metadata.
Introduced calibration microservices to calibrate the camera position and orientation for 3D and multi-view applications.
Integrated a new API Gateway / MCP (Model Context Protocol) server to route requests to the appropriate microservices.
VSS 2.4.1#
These are the VSS 2.4.1 Release Notes.
Key Features and Enhancements#
Support for NVIDIA Cosmos-Reason2 VLM
Support for Qwen3-VL models including Qwen3-VL-30B-A3B-Instruct and Qwen3-VL-8B-Instruct VLM
Support for GH200 and GB200 platforms.
Removed support for VILA-1.5 and NVILA models.
VSS 2.4.0#
These are the VSS 2.4.0 Release Notes.
Key Features and Enhancements#
Support for NVIDIA Cosmos-Reason1 VLM
Two new APIs:
/generate_vlm_captions to generate VLM captions for a video without summarization.
/reviewAlert to review an externally generated alert using VLM.
New reference deployment, Event Reviewer, to demonstrate review of an externally generated alert using a VLM.
VSS accuracy evaluation framework to evaluate accuracy on your own videos.
New parameters in the /summarize API:
system_prompt - System prompt for the VLM.
New retrieval strategies for CA-RAG.
VSS 2.3.1#
These are the VSS 2.3.1 Release Notes.
Key Features and Enhancements#
Support for NVIDIA Blackwell B200 GPU
OneClick script support for GCP deployments
Performance improvements for file burst mode
VSS 2.3.0#
These are the VSS 2.3.0 Release Notes.
Key Features and Enhancements#
Support for Audio in Summarization and Q&A
Support for preprocessing a video to generate Set of Marks (SOM) prompting and additional CV metadata for better accuracy
Multi-stream support for Q&A
Gradio UI Improvements
Additional runtime parameters that can be configured through the /summarize API:
summarize_top_p, summarize_temperature, summarize_max_tokens
LLM Sampling parameters for summarization.
chat_top_p, chat_temperature, chat_max_tokens
LLM Sampling parameters for Q&A
notification_top_p, notification_temperature, notification_max_tokens
LLM Sampling parameters for alerts/event detection.
New API /alerts/recent to get recent alerts for all live streams.
Stability improvements
Single GPU Deployment
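The /summarize sampling parameters listed above follow one naming pattern: a summarize, chat, or notification prefix combined with top_p, temperature, and max_tokens. The sketch below builds a request fragment with those nine names. The helper sampling_params, the range checks, and all numeric values are assumptions for illustration. The release notes do not state defaults or valid ranges, and the rest of the /summarize request body is not shown here.

```python
import json


def sampling_params(prefix: str, top_p: float, temperature: float, max_tokens: int) -> dict:
    """Build the three sampling fields for one prefix (hypothetical helper).

    The range checks follow common LLM sampling conventions and are an
    assumption; they are not taken from the VSS API reference.
    """
    if not 0.0 <= top_p <= 1.0:
        raise ValueError(f"{prefix}_top_p must be between 0 and 1")
    if temperature < 0.0:
        raise ValueError(f"{prefix}_temperature must be non-negative")
    if max_tokens <= 0:
        raise ValueError(f"{prefix}_max_tokens must be positive")
    return {
        f"{prefix}_top_p": top_p,
        f"{prefix}_temperature": temperature,
        f"{prefix}_max_tokens": max_tokens,
    }


# Illustrative values only.
payload = {}
payload.update(sampling_params("summarize", top_p=0.7, temperature=0.2, max_tokens=2048))
payload.update(sampling_params("chat", top_p=0.7, temperature=0.2, max_tokens=512))
payload.update(sampling_params("notification", top_p=0.7, temperature=0.2, max_tokens=256))

print(json.dumps(payload, indent=2))
```

Keeping the three groups separate lets summarization, Q&A, and alert generation use different output lengths and sampling settings within the same request.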
VSS 2.2.0#
These are the VSS 2.2.0 Release Notes. This is an Engineering Release that introduces some of the new features. It also includes several fixes from previous VSS releases and additional changes.
Key Features and Enhancements#
Enhanced multi-stream / concurrent mode support
GraphRAG performance improvements.
Support for NVILA research model.
Additional runtime parameters that can be configured through the /summarize API:
vlm_input_width, vlm_input_height
Configure the input resolution of the frames to the VLM
num_frames_per_chunk
Configure the number of frames to sample from each chunk
summarize_batch_size
LLM Batch Size for summarization.
rag_top_k
Number of top rerank results to use during Q&A
rag_batch_size
Number of VLM captions to be batched together for creating graph
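The 2.2.0 /summarize parameters above can be grouped into one typed structure before they are added to a request. This is a minimal sketch: the class name SummarizeTuning, the positive-integer check, and the sample values are assumptions for illustration, while the six field names come from the list above.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SummarizeTuning:
    """Hypothetical container for the 2.2.0 /summarize runtime parameters."""

    vlm_input_width: int       # input frame width for the VLM
    vlm_input_height: int      # input frame height for the VLM
    num_frames_per_chunk: int  # frames sampled from each chunk
    summarize_batch_size: int  # LLM batch size for summarization
    rag_top_k: int             # top rerank results used during Q&A
    rag_batch_size: int        # VLM captions batched together for graph creation

    def __post_init__(self) -> None:
        # Assumption: every field must be a positive integer.
        for name, value in asdict(self).items():
            if not isinstance(value, int) or value <= 0:
                raise ValueError(f"{name} must be a positive integer, got {value!r}")


# Illustrative values only; they are not documented defaults.
tuning = SummarizeTuning(
    vlm_input_width=448,
    vlm_input_height=448,
    num_frames_per_chunk=8,
    summarize_batch_size=5,
    rag_top_k=5,
    rag_batch_size=5,
)

fields = asdict(tuning)
print(fields)
```

The resulting dictionary can be merged into the body of a /summarize request alongside the other request fields.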