Release Notes#
These Release Notes describe the key features, software enhancements and improvements, and known issues for the VSS release product package.
VSS 3.2.0#
These are the VSS 3.2.0 Release Notes. This is the first 3.X General Availability (GA) release; it hardens the modular architecture and documents the developer profiles.
Key Features and Enhancements#
Source code released for all microservices and agent workflows on GitHub.
Agent Skills (Early Access) are now available for agent-driven deployment and operations with support for multiple agent harnesses. Skills are provided for:
Profile deployment
Agent workflow use-cases such as video summarization, search, and report generation
Deployment and management of RTVI, Alerts, and Behavior Analytics microservices
Camera and stream management
Added Helm chart support for all of the agent workflows.
Added optional audio-in-video support on the base agent profile: remote Nemotron 3 Nano Omni VLM with audio-enabled prompts and configurations.
Added a Real-time Computer Vision 3D (RT-CV-3D) microservice with support for MV3DT-based 3D multi-camera detection and tracking, and integrated it into a new Warehouse Blueprint application.
Added the Automatic Calibration microservice for automated camera position and orientation calibration.
NemoClaw + VSS (Early Access): The deploy_nemoclaw_vss.ipynb notebook for deploying and operating VSS from the NemoClaw / OpenClaw chat UI, bundled with the VSS OpenClaw plugin, VSS Agent Skills, a VSS policy preset, and the host-side VSS Orchestrator MCP server.
Agent Workflows#
Key Features and Enhancements
Updated the Video Summarization developer profile to integrate the current Video Summarization microservice and API flow (renamed from Long Video Summarization microservice).
Updated Video Summarization to use RTVI-VLM as the default VLM path for video understanding and summarization.
Added support for generating one report across multiple uploaded videos.
Early Access (EA) Feature: Video Summarization live-stream caption generation, stream summary, stream report generation, and Q&A based on captions in Elasticsearch.
Known Issues and Limitations
Requests for an RTSP stream summary return empty results if no captions are available in Elasticsearch for the requested time period. This can happen if:
The requested time period is before caption generation started.
The caption generation prompt causes no captions to be stored in Elasticsearch for the requested time period.
If multiple agent sessions or agent instances connect to the same backend, either simultaneously or at different times, the caption generation prompt is overwritten by the latest query.
The agent has no visibility into caption prompts set by other agent instances.
Key Features and Enhancements
Expanded Base profile model and deployment options around the existing default pair (Nemotron Nano 9B v2 + Cosmos Reason2 8B), with documented local, shared-GPU, and remote endpoint layouts.
Added optional audio-enabled Base workflow support with remote Nemotron Nano Omni VLM and audio-aware prompts/configuration.
Clarified Base profile service scope and dependencies, including VIOS + Agent + NIM deployment boundaries and profile-specific service activation behavior.
Added and documented Base workflow Agent Skills for deployment, video I/O management, visual Q&A, and report generation from coding-agent environments.
Added Helm chart support for Base workflow deployment to match Compose-based profile behavior.
Removed legacy SDR/Envoy routing from Base profile deployment and moved to direct Stream Processing endpoint wiring for a simpler default stack.
Added and hardened chunked video-upload handshake flow in the Base agent API for large-file ingest and completion handling.
Improved browser-reachable clip/media URL handling for Brev and ingress-based deployments by rewriting report and playback links to the public host/port.
Known Issues and Limitations
The cosmos-reason2-8b NIM can fail to recover after a stop or crash; redeploy the stack when this occurs.
Reports are held in memory by default and are lost on container restart unless persistent storage is configured.
Default Base shared-GPU layout may require manual memory-fraction tuning on constrained GPUs to avoid inference instability.
On single-GPU 48 GB hosts (for example L40S), the default FP16 LLM+VLM shared layout can exceed practical memory headroom and may require dedicated-GPU or remote endpoint configuration.
Key Features and Enhancements
Added Search by Image for finding visually similar objects from selected bounding boxes on paused video frames or referenced object IDs.
Added follow-up support for Search results in Vision Agent workflow, including questions about returned clips and refined searches that adjust the prior query, time range, source type, result count, or critic setting.
Added support for up to 100 streams in Search deployments by tuning the VST max_devices_supported setting for the target device memory.
Enabled the critic agent by default for agent/chat search. The critic uses a VLM to verify search results, can be turned on or off per query with the Enable Critic toggle, and evaluates the top results according to num_videos_to_evaluate.
Updated Search profile GPU and model defaults, including local and remote LLM/VLM layouts, the two-GPU base requirement for RTVI-CV, VIOS Stream Processing, and RTVI-Embed, LLM defaults, the optional local VLM GPU for critic verification, Cosmos-Embed1-448p-anomaly-detection for RTVI-Embed, and SigLIP2 object embeddings for RTVI-CV.
Added the vss-search-archive Agent Skill to run top-level VSS fusion search on archived video, ingest video files or RTSP streams for search, and delete search-ingested sources from a coding agent.
Known Issues and Limitations
Search embeddings for uploaded videos and streams remain searchable until the Elasticsearch index lifecycle minimum age is reached. The default minimum index age is 48 hours; after cleanup, the media remains visible in video management but must be uploaded or added again before it is searchable.
Adding an RTSP stream starts embedding generation asynchronously. A successful add response confirms the request was accepted, but search results appear only after enough chunks have been indexed in Elasticsearch.
Search quality can vary for broad or ambiguous queries. Negative-intent queries can match positive-intent results, false positives can appear, and single-word queries such as person can return no results.
Key Features and Enhancements
The Alerts microservice now supports post-alert processing for several use cases:
Verification: Provide a yes/no response on whether an event took place.
Contextualization: Provide additional information about an event that took place.
Classification: Classify the event into one of two or more categories.
The Alerts microservice supports RTVI-VLM-based alert generation for live videos on the new real-time API interface.
Support for remote managed model endpoints (for example, OpenAI) and RTVI-VLM.
“Always-on” alerts functionality that automatically starts alert processing for added streams.
Custom VLM response processing.
Various performance improvements, including non-blocking VLM execution and multi-threaded request processing for increased throughput.
Agent skills for alerts that support natural-language alert creation, query, management, and notification.
New API Capabilities for Alerts Microservice
Real-time alert rules: Create, list, retrieve, and delete real-time VLM alert rules registered with RTVI VLM via POST /api/v1/realtime, GET /api/v1/realtime, GET /api/v1/realtime/{alert_rule_id}, and DELETE /api/v1/realtime/{alert_rule_id}.
Always-on alerts: Start or stop always-on alert rules for an incoming camera event via POST /api/v1/realtime/always-on.
Rule replay: Re-apply all persisted active or failed rules onto RTVI VLM via POST /api/v1/realtime/replay.
Incident retrieval: List generated incidents from Elasticsearch via GET /api/v1/realtime/incidents.
On-demand verification: Verify an alert on demand via POST /api/v1/verification/ondemand.
Alert-type verification config: List all configs and create a new config via GET /api/v1/verification/config and POST /api/v1/verification/config; get, update, and delete the config for a specific alert type via GET /api/v1/verification/config/{alert_type}, PUT /api/v1/verification/config/{alert_type}, and DELETE /api/v1/verification/config/{alert_type}.
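As a rough illustration of how the real-time rule endpoints listed above fit together, the following Python sketch maps each documented operation to its HTTP method and URL. The base URL http://localhost:8000 and the helper name realtime_rule_request are assumptions made for this example; request bodies are left out because these notes do not describe their schema.

```python
from urllib.parse import quote, urljoin

# Hypothetical base URL; substitute the address of your Alerts microservice.
BASE_URL = "http://localhost:8000"

# Method and path template for each operation named in these release notes.
REALTIME_ROUTES = {
    "create": ("POST", "/api/v1/realtime"),
    "list": ("GET", "/api/v1/realtime"),
    "get": ("GET", "/api/v1/realtime/{alert_rule_id}"),
    "delete": ("DELETE", "/api/v1/realtime/{alert_rule_id}"),
    "always_on": ("POST", "/api/v1/realtime/always-on"),
    "replay": ("POST", "/api/v1/realtime/replay"),
    "incidents": ("GET", "/api/v1/realtime/incidents"),
}


def realtime_rule_request(operation, alert_rule_id=None, base_url=BASE_URL):
    """Return (method, url) for one documented real-time alert operation."""
    method, template = REALTIME_ROUTES[operation]
    if "{alert_rule_id}" in template:
        if not alert_rule_id:
            raise ValueError(f"{operation!r} requires an alert_rule_id")
        # Percent-encode the ID so it is safe to embed in the path.
        template = template.replace(
            "{alert_rule_id}", quote(alert_rule_id, safe="")
        )
    return method, urljoin(base_url, template)
```

The returned pair can be passed to any HTTP client; keeping the route table in one place makes it easier to track future endpoint changes.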
New Environment Variables
RTVI_VLM_MODEL_TO_USE: Selects the VLM model served by the RTVI-VLM backend used for alert verification and real-time alerts (for example, vllm-compatible).
RTVI_VLM_BASE_URL: Base URL of the RTVI-VLM microservice (OpenAI-compatible VLM endpoint) that the Alerts microservice calls for inference.
CONFIG_PATH: Path to the service configuration file (config.yml).
ALWAYS_ON_RULES_CONFIG: Explicit path to the always-on (real-time) alert rules YAML file; takes precedence over the default realtime-config.yaml lookup.
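The precedence rule for ALWAYS_ON_RULES_CONFIG can be pictured with a small Python sketch. This is not the service's actual lookup code: the function name and the default_dir parameter are hypothetical, and the directory searched for the default realtime-config.yaml is an assumption supplied by the caller.

```python
import os
from pathlib import PurePosixPath

DEFAULT_RULES_FILENAME = "realtime-config.yaml"


def resolve_always_on_rules_path(env=None, default_dir="."):
    """Pick the always-on rules file, preferring the explicit override."""
    env = os.environ if env is None else env
    explicit = env.get("ALWAYS_ON_RULES_CONFIG", "").strip()
    if explicit:
        # The explicit variable wins over the default lookup.
        return explicit
    # Assumption: the default file is looked up in a caller-supplied directory.
    return str(PurePosixPath(default_dir) / DEFAULT_RULES_FILENAME)
```

Passing a plain dictionary as env makes the precedence easy to check without touching the real process environment.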
Image Tag
Docker Compose x86 default: nvcr.io/nvidia/vss-core/vss-alert-verification:3.2.0.
Helm chart default: nvcr.io/nvidia/vss-core/vss-alert-verification:3.2.0. Set image.tag to the required release tag for your deployment package.
Key Features and Enhancements
Added MV3DT and Auto Calibration profiles.
Launchable and Skills:
Warehouse deploy launchable notebook.
Warehouse deploy and debug skills.
Behavior Analytics standalone deploy skill.
Video Analytics API standalone deploy skill.
Added RTVI-VLM always-on alerts for Load Quality, PPE, Spillover, and Pathway / Unexpected Obstructions; RTVI-VLM replaces the local VLM path.
Added Auto Calibration support for 3D profile deployment, image coordinates, and RTSP inputs.
Behavior Analytics: Added support for dynamic configuration changes.
Added dynamic configuration support in Video Analytics API to store and audit configuration changes of Behavior Analytics.
Added Spatial AI Data Utils support for assigning region and group metadata to global ROIs and tripwires, 3D IoU-based tracking evaluation, data validation and evaluation for 3D inference data, and visualization tools.
Added VSS Configurator support for toggling Kafka/Redis configurations and editing JSON configuration files.
Added SDR stream auto re-addition after Docker restart for perception services.
Refactored Docker Compose to improve microservice modularity.
Added retry mechanisms for Kafka broker, Elasticsearch, and related service connections in Video Analytics API.
Added retry mechanisms for Kafka broker, Elasticsearch, Redis, MQTT, and related service connections in Behavior Analytics.
Improved Spatial AI Data Utils camera grouping algorithms.
Added backup and cleanup support for calibration.json and schema validation after VSS Configurator operations.
Added JSON output support for VLM verification and improved Near Miss Violation verification accuracy through prompt tuning.
Added top-view image visualization on Thor and Spark platforms, and improved 3D bounding-box visualization by using individual camera timestamps when present.
Models - Core AI models for perception and analytics:
RT-DETR - 2D Warehouse Model v1.0.2 - Model for 2D Single-Camera Object Detection.
Sparse4D - 3D Warehouse Model v2.2 - Model for 3D Multi-Camera Object Detection and Tracking.
Known Limitations
See Warehouse Known Limitations for current warehouse profile constraints and workarounds.
System Components#
Breaking Changes
Endpoint rename: /v1/generate_captions_alerts has been renamed to /v1/generate_captions. Update any 3.1.0-era client code accordingly.
Duplicate live-stream / camera IDs are now rejected with HTTP 409 (DuplicateCameraId / DuplicateStreamId) instead of being silently overwritten. Remove the prior stream first, or surface the 409 to the caller.
Repeated RTSP caption requests now start independent jobs. Repeated /v1/generate_captions calls for the same RTSP stream ID no longer reconnect to the existing live caption request. Each call returns a distinct request ID. DELETE /v1/generate_captions/{stream_id} or deleting the stream stops all active caption jobs for that stream ID.
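For client code written against 3.1.0, the two changes that most often need attention are the endpoint rename and the new HTTP 409 behavior. The sketch below shows one way a client might handle both; the helper and exception names are hypothetical, and the assumption that sub-paths under the old endpoint map one-to-one onto the new one is not confirmed by these notes.

```python
OLD_CAPTIONS_PATH = "/v1/generate_captions_alerts"
NEW_CAPTIONS_PATH = "/v1/generate_captions"


def migrate_captions_path(path):
    """Rewrite a 3.1.0-era captions path to its 3.2.0 name."""
    if path == OLD_CAPTIONS_PATH or path.startswith(OLD_CAPTIONS_PATH + "/"):
        # Assumption: any sub-path keeps its shape under the new name.
        return NEW_CAPTIONS_PATH + path[len(OLD_CAPTIONS_PATH):]
    return path


class DuplicateStreamError(Exception):
    """Raised when the service rejects a duplicate camera or stream ID."""


def check_add_stream_status(status_code, stream_id):
    """Surface HTTP 409 to the caller instead of assuming an overwrite."""
    if status_code == 409:
        raise DuplicateStreamError(
            f"stream {stream_id!r} already exists; remove it before re-adding"
        )
    return status_code
```

A caller that previously relied on silent overwrites would catch DuplicateStreamError, remove the prior stream, and retry the add.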
New Models
Cosmos Reason 3 reasoning VLMs. Two checkpoints are supported: Cosmos3-Nano-Reasoner and Cosmos3-Super-Reasoner. Set VLM_MODEL_TO_USE=cosmos-reason3 and MODEL_PATH=git:<HF URL>.
Nemotron Nano Omni with native video + audio understanding. Enable with VLM_TRUST_REMOTE_CODE=true and (for audio) VLM_MODEL_SUPPORTS_AUDIO=true. Use INSTALL_PROPRIETARY_CODECS=true when inputs contain proprietary audio codecs.
Qwen 3.5 and Qwen 3.5 MoE VLMs. The service auto-detects Qwen3_5ForConditionalGeneration / Qwen3_5MoeForConditionalGeneration and runs them against the bundled vLLM runtime. The MoE variant uses the triton MoE backend (auto-selected on B200, configurable via VLLM_MOE_BACKEND).
Default model is now cosmos-reason2 (ngc:nim/nvidia/cosmos-reason2-8b:0303-fp8-dynamic-kv8).
New API Capabilities
URL-based processing on /v1/generate_captions and /v1/files: http://, https://, s3://, file:// (with allowlist via FILE_URL_ALLOWED_DIRS). New request fields url, media_type, creation_time, url_headers; new response field chunk_id (zero-based).
Native audio for Omni models via enable_audio: true; audio_transcript is returned per chunk.
Reasoning via enable_reasoning: true; reasoning_description is returned per chunk.
Reasoning propagation to Kafka/protobuf: parsed VLM reasoning is added to VisionLLM.info and Incident.info as both reasoning and reasoningDescription.
Alert categorization via the alert_category field on incidents.
CV-compatible stream API: POST /v1/stream/add, POST /v1/stream/remove, GET /v1/stream/get-stream-info alongside the existing /v1/streams/* endpoints.
Text-only chat completions via /v1/chat/completions (no media required); multi-turn conversations and token-level SSE streaming.
POST /v1/files accepts an HTTP/S URL for server-side fetch.
Proprietary audio codec support for Omni models via INSTALL_PROPRIETARY_CODECS=true.
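To make the new request fields concrete, here is a minimal Python sketch that assembles a URL-based request body and rejects schemes outside the four listed above. Only the field names come from these notes; the helper name, the example values, and the assumption that these fields can sit together in one JSON body are illustrative, and a real request may need further fields that are not listed here.

```python
from urllib.parse import urlparse

# Schemes named in the release notes for URL-based processing.
ALLOWED_SCHEMES = {"http", "https", "s3", "file"}


def build_captions_request(url, media_type, *, creation_time=None,
                           url_headers=None, enable_audio=False,
                           enable_reasoning=False):
    """Assemble a request body from the fields introduced in 3.2.0."""
    scheme = urlparse(url).scheme
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported URL scheme: {scheme!r}")
    payload = {"url": url, "media_type": media_type}
    # Optional fields are only included when the caller sets them.
    if creation_time is not None:
        payload["creation_time"] = creation_time
    if url_headers:
        payload["url_headers"] = dict(url_headers)
    if enable_audio:
        payload["enable_audio"] = True
    if enable_reasoning:
        payload["enable_reasoning"] = True
    return payload
```

Validating the scheme on the client side gives an early, readable error before the request reaches the service.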
Optimizations
Efficient Video Sampling (EVS) via VLM_VIDEO_PRUNING_RATE. Supported on Nemotron Nano VL and Qwen 2.5 VL (not Cosmos Reason).
GOP-aware decode optimization via RTVI_ENABLE_GOP_DECODE_OPT (default true; file decode only).
New vLLM tuning knobs: VLLM_MM_PROCESSOR_CACHE_GB, VLLM_ENFORCE_EAGER, VLLM_MAX_NUM_BATCHED_TOKENS, VLLM_DISABLE_MM_PREPROCESSOR_CACHE.
New Environment Variables
VLM_MAX_GENERATION_TOKENS: Cap on tokens generated per request (default 16384).
VLM_PROMPT_MAX_LENGTH: Maximum user-prompt length in characters (default 10240).
VLM_SYSTEM_PROMPT_MAX_LENGTH: Maximum system-prompt length in characters (default 10240).
VLM_MODEL_SUPPORTS_AUDIO: Enable native audio decoding for Omni models (default false).
VLM_TRUST_REMOTE_CODE: Enable trust_remote_code in vLLM and AutoProcessor. Required for Omni and some custom models (default false).
VLM_VIDEO_PRUNING_RATE: EVS video-token pruning rate (0.0–1.0). Supported on Nemotron Nano VL and Qwen 2.5 VL; not on Cosmos Reason.
VLLM_MOE_BACKEND: Override vLLM MoE backend (e.g., triton). Auto-defaults to triton on B200.
VLLM_MM_PROCESSOR_CACHE_GB: vLLM multimodal preprocessor cache size in GB (default 1).
VLLM_ENFORCE_EAGER: Force eager execution in vLLM (A100/Ampere CR2 FP8 workaround) (default false).
VLLM_MAX_NUM_BATCHED_TOKENS: Cap on the prefill batch token size to improve decode throughput (default 16384).
RTVI_ENABLE_GOP_DECODE_OPT: GOP-aware decode optimization for file decode (default true; no effect on RTSP).
KAFKA_ASYNC_SEND_QUEUE_MAXSIZE: Maximum queued in-flight async Kafka sends (default 1024).
FILE_URL_ALLOWED_DIRS: Comma-separated absolute directories allowlisted for file:// URL ingestion. file:// is rejected when unset.
ASSET_DOWNLOAD_SSL_SKIP_VERIFY_DOMAINS: Domains for which SSL verification is skipped on URL download.
ASSET_DOWNLOAD_MAX_REDIRECTS: Maximum redirect hops on URL download (default 0 disables; max 10).
ASSET_DOWNLOAD_MAX_FILE_SIZE_GB: Maximum size (GB) of files fetched via URL (default 8).
ASSET_DOWNLOAD_AUTH_TOKENS: Per-domain auth headers (domain1=Bearer xxx;domain2=Basic yyy).
ASSET_MAX_AGE_HOURS: Auto-evict uploaded assets older than this many hours (default 0 disables).
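Several of these variables have a documented format or range that can be checked before deployment. The Python sketch below parses the ASSET_DOWNLOAD_AUTH_TOKENS format shown above and range-checks VLM_VIDEO_PRUNING_RATE and ASSET_DOWNLOAD_MAX_REDIRECTS. It is a pre-flight check under stated assumptions, not the service's own parser: the function names are hypothetical, and treating both range bounds as inclusive is an assumption.

```python
def parse_auth_tokens(raw):
    """Parse 'domain1=Bearer xxx;domain2=Basic yyy' into {domain: header}."""
    tokens = {}
    for entry in raw.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the first '=' only, so base64 padding in a header survives.
        domain, sep, header = entry.partition("=")
        if not sep or not domain.strip() or not header.strip():
            raise ValueError(f"malformed auth entry: {entry!r}")
        tokens[domain.strip()] = header.strip()
    return tokens


def validate_pruning_rate(value):
    """Check that VLM_VIDEO_PRUNING_RATE lies in the documented 0.0-1.0 range."""
    rate = float(value)
    if not 0.0 <= rate <= 1.0:
        raise ValueError(f"pruning rate out of range: {rate}")
    return rate


def validate_max_redirects(value):
    """Check ASSET_DOWNLOAD_MAX_REDIRECTS: 0 disables redirects, 10 is the cap."""
    hops = int(value)
    if not 0 <= hops <= 10:
        raise ValueError(f"redirect limit out of range: {hops}")
    return hops
```

Running these checks in a deployment script catches a mistyped value before the container starts.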
Image Tag
Docker Compose x86 default: nvcr.io/nvidia/vss-core/vss-rt-vlm:3.2.0.
Docker Compose DGX Spark/SBSA image: nvcr.io/nvidia/vss-core/vss-rt-vlm:3.2.0-sbsa.
Helm chart default: nvcr.io/nvidia/vss-core/vss-rt-vlm:3.2.0. Set image.tag to the required release tag for your deployment package.
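A deployment script that targets both x86 and DGX Spark / SBSA hosts might choose between the two tags above by host architecture. The following Python sketch is one possible approach; the function name is hypothetical, and treating every ARM64 host as an SBSA target is an assumption that may not hold for other ARM platforms.

```python
import platform

REPOSITORY = "nvcr.io/nvidia/vss-core/vss-rt-vlm"
RELEASE = "3.2.0"


def rt_vlm_image(machine=None):
    """Return the RT-VLM image reference for the given machine type."""
    machine = (machine or platform.machine()).lower()
    if machine in {"x86_64", "amd64"}:
        return f"{REPOSITORY}:{RELEASE}"
    if machine in {"aarch64", "arm64"}:
        # Assumption: an ARM64 host here is a DGX Spark / SBSA system.
        return f"{REPOSITORY}:{RELEASE}-sbsa"
    raise ValueError(f"no documented image for architecture {machine!r}")
```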
New Features and Enhancements
Source code and Helm chart released on GitHub for the RT-CV microservice.
On-demand image embedding via a new vision-encoder REST API that generates image embeddings on request.
Models Supported
2D Single-Camera Models:
RT-DETR (Warehouse Blueprint): Real-Time Detection Transformer object detection model optimized for warehouse environments
3D Multi-Camera Models:
Sparse4D (Warehouse Blueprint): Multi-Camera 3D Detection and Tracking model with 4D (spatial-temporal) capabilities for Birds-Eye-View (BEV) detection across multiple synchronized camera sensors with temporal instance banking
Embedding Models:
RADIO-CLIP: Object embedding model for appearance-based feature extraction and re-identification
SigLIP v2: Object embedding model for appearance-based feature extraction and re-identification
Optimizations
Vision-encoder inference optimizations. For optimal performance, the following config changes are enabled by default in ds-main-config.txt:
[visionencoder]
smart-infer=1
ofa-predict=1
Thor tracker tuning. For optimal tracker performance on Thor, the following config changes are enabled by default in the tracker configuration:
TargetManagement.maxTargetsPerStream: 50
VisualTracker.visualTrackerType: 2
VisualTracker.vpiBackend4DcfTracker: 1
Bug Fixes
Fixed a failure when deleting an RTSP stream that had already ended.
Image Tag
Docker Compose x86 default: nvcr.io/nvidia/vss-core/vss-rt-cv:3.2.0.
Docker Compose DGX Spark/SBSA image: nvcr.io/nvidia/vss-core/vss-rt-cv:3.2.0-sbsa.
Helm chart default: nvcr.io/nvidia/vss-core/vss-rt-cv:3.2.0. Set image.tag to the required release tag for your deployment package.
Breaking Changes
Duplicate live-stream / camera IDs are now rejected with HTTP 409 (DuplicateCameraId / DuplicateStreamId) instead of being silently overwritten. Remove the prior stream first, or surface the 409 to the caller.
New Models
Cosmos-Embed1-448p-anomaly-detection: 448p model with anomaly detection support (HF).
New API Capabilities
Base64 data URL input: POST /v1/generate_video_embeddings now accepts inline base64 media via RFC 2397 data: URIs in the url field.
CV-compatible stream API: POST /v1/stream/add, POST /v1/stream/remove, GET /v1/stream/get-stream-info alongside the existing /v1/streams/* endpoints.
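An RFC 2397 data: URI wraps the media type and the base64-encoded bytes in a single string, which is what the url field accepts for inline media. The Python sketch below builds and decodes such a URI with the standard library; the function names and the video/mp4 default are assumptions, since these notes do not say which media types the service accepts.

```python
import base64


def to_data_uri(media_bytes, media_type="video/mp4"):
    """Encode raw bytes as an RFC 2397 base64 data: URI."""
    encoded = base64.b64encode(media_bytes).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


def from_data_uri(uri):
    """Split a base64 data: URI back into (media_type, raw_bytes)."""
    header, sep, encoded = uri.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data: URI")
    media_type = header[len("data:"):-len(";base64")]
    return media_type, base64.b64decode(encoded)
```

Base64 inflates the payload by roughly a third, so inline URIs are best suited to short clips or single images.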
Optimizations
GOP-aware decode optimization via RTVI_ENABLE_GOP_DECODE_OPT (default true; file decode only).
New Environment Variables
NGC_API_KEY: NGC API key for downloading models from NGC via the ngc: scheme.
COSMOS_EMBED1_TRT_PRECISION: trtexec network precision for Cosmos-Embed1 video/text TensorRT engines (fp32, fp16, bf16, int8, fp8, best; default fp16).
COSMOS_EMBED1_TRT_EXTRA_ARGS: Extra trtexec arguments (shell-quoted string) appended to both video and text engine builds (for example, --stronglyTyped --builderOptimizationLevel=5).
RTVI_ENABLE_GOP_DECODE_OPT: GOP-aware decode optimization for file decode (default true; no effect on RTSP).
KAFKA_ASYNC_SEND_QUEUE_MAXSIZE: Maximum queued in-flight async Kafka sends (default 1024).
FILE_URL_ALLOWED_DIRS: Comma-separated absolute directories allowlisted for file:// URL ingestion. file:// is rejected when unset.
ASSET_DOWNLOAD_SSL_SKIP_VERIFY_DOMAINS: Domains for which SSL verification is skipped on URL download.
ASSET_DOWNLOAD_MAX_REDIRECTS: Maximum redirect hops on URL download (default 0 disables; max 10).
ASSET_DOWNLOAD_MAX_FILE_SIZE_GB: Maximum size (GB) of files fetched via URL (default 8).
ASSET_DOWNLOAD_AUTH_TOKENS: Per-domain auth headers (domain1=Bearer xxx;domain2=Basic yyy).
ASSET_MAX_AGE_HOURS: Auto-evict uploaded assets older than this many hours (default 0 disables).
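The FILE_URL_ALLOWED_DIRS behavior (comma-separated absolute directories, with file:// rejected when the variable is unset) can be approximated by the following Python sketch. It illustrates the allowlist idea only and is not the service's implementation: the function names are hypothetical, paths are treated as POSIX paths, and symbolic links are not resolved.

```python
import posixpath
from urllib.parse import unquote, urlparse


def parse_allowed_dirs(raw):
    """Split the comma-separated value, keeping only absolute directories."""
    dirs = []
    for item in (raw or "").split(","):
        item = item.strip()
        if item and posixpath.isabs(item):
            dirs.append(posixpath.normpath(item))
    return dirs


def is_file_url_allowed(url, allowed_dirs_raw):
    """Return True only if a file:// URL points inside an allowlisted directory."""
    parsed = urlparse(url)
    if parsed.scheme != "file":
        return False
    allowed = parse_allowed_dirs(allowed_dirs_raw)
    if not allowed:
        # Mirrors the documented rule: file:// is rejected when unset.
        return False
    # Normalizing collapses '..' segments before the comparison.
    path = posixpath.normpath(unquote(parsed.path))
    if not posixpath.isabs(path):
        return False
    return any(posixpath.commonpath([path, d]) == d for d in allowed)
```

Comparing with commonpath rather than a string prefix avoids accepting a sibling directory such as /data/videos2 when only /data/videos is allowlisted.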
Image Tag
Docker Compose x86 default: nvcr.io/nvidia/vss-core/vss-rt-embed:3.2.0.
Docker Compose DGX Spark/SBSA image: nvcr.io/nvidia/vss-core/vss-rt-embed:3.2.0-sbsa.
Helm chart default: nvcr.io/nvidia/vss-core/vss-rt-embed:3.2.0. Set image.tag to the required release tag for your deployment package.
New Features and Enhancements
Source code and Helm chart released on GitHub for the VIOS microservice (Sensor, StreamProcessing, Ingress, and NVStreamer containers, plus the VIOS Streaming UI).
One-click Docker Compose deployment for the StreamProcessing service that bundles NVStreamer and removes the legacy minio/mcp services. Configurable per-deployment via environment variables for ports, video sources, storage paths, deployment profiles, and Prometheus / Grafana hooks.
Unified multi-arch container for x86_64, AGX Thor, and DGX Spark (a single multiarch image, replacing the previous Thor / SBSA split): ./build.sh arch=arm64 container or ./build.sh arch=amd64 container.
Per-camera timestamp for 3D multi-camera use cases: replay and live-overlay metadata is duplicated per sensor when per-camera time information is present.
B-frame support added to all download use-cases (transcode + remux + overlay) on x86 and aarch64.
HEVC multi-slice and H.265 Aggregation Packet (RFC 7798) support in both VST recording and NVStreamer, including SEI frameId injection on aggregated VCL units.
VST audio recording end-to-end fix: AAC AudioSpecificConfig is now embedded in the container, MKV media-info reports correct sample-rate/channels, and audio delivery waits for the first video buffer to land (no more out-of-sync audio at start-of-recording).
NVStreamer republish with audio: vst-streamer now forwards audio tracks to RTSP consumers (was video-only).
Floor map view supports SVG and JPEG inputs and renders on AGX Thor.
VST UI rebranded to VSS VIOS: title bar, dashboard, and bundled UI lib at /vst-ui inside the ingress container.
New API Capabilities
Picture API in Storage Service: GET /v1/storage/stream/{streamId}/picture and /v1/storage/stream/{streamId}/picture/url return JPEG snapshots and temporary URLs for any sensor / clip; works for live and replay, including disconnected sensors.
Replay-picture for disconnected H.265 sensors: GET /v1/replay/stream/{streamId}/picture now succeeds for codec-specific corner cases that previously timed out.
Conflict-aware POST /v1/sensor/add: returns the existing sensorId and three distinct error contracts when the URL and / or name collide with an already-onboarded sensor, unblocking the Auto-Calibration MS programmatic onboarding flow.
Full-file download without processing on the clip / download path when userRequest requests it: bypasses transcode + remux for byte-identical pulls.
QoS debug endpoint: GET /v1/proxy/debug/qos exposes per-stream QoS counters (frame-rate, drops, jitter) consumed by the dashboard QoS view.
Hardened request-body validation: added 35 BDD scenarios covering upload, sensor, storage, recording, and picture flows, and tightened input validation across all upload / download APIs.
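The picture endpoints above all take a streamId path parameter, so a client mainly needs to encode that ID safely. The short Python sketch below builds the three documented paths; the function name and dictionary keys are hypothetical, and query parameters are omitted because these notes do not list them.

```python
from urllib.parse import quote


def picture_paths(stream_id):
    """Build the documented picture paths for one stream ID."""
    # Percent-encode the ID so characters such as '/' cannot alter the path.
    sid = quote(stream_id, safe="")
    return {
        "storage_picture": f"/v1/storage/stream/{sid}/picture",
        "storage_picture_url": f"/v1/storage/stream/{sid}/picture/url",
        "replay_picture": f"/v1/replay/stream/{sid}/picture",
    }
```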
Optimizations
Clip-generation latency reduced by ~200–300 ms by moving cleanup to an async thread; an additional ~50 ms saved at clip start by the decoder-library update.
RTSP recording start-up latency reduced from ~3 s to a few milliseconds via non-blocking gst_get_state polling and unbuffered filesink mode.
Configurable picture-API timeout via environment variable.
Configurable storage cap: the MAX_STORAGE_SIZE environment variable lets operators tune per-deployment video-storage ceilings without rebuilding.
Decoder library memory-leak fixes in libcuvidv4l2 for both x86_64 and aarch64.
Postgres storage moved from a bind mount to a Docker-managed volume to eliminate host-permission flakiness on first boot.
Bug Fixes
Memory leaks fixed in upload-video, download, libcuvidv4l2 decoder (x86_64 + aarch64), and stream-processing cleanup paths.
Stability: fixed crash on H100 under 40 concurrent downloads + 20 picture-API requests; fixed NVStreamer buffer-full crash on video+audio streams; fixed audio-path crash and recursion / stack-overflow risk in ADTS / H264 byte-stream sources.
VST audio recording AAC profile and AV-sync regression resolved (NTP-to-epoch offset and unit-mismatch corrections).
Stream / sensor metadata: fixed empty proxy URL in GET /streams, sensor-count cache drift / ghost files, and blank sensor name on UI for removed file-based sensors (with sensor-id fallback when name is missing).
Frame accuracy: corrected last-frame detection in clip download and the frame-accurate picture API.
API contracts: DELETE /sensor/{id} no longer reports an invalid delete status; UI multipart-upload regression fixed; stream-ID removed from PUT-upload headers.
Pipeline fixes: video-wall framerate handling; SW-encoder transcode FPS via videorate + capsfilter; Live555 proxy burst-mode duplicate-PTS handling now strictly monotonic (eliminating SEI / frameId aliasing); HEVC pre-transcode codec persistence on upload (DB and DeviceManager re-read media information after replace).
Security / OSS: resolved CVE-2026-42945 in the ingress container; refreshed OSS base images (Postgres, Redis); disabled IPv6 in the ingress to work around an upstream NGINX dual-stack issue.
Image Tag
VIOS images are published as multi-arch OCI image indexes: a single tag serves linux/amd64 (x86_64) and linux/arm64 (AGX Thor and DGX Spark / SBSA).
Docker Compose / Helm chart default tags:
nvcr.io/nvidia/vss-core/vss-vios-streamprocessing:3.2.0
nvcr.io/nvidia/vss-core/vss-vios-sensor:3.2.0
nvcr.io/nvidia/vss-core/vss-vios-ingress:3.2.0
nvcr.io/nvidia/vss-core/vss-vios-nvstreamer:3.2.0
New Features
Added support for single camera calibration.
Added RTSP stream recording and injection using VIOS for input.
Added support for global tripwires and ROIs for the 3D use case, including Global ROIs and TWs under the Parameters tab.
Enhancements
Updated the AMC pipeline to use Model Runner for lens distortion estimation and video rectification.
Integrated Model Runner to improve focal length estimation using models.
Added VGGT auto frame selection.
Added image coordinate calibration file export. Users can download ROIs and TWs in pixel format under the Parameters tab by clicking Export image-mode JSON.
Added text boxes for additional attributes when clicking Full export under the Results tab.
Image Tag
Docker Compose default tags:
nvcr.io/nvidia/vss-core/vss-auto-calibration:3.2.0
nvcr.io/nvidia/vss-core/vss-auto-calibration-ui:3.2.0
New Features
New microservice: RT-CV-3D is a new Perception microservice that couples the RT-DETR 2D detector with the Multi-View 3D Tracking (MV3DT) framework to produce fused 3D Bird’s Eye View (BEV) outputs across cameras with overlapping fields of view. It ships as two container images, vss-rt-cv (Perception) and vss-rt-cv-mv3dt-bev-fusion (BEV Fusion). See Object Detection and Tracking.
Shared RTVI-CV REST API surface: built on the same RTVI-CV codebase as the RT-CV microservice, the Perception microservice exposes the same core REST API for stream management (dynamic add / remove / query of streams), health-check probes (liveness / readiness / startup), and metrics and telemetry monitoring.
Docker Compose deployment: RT-CV-3D is released with a Docker Compose deployment and its source code, deployed through the Warehouse Blueprint MV3DT profile.
Agent Skills added: new VSS Agent Skills let a coding agent deploy and operate the RT-CV-3D perception and tracking stack end to end, and generate the camera calibration it needs. See the Agent Skills walkthrough.
Models Supported
RT-DETR: Real-Time Detection Transformer object detection model optimized for warehouse environments.
BodyPose3DNet: Pose-estimation model used by MV3DT.
Known Limitations
Standalone launch of the RT-CV-3D microservice is not yet supported. Deploy RT-CV-3D through the Warehouse Blueprint MV3DT profile or the Agent Skills. Standalone launch will be supported in a future release.
A Helm chart is not yet available. Helm support will be added in a future release.
Image Tag
Perception Docker Compose x86 default: nvcr.io/nvidia/vss-core/vss-rt-cv:3.2.0.
Perception Docker Compose DGX Spark/SBSA image: nvcr.io/nvidia/vss-core/vss-rt-cv:3.2.0-sbsa.
BEV Fusion Docker Compose default: nvcr.io/nvidia/vss-core/vss-rt-cv-mv3dt-bev-fusion:3.2.0.
Known Issues and Limitations#
The OpenClaw chat UI can display a duplicate assistant response for a single user query. The duplicate is a rendering issue only; the underlying agent processes the query once and the tool invocations and results are unaffected. Refresh the chat or ignore the duplicated bubble.
The OpenClaw UI can stop rendering agent responses after running for some time and appear stale, for example while periodic VSS deployment status polling continues in the background. Workaround: refresh the UI to see the latest responses.
Docker Engine 29.5.0 and later can fail to pull some NGC-hosted image tags after the image layers download, with an error from registry: Incorrect Repository Format message. Use a supported Docker Engine version earlier than 29.5.0. If you must use Docker Engine 29.5.0 or later, add or merge the following daemon-side override in /etc/docker/daemon.json and restart Docker:
{ "features": { "containerd-snapshotter": false } }
This disables the containerd snapshotter image store path for the Docker daemon and uses the legacy Docker graphdriver image store. Preserve any existing daemon settings, such as the required exec-opts cgroup driver configuration.
With the audio-enabled configuration for the Nemotron 3 Nano Omni model, the agent is known to retry repeatedly on videos that have no audio stream or an incompatible audio stream.
After deploying with the Brev Launchable, Brev can occasionally report services unhealthy even though Jupyter Notebook, VSS deployment, and VSS skills continue to function. This appears to be an intermittent Brev status issue and is harmless for VSS operations.
Phoenix must be accessed through HAProxy on port 7777 (https://7777-<id>.brevlab.com/phoenix), not directly on port 6006. Phoenix is configured for reverse-proxy access with PHOENIX_HOST_ROOT_PATH=/phoenix; HAProxy strips that prefix before forwarding, while direct access on 6006 leaves paths such as /phoenix/graphql unchanged and returns 405 Method Not Allowed, which breaks the UI. Port 6006 does not need a Brev secure link for Phoenix.
(Search profile, Brev 2-GPU environments): When using local VLM, the Search critic is automatically disabled with a warning to avoid invalid local VLM GPU assignment. As a result, critic-based verification is unavailable unless you use a remote VLM or a host with more than 2 GPUs.
[WARN] Brev environment has 2 GPU(s). Disabling Search critic to avoid starting the local VLM on GPU 2.
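For the Docker Engine 29.5.0 workaround described in this list, the override has to be merged into any existing /etc/docker/daemon.json rather than written over it. The Python sketch below shows one way to do that merge in memory; the function name is hypothetical, and the exec-opts value in the usage example is only a sample of an existing setting, not a value taken from these notes.

```python
import copy
import json

# The daemon-side override documented in these notes.
OVERRIDE = {"features": {"containerd-snapshotter": False}}


def merge_daemon_config(existing, override=OVERRIDE):
    """Recursively merge override into existing without dropping other keys."""
    merged = copy.deepcopy(existing)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections such as "features" key by key.
            merged[key] = merge_daemon_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Usage with a sample existing configuration (values are illustrative).
current = {
    "exec-opts": ["native.cgroupdriver=systemd"],
    "features": {"buildkit": True},
}
print(json.dumps(merge_daemon_config(current), indent=2))
```

The sketch only computes the merged document; writing it back to /etc/docker/daemon.json and restarting Docker remain manual, privileged steps.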
VSS 3.1.0#
These are the VSS 3.1.0 Release Notes. This is an early access release that includes a refactored architecture and new features. Some features are in an alpha state and should not be used in production.
Key Features and Enhancements#
Updated the Search Agent Workflow to introduce attribute search, multi-embedding fusion search, and a critic agent to review search results.
Updated the Real-Time Computer Vision (RT-CV) microservice to support embedding generation for detected objects. This release supports two embedding models: RADIO-CLIP and SigLIP2.
Updated Brev Launchable deployment to support the 3.X architecture and deploy all of the agent workflows.
Added support for AGX Thor and DGX Spark with hybrid deployment (remote LLM) of the Base and Alerts profiles.
Added additional deployment options for undefined hardware profiles.
VSS 3.0.0#
These are the VSS 3.0.0 Release Notes. This is an early access release that includes a refactored architecture and new features. Some features are in an alpha state and should not be used in production.
Key Features and Enhancements#
Updated the out-of-the-box experience, which now launches a minimal vision agent and lets developers add agent workflows using a combination of microservices. Agent workflows available in this release are:
Report generation and Q&A: The agent can generate templated reports and answer questions using the VLM. This is part of the base agent profile in Quickstart.
Video summarization: The agent can generate long video summaries with time-stamped highlights.
Alert verification: Augment existing CV pipelines with VLMs to verify events and extract additional insights.
Real-Time VLM alerts: Generate tail-end alerts using VLM.
Search: Open vocabulary search for actions and events. This is an alpha feature.
Introduced two industry-specific, large-scale blueprint examples for smart cities and warehouses.
Modularized the VSS architecture, introducing new microservices and APIs.
Introduced a top-level agent, capable of planning and executing vision-based workflows leveraging the new microservices.
Introduced Real-Time Video Intelligence (RTVI) microservices for accelerated feature extraction from stored and streamed video. Three microservices are included in this release:
Real-Time VLM (RT-VLM): Generates captions and alerts for live streams using Vision Language Models.
Real-Time Embedding (RT-Embedding): Generates embeddings for live streams and video files.
Real-Time Computer Vision (RT-CV): Detects and tracks objects in live streams and video files.
Refactored video summarization workflow into the Video Summarization microservice.
Introduced Video IO and Storage (VIOS) microservices, to manage video (stored and streamed), recording, and playback.
Introduced the Behavior Analytics microservice, to set up heuristics for event creation based on computer vision metadata.
Introduced calibration microservices, to calibrate the camera position and orientation for 3D and multi-view applications.
Integrated a new API Gateway / MCP (Model Context Protocol) server to route requests to the appropriate microservices.