Performance

This section documents benchmark results for the VSS Real-Time Video Intelligence microservices. All benchmarks were run with VSS 3.2 on validated NVIDIA GPU platforms. Use these numbers to size GPU infrastructure for your deployment and to understand the performance trade-offs between models, use cases, and GPU platforms.
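As a rough illustration of how a "maximum concurrent streams per GPU" figure can feed into sizing, the sketch below divides a target stream count by a per-GPU capacity after reserving some headroom. The function name `gpus_needed`, the `headroom_pct` parameter, and the numbers in the usage lines are hypothetical and are not taken from the benchmark pages; substitute the measured value for your model, GPU, and operating mode.

```python
def gpus_needed(total_streams: int, max_streams_per_gpu: int, headroom_pct: int = 20) -> int:
    """Estimate the GPU count for a target number of concurrent streams.

    max_streams_per_gpu is the benchmarked maximum for one GPU.
    headroom_pct is an assumed safety margin (0-99) kept free on each GPU.
    """
    if total_streams < 0:
        raise ValueError("total_streams must be non-negative")
    if max_streams_per_gpu <= 0:
        raise ValueError("max_streams_per_gpu must be positive")
    if not 0 <= headroom_pct < 100:
        raise ValueError("headroom_pct must be in the range 0-99")

    # Streams we plan to place on each GPU after reserving headroom (integer math).
    usable_per_gpu = max_streams_per_gpu * (100 - headroom_pct) // 100
    if usable_per_gpu < 1:
        raise ValueError("headroom leaves no usable capacity per GPU")

    # Ceiling division without floating point.
    return -(-total_streams // usable_per_gpu)


# Hypothetical inputs: 100 camera streams, 20 streams per GPU from a benchmark table.
print(gpus_needed(100, 20))                  # with the default 20% headroom
print(gpus_needed(100, 20, headroom_pct=0))  # running each GPU at its measured maximum
```

This treats each GPU as independent and ignores CPU, memory, and network limits, so it gives a lower bound for planning rather than a guaranteed capacity.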

Agent Workflows

Agent Workflow	Page	What Is Measured
Alert Verification	Alert Verification Performance	End-to-end and per-stage latency for live RTSP alert verification; RT-DETR with local LLM across RTX PRO 6000 Blackwell and DGX Spark; AB Consumer Lag and VLM inference scaling characteristics
Video Summarization	Video Summarization Performance	E2E latency vs. video length (1 min–12 hr); Burst max concurrency at target avg latency (1, 5, 10 min videos); CR2-8B FP8 across H100, RTX PRO 6000 SE, and L40S
Search	Search Performance	E2E mean and P90 latency vs. concurrent query load (1–50); Per-stage latency breakdown on H100 and RTX PRO 6000 SE; Video ingestion latency

Microservices

Microservice	Page	What Is Measured
Real-Time CV (RT-CV)	RT-CV Performance	Maximum concurrent 1080p streams per GPU; RT-DETR (ResNet50 and EfficientViT/L2) and Grounding DINO models; Latency-sensitive and throughput-maximizing operating modes
Real-Time VLM (RT-VLM)	RTVI-VLM Performance	Maximum concurrent streams and chunk latency for live RTSP streams; Total latency and throughput for pre-recorded video file processing; Alerting (OSL=1) and captioning (OSL=100) use cases; Cosmos3 Nano Reasoner via vLLM
Real-Time Embedding (RT-Embedding)	RT-Embed Performance	Maximum concurrent streams for live RTSP video embedding; Video and text embedding throughput and latency in request mode; File embedding latency vs. video length; Cosmos-Embed1, FP16, TensorRT
VIOS	VIOS Performance	RTSP and WebRTC stream profiling (GPU/CPU utilization); Max concurrent streams per replica on L40 and H100; Single-stream and concurrent video-download latency (H.264 and H.265); Single-stream and concurrent picture API latency; RTX PRO 6000 Blackwell benchmarks with and without transcoding