Performance

This section documents benchmark results for the VSS Real-Time Video Intelligence agent workflows and microservices. All benchmarks were measured on VSS 3.2 using validated NVIDIA GPU platforms. Use these numbers to size GPU infrastructure for your deployment and to understand the performance trade-offs among models, use cases, and GPU platforms.
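As a rough sizing illustration: when a page reports the maximum number of concurrent streams one GPU (or replica) sustains, the GPU count for a target stream load follows from a ceiling division. The sketch below assumes capacity scales linearly with GPU count and adds an optional headroom fraction. `gpus_needed`, the `headroom` parameter, and the example numbers are hypothetical and are not taken from the benchmark pages.

```python
import math


def gpus_needed(target_streams: int, max_streams_per_gpu: int, headroom: float = 0.0) -> int:
    """Estimate GPUs for a target stream count (hypothetical helper).

    Assumes capacity scales linearly with GPU count and reserves a
    `headroom` fraction (0.0 to <1.0) of each GPU's benchmarked capacity.
    """
    if target_streams < 0:
        raise ValueError("target_streams must be non-negative")
    if max_streams_per_gpu <= 0:
        raise ValueError("max_streams_per_gpu must be positive")
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0.0, 1.0)")
    # Usable streams per GPU after reserving headroom; floor so no GPU is overcommitted.
    usable = math.floor(max_streams_per_gpu * (1.0 - headroom))
    if usable == 0:
        raise ValueError("headroom leaves no usable capacity per GPU")
    # Ceiling division: a partially filled GPU still counts as a whole GPU.
    return -(-target_streams // usable)


if __name__ == "__main__":
    # Hypothetical example: 100 streams, 24 streams per GPU benchmarked, 10% headroom.
    print(gpus_needed(100, 24, headroom=0.1))
```

Reserving headroom keeps each GPU below its benchmarked maximum, which leaves some margin for load spikes and for deployments whose configuration differs from the benchmark setup.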

Agent Workflows

For each agent workflow, the list below gives its benchmark page and what that page measures.

Alert Verification: see Alert Verification Performance

  • End-to-end and per-stage latency for live RTSP alert verification

  • RT-DETR with local LLM across RTX PRO 6000 Blackwell and DGX Spark

  • AB Consumer Lag and VLM inference scaling characteristics

Video Summarization: see Video Summarization Performance

  • E2E latency vs. video length (1 min–12 hr)

  • Maximum burst concurrency at a target average latency (1-, 5-, and 10-minute videos)

  • Cosmos Reason 2 (CR2-8B) in FP8 on H100, RTX PRO 6000 SE, and L40S

Search: see Search Performance

  • E2E mean and P90 latency vs. concurrent query load (1–50)

  • Per-stage latency breakdown on H100 and RTX PRO 6000 SE

  • Video ingestion latency
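The Search and Video Summarization entries report mean and P90 latency across concurrency levels and the maximum concurrency that stays within a target average latency. The sketch below shows one way to compute those summaries from raw measurements. It assumes the nearest-rank percentile definition (the benchmark pages may interpolate differently), and the function names and sample data are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional


def p90(latencies_s: List[float]) -> float:
    """90th-percentile latency using the nearest-rank method (an assumption)."""
    if not latencies_s:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_s)
    # Nearest rank = ceil(0.9 * n), computed with integers to avoid float rounding.
    rank = (9 * len(ordered) + 9) // 10
    return ordered[rank - 1]


def max_concurrency_within(avg_latency_by_level: Dict[int, float], target_avg_s: float) -> Optional[int]:
    """Highest tested concurrency level that meets a target average latency.

    Scans levels in increasing order and stops at the first level that misses
    the target, so a level is reported only if every lower tested level also
    passed. Returns None if even the lowest level misses the target.
    """
    best: Optional[int] = None
    for level in sorted(avg_latency_by_level):
        if avg_latency_by_level[level] > target_avg_s:
            break
        best = level
    return best


if __name__ == "__main__":
    # Hypothetical per-query latencies (seconds) at one concurrency level.
    samples = [0.8, 1.1, 0.9, 1.4, 2.0, 1.0, 0.95, 1.2, 3.1, 1.05]
    print(f"mean={mean(samples):.2f} s, p90={p90(samples):.2f} s")
    # Hypothetical average latency per concurrency level, with a 2.0 s target.
    print(max_concurrency_within({1: 0.9, 5: 1.2, 10: 1.8, 20: 3.5}, 2.0))  # 10
```

Stopping at the first failing level is a conservative choice: if latency is non-monotonic in the measurements, a passing level above a failing one is not reported as sustainable.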

Microservices

For each microservice, the list below gives its benchmark page and what that page measures.

Real-Time CV (RT-CV): see RT-CV Performance

  • Maximum concurrent 1080p streams per GPU

  • RT-DETR (ResNet50 and EfficientViT/L2) and Grounding DINO models

  • Latency-sensitive and throughput-maximizing operating modes

Real-Time VLM (RT-VLM): see RT-VLM Performance

  • Maximum concurrent streams and chunk latency for live RTSP streams

  • Total latency and throughput for pre-recorded video file processing

  • Alerting (OSL = 1) and captioning (OSL = 100) use cases, where OSL is the output sequence length in tokens

  • Cosmos Reason 2 (CR2-8B) via vLLM

Real-Time Embedding (RT-Embedding): see RT-Embed Performance

  • Maximum concurrent streams for live RTSP video embedding

  • Video and text embedding throughput and latency in request mode

  • File embedding latency vs. video length

  • Cosmos-Embed1, FP16, TensorRT

VIOS: see VIOS Performance

  • RTSP and WebRTC stream profiling (GPU/CPU utilization)

  • Max concurrent streams per replica on L40 and H100

  • Single-stream and concurrent video-download latency (H.264 and H.265)

  • Single-stream and concurrent picture API latency

  • RTX PRO 6000 Blackwell benchmarks with and without transcoding
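The RT-VLM and RT-Embedding pages report file-processing latency and throughput against video length. One common way to compare such results is a real-time factor: seconds of video processed per second of wall-clock time. The sketch below is a hypothetical helper under that assumed definition (the benchmark pages may define throughput differently), and its linear extrapolation holds only where the measured latency-versus-length curve is roughly linear.

```python
def realtime_factor(video_duration_s: float, processing_latency_s: float) -> float:
    """Seconds of video processed per wall-clock second (assumed metric).

    A value above 1.0 means the file was processed faster than real time.
    """
    if video_duration_s <= 0 or processing_latency_s <= 0:
        raise ValueError("durations must be positive")
    return video_duration_s / processing_latency_s


def estimate_latency_s(video_duration_s: float, rtf: float) -> float:
    """Extrapolate processing latency to another video length.

    Assumes latency grows linearly with video length at a fixed real-time
    factor; compare against the measured latency-vs-length results first.
    """
    if video_duration_s <= 0 or rtf <= 0:
        raise ValueError("video_duration_s and rtf must be positive")
    return video_duration_s / rtf


if __name__ == "__main__":
    # Hypothetical measurement: a 10-minute file processed in 40 seconds.
    rtf = realtime_factor(600.0, 40.0)
    print(f"real-time factor: {rtf:.1f}x")  # 15.0x
    # Hypothetical extrapolation to a 1-hour file.
    print(f"estimated 1-hour latency: {estimate_latency_s(3600.0, rtf):.0f} s")  # 240 s
```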