Performance

This section documents benchmark results for the VSS Real-Time Video Intelligence microservices. All benchmarks were run with VSS 3.1 on validated NVIDIA GPU platforms. Use these numbers to size GPU infrastructure for your deployment and to understand the performance trade-offs between models, use cases, and GPU platforms.
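As a rough illustration of how a "maximum concurrent streams per GPU" figure can feed into sizing, the sketch below computes a GPU count with ceiling division. The function name `gpus_required`, the `headroom_pct` parameter, and every number in the example are hypothetical and are not taken from the benchmark pages; substitute the measured value for your model, operating mode, and GPU platform.

```python
def gpus_required(target_streams: int, max_streams_per_gpu: int, headroom_pct: int = 0) -> int:
    """Estimate how many GPUs are needed to serve `target_streams` concurrent streams.

    max_streams_per_gpu: benchmarked maximum concurrent streams on one GPU.
    headroom_pct: share of that maximum (0-99) to leave unused as a safety
                  margin. This is an assumption of this sketch, not a VSS setting.
    """
    if target_streams < 0:
        raise ValueError("target_streams must be non-negative")
    if max_streams_per_gpu <= 0:
        raise ValueError("max_streams_per_gpu must be positive")
    if not 0 <= headroom_pct < 100:
        raise ValueError("headroom_pct must be in the range 0-99")

    # Streams per GPU that we are willing to use after reserving headroom.
    usable_per_gpu = max_streams_per_gpu * (100 - headroom_pct) // 100
    if usable_per_gpu == 0:
        raise ValueError("headroom leaves no usable capacity per GPU")

    # Ceiling division in integer arithmetic: a partially filled GPU still counts.
    return -(-target_streams // usable_per_gpu)


# Hypothetical example: 100 streams, 25 streams per GPU, 20% headroom.
print(gpus_required(100, 25, headroom_pct=20))  # 5
```

This treats streams as interchangeable and assumes capacity scales linearly with GPU count, which is a simplification. Mixed models or use cases on the same GPU would need per-workload figures.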

Microservices

Microservice: Real-Time CV (RT-CV)
Page: RT-CV Performance
What Is Measured:

  • Maximum concurrent 1080p streams per GPU

  • RT-DETR (ResNet50 and EfficientViT/L2) and Grounding DINO models

  • Latency-sensitive and throughput-maximizing operating modes

Microservice: Real-Time VLM (RT-VLM)
Page: RT-VLM Performance
What Is Measured:

  • Maximum concurrent streams and chunk latency for live RTSP streams

  • Total latency and throughput for pre-recorded video file processing

  • Alerting (OSL=1) and captioning (OSL=100) use cases

  • Cosmos Reason 2 (CR2-8B) via vLLM

Microservice: Real-Time Embedding (RT-Embedding)
Page: RT-Embed Performance
What Is Measured:

  • Maximum concurrent streams for live RTSP video embedding

  • Video and text embedding throughput and latency in request mode

  • File embedding latency vs. video length

  • Cosmos-Embed1, FP16, TensorRT