Performance Benchmarks
Single File Summarization
This section provides performance benchmarks for VSS across several video lengths and chunk sizes on an 8xRTXPro B6000 PCIe system with four GPUs used for the LLM (Llama 3.1 70B) and four GPUs shared between the Embedding and VLM (Cosmos Reason2 8B). The summarization latency metrics are measured with Vector RAG and Graph RAG. Vector RAG provides faster summarization, while Graph RAG enables interactive Q&A after the summary has been generated but adds latency.
| Video Length (Minutes) | Chunk Size (Seconds) | Chunks | Summarization Latency - Vector RAG (Seconds) | Summarization Latency - Graph RAG (Seconds) |
|---|---|---|---|---|
| 1 min | 10s | 6 | 28.9 | 36.4 |
| 10 min | 10s | 60 | 63.4 | 95.2 |
| 60 min | 10s | 360 | 174.0 | 261.0 |
| 360 min | 10s | 2160 | 187.0 | 1085.0 |
| 720 min | 10s | 4320 | 1616.0 | 2041.0 |
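The Chunks column is consistent with dividing the video duration by the chunk size, and the two latency columns can be compared directly to estimate how much latency Graph RAG adds. The following is a minimal sketch of that arithmetic, not part of VSS: the function names `chunk_count` and `graph_overhead` are hypothetical, the chunk formula assumes non-overlapping chunks, and the numbers are transcribed from the table above.

```python
import math

# Rows transcribed from the table above:
# (video length in minutes, chunk size in seconds,
#  Vector RAG latency in seconds, Graph RAG latency in seconds)
ROWS = [
    (1, 10, 28.9, 36.4),
    (10, 10, 63.4, 95.2),
    (60, 10, 174.0, 261.0),
    (360, 10, 187.0, 1085.0),
    (720, 10, 1616.0, 2041.0),
]


def chunk_count(video_minutes, chunk_seconds):
    """Number of chunks, assuming non-overlapping chunks (an assumption)."""
    return math.ceil(video_minutes * 60 / chunk_seconds)


def graph_overhead(vector_seconds, graph_seconds):
    """Ratio of Graph RAG latency to Vector RAG latency."""
    return graph_seconds / vector_seconds


for minutes, chunk_s, vector_s, graph_s in ROWS:
    n = chunk_count(minutes, chunk_s)
    print(
        f"{minutes:>4} min  {n:>5} chunks  "
        f"vector {vector_s / n:.3f} s/chunk  "
        f"graph/vector {graph_overhead(vector_s, graph_s):.2f}x"
    )
```

Dividing latency by chunk count gives a rough amortized cost per chunk, which is useful for comparing rows but should not be read as the actual time spent on any individual chunk.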
Alert Review Performance
The following table shows the maximum number of concurrent alerts that the alert review API can process with the Cosmos Reason2 8B model. The benchmark measures the maximum concurrency that can be supported while staying within a given average latency threshold. The VLM output sequence length (OSL) is 1 for yes/no answers only and 100 for review with captions (descriptions). Input video length is 30 seconds. The 8xGPU configuration uses an 8xVLM topology.
| Latency Threshold (sec) | 1xRTX Pro-CR2-8b (Yes/No) | 1xRTX Pro-CR2-8b (Caption) | 8xRTX Pro-CR2-8b (Yes/No) | 8xRTX Pro-CR2-8b (Caption) |
|---|---|---|---|---|
| 1 | 9 | 2 | 65 | 8 |
| 5 | 81 | 63 | 649 | 429 |
| 10 | 171 | 150 | 1326 | 1196 |
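One way to read the table above is to compare the 8-GPU concurrency against the 1-GPU concurrency at each latency threshold. The sketch below computes the speedup and the per-GPU scaling efficiency from the published numbers. The helper names `speedup` and `scaling_efficiency` are hypothetical and only illustrate the arithmetic; they are not part of the alert review API.

```python
# Maximum concurrent alerts transcribed from the table above,
# keyed by latency threshold in seconds.
# Each value: (1-GPU yes/no, 1-GPU caption, 8-GPU yes/no, 8-GPU caption)
ALERTS = {
    1: (9, 2, 65, 8),
    5: (81, 63, 649, 429),
    10: (171, 150, 1326, 1196),
}


def speedup(one_gpu, eight_gpu):
    """How many times more alerts the 8-GPU setup handles than 1 GPU."""
    return eight_gpu / one_gpu


def scaling_efficiency(one_gpu, eight_gpu, gpus=8):
    """Speedup divided by GPU count; 1.0 means perfectly linear scaling."""
    return speedup(one_gpu, eight_gpu) / gpus


for threshold, (yn1, cap1, yn8, cap8) in ALERTS.items():
    print(
        f"threshold {threshold:>2}s  "
        f"yes/no {speedup(yn1, yn8):.2f}x ({scaling_efficiency(yn1, yn8):.0%})  "
        f"caption {speedup(cap1, cap8):.2f}x ({scaling_efficiency(cap1, cap8):.0%})"
    )
```

At the 5-second and 10-second thresholds the ratios sit close to 8x, while the 1-second caption case reaches only 4x, so the tightest threshold is where adding GPUs helps captioned review the least.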
CV + Alert Review Performance
The following table shows the performance metrics for combined Computer Vision (CV) and Alert Review workflows using the Cosmos Reason2 8B model. The CV model used is G-Dino with frame skip interval = 2. The conveyor belt sample video used in the benchmark contains 6 events per minute, each 10 seconds long. VLM OSL = 1 for CV + Alert Review, and VLM OSL = 100 for CV + Alert Review with captions.
| Metric | 1xRTX Pro-CR2-8b (CV + Alert Review) | 1xRTX Pro-CR2-8b (CV + Alert Review with Captions) | Spark-CR2-8b (CV + Alert Review) | Spark-CR2-8b (CV + Alert Review with Captions) | Thor-CR2-8b (CV + Alert Review) | Thor-CR2-8b (CV + Alert Review with Captions) |
|---|---|---|---|---|---|---|
| Max Number of Streams | 9 | 7 | 2 | 1 | 3 | 2 |
| P95 Verification Time (sec) | 5.88 | 9.2 | 22.7 | 19.13 | 11.74 | 54.02 |
| Verification Throughput (Clips/min) | 26.54 | 25.09 | 8.4 | 4.74 | 7.8 | 2.64 |