Performance

This section provides performance benchmarks for VSS across several video lengths and chunk sizes on an 8x H100 PCIe system, with 4 GPUs dedicated to the LLM and the other 4 shared among the Embedding, ReRanker, and VLM.

../_images/perf_diagram_deployment_topology.png

Summarization latency is measured with Graph RAG both enabled and disabled. Enabling Graph RAG allows interactive Q&A after the summary has been generated, but adds latency.

Video Length  Chunk Size  Chunks  Summarization Latency,  Summarization Latency,
(Minutes)     (Seconds)           Graph RAG Off           Graph RAG On
                                  (Seconds)               (Seconds)
------------  ----------  ------  ----------------------  ----------------------
1             10          6       12.94                   28.56
1             20          3       10.60                   24.19
1             30          2       11.15                   25.26
10            10          60      55.25                   89.88
10            20          30      30.73                   59.99
10            30          20      28.17                   51.52
30            10          180     75.00                   197.89
30            20          90      48.67                   116.96
30            30          60      43.81                   85.42
60            10          360     138.86                  382.25
60            20          180     76.38                   194.90
60            30          120     67.59                   141.41
120           10          720     204.12                  684.58
120           20          360     126.31                  364.79
120           30          240     87.11                   320.88
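The Chunks column and the latency added by Graph RAG can both be derived from the other columns. The following is a minimal sketch, not part of VSS: the function names are hypothetical, and it assumes fixed-size, non-overlapping chunks (rounding up when the video length is not a multiple of the chunk size is an assumption, since every row in the table divides evenly).

```python
import math


def chunk_count(video_minutes: float, chunk_seconds: float) -> int:
    """Number of chunks, assuming fixed-size chunks with no overlap."""
    return math.ceil(video_minutes * 60 / chunk_seconds)


def graph_rag_overhead(latency_off_s: float, latency_on_s: float) -> float:
    """Extra summarization latency (seconds) when Graph RAG is enabled."""
    return latency_on_s - latency_off_s


# Row from the table: 1 min video, 10 s chunks, 12.94 s off, 28.56 s on
print(chunk_count(1, 10))
print(round(graph_rag_overhead(12.94, 28.56), 2))
```

For the 1-minute video with 10-second chunks, this gives 6 chunks and roughly 15.6 seconds of additional latency with Graph RAG enabled.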

From this data, the summarization speedup can be plotted. The speedup compares the summary generation time with the length of the video, showing how much faster VSS generates a summary than a person could by watching the full video.

../_images/perf_diagram_speed_up.png
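As a rough illustration of how such a speedup can be computed from the table, the sketch below divides the video duration by the summarization latency. This definition is an assumption (the exact formula behind the plot is not stated here), and the function name is hypothetical.

```python
def summarization_speedup(video_minutes: float, latency_seconds: float) -> float:
    """Video duration divided by summarization latency, both in seconds."""
    if latency_seconds <= 0:
        raise ValueError("latency_seconds must be positive")
    return video_minutes * 60 / latency_seconds


# (video length in minutes, latency in seconds) taken from the table
# for 30-second chunks with Graph RAG off
rows_30s_off = [(1, 11.15), (10, 28.17), (30, 43.81), (60, 67.59), (120, 87.11)]

for minutes, latency in rows_30s_off:
    print(f"{minutes:>3} min: {summarization_speedup(minutes, latency):.1f}x")
```

Under this definition, the speedup grows with video length: about 5x for the 1-minute video and above 80x for the 120-minute video at a 30-second chunk size.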

The following graph shows the relationship between latency, chunk size, and video length. Longer videos and shorter chunk sizes generally require more processing and result in higher latency.

../_images/perf_diagram_latency_trends.png
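The trend can also be checked directly against the table. The sketch below is an illustration only: it hard-codes the Graph RAG off latencies from the table and uses hypothetical helper names to test whether latency rises with video length and falls with chunk size.

```python
# Graph RAG off latencies (seconds) from the table:
# {video length in minutes: {chunk size in seconds: latency}}
LATENCY_OFF = {
    1: {10: 12.94, 20: 10.60, 30: 11.15},
    10: {10: 55.25, 20: 30.73, 30: 28.17},
    30: {10: 75.00, 20: 48.67, 30: 43.81},
    60: {10: 138.86, 20: 76.38, 30: 67.59},
    120: {10: 204.12, 20: 126.31, 30: 87.11},
}


def latency_rises_with_length(table: dict) -> bool:
    """True if, for every chunk size, a longer video has higher latency."""
    lengths = sorted(table)
    chunk_sizes = sorted(table[lengths[0]])
    return all(
        table[a][c] < table[b][c]
        for c in chunk_sizes
        for a, b in zip(lengths, lengths[1:])
    )


def chunk_trend_exceptions(table: dict) -> list:
    """Cases where a larger chunk size did NOT lower the latency."""
    exceptions = []
    for minutes in sorted(table):
        sizes = sorted(table[minutes])
        for small, large in zip(sizes, sizes[1:]):
            if table[minutes][large] >= table[minutes][small]:
                exceptions.append((minutes, small, large))
    return exceptions


print(latency_rises_with_length(LATENCY_OFF))
print(chunk_trend_exceptions(LATENCY_OFF))
```

With these values, latency rises with video length at every chunk size. The only exception to the chunk-size trend is the 1-minute video, where the 30-second chunk size (11.15 s) is slightly slower than the 20-second chunk size (10.60 s).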