Search Performance#
Overview#
The Search Workflow lets you query indexed video archives in natural language. During ingestion, uploaded clips are embedded through RTVI-Embed (Cosmos-Embed1-448p) and RTVI-CV/Behavior Analytics. During query execution, the Vision Agent decomposes each request, routes it to embed search and attribute search as needed, and optionally verifies candidates with the critic agent, which uses the Cosmos Reason 2 (CR2-8B) VLM, before returning timestamped results.
The tables below benchmark the search agent. Query latency and throughput measurements use the vad-r1-v2 / benchmark query set (121 natural-language queries over 17 indexed clips) at 1, 10, 20, and 50 concurrent queries. Each run reports client-side end-to-end (E2E) average and P90 latency, successful query throughput in queries per second (QPS), and per-stage latency on NVIDIA RTX PRO 6000 Blackwell SE and H100 SXM. A separate section reports video upload latency and chunk counts on both platforms for inputs ranging from short clips to a 140-minute source. Use these numbers to choose a GPU and to estimate query latency under concurrent search load on VSS 3.2.
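To make the reported metrics concrete, here is a minimal sketch of how average latency, P90 latency, and QPS could be derived from per-query client-side timings. The function name `summarize_run`, the sample latencies, and the nearest-rank percentile method are assumptions for illustration; the benchmark's actual percentile method is not stated in this document.

```python
from statistics import mean


def summarize_run(latencies_s, wall_clock_s):
    """Summarize one run from the latencies of its successful queries.

    latencies_s: client-side E2E latency of each successful query, in seconds.
    wall_clock_s: total elapsed time of the run, in seconds.
    """
    if not latencies_s or wall_clock_s <= 0:
        raise ValueError("need at least one latency and a positive run time")
    ordered = sorted(latencies_s)
    # Nearest-rank P90 (assumed method): the smallest sample with at least
    # 90% of samples at or below it. Integer math avoids float rounding.
    rank = (9 * len(ordered) + 9) // 10
    return {
        "avg_s": mean(ordered),
        "p90_s": ordered[rank - 1],
        "qps": len(ordered) / wall_clock_s,
    }


# Hypothetical sample: 10 successful queries completed in a 20-second run.
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
summary = summarize_run(sample, wall_clock_s=20.0)
print(summary)
```

Only successful queries are passed in, which mirrors the "successful query throughput" wording above.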
Test Configuration#
Search Performance#
| Parameter | Value |
|---|---|
| VSS release | 3.2 |
| Platforms tested | NVIDIA RTX PRO 6000 Blackwell SE, H100 SXM |
| Dataset | vad-r1-v2 / benchmark |
| Benchmark queries | 121 natural-language queries over 17 indexed videos |
| Indexed videos | 17 vad-r1-v2 clips (72.28 MB total corpus) |
| Embedding model | Cosmos-Embed1-448p |
| Reasoning LLM | NVIDIA Nemotron Nano 9B v2 |
| Video understanding VLM | Cosmos Reason 2 (CR2-8B) |
| Agent mode | Enabled (critic enabled) |
| Top-K | 5 |
| Concurrent queries | 1, 10, 20, 50 |
Video Upload Performance#
| Parameter | Value |
|---|---|
| Platforms tested | NVIDIA RTX PRO 6000 Blackwell SE, H100 SXM |
| Chunk duration | 5 seconds |
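As a rough illustration of how the 5-second chunk duration relates to the chunk counts reported in the upload results, the sketch below estimates chunks per video from its duration. The helper `expected_chunks` is hypothetical, and it assumes a trailing partial chunk counts as its own chunk, which this document does not confirm.

```python
import math

CHUNK_DURATION_S = 5  # chunk duration from the upload test configuration


def expected_chunks(duration_s, chunk_s=CHUNK_DURATION_S):
    """Estimate the chunk count for a video of the given duration.

    Assumption: a final partial chunk is counted as a full chunk.
    """
    return math.ceil(duration_s / chunk_s)


print(expected_chunks(60 * 60))  # a 60-minute video
print(expected_chunks(37))       # a 37-second clip
```

For a 60-minute video this estimate gives 720, which agrees with the chunk count reported for the 60-minute upload.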
E2E Performance#
The two tables below report average and P90 client-side E2E latency and successful query throughput for the 121-query vad-r1-v2 / benchmark run under increasing concurrent query load.
Note
The reported QPS is bounded by the single VSS agent container used for these runs.
| Concurrent queries | E2E Avg (s) | E2E P90 (s) | QPS |
|---|---|---|---|
| 1 | 2.413 | 2.900 | 0.414 |
| 10 | 6.299 | 9.524 | 1.564 |
| 20 | 12.378 | 21.154 | 1.514 |
| 50 | 23.475 | 38.578 | 1.641 |
| Concurrent queries | E2E Avg (s) | E2E P90 (s) | Successful QPS |
|---|---|---|---|
| 1 | 1.596 | 1.875 | 0.626 |
| 10 | 5.425 | 8.640 | 1.814 |
| 20 | 9.691 | 14.490 | 1.929 |
| 50 | 21.903 | 36.629 | 1.757 |
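One way to sanity-check latency against throughput is Little's law, which says the mean number of requests in flight equals throughput times mean latency. The sketch below applies it to the rows of the first E2E table. It is an approximation under a steady-state assumption; a real run includes ramp-up and ramp-down, so the result need not equal the configured concurrency. The names `ROWS` and `mean_in_flight` are illustrative.

```python
# (concurrent queries, E2E avg in seconds, QPS) copied from the first E2E table.
ROWS = [
    (1, 2.413, 0.414),
    (10, 6.299, 1.564),
    (20, 12.378, 1.514),
    (50, 23.475, 1.641),
]


def mean_in_flight(avg_latency_s, qps):
    """Little's law: mean requests in flight = throughput x mean latency."""
    return qps * avg_latency_s


for concurrency, avg_s, qps in ROWS:
    in_flight = mean_in_flight(avg_s, qps)
    print(f"{concurrency:>2} configured -> {in_flight:5.2f} in flight on average")
```

Because QPS stays roughly flat from 10 to 50 concurrent queries in these tables, added concurrency shows up mainly as higher E2E latency rather than higher throughput.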
Latency Stage Breakdown#
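Per-stage average and P90 latency at each concurrency level. The End-to-End (client) row of each table repeats the matching row of the E2E tables above: the first four tables correspond to the first E2E table, and the last four to the second.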
Latency Stage Breakdown — 1 Concurrent Query
| Stage | Avg (s) | P90 (s) |
|---|---|---|
| End-to-End (client) | 2.413 | 2.900 |
| Search (agent chain) | 2.387 | 2.877 |
| Video understanding | 0.904 | 1.262 |
| Cosmos Reason 2 (CR2-8B) | 0.821 | 1.172 |
| Critic agent | 0.975 | 1.304 |
| Attribute search | 0.465 | 0.608 |
| Nemotron Nano 9B v2 (LLM) | 0.891 | 1.012 |
| VST video clip (tool) | 0.051 | 0.066 |
| Embed search | 0.033 | 0.039 |
Latency Stage Breakdown — 10 Concurrent Queries
| Stage | Avg (s) | P90 (s) |
|---|---|---|
| End-to-End (client) | 6.299 | 9.524 |
| Search (agent chain) | 5.017 | 7.716 |
| Video understanding | 2.801 | 4.832 |
| Cosmos Reason 2 (CR2-8B) | 2.569 | 4.709 |
| Critic agent | 2.932 | 4.783 |
| Attribute search | 0.219 | 0.336 |
| Nemotron Nano 9B v2 (LLM) | 1.611 | 1.765 |
| VST video clip (tool) | 0.126 | 0.208 |
| Embed search | 0.037 | 0.077 |
Latency Stage Breakdown — 20 Concurrent Queries
| Stage | Avg (s) | P90 (s) |
|---|---|---|
| End-to-End (client) | 12.378 | 21.154 |
| Search (agent chain) | 5.122 | 9.960 |
| Video understanding | 2.826 | 5.629 |
| Cosmos Reason 2 (CR2-8B) | 2.539 | 5.025 |
| Critic agent | 2.828 | 5.774 |
| Attribute search | 0.427 | 0.682 |
| Nemotron Nano 9B v2 (LLM) | 1.516 | 1.738 |
| VST video clip (tool) | 0.165 | 0.334 |
| Embed search | 0.057 | 0.054 |
Latency Stage Breakdown — 50 Concurrent Queries
| Stage | Avg (s) | P90 (s) |
|---|---|---|
| End-to-End (client) | 23.475 | 38.578 |
| Search (agent chain) | 4.733 | 7.282 |
| Video understanding | 2.683 | 5.131 |
| Cosmos Reason 2 (CR2-8B) | 2.402 | 4.739 |
| Critic agent | 2.752 | 5.107 |
| Attribute search | 0.305 | 0.579 |
| Nemotron Nano 9B v2 (LLM) | 1.480 | 1.686 |
| VST video clip (tool) | 0.165 | 0.267 |
| Embed search | 0.034 | 0.060 |
Latency Stage Breakdown — 1 Concurrent Query
| Stage | Avg (s) | P90 (s) |
|---|---|---|
| End-to-End (client) | 1.596 | 1.875 |
| Search (agent chain) | 1.578 | 1.854 |
| Video understanding | 0.649 | 0.925 |
| Cosmos Reason 2 (CR2-8B) | 0.558 | 0.825 |
| Critic agent | 0.757 | 1.031 |
| Attribute search | 0.243 | 0.323 |
| Nemotron Nano 9B v2 (LLM) | 0.526 | 0.581 |
| VST video clip (tool) | 0.053 | 0.083 |
| Embed search | 0.034 | 0.045 |
Latency Stage Breakdown — 10 Concurrent Queries
| Stage | Avg (s) | P90 (s) |
|---|---|---|
| End-to-End (client) | 5.425 | 8.640 |
| Search (agent chain) | 4.332 | 7.656 |
| Video understanding | 2.258 | 3.590 |
| Cosmos Reason 2 (CR2-8B) | 1.866 | 3.246 |
| Critic agent | 2.701 | 4.078 |
| Attribute search | 0.294 | 0.506 |
| Nemotron Nano 9B v2 (LLM) | 0.872 | 1.119 |
| VST video clip (tool) | 0.204 | 0.384 |
| Embed search | 0.047 | 0.089 |
Latency Stage Breakdown — 20 Concurrent Queries
| Stage | Avg (s) | P90 (s) |
|---|---|---|
| End-to-End (client) | 9.691 | 14.490 |
| Search (agent chain) | 3.985 | 6.989 |
| Video understanding | 1.907 | 3.011 |
| Cosmos Reason 2 (CR2-8B) | 1.556 | 2.570 |
| Critic agent | 2.345 | 3.448 |
| Attribute search | 0.288 | 0.393 |
| Nemotron Nano 9B v2 (LLM) | 0.888 | 1.066 |
| VST video clip (tool) | 0.187 | 0.348 |
| Embed search | 0.047 | 0.098 |
Latency Stage Breakdown — 50 Concurrent Queries
| Stage | Avg (s) | P90 (s) |
|---|---|---|
| End-to-End (client) | 21.903 | 36.629 |
| Search (agent chain) | 4.473 | 7.563 |
| Video understanding | 2.271 | 3.259 |
| Cosmos Reason 2 (CR2-8B) | 1.818 | 2.676 |
| Critic agent | 2.749 | 3.946 |
| Attribute search | 0.420 | 0.597 |
| Nemotron Nano 9B v2 (LLM) | 0.838 | 1.089 |
| VST video clip (tool) | 0.228 | 0.393 |
| Embed search | 0.038 | 0.063 |
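The stage tables show that the gap between client-side E2E latency and the Search (agent chain) stage widens as concurrency rises. The sketch below computes that gap and its share of E2E latency from the first set of stage tables. What the gap consists of is not stated in this document; time spent waiting in front of the single agent container is one plausible reading, offered here as an assumption. The names `STAGES` and `outside_agent_chain` are illustrative.

```python
# (concurrent queries, End-to-End avg in s, Search (agent chain) avg in s),
# copied from the first four stage breakdown tables.
STAGES = [
    (1, 2.413, 2.387),
    (10, 6.299, 5.017),
    (20, 12.378, 5.122),
    (50, 23.475, 4.733),
]


def outside_agent_chain(e2e_s, search_s):
    """Return (seconds, fraction) of E2E latency not spent in the agent chain."""
    gap_s = e2e_s - search_s
    return gap_s, gap_s / e2e_s


for concurrency, e2e_s, search_s in STAGES:
    gap_s, share = outside_agent_chain(e2e_s, search_s)
    print(f"{concurrency:>2} concurrent: {gap_s:6.3f} s outside agent chain ({share:.0%})")
```

The per-stage averages themselves change little between 10 and 50 concurrent queries, so most of the added E2E latency falls in this gap rather than in any single stage.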
Video Upload Latency#
This table shows average and P90 upload latency, with chunk counts, for video ingestion. Each row is one test: either a single video or a batch of multiple videos uploaded together. The Duration column shows how long each video is (or the range of video durations in a batch).
| Content | No. of Videos | Duration | Total chunks | Avg chunks per video | Avg (s) | P90 (s) | Total size |
|---|---|---|---|---|---|---|---|
| Video 1 | 1 | 140 min | 1684 | 1684 | 138.5 | | 2421.59 MB |
| Video 2 | 1 | 60 min | 720 | 720 | 59.4 | | 1047.76 MB |
| Batch 1 | 2 | 25 s–210 s | 47 | 23.5 | 4.57 | 5.43 | 183.25 MB |
| Batch 2 | 17 | 8 s–37 s | 81 | 4.8 | 2.22 | 2.78 | 72.28 MB |
| Content | No. of Videos | Duration | Total chunks | Avg chunks per video | Avg (s) | P90 (s) | Total size |
|---|---|---|---|---|---|---|---|
| Video 1 | 1 | 140 min | 1684 | 1684 | 99.7 | | 2421.59 MB |
| Video 2 | 1 | 60 min | 720 | 720 | 41.8 | | 1047.76 MB |
| Batch 1 | 2 | 25 s–210 s | 47 | 23.5 | 7.90 | 9.10 | 183.25 MB |
| Batch 2 | 17 | 8 s–37 s | 81 | 4.8 | 4.01 | 5.48 | 72.28 MB |
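For sizing ingestion, upload latency can be expressed relative to video duration. The sketch below computes a real-time factor (video seconds ingested per second of upload latency) for the 60-minute video, using the average latencies from the two upload tables. The helper name `realtime_factor` is hypothetical, and the metric itself is a derived convenience, not one reported by the benchmark.

```python
def realtime_factor(duration_s, upload_latency_s):
    """Seconds of video ingested per second of upload latency."""
    if upload_latency_s <= 0:
        raise ValueError("upload latency must be positive")
    return duration_s / upload_latency_s


VIDEO_2_DURATION_S = 60 * 60  # the 60-minute video in both upload tables

# Average upload latencies for Video 2 from the first and second upload table.
for latency_s in (59.4, 41.8):
    factor = realtime_factor(VIDEO_2_DURATION_S, latency_s)
    print(f"{latency_s:5.1f} s upload -> {factor:.1f}x real time")
```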