Search Performance

Overview

The Search Workflow lets you query indexed video archives in natural language. During ingestion, uploaded clips are embedded with RTVI-Embed (Cosmos-Embed1-448p) and processed by RTVI-CV/Behavior Analytics. During query execution, the Vision Agent decomposes each request, routes it to embed search, attribute search, or both as needed, and can optionally verify candidates with the critic agent, which uses the Cosmos Reason 2 (CR2-8B) VLM, before returning timestamped results.

The tables below benchmark the performance of this search agent. Query latency and throughput were measured with the vad-r1-v2 / benchmark query set (121 natural-language queries over 17 indexed clips) at 1, 10, 20, and 50 concurrent queries. Each run reports client-side end-to-end (E2E) average and P90 latency, successful query throughput (QPS), and per-stage latency on NVIDIA RTX PRO 6000 Blackwell SE and H100 SXM. A separate section reports video upload latency and chunk counts on both platforms for files ranging from short clips to a 140-minute source. Use these numbers to choose a GPU and to estimate query latency under concurrent search load on VSS 3.2.
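The three headline metrics can be derived from per-query client timings roughly as follows. This is a minimal sketch, not the benchmark harness: the `QueryResult` record, the nearest-rank P90 definition, excluding failed queries from the latency statistics, and computing QPS as successful queries divided by total wall-clock time are all assumptions, since the exact methods are not documented here.

```python
import math
from dataclasses import dataclass


@dataclass
class QueryResult:
    """Hypothetical per-query record: client-side E2E latency and success flag."""

    latency_s: float
    succeeded: bool


def p90_nearest_rank(values: list[float]) -> float:
    """90th percentile by the nearest-rank method (an assumed definition)."""
    ordered = sorted(values)
    rank = math.ceil(0.9 * len(ordered))  # 1-based rank into the sorted list
    return ordered[rank - 1]


def summarize(results: list[QueryResult], wall_clock_s: float) -> dict[str, float]:
    """E2E average, E2E P90, and successful QPS for one run (successful queries only)."""
    latencies = [r.latency_s for r in results if r.succeeded]
    if not latencies:
        raise ValueError("no successful queries to summarize")
    return {
        "e2e_avg_s": sum(latencies) / len(latencies),
        "e2e_p90_s": p90_nearest_rank(latencies),
        "successful_qps": len(latencies) / wall_clock_s,
    }
```

Other percentile definitions (for example, linear interpolation) can give slightly different P90 values on small samples, so compare against the tables with that in mind.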

Test Configuration

Search Performance

| Parameter | Value |
| --- | --- |
| VSS release | 3.2 |
| Platforms tested | NVIDIA RTX PRO 6000 Blackwell SE, H100 SXM |
| Dataset | vad-r1-v2 / benchmark |
| Benchmark queries | 121 natural-language queries over 17 indexed videos |
| Indexed videos | 17 vad-r1-v2 clips (72.28 MB total corpus) |
| Embedding model | Cosmos-Embed1-448p |
| Reasoning LLM | NVIDIA Nemotron Nano 9B v2 |
| Video understanding VLM | Cosmos Reason 2 (CR2-8B) |
| Agent mode | Enabled (critic enabled) |
| Top-K | 5 |
| Concurrent queries | 1, 10, 20, 50 |
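Top-K = 5 caps how many candidates a search returns. As an illustration only, the sketch below ranks clip embeddings by cosine similarity to a query embedding and keeps the best K. That embed search scores candidates by cosine similarity, and the `top_k_clips` helper itself, are assumptions rather than documented VSS behavior.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_clips(
    query: list[float], index: dict[str, list[float]], k: int = 5
) -> list[tuple[str, float]]:
    """Hypothetical helper: the k clip IDs most similar to the query, best first."""
    scored = ((clip_id, cosine(query, emb)) for clip_id, emb in index.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])
```

`heapq.nlargest` avoids sorting the whole index when only K results are needed; for a 17-clip corpus a full sort would work just as well.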

Video Upload Performance

| Parameter | Value |
| --- | --- |
| Platforms tested | NVIDIA RTX PRO 6000 Blackwell SE, H100 SXM |
| Chunk duration | 5 seconds |
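With a fixed 5-second chunk duration, the chunk counts in the upload tables follow roughly from dividing each video's duration by the chunk length and rounding up. The sketch assumes non-overlapping chunks and that a trailing partial chunk counts as a full chunk; `expected_chunks` is an illustrative helper, not part of VSS.

```python
import math

CHUNK_DURATION_S = 5  # chunk duration from the upload test configuration


def expected_chunks(durations_s: list[float], chunk_s: float = CHUNK_DURATION_S) -> int:
    """Estimate total chunks for a batch, rounding each video up to whole chunks."""
    return sum(math.ceil(d / chunk_s) for d in durations_s)


# Batch 1 (25 s and 210 s): 5 + 42 chunks
print(expected_chunks([25, 210]))
# Video 2 (60 min)
print(expected_chunks([60 * 60]))
```

These reproduce the 47 and 720 chunks reported for Batch 1 and Video 2. Exactly 140 minutes would give 1680 chunks against the reported 1684 for Video 1, so that source is presumably slightly longer than its rounded duration.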

E2E Performance

Average and P90 client-side E2E latency and successful query throughput (QPS) for the 121-query vad-r1-v2 / benchmark run under increasing concurrent query load. Results are reported as one table per tested platform.

Note

Reported QPS was limited by running a single VSS agent container.

| Concurrent queries | E2E Avg (s) | E2E P90 (s) | Successful QPS |
| --- | --- | --- | --- |
| 1 | 2.413 | 2.900 | 0.414 |
| 10 | 6.299 | 9.524 | 1.564 |
| 20 | 12.378 | 21.154 | 1.514 |
| 50 | 23.475 | 38.578 | 1.641 |

| Concurrent queries | E2E Avg (s) | E2E P90 (s) | Successful QPS |
| --- | --- | --- | --- |
| 1 | 1.596 | 1.875 | 0.626 |
| 10 | 5.425 | 8.640 | 1.814 |
| 20 | 9.691 | 14.490 | 1.929 |
| 50 | 21.903 | 36.629 | 1.757 |
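Once throughput saturates, average latency grows roughly in proportion to concurrency. Little's law (average number in the system = throughput × average time in the system) gives a quick sanity check: with N queries in flight, average latency ≈ N / QPS. The sketch applies it to the first E2E table; treating the benchmark client as a closed loop that always keeps N queries in flight is an assumption.

```python
# (concurrency, measured E2E avg in s, successful QPS) from the first E2E table
ROWS = [
    (1, 2.413, 0.414),
    (10, 6.299, 1.564),
    (20, 12.378, 1.514),
    (50, 23.475, 1.641),
]


def littles_law_latency(concurrency: int, qps: float) -> float:
    """Average latency implied by Little's law when `concurrency` queries stay in flight."""
    return concurrency / qps


for n, measured, qps in ROWS:
    predicted = littles_law_latency(n, qps)
    print(f"N={n:>2}: predicted {predicted:6.2f} s, measured {measured:6.2f} s")
```

At 1, 10, and 20 concurrent queries the estimate lands within about 7% of the measured average. At 50 it overshoots (about 30.5 s against 23.5 s), plausibly because a 121-query run at 50-way concurrency spends much of its time ramping up and draining rather than at steady state.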

Latency Stage Breakdown

Per-stage average and P90 latency at each concurrency level, with one set of tables per tested platform.

Latency Stage Breakdown — 1 Concurrent Query

| Stage | Avg (s) | P90 (s) |
| --- | --- | --- |
| End-to-End (client) | 2.413 | 2.900 |
| Search (agent chain) | 2.387 | 2.877 |
| Video understanding | 0.904 | 1.262 |
| Cosmos Reason 2 (CR2-8B) | 0.821 | 1.172 |
| Critic agent | 0.975 | 1.304 |
| Attribute search | 0.465 | 0.608 |
| Nemotron Nano 9B v2 (LLM) | 0.891 | 1.012 |
| VST video clip (tool) | 0.051 | 0.066 |
| Embed search | 0.033 | 0.039 |

Latency Stage Breakdown — 10 Concurrent Queries

| Stage | Avg (s) | P90 (s) |
| --- | --- | --- |
| End-to-End (client) | 6.299 | 9.524 |
| Search (agent chain) | 5.017 | 7.716 |
| Video understanding | 2.801 | 4.832 |
| Cosmos Reason 2 (CR2-8B) | 2.569 | 4.709 |
| Critic agent | 2.932 | 4.783 |
| Attribute search | 0.219 | 0.336 |
| Nemotron Nano 9B v2 (LLM) | 1.611 | 1.765 |
| VST video clip (tool) | 0.126 | 0.208 |
| Embed search | 0.037 | 0.077 |

Latency Stage Breakdown — 20 Concurrent Queries

| Stage | Avg (s) | P90 (s) |
| --- | --- | --- |
| End-to-End (client) | 12.378 | 21.154 |
| Search (agent chain) | 5.122 | 9.960 |
| Video understanding | 2.826 | 5.629 |
| Cosmos Reason 2 (CR2-8B) | 2.539 | 5.025 |
| Critic agent | 2.828 | 5.774 |
| Attribute search | 0.427 | 0.682 |
| Nemotron Nano 9B v2 (LLM) | 1.516 | 1.738 |
| VST video clip (tool) | 0.165 | 0.334 |
| Embed search | 0.057 | 0.054 |

Latency Stage Breakdown — 50 Concurrent Queries

| Stage | Avg (s) | P90 (s) |
| --- | --- | --- |
| End-to-End (client) | 23.475 | 38.578 |
| Search (agent chain) | 4.733 | 7.282 |
| Video understanding | 2.683 | 5.131 |
| Cosmos Reason 2 (CR2-8B) | 2.402 | 4.739 |
| Critic agent | 2.752 | 5.107 |
| Attribute search | 0.305 | 0.579 |
| Nemotron Nano 9B v2 (LLM) | 1.480 | 1.686 |
| VST video clip (tool) | 0.165 | 0.267 |
| Embed search | 0.034 | 0.060 |
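In the four tables above, the Search (agent chain) average levels off at roughly 5 s from 10 concurrent queries upward, while the client-side E2E average keeps rising. Subtracting one from the other gives the part of E2E latency not spent in the agent chain. Reading that remainder as mostly queueing in front of the single agent container is an assumption, since the stage boundaries are not defined here; `time_outside_chain` is an illustrative helper.

```python
def time_outside_chain(e2e_avg_s: float, chain_avg_s: float) -> tuple[float, float]:
    """Seconds of average E2E latency not attributed to the agent chain, and its share of E2E."""
    outside = e2e_avg_s - chain_avg_s
    return outside, outside / e2e_avg_s


# (concurrency, End-to-End (client) avg, Search (agent chain) avg) from the four tables above
STAGE_AVERAGES = [
    (1, 2.413, 2.387),
    (10, 6.299, 5.017),
    (20, 12.378, 5.122),
    (50, 23.475, 4.733),
]

for n, e2e, chain in STAGE_AVERAGES:
    outside, share = time_outside_chain(e2e, chain)
    print(f"N={n:>2}: {outside:6.3f} s outside the agent chain ({share:.0%} of E2E)")
```

At 50 concurrent queries about 18.7 s, roughly 80% of the average E2E latency, falls outside the agent chain, which fits the note that QPS was bounded by a single agent container.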

Latency Stage Breakdown — 1 Concurrent Query

| Stage | Avg (s) | P90 (s) |
| --- | --- | --- |
| End-to-End (client) | 1.596 | 1.875 |
| Search (agent chain) | 1.578 | 1.854 |
| Video understanding | 0.649 | 0.925 |
| Cosmos Reason 2 (CR2-8B) | 0.558 | 0.825 |
| Critic agent | 0.757 | 1.031 |
| Attribute search | 0.243 | 0.323 |
| Nemotron Nano 9B v2 (LLM) | 0.526 | 0.581 |
| VST video clip (tool) | 0.053 | 0.083 |
| Embed search | 0.034 | 0.045 |

Latency Stage Breakdown — 10 Concurrent Queries

| Stage | Avg (s) | P90 (s) |
| --- | --- | --- |
| End-to-End (client) | 5.425 | 8.640 |
| Search (agent chain) | 4.332 | 7.656 |
| Video understanding | 2.258 | 3.590 |
| Cosmos Reason 2 (CR2-8B) | 1.866 | 3.246 |
| Critic agent | 2.701 | 4.078 |
| Attribute search | 0.294 | 0.506 |
| Nemotron Nano 9B v2 (LLM) | 0.872 | 1.119 |
| VST video clip (tool) | 0.204 | 0.384 |
| Embed search | 0.047 | 0.089 |

Latency Stage Breakdown — 20 Concurrent Queries

| Stage | Avg (s) | P90 (s) |
| --- | --- | --- |
| End-to-End (client) | 9.691 | 14.490 |
| Search (agent chain) | 3.985 | 6.989 |
| Video understanding | 1.907 | 3.011 |
| Cosmos Reason 2 (CR2-8B) | 1.556 | 2.570 |
| Critic agent | 2.345 | 3.448 |
| Attribute search | 0.288 | 0.393 |
| Nemotron Nano 9B v2 (LLM) | 0.888 | 1.066 |
| VST video clip (tool) | 0.187 | 0.348 |
| Embed search | 0.047 | 0.098 |

Latency Stage Breakdown — 50 Concurrent Queries

| Stage | Avg (s) | P90 (s) |
| --- | --- | --- |
| End-to-End (client) | 21.903 | 36.629 |
| Search (agent chain) | 4.473 | 7.563 |
| Video understanding | 2.271 | 3.259 |
| Cosmos Reason 2 (CR2-8B) | 1.818 | 2.676 |
| Critic agent | 2.749 | 3.946 |
| Attribute search | 0.420 | 0.597 |
| Nemotron Nano 9B v2 (LLM) | 0.838 | 1.089 |
| VST video clip (tool) | 0.228 | 0.393 |
| Embed search | 0.038 | 0.063 |

Video Upload Latency

The tables below show average and P90 upload latency, with chunk counts, for video ingestion, one table per tested platform. Each row is one test: either a single video or a batch of videos uploaded together. The Duration column gives the length of the video (or the range of video durations in a batch). Single-video rows report one latency value.

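| Content | No. of Videos | Duration | Total chunks | Avg chunks per video | Avg (s) | P90 (s) | Total size |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Video 1 | 1 | 140 min | 1684 | 1684 | 138.5 |  | 2421.59 MB |
| Video 2 | 1 | 60 min | 720 | 720 | 59.4 |  | 1047.76 MB |
| Batch 1 | 2 | 25 s–210 s | 47 | 23.5 | 4.57 | 5.43 | 183.25 MB |
| Batch 2 | 17 | 8 s–37 s | 81 | 4.8 | 2.22 | 2.78 | 72.28 MB |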

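| Content | No. of Videos | Duration | Total chunks | Avg chunks per video | Avg (s) | P90 (s) | Total size |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Video 1 | 1 | 140 min | 1684 | 1684 | 99.7 |  | 2421.59 MB |
| Video 2 | 1 | 60 min | 720 | 720 | 41.8 |  | 1047.76 MB |
| Batch 1 | 2 | 25 s–210 s | 47 | 23.5 | 7.90 | 9.10 | 183.25 MB |
| Batch 2 | 17 | 8 s–37 s | 81 | 4.8 | 4.01 | 5.48 | 72.28 MB |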
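For capacity planning, the single-video rows can be turned into a rough ingestion rate: video duration divided by upload latency. Within each table, the 140-minute and 60-minute videos ingest at nearly the same multiple of real time (about 61x in the first table and about 84x to 86x in the second), so a linear extrapolation is a reasonable first estimate for other long videos. The sketch assumes upload latency scales linearly with duration, which the short-clip batch rows do not follow; `realtime_factor` and `estimate_upload_s` are illustrative helpers.

```python
# (video duration in s, upload latency in s) for the single-video rows of each table
FIRST_TABLE = [(140 * 60, 138.5), (60 * 60, 59.4)]
SECOND_TABLE = [(140 * 60, 99.7), (60 * 60, 41.8)]


def realtime_factor(duration_s: float, upload_s: float) -> float:
    """Seconds of video ingested per second of upload latency."""
    return duration_s / upload_s


def estimate_upload_s(duration_s: float, rows: list[tuple[float, float]]) -> float:
    """Estimate upload latency, assuming the mean real-time factor of `rows` holds."""
    factors = [realtime_factor(d, u) for d, u in rows]
    return duration_s / (sum(factors) / len(factors))


# Rough estimate for a 90-minute video using each table's measured rate
for name, rows in (("first table", FIRST_TABLE), ("second table", SECOND_TABLE)):
    print(f"{name}: ~{estimate_upload_s(90 * 60, rows):.0f} s for 90 min of video")
```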