RT-CV Performance

Overview

The Real-Time Computer Vision (RT-CV) microservice uses the NVIDIA DeepStream SDK to run continuous object detection and multi-object tracking on live RTSP streams. The benchmarks below measure the maximum number of concurrent 1080p streams that a single GPU can sustain at 30 FPS with object detection and tracking enabled.
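The page defines capacity as the largest stream count a GPU can sustain at 30 FPS, but it does not describe how that count was found. The following is a minimal sketch, not the VSS benchmark harness. It assumes the pass/fail check is monotone (if N streams hold the target rate, fewer streams do too), which allows a binary search over the stream count. `sustains`, `meets_target`, and `fake_sustains` are hypothetical stand-ins for launching N RTSP streams and reading back per-stream FPS. Requiring every stream, rather than the average, to hold 30 FPS is also an assumption.

```python
from typing import Callable, List


def meets_target(per_stream_fps: List[float], target_fps: float = 30.0) -> bool:
    # Assumption: "sustained" means every stream holds the target rate,
    # not merely the average across streams.
    return all(fps >= target_fps for fps in per_stream_fps)


def max_sustainable_streams(sustains: Callable[[int], bool], upper: int) -> int:
    """Return the largest n in [0, upper] for which sustains(n) is True.

    Assumes sustains is monotone (if n streams pass, fewer streams pass too)
    and treats n = 0 as passing, so it is never called with 0.
    """
    lo, hi = 0, upper
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so "lo = mid" always makes progress
        if sustains(mid):
            lo = mid       # mid streams pass: the answer is mid or higher
        else:
            hi = mid - 1   # mid streams fail: the answer is below mid
    return lo


# Hypothetical measurement: pretend each of n streams runs at 1200 / n FPS.
def fake_sustains(n: int) -> bool:
    return meets_target([1200.0 / n] * n)


print(max_sustainable_streams(fake_sustains, 128))  # 40
```

A binary search needs about log2(upper) pipeline runs instead of one run per candidate count. This matters if each trial has to stream long enough for FPS to stabilize.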

Figure: RT-CV max concurrent 1080p streams per GPU, by model. RT-DETR Resnet50 provides the highest stream count on server GPUs (H100, RTX Pro 6000 SE); edge platforms (DGX Spark, AGX Thor) are best suited to lighter-weight deployments.

Test Configuration

| Parameter | Value |
|---|---|
| VSS Release | 3.1 |
| Video resolution | 1920×1080 (1080p) |
| Input format | H.264 |
| Configured stream FPS | 30 |
| Model precision | FP16 |
| Inference engine | TensorRT (TRT) |
| Tracker | Enabled |
| Models tested | RT-DETR (Resnet50 backbone), RT-DETR (EfficientViT/L2 backbone), Grounding DINO (GDINO) |
| GPUs tested | H100, RTX Pro 6000 SE, L40S, DGX Spark, AGX Thor |
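The configuration fixes the load that each stream adds. At 30 FPS, a stream delivers one frame every 1000 / 30 ≈ 33.3 ms, and N streams together deliver 30 × N frames per second to decode and track. The small sketch below shows this arithmetic; the function names are illustrative only.

```python
def frame_interval_ms(fps: float) -> float:
    # Time between consecutive frames of a single stream.
    return 1000.0 / fps


def aggregate_fps(streams: int, fps: float = 30.0) -> float:
    # Total frames per second arriving across all concurrent streams.
    return streams * fps


print(round(frame_interval_ms(30), 2))  # 33.33
print(aggregate_fps(50))                # 1500.0
```

For example, the highest count listed below (50 streams of RT-DETR Resnet50) corresponds to 1,500 frames per second arriving at one GPU.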

Performance by GPU

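H100

| Model | Backbone | Max Streams | Avg Latency (ms) | p90 Latency (ms) | p95 Latency (ms) | GPU Core (%) | CPU Core (%) |
|---|---|---|---|---|---|---|---|
| RT-DETR | Resnet50 | 50 | 408.46 | 858.32 | 901.02 | 92.1 | 2.6 |
| RT-DETR | EfficientViT/L2 | 11 | 56.0 | 67.34 | 69.11 | 88.4 | 1.1 |
| Grounding DINO | | 6 | 61.47 | 75.25 | 75.61 | 90.0 | 0.9 |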

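RTX Pro 6000 SE

| Model | Backbone | Max Streams | Avg Latency (ms) | p90 Latency (ms) | p95 Latency (ms) | GPU Core (%) | CPU Core (%) |
|---|---|---|---|---|---|---|---|
| RT-DETR | Resnet50 | 29 | 196.59 | 258.85 | 327.31 | 90.6 | 1.3 |
| RT-DETR | EfficientViT/L2 | 17 | 62.02 | 73.18 | 74.84 | 87.0 | 0.8 |
| Grounding DINO | | 5 | 61.08 | 70.09 | 72.43 | 86.5 | 0.6 |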

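L40S

| Model | Backbone | Max Streams | Avg Latency (ms) | p90 Latency (ms) | p95 Latency (ms) | GPU Core (%) | CPU Core (%) |
|---|---|---|---|---|---|---|---|
| RT-DETR | Resnet50 | 15 | 65.64 | 72.97 | 73.72 | 87.3 | 0.9 |
| RT-DETR | EfficientViT/L2 | 4 | 53.14 | 66.07 | 70.5 | 88.9 | 0.6 |
| Grounding DINO | | 3 | 52.35 | 64.27 | 68.04 | 84.6 | 0.5 |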

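DGX Spark

| Model | Backbone | Max Streams | Avg Latency (ms) | p90 Latency (ms) | p95 Latency (ms) | GPU Core (%) | CPU Core (%) |
|---|---|---|---|---|---|---|---|
| RT-DETR | Resnet50 | 5 | 171.91 | 206.27 | 221.29 | 95.5 | 21.9 |
| RT-DETR | EfficientViT/L2 | 3 | 116.67 | 127.64 | 128.9 | 95.0 | 19.4 |
| Grounding DINO* | | 1 | 26.71 | 44.37 | 45.07 | 54.3 | 13.7 |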

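AGX Thor

| Model | Backbone | Max Streams | Avg Latency (ms) | p90 Latency (ms) | p95 Latency (ms) | GPU Core (%) | CPU Core (%) |
|---|---|---|---|---|---|---|---|
| RT-DETR | Resnet50 | 4 | 56.39 | 68.25 | 72.33 | 59.8 | 22.3 |
| RT-DETR | EfficientViT/L2 | 3 | 60.31 | 76.68 | 78.48 | 88.1 | 20.6 |
| Grounding DINO* | | 1 | 43.23 | 62.2 | 64.0 | 67.7 | 17.6 |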
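The tables report average, p90, and p95 latency, but the page does not state which interval is measured or which percentile method is used. The sketch below assumes a list of per-frame latency samples in milliseconds and uses the nearest-rank percentile definition. Interpolating definitions, such as NumPy's default linear method, can give slightly different values. The sample data and function names are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, List


def nearest_rank_percentile(samples: List[float], pct: int) -> float:
    # Nearest-rank: the smallest sample such that at least pct% of all
    # samples are less than or equal to it.
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


def summarize_latency(samples_ms: List[float]) -> Dict[str, float]:
    # Summary statistics in the same shape as the table columns.
    return {
        "avg": mean(samples_ms),
        "p90": nearest_rank_percentile(samples_ms, 90),
        "p95": nearest_rank_percentile(samples_ms, 95),
    }


# Hypothetical samples: 1 ms, 2 ms, ..., 100 ms.
print(summarize_latency([float(x) for x in range(1, 101)]))
# {'avg': 50.5, 'p90': 90.0, 'p95': 95.0}
```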

Note

All benchmarks were measured at 30 FPS with 1080p H.264 input, FP16 precision on TensorRT, and the object tracker enabled. For production deployments, plan for 10–15% headroom below the maximum stream counts listed above.
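Applying the headroom guidance is simple arithmetic. The minimal sketch below assumes that headroom is a whole-number percentage of the listed maximum and that the result is rounded down, because rounding up would eat into the reserved margin. The function names are illustrative.

```python
from typing import Tuple


def planned_streams(max_streams: int, headroom_pct: int) -> int:
    # Integer math avoids float rounding surprises; // rounds down.
    if not 0 <= headroom_pct < 100:
        raise ValueError("headroom_pct must be in [0, 100)")
    return max_streams * (100 - headroom_pct) // 100


def planned_range(max_streams: int, low_pct: int = 10, high_pct: int = 15) -> Tuple[int, int]:
    # (conservative, aggressive): more headroom yields fewer streams.
    return planned_streams(max_streams, high_pct), planned_streams(max_streams, low_pct)


print(planned_range(50))  # (42, 45)
print(planned_range(4))   # (3, 3)
```

Under these assumptions, a configuration listed at 50 streams would be provisioned for about 42 to 45 streams. At very small counts, such as 4, both ends of the range round down to the same value.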

Note

* Grounding DINO on DGX Spark and AGX Thor runs with interval=1, meaning inference is performed on every alternate frame. This is required to reach the reported stream counts.
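In DeepStream's Gst-nvinfer plugin, interval is the number of consecutive batches skipped between inference calls, so interval=1 runs the model on every second frame. The sketch below assumes RT-CV passes this setting through with the same meaning. It also simplifies the schedule so that inference starts on frame 0. It shows which frames are inferred and the resulting per-stream inference rate; the helper names are hypothetical.

```python
def is_inference_frame(frame_index: int, interval: int) -> bool:
    # interval=k skips k frames after each inferred frame, so inference
    # lands on every (k + 1)-th frame, starting at frame 0 (simplification).
    return frame_index % (interval + 1) == 0


def inference_fps(stream_fps: float, interval: int) -> float:
    # Rate at which the detector actually runs for one stream.
    return stream_fps / (interval + 1)


print([i for i in range(6) if is_inference_frame(i, 1)])  # [0, 2, 4]
print(inference_fps(30.0, 1))                              # 15.0
```

At the configured 30 FPS, interval=1 therefore halves the detector's per-stream load to 15 inferences per second.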