KPI

Accuracy

End-to-End Application

Accuracy of Multi-Camera Fusion - RTLS Microservice (results are based on 3D locations & unique global IDs).

| Version | HOTA    | DetA    | AssA    | LocA    | MOTA    | IDF1    |
| ------- | ------- | ------- | ------- | ------- | ------- | ------- |
| v2.1    | 82.196% | 80.174% | 84.854% | 89.330% | 91.171% | 93.943% |

Evaluation Using the NVIDIA Retail Synthetic Dataset

  • 20 people

  • 8 cameras

  • 6 minutes per camera

  • 612K 2D bounding boxes & 3D locations

  • 86K frames

Evaluation Metrics - Measured based on 3D locations

  • HOTA (Higher Order Tracking Accuracy): HOTA measures tracking accuracy by considering both detection and association. It combines localization, association, and detection in a single score (a formula sketch follows this list).

  • DetA (Detection Accuracy): Part of the HOTA metric, DetA evaluates how well the objects are detected within the frames.

  • AssA (Association Accuracy): Part of the HOTA metric, AssA assesses how well the detected objects are matched across different frames.

  • LocA (Localization Accuracy): Part of the HOTA metric, LocA measures the accuracy of the localized position of detected objects.

  • MOTA (Multiple Object Tracking Accuracy): MOTA is a popular metric for evaluating multiple object tracking. It considers missed detections, false alarms, and identity switches.

  • IDF1 (ID F1 Score): IDF1 is a metric that evaluates the ratio of correctly identified detections over the average number of ground-truth and computed detections.
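For reference, a sketch of the standard formulations behind these metrics, assuming the conventional HOTA and MOTChallenge definitions (localization threshold α; identity-matched true/false positives IDTP/IDFP/IDFN; identity switches IDSW). This is general background, not something specific to the RTLS microservice:

```latex
% HOTA at a localization threshold \alpha, then averaged over thresholds
\mathrm{HOTA}_{\alpha} = \sqrt{\mathrm{DetA}_{\alpha}\cdot\mathrm{AssA}_{\alpha}},
\qquad
\mathrm{HOTA} \approx \frac{1}{|\mathcal{A}|}\sum_{\alpha\in\mathcal{A}} \mathrm{HOTA}_{\alpha}

% MOTA penalizes missed detections, false positives, and identity switches
\mathrm{MOTA} = 1 - \frac{\mathrm{FN} + \mathrm{FP} + \mathrm{IDSW}}{\mathrm{GT}}

% IDF1 is the F1 score over identity-matched detections
\mathrm{IDF1} = \frac{2\,\mathrm{IDTP}}{2\,\mathrm{IDTP} + \mathrm{IDFP} + \mathrm{IDFN}}
```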

Steps for Evaluation (RTLS Mode)

  1. Annotate the multiple object tracking (MOT) ground truth using the format <camera_id> <object_id> <frame_id> <bbox_left> <bbox_top> <bbox_width> <bbox_height> <x> <y>, which is similar but not identical to the MOTChallenge format (a minimal ground-truth writer sketch follows this list).

    • Currently, we support evaluation for 3D locations, where valid 3D location coordinates should be used for <x> <y> values.

    • You may use open-source MOT annotation tools from GitHub to accelerate the labeling process.

    • Ensure that the <camera_id> field corresponds to the sensor ID in your calibration file. For example, sensor_id Camera_01 should be denoted as 1 in the <camera_id> field, sensor_id Camera_02 as 2, and so on.

  2. Log the RTLS output as a txt file from the mdx-rtls topic when running the perception & RTLS pipeline. You can refer to this script for consuming the Kafka messages (a consumer sketch also follows this list).

  3. Set up the environment & variables as mentioned in the 3D evaluation notebook evaluate_rtls_results.ipynb & run the notebook.
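As a minimal sketch of the ground-truth format from step 1, assuming space-separated values with one object observation per line (the helper name, file path, and example values are hypothetical):

```python
# Hypothetical helper: writes one ground-truth line per object observation in the
# space-separated format <camera_id> <object_id> <frame_id> <bbox_left> <bbox_top>
# <bbox_width> <bbox_height> <x> <y>, where <x> <y> are the 3D location coordinates.
def write_ground_truth(path, records):
    with open(path, "w") as f:
        for r in records:
            f.write(
                f"{r['camera_id']} {r['object_id']} {r['frame_id']} "
                f"{r['bbox_left']} {r['bbox_top']} {r['bbox_width']} {r['bbox_height']} "
                f"{r['x']} {r['y']}\n"
            )

# Example record: sensor_id Camera_01 maps to camera_id 1, per the note in step 1.
write_ground_truth("ground_truth.txt", [
    {"camera_id": 1, "object_id": 5, "frame_id": 120,
     "bbox_left": 410, "bbox_top": 215, "bbox_width": 85, "bbox_height": 190,
     "x": 3.42, "y": 7.18},
])
```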
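For step 2, a minimal sketch of logging the mdx-rtls topic to a text file, assuming the kafka-python client; the referenced script remains the authoritative version, and the broker address and output file name are placeholders:

```python
from kafka import KafkaConsumer  # assumption: kafka-python is installed

# Consume RTLS fusion messages and append them to a text file for evaluation.
consumer = KafkaConsumer(
    "mdx-rtls",                          # topic published by the RTLS microservice
    bootstrap_servers="localhost:9092",  # placeholder: point to your Kafka broker
    auto_offset_reset="earliest",
    value_deserializer=lambda v: v.decode("utf-8"),
)

with open("rtls_output.txt", "a") as log_file:
    for message in consumer:
        log_file.write(message.value + "\n")
        log_file.flush()
```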

Notes

  • The results on other datasets may vary due to the domain gap with the training data, occlusion, cluttered backgrounds, lighting conditions, etc. To improve the accuracy, please fine-tune the configuration parameters following the instructions here. You may also train the detection and re-identification models with more labeled data in your target domain using the NVIDIA TAO Toolkit.

  • The stream processing results may degrade when the behavior retention time is shortened.

  • Depending on the memory available, there may be frame drops when the number of streams to be processed in parallel is too large. In the evaluation metrics, please compare the count of detections and the count of ground truths. If the count of detections is significantly smaller, there may be frame drops in the perception pipeline.

  • When the input consists of RTSP streams, the frame IDs in the raw data and the ground truth may be misaligned. Adjust groundTruthFrameIdOffset in app_config.json to align the frame IDs before evaluation. To measure the frame-ID offset, use the visualization script and set vizMode to frames in viz_config.json (a configuration sketch follows this list).
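A minimal sketch of applying the two settings from the last note, assuming app_config.json and viz_config.json are flat JSON files exposing the keys named above (the real configuration files may nest these keys differently, and the offset value is only an example):

```python
import json

def set_json_key(path, key, value):
    """Load a JSON config, set a single top-level key, and write it back."""
    with open(path) as f:
        config = json.load(f)
    config[key] = value
    with open(path, "w") as f:
        json.dump(config, f, indent=2)

# Shift ground-truth frame IDs by the measured offset (example value).
set_json_key("app_config.json", "groundTruthFrameIdOffset", 12)

# Render frame IDs in the visualization so the offset can be measured.
set_json_key("viz_config.json", "vizMode", "frames")
```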

Perception Pipeline

  • People Detection: PeopleNet v2.6 accuracy

  • People Detection: PeopleNet Transformer accuracy

  • People ReID: ReidentificationNet v1.1 accuracy

  • People ReID: ReidentificationNet Transformer accuracy

Performance

Perception Microservice

Runtime profiling of the Perception (DeepStream) microservice:

| DS Pipeline: Detector | DS Pipeline: ReID Model | GPU | Number of Streams (30 FPS) | CPU Utilization | GPU Utilization |
| --------------------- | ----------------------- | --- | -------------------------- | --------------- | --------------- |
| PeopleNet v2.6 (ResNet-34) | ReIdentificationNet v1.1 (ResNet-50) | NVIDIA T4 | 4 | 1.4% | 60% |
| PeopleNet v2.6 (ResNet-34) | ReIdentificationNet v1.1 (ResNet-50) | NVIDIA L4 | 15 | 10.6% | 99% |
| PeopleNet v2.6 (ResNet-34) | ReIdentificationNet v1.1 (ResNet-50) | NVIDIA A30 | 16 | 1.9% | 88% |
| PeopleNet v2.6 (ResNet-34) | ReIdentificationNet v1.1 (ResNet-50) | NVIDIA A100 | 21 | 290% (3 cores used) | ~65% |
| PeopleNet v2.6 (ResNet-34) | ReIdentificationNet Transformer (Swin Tiny) | NVIDIA A100 | 13 | 180% (2 cores used) | 71% |
| PeopleNet Transformer (FAN-Small+DINO) | ReIdentificationNet v1.1 (ResNet-50) | NVIDIA A100 | 4 | 177% (2 cores used) | 94% |
| PeopleNet Transformer (FAN-Small+DINO) | ReIdentificationNet Transformer (Swin Tiny) | NVIDIA A100 | 3 | 130.5% (2 cores used) | 96% |
| PeopleNet Transformer (FAN-Small+DINO) | ReIdentificationNet Transformer (Swin Tiny) | NVIDIA H100 | 6 | 222% (3 cores used) | 96% |

Model Inference Precision:

  • PeopleNet v2.6: INT8 inference

  • ReidentificationNet v1.1: FP16 inference

  • ReIdentificationNet Transformer: FP16 inference

  • PeopleNet Transformer: FP16 inference

Metrics were obtained on the following system configurations:

  • NVIDIA T4 : Intel(R) Xeon(R) Gold 6258R CPU @ 2.70 GHz

  • NVIDIA L4 : AMD EPYC 7313P CPU @ 3.0 GHz

  • NVIDIA A30 : Intel(R) Xeon(R) Gold 6338N CPU @ 2.20 GHz

  • NVIDIA A100 : Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz

The above measurements were captured on scenes containing approximately 20 objects.

Analytics Components

Runtime profiling of various analytics & API components.

Each component is profiled separately. The perception metadata comes from a Perception microservice running on a different machine in the same cluster and processing RTSP streams.

Profiling of Analytics Components

| Components | GPU Utilization | CPU Memory | CPU Cores Allocated | CPU Utilization | Comment |
| ---------- | --------------- | ---------- | ------------------- | --------------- | ------- |
| RTLS | N/A | 282.5 MiB | 10 | 913% | |
| Behavior Analytics - Driver | N/A | 3.922 GiB | 2 | 150% | |
| Behavior Analytics - Exec | N/A | 19.6 GiB per pod | 1 core per pod | 150% | 2 pods in total. |
| Web API | N/A | 91.82 MiB | 1 | 1.88% | CPU usage increases with the number of users accessing the UI. |
| Kafka | N/A | 1.801 GiB | 1 | 7% | |
| Logstash | N/A | 1.039 GiB | 1 | 61.38% | |
| Elastic | N/A | 1.804 GiB | 1 | 38.59% | |

Number of Input Metadata Streams: 4

Metrics were obtained on the following system configuration:

  • GPU: NVIDIA A100

  • CPU: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz - 12 cores

Media Microservices

Runtime profiling of the media microservices:

Profiling of Media Microservices

| Components | GPU Utilization | CPU Memory | CPU Utilization | Comment |
| ---------- | --------------- | ---------- | --------------- | ------- |
| VST | Decoder: 25%, Encoder: 25%, GPU: 15% | 27.36 GiB | 53.07% | Max streams: 30. GPU usage increases with the number of users accessing the UI and using the overlay feature. |

Number of Input RTSP Streams: 30

Metrics were obtained on the following system configuration:

  • GPU: NVIDIA A100

  • CPU: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz - 12 cores