# KPI

## Accuracy

### End-to-End Application
| Version | HOTA | DetA | AssA | LocA | MOTA | IDF1 |
|---|---|---|---|---|---|---|
| v1.0 | 47.976% | 57.926% | 39.743% | 77.502% | 78.577% | 71.923% |
| v1.0.1 | 59.102% | 61.036% | 57.253% | 77.417% | 83.296% | 88.254% |
| v1.1 | 61.524% | 64.016% | 59.139% | 80.507% | 82.021% | 87.184% |
| v2.0/2.1 | 62.910% | 64.772% | 61.109% | 80.677% | 83.270% | 88.195% |
### Evaluation Using the NVIDIA MTMC Dataset

- 8 people
- 7 cameras
- 10 minutes each
- 310K 2D bounding boxes
- 86K frames
Evaluation metrics (measured based on 2D bounding boxes):

- **HOTA (Higher Order Tracking Accuracy)**: Measures tracking accuracy by combining detection, association, and localization into a single score.
- **DetA (Detection Accuracy)**: Part of the HOTA metric; evaluates how well objects are detected within the frames.
- **AssA (Association Accuracy)**: Part of the HOTA metric; assesses how well detected objects are matched across different frames.
- **LocA (Localization Accuracy)**: Part of the HOTA metric; measures the accuracy of the localized positions of detected objects.
- **MOTA (Multiple Object Tracking Accuracy)**: A popular metric for evaluating multiple object tracking; it accounts for missed detections, false alarms, and identity switches.
- **IDF1 (ID F1 Score)**: The ratio of correctly identified detections over the average number of ground-truth and computed detections.
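The simpler of these metrics have closed-form definitions. As a reference point, here is a minimal sketch of the standard CLEAR MOT and identity-based formulas for MOTA and IDF1, computed from already-matched counts; the function and variable names are illustrative, not code from the Metropolis apps. HOTA additionally requires per-threshold matching, so in practice the official TrackEval implementation is the tool of choice.

```python
def mota(fn: int, fp: int, idsw: int, num_gt: int) -> float:
    """CLEAR MOT accuracy: 1 - (misses + false positives + ID switches) / GT.
    Can be negative when errors outnumber ground-truth detections."""
    return 1.0 - (fn + fp + idsw) / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Identity F1: harmonic mean of ID precision and ID recall, i.e.
    2 * IDTP / (2 * IDTP + IDFP + IDFN)."""
    return 2.0 * idtp / (2.0 * idtp + idfp + idfn)


# HOTA combines detection and association per localization threshold alpha:
# HOTA_alpha = sqrt(DetA_alpha * AssA_alpha), then averages over the alphas.

print(f"{mota(120, 80, 10, 1000):.3f}")  # 0.790
```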
### Steps for Evaluation (Batch Mode)
1. Prepare synchronized videos for evaluation.

2. Annotate the ground truth of multiple object tracking (MOT) using the format

   ```
   <camera_id> <object_id> <frame_id> <bbox_left> <bbox_top> <bbox_width> <bbox_height> <x> <y>
   ```

   which is similar but not identical to the MOTChallenge format (see the parsing sketch after these steps). Currently, we support evaluation for:

   - 2D bounding boxes (default), where `-1 -1` can be used as the `<x> <y>` values.
   - 3D locations, where valid 3D location coordinates should be used as the `<x> <y>` values.

   Valid `<camera_id> <object_id> <frame_id> <bbox_left> <bbox_top> <bbox_width> <bbox_height>` values are required for both of the above evaluation types. Ensure that the `<camera_id>` field corresponds to the sensor ID in your calibration file: sensor ID `Camera_01` should be denoted as `1` in the `<camera_id>` field, sensor ID `Camera_02` as `2`, and so on. You may look for GitHub repositories of MOT annotation tools to accelerate the labeling process, and you can refer to the MTMC GT generation section for more details.

3. Log the raw data as a JSON file or protobuf bytes from the `mdx-raw` topic when running the perception pipeline. Set `msg-conv-payload-type` to 1 or 2 in the perception configuration to output the raw messages in the expected format. Refer to Configuration.

   For the next two steps, learn more about the MTMC batch processing mode, and refer to the two `README.md` files in the `metropolis-apps-standalone-deployment/modules/multi-camera-tracking/` directory and in its `synchronize_metadata/` sub-directory.

4. Set the paths to the raw data and ground truth in the app config file for multi-camera tracking, `resources/app_config.json`.

5. Run the batch processing script:

   ```
   python3 -m main_batch_processing --config resources/app_config.json --calibration resources/calibration.json
   ```
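Because the ground-truth format in step 2 differs slightly from MOTChallenge, a small validator can catch malformed lines before the batch run. Below is a minimal parsing sketch assuming whitespace-separated fields, as the angle-bracket template suggests; the `GroundTruthEntry` class and field handling are illustrative, so verify the exact delimiter and conventions against the MTMC GT generation section.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class GroundTruthEntry:
    camera_id: int
    object_id: int
    frame_id: int
    bbox: Tuple[float, float, float, float]  # left, top, width, height
    location: Optional[Tuple[float, float]]  # None for 2D-only evaluation


def parse_gt_line(line: str) -> GroundTruthEntry:
    """Parse '<camera_id> <object_id> <frame_id> <bbox_left> <bbox_top>
    <bbox_width> <bbox_height> <x> <y>'. For 2D bounding-box evaluation,
    '<x> <y>' may be '-1 -1' and is stored as None."""
    fields = line.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}: {line!r}")
    camera_id, object_id, frame_id = (int(f) for f in fields[:3])
    left, top, width, height, x, y = (float(f) for f in fields[3:])
    location = None if (x, y) == (-1.0, -1.0) else (x, y)
    return GroundTruthEntry(camera_id, object_id, frame_id,
                            (left, top, width, height), location)
```

For example, `parse_gt_line("1 5 120 100 200 50 150 -1 -1")` yields a 2D-only entry with `location=None`.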
### Notes

- Results on other datasets may vary due to the domain gap with the training data, occlusion, cluttered backgrounds, lighting conditions, etc. To improve the accuracy, fine-tune the configuration parameters following the instructions here. You may also train the detection and re-identification models with more labeled data in your target domain using the NVIDIA TAO Toolkit.
- The stream processing results may degrade when the behavior retention time is shortened.
- Depending on the available memory, frames may be dropped when too many streams are processed in parallel. In the evaluation metrics, compare the count of detections with the count of ground truths: if the count of detections is significantly smaller, there may be frame drops in the perception pipeline.
- When the input consists of RTSP streams, the frame IDs in the raw data and the ground truth may be misaligned. Adjust `groundTruthFrameIdOffset` in `app_config.json` accordingly to align the frame IDs and correct the evaluation. To measure the offset of frame IDs, use the visualization script and set `vizMode` to `frames` in `viz_config.json`, as sketched below.
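To illustrate the frame-ID alignment in the last note: after reading the frame IDs of the same visual event from the raw data and the ground truth (with `vizMode` set to `frames`), their difference is the value to set as `groundTruthFrameIdOffset` in `app_config.json`. The helpers below are a hypothetical sketch of that arithmetic, not part of the app, and they assume the offset shifts the ground-truth frame IDs.

```python
def measure_frame_id_offset(raw_frame_id: int, gt_frame_id: int) -> int:
    """Offset between raw-data and ground-truth frame IDs for the same
    visual event, read off the visualization overlays (vizMode = "frames")."""
    return raw_frame_id - gt_frame_id


def apply_offset(gt_frame_ids: list, offset: int) -> list:
    """Shift ground-truth frame IDs so they line up with the raw data;
    `offset` corresponds to groundTruthFrameIdOffset in app_config.json
    (assumption: the app applies it to the ground-truth side)."""
    return [frame_id + offset for frame_id in gt_frame_ids]
```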
### Perception Pipeline

## Performance

### Perception Microservice
Runtime profiling of the Perception (DeepStream) microservice:
| Detector | ReID Model | GPU | Number of Streams (30 FPS) | CPU Utilization | GPU Utilization |
|---|---|---|---|---|---|
| PeopleNet v2.6 (ResNet-34) | ReIdentificationNet v1.1 (ResNet-50) | NVIDIA T4 | 4 | 1.4% | 60% |
| PeopleNet v2.6 (ResNet-34) | ReIdentificationNet v1.1 (ResNet-50) | NVIDIA L4 | 15 | 10.6% | 99% |
| PeopleNet v2.6 (ResNet-34) | ReIdentificationNet v1.1 (ResNet-50) | NVIDIA A30 | 16 | 1.9% | 88% |
| PeopleNet v2.6 (ResNet-34) | ReIdentificationNet v1.1 (ResNet-50) | NVIDIA A100 | 21 | 290% (3 cores used) | ~65% |
| PeopleNet v2.6 (ResNet-34) | ReIdentificationNet Transformer (Swin Tiny) | NVIDIA A100 | 13 | 180% (2 cores used) | 71% |
| PeopleNet Transformer (FAN-Small+DINO) | ReIdentificationNet v1.1 (ResNet-50) | NVIDIA A100 | 4 | 177% (2 cores used) | 94% |
| PeopleNet Transformer (FAN-Small+DINO) | ReIdentificationNet Transformer (Swin Tiny) | NVIDIA A100 | 3 | 130.5% (2 cores used) | 96% |
| PeopleNet Transformer (FAN-Small+DINO) | ReIdentificationNet Transformer (Swin Tiny) | NVIDIA H100 | 6 | 222% (3 cores used) | 96% |
Model inference precision:

- PeopleNet v2.6: INT8 inference
- ReIdentificationNet v1.1: FP16 inference
- ReIdentificationNet Transformer: FP16 inference
- PeopleNet Transformer: FP16 inference
Metrics obtained on the below system configurations:

- NVIDIA T4: Intel(R) Xeon(R) Gold 6258R CPU @ 2.70 GHz
- NVIDIA L4: AMD EPYC 7313P CPU @ 3.0 GHz
- NVIDIA A30: Intel(R) Xeon(R) Gold 6338N CPU @ 2.20 GHz
- NVIDIA A100: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20 GHz

The above measurements were captured in scenes containing approximately 20 objects.
### Analytics Components

Runtime profiling of various analytics and API components. Each component is profiled separately. The perception metadata comes from a perception microservice running on a different machine in the same cluster and processing RTSP streams.
| Component | GPU Utilization | CPU Memory | CPU Cores Allocated | CPU Utilization | Comment |
|---|---|---|---|---|---|
| Multi-Camera Tracking | N/A | 1.3 GiB | 8 | 40% | GPU is used only during micro-batch processing. |
| Behavior Analytics - Driver | N/A | 3.5 GiB | 6 | 150% (2 cores) | |
| Behavior Analytics - Exec | N/A | 19.9 GiB per pod (Spark Exec is running in 2 pods) | | 150% per pod (Spark Exec is running in 2 pods) | |
| Web API | N/A | 108 MiB | | 60% | CPU usage increases with the number of users accessing the UI. |
| Kafka | N/A | 8.9 GiB (across 3 pods) | | 40% per pod | |
| Milvus | N/A | 3 GiB | | 68% per pod | |
| Logstash | N/A | 13 GiB (across 3 pods) | | 75% | Logstash is running on 3 pods. |
| Elastic | N/A | 9.31 GiB | | 260% | |
Number of Input Metadata Streams: 4

Metrics obtained on the below system configuration:

- GPU: NVIDIA A100
- CPU: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20 GHz (12 cores)
### Media Microservices

For Media Microservice KPI details, refer to the VST KPI documentation.