Overview

Introduction

The Multi-Camera Tracking app includes two microservices: Multi-Camera Fusion - MTMC (Multi-Target Multi-Camera Tracking) and Multi-Camera Fusion - RTLS (Real-Time Location System). Either or both microservices can be deployed simultaneously. They first transform raw data into behaviors, which include trajectories and embeddings. These behaviors are then clustered and assigned global IDs, leveraging both re-identification (ReID) features and spatio-temporal information. Each microservice can run in either Batch Processing or Stream Processing mode. Additionally, the pipeline offers modules for both evaluation and visualization.

  • Batch Processing handles raw data from local files. In Multi-Camera Fusion - MTMC, the entire dataset is processed as a single batch. In Multi-Camera Fusion - RTLS, batch processing simulates stream processing with a fixed bucket size of 200 ms and a sleep time of 100 ms (see the sketch after this list).

  • Stream Processing deals with live data input from a Kafka topic, processing information in micro-batches. This approach maintains the state of behavioral data, integrating new clustering results with pre-existing IDs.
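The following is a minimal sketch of how batch processing can simulate stream processing with the fixed 200 ms bucket size and 100 ms sleep time described above. The frame format and function names are illustrative assumptions, not the actual module API.

    import time

    BUCKET_SIZE_MS = 200   # fixed micro-batch window used in RTLS batch mode
    SLEEP_TIME_MS = 100    # pause between buckets to simulate a live stream

    def simulate_stream(frames, process_bucket):
        """Group pre-recorded frames into fixed time buckets and hand each bucket
        to the tracker, sleeping between buckets to mimic live micro-batches.
        Assumes `frames` is a list of (timestamp_ms, payload) tuples sorted by time."""
        if not frames:
            return
        bucket, bucket_end = [], frames[0][0] + BUCKET_SIZE_MS
        for timestamp_ms, payload in frames:
            while timestamp_ms >= bucket_end:
                process_bucket(bucket)                  # process one micro-batch
                time.sleep(SLEEP_TIME_MS / 1000.0)      # simulated real-time pacing
                bucket, bucket_end = [], bucket_end + BUCKET_SIZE_MS
            bucket.append((timestamp_ms, payload))
        if bucket:
            process_bucket(bucket)                      # flush the last partial bucket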

Multi-Camera Fusion - MTMC

The Multi-Camera Fusion - MTMC microservice leverages hierarchical clustering to cluster live behaviors and refines these clusters through iterative reassignment using the Hungarian matching algorithm, delivering per-frame multi-camera tracking results with higher accuracy.
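The following is a minimal sketch of the general two-step idea using SciPy: agglomerative clustering of behavior embeddings followed by a Hungarian re-assignment against the cluster centroids. The function names, cosine-distance cost, and thresholds are illustrative assumptions; the microservice's actual cost terms also include spatio-temporal information.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.optimize import linear_sum_assignment

    def cluster_behaviors(embeddings, distance_threshold=0.4):
        """Step 1: hierarchical (agglomerative) clustering of behavior embeddings
        by cosine distance. Returns one cluster label per behavior."""
        linkage_matrix = linkage(embeddings, method="average", metric="cosine")
        return fcluster(linkage_matrix, t=distance_threshold, criterion="distance")

    def reassign_to_clusters(embeddings, labels):
        """Step 2: refine the clusters with Hungarian matching, treating the cluster
        centroids as targets and re-assigning behaviors to the closest centroid."""
        centroids = np.stack([embeddings[labels == c].mean(axis=0)
                              for c in np.unique(labels)])
        norm_e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
        norm_c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
        cost = 1.0 - norm_e @ norm_c.T                    # cosine-distance cost matrix
        behavior_idx, cluster_idx = linear_sum_assignment(cost)
        return list(zip(behavior_idx, cluster_idx))       # optimal one-to-one pairs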

Architecture - Multi-Camera Fusion - MTMC

The MTMC microservice processes input from the perception pipeline, including detection, tracking, and embedding results. The processed data, along with camera calibration information, is directed to Kafka. This system consists of the following modules:

  • The Behavior State Management module maintains live behaviors from previous batches and concatenates them with data from incoming micro-batches.

  • The Behavior Processing module captures trajectories, embeddings, and other pertinent data, and filters them for further processing.

  • The Multi-Camera Tracking module performs two-step hierarchical clustering, re-assigns co-existing behaviors, and suppresses overlapping behaviors.

  • The Merging IDs module consolidates individual object IDs into global IDs, maintaining a correlation of objects observed across multiple sensors.

  • The system integrates with Elasticsearch for ID-based queries, supporting matches over extended periods using appearance and spatio-temporal cues. Logstash acts as a secondary conduit, enhancing the data flow and analytics capabilities of the MTMC microservice.
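As a rough illustration of the ID-based query path, the snippet below searches Elasticsearch for all behaviors that share a global ID. The index and field names are assumptions for illustration only; refer to the deployed schema for the actual names.

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    response = es.search(
        index="mtmc-behavior",               # assumed index name
        query={"term": {"globalId": "17"}},  # assumed field: behaviors merged into one global ID
        size=50,
    )
    for hit in response["hits"]["hits"]:
        print(hit["_source"])                # behavior metadata stored by the pipeline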

Multi-Camera Fusion - RTLS

The Multi-Camera Fusion - RTLS microservice starts by establishing clusters and then continuously updates the locations of global IDs using online Hungarian matching, providing real-time location updates. This mechanism ensures continuous and smooth trajectories, optimizing real-time tracking efficiency. It requires that the fields of view (FOVs) of all cameras provide comprehensive and unbroken coverage of the entire area traversed by the targets. It is recommended that each target be covered by 3-4 camera views at any time.
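The online matching step can be pictured as follows: each micro-batch of behaviors is assigned to the anchors with the Hungarian algorithm, and matched anchor embeddings are blended with the new observations. This is a minimal sketch of that idea; the learning rate, distance metric, and gating threshold are illustrative values, and the actual microservice also applies spatio-temporal constraints.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    LEARNING_RATE = 0.1   # illustrative value for the embedding update rate

    def match_and_update(anchor_embeddings, batch_embeddings, max_distance=0.5):
        """Match micro-batch behaviors to anchors (the RTLS state) with the Hungarian
        algorithm, then blend matched embeddings into the anchors at a fixed
        learning rate. Both arrays are assumed L2-normalized, shapes (A, D) and (B, D)."""
        cost = 1.0 - batch_embeddings @ anchor_embeddings.T   # cosine distance
        behavior_idx, anchor_idx = linear_sum_assignment(cost)
        matches = []
        for b, a in zip(behavior_idx, anchor_idx):
            if cost[b, a] > max_distance:
                continue                                       # reject weak matches
            # Exponential moving average keeps anchors close to recent appearance.
            anchor_embeddings[a] = ((1.0 - LEARNING_RATE) * anchor_embeddings[a]
                                    + LEARNING_RATE * batch_embeddings[b])
            anchor_embeddings[a] /= np.linalg.norm(anchor_embeddings[a])
            matches.append((b, a))
        return matches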

Architecture - Multi-Camera Fusion - RTLS

This real-time online multi-camera tracking algorithm is tailored for sub-second batch intervals and more consistent assignment of global IDs. The process begins with video input through perception and behavior management. Multi-processing is used to handle the incoming raw data from the perception microservice: each sub-process runs a Kafka consumer that consumes the raw byte data, which is deserialized into protobuf messages and then converted to behavior objects. The behavior objects are shared between each sub-process and the main process via a shared-memory list. This approach allows the system to scale to a larger number of cameras.
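A minimal sketch of this consumer fan-out is shown below, assuming the kafka-python client and the mdx-raw topic named above. The deserialization step is a placeholder; the real pipeline parses the protobuf schema of the perception messages.

    import multiprocessing as mp
    from kafka import KafkaConsumer   # kafka-python client (assumed here)

    def bytes_to_behavior(raw_bytes):
        """Placeholder for the real deserialization: parse the protobuf payload and
        convert it into a behavior object."""
        return raw_bytes

    def consume_raw_data(worker_id, shared_behaviors):
        """Each sub-process owns one Kafka consumer; consumers in the same group
        split the topic's partitions among themselves."""
        consumer = KafkaConsumer(
            "mdx-raw",                                # raw perception topic
            bootstrap_servers="localhost:9092",
            group_id="rtls-raw-consumers",
        )
        for message in consumer:
            behavior = bytes_to_behavior(message.value)
            shared_behaviors.append(behavior)         # hand off to the main process

    if __name__ == "__main__":
        manager = mp.Manager()
        shared_behaviors = manager.list()             # list shared across processes
        workers = [mp.Process(target=consume_raw_data, args=(i, shared_behaviors))
                   for i in range(4)]                 # scale worker count with camera count
        for w in workers:
            w.start()
        for w in workers:
            w.join()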

Initially, hierarchical clustering and re-assignment schemes from MTMC clustering are used on the behaviors obtained from each sub-process to initialize the RTLS state. These initial states act as “anchors” for reference in subsequent online tracking:

  • In future batches, the Hungarian matching algorithm is employed to assign micro-batch behaviors to these “anchors.”

  • With each successful match, the embeddings of RTLS state objects are updated at a predefined learning rate.

  • We have a mechanism to re-initialize anchors via clustering. The re-initialized anchors are stitched with the existing anchors to maintain the consistency of global IDs. Re-initialization of anchors is triggered if one of the following conditions is met (see the sketch after this list):

    • The ratio of matched behaviors in the previous batch is under a specified threshold.

    • The number of clusters is too large or too small compared with the pre-defined number of clusters.

  • We also provide an option for Dynamic Anchor Update (DAU), which consists of (1) removal of inactive anchors and (2) activation of shadow anchors. DAU can improve long-term tracking robustness and handle entering/exiting targets in RTLS.

    • Removal of inactive anchors cleans up anchors whose associated behaviors have all expired.

    • Activation of shadow anchors creates new anchors from shadow anchors that are initialized using unmatched behaviors.

    • A drawback of DAU is that it’s sensitive to the accuracy of people detection. When testing on noisy detection results with many false positives, the maximum ID could grow faster than expected. Therefore, this option is disabled by default.
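A minimal sketch of the re-initialization trigger is shown below; the threshold values are illustrative, not the module defaults.

    def should_reinitialize(num_matched, num_behaviors, num_clusters,
                            expected_clusters, min_match_ratio=0.5,
                            cluster_tolerance=0.3):
        """Return True if the anchors should be re-initialized via clustering."""
        # Condition 1: too few behaviors in the previous batch were matched to anchors.
        if num_behaviors > 0 and num_matched / num_behaviors < min_match_ratio:
            return True
        # Condition 2: the cluster count deviates too much from the pre-defined number.
        lower = (1.0 - cluster_tolerance) * expected_clusters
        upper = (1.0 + cluster_tolerance) * expected_clusters
        if not lower <= num_clusters <= upper:
            return True
        return False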

Note: RTLS doesn’t support run-time (dynamic) changes of config and calibration.

Refer to the Multi-Camera Tracking reference application for more information on usage.

Comparison of Different Approaches

The comparison below covers the MTMC microservice, the RTLS microservice, and the Query-by-Example (QBE) feature/API side by side.

Input

  • Multi-Camera Fusion - MTMC: Raw data in JSON/protobuf format logged from the perception pipeline or streamed Kafka messages from the mdx-raw topic. Please refer to the example playback data in metropolis-apps-data.tar.gz.

  • Multi-Camera Fusion - RTLS: Raw data in JSON/protobuf format logged from the perception pipeline or streamed Kafka messages from the mdx-raw topic. Please refer to the example playback data in metropolis-apps-data.tar.gz.

  • Query-by-Example (QBE): The REST API input contains the example's object ID, sensor ID, and timestamp. It can also include time-range parameters, a match score threshold, the number of top K matches, and the sensor ID from which similar behaviors should be retrieved.

Output

  • Multi-Camera Fusion - MTMC: MTMC objects representing clustered behaviors. Stream processing results are consistent with batch processing if the behavior retention time is longer than the data length. Global IDs are updated per batch.

  • Multi-Camera Fusion - RTLS: Locations of MTMC objects and their count, updated per micro-batch. The global IDs and locations are more consistent through online tracking.

  • Query-by-Example (QBE): Behaviors with similar embeddings and their match scores, returned as the output of the REST API.

Behavior

  • Multi-Camera Fusion - MTMC: Processes raw data in a single batch (batch mode) or in micro-batches from live streams, maintaining behaviors and MTMC objects in state. Evaluation runs at the end of batch mode if ground truth is available.

  • Multi-Camera Fusion - RTLS: Processes raw data in sub-second micro-batches for real-time updates, with behaviors and MTMC objects maintained in state. Batch mode processes micro-batches with a pre-defined bucket size of 200 ms.

  • Query-by-Example (QBE): The embedding of the object is normalized before being used to search for behavior IDs in Milvus with similar embeddings. The behavior ID is further used by the REST API to obtain the behavior metadata from Elasticsearch (see the sketch after this comparison).

Important Configurations

  • Multi-Camera Fusion - MTMC: Filtering parameters for locations, embeddings, and behaviors; clustering parameters; and parameters for stream processing such as Kafka parameters.

  • Multi-Camera Fusion - RTLS: Filtering parameters, retention time for state management, initialization buffer for the MTMC plus state, online matching parameters, spatio-temporal constraint, dynamic update of the MTMC plus state, pre-defined frame batch size, time interval for matching behaviors, etc.

  • Query-by-Example (QBE): nprobeQbe and topK are a couple of parameters that can be configured in the Web API or as part of the query parameters for a QBE query. The metric_type (current value: IP) can be configured when running a similarity query via the Milvus UI. Refer to the Milvus documentation for more information.

Computation Performance

  • Multi-Camera Fusion - MTMC: Batch mode requires more memory, but the total runtime should be shorter than in stream mode, where additional processing time is needed for each micro-batch.

  • Multi-Camera Fusion - RTLS: Can run in real time (updates in less than a second) for up to 100 cameras. Memory usage is also lower than for MTMC tracking.

  • Query-by-Example (QBE): Depends on the performance of the Milvus database.

Accuracy

  • Multi-Camera Fusion - MTMC: Best accuracy, as all available data are processed in offline mode, but dependent on the behavior state retention time.

  • Multi-Camera Fusion - RTLS: Lower recall rate than the MTMC microservice, with global IDs and locations more consistent through online tracking.

  • Query-by-Example (QBE): Accuracy depends on the quality of embeddings. QBE is better suited to finding similar behaviors in past data.
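For the QBE flow referenced above, the snippet below sketches the similarity search against Milvus with the IP metric, nprobe, and topK parameters. The collection and field names are placeholders for illustration, and fetching the behavior metadata from Elasticsearch is left out.

    import numpy as np
    from pymilvus import MilvusClient

    client = MilvusClient(uri="http://localhost:19530")

    def query_by_example(embedding, top_k=10, nprobe=16):
        """Normalize the example embedding and search Milvus for behaviors with
        similar embeddings using the inner-product (IP) metric."""
        embedding = np.asarray(embedding, dtype=np.float32)
        embedding /= np.linalg.norm(embedding)              # normalize before IP search
        results = client.search(
            collection_name="behavior_embeddings",          # assumed collection name
            data=[embedding.tolist()],
            limit=top_k,
            search_params={"metric_type": "IP", "params": {"nprobe": nprobe}},
            output_fields=["behaviorId"],                   # assumed field name
        )
        # Each hit carries a match score and a behavior ID, which the REST API then
        # uses to fetch the behavior metadata from Elasticsearch.
        return [(hit["entity"]["behaviorId"], hit["distance"]) for hit in results[0]]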

For more details on the various approaches, refer to this documentation section as well as the README.md in the metropolis-apps-standalone-deployment/modules/multi-camera-tracking/ directory.

Input & Output

The input and output of each module in the microservice are described as follows:

  • Perception
    • Input (source): Dataset
    • Output (sink): Protobuf (Kafka)
    • Description: Processes video data and generates mdx-raw messages with single-camera tracking results and ReID feature embeddings

  • Behavior State Management
    • Input (source): Dataset
    • Output (sink): Dataset of Behaviors (backed up by persistent storage)
    • Description: Maintains the state of Behavior objects across micro-batches

  • Behavior Processing
    • Input (source): Dataset
    • Output (sink): Dataset of Behaviors
    • Description: Processes Behavior objects through georeferencing and filtering

  • Multi-Camera Tracking
    • Input (source): Dataset of Behaviors
    • Output (sink): Dataset of MTMC Objects
    • Description: Clusters behaviors based on hierarchical clustering, refines the clusters by the Hungarian algorithm, and suppresses overlapping behaviors by linear programming

  • Merging IDs
    • Input (source): Dataset of Micro-Batch MTMC Objects
    • Output (sink): Dataset of Merged MTMC Objects
    • Description: Merges micro-batch MTMC objects with existing global IDs
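To make the data flow above concrete, the following dataclasses sketch the approximate shape of a Behavior and an MTMC object; the field names are assumptions for illustration and do not mirror the actual protobuf or JSON schema.

    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class Behavior:
        """A single-camera tracklet: trajectory plus appearance embedding."""
        sensor_id: str
        object_id: str
        timestamps: List[float] = field(default_factory=list)
        locations: List[Tuple[float, float]] = field(default_factory=list)  # map coordinates after georeferencing
        embedding: List[float] = field(default_factory=list)                # averaged ReID feature

    @dataclass
    class MTMCObject:
        """Groups behaviors observed across cameras under one global ID."""
        global_id: str
        matched_behaviors: List[Behavior] = field(default_factory=list)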

Visualization Samples

In MTMC visualization, by setting setup.vizMode (“frames”, “behaviors”, “mtmc_objects”, “ground_truth_bboxes”, or “ground_truth_locations”) and setup.vizMtmcObjectsMode (“grid”, “sequence”, or “topview”) in the configuration, you can plot the results from batch processing by running the visualization script main_mtmc_visualization.py. There is also an example notebook (viz_e2e_mtmc_results.ipynb) for reference.
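As a rough illustration, the two options could be set as follows; the surrounding configuration file layout is an assumption, shown here as a Python dict.

    viz_config = {
        "setup": {
            "vizMode": "mtmc_objects",        # "frames", "behaviors", "mtmc_objects",
                                              # "ground_truth_bboxes", or "ground_truth_locations"
            "vizMtmcObjectsMode": "topview",  # "grid", "sequence", or "topview"
        }
    }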

In RTLS visualization, use the viz_rtls_results.ipynb notebook to plot the locations on the top-view map. The perception results from each sensor can be displayed on the side for reference.

For more details, refer to the README.md in the metropolis-apps-standalone-deployment/modules/multi-camera-tracking/ directory. Example screenshots of different visualization modes are demonstrated below.

Frames


Visualization of Frames

Behaviors


Visualization of Behaviors

MTMC Objects in Grid


Visualization of MTMC Objects in Grid

MTMC Objects in Sequence


Visualization of MTMC Objects in Sequence (A)

Visualization of MTMC Objects in Sequence (B)

MTMC Objects in Top View


Visualization of MTMC Objects in Top View

Ground Truth Bounding Boxes


Visualization of Ground Truth Bounding Boxes

Ground Truth Locations


Visualization of Ground Truth Locations

RTLS Results


Visualization of RTLS Results