Configuration
For more context on configuration, depending on where the microservice is used:
In the Multi-Target Multi-Camera Tracking (MTMC) app, refer to its Operation Parameters section.
In the Real Time Location System (RTLS) app, refer to its Operation Parameters section.
As a standalone microservice, refer to the README.md in its respective directory within metropolis-apps-standalone-deployment/modules/.
App Config
App Config in JSON
{
  "io": {
    "enableDebug": false,
    "inMtmcPlusBatchMode": false,
    "batchId": "1",
    "selectedSensorIds": [],
    "outputDirPath": "results",
    "videoDirPath": "metropolis-apps-data/videos/mtmc-app",
    "jsonDataPath": "metropolis-apps-data/playback/mtmc_buildingK_playback.json",
    "protobufDataPath": "",
    "groundTruthPath": "",
    "groundTruthFrameIdOffset": 1,
    "useFullBodyGroundTruth": false,
    "use3dEvaluation": false,
    "plotEvaluationGraphs": false
  },
  "preprocessing": {
    "filterByRegionsOfInterest": false,
    "timestampThreshMin": 120,
    "locationBboxBottomGapThresh": 0.02,
    "locationConfidenceThresh": 0.5,
    "locationBboxAreaThresh": 0.0008,
    "locationBboxAspectRatioThresh": 0.6,
    "embeddingBboxBottomGapThresh": 0.02,
    "embeddingConfidenceThresh": 0.5,
    "embeddingBboxAreaThresh": 0.0008,
    "embeddingBboxAspectRatioThresh": 0.6,
    "embeddingVisibilityThresh": 0.5,
    "behaviorConfidenceThresh": 0.45,
    "behaviorBboxAreaThresh": 0.0007,
    "behaviorBboxAspectRatioThresh": 0.75,
    "behaviorLengthThreshSec": 0.0,
    "shortBehaviorFinishThreshSec": 1.0,
    "behaviorNumLocationsMax": 9000,
    "behaviorSplitThreshSec": 6,
    "behaviorRetentionInStateSec": 600.0,
    "mtmcPlusRetentionInStateSec": 10.0,
    "mtmcPlusInitBufferLenSec": 10.0,
    "mtmcPlusReinitRatioAssignedBehaviors": 0.75,
    "mtmcPlusReinitDiffRatioClusters": null
  },
  "localization": {
    "rectifyBboxByCalibration": false,
    "peopleHeightMaxLengthSec": 600,
    "peopleHeightNumSamplesMax": 1000,
    "peopleHeightNumBatchFrames": 10000,
    "peopleHeightEstimationRatio": 0.7,
    "peopleHeightVisibilityThresh": 0.8,
    "overwrittenPeopleHeightMeter": null
  },
  "clustering": {
    "clusteringAlgo": "HDBSCAN",
    "overwrittenNumClusters": null,
    "agglomerativeClusteringDistThresh": 3.5,
    "hdbscanMinClusterSize": 5,
    "numReassignmentIterations": 4,
    "reassignmentDistLooseThresh": 1.0,
    "reassignmentDistTightThresh": 0.12,
    "spatioTemporalDistLambda": 0.15,
    "spatioTemporalDistType": "Hausdorff",
    "spatioTemporalDirMagnitudeThresh": 0.5,
    "enableOnlineSpatioTemporalConstraint": true,
    "onlineSpatioTemporalDistThresh": 15.0,
    "suppressOverlappingBehaviors": false,
    "meanEmbeddingsUpdateRate": 0.1,
    "skipAssignedBehaviors": true,
    "enableOnlineDynamicUpdate": false,
    "dynamicUpdateAppearanceDistThresh": 0.2,
    "dynamicUpdateSpatioTemporalDistThresh": 10.0,
    "dynamicUpdateLengthThreshSec": 9.0
  },
  "streaming": {
    "kafkaBootstrapServers": "mdx-kafka-cluster-kafka-brokers:9092",
    "kafkaProducerLingerMs": 0,
    "kafkaMicroBatchIntervalSec": 60.0,
    "kafkaRawConsumerPollTimeoutMs": 10000,
    "kafkaNotificationConsumerPollTimeoutMs": 100,
    "kafkaConsumerMaxRecordsPerPoll": 100000,
    "sendEmptyMtmcPlusMessages": true,
    "mtmcPlusFrameBatchSizeMs": 180,
    "mtmcPlusBehaviorBatchesConsumed": 4,
    "mtmcPlusFrameBufferResetSec": 4.0,
    "mtmcPlusTimestampDelayMs": 100,
    "mtmcPlusLocationWindowSec": 1.0,
    "mtmcPlusSmoothingWindowSec": 1.0,
    "mtmcPlusNumProcessesMax": 8
  }
}
Instructions for Fine-Tuning App Config
Key areas for parameter fine-tuning:
Behavior Pre-processing: Adjust data quality and behavior retention for streaming.
Localization: Enhance tracking by addressing occlusions and estimating person height.
Clustering: Configure clustering algorithms and manage overlapping behaviors.
Streaming (Kafka): Control the duration of micro-batches for streaming.
Note
Pre-processing, localization, and clustering parameters can be updated in real-time via API. For more information, see here.
1. Behavior Pre-processing (Filtering)
Fine-tuning the parameters for behavior pre-processing, particularly the filtering process, involves adjusting various thresholds based on location, embeddings, and behavior. It’s important to remember the following:
Thresholds for the size or area of bounding boxes should be considered relative to the overall frame size or area.
The filterByRegionsOfInterest option allows for filtering based on predefined regions of interest established during the calibration phase.
Key groups of parameters:
Location-based Thresholds: These are crucial for filtering ground plane trajectories, which are used to calculate spatio-temporal distances. Filtering away too many locations can cause the algorithm to depend more heavily on appearance features.
Embedding-based Thresholds: These help to filter out feature embeddings that represent object appearances.
Behavior-based Thresholds: These thresholds have a direct impact on behavior analysis. Increasing these thresholds may reduce outliers during the clustering process.
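As a concrete illustration of how these thresholds interact, below is a hedged sketch of an embedding filter. The parameter names mirror the preprocessing config section, but the `Detection` structure and `keep_embedding` function are hypothetical, not the shipped implementation:

```python
# Illustrative sketch (not the actual microservice code) of applying the
# embedding-related preprocessing thresholds to a single detection.
from dataclasses import dataclass

@dataclass
class Detection:          # hypothetical structure for illustration
    confidence: float
    bbox: tuple           # (x, y, w, h) in pixels
    visibility: float

def keep_embedding(det, frame_w, frame_h, cfg):
    """Return True if the detection's embedding passes all thresholds."""
    x, y, w, h = det.bbox
    bottom_gap = (frame_h - (y + h)) / frame_h   # ratio against frame height
    area_ratio = (w * h) / (frame_w * frame_h)   # ratio against frame area
    aspect_ratio = w / h                         # width over height
    return (
        bottom_gap >= cfg["embeddingBboxBottomGapThresh"]
        and det.confidence >= cfg["embeddingConfidenceThresh"]
        and area_ratio >= cfg["embeddingBboxAreaThresh"]
        and aspect_ratio <= cfg["embeddingBboxAspectRatioThresh"]
        and det.visibility >= cfg["embeddingVisibilityThresh"]
    )
```

The location- and behavior-based filters follow the same pattern, with behavior filters applied to the mean values over a behavior's detections.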
The behaviorRetentionInStateSec parameter indicates how long (in seconds) a behavior is maintained in the system’s state. If a behavior ends before this timeframe in the current micro batch, it is removed from the state. A longer retention time means more historical data is kept, potentially increasing accuracy but requiring more memory and processing power for clustering. For the Multi-Camera Fusion - MTMC microservice, it’s best to limit retention time to the shorter of two figures: either the maximum predicted time an object could disappear from all cameras before reappearing, or the maximum time an object is expected to be tracked across multiple cameras in a semi-online mode. For the Multi-Camera Fusion - RTLS microservice, a shorter retention time is recommended to facilitate real-time processing. For more information on prolonged durations, refer to Query-by-Example.
Similarly, the mtmcPlusRetentionInStateSec parameter defines how long (in seconds) an MTMC plus object is retained in the system. A longer retention period allows the online tracking algorithm to utilize more spatio-temporal information for matching behaviors to improve accuracy. However, keeping this value low is advised for real-time processing efficiency.
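The retention behavior described above amounts to a time-windowed prune of the state. A minimal sketch, assuming hypothetical names and a dict-based state record:

```python
# Hypothetical sketch of retention-based pruning of the behavior state.
# retention_sec stands in for behaviorRetentionInStateSec (or
# mtmcPlusRetentionInStateSec for MTMC plus records).
def prune_state(behaviors, now_sec, retention_sec):
    """Keep behaviors whose end timestamp is within the retention window."""
    return [b for b in behaviors if now_sec - b["end_sec"] <= retention_sec]
```

With the default 600-second retention, a behavior that ended 900 seconds ago would be dropped, while one that ended 100 seconds ago survives into the next clustering round.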
For effective online tracking in RTLS, it’s necessary to initialize the MTMC plus state early on to enable Hungarian matching in subsequent batches. Behaviors are accumulated until the mtmcPlusInitBufferLenSec threshold is reached. If the initial object locations are unsatisfactory, increasing this buffer length may improve tracking accuracy.
In the RTLS microservice, the MTMC plus state can be re-initialized to adapt to dynamic changes in the following circumstances.
The ratio of matched behaviors in the previous batch falls below a specified threshold.
The number of clusters deviates significantly from a pre-defined number, being either too large or too small.
To minimize the frequency of MTMC plus state re-initializations, it is advisable to adjust the parameters mtmcPlusReinitRatioAssignedBehaviors and mtmcPlusReinitDiffRatioClusters. Specifically, reducing the former and/or increasing the latter can be effective, provided that overwrittenNumClusters is set accordingly. It is important to note that re-initialization involves running clustering and iterative Hungarian re-assignment processes, which may momentarily interrupt online tracking.
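The two re-initialization triggers above can be sketched as a simple predicate. This is an illustrative reading of the documented conditions, not the actual implementation; the function name and signature are hypothetical:

```python
# Sketch of the two documented re-initialization triggers. The thresholds
# mirror mtmcPlusReinitRatioAssignedBehaviors and
# mtmcPlusReinitDiffRatioClusters; expected_clusters mirrors
# overwrittenNumClusters.
def should_reinit(num_assigned, num_behaviors, num_clusters,
                  expected_clusters, ratio_assigned_thresh,
                  diff_ratio_clusters_thresh):
    # Trigger 1: too few behaviors were matched in the previous batch.
    if num_behaviors > 0:
        if num_assigned / num_behaviors < ratio_assigned_thresh:
            return True
    # Trigger 2: cluster count deviates too much from the preset number.
    if diff_ratio_clusters_thresh is not None and expected_clusters:
        diff_ratio = abs(num_clusters - expected_clusters) / expected_clusters
        if diff_ratio > diff_ratio_clusters_thresh:
            return True
    return False
```

Lowering the assigned-ratio threshold or raising the cluster-difference ratio makes both triggers fire less often, which is the tuning direction recommended above.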
2. Localization
For addressing occlusions, rectifyBboxByCalibration can be enabled. By default, the system uses the “foot position” (center point of the lower bounding box edge) to determine an individual’s location in a 3D environment. In cases of occlusion where only the upper body is visible, enabling calibration-based rectification is useful.
Upon enabling rectifyBboxByCalibration, the system:
- In the camera view, computes the “head position” (x_head, y_head) at the center of the top bounding box edge, and the “foot position” (x_foot, y_foot) at the center of the bottom edge.
- Projects the “head position” to the Z=people_height plane in the 3D world, determining the “foot position” in 3D as (X, Y, 0).
- Projects this “foot position” back to the camera view as (x_foot_estimated, y_foot_estimated).
- Computes visibility as min(1, (y_foot - y_head) / (y_foot_estimated - y_head)).
- If visibility falls below peopleHeightVisibilityThresh, the bounding box is adjusted.
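The visibility step above can be written out directly. This sketch covers only the ratio computation and threshold check; the camera projections themselves depend on the calibration and are not shown:

```python
# Sketch of the visibility computation used by calibration-based
# rectification, following the documented steps. y-coordinates grow
# downward in image space.
def compute_visibility(y_head, y_foot, y_foot_estimated):
    """Ratio of the observed body extent to the extent expected when
    projecting the head through the average-people-height plane."""
    denom = y_foot_estimated - y_head
    if denom <= 0:
        return 1.0  # degenerate geometry; treat as fully visible
    return min(1.0, (y_foot - y_head) / denom)

def needs_rectification(visibility, people_height_visibility_thresh=0.8):
    """True when the box should be adjusted (partially occluded person)."""
    return visibility < people_height_visibility_thresh
```

For example, if the detected box ends at y_foot = 300 but the projection predicts y_foot_estimated = 400 (with y_head = 100), visibility is 2/3, which falls below the default 0.8 threshold, so the box would be rectified.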
The system can estimate the average height of people by either collecting data at the start or by using a pre-defined height if overwrittenPeopleHeightMeter is set. Related parameters include:
rectifyBboxByCalibration: Activates the calibration-based rectification.
peopleHeightMaxLengthSec: Max duration for initial data collection to estimate height in streaming mode.
peopleHeightNumSamplesMax: Max number of bounding boxes for initial height estimation in streaming mode.
peopleHeightNumBatchFrames: Max frames for initial data collection in batch mode.
peopleHeightEstimationRatio: Portion of the collected data used for height estimation.
peopleHeightVisibilityThresh: Visibility threshold for bounding box adjustments.
overwrittenPeopleHeightMeter: Manually set height value to bypass system estimation.
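One plausible reading of peopleHeightEstimationRatio, sketched below under the assumption that the smallest height samples come from occluded people and are discarded (the exact selection rule in the microservice is not specified here):

```python
# Hedged sketch: estimate people's height from collected samples, keeping
# only the top portion (peopleHeightEstimationRatio) of sorted heights,
# since smaller measured heights tend to come from occluded instances.
def estimate_people_height(height_samples, estimation_ratio=0.7):
    """Mean of the largest `estimation_ratio` fraction of samples."""
    ordered = sorted(height_samples)
    cut = int(len(ordered) * (1 - estimation_ratio))
    kept = ordered[cut:]
    return sum(kept) / len(kept)
```

With ten samples where three occluded detections read 1.0 m and seven full-body detections read 1.8 m, the default ratio of 0.7 discards the three low readings and returns 1.8 m.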
3. Clustering
Our system supports two clustering algorithms through the clusteringAlgo parameter. Depending on your selection:
HDBSCAN: Use hdbscanMinClusterSize to fine-tune accuracy. Generally, a larger minimum cluster size results in fewer, but potentially more meaningful, output clusters.
AgglomerativeClustering: Adjust agglomerativeClusteringDistThresh to achieve the best clustering outcomes. A higher distance threshold tends to yield a smaller number of output clusters.
For effective parameter tuning, start with a representative micro batch to assess the total count of global IDs. Adjust these parameters until you reach the desired cluster count. If the number of resulting clusters is less than the maximum number of co-existing behaviors, the system will automatically make corrections. The overwrittenNumClusters parameter allows for directly setting the cluster count for the agglomerative clustering algorithm.
Depending on the robustness of the re-identification features, adjust reassignmentDistLooseThresh and reassignmentDistTightThresh accordingly to suppress ID switches. The spatioTemporalDistLambda, spatioTemporalDirMagnitudeThresh, and spatioTemporalDistType parameters control the weight of the spatio-temporal distance when it is combined with the appearance distance for Hungarian matching.
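To make the role of spatioTemporalDistLambda concrete, here is a sketch of one common way to blend the two normalized distance matrices before Hungarian matching. Whether the microservice uses exactly this convex combination is an assumption; the sketch only illustrates that a larger lambda gives the spatio-temporal term more weight:

```python
# Hedged sketch: blend normalized appearance and spatio-temporal distance
# matrices with a lambda weight (mirroring spatioTemporalDistLambda).
# The convex-combination form is an assumption for illustration.
def combined_cost(appearance_dist, spatio_temporal_dist, lam):
    """Element-wise blend of two equally shaped distance matrices
    (lists of lists); the result feeds Hungarian matching."""
    return [
        [(1.0 - lam) * a + lam * s for a, s in zip(row_a, row_s)]
        for row_a, row_s in zip(appearance_dist, spatio_temporal_dist)
    ]
```

With lam = 0, matching is purely appearance-based; with lam = 1, purely spatio-temporal; the default keeps appearance dominant while letting trajectories break appearance ties.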
In the RTLS microservice, to maintain continuous and smooth object locations, set enableOnlineSpatioTemporalConstraint to true. This ensures that only behaviors within a certain distance (onlineSpatioTemporalDistThresh) can be matched to an MTMC plus object in the state. The meanEmbeddingsUpdateRate parameter controls how quickly the mean appearance embeddings of each MTMC plus object are updated upon matching with new behaviors. To keep online matching swift and support real-time processing, you can opt to skip behaviors already assigned in the current batch by enabling skipAssignedBehaviors.
Furthermore, enableOnlineDynamicUpdate in the RTLS microservice allows the system to handle dynamic changes of MTMC plus state objects and adapt to objects entering or exiting the scene during online tracking. Shadow MTMC plus state objects are created for unmatched behaviors. These shadow objects are merged if their appearance distance and spatio-temporal distance fall within dynamicUpdateAppearanceDistThresh and dynamicUpdateSpatioTemporalDistThresh, respectively. They become normal objects once their accumulated length exceeds dynamicUpdateLengthThreshSec.
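The shadow-object lifecycle described above reduces to two small rules, sketched here with hypothetical names and a dict-based record (the real state objects are richer):

```python
# Illustrative sketch of the shadow-object lifecycle for online dynamic
# update. Thresholds mirror the dynamicUpdate* parameters; the data
# structures are hypothetical.
def merge_shadows(shadow_a, shadow_b, appearance_dist, st_dist,
                  appearance_thresh=0.2, st_thresh=10.0):
    """Merge shadow_b into shadow_a when both distances are within bounds."""
    if appearance_dist <= appearance_thresh and st_dist <= st_thresh:
        shadow_a["length_sec"] += shadow_b["length_sec"]
        return True
    return False

def promote(shadow, length_thresh_sec=9.0):
    """A shadow object becomes a normal MTMC plus object once its
    accumulated length exceeds dynamicUpdateLengthThreshSec."""
    return shadow["length_sec"] > length_thresh_sec
</```

A newly entered person thus accumulates evidence as a shadow object for roughly dynamicUpdateLengthThreshSec seconds before receiving a regular global ID.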
Additional parameters include:
numReassignmentIterations: Specifies the number of iterations for re-assigning co-existing behaviors using the Hungarian algorithm. More iterations can improve accuracy but may increase computation time.
reassignmentDistLooseThresh / reassignmentDistTightThresh: Set the distance thresholds for re-assigning a behavior to a cluster during Hungarian matching, with values ranging from 0.0 to 1.0.
spatioTemporalDistLambda: Balances how normalized spatio-temporal distances are combined with appearance-based distances for re-assigning co-existing behaviors.
spatioTemporalDistType: Offers two types of distance calculations: “Hausdorff” and “pairwise”, with the latter being more computationally efficient.
suppressOverlappingBehaviors: Controls the suppression of overlapping behaviors via linear programming. Disabling this feature may increase the algorithm’s adaptability and accuracy.
4. Streaming (Kafka)
The pivotal parameter for streaming in Kafka is kafkaMicroBatchIntervalSec, which defines the duration of each micro batch. Once a micro batch’s raw data is received, it is pre-processed into behaviors, which are then merged with those already in the state. The live behaviors from the current state are then used for clustering, leading to MTMC object creation.
Considerations:
- Micro-batch intervals: Shorter intervals guarantee faster and more regular UI updates, but can also lead to fragmented global IDs.
- Processing times: If intervals are too brief, such that processing durations surpass these intervals, outputs might lag.
- Computation cost: Clustering happens for every micro batch. Hence, multiple smaller batches can be more resource-intensive than a longer one.
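The second consideration can be quantified with a back-of-the-envelope estimate (purely illustrative; the function and its assumptions of constant processing time are not part of the microservice):

```python
# Hypothetical lag estimate for the micro-batch trade-off: if clustering a
# batch takes longer than kafkaMicroBatchIntervalSec, output falls
# progressively behind with every batch.
def batch_lag_after(num_batches, interval_sec, processing_sec):
    """Accumulated output lag (seconds) after num_batches micro batches,
    assuming a constant per-batch processing time."""
    per_batch_lag = max(0.0, processing_sec - interval_sec)
    return num_batches * per_batch_lag
```

For example, a 75-second clustering time against a 60-second interval accrues 15 seconds of lag per batch, so outputs trail by 2.5 minutes after ten batches; keeping processing under the interval keeps lag at zero.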
In the RTLS microservice, the trajectories of each MTMC plus object can be smoothed by adjusting two configuration parameters: mtmcPlusLocationWindowSec and mtmcPlusSmoothingWindowSec. The mtmcPlusLocationWindowSec parameter (default: 1.0 second) aggregates individual locations from all sensors to calculate the current global location, while the mtmcPlusSmoothingWindowSec parameter (default: 1.0 second) aggregates these global locations over a temporal window to calculate an average, smoothing the trajectories. Increasing these parameters will introduce a delay in the actual locations received at the Kafka consumer. The delay introduced is (mtmcPlusLocationWindowSec + mtmcPlusSmoothingWindowSec) / 2 seconds, resulting in a default delay of 1 second. These parameters should be minimized to reduce the delay introduced for RTLS display. To address specific issues in difficult scenarios, increase mtmcPlusLocationWindowSec to reduce “ghosting dots” (flashing locations) and increase mtmcPlusSmoothingWindowSec to reduce jittering trajectories.
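The delay formula above, together with a stand-in for the windowed smoothing, can be sketched as follows. The trailing moving average is an assumption for illustration; the microservice's exact smoothing filter is not specified here:

```python
# Sketch of the documented delay formula plus a simple trailing moving
# average standing in for mtmcPlusSmoothingWindowSec-style smoothing.
def expected_delay_sec(location_window_sec, smoothing_window_sec):
    """Delay introduced at the Kafka consumer by the two windows."""
    return (location_window_sec + smoothing_window_sec) / 2.0

def smooth_trajectory(points, window):
    """Average each (x, y) location over a trailing window of samples."""
    out = []
    for i in range(len(points)):
        chunk = points[max(0, i - window + 1): i + 1]
        xs = sum(p[0] for p in chunk) / len(chunk)
        ys = sum(p[1] for p in chunk) / len(chunk)
        out.append((xs, ys))
    return out
```

With the defaults of 1.0 second each, expected_delay_sec returns the documented 1-second delay; widening either window smooths more at the cost of a later displayed position.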
It is also recommended to use a larger mtmcPlusNumProcessesMax depending on the available CPU cores.
Note
Out of the above 4 MTMC config categories, the following 3 categories’ configs can be dynamically updated during runtime: preprocessing, localization, and clustering. These configs can be updated by using the /config/update/:docType analytics API endpoint (docType will have the value mdx-mtmc-analytics). For more details, check the open-api spec.
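A hedged example of calling this endpoint from Python is shown below. The host and port are placeholders, and the exact request body shape is an assumption; consult the open-api spec for the authoritative contract:

```python
# Hedged sketch of a runtime config update via the analytics API.
# Base URL is a placeholder; the JSON-body-of-updated-categories shape
# is an assumption, not the verified contract.
import json
import urllib.request

def build_update_request(base_url, updated_config):
    """Build the URL and JSON body for /config/update/mdx-mtmc-analytics."""
    url = f"{base_url}/config/update/mdx-mtmc-analytics"
    body = json.dumps(updated_config).encode("utf-8")
    return url, body

def send_update(base_url, updated_config):
    """POST the update; include only the dynamically updatable categories
    (preprocessing, localization, clustering)."""
    url, body = build_update_request(base_url, updated_config)
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:  # network call to placeholder host
        return resp.status
```

For example, send_update("http://mdx-analytics:8080", {"clustering": {"hdbscanMinClusterSize": 6}}) would tighten the minimum cluster size without restarting the microservice, assuming that hostname resolves to the deployed analytics service.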
App Config Details
Name | Category | Type | Default | Range | Description
---|---|---|---|---|---
enableDebug | io | bool | False | True or False | If true, save intermediate results, i.e., frames, behaviors, and MTMC objects in JSON format that can be used for visualization, during MTMC tracking. In RTLS, this flag can add
inMtmcPlusBatchMode | io | bool | False | True or False | If true, use the maximum timestamp in each batch as the current timestamp, only for RTLS batch processing. Set this parameter to false if the current timestamp is available and accurate.
batchId | io | str | “1” | | The pre-defined batch ID for MTMC batch processing.
selectedSensorIds | io | list | [] | | The selected sensor IDs to be processed. If empty, all the sensors are processed.
outputDirPath | io | str | | | The output directory for saving files.
videoDirPath | io | str | | | The directory of input videos.
jsonDataPath | io | str | | | The input raw data file in JSON format.
protobufDataPath | io | str | | | The input raw data file in protobuf format.
groundTruthPath | io | str | | | The input ground truth file in the format of MOTChallenge. If not found, the evaluation is not conducted.
groundTruthFrameIdOffset | io | int | 1 | | The offset of frame IDs in the ground truth in comparison with the raw data.
useFullBodyGroundTruth | io | bool | False | True or False | If true, use full-body bounding boxes recovered from estimated foot points for evaluation.
use3dEvaluation | io | bool | False | True or False | If true, use projected foot points on the ground plane in 3D for evaluation.
plotEvaluationGraphs | io | bool | False | True or False | If true, plot the evaluation graphs in the output directory.
filterByRegionsOfInterest | preprocessing | bool | False | True or False | If true, filter the behaviors and corresponding embeddings and locations by the regions of interest in calibration.
timestampThreshMin | preprocessing | Optional[float] | None | >= 0 or None | The timestamp threshold in minutes, used to filter away old frames in raw data. It is disabled when the value is None.
locationBboxBottomGapThresh | preprocessing | float | 0.02 | >= 0 and <= 1 | The threshold for filtering locations based on the gap between the bounding box’s bottom and the bottom of the frame image. It is a ratio against the frame height. Locations whose corresponding bounding boxes’ bottom gaps are smaller than this threshold are filtered away.
locationConfidenceThresh | preprocessing | float | 0.5 | >= 0 and <= 1 | The detection confidence threshold for filtering locations. Locations whose corresponding detection confidences are smaller than this threshold are filtered away.
locationBboxAreaThresh | preprocessing | float | 0.0008 | >= 0 and <= 1 | The bounding box area threshold for filtering locations. It is a ratio against the frame area. Locations whose corresponding bounding boxes’ areas are smaller than this threshold are filtered away.
locationBboxAspectRatioThresh | preprocessing | float | 0.6 | >= 0 | The bounding box aspect ratio threshold for filtering locations. Locations whose corresponding bounding boxes’ aspect ratios are larger than this threshold are filtered away.
embeddingBboxBottomGapThresh | preprocessing | float | 0.02 | >= 0 and <= 1 | The threshold for filtering embeddings based on the gap between the bounding box’s bottom and the bottom of the frame image. It is a ratio against the frame height. Embeddings whose corresponding bounding boxes’ bottom gaps are smaller than this threshold are filtered away.
embeddingConfidenceThresh | preprocessing | float | 0.5 | >= 0 and <= 1 | The detection confidence threshold for filtering embeddings. Embeddings whose corresponding detection confidences are smaller than this threshold are filtered away.
embeddingBboxAreaThresh | preprocessing | float | 0.0008 | >= 0 and <= 1 | The bounding box area threshold for filtering embeddings. It is a ratio against the frame area. Embeddings whose corresponding bounding boxes’ areas are smaller than this threshold are filtered away.
embeddingBboxAspectRatioThresh | preprocessing | float | 0.6 | >= 0 | The bounding box aspect ratio threshold for filtering embeddings. Embeddings whose corresponding bounding boxes’ aspect ratios are larger than this threshold are filtered away.
embeddingVisibilityThresh | preprocessing | float | 0.5 | >= 0 and <= 1 | The bounding box visibility threshold for filtering embeddings. Embeddings whose corresponding bounding boxes’ visibilities are smaller than this threshold are filtered away.
behaviorConfidenceThresh | preprocessing | float | 0.45 | >= 0 and <= 1 | The detection confidence threshold for filtering behaviors. Behaviors whose mean detection confidences are smaller than this threshold are filtered away.
behaviorBboxAreaThresh | preprocessing | float | 0.0007 | >= 0 and <= 1 | The bounding box area threshold for filtering behaviors. It is a ratio against the frame area. Behaviors whose mean bounding box areas are smaller than this threshold are filtered away.
behaviorBboxAspectRatioThresh | preprocessing | float | 0.75 | >= 0 | The bounding box aspect ratio threshold for filtering behaviors. Behaviors whose mean bounding box aspect ratios are larger than this threshold are filtered away.
behaviorLengthThreshSec | preprocessing | float | 0.0 | >= 0 | The behavior length threshold in seconds for filtering behaviors. Behaviors whose corresponding lengths are smaller than this threshold are filtered away. The
shortBehaviorFinishThreshSec | preprocessing | Optional[float] | None | >= 0 or None | The threshold in minutes for filtering away short behaviors (under
behaviorNumLocationsMax | preprocessing | int | 9000 | >= 0 | The maximum number of locations for a behavior. If the number of locations is above this threshold, the locations are sampled.
behaviorSplitThreshSec | preprocessing | int | 6 | >= 0 | The threshold in seconds to split a behavior if the gap between timestamps is above this value.
behaviorRetentionInStateSec | preprocessing | float | 600.0 | >= 0 | The retention time limit in seconds for the behavior records in state, ignored in MTMC batch processing.
mtmcPlusRetentionInStateSec | preprocessing | float | 10.0 | >= 0 | The retention time limit in seconds for the MTMC plus records in state, ignored in the MTMC microservice.
mtmcPlusInitBufferLenSec | preprocessing | float | 10.0 | >= 0 | The length of the buffer in seconds for initializing MTMC plus state in the RTLS microservice.
mtmcPlusReinitRatioAssignedBehaviors | preprocessing | float | 0.75 | >= 0 and <= 1 | The minimum ratio of assigned behaviors to trigger re-initialization of MTMC plus state, ignored in the MTMC microservice.
mtmcPlusReinitDiffRatioClusters | preprocessing | Optional[float] | None | >= 0 and <= 1 or None | The maximum ratio of difference in the number of clusters compared to
rectifyBboxByCalibration | localization | bool | False | True or False | The flag to enable the calibration-based rectification of bounding boxes by estimating people’s height.
peopleHeightMaxLengthSec | localization | int | 600 | > 0 | The max time duration in seconds for collecting data to estimate people’s height at start. The estimation of people’s height is conducted when either the condition of
peopleHeightNumSamplesMax | localization | int | 1000 | > 0 | The max number of bounding boxes for collecting data to estimate people’s height at start. The estimation of people’s height is conducted when either the condition of
peopleHeightNumBatchFrames | localization | int | 10000 | > 0 | The max number of frames for collecting data to estimate people’s height at start, ignored in stream processing.
peopleHeightEstimationRatio | localization | float | 0.7 | > 0 and <= 1 | The portion of collected data to be used for people height estimation. The smaller people’s heights are likely to be from occluded instances, and thus this parameter is used to filter them away.
peopleHeightVisibilityThresh | localization | float | 0.8 | >= 0 and <= 1 | The bounding box visibility threshold for rectifying bounding boxes. Bounding boxes whose corresponding visibilities are smaller than this threshold are rectified.
overwrittenPeopleHeightMeter | localization | Optional[float] | 1.8 | > 0 or None | The people’s height in meters for overwriting the estimated people’s height.
clusteringAlgo | clustering | str | “HDBSCAN” | [“HDBSCAN”, “AgglomerativeClustering”] | The choice of clustering algorithm, which can be chosen from “HDBSCAN” and “AgglomerativeClustering”.
overwrittenNumClusters | clustering | Optional[int] | None | > 0 or None | The number of clusters for overwriting the clustering results of agglomerative clustering, used when
agglomerativeClusteringDistThresh | clustering | float | 3.5 | > 0 | The distance threshold for agglomerative clustering, used when
hdbscanMinClusterSize | clustering | int | 5 | >= 2 | The minimum size of clusters for HDBSCAN, i.e., the minimum number of behaviors that should be included in each cluster, used when
numReassignmentIterations | clustering | int | 4 | >= 0 | The number of iterations for re-assignment of co-existing behaviors based on the Hungarian algorithm. More iterations usually result in better accuracy, but require more computation time.
reassignmentDistLooseThresh | clustering | float | 1.0 | >= 0 and <= 1 | The distance threshold (combination of appearance distance and spatio-temporal distance) for re-assigning behaviors to clusters during Hungarian matching. An assignment can be made only when a distance is smaller than this threshold. This threshold is used in both MTMC and RTLS modes.
reassignmentDistTightThresh | clustering | float | 0.12 | >= 0 and <= 1 | When a distance (combination of appearance distance and spatio-temporal distance) during re-assignment is smaller than this threshold, the behavior is forced to be assigned to the corresponding cluster, which can be used to correct ID switches in single-camera matching. This threshold is used in the RTLS microservice only.
spatioTemporalDistLambda | clustering | float | 0.1 | >= 0 and <= 1 | The lambda of (normalized) spatio-temporal distance to integrate with appearance-based distance for the re-assignment of co-existing behaviors. A larger value indicates that the spatio-temporal distance is given more weight.
spatioTemporalDirMagnitudeThresh | clustering | float | 0.5 | >= 0 | The spatio-temporal distance is enhanced by a direction influence. The direction influence is applied when the magnitude of the direction vector is larger than this threshold. The unit is meters, or the corresponding unit used in calibration.
spatioTemporalDistType | clustering | str | “Hausdorff” | [“Hausdorff”, “pairwise”] | The type of spatio-temporal distance, which can be chosen from “Hausdorff” and “pairwise”.
enableOnlineSpatioTemporalConstraint | clustering | bool | False | True or False | The flag to enable the spatio-temporal constraint to yield continuous and smooth locations, used in the RTLS microservice.
onlineSpatioTemporalDistThresh | clustering | Optional[float] | None | > 0 or None | The hard spatio-temporal distance threshold for limiting the assignment of behaviors when
suppressOverlappingBehaviors | clustering | bool | False | True or False | The flag to enable suppression of overlapping behaviors based on linear programming. Although overlapping behaviors are due to clustering failures, disabling this feature usually gives more flexibility to the algorithm and yields higher accuracy.
meanEmbeddingsUpdateRate | clustering | float | 0.1 | >= 0 and <= 1 | The ratio of mean embeddings for each MTMC plus object in the state to be updated upon matching with new behaviors, used in the RTLS microservice. A higher value makes the appearance more adaptive to changes in the scene.
skipAssignedBehaviors | clustering | bool | True | True or False | The flag to enable skipping assigned behaviors in the current batch to support real-time processing in the RTLS microservice.
enableOnlineDynamicUpdate | clustering | bool | True | True or False | The flag to enable dynamic update of MTMC plus objects in the state during online tracking in the RTLS microservice. This is usually used to handle entering and exiting objects in the scene.
dynamicUpdateAppearanceDistThresh | clustering | float | 0.2 | >= 0 and <= 1 | The appearance distance threshold for merging temporary MTMC plus objects in the state when
dynamicUpdateSpatioTemporalDistThresh | clustering | float | 10.0 | > 0 | The spatio-temporal distance threshold for merging temporary MTMC plus objects in the state when
dynamicUpdateLengthThreshSec | clustering | float | 9.0 | > 0 | The length threshold in seconds for converting temporary MTMC plus objects in the state to permanent ones when
kafkaBootstrapServers | streaming | str | “localhost:9092” | | A comma-separated list of host-port pairs that are the addresses of the Kafka brokers in a “bootstrap” Kafka cluster that a Kafka client connects to initially to bootstrap itself, ignored in MTMC batch processing.
kafkaProducerLingerMs | streaming | int | 0 | >= 0 | The time in milliseconds to wait before sending messages out to Kafka, ignored in MTMC batch processing.
kafkaMicroBatchIntervalSec | streaming | float | 60.0 | >= 0 | The time interval in seconds for each micro batch, ignored in MTMC batch processing. The filter of time duration in the web UI needs to be larger than this value to have events displayed.
kafkaRawConsumerPollTimeoutMs | streaming | int | 10000 | >= 0 | The timeout in milliseconds to poll
kafkaNotificationConsumerPollTimeoutMs | streaming | int | 100 | >= 0 | The timeout in milliseconds to poll
kafkaConsumerMaxRecordsPerPoll | streaming | int | 100000 | >= 0 | The maximum records per poll, ignored in MTMC batch processing.
sendEmptyMtmcPlusMessages | streaming | bool | True | True or False | The flag to allow empty
mtmcPlusFrameBatchSizeMs | streaming | int | 180 | >= 0 | The frame batch size in milliseconds, used in the RTLS microservice.
mtmcPlusBehaviorBatchesConsumed | streaming | int | 4 | >= 1 | The number of behavior batches consumed in the RTLS microservice.
mtmcPlusFrameBufferResetSec | streaming | float | 4.0 | >= 0 | The time in seconds for resetting the frame buffer, used in the RTLS microservice.
mtmcPlusTimestampDelayMs | streaming | int | 100 | >= 0 | The time in milliseconds to delay the timestamps for synchronizing behaviors from multiple processes, used in the RTLS microservice.
mtmcPlusLocationWindowSec | streaming | float | 1.0 | >= 0 | The time window in seconds to aggregate the matched behaviors’ locations and compute the location of each MTMC plus object, used in the RTLS microservice.
mtmcPlusSmoothingWindowSec | streaming | float | 1.0 | >= 0 | The time window in seconds to smoothen the locations of each MTMC plus object, used in the RTLS microservice.
mtmcPlusNumProcessesMax | streaming | int | 8 | > 0 | The max number of processes to run behavior pre-processing in the RTLS microservice. This config’s value is determined by the number of cores available in the system and the number of partitions assigned to Kafka topic
Viz MTMC Config
Viz Config in JSON
{
  "setup": {
    "vizMode": "mtmc_objects",
    "vizMtmcObjectsMode": "grid",
    "enableMultiprocessing": false,
    "ffmpegRequired": false
  },
  "io": {
    "selectedSensorIds": [],
    "selectedBehaviorIds": [],
    "selectedGlobalIds": [],
    "outputDirPath": "results",
    "videoDirPath": "metropolis-apps-data/videos/mtmc-app",
    "mapPath": "images/building=Nvidia-Bldg-K-Map.png",
    "framesPath": "results/frames.json",
    "behaviorsPath": "results/behaviors.json",
    "mtmcObjectsPath": "results/mtmc_objects.json",
    "groundTruthPath": ""
  },
  "plotting": {
    "gridLayout": [2, 2],
    "blankOutEmptyFrames": false,
    "vizFilteredFrames": true,
    "outputFrameHeight": 1080,
    "tailLengthMax": 200,
    "smoothingTailLengthThresh": 5,
    "smoothingTailWindow": 30
  }
}
Categorization of Config Parameters
MTMC visualization configuration parameters are categorized as:
Setup Parameters: vizMode, vizMtmcObjectsMode, enableMultiprocessing, and ffmpegRequired
Input/Output Parameters: selectedSensorIds, selectedBehaviorIds, selectedGlobalIds, outputDirPath, videoDirPath, mapPath, framesPath, behaviorsPath, mtmcObjectsPath, and groundTruthPath
Plotting Parameters: gridLayout, blankOutEmptyFrames, vizFilteredFrames, outputFrameHeight, tailLengthMax, smoothingTailLengthThresh, and smoothingTailWindow
Viz Config Details
Name | Category | Type | Default | Range | Description
---|---|---|---|---|---
vizMode | setup | str | "mtmc_objects" | ["frames", "behaviors", "mtmc_objects", "ground_truth_bboxes", "ground_truth_locations"] | The choice of visualization mode.
vizMtmcObjectsMode | setup | str | "grid" | ["grid", "sequence", "topview"] | The visualization mode for MTMC objects, used when vizMode is "mtmc_objects".
enableMultiprocessing | setup | bool | False | True or False | The flag to enable multiprocessing when plotting the output. The system may get stuck when the number of parallel processes is too large.
ffmpegRequired | setup | bool | False | True or False | The flag to enable conversion of the output videos from MPEG-4 to H.264 format.
selectedSensorIds | io | list | [] | | The selected sensor IDs to be plotted. If empty, all sensors are plotted.
selectedBehaviorIds | io | list | [] | | The selected behavior IDs to be plotted. If empty, all behaviors are plotted.
selectedGlobalIds | io | list | [] | | The selected global IDs to be plotted. If empty, all MTMC objects are plotted.
outputDirPath | io | str | | | The directory for saving output videos.
videoDirPath | io | str | | | The directory of input videos (ignored in visualization modes that do not use the input videos).
mapPath | io | str | | | The path to the input map image for top-view visualization of MTMC objects, used when vizMtmcObjectsMode is "topview".
framesPath | io | str | | | The path to the input frames' data in JSON format.
behaviorsPath | io | str | | | The path to the input behaviors' data in JSON format.
mtmcObjectsPath | io | str | | | The path to the input MTMC objects' data in JSON format.
groundTruthPath | io | str | | | The path to the ground truth in MOTChallenge format, used when vizMode is "ground_truth_bboxes" or "ground_truth_locations".
gridLayout | plotting | list | [2, 2] | 2 positive integers | The grid layout for visualizing MTMC objects in the grid mode.
blankOutEmptyFrames | plotting | bool | False | True or False | If true, blank out frame images in which no object is present (applies in some visualization modes).
vizFilteredFrames | plotting | bool | False | True or False | If true, visualize frames that have been processed by filtering.
outputFrameHeight | plotting | int | -1 | > 0 or -1 | The frame height for the output videos. If -1, the original frame height is used.
tailLengthMax | plotting | int | 200 | >= 0 | The maximum length (in frames) for plotting tails, i.e., past trajectories.
smoothingTailLengthThresh | plotting | int | 5 | >= 0 | The tail-length threshold (in frames) for applying smoothing. Tails shorter than this length are not smoothed.
smoothingTailWindow | plotting | int | 30 | >= 0 | The window (in frames) for smoothing the tails.
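When editing this config by hand, the documented ranges can be encoded in a small validator. The sketch below is illustrative and not part of the microservice; the top-level section names ("setup", "plotting") and the specific checks are assumptions derived from the categorization and ranges documented above.

```python
import json

# Documented range of vizMode (see the table above).
VALID_VIZ_MODES = {"frames", "behaviors", "mtmc_objects",
                   "ground_truth_bboxes", "ground_truth_locations"}

def validate_viz_config(config: dict) -> list:
    """Return a list of problems found in a viz config; empty means it passed.

    The section names ("setup", "plotting") and the checks below are
    assumptions for illustration, based on the documented ranges.
    """
    problems = []
    setup = config.get("setup", {})
    plotting = config.get("plotting", {})
    if setup.get("vizMode", "mtmc_objects") not in VALID_VIZ_MODES:
        problems.append("unknown vizMode: %r" % setup.get("vizMode"))
    grid = plotting.get("gridLayout", [2, 2])
    if len(grid) != 2 or any(not isinstance(n, int) or n <= 0 for n in grid):
        problems.append("gridLayout must be two positive integers")
    height = plotting.get("outputFrameHeight", -1)
    if height != -1 and height <= 0:
        problems.append("outputFrameHeight must be > 0 or -1")
    return problems

# Example: validate a config loaded from disk.
# problems = validate_viz_config(json.load(open("viz_config.json")))
```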
Viz RTLS Config
Viz Config in JSON
{
"input": {
"calibrationPath": "path/to/calibration.json",
"mapPath": "path/to/map.png",
"rtlsLogPath": "path/to/mdx-rtls.log",
"videoDirPath": "path/to/folder/containing/videos",
"rawDataPath": "path/to/raw_data.log"
},
"output": {
"outputVideoPath": "path/to/output_video.mp4",
"outputMapHeight": 1080,
"displaySensorViews": false,
"sensorViewsLayout": "radial",
"sensorViewDisplayMode": "rotational",
"sensorFovDisplayMode": "rotational",
"skippedBeginningTimeSec": 0.0,
"outputVideoDurationSec": 60.0,
"sensorSetup": 8,
"bufferLengthThreshSec": 3.0,
"trajectoryLengthThreshSec": 5.0,
"sensorViewStartTimeSec": 2.0,
"sensorViewDurationSec": 1.0,
"sensorViewGapSec": 0.1
}
}
Categorization of Config Parameters
RTLS visualization configuration parameters are categorized as:
Input Parameters: calibrationPath, mapPath, rtlsLogPath, videoDirPath, and rawDataPath
Output Parameters: outputVideoPath, outputMapHeight, displaySensorViews, sensorViewsLayout, sensorViewDisplayMode, sensorFovDisplayMode, skippedBeginningTimeSec, outputVideoDurationSec, sensorSetup, bufferLengthThreshSec, trajectoryLengthThreshSec, sensorViewStartTimeSec, sensorViewDurationSec, and sensorViewGapSec
Viz Config Details
Name | Category | Type | Default | Range | Description
---|---|---|---|---|---
calibrationPath | input | str | | | The path to the calibration file in JSON format.
mapPath | input | str | | | The path to the input map image for top-view visualization.
rtlsLogPath | input | str | | | The path to the RTLS log from the mdx-rtls Kafka topic.
videoDirPath | input | str | | | The path to the directory of video files.
rawDataPath | input | str | | | The path to the raw data file (protobuf format by default).
outputVideoPath | output | str | | | The path to the output video file.
outputMapHeight | output | int | 1080 | > 0 | The height in pixels for scaling the map image in the output video.
displaySensorViews | output | bool | False | True or False | If true, display the sensor views around the top-view visualization.
sensorViewsLayout | output | str | "radial" | ["radial", "split"] | The layout of the sensor views, used when displaySensorViews is true.
sensorViewDisplayMode | output | str | "rotational" | ["rotational", "cumulative"] | The display mode for the sensor views, used when displaySensorViews is true.
sensorFovDisplayMode | output | str | "rotational" | ["rotational", "cumulative"] | The display mode for the sensor FOVs, used when displaySensorViews is true.
skippedBeginningTimeSec | output | float | 0.0 | >= 0 | The time in seconds to skip at the beginning of the output video.
outputVideoDurationSec | output | float | 60.0 | > 0 | The duration of the output video in seconds.
sensorSetup | output | int | 30 | [8, 12, 16, 30, 40, 96, 100] | The pre-defined setup according to the number of sensors, used when displaySensorViews is true.
bufferLengthThreshSec | output | float | 3.0 | > 0 | The buffer length in seconds for smoothing the locations for visualization.
trajectoryLengthThreshSec | output | float | 5.0 | > 0 | The trajectory length limit in seconds for plotting the tails of locations.
sensorViewStartTimeSec | output | float | 2.0 | > 0 | The start time in seconds for displaying the sensor views in rotation, used when sensorViewDisplayMode is "rotational".
sensorViewDurationSec | output | float | 1.0 | > 0 | The duration in seconds for displaying each sensor view in rotation, used when sensorViewDisplayMode is "rotational".
sensorViewGapSec | output | float | 0.1 | > 0 | The gap in seconds between sensor views in rotation, used when sensorViewDisplayMode is "rotational".
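To make the interaction of the rotational timing parameters concrete, the sketch below computes which sensor view is on screen at a given time, assuming a simple round-robin schedule (a lead-in of sensorViewStartTimeSec, then sensorViewDurationSec of display plus sensorViewGapSec of gap per view). The scheduling logic is an illustrative assumption, not the microservice's actual implementation.

```python
def active_sensor_view(t_sec, num_sensors, start_sec=2.0,
                       duration_sec=1.0, gap_sec=0.1):
    """Return the sensor index displayed at time t_sec, or None if no
    sensor view is shown (before the start time or during a gap).

    Assumes a round-robin schedule: after start_sec, each sensor view is
    shown for duration_sec, followed by a gap of gap_sec.
    """
    if t_sec < start_sec:
        return None                          # before the rotation starts
    period = duration_sec + gap_sec          # one display slot plus its gap
    slot, offset = divmod(t_sec - start_sec, period)
    if offset >= duration_sec:
        return None                          # inside the gap between views
    return int(slot) % num_sensors           # rotate through the sensors
```

For example, with the defaults above and 8 sensors, sensor 0 is shown from 2.0 s to 3.0 s, nothing during the 0.1 s gap, then sensor 1 from 3.1 s, and so on.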
Calibration
Use the Calibration tool to generate the calibration JSON. For more details, refer to the Camera Calibration section. The calibration JSON structure is shown below:
{
"version": "1.0",
"osmURL": "",
"calibrationType": "cartesian",
"sensors": [
{
"type": "camera",
"id": "Retail_Synthetic_Cam01",
"origin": {
"lng": 0,
"lat": 0
},
"geoLocation": {
"lng": 0,
"lat": 0
},
"coordinates": {
"x": 27.752674116114072,
"y": 29.520047192178833
},
"scaleFactor": 29.3610391053046,
"attributes": [
{
"name": "fps",
"value": "30"
},
{
"name": "depth",
"value": ""
},
{
"name": "fieldOfView",
"value": ""
},
{
"name": "direction",
"value": "80.83911991119385"
},
{
"name": "source",
"value": "vst"
},
{
"name": "frameWidth",
"value": "1920"
},
{
"name": "frameHeight",
"value": "1080"
},
{
"name": "fieldOfViewPolygon",
"value": "POLYGON((37.30339706547104 23.53302611402356, 32.39147622088676 27.72174026541681, 23.888735595644498 26.37998053209448, 18.37785774763376 17.302857646758667, 15.06215357071955 1.9512522630593612, 38.851128391907295 1.7875684444191782, 37.30339706547104 23.53302611402356))"
}
],
"place": [
{
"name": "building",
"value": "Retail-Store"
}
],
"imageCoordinates": [
{
"x": 141.6635912555878,
"y": 269.48316499503875
},
{
"x": 240.98584878571785,
"y": 265.9477324692906
},
{
"x": 474.86495021238807,
"y": 256.8443141943412
},
{
"x": 560.3212274031938,
"y": 253.34065913954282
},
{
"x": 726.9749882013346,
"y": 246.47581521243487
},
{
"x": 806.7600319715643,
"y": 243.18033763144211
},
{
"x": 1049.1798543328828,
"y": 210.84647658915785
},
{
"x": 599.1893869353772,
"y": 158.40456743892264
},
{
"x": 826.3199319061305,
"y": 430.18982743871027
},
{
"x": 692.2412605037435,
"y": 439.5586563490481
},
{
"x": 293.8022700703916,
"y": 940.2497293827583
},
{
"x": 1402.657869447458,
"y": 809.4458238430801
},
{
"x": 1110.7821636347703,
"y": 413.2539924280829
},
{
"x": 929.7185023273412,
"y": 246.31588848277949
},
{
"x": 1014.7021052459154,
"y": 286.1249896426217
},
{
"x": 590.1431423343173,
"y": 334.4703629959116
},
{
"x": 415.03589063341894,
"y": 204.6531095428365
},
{
"x": 902.927686501944,
"y": 159.48107191410975
},
{
"x": 532.7209218694804,
"y": 828.0865212903391
},
{
"x": 1115.481527217435,
"y": 318.46536548174294
},
{
"x": 1475.2066751140385,
"y": 275.30376809549483
}
],
"globalCoordinates": [
{
"x": 16.979024629976095,
"y": 18.983219038249672
},
{
"x": 18.482709318763998,
"y": 18.968406614904072
},
{
"x": 22.166473566193257,
"y": 18.95933943696289
},
{
"x": 23.56924098641491,
"y": 18.95869975313936
},
{
"x": 26.396910248181758,
"y": 18.981927421337907
},
{
"x": 27.799359187521056,
"y": 18.99127020773497
},
{
"x": 32.715754693039486,
"y": 20.88230250608199
},
{
"x": 22.729803639723098,
"y": 28.773798981032822
},
{
"x": 27.812523744506358,
"y": 10.739385291736074
},
{
"x": 26.40710095590235,
"y": 10.745029820964021
},
{
"x": 24.85870713145265,
"y": 4.792837852944807
},
{
"x": 30.96384273807492,
"y": 4.728927314341821
},
{
"x": 30.964239137891095,
"y": 10.71346720864782
},
{
"x": 29.94779515259621,
"y": 18.42203428733013
},
{
"x": 30.927376677045483,
"y": 15.754878369497227
},
{
"x": 24.785512670974654,
"y": 14.430034306444753
},
{
"x": 19.81976862078707,
"y": 23.58767170645169
},
{
"x": 30.437280874103934,
"y": 27.3475721093038
},
{
"x": 25.900291031269614,
"y": 5.398659264709973
},
{
"x": 32.00392031859042,
"y": 13.957021804422336
},
{
"x": 38.26375971510251,
"y": 15.039096669496686
}
],
"tripwires": [],
"rois": []
}
]
}
A calibration comprises an array of sensors, where each sensor record consists of multiple attributes. The ones used by the pipeline are:
type : type of the sensor. For example, camera.
id : unique ID of the sensor.
origin consists of:
origin.lng : the longitude of the origin. Locations often need to be expressed in Cartesian coordinates; a small area such as a city can be considered planar, so all locations within it can be measured in Cartesian coordinates relative to an origin, which can be any fixed location in the area.
origin.lat : the latitude of the origin.
geoLocation : the geo-location of the sensor, consisting of [lng,lat].
coordinates : the location of the sensor in the Cartesian coordinates, consisting of [x,y].
translationToGlobalCoordinates : the translation vector to convert the locations to global coordinates for plotting on the map, consisting of [x,y].
scaleFactor : the scale factor of global coordinates from the unit of interest, e.g., meter, to pixel unit on the map image.
attributes : an array of name-value pairs consisting of:
fps : the video frame rate.
depth : the depth of the sensor.
fieldOfView : the field of view (FOV) of the sensor.
direction : the direction of the sensor.
source : the video source, e.g., VST.
frameWidth : the frame width of the input video.
frameHeight : the frame height of the input video.
fieldOfViewPolygon : the FOV polygon in WKT format.
place : an array of name-value pairs to represent a place, e.g., city=santa-clara/building=bldg_K/room=G.
imageCoordinates : the image coordinate locations to be mapped to globalCoordinates by the calibration tool. The mapping is used to generate the homography matrix.
globalCoordinates : see imageCoordinates mentioned above.
intrinsicMatrix : the 3-by-3 matrix of intrinsic camera parameters, such as the focal lengths, focal center, and scale factor.
extrinsicMatrix : the 3-by-4 matrix of extrinsic camera parameters (rotation and translation) for converting 3D world coordinates to camera coordinates.
cameraMatrix : the 3-by-4 camera matrix to convert the coordinates in the 3D world to the pixel locations in the sensor view.
homography : the 3-by-3 homography matrix to convert the coordinates on the 3D ground plane to the pixel locations in the sensor view.
tripwires : the list of tripwires formed by arrays of points, usually drawn at doorways to count the number of people entering or exiting.
rois : list of regions of interest formed by arrays of points.
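As an illustration of how the homography above is applied, the following sketch maps a ground-plane point (x, y) to pixel coordinates via p = H · [x, y, 1]^T, followed by division by the homogeneous coordinate w. This is plain Python with a nested-list matrix for self-containment; a real pipeline would typically use NumPy or OpenCV.

```python
def project_ground_to_pixel(homography, x, y):
    """Map a ground-plane point (x, y) to pixel coordinates (u, v) using a
    3-by-3 homography given as a nested list: compute H @ [x, y, 1], then
    divide by the homogeneous coordinate w."""
    h = homography
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if w == 0:
        raise ValueError("point maps to infinity (w == 0)")
    return u / w, v / w
```

Mapping pixel locations back to the ground plane uses the inverse of the same homography.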