Configuration

For more context on configuration when the microservice is used:

  • In the Multi-Target Multi-Camera Tracking (MTMC) app, please refer to its Operation Parameters section.

  • In the Real Time Location System (RTLS) app, please refer to its Operation Parameters section.

  • As a standalone microservice, refer to the README.md in its respective directory within metropolis-apps-standalone-deployment/modules/.

App Config

App Config in JSON

{
  "io": {
    "enableDebug": false,
    "inMtmcPlusBatchMode": false,
    "batchId": "1",
    "selectedSensorIds": [],
    "outputDirPath": "results",
    "videoDirPath": "metropolis-apps-data/videos/mtmc-app",
    "jsonDataPath": "metropolis-apps-data/playback/mtmc_buildingK_playback.json",
    "protobufDataPath": "",
    "groundTruthPath": "",
    "groundTruthFrameIdOffset": 1,
    "useFullBodyGroundTruth": false,
    "use3dEvaluation": false,
    "plotEvaluationGraphs": false
  },
  "preprocessing": {
    "filterByRegionsOfInterest": false,
    "timestampThreshMin": 120,
    "locationBboxBottomGapThresh": 0.02,
    "locationConfidenceThresh": 0.5,
    "locationBboxAreaThresh": 0.0008,
    "locationBboxAspectRatioThresh": 0.6,
    "embeddingBboxBottomGapThresh": 0.02,
    "embeddingConfidenceThresh": 0.5,
    "embeddingBboxAreaThresh": 0.0008,
    "embeddingBboxAspectRatioThresh": 0.6,
    "embeddingVisibilityThresh": 0.5,
    "behaviorConfidenceThresh": 0.45,
    "behaviorBboxAreaThresh": 0.0007,
    "behaviorBboxAspectRatioThresh": 0.75,
    "behaviorLengthThreshSec": 0.0,
    "shortBehaviorFinishThreshSec": 1.0,
    "behaviorNumLocationsMax": 9000,
    "behaviorSplitThreshSec": 6,
    "behaviorRetentionInStateSec": 600.0,
    "mtmcPlusRetentionInStateSec": 10.0,
    "mtmcPlusInitBufferLenSec": 10.0,
    "mtmcPlusReinitRatioAssignedBehaviors": 0.75,
    "mtmcPlusReinitDiffRatioClusters": null
  },
  "localization": {
    "rectifyBboxByCalibration": false,
    "peopleHeightMaxLengthSec": 600,
    "peopleHeightNumSamplesMax": 1000,
    "peopleHeightNumBatchFrames": 10000,
    "peopleHeightEstimationRatio": 0.7,
    "peopleHeightVisibilityThresh": 0.8,
    "overwrittenPeopleHeightMeter": null
  },
  "clustering": {
    "clusteringAlgo": "HDBSCAN",
    "overwrittenNumClusters": null,
    "agglomerativeClusteringDistThresh": 3.5,
    "hdbscanMinClusterSize": 5,
    "numReassignmentIterations": 4,
    "reassignmentDistLooseThresh": 1.0,
    "reassignmentDistTightThresh": 0.12,
    "spatioTemporalDistLambda": 0.15,
    "spatioTemporalDistType": "Hausdorff",
    "spatioTemporalDirMagnitudeThresh": 0.5,
    "enableOnlineSpatioTemporalConstraint": true,
    "onlineSpatioTemporalDistThresh": 15.0,
    "suppressOverlappingBehaviors": false,
    "meanEmbeddingsUpdateRate": 0.1,
    "skipAssignedBehaviors": true,
    "enableOnlineDynamicUpdate": false,
    "dynamicUpdateAppearanceDistThresh": 0.2,
    "dynamicUpdateSpatioTemporalDistThresh": 10.0,
    "dynamicUpdateLengthThreshSec": 9.0
  },
  "streaming": {
    "kafkaBootstrapServers": "mdx-kafka-cluster-kafka-brokers:9092",
    "kafkaProducerLingerMs": 0,
    "kafkaMicroBatchIntervalSec": 60.0,
    "kafkaRawConsumerPollTimeoutMs": 10000,
    "kafkaNotificationConsumerPollTimeoutMs": 100,
    "kafkaConsumerMaxRecordsPerPoll": 100000,
    "sendEmptyMtmcPlusMessages": true,
    "mtmcPlusFrameBatchSizeMs": 180,
    "mtmcPlusBehaviorBatchesConsumed": 4,
    "mtmcPlusFrameBufferResetSec": 4.0,
    "mtmcPlusTimestampDelayMs": 100,
    "mtmcPlusLocationWindowSec": 1.0,
    "mtmcPlusSmoothingWindowSec": 1.0,
    "mtmcPlusNumProcessesMax": 8
  }
}

Instructions for Fine-Tuning App Config

Key areas for parameter fine-tuning:

  1. Behavior Pre-processing: Adjust data quality and behavior retention for streaming.

  2. Localization: Enhance tracking by addressing occlusions and estimating person height.

  3. Clustering: Configure clustering algorithms and manage overlapping behaviors.

  4. Streaming (Kafka): Control the duration of micro-batches for streaming.

Note

Pre-processing, localization, and clustering parameters can be updated in real time via the API. For more information, see here.

1. Behavior Pre-processing (Filtering)

Fine-tuning the parameters for behavior pre-processing, particularly the filtering process, involves adjusting various thresholds based on location, embeddings, and behavior. It’s important to remember the following:

  • Thresholds for the size or area of bounding boxes should be considered relative to the overall frame size or area.

  • The filterByRegionsOfInterest option allows for filtering based on predefined regions of interest established during the calibration phase.

Key groups of parameters:

  • Location-based Thresholds: These are crucial for filtering ground plane trajectories, which are used to calculate spatio-temporal distances. Filtering out too many locations can cause the algorithm to depend more heavily on appearance features.

  • Embedding-based Thresholds: These help to filter out feature embeddings that represent object appearances.

  • Behavior-based Thresholds: These thresholds have a direct impact on behavior analysis. Increasing these thresholds may reduce outliers during the clustering process.
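As a rough illustration, the location-based filters above might be sketched as follows (hypothetical helper; the parameter names match the config, but the exact implementation and the width/height aspect-ratio convention are assumptions):

```python
def keep_location(bbox, confidence, frame_w, frame_h, cfg):
    """Apply the location-based preprocessing thresholds to one detection.

    bbox is (x, y, w, h) in pixels; aspect ratio is assumed to be w / h.
    Returns True if the location survives all filters.
    """
    x, y, w, h = bbox
    bottom_gap = (frame_h - (y + h)) / frame_h   # ratio against the frame height
    area_ratio = (w * h) / (frame_w * frame_h)   # ratio against the frame area
    aspect_ratio = w / h
    return (bottom_gap >= cfg["locationBboxBottomGapThresh"]
            and confidence >= cfg["locationConfidenceThresh"]
            and area_ratio >= cfg["locationBboxAreaThresh"]
            and aspect_ratio <= cfg["locationBboxAspectRatioThresh"])
```

The embedding- and behavior-based filters follow the same pattern with their respective thresholds.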

The behaviorRetentionInStateSec parameter indicates how long (in seconds) a behavior is maintained in the system’s state. If a behavior ends before this timeframe in the current micro batch, it is removed from the state. A longer retention time means more historical data is kept, potentially increasing accuracy but requiring more memory and processing power for clustering. For the Multi-Camera Fusion - MTMC microservice, it’s best to limit retention time to the shorter of two figures: either the maximum predicted time an object could disappear from all cameras before reappearing, or the maximum time an object is expected to be tracked across multiple cameras in a semi-online mode. For the Multi-Camera Fusion - RTLS microservice, a shorter retention time is recommended to facilitate real-time processing. For more information on prolonged durations, refer to Query-by-Example.

Similarly, the mtmcPlusRetentionInStateSec parameter defines how long (in seconds) an MTMC plus object is retained in the system. A longer retention period allows the online tracking algorithm to utilize more spatio-temporal information for matching behaviors to improve accuracy. However, keeping this value low is advised for real-time processing efficiency.
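A minimal sketch of how such a retention limit could prune the state (illustrative only; the actual state management and record shape are internal to the microservice):

```python
def prune_behaviors(state, now_sec, retention_sec=600.0):
    """Keep only behaviors whose end timestamp is within the retention window.

    state is assumed to be a list of dicts with an 'end_sec' field;
    retention_sec corresponds to behaviorRetentionInStateSec.
    """
    return [b for b in state if now_sec - b["end_sec"] <= retention_sec]
```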

For effective online tracking in RTLS, it’s necessary to initialize the MTMC plus state early on to enable Hungarian matching in subsequent batches. Behaviors are accumulated until the mtmcPlusInitBufferLenSec threshold is reached. If the initial object locations are unsatisfactory, increasing this buffer length may improve tracking accuracy.

In the RTLS microservice, the MTMC plus state can be re-initialized to adapt to dynamic changes in the following circumstances:

  • The ratio of matched behaviors in the previous batch falls below a specified threshold.

  • The number of clusters deviates significantly from a pre-defined number, being either too large or too small.

To minimize the frequency of MTMC plus state re-initializations, it is advisable to adjust the parameters mtmcPlusReinitRatioAssignedBehaviors and mtmcPlusReinitDiffRatioClusters. Specifically, reducing the former and/or increasing the latter can be effective, provided that overwrittenNumClusters is set accordingly. It is important to note that re-initialization involves running clustering and iterative Hungarian re-assignment processes, which may momentarily interrupt online tracking.
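The two re-initialization triggers can be sketched as a single check (hypothetical helper reflecting the description above; the real trigger logic is internal to the microservice):

```python
def should_reinit_state(num_assigned, num_behaviors, num_clusters, cfg):
    """Return True if the MTMC plus state should be re-initialized."""
    # Trigger 1: too few behaviors were matched in the previous batch.
    ratio_assigned = num_assigned / max(num_behaviors, 1)
    if ratio_assigned < cfg["mtmcPlusReinitRatioAssignedBehaviors"]:
        return True
    # Trigger 2: the cluster count deviates too much from the expected number.
    expected = cfg.get("overwrittenNumClusters")
    diff_ratio_max = cfg.get("mtmcPlusReinitDiffRatioClusters")
    if expected and diff_ratio_max is not None:
        if abs(num_clusters - expected) / expected > diff_ratio_max:
            return True
    return False
```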

2. Localization

For addressing occlusions, rectifyBboxByCalibration can be enabled. By default, the system uses the “foot position” (center point of the lower bounding box edge) to determine an individual’s location in a 3D environment. In cases of occlusion where only the upper body is visible, enabling calibration-based rectification is useful.

Upon enabling rectifyBboxByCalibration, the system:

  • Computes, in the camera view, the “head position” (x_head, y_head) at the center of the top bounding box edge and the “foot position” (x_foot, y_foot) at the center of the bottom edge.

  • Projects the “head position” to the Z=people_height plane in the 3D world, determining the “foot position” in 3D as (X, Y, 0).

  • Projects this “foot position” back to the camera view as (x_foot_estimated, y_foot_estimated).

  • Computes visibility as min(1, (y_foot - y_head) / (y_foot_estimated - y_head)).

  • If visibility falls below peopleHeightVisibilityThresh, adjusts the bounding box.
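The visibility formula above, transcribed directly into code (the default threshold 0.8 is the config default for peopleHeightVisibilityThresh):

```python
def compute_visibility(y_head, y_foot, y_foot_estimated):
    """visibility = min(1, (y_foot - y_head) / (y_foot_estimated - y_head))"""
    return min(1.0, (y_foot - y_head) / (y_foot_estimated - y_head))

def needs_rectification(y_head, y_foot, y_foot_estimated, visibility_thresh=0.8):
    """The bounding box is rectified when visibility falls below the threshold."""
    return compute_visibility(y_head, y_foot, y_foot_estimated) < visibility_thresh
```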

The system can estimate the average height of people by either collecting data at the start or by using a pre-defined height if overwrittenPeopleHeightMeter is set. Related parameters include:

  • rectifyBboxByCalibration: To activate the calibration-based rectification.

  • peopleHeightMaxLengthSec: Max duration for initial data collection to estimate height in streaming mode.

  • peopleHeightNumSamplesMax: Max number of bounding boxes for initial height estimation in streaming mode.

  • peopleHeightNumBatchFrames: Max frames for initial data collection in batch mode.

  • peopleHeightEstimationRatio: Portion of data used for height estimation.

  • peopleHeightVisibilityThresh: Visibility threshold for bounding box adjustments.

  • overwrittenPeopleHeightMeter: Manually set height value to bypass system estimation.
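A hedged sketch of how peopleHeightEstimationRatio might be applied: the aggregation shown (drop the smallest samples as likely occluded, then average the rest) is an assumption based on the parameter description, not the documented implementation:

```python
import statistics

def estimate_people_height(height_samples, estimation_ratio=0.7):
    """Estimate the average people height from collected samples.

    The smallest (1 - estimation_ratio) of samples are assumed to come
    from occluded instances and are discarded before averaging.
    """
    n_drop = int(len(height_samples) * (1 - estimation_ratio))
    kept = sorted(height_samples)[n_drop:]
    return statistics.mean(kept)
```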

3. Clustering

Our system offers support for two main clustering algorithms through the clusteringAlgo parameter. Depending on your selection:

  • HDBSCAN: Use hdbscanMinClusterSize to fine-tune accuracy. Generally, a larger minimum cluster size results in fewer, but potentially more meaningful, output clusters.

  • AgglomerativeClustering: Adjust agglomerativeClusteringDistThresh to achieve the best clustering outcomes. A higher distance threshold tends to yield a smaller number of output clusters.

For effective parameter tuning, start with a representative micro batch to assess the total count of global IDs. Adjust these parameters until you reach the desired cluster count. If the number of resulting clusters is less than the maximum number of co-existing behaviors, the system will automatically make corrections. The overwrittenNumClusters parameter allows for the direct setting of the cluster count for the agglomerative clustering algorithm.
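The tuning loop can be prototyped offline, using scikit-learn's AgglomerativeClustering as a stand-in for the microservice's internal implementation (an assumption for illustration only):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def cluster_behaviors(embeddings, dist_thresh=3.5, overwritten_num_clusters=None):
    """Cluster behavior embeddings into global IDs (illustrative stand-in).

    When overwritten_num_clusters is set, it takes precedence over the
    distance threshold, mirroring overwrittenNumClusters in the config.
    """
    model = AgglomerativeClustering(
        n_clusters=overwritten_num_clusters,
        distance_threshold=None if overwritten_num_clusters else dist_thresh,
    )
    return model.fit_predict(np.asarray(embeddings))
```

Raising dist_thresh merges more behaviors and yields fewer global IDs; lowering it splits them apart.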

Depending on the robustness of the re-identification features, adjust reassignmentDistLooseThresh and reassignmentDistTightThresh accordingly to suppress ID switches. The spatioTemporalDistLambda, spatioTemporalDirMagnitudeThresh, and spatioTemporalDistType parameters control the weight of the spatio-temporal distance when it is combined with the appearance distance for Hungarian matching.

In the RTLS microservice, to maintain continuous and smooth object locations, set enableOnlineSpatioTemporalConstraint to true. This ensures that only behaviors within a certain distance (onlineSpatioTemporalDistThresh) can be matched to an MTMC plus object in the state. The meanEmbeddingsUpdateRate controls how strongly the appearance embeddings of each MTMC plus object are updated upon matching with new behaviors. To keep online matching swift and support real-time processing, you can opt to skip behaviors already assigned in the current batch by enabling skipAssignedBehaviors.

Furthermore, enableOnlineDynamicUpdate in the RTLS microservice allows the system to handle dynamic changes of MTMC plus state objects and adapt to objects entering or exiting the scene during online tracking. Shadow MTMC plus state objects are created for unmatched behaviors. These shadow objects are merged if their appearance distance and spatio-temporal distance fall within dynamicUpdateAppearanceDistThresh and dynamicUpdateSpatioTemporalDistThresh, respectively, and they become normal objects once their accumulated length exceeds dynamicUpdateLengthThreshSec.

Additional parameters include:

  • numReassignmentIterations: Specifies the number of iterations for re-assigning co-existing behaviors using the Hungarian algorithm. More iterations can improve accuracy but may increase computation time.

  • reassignmentDistLooseThresh and reassignmentDistTightThresh: Set the loose and tight distance thresholds for re-assigning a behavior to a cluster using Hungarian matching, with values ranging from 0.0 to 1.0.

  • spatioTemporalDistLambda: Balances how normalized spatio-temporal distances are combined with appearance-based distances for re-assigning co-existing behaviors.

  • spatioTemporalDistType: Offers two types of distance calculations: “Hausdorff” and “pairwise”, with the latter being more computationally efficient.

  • suppressOverlappingBehaviors: Controls the suppression of overlapping behaviors via linear programming. Disabling this feature may increase the algorithm’s adaptability and accuracy.
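How spatioTemporalDistLambda blends the two distances is not spelled out here; a plausible reading is a convex combination (an assumption, shown for intuition only):

```python
def combined_distance(appearance_dist, spatio_temporal_dist, lam=0.15):
    """Assumed blend of appearance and normalized spatio-temporal distance.

    lam corresponds to spatioTemporalDistLambda: a larger value weights
    the spatio-temporal distance more heavily.
    """
    return (1.0 - lam) * appearance_dist + lam * spatio_temporal_dist
```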

4. Streaming (Kafka)

The pivotal parameter for streaming in Kafka is kafkaMicroBatchIntervalSec. This defines the duration of each micro-batch. Once a micro batch’s raw data is received, it’s pre-processed into behaviors which are then merged with those already in the state. The live behaviors from the current state are then used for clustering, leading to MTMC object creation.

Considerations:

  • Micro-batch intervals: Shorter intervals guarantee faster and more regular UI updates, but can also lead to fragmented global IDs.

  • Processing times: If intervals are so brief that processing durations surpass them, outputs may lag.

  • Computation cost: Clustering happens for every micro-batch, so multiple smaller batches can be more resource-intensive than a single longer one.

In the RTLS microservice, the trajectories of each MTMC plus object can be smoothed by adjusting two configuration parameters: mtmcPlusLocationWindowSec and mtmcPlusSmoothingWindowSec. The mtmcPlusLocationWindowSec parameter (default: 1.0 second) aggregates individual locations from all sensors to calculate the current global location, while the mtmcPlusSmoothingWindowSec parameter (default: 1.0 second) aggregates these global locations over a temporal window to calculate an average, smoothing the trajectories. Increasing these parameters introduces a delay in the locations received at the Kafka consumer: the delay is (mtmcPlusLocationWindowSec + mtmcPlusSmoothingWindowSec) / 2 seconds, i.e., 1 second by default. These parameters should be kept small to reduce the delay for RTLS display. To address specific issues in difficult scenarios, increase mtmcPlusLocationWindowSec to reduce “ghosting dots” (flashing locations) and increase mtmcPlusSmoothingWindowSec to reduce jittering trajectories.
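The delay formula above, as a one-liner for checking candidate configurations:

```python
def rtls_display_delay_sec(location_window_sec=1.0, smoothing_window_sec=1.0):
    """Delay introduced at the Kafka consumer by the two smoothing windows."""
    return (location_window_sec + smoothing_window_sec) / 2.0
```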

It is also recommended to use a larger mtmcPlusNumProcessesMax value depending on the available CPU cores.

Note

Of the above four MTMC config categories, configs in the following three categories can be dynamically updated during runtime: preprocessing, localization, and clustering. These configs can be updated by using the /config/update/:docType analytics API endpoint (docType will have the value mdx-mtmc-analytics). For more details, check the Open API spec.
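A standard-library sketch of a dynamic config update: the endpoint path and docType value are from the note above, while the base URL, HTTP method, and payload shape are assumptions — consult the Open API spec for the actual contract:

```python
import json
import urllib.request

def build_config_update_request(base_url, config_patch):
    """Build (but do not send) a request to the analytics config-update endpoint.

    base_url and the POST/JSON payload convention are hypothetical.
    """
    return urllib.request.Request(
        f"{base_url}/config/update/mdx-mtmc-analytics",
        data=json.dumps(config_patch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To send: urllib.request.urlopen(build_config_update_request(base_url, patch))
```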

App Config Details

Parameters

Name

Category

Type

Default

Range

Description

enableDebug

io

bool

False

True or False

If true, save intermediate results, i.e., frames, behaviors, and MTMC objects in JSON format that can be used for visualization during MTMC tracking. In RTLS, this flag adds frameId to the Kafka messages, which is required for evaluation and visualization.

inMtmcPlusBatchMode

io

bool

False

True or False

If true, use the maximum timestamp in each batch as the current timestamp, only for RTLS batch processing. Set this parameter to false if the current timestamp is available and accurate.

batchId

io

str

“1”

The pre-defined batch ID for MTMC batch processing.

selectedSensorIds

io

list

[]

The selected sensor IDs to be processed. If empty, all the sensors are processed.

outputDirPath

io

str

The output directory for saving files.

videoDirPath

io

str

The directory of input videos.

jsonDataPath

io

str

The input raw data file in JSON format.

protobufDataPath

io

str

The input raw data file in protobuf format.

groundTruthPath

io

str

The input ground truth file in the format of MOTChallenge. If not found, the evaluation is not conducted.

groundTruthFrameIdOffset

io

int

1

The offset of frame IDs in the ground truth in comparison with the raw data.

useFullBodyGroundTruth

io

bool

False

True or False

If true, use full-body bounding boxes recovered from estimated foot points for evaluation.

use3dEvaluation

io

bool

False

True or False

If true, use projected foot points on the ground plane in 3D for evaluation.

plotEvaluationGraphs

io

bool

False

True or False

If true, plot the evaluation graphs in the output directory.

filterByRegionsOfInterest

preprocessing

bool

False

True or False

If true, filter the behaviors and corresponding embeddings and locations by the regions of interest in calibration.

timestampThreshMin

preprocessing

Optional[float]

None

>= 0 or None

The timestamp threshold in minute, which is used to filter away old frames in raw data. It is disabled when the value is None.

locationBboxBottomGapThresh

preprocessing

float

0.02

>= 0 and <= 1

The threshold for filtering locations based on the gap between the bounding box’s bottom and the bottom of the frame image. It is a ratio against the frame height. Locations whose corresponding bounding boxes’ bottom gaps are smaller than this threshold are filtered away.

locationConfidenceThresh

preprocessing

float

0.5

>= 0 and <= 1

The detection confidence threshold for filtering locations. Locations whose corresponding detection confidences are smaller than this threshold are filtered away.

locationBboxAreaThresh

preprocessing

float

0.0008

>= 0 and <= 1

The bounding box area threshold for filtering locations. It is a ratio against the frame area. Locations whose corresponding bounding boxes’ areas are smaller than this threshold are filtered away.

locationBboxAspectRatioThresh

preprocessing

float

0.6

>= 0

The bounding box aspect ratio threshold for filtering locations. Locations whose corresponding bounding boxes’ aspect ratios are larger than this threshold are filtered away.

embeddingBboxBottomGapThresh

preprocessing

float

0.02

>= 0 and <= 1

The threshold for filtering embeddings based on the gap between the bounding box’s bottom and the bottom of the frame image. It is a ratio against the frame height. Embeddings whose corresponding bounding boxes’ bottom gaps are smaller than this threshold are filtered away.

embeddingConfidenceThresh

preprocessing

float

0.5

>= 0 and <= 1

The detection confidence threshold for filtering embeddings. Embeddings whose corresponding detection confidences are smaller than this threshold are filtered away.

embeddingBboxAreaThresh

preprocessing

float

0.0008

>= 0 and <= 1

The bounding box area threshold for filtering embeddings. It is a ratio against the frame area. Embeddings whose corresponding bounding boxes’ areas are smaller than this threshold are filtered away.

embeddingBboxAspectRatioThresh

preprocessing

float

0.6

>= 0

The bounding box aspect ratio threshold for filtering embeddings. Embeddings whose corresponding bounding boxes’ aspect ratios are larger than this threshold are filtered away.

embeddingVisibilityThresh

preprocessing

float

0.5

>= 0 and <= 1

The bounding box visibility threshold for filtering embeddings. Embeddings whose corresponding bounding boxes’ visibilities are smaller than this threshold are filtered away.

behaviorConfidenceThresh

preprocessing

float

0.45

>= 0 and <= 1

The detection confidence threshold for filtering behaviors. Behaviors whose corresponding mean of detection confidences are smaller than this threshold are filtered away.

behaviorBboxAreaThresh

preprocessing

float

0.0007

>= 0 and <= 1

The bounding box area threshold for filtering behaviors. It is a ratio against the frame area. Behaviors whose corresponding mean of bounding boxes’ areas are smaller than this threshold are filtered away.

behaviorBboxAspectRatioThresh

preprocessing

float

0.75

>= 0

The bounding box aspect ratio threshold for filtering behaviors. Behaviors whose corresponding mean of bounding boxes’ aspect ratios are larger than this threshold are filtered away.

behaviorLengthThreshSec

preprocessing

float

0.0

>= 0

The behavior length threshold in second for filtering behaviors. Behaviors whose corresponding lengths are smaller than this threshold are filtered away. The shortBehaviorFinishThreshSec needs to be adjusted accordingly to enable filtering by behavior length.

shortBehaviorFinishThreshSec

preprocessing

Optional[float]

None

>= 0 or None

The threshold in second for filtering away short behaviors (under behaviorLengthThreshSec) that have not finished. It is disabled when the value is None.

behaviorNumLocationsMax

preprocessing

int

9000

>= 0

The maximum number of locations for a behavior. If the number of locations is above this threshold, the locations are sampled.

behaviorSplitThreshSec

preprocessing

int

6

>= 0

The threshold in second to split a behavior if the gap between timestamps is above this value.

behaviorRetentionInStateSec

preprocessing

float

600.0

>= 0

The retention time limit in second for the behavior records in state, ignored in MTMC batch processing.

mtmcPlusRetentionInStateSec

preprocessing

float

10.0

>= 0

The retention time limit in second for the MTMC plus records in state, ignored in MTMC microservice.

mtmcPlusInitBufferLenSec

preprocessing

float

10.0

>= 0

The length of the buffer in second for initializing MTMC plus state at RTLS microservice.

mtmcPlusReinitRatioAssignedBehaviors

preprocessing

float

0.75

>= 0 and <= 1

The minimum ratio of assigned behaviors to trigger re-initialization of MTMC plus state, ignored in MTMC microservice.

mtmcPlusReinitDiffRatioClusters

preprocessing

Optional[float]

None

>= 0 and <= 1 or None

The maximum ratio of difference in the number of clusters compared to overwrittenNumClusters to trigger re-initialization of MTMC plus state, ignored in MTMC microservice.

rectifyBboxByCalibration

localization

bool

False

True or False

The flag to enable the calibration-based rectification of bounding boxes by estimating people’s height.

peopleHeightMaxLengthSec

localization

int

600

> 0

The max time duration in second for collecting data to estimate people’s height at start. The estimation of people’s height is conducted when either the condition of peopleHeightMaxLengthSec or peopleHeightNumSamplesMax is met.

peopleHeightNumSamplesMax

localization

int

1000

> 0

The max number of bounding boxes for collecting data to estimate people’s height at start. The estimation of people’s height is conducted when either the condition of peopleHeightMaxLengthSec or peopleHeightNumSamplesMax is met.

peopleHeightNumBatchFrames

localization

int

10000

> 0

The max number of frames for collecting data to estimate people’s height at start, ignored in stream processing.

peopleHeightEstimationRatio

localization

float

0.7

> 0 and <= 1

The portion of collected data to be used for people height estimation. The smaller people’s heights are likely to come from occluded instances, and this parameter is used to filter them away.

peopleHeightVisibilityThresh

localization

float

0.8

>= 0 and <= 1

The bounding box visibility threshold for rectifying bounding boxes. Bounding boxes whose corresponding visibilities are smaller than this threshold are rectified.

overwrittenPeopleHeightMeter

localization

Optional[float]

1.8

> 0 or None

The people’s height in meter for overwriting the estimated people’s height.

clusteringAlgo

clustering

str

“HDBSCAN”

[“HDBSCAN”, “AgglomerativeClustering”]

The choice of clustering algorithm, which can be chosen from “HDBSCAN” and “AgglomerativeClustering”.

overwrittenNumClusters

clustering

Optional[int]

None

> 0 or None

The number of clusters for overwriting the clustering results of agglomerative clustering, used when clusteringAlgo is “AgglomerativeClustering”. It can be used when the number of objects is fixed and known.

agglomerativeClusteringDistThresh

clustering

float

3.5

> 0

The distance threshold for agglomerative clustering, used when clusteringAlgo is “AgglomerativeClustering”. It can be tuned by estimating the number of objects in a sampled batch. A higher threshold results in a smaller number of clusters.

hdbscanMinClusterSize

clustering

int

5

>= 2

The minimum size of clusters for HDBSCAN, i.e., the minimum number of behaviors that should be included in each cluster, used when clusteringAlgo is “HDBSCAN”. It can be tuned by estimating the number of objects in a sampled batch. A higher value results in a smaller number of clusters.

numReassignmentIterations

clustering

int

4

>= 0

The number of iterations for re-assignment of co-existing behaviors based on the Hungarian algorithm. More iterations usually result in better accuracy, but it requires more computation time.

reassignmentDistLooseThresh

clustering

float

1.0

>= 0 and <= 1

The distance threshold (combination of appearance distance and spatio-temporal distance) for re-assigning behaviors to clusters during Hungarian matching. Only when a distance is smaller than this threshold, an assignment can be made. This threshold is used for both MTMC and RTLS modes.

reassignmentDistTightThresh

clustering

float

0.12

>= 0 and <= 1

When a distance (combination of appearance distance and spatio-temporal distance) during re-assignment is smaller than this threshold, force the behavior to be assigned to the corresponding cluster, which can be used to correct ID switches in single-camera matching. This threshold is used in RTLS microservice only.

spatioTemporalDistLambda

clustering

float

0.1

>= 0 and <= 1

The lambda of (normalized) spatio-temporal distance to integrate with appearance-based distance for the re-assignment of co-existing behaviors. A larger value indicates that the spatio-temporal distance is given more weight.

spatioTemporalDirMagnitudeThresh

clustering

float

0.5

>= 0

The spatio-temporal distance is enhanced by a direction influence. The direction influence is applied when the magnitude of direction vector is larger than this threshold. The unit is meter, or the corresponding unit used in calibration.

spatioTemporalDistType

clustering

str

“Hausdorff”

[“Hausdorff”, “pairwise”]

The type of spatio-temporal distance, which can be chosen from “Hausdorff” and “pairwise”.

enableOnlineSpatioTemporalConstraint

clustering

bool

False

True or False

The flag to enable spatio-temporal constraint to yield continuous and smooth locations, used in RTLS microservice.

onlineSpatioTemporalDistThresh

clustering

Optional[float]

None

> 0 or None

The hard spatio-temporal distance threshold for limiting the assignment of behaviors when enableOnlineSpatioTemporalConstraint is true. A lower value limits the number of matches, and thus the number of MTMC plus objects in the state reduces more rapidly.

suppressOverlappingBehaviors

clustering

bool

False

True or False

The flag to enable suppression of overlapping behaviors based on linear programming. Although overlapping behaviors result from clustering failures, disabling this feature usually gives the algorithm more flexibility and yields higher accuracy.

meanEmbeddingsUpdateRate

clustering

float

0.1

>= 0 and <= 1

The ratio of mean embeddings for each MTMC plus object in the state to be updated upon matching with new behaviors, used in RTLS microservice. A higher value will make the appearance more adaptive to changes in the scene.

skipAssignedBehaviors

clustering

bool

True

True or False

The flag to enable skipping assigned behaviors in the current batch to support real-time processing in RTLS microservice.

enableOnlineDynamicUpdate

clustering

bool

True

True or False

The flag to enable dynamic update of MTMC plus objects in the state during online tracking in RTLS microservice. This is usually used to handle entering and exiting objects in the scene.

dynamicUpdateAppearanceDistThresh

clustering

float

0.2

>= 0 and <= 1

The appearance distance threshold for merging temporary MTMC plus objects in the state when enableOnlineDynamicUpdate is true.

dynamicUpdateSpatioTemporalDistThresh

clustering

float

10.0

> 0

The spatio-temporal distance threshold for merging temporary MTMC plus objects in the state when enableOnlineDynamicUpdate is true.

dynamicUpdateLengthThreshSec

clustering

float

9.0

> 0

The length threshold in second for converting temporary MTMC plus objects in the state to permanent ones when enableOnlineDynamicUpdate is true.

kafkaBootstrapServers

streaming

str

“localhost:9092”

A comma-separated list of host:port pairs for the Kafka brokers in a “bootstrap” Kafka cluster that a Kafka client initially connects to in order to bootstrap itself, ignored in MTMC batch processing.

kafkaProducerLingerMs

streaming

int

0

>= 0

The time in millisecond to wait before sending messages out to Kafka, ignored in MTMC batch processing.

kafkaMicroBatchIntervalSec

streaming

float

60.0

>= 0

The time interval in second for each micro batch, ignored in MTMC batch processing. The filter of time duration in the web UI needs to be larger than this value to have events displayed.

kafkaRawConsumerPollTimeoutMs

streaming

int

10000

>= 0

The timeout in millisecond to poll mdx-raw messages, ignored in MTMC batch processing.

kafkaNotificationConsumerPollTimeoutMs

streaming

int

100

>= 0

The timeout in millisecond to poll mdx-notification messages, ignored in MTMC batch processing.

kafkaConsumerMaxRecordsPerPoll

streaming

int

100000

>= 0

The maximum records per poll, ignored in MTMC batch processing.

sendEmptyMtmcPlusMessages

streaming

bool

True

True or False

The flag to allow empty mdx-rtls messages to be sent in RTLS microservice.

mtmcPlusFrameBatchSizeMs

streaming

int

180

>= 0

The frame batch size in millisecond, used in RTLS microservice.

mtmcPlusBehaviorBatchesConsumed

streaming

int

4

>= 1

The number of behavior batches consumed in the RTLS microservice.

mtmcPlusFrameBufferResetSec

streaming

float

4.0

>= 0

The time in second for resetting the frame buffer, used in RTLS microservice.

mtmcPlusTimestampDelayMs

streaming

int

100

>= 0

The time in millisecond to delay the timestamps for synchronizing behaviors from multiple processes, used in RTLS microservice.

mtmcPlusLocationWindowSec

streaming

float

1.0

>= 0

The time window in second to aggregate the matched behaviors’ locations and compute the location of each MTMC plus object, used in RTLS microservice.

mtmcPlusSmoothingWindowSec

streaming

float

1.0

>= 0

The time window in second to smooth the locations of each MTMC plus object, used in RTLS microservice.

mtmcPlusNumProcessesMax

streaming

int

8

> 0

The max number of processes to run behavior pre-processing in RTLS microservice. This config’s value is determined by the number of cores that are available in the system and the number of partitions assigned to Kafka topic mdx-raw.

Viz MTMC Config

Viz Config in JSON

{
  "setup": {
    "vizMode": "mtmc_objects",
    "vizMtmcObjectsMode": "grid",
    "enableMultiprocessing": false,
    "ffmpegRequired": false
  },
  "io": {
    "selectedSensorIds": [],
    "selectedBehaviorIds": [],
    "selectedGlobalIds": [],
    "outputDirPath": "results",
    "videoDirPath": "metropolis-apps-data/videos/mtmc-app",
    "mapPath": "images/building=Nvidia-Bldg-K-Map.png",
    "framesPath": "results/frames.json",
    "behaviorsPath": "results/behaviors.json",
    "mtmcObjectsPath": "results/mtmc_objects.json",
    "groundTruthPath": ""
  },
  "plotting": {
    "gridLayout": [2, 2],
    "blankOutEmptyFrames": false,
    "vizFilteredFrames": true,
    "outputFrameHeight": 1080,
    "tailLengthMax": 200,
    "smoothingTailLengthThresh": 5,
    "smoothingTailWindow": 30
  }
}

Categorization of Config Parameters

MTMC visualization configuration parameters are categorized as:

  1. Setup Parameters: vizMode, vizMtmcObjectsMode, enableMultiprocessing, and ffmpegRequired

  2. Input/Output Parameters: selectedSensorIds, selectedBehaviorIds, selectedGlobalIds, outputDirPath, videoDirPath, mapPath, framesPath, behaviorsPath, mtmcObjectsPath, and groundTruthPath

  3. Plotting Parameters: gridLayout, blankOutEmptyFrames, vizFilteredFrames, outputFrameHeight, tailLengthMax, smoothingTailLengthThresh, and smoothingTailWindow
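A minimal sketch of loading the viz config and checking a few of the constraints listed in this section. `load_viz_config` is a hypothetical helper, not part of the toolkit, and it validates only the parameters shown:

```python
import json

# Allowed values for vizMode, per the parameter table in this section.
VALID_VIZ_MODES = {"frames", "behaviors", "mtmc_objects",
                   "ground_truth_bboxes", "ground_truth_locations"}

def load_viz_config(path):
    """Hypothetical helper: load the viz config JSON and sanity-check
    a few of the documented constraints."""
    with open(path) as f:
        config = json.load(f)
    setup = config["setup"]
    if setup["vizMode"] not in VALID_VIZ_MODES:
        raise ValueError(f"Unknown vizMode: {setup['vizMode']}")
    grid = config["plotting"]["gridLayout"]
    # gridLayout must be 2 positive integers, e.g., [2, 2].
    if len(grid) != 2 or any(not isinstance(n, int) or n <= 0 for n in grid):
        raise ValueError("gridLayout must be 2 positive integers")
    return config
```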

Viz Config Details

Parameters

Name

Category

Type

Default

Range

Description

vizMode

setup

str

“mtmc_objects”

[“frames”, “behaviors”, “mtmc_objects”, “ground_truth_bboxes”, “ground_truth_locations”]

The choice of visualization mode, which can be chosen from “frames”, “behaviors”, “mtmc_objects”, “ground_truth_bboxes”, and “ground_truth_locations”.

vizMtmcObjectsMode

setup

str

“grid”

[“grid”, “sequence”, “topview”]

The choice of visualization mode for MTMC objects, which can be chosen from “grid”, “sequence”, and “topview”, used when vizMode is “mtmc_objects”.

enableMultiprocessing

setup

bool

False

True or False

The flag to enable multi-processing when plotting the output. It may cause the system to get stuck when the number of parallel processes is too large.

ffmpegRequired

setup

bool

False

True or False

The flag to enable conversion of the output videos from MPEG-4 to H.264 format.

selectedSensorIds

io

list

[]

The selected sensor IDs to be plotted. If empty, all the sensors are plotted.

selectedBehaviorIds

io

list

[]

The selected behavior IDs to be plotted. If empty, all the behaviors are plotted.

selectedGlobalIds

io

list

[]

The selected global IDs to be plotted. If empty, all the MTMC objects are plotted.

outputDirPath

io

str

The directory for saving output videos.

videoDirPath

io

str

The directory of input videos, ignored when vizMtmcObjectsMode is “topview”.

mapPath

io

str

The path to the input map image for top-view visualization of MTMC objects, used when vizMtmcObjectsMode is “topview”.

framesPath

io

str

The path to the input frames’ data in JSON format.

behaviorsPath

io

str

The path to the input behaviors’ data in JSON format.

mtmcObjectsPath

io

str

The path to the input MTMC objects’ data in JSON format.

groundTruthPath

io

str

The path to the ground truth in MOTChallenge format, used when vizMode is “ground_truth_bboxes” or “ground_truth_locations”.

gridLayout

plotting

list

[2, 2]

2 positive integers

The grid layout for visualizing MTMC objects in the grid mode.

blankOutEmptyFrames

plotting

bool

False

True or False

If true, blank out frame images where no object is present, used when vizMtmcObjectsMode is “grid”.

vizFilteredFrames

plotting

bool

False

True or False

If true, visualize frames that have been processed by filtering.

outputFrameHeight

plotting

int

-1

> 0 or -1

The frame height for the output videos. If -1, the original frame height is used.

tailLengthMax

plotting

int

200

>= 0

The maximum length (number of frames) for plotting tails, i.e., past trajectories.

smoothingTailLengthThresh

plotting

int

5

>= 0

The threshold for the tail length (number of frames) to apply smoothing. The tails shorter than this length are not smoothed.

smoothingTailWindow

plotting

int

30

>= 0

The window (number of frames) for smoothing the tails.
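The three tail parameters above can be illustrated with a short sketch. This assumes a simple moving-average smoother; the actual implementation may differ, and `smooth_tail` is our own illustrative function:

```python
def smooth_tail(points, length_thresh=5, window=30):
    """Illustrative sketch of the documented tail-smoothing behavior:
    tails shorter than smoothingTailLengthThresh (length_thresh) are
    returned unchanged; longer tails are smoothed with a moving
    average over smoothingTailWindow (window) frames."""
    if len(points) < length_thresh:
        return list(points)
    smoothed = []
    for i in range(len(points)):
        lo = max(0, i - window + 1)  # trailing window of past frames
        xs = [p[0] for p in points[lo:i + 1]]
        ys = [p[1] for p in points[lo:i + 1]]
        smoothed.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return smoothed
```

Note that tailLengthMax would cap `points` to the most recent 200 frames before smoothing is applied.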

Viz RTLS Config

Viz Config in JSON

{
  "input": {
    "calibrationPath": "path/to/calibration.json",
    "mapPath": "path/to/map.png",
    "rtlsLogPath": "path/to/mdx-rtls.log",
    "videoDirPath": "path/to/folder/containing/videos",
    "rawDataPath": "path/to/raw_data.log"
  },
  "output": {
    "outputVideoPath": "path/to/output_video.mp4",
    "outputMapHeight": 1080,
    "displaySensorViews": false,
    "sensorViewsLayout": "radial",
    "sensorViewDisplayMode": "rotational",
    "sensorFovDisplayMode": "rotational",
    "skippedBeginningTimeSec": 0.0,
    "outputVideoDurationSec": 60.0,
    "sensorSetup": 8,
    "bufferLengthThreshSec": 3.0,
    "trajectoryLengthThreshSec": 5.0,
    "sensorViewStartTimeSec": 2.0,
    "sensorViewDurationSec": 1.0,
    "sensorViewGapSec": 0.1
  }
}

Categorization of Config Parameters

RTLS visualization configuration parameters are categorized as:

  1. Input Parameters: calibrationPath, mapPath, rtlsLogPath, videoDirPath, and rawDataPath

  2. Output Parameters: outputVideoPath, outputMapHeight, displaySensorViews, sensorViewsLayout, sensorViewDisplayMode, sensorFovDisplayMode, skippedBeginningTimeSec, outputVideoDurationSec, sensorSetup, bufferLengthThreshSec, trajectoryLengthThreshSec, sensorViewStartTimeSec, sensorViewDurationSec, and sensorViewGapSec

Viz Config Details

Parameters

Name

Category

Type

Default

Range

Description

calibrationPath

input

str

The path to the calibration file in JSON format.

mapPath

input

str

The path to the input map image for top-view visualization.

rtlsLogPath

input

str

The path to the RTLS log from the Kafka topic of mdx-rtls.

videoDirPath

input

str

The path to the directory of video files.

rawDataPath

input

str

The path to the raw data file (protobuf format by default).

outputVideoPath

output

str

The path to the output video file.

outputMapHeight

output

int

1080

> 0

The height in pixels for scaling the map image in the output video.

displaySensorViews

output

bool

False

True or False

If true, display the sensor views around the top-view visualization.

sensorViewsLayout

output

str

“radial”

[“radial”, “split”]

The sensor views’ layout used when displaySensorViews is set to true. The radial layout shows sensor views surrounding the map view at the center. The split layout shows the sensor views on the left and the map view on the right.

sensorViewDisplayMode

output

str

“rotational”

[“rotational”, “cumulative”]

The display mode for the sensor views used when displaySensorViews is set to true. The rotational mode highlights sensor views one at a time, and the cumulative mode keeps all the previous sensor views highlighted while circling through all sensor views.

sensorFovDisplayMode

output

str

“rotational”

[“rotational”, “cumulative”]

The display mode for FOVs used when displaySensorViews is set to true. The rotational mode displays FOVs one at a time in the map view, and the cumulative mode keeps all the previous FOVs displayed while circling through all sensors.

skippedBeginningTimeSec

output

float

0.0

>= 0

The time in seconds to skip at the beginning of the output video.

outputVideoDurationSec

output

float

60.0

> 0

The duration of the output video in seconds.

sensorSetup

output

int

30

[8, 12, 16, 30, 40, 96, 100]

The pre-defined setup according to the number of sensors, used when displaySensorViews is set to true.

bufferLengthThreshSec

output

float

3.0

> 0

The buffer length in seconds for smoothing the locations for visualization.

trajectoryLengthThreshSec

output

float

5.0

> 0

The trajectory length limit in seconds for plotting the tails of locations.

sensorViewStartTimeSec

output

float

2.0

> 0

The starting time in seconds to display the sensor views in rotation, used when displaySensorViews is set to true.

sensorViewDurationSec

output

float

1.0

> 0

The duration in seconds to display each sensor view in rotation, used when displaySensorViews is set to true.

sensorViewGapSec

output

float

0.1

> 0

The gap in seconds between sensor views displayed in rotation, used when displaySensorViews is set to true.
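Taken together, sensorViewStartTimeSec, sensorViewDurationSec, and sensorViewGapSec define a rotation schedule over the sensor views. The sketch below is our assumed reading of that behavior, not the actual implementation:

```python
def sensor_rotation_schedule(num_sensors, start_sec=2.0,
                             duration_sec=1.0, gap_sec=0.1):
    """Illustrative sketch (assumed behavior): compute the
    [start, end) highlight interval of each sensor view under the
    rotational display mode, from sensorViewStartTimeSec,
    sensorViewDurationSec, and sensorViewGapSec."""
    schedule = []
    t = start_sec
    for i in range(num_sensors):
        schedule.append((i, t, t + duration_sec))
        t += duration_sec + gap_sec  # next view starts after the gap
    return schedule
```

With the defaults, sensor 0 would be highlighted from 2.0 s to 3.0 s, sensor 1 from 3.1 s to 4.1 s, and so on.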

Calibration

Use the Calibration tool to generate the JSON. For more details, refer to the Camera Calibration section. The calibration JSON structure is shown below:

{
  "version": "1.0",
  "osmURL": "",
  "calibrationType": "cartesian",
  "sensors": [
    {
      "type": "camera",
      "id": "Retail_Synthetic_Cam01",
      "origin": {
        "lng": 0,
        "lat": 0
      },
      "geoLocation": {
        "lng": 0,
        "lat": 0
      },
      "coordinates": {
        "x": 27.752674116114072,
        "y": 29.520047192178833
      },
      "scaleFactor": 29.3610391053046,
      "attributes": [
        {
          "name": "fps",
          "value": "30"
        },
        {
          "name": "depth",
          "value": ""
        },
        {
          "name": "fieldOfView",
          "value": ""
        },
        {
          "name": "direction",
          "value": "80.83911991119385"
        },
        {
          "name": "source",
          "value": "vst"
        },
        {
          "name": "frameWidth",
          "value": "1920"
        },
        {
          "name": "frameHeight",
          "value": "1080"
        },
        {
          "name": "fieldOfViewPolygon",
          "value": "POLYGON((37.30339706547104 23.53302611402356, 32.39147622088676 27.72174026541681, 23.888735595644498 26.37998053209448, 18.37785774763376 17.302857646758667, 15.06215357071955 1.9512522630593612, 38.851128391907295 1.7875684444191782, 37.30339706547104 23.53302611402356))"
        }
      ],
      "place": [
          {
            "name": "building",
            "value": "Retail-Store"
          }
      ],
      "imageCoordinates": [
          {
            "x": 141.6635912555878,
            "y": 269.48316499503875
          },
          {
            "x": 240.98584878571785,
            "y": 265.9477324692906
          },
          {
            "x": 474.86495021238807,
            "y": 256.8443141943412
          },
          {
            "x": 560.3212274031938,
            "y": 253.34065913954282
          },
          {
            "x": 726.9749882013346,
            "y": 246.47581521243487
          },
          {
            "x": 806.7600319715643,
            "y": 243.18033763144211
          },
          {
            "x": 1049.1798543328828,
            "y": 210.84647658915785
          },
          {
            "x": 599.1893869353772,
            "y": 158.40456743892264
          },
          {
            "x": 826.3199319061305,
            "y": 430.18982743871027
          },
          {
            "x": 692.2412605037435,
            "y": 439.5586563490481
          },
          {
            "x": 293.8022700703916,
            "y": 940.2497293827583
          },
          {
            "x": 1402.657869447458,
            "y": 809.4458238430801
          },
          {
            "x": 1110.7821636347703,
            "y": 413.2539924280829
          },
          {
            "x": 929.7185023273412,
            "y": 246.31588848277949
          },
          {
            "x": 1014.7021052459154,
            "y": 286.1249896426217
          },
          {
            "x": 590.1431423343173,
            "y": 334.4703629959116
          },
          {
            "x": 415.03589063341894,
            "y": 204.6531095428365
          },
          {
            "x": 902.927686501944,
            "y": 159.48107191410975
          },
          {
            "x": 532.7209218694804,
            "y": 828.0865212903391
          },
          {
            "x": 1115.481527217435,
            "y": 318.46536548174294
          },
          {
            "x": 1475.2066751140385,
            "y": 275.30376809549483
          }
      ],
      "globalCoordinates": [
          {
            "x": 16.979024629976095,
            "y": 18.983219038249672
          },
          {
            "x": 18.482709318763998,
            "y": 18.968406614904072
          },
          {
            "x": 22.166473566193257,
            "y": 18.95933943696289
          },
          {
            "x": 23.56924098641491,
            "y": 18.95869975313936
          },
          {
            "x": 26.396910248181758,
            "y": 18.981927421337907
          },
          {
            "x": 27.799359187521056,
            "y": 18.99127020773497
          },
          {
            "x": 32.715754693039486,
            "y": 20.88230250608199
          },
          {
            "x": 22.729803639723098,
            "y": 28.773798981032822
          },
          {
            "x": 27.812523744506358,
            "y": 10.739385291736074
          },
          {
            "x": 26.40710095590235,
            "y": 10.745029820964021
          },
          {
            "x": 24.85870713145265,
            "y": 4.792837852944807
          },
          {
            "x": 30.96384273807492,
            "y": 4.728927314341821
          },
          {
            "x": 30.964239137891095,
            "y": 10.71346720864782
          },
          {
            "x": 29.94779515259621,
            "y": 18.42203428733013
          },
          {
            "x": 30.927376677045483,
            "y": 15.754878369497227
          },
          {
            "x": 24.785512670974654,
            "y": 14.430034306444753
          },
          {
            "x": 19.81976862078707,
            "y": 23.58767170645169
          },
          {
            "x": 30.437280874103934,
            "y": 27.3475721093038
          },
          {
            "x": 25.900291031269614,
            "y": 5.398659264709973
          },
          {
            "x": 32.00392031859042,
            "y": 13.957021804422336
          },
          {
            "x": 38.26375971510251,
            "y": 15.039096669496686
          }
      ],
      "tripwires": [],
      "rois": []
    }
  ]
}

A calibration comprises an array of sensors, where each sensor record consists of multiple attributes. The ones used by the pipeline are:

  • type : type of the sensor. For example, camera.

  • id : unique ID of the sensor.

  • origin comprises:

    • origin.lng : Locations often need to be expressed in Cartesian coordinates. A small area such as a city can be considered planar, so all locations within it can be measured in Cartesian coordinates relative to an arbitrary or chosen location that serves as the origin. origin.lng represents the longitude of the origin.

    • origin.lat represents the latitude of the origin.

  • geoLocation : the geo-location of the sensor, consisting of [lng,lat].

  • coordinates : the location of the sensor in the Cartesian coordinates, consisting of [x,y].

  • translationToGlobalCoordinates : the translation vector to convert the locations to global coordinates for plotting on the map, consisting of [x,y].

  • scaleFactor : the scale factor converting global coordinates from the unit of interest, e.g., meters, to pixels on the map image.

  • attributes : an array of name-value pairs consisting of:

    • fps : the video frame rate.

    • depth : the depth of the sensor.

    • fieldOfView : the field of view (FOV) of the sensor.

    • direction : the direction of the sensor.

    • source : the video source, e.g., VST.

    • frameWidth : the frame width of input video.

    • frameHeight : the frame height of input video.

    • fieldOfViewPolygon : the FOV polygon in WKT format.

  • place : an array of name-value pairs to represent a place, e.g., city=santa-clara/building=bldg_K/room=G.

  • imageCoordinates : the image coordinate locations to be mapped to globalCoordinates by the calibration tool. The mapping is used to generate the homography matrix.

  • globalCoordinates : see imageCoordinates mentioned above.

  • intrinsicMatrix : the 3-by-3 matrix of intrinsic camera parameters, such as focal lengths, focal center and scale factor.

  • extrinsicMatrix : the 3-by-4 matrix of extrinsic camera parameters for translation and rotation to convert between camera coordinates and 3D world coordinates.

  • cameraMatrix : the 3-by-4 camera matrix to convert the coordinates in the 3D world to the pixel locations in the sensor view.

  • homography : the 3-by-3 homography matrix to convert the coordinates on the 3D ground plane to the pixel locations in the sensor view.

  • tripwires : list of tripwires formed by arrays of points, usually drawn at doorways or entrances to count the number of people entering or exiting.

  • rois : list of regions of interest formed by arrays of points.
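The homography above is derived from the imageCoordinates/globalCoordinates correspondences. The sketch below estimates such a homography with the direct linear transform (DLT); it is illustrative only, since the calibration tool may use a different solver (e.g., OpenCV's findHomography), and it assumes NumPy is available:

```python
import numpy as np

def estimate_homography(image_pts, global_pts):
    """Estimate the 3x3 homography H mapping ground-plane (global)
    coordinates to pixel (image) coordinates via the DLT.
    Illustrative only -- not the calibration tool's actual solver."""
    A = []
    for (u, v), (x, y) in zip(image_pts, global_pts):
        # Each point correspondence contributes two rows.
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector for the smallest
    # singular value of A.
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # normalize so that H[2, 2] == 1

def project(H, x, y):
    """Map a ground-plane point (x, y) to pixel coordinates via H."""
    p = H @ np.array([x, y, 1.0])
    return p[0] / p[2], p[1] / p[2]
```

At least four non-collinear correspondences are required; the calibration JSON above supplies 21, which over-determines the system and makes the estimate more robust to annotation noise.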