Configuration

For more context on configuration, refer to the documentation that matches how the microservice is used:

  • In the Occupancy Analytics app, refer to its Operation Parameters section.

  • As a standalone microservice, refer to the README.md in its respective directory within metropolis-apps-standalone-deployment/modules/.

Ingestion Config

"ingestion": [
    {
        "name": "dataLocation",
        "value": "./data/behaviors"
    },
    {
        "name": "checkpointLocation",
        "value": "./tmp"
    },
    {
        "name": "ingestionTrigger",
        "value": "5 minutes"
    },
    {
        "name": "dataRetention",
        "value": 168
    },
    {
        "name": "vacuumTrigger",
        "value": 1
    },
    {
        "name": "deletedFileRetention",
        "value": 4
    },
    {
        "name": "minPointsInTrajectory",
        "value": 30
    }
]
Parameters and Descriptions

Parameters

Description

ingestionTrigger

Sets the trigger interval at which Spark streaming executes batch writes into the Delta table. Tune it with memory limits and desired file sizes in mind.

dataRetention

Number of hours to retain data in the Delta table. Data files older than the retention period are deleted by vacuum.

vacuumTrigger

Interval, in hours, at which vacuum runs and deletes files older than dataRetention. Vacuum also deletes tombstoned files generated by merges, which helps keep memory usage low.

deletedFileRetention

Determines how long to keep deleted files before vacuuming them. If vacuum cleans up files that are in use, concurrent readers can fail or tables can be corrupted. You must choose an interval that is longer than the longest-running concurrent transaction.

minPointsInTrajectory

Behaviors which have fewer points in their trajectory than minPointsInTrajectory are filtered out.
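
As a rough illustration of how these values relate, the sketch below flattens the name/value list into a dictionary, derives the dataRetention cutoff, and applies the minPointsInTrajectory filter. The helper names (to_settings, retention_cutoff, keep_behavior) are hypothetical and are not part of the microservice; the actual ingestion code may differ.

```python
from datetime import datetime, timedelta, timezone

# Copy of the "ingestion" list from the example above.
INGESTION = [
    {"name": "dataLocation", "value": "./data/behaviors"},
    {"name": "checkpointLocation", "value": "./tmp"},
    {"name": "ingestionTrigger", "value": "5 minutes"},
    {"name": "dataRetention", "value": 168},
    {"name": "vacuumTrigger", "value": 1},
    {"name": "deletedFileRetention", "value": 4},
    {"name": "minPointsInTrajectory", "value": 30},
]


def to_settings(entries):
    """Flatten a list of {"name": ..., "value": ...} entries into a dict."""
    return {entry["name"]: entry["value"] for entry in entries}


def retention_cutoff(settings, now):
    """Timestamp before which data falls outside dataRetention (in hours)."""
    return now - timedelta(hours=settings["dataRetention"])


def keep_behavior(num_points, settings):
    """Hypothetical filter: keep a behavior only if it has enough points."""
    return num_points >= settings["minPointsInTrajectory"]


settings = to_settings(INGESTION)
now = datetime(2024, 1, 8, tzinfo=timezone.utc)  # arbitrary example time
print(retention_cutoff(settings, now))  # 168 hours (7 days) before `now`
print(keep_behavior(12, settings), keep_behavior(30, settings))
```

With the example values, data older than seven days becomes eligible for vacuum, and a behavior with 12 points is dropped while one with 30 points is kept.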

Clustering Config

"clustering": [
    {
        "sensor": "default",
        "config": [
            {
                "name": "minPtsForCluster",
                "value": 50
            },
            {
                "name": "minSampleForCore",
                "value": 5
            },
            {
                "name": "encodeProperty",
                "value": "none"
            },
            {
                "name": "objectType",
                "value": "people"
            }
        ]
    },
    {
        "sensor": "xyz",
        "config": [
            {
                "name": "encodeProperty",
                "value": "direction"
            }
        ]
    }
]
Parameters and Descriptions

Parameter

Description

minPtsForCluster

Determines the minimum number of behaviors required to form a cluster.

minSampleForCore

The larger the value of minSampleForCore, the more conservative the clustering: more points will be declared as noise, and clusters will be restricted to progressively denser areas. In hdbscan this parameter is referred to as min_samples.

encodeProperty

Can be set to “direction”, “flow” or “none”. Clustering will take into account the specified property of behaviors when generating labels.

If encodeProperty is set to “direction”, the direction field in the behavior data will be used as an additional metric to form clusters. Note that encoding direction can be sensitive to noisy data.

If encodeProperty is set to “flow”, clustering will take into account the movement of the trajectories. This is done by breaking up the trajectories and calculating the Hausdorff distance on smaller pieces. Often, such a clustering will respect direction while being less sensitive to noisy and erroneous data. By default, “flow” is encoded.

If encodeProperty is set to “none”, then only the Hausdorff distance is used for clustering.

objectType

Object type of the behaviors being clustered, can be either “vehicle” or “people”.
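
To make the role of the Hausdorff distance more concrete, here is a minimal pure-Python sketch of the standard symmetric Hausdorff distance between two trajectories. It is only an illustration of the metric; the microservice's own distance computation is not shown in this document and may be implemented differently. The trajectories are made-up sample data.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from any point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sequences."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Two made-up trajectories as lists of (x, y) points.
track_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
track_b = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
track_a_reversed = list(reversed(track_a))

print(hausdorff(track_a, track_b))
print(hausdorff(track_a, track_a_reversed))
```

Because the metric treats each trajectory as an unordered set of points, a trajectory and its reverse are at distance zero. This is one reason an additional property such as “direction” or “flow” can matter when opposite movements along the same path should land in different clusters.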

The default clustering parameters for all sensors should be given in the config object of the default sensor. To add sensor-specific configuration, provide another dictionary with the sensor name and a minimal config that overrides the defaults. In the config example above, sensor xyz’s behavior data is clustered with encodeProperty set to “direction” rather than the default sensor’s “none”, and all other parameters are taken from the default configuration.
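
The override behavior described above can be sketched as a simple dictionary merge. The function name resolve_sensor_config is hypothetical, and the sketch assumes that a sensor without its own entry simply receives the default values.

```python
# Copy of the "clustering" list from the example above.
CLUSTERING = [
    {
        "sensor": "default",
        "config": [
            {"name": "minPtsForCluster", "value": 50},
            {"name": "minSampleForCore", "value": 5},
            {"name": "encodeProperty", "value": "none"},
            {"name": "objectType", "value": "people"},
        ],
    },
    {
        "sensor": "xyz",
        "config": [{"name": "encodeProperty", "value": "direction"}],
    },
]


def resolve_sensor_config(clustering, sensor):
    """Return the default settings overlaid with any sensor-specific values."""
    by_sensor = {
        block["sensor"]: {e["name"]: e["value"] for e in block["config"]}
        for block in clustering
    }
    resolved = dict(by_sensor["default"])      # start from the defaults
    resolved.update(by_sensor.get(sensor, {}))  # apply overrides, if any
    return resolved


print(resolve_sensor_config(CLUSTERING, "xyz"))
print(resolve_sensor_config(CLUSTERING, "abc"))  # no entry: defaults only
```

Here sensor xyz keeps minPtsForCluster, minSampleForCore, and objectType from the default sensor and changes only encodeProperty.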

The parameters determine how conservative the clustering is and do not directly affect the number of clusters produced. The number of clusters is determined by hdbscan internally.

Model Config

"models": [
    {
        "name": "versionsToRetain",
        "value": 2
    },
    {
        "name": "trainingFrequency",
        "value": 48
    },
    {
        "name": "maxTrainingData",
        "value": 6000
    },
    {
        "name": "minTrainingData",
        "value": 500
    },
    {
        "name": "noiseClassificationThreshold",
        "value": 0.80
    },
    {
        "name": "gpu",
        "value": 0
    }
]
Parameters and Descriptions

Parameter

Description

versionsToRetain

The number of versions to retain for a model.

trainingFrequency

Interval, in hours, between training runs. Training is always run at the start of model training.

minTrainingData

The minimum amount of training data a sensor must have before the process of training a model starts.

maxTrainingData

The maximum amount of data that will be used for training.

noiseClassificationThreshold

Determines the probability threshold required for a behavior to be assigned a cluster label. If the behavior does not meet the probability threshold for any label, it is classified as noise.
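
The thresholding rule for noiseClassificationThreshold can be sketched as follows. This is an assumption-laden illustration: the noise marker -1, the use of >= rather than >, and the function name assign_label are not taken from the microservice.

```python
NOISE_LABEL = -1  # assumed marker for noise; the service may use another value


def assign_label(probabilities, threshold=0.80):
    """Pick the most probable cluster label, or noise if it is below threshold.

    `probabilities` maps each cluster label to its predicted probability.
    """
    if not probabilities:
        return NOISE_LABEL
    label, best = max(probabilities.items(), key=lambda item: item[1])
    return label if best >= threshold else NOISE_LABEL


print(assign_label({0: 0.91, 1: 0.06, 2: 0.03}))  # confident prediction
print(assign_label({0: 0.55, 1: 0.40, 2: 0.05}))  # no label reaches 0.80
```

Raising the threshold sends more behaviors to noise; lowering it assigns cluster labels to less certain predictions.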

Deep Learning Config

"deepLearning": [
    {
        "name": "EPOCHS",
        "value": 20
    },
    {
        "name": "DROPOUT",
        "value": 0.2
    },
    {
        "name": "excludeNoise",
        "value": true
    },
    {
        "name": "learningRate",
        "value": 0.0007
    },
    {
        "name": "validationBatchSize",
        "value": 64
    },
    {
        "name": "trainBatchSize",
        "value": 64
    },
    {
        "name": "testBatchSize",
        "value": 512
    }
]
Parameters and Descriptions

Parameter

Description

excludeNoise

Determines whether behaviors with the noise label are excluded from training. Default is true.

Remaining parameters in the config are standard deep learning parameters.
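
A minimal sketch of what excludeNoise implies for the training set is shown below. The row layout (trajectory identifier, label), the noise marker -1, and the function name select_training_rows are assumptions made for illustration only.

```python
NOISE_LABEL = -1  # assumed marker for behaviors labeled as noise


def select_training_rows(rows, exclude_noise=True):
    """Return (trajectory_id, label) rows, dropping noise rows when requested."""
    if not exclude_noise:
        return list(rows)
    return [row for row in rows if row[1] != NOISE_LABEL]


# Made-up labeled behaviors: (trajectory_id, cluster_label).
labeled = [("t1", 0), ("t2", NOISE_LABEL), ("t3", 1), ("t4", NOISE_LABEL)]

print(select_training_rows(labeled, exclude_noise=True))
print(select_training_rows(labeled, exclude_noise=False))
```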

Behavior Data Config

"behaviorData": [
    {
        "name": "numPoints",
        "value": 100
    },
    {
        "name": "minTrajectoryDistance",
        "value": 3
    }
]
Parameters and Descriptions

Parameter

Description

numPoints

The number of points each behavior’s coordinates will be extrapolated to.

minTrajectoryDistance

Minimum distance (in meters) that a behavior’s trajectory must cover to be considered for clustering and deep learning.
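
The two parameters above can be illustrated with a small sketch that resamples a trajectory to a fixed number of points and checks its length against a minimum. It assumes that the fixed-size representation is produced by linear interpolation along the path and that the trajectory distance is the path length; neither is confirmed here, and the function names are hypothetical.

```python
import math


def path_length(points):
    """Total length of the polyline through `points`."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def resample(points, num_points):
    """Return `num_points` points spaced evenly along the polyline."""
    if num_points < 2 or len(points) < 2:
        raise ValueError("need at least two input and two output points")
    total = path_length(points)
    if total == 0:
        return [points[0]] * num_points
    step = total / (num_points - 1)
    result = [points[0]]
    seg = 0               # index of the segment currently being walked
    seg_start = 0.0       # path distance at the start of that segment
    for i in range(1, num_points - 1):
        target = i * step
        # Advance until the current segment contains the target distance.
        while (seg < len(points) - 2
               and seg_start + math.dist(points[seg], points[seg + 1]) < target):
            seg_start += math.dist(points[seg], points[seg + 1])
            seg += 1
        seg_len = math.dist(points[seg], points[seg + 1])
        t = (target - seg_start) / seg_len if seg_len else 0.0
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        result.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    result.append(points[-1])
    return result


def long_enough(points, min_distance):
    """Assumed check: the path length must reach the configured minimum."""
    return path_length(points) >= min_distance


track = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0)]  # made-up trajectory, in meters
fixed = resample(track, 8)
print(len(fixed), fixed[0], fixed[-1])
print(long_enough(track, 3), long_enough([(0.0, 0.0), (1.0, 0.0)], 3))
```

With numPoints set to 100, every behavior would be represented by the same number of coordinates regardless of how many raw points were recorded, which gives the downstream clustering and deep learning steps a uniform input size.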