Configuration
For more context on configuration when the microservice is used in the Occupancy Analytics app, refer to its Operation Parameters section. When used as a standalone microservice, refer to the README.md in its respective directory within metropolis-apps-standalone-deployment/modules/.
Ingestion Config
```json
"ingestion": [
  {
    "name": "dataLocation",
    "value": "./data/behaviors"
  },
  {
    "name": "checkpointLocation",
    "value": "./tmp"
  },
  {
    "name": "ingestionTrigger",
    "value": "5 minutes"
  },
  {
    "name": "dataRetention",
    "value": 168
  },
  {
    "name": "vacuumTrigger",
    "value": 1
  },
  {
    "name": "deletedFileRetention",
    "value": 4
  },
  {
    "name": "minPointsInTrajectory",
    "value": 30
  }
]
```
| Parameter | Description |
|---|---|
| ingestionTrigger | Sets the trigger interval at which Spark streaming executes batch writes into the Delta table. Should be tuned with memory limits and desired file sizes in mind. |
| dataRetention | Number of hours to retain data in the Delta table. Data files older than the retention period are deleted by vacuum. |
| vacuumTrigger | Interval, in hours, at which vacuum runs to delete files older than dataRetention. Vacuum also deletes tombstoned files generated by merges, which helps keep storage usage low. |
| deletedFileRetention | How long deleted files are kept before vacuum removes them. If vacuum cleans up files that are still in use, concurrent readers can fail or the table can become corrupted, so choose an interval longer than the longest-running concurrent transaction. |
| minPointsInTrajectory | Behaviors whose trajectories contain fewer points than this value are filtered out during ingestion. |
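The interplay between dataRetention and deletedFileRetention can be sketched as a simple eligibility check. This is an illustrative model only, not the microservice's actual vacuum logic; the function name and file model are invented for the example.

```python
# Illustrative sketch of when a vacuum run would remove a file,
# assuming the config values dataRetention = 168 and deletedFileRetention = 4.
DATA_RETENTION_HOURS = 168        # dataRetention
DELETED_FILE_RETENTION_HOURS = 4  # deletedFileRetention

def eligible_for_vacuum(file_age_hours, tombstoned_hours_ago=None):
    """Return True if a vacuum run would delete this data file.

    A tombstoned file (e.g. produced by a merge) is removed once it has been
    tombstoned for longer than deletedFileRetention; a live file is removed
    once it is older than dataRetention.
    """
    if tombstoned_hours_ago is not None:
        return tombstoned_hours_ago > DELETED_FILE_RETENTION_HOURS
    return file_age_hours > DATA_RETENTION_HOURS

print(eligible_for_vacuum(200, None))  # True: older than the 168-hour window
print(eligible_for_vacuum(200, 2))     # False: tombstoned only 2 hours ago
```

As the table notes, deletedFileRetention is the safety margin for concurrent readers: a tombstoned file stays on disk for that many hours even though the table no longer references it.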
Clustering Config
```json
"clustering": [
  {
    "sensor": "default",
    "config": [
      {
        "name": "minPtsForCluster",
        "value": 50
      },
      {
        "name": "minSampleForCore",
        "value": 5
      },
      {
        "name": "encodeProperty",
        "value": "none"
      },
      {
        "name": "objectType",
        "value": "people"
      }
    ]
  },
  {
    "sensor": "xyz",
    "config": [
      {
        "name": "encodeProperty",
        "value": "direction"
      }
    ]
  }
]
```
| Parameter | Description |
|---|---|
| minPtsForCluster | The minimum number of behaviors required to form a cluster. |
| minSampleForCore | The larger the value of minSampleForCore, the more conservative the clustering: more points are declared noise, and clusters are restricted to progressively denser areas. In hdbscan this parameter is referred to as min_samples. |
| encodeProperty | Can be set to "direction", "flow", or "none"; the default is "flow". Clustering takes the specified behavior property into account when generating labels. With "direction", the direction field in the behavior data is used as an additional metric to form clusters; note that encoding direction can be sensitive to noisy data. With "flow", clustering accounts for the movement of the trajectories by breaking them up and computing the Hausdorff distance on the smaller pieces; such a clustering often respects direction while being less sensitive to noisy or erroneous data. With "none", only the Hausdorff distance between whole trajectories is used. |
| objectType | Object type of the behaviors being clustered; either "vehicle" or "people". |
The default clustering parameters for all sensors should be given in the config object of the default sensor. To add a sensor-specific configuration, provide another dictionary with the sensor name and a minimal config that overwrites the defaults. In the example above, sensor xyz's behavior data will be clustered with encodeProperty set to "direction", while all other parameters are taken from the default configuration.
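The default-plus-override resolution can be sketched as a dictionary merge. This is an illustrative reading of the config format, not the microservice's actual code; the function name is invented.

```python
# Sketch of resolving a sensor's effective clustering config:
# start from the "default" sensor's parameters, then overlay any
# sensor-specific overrides.
def resolve_config(clustering, sensor):
    """Merge a sensor's minimal config over the default sensor's config."""
    by_sensor = {
        entry["sensor"]: {p["name"]: p["value"] for p in entry["config"]}
        for entry in clustering
    }
    merged = dict(by_sensor.get("default", {}))
    merged.update(by_sensor.get(sensor, {}))
    return merged

clustering = [
    {"sensor": "default", "config": [
        {"name": "minPtsForCluster", "value": 50},
        {"name": "minSampleForCore", "value": 5},
        {"name": "encodeProperty", "value": "none"},
        {"name": "objectType", "value": "people"},
    ]},
    {"sensor": "xyz", "config": [
        {"name": "encodeProperty", "value": "direction"},
    ]},
]

print(resolve_config(clustering, "xyz"))
# {'minPtsForCluster': 50, 'minSampleForCore': 5,
#  'encodeProperty': 'direction', 'objectType': 'people'}
```

A sensor with no entry of its own simply inherits the full default config.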
These parameters control how conservative the clustering is; they do not directly set the number of clusters, which hdbscan determines internally.
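The trajectory-similarity metric underlying the clustering is the Hausdorff distance. A minimal pure-Python sketch of the undirected form, for intuition only (the microservice's actual implementation may differ):

```python
import math

def hausdorff(a, b):
    """Undirected Hausdorff distance between two 2-D point sequences:
    the largest distance from any point in one sequence to its nearest
    neighbor in the other."""
    def directed(p, q):
        return max(min(math.dist(x, y) for y in q) for x in p)
    return max(directed(a, b), directed(b, a))

# Two parallel straight-line trajectories, 1 meter apart:
traj_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
traj_b = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
print(hausdorff(traj_a, traj_b))  # 1.0
```

Note that this metric is direction-agnostic: a trajectory and its reverse have distance zero. That is why encodeProperty's "flow" mode, which applies the metric to smaller trajectory segments, can recover directional structure.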
Model Config
```json
"models": [
  {
    "name": "versionsToRetain",
    "value": 2
  },
  {
    "name": "trainingFrequency",
    "value": 48
  },
  {
    "name": "maxTrainingData",
    "value": 6000
  },
  {
    "name": "minTrainingData",
    "value": 500
  },
  {
    "name": "noiseClassificationThreshold",
    "value": 0.80
  },
  {
    "name": "gpu",
    "value": 0
  }
]
```
| Parameter | Description |
|---|---|
| versionsToRetain | The number of versions to retain for each model. |
| trainingFrequency | How often model training runs, in hours. Training always runs once at startup. |
| minTrainingData | The minimum amount of training data required for a sensor before model training starts. |
| maxTrainingData | The maximum amount of data that will be used for training. |
| noiseClassificationThreshold | The probability threshold a behavior must meet to be assigned a cluster label. If a behavior does not meet the threshold for any label, it is classified as noise. |
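The thresholding described for noiseClassificationThreshold can be sketched as follows. This is illustrative only; apart from the parameter itself, the names are invented.

```python
# Sketch of how noiseClassificationThreshold gates label assignment.
NOISE_CLASSIFICATION_THRESHOLD = 0.80

def assign_label(label_probabilities):
    """Return the most likely cluster label, or "noise" if no label's
    probability reaches noiseClassificationThreshold."""
    label, prob = max(label_probabilities.items(), key=lambda kv: kv[1])
    return label if prob >= NOISE_CLASSIFICATION_THRESHOLD else "noise"

print(assign_label({"cluster_0": 0.92, "cluster_1": 0.08}))  # cluster_0
print(assign_label({"cluster_0": 0.55, "cluster_1": 0.45}))  # noise
```

Raising the threshold therefore trades coverage for confidence: more behaviors end up labelled as noise, but the remaining assignments are more certain.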
Deep Learning Config
```json
"deepLearning": [
  {
    "name": "EPOCHS",
    "value": 20
  },
  {
    "name": "DROPOUT",
    "value": 0.2
  },
  {
    "name": "excludeNoise",
    "value": true
  },
  {
    "name": "learningRate",
    "value": 0.0007
  },
  {
    "name": "validationBatchSize",
    "value": 64
  },
  {
    "name": "trainBatchSize",
    "value": 64
  },
  {
    "name": "testBatchSize",
    "value": 512
  }
]
```
| Parameter | Description |
|---|---|
| excludeNoise | Determines whether behaviors with the noise label are used for training. Default is true. |
Remaining parameters in the config are standard deep learning parameters.
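The effect of excludeNoise on the training set amounts to a simple filter. The record structure below is invented for illustration:

```python
# Sketch of excludeNoise: drop noise-labelled behaviors before training.
def training_set(behaviors, exclude_noise=True):
    """Return the behaviors used for training; with exclude_noise=True,
    noise-labelled behaviors are dropped."""
    if not exclude_noise:
        return list(behaviors)
    return [b for b in behaviors if b["label"] != "noise"]

behaviors = [
    {"id": 1, "label": "cluster_0"},
    {"id": 2, "label": "noise"},
    {"id": 3, "label": "cluster_1"},
]
print([b["id"] for b in training_set(behaviors)])  # [1, 3]
```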
Behavior Data Config
```json
"behaviorData": [
  {
    "name": "numPoints",
    "value": 100
  },
  {
    "name": "minTrajectoryDistance",
    "value": 3
  }
]
```
| Parameter | Description |
|---|---|
| numPoints | The number of points each behavior's coordinates will be extrapolated to. |
| minTrajectoryDistance | Minimum distance, in meters, that a behavior's trajectory must cover to be considered for clustering and deep learning. |
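The two behaviorData steps can be sketched as follows: measuring a trajectory's total path length against minTrajectoryDistance, and resampling it to exactly numPoints points. The resampling shown here uses linear interpolation along arc length, which is one plausible reading of the extrapolation step, not necessarily the microservice's actual method.

```python
import math

def path_length(points):
    """Total length of a polyline given as a list of (x, y) tuples."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def resample(points, num_points):
    """Linearly interpolate a trajectory to exactly num_points points,
    spaced evenly along the trajectory's arc length."""
    total = path_length(points)
    if total == 0 or len(points) < 2:
        return [points[0]] * num_points
    # Cumulative arc length at each original point.
    cum = [0.0]
    for a, b in zip(points, points[1:]):
        cum.append(cum[-1] + math.dist(a, b))
    out, j = [], 0
    for i in range(num_points):
        target = total * i / (num_points - 1)
        while j < len(cum) - 2 and cum[j + 1] < target:
            j += 1
        seg = cum[j + 1] - cum[j]
        t = 0.0 if seg == 0 else (target - cum[j]) / seg
        ax, ay = points[j]
        bx, by = points[j + 1]
        out.append((ax + t * (bx - ax), ay + t * (by - ay)))
    return out

# A 4-meter straight trajectory passes minTrajectoryDistance = 3 ...
traj = [(0.0, 0.0), (4.0, 0.0)]
assert path_length(traj) >= 3
# ... and resampling to 5 points spaces them 1 meter apart:
print(resample(traj, 5))
# [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
```

Resampling every trajectory to the same point count is what makes the point-wise Hausdorff comparisons in the clustering stage well behaved across behaviors of different lengths.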