Configuration
For more context on configuration when the microservice is used in the Occupancy Analytics app, refer to its Operation Parameters section. When used as a standalone microservice, refer to the README.md in its respective directory within metropolis-apps-standalone-deployment/modules/.
Ingestion Config
```json
"ingestion": [
  {
    "name": "dataLocation",
    "value": "./data/behaviors"
  },
  {
    "name": "checkpointLocation",
    "value": "./tmp"
  },
  {
    "name": "ingestionTrigger",
    "value": "5 minutes"
  },
  {
    "name": "dataRetention",
    "value": 168
  },
  {
    "name": "vacuumTrigger",
    "value": 1
  },
  {
    "name": "deletedFileRetention",
    "value": 4
  },
  {
    "name": "minPointsInTrajectory",
    "value": 30
  }
]
```
| Parameter | Description |
|---|---|
| ingestionTrigger | Sets the trigger interval at which Spark streaming executes batch writes into the Delta table. Should be tuned with memory limits and desired file sizes in mind. |
| dataRetention | Number of hours to retain data in the Delta table. Data files older than the retention period are deleted by vacuum. |
| vacuumTrigger | Interval, in hours, at which vacuum runs to delete files older than dataRetention. Vacuum also deletes tombstoned files generated by merges, which helps keep storage usage low. |
| deletedFileRetention | How long deleted files are kept before vacuum removes them. If vacuum cleans up files that are still in use, concurrent readers can fail or the table can become corrupted, so choose an interval longer than the longest-running concurrent transaction. |
| minPointsInTrajectory | Behaviors whose trajectories contain fewer points than this value are filtered out during ingestion. |
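The interplay between dataRetention and deletedFileRetention can be sketched as a simple eligibility check. This is an illustrative model only, not the microservice's actual vacuum logic; the function name and file model are invented for the example.

```python
# Illustrative sketch of when a vacuum run would remove a file,
# assuming the config values dataRetention = 168 and deletedFileRetention = 4.
DATA_RETENTION_HOURS = 168        # dataRetention
DELETED_FILE_RETENTION_HOURS = 4  # deletedFileRetention

def eligible_for_vacuum(file_age_hours, tombstoned_hours_ago=None):
    """Return True if a vacuum run would delete this data file.

    A tombstoned file (e.g. produced by a merge) is removed once it has been
    tombstoned for longer than deletedFileRetention; a live file is removed
    once it is older than dataRetention.
    """
    if tombstoned_hours_ago is not None:
        return tombstoned_hours_ago > DELETED_FILE_RETENTION_HOURS
    return file_age_hours > DATA_RETENTION_HOURS

print(eligible_for_vacuum(200, None))  # True: older than the 168-hour window
print(eligible_for_vacuum(200, 2))     # False: tombstoned only 2 hours ago
```

As the table notes, deletedFileRetention is the safety margin for concurrent readers: a tombstoned file stays on disk for that many hours even though the table no longer references it.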
Clustering Config
```json
"clustering": [
  {
    "sensor": "default",
    "config": [
      {
        "name": "minPtsForCluster",
        "value": 50
      },
      {
        "name": "minSampleForCore",
        "value": 5
      },
      {
        "name": "encodeProperty",
        "value": "none"
      },
      {
        "name": "objectType",
        "value": "people"
      }
    ]
  },
  {
    "sensor": "xyz",
    "config": [
      {
        "name": "encodeProperty",
        "value": "direction"
      }
    ]
  }
]
```
| Parameter | Description |
|---|---|
| minPtsForCluster | The minimum number of behaviors required to form a cluster. |
| minSampleForCore | The larger the value of minSampleForCore, the more conservative the clustering: more points are declared noise, and clusters are restricted to progressively denser areas. In hdbscan this parameter is referred to as min_samples. |
| encodeProperty | Can be set to "direction", "flow", or "none"; the default is "flow". Clustering takes the specified behavior property into account when generating labels. With "direction", the direction field in the behavior data is used as an additional metric to form clusters; note that encoding direction can be sensitive to noisy data. With "flow", clustering accounts for the movement of the trajectories by breaking them up and computing the Hausdorff distance on the smaller pieces; such a clustering often respects direction while being less sensitive to noisy or erroneous data. With "none", only the Hausdorff distance between whole trajectories is used. |
| objectType | Object type of the behaviors being clustered; either "vehicle" or "people". |
The default clustering parameters for all sensors should be given in the config object of the default sensor. To add a sensor-specific configuration, provide another dictionary with the sensor name and a minimal config that overwrites the defaults. In the example above, sensor xyz's behavior data will be clustered with encodeProperty set to "direction", while all other parameters are taken from the default configuration.
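The default-plus-override resolution can be sketched as a dictionary merge. This is an illustrative reading of the config format, not the microservice's actual code; the function name is invented.

```python
# Sketch of resolving a sensor's effective clustering config:
# start from the "default" sensor's parameters, then overlay any
# sensor-specific overrides.
def resolve_config(clustering, sensor):
    """Merge a sensor's minimal config over the default sensor's config."""
    by_sensor = {
        entry["sensor"]: {p["name"]: p["value"] for p in entry["config"]}
        for entry in clustering
    }
    merged = dict(by_sensor.get("default", {}))
    merged.update(by_sensor.get(sensor, {}))
    return merged

clustering = [
    {"sensor": "default", "config": [
        {"name": "minPtsForCluster", "value": 50},
        {"name": "minSampleForCore", "value": 5},
        {"name": "encodeProperty", "value": "none"},
        {"name": "objectType", "value": "people"},
    ]},
    {"sensor": "xyz", "config": [
        {"name": "encodeProperty", "value": "direction"},
    ]},
]

print(resolve_config(clustering, "xyz"))
# {'minPtsForCluster': 50, 'minSampleForCore': 5,
#  'encodeProperty': 'direction', 'objectType': 'people'}
```

A sensor with no entry of its own simply inherits the full default config.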
These parameters control how conservative the clustering is; they do not directly set the number of clusters, which hdbscan determines internally.
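The trajectory-similarity metric underlying the clustering is the Hausdorff distance. A minimal pure-Python sketch of the undirected form, for intuition only (the microservice's actual implementation may differ):

```python
import math

def hausdorff(a, b):
    """Undirected Hausdorff distance between two 2-D point sequences:
    the largest distance from any point in one sequence to its nearest
    neighbor in the other."""
    def directed(p, q):
        return max(min(math.dist(x, y) for y in q) for x in p)
    return max(directed(a, b), directed(b, a))

# Two parallel straight-line trajectories, 1 meter apart:
traj_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
traj_b = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
print(hausdorff(traj_a, traj_b))  # 1.0
```

Note that this metric is direction-agnostic: a trajectory and its reverse have distance zero. That is why encodeProperty's "flow" mode, which applies the metric to smaller trajectory segments, can recover directional structure.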
Model Config
```json
"models": [
  {
    "name": "versionsToRetain",
    "value": 2
  },
  {
    "name": "trainingFrequency",
    "value": 48
  },
  {
    "name": "maxTrainingData",
    "value": 6000
  },
  {
    "name": "minTrainingData",
    "value": 500
  },
  {
    "name": "noiseClassificationThreshold",
    "value": 0.80
  },
  {
    "name": "gpu",
    "value": 0
  }
]
```
| Parameter | Description |
|---|---|
| versionsToRetain | The number of versions to retain for each model. |
| trainingFrequency | How often model training runs, in hours. Training always runs once at startup. |
| minTrainingData | The minimum amount of training data required for a sensor before model training starts. |
| maxTrainingData | The maximum amount of data that will be used for training. |
| noiseClassificationThreshold | The probability threshold a behavior must meet to be assigned a cluster label. If a behavior does not meet the threshold for any label, it is classified as noise. |
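The thresholding described for noiseClassificationThreshold can be sketched as follows. This is illustrative only; apart from the parameter itself, the names are invented.

```python
# Sketch of how noiseClassificationThreshold gates label assignment.
NOISE_CLASSIFICATION_THRESHOLD = 0.80

def assign_label(label_probabilities):
    """Return the most likely cluster label, or "noise" if no label's
    probability reaches noiseClassificationThreshold."""
    label, prob = max(label_probabilities.items(), key=lambda kv: kv[1])
    return label if prob >= NOISE_CLASSIFICATION_THRESHOLD else "noise"

print(assign_label({"cluster_0": 0.92, "cluster_1": 0.08}))  # cluster_0
print(assign_label({"cluster_0": 0.55, "cluster_1": 0.45}))  # noise
```

Raising the threshold therefore trades coverage for confidence: more behaviors end up labelled as noise, but the remaining assignments are more certain.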
Deep Learning Config
```json
"deepLearning": [
  {
    "name": "EPOCHS",
    "value": 20
  },
  {
    "name": "DROPOUT",
    "value": 0.2
  },
  {
    "name": "excludeNoise",
    "value": true
  },
  {
    "name": "learningRate",
    "value": 0.0007
  },
  {
    "name": "validationBatchSize",
    "value": 64
  },
  {
    "name": "trainBatchSize",
    "value": 64
  },
  {
    "name": "testBatchSize",
    "value": 512
  }
]
```
| Parameter | Description |
|---|---|
| excludeNoise | Determines whether behaviors with the noise label are used for training. Default is true. |
Remaining parameters in the config are standard deep learning parameters.
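The effect of excludeNoise on the training set amounts to a simple filter. The record structure below is invented for illustration:

```python
# Sketch of excludeNoise: drop noise-labelled behaviors before training.
def training_set(behaviors, exclude_noise=True):
    """Return the behaviors used for training; with exclude_noise=True,
    noise-labelled behaviors are dropped."""
    if not exclude_noise:
        return list(behaviors)
    return [b for b in behaviors if b["label"] != "noise"]

behaviors = [
    {"id": 1, "label": "cluster_0"},
    {"id": 2, "label": "noise"},
    {"id": 3, "label": "cluster_1"},
]
print([b["id"] for b in training_set(behaviors)])  # [1, 3]
```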
Behavior Data Config
```json
"behaviorData": [
  {
    "name": "numPoints",
    "value": 100
  },
  {
    "name": "minTrajectoryDistance",
    "value": 3
  }
]
```
| Parameter | Description |
|---|---|
| numPoints | The number of points each behavior's coordinates will be extrapolated to. |
| minTrajectoryDistance | Minimum distance, in meters, that a behavior's trajectory must cover to be considered for clustering and deep learning. |
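The two behaviorData steps can be sketched as follows: measuring a trajectory's total path length against minTrajectoryDistance, and resampling it to exactly numPoints points. The resampling shown here uses linear interpolation along arc length, which is one plausible reading of the extrapolation step, not necessarily the microservice's actual method.

```python
import math

def path_length(points):
    """Total length of a polyline given as a list of (x, y) tuples."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def resample(points, num_points):
    """Linearly interpolate a trajectory to exactly num_points points,
    spaced evenly along the trajectory's arc length."""
    total = path_length(points)
    if total == 0 or len(points) < 2:
        return [points[0]] * num_points
    # Cumulative arc length at each original point.
    cum = [0.0]
    for a, b in zip(points, points[1:]):
        cum.append(cum[-1] + math.dist(a, b))
    out, j = [], 0
    for i in range(num_points):
        target = total * i / (num_points - 1)
        while j < len(cum) - 2 and cum[j + 1] < target:
            j += 1
        seg = cum[j + 1] - cum[j]
        t = 0.0 if seg == 0 else (target - cum[j]) / seg
        ax, ay = points[j]
        bx, by = points[j + 1]
        out.append((ax + t * (bx - ax), ay + t * (by - ay)))
    return out

# A 4-meter straight trajectory passes minTrajectoryDistance = 3 ...
traj = [(0.0, 0.0), (4.0, 0.0)]
assert path_length(traj) >= 3
# ... and resampling to 5 points spaces them 1 meter apart:
print(resample(traj, 5))
# [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
```

Resampling every trajectory to the same point count is what makes the point-wise Hausdorff comparisons in the clustering stage well behaved across behaviors of different lengths.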