This module function sets up a modular Digital Fingerprinting pipeline instance.

| Parameter | Type | Description | Example Value | Default Value |
|---|---|---|---|---|
|  | dict | Options for the inference pipeline module | See Below |  |
|  | dict | Options for the training pipeline module | See Below |  |


| Parameter | Type | Description | Example Value | Default Value |
|---|---|---|---|---|
|  | dict | Options for batching the data | See Below |  |
|  | str | Directory to cache the rolling window data | `"/path/to/cache/dir"` |  |
|  | dict | Options for configuring the data frame encoder | See Below |  |
|  | dict | Options for the MLflow model writer | See Below |  |
|  | dict | Options for preprocessing the data | See Below |  |
|  | dict | Options for aggregating the data by stream | See Below |  |
|  | str | Name of the timestamp column used in the data | `"my_timestamp"` |  |
|  | dict | Options for splitting the data by user | See Below |  |


| Parameter | Type | Description | Example Value | Default Value |
|---|---|---|---|---|
|  | dict | Options for batching the data | See Below |  |
|  | str | Directory to cache the rolling window data | `"/path/to/cache/dir"` |  |
|  | dict | Criteria for filtering detections | See Below |  |
|  | str | User ID to use if user ID not found | `"generic_user"` |  |
|  | dict | Options for the inference module | See Below |  |
|  | str | Format string for the model name | `"model_{timestamp}"` |  |
|  | int | Number of output ports for the module | `3` |  |
|  | str | Name of the timestamp column in the input data | `"timestamp"` |  |
|  | dict | Options for aggregating the data by stream | See Below |  |
|  | dict | Options for splitting the data by user | See Below |  |
|  | dict | Options for writing the detections to a file | See Below |  |


| Key | Type | Description | Example Value | Default Value |
|---|---|---|---|---|
|  | datetime/string | End time of the time window | `"2023-03-14T23:59:59"` |  |
|  | string | Regex pattern for ISO date matching | `"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}"` |  |
|  | dictionary | Additional arguments for the parser | `{}` |  |
|  | string | Time period for grouping files | `"1d"` |  |
|  | integer | Sampling rate in seconds | `60` |  |
|  | datetime/string | Start time of the time window | `"2023-03-01T00:00:00"` |  |

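
To illustrate how the regex pattern and the start/end times in the table above could work together, here is a minimal sketch that extracts an ISO timestamp from a file name and checks it against the time window. The variable and function names are hypothetical and are not the module's configuration keys; treating both window bounds as inclusive is also an assumption.

```python
import re
from datetime import datetime

# Example values copied from the table above. The names are illustrative only.
ISO_DATE_PATTERN = r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}"
WINDOW_START = datetime.fromisoformat("2023-03-01T00:00:00")
WINDOW_END = datetime.fromisoformat("2023-03-14T23:59:59")


def timestamp_from_name(file_name):
    """Return the first ISO timestamp embedded in file_name, or None."""
    match = re.search(ISO_DATE_PATTERN, file_name)
    return datetime.fromisoformat(match.group(0)) if match else None


def in_window(file_name):
    """True when the embedded timestamp lies inside the window (bounds inclusive, by assumption)."""
    ts = timestamp_from_name(file_name)
    return ts is not None and WINDOW_START <= ts <= WINDOW_END
```

A file named `logs_2023-03-05T12:00:00.json` would be kept by `in_window`, while one stamped `2023-04-01T00:00:00` or one with no timestamp in its name would be skipped.
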

| Parameter | Type | Description | Example Value | Default Value |
|---|---|---|---|---|
|  | list | List of feature columns to train on | `["column1", "column2", "column3"]` |  |
|  | int | Number of epochs to train for | `50` |  |
|  | dict | Keyword arguments to pass to the model | `{"encoder_layers": [64, 32], "decoder_layers": [32, 64], "activation": "relu", "swap_p": 0.1, "lr": 0.001, "lr_decay": 0.9, "batch_size": 32, "verbose": 1, "optimizer": "adam", "scalar": "min_max", "min_cats": 10, "progress_bar": false, "device": "cpu"}` |  |
|  | float | Size of the validation set | `0.1` |  |


| Key | Type | Description | Example Value | Default Value |
|---|---|---|---|---|
|  | string | Conda environment for the model | `"path/to/conda_env.yml"` |  |
|  | dictionary | Permissions for the model | See Below |  |
|  | string | Formatter for the experiment name | `"experiment_name_{timestamp}"` |  |
|  | string | Formatter for the model name | `"model_name_{timestamp}"` |  |
|  | string | Name of the timestamp column | `"timestamp"` |  |


| Parameter | Type | Description | Example Value | Default Value |
|---|---|---|---|---|
|  | string | The user ID to use if the user ID is not found | `"batch"` |  |
|  | int | Minimum history to trigger a new training event | `1` |  |
|  | int | Maximum history to include in a new training event | `0` |  |
|  | string | Name of the column containing timestamps | `"timestamp"` |  |
|  | string | Lookback timespan for training data in a new training event | `"60d"` |  |
|  | bool | Whether or not to cache streaming data to disk | `false` |  |
|  | string | Directory to use for caching streaming data | `"./.cache"` |  |


| Key | Type | Description | Example Value | Default Value |
|---|---|---|---|---|
|  | str | The user ID to use if the user ID is not found | `"generic_user"` |  |
|  | bool | Whether to include a generic user ID in the output | `false` |  |
|  | bool | Whether to include individual user IDs in the output | `true` |  |
|  | list | List of user IDs to include; others will be excluded | `["user1", "user2", "user3"]` |  |
|  | list | List of user IDs to exclude from the output | `["user4", "user5"]` |  |
|  | str | Name of the column containing timestamps | `"timestamp"` |  |
|  | str | Name of the column containing user IDs | `"username"` |  |

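
The include and exclude lists described in the table above can be pictured with a small sketch. This is not the module's implementation: the function name `select_users` and its arguments are hypothetical, and the way the two lists combine (include list applied first, exclude list always honored) is an assumption.

```python
def select_users(user_ids, only_users=None, skip_users=None):
    """Keep IDs on the include list (when one is given) and drop IDs on the exclude list.

    Hypothetical helper: argument names are illustrative, not configuration keys.
    """
    only = set(only_users or [])
    skip = set(skip_users or [])
    # An empty include list means "no restriction" in this sketch.
    return [u for u in user_ids if (not only or u in only) and u not in skip]
```

With the example values from the table, `select_users(ids, only_users=["user1", "user2", "user3"])` keeps only those three users if present, while `select_users(ids, skip_users=["user4", "user5"])` keeps everyone except those two.
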

| Key | Type | Description | Example Value | Default Value |
|---|---|---|---|---|
|  | float | Threshold for filtering detections | `0.5` |  |
|  | str | Name of the field to filter by threshold | `"score"` |  |

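
As a rough illustration of threshold-based filtering with the example values above (`0.5` on a field named `"score"`), the sketch below keeps only records whose score exceeds the threshold. The function `filter_detections` is hypothetical, records are modeled as plain dictionaries rather than data frames, and the strict `>` comparison is an assumption, since the source does not say whether the bound is inclusive.

```python
def filter_detections(records, field_name="score", threshold=0.5):
    """Return the records whose field_name value is strictly above threshold.

    Hypothetical helper; records lacking the field are dropped.
    """
    kept = []
    for record in records:
        value = record.get(field_name)
        if value is not None and value > threshold:
            kept.append(record)
    return kept
```
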

| Parameter | Type | Description | Example Value | Default Value |
|---|---|---|---|---|
|  | string | Formatter for model names | `"user_{username}_model"` |  |
|  | string | Fallback user to use if no model is found for a user | `"generic_user"` |  |
|  | string | Name of the timestamp column | `"timestamp"` |  |

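
The interaction between the model name formatter and the fallback user can be sketched as follows. This is an assumption-laden illustration, not the module's code: `resolve_model_name` is a hypothetical helper, and a plain Python set stands in for whatever model registry lookup the real module performs.

```python
# Example values from the table above; the constant names are illustrative only.
MODEL_NAME_FORMAT = "user_{username}_model"
FALLBACK_USER = "generic_user"


def resolve_model_name(username, available_models):
    """Return the user's model name if it exists, else the fallback user's model name."""
    name = MODEL_NAME_FORMAT.format(username=username)
    if name in available_models:
        return name
    # No per-user model found: fall back to the generic user's model.
    return MODEL_NAME_FORMAT.format(username=FALLBACK_USER)
```

For a registry containing `user_alice_model` and `user_generic_user_model`, a lookup for `alice` resolves to her own model, and a lookup for any other user resolves to `user_generic_user_model`.
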

| Key | Type | Description | Example Value | Default Value |
|---|---|---|---|---|
|  | string | Path to the output file | `"output.csv"` |  |
|  | string | Type of file to write | `"CSV"` |  |
|  | bool | If true, flush the file after each write | `false` |  |
|  | bool | If true, include the index column | `false` |  |
|  | bool | If true, overwrite the file if it exists | `true` |  |
