DFP Training Pipe Module
This module consolidates the DFP pipeline modules used in the training process into a single module.
Configurable Parameters
| Key | Type | Description | Example Value | Default Value |
|---|---|---|---|---|
| | | Name of the timestamp column used in the data. | | |
| | | Directory to cache the rolling window data. | | |
| | | Options for batching files. | Refer Below | |
| | | Options for splitting data by user. | Refer Below | |
| | | Options for aggregating data by stream. | Refer Below | |
| | | Options for preprocessing the data. | | |
| | | Options for configuring the data frame encoder, used for training the model. | Refer Below | |
| | | Options for the MLflow model writer, which is responsible for saving the trained model. | Refer Below | |
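As a rough orientation, the nested option groups could be assembled into a plain Python dictionary along the lines of the sketch below. Only the five group names are taken from the section headings on this page. The `build_training_config` helper, the inner key `epochs`, and its value are hypothetical and shown purely for illustration; they are not confirmed key names or defaults.

```python
# Option groups named by the section headings on this page.
OPTION_GROUPS = (
    "batching_options",
    "user_splitting_options",
    "stream_aggregation_options",
    "dfencoder_options",
    "mlflow_writer_options",
)


def build_training_config(**groups):
    """Return a config dict holding one (possibly empty) dict per option group.

    Hypothetical helper for illustration; it is not part of any library.
    """
    unknown = set(groups) - set(OPTION_GROUPS)
    if unknown:
        raise KeyError(f"Unknown option group(s): {sorted(unknown)}")
    return {name: dict(groups.get(name, {})) for name in OPTION_GROUPS}


# "epochs" is an assumed inner key name, not one documented on this page.
config = build_training_config(dfencoder_options={"epochs": 30})
```

Rejecting unknown group names early makes a misspelled group fail at configuration time rather than being silently ignored.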
batching_options
| Key | Type | Description | Example Value | Default Value |
|---|---|---|---|---|
| | | End time of the time range to process. | | |
| | | ISO date regex pattern. | | |
| | | Keyword arguments to pass to the parser. | | |
| | | Time period to batch the data. | | |
| | | Sampling rate in seconds. | | |
| | | Start time of the time range to process. | | |
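To illustrate how an ISO date regex pattern and a start/end time range might work together when selecting files, here is a minimal sketch. The pattern, the file names, and the helpers `timestamp_from_name` and `in_range` are all assumptions made for illustration; the module's actual default pattern and filtering logic are not shown on this page.

```python
import re
from datetime import datetime, timezone

# Assumed pattern: matches timestamps such as 2022-08-01T03_02_04Z in a file name.
ISO_DATE_PATTERN = re.compile(
    r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
    r"T(?P<hour>\d{2})[:_](?P<minute>\d{2})[:_](?P<second>\d{2})Z"
)


def timestamp_from_name(file_name):
    """Extract a UTC datetime from a file name, or None if no timestamp is found."""
    match = ISO_DATE_PATTERN.search(file_name)
    if match is None:
        return None
    parts = {key: int(value) for key, value in match.groupdict().items()}
    return datetime(**parts, tzinfo=timezone.utc)


def in_range(file_name, start_time, end_time):
    """Keep a file only if its timestamp falls inside [start_time, end_time)."""
    ts = timestamp_from_name(file_name)
    return ts is not None and start_time <= ts < end_time


start = datetime(2022, 8, 1, tzinfo=timezone.utc)
end = datetime(2022, 8, 2, tzinfo=timezone.utc)
files = [
    "logs_2022-08-01T03_02_04Z.json",
    "logs_2022-08-03T00_00_00Z.json",
    "notes.txt",
]
selected = [name for name in files if in_range(name, start, end)]
```

Named groups keep the pattern readable and let the match map directly onto `datetime` keyword arguments.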
user_splitting_options
| Key | Type | Description | Example Value | Default Value |
|---|---|---|---|---|
| | | Fallback user to use if no model is found for a user. | | |
| | | Include generic models in the results. | | |
| | | Include individual models in the results. | | |
| | | List of users to include in the results. | | |
| | | List of users to exclude from the results. | | |
| | | Column name for the user ID. | | |
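The include and exclude lists described above can be pictured as a simple filter over user IDs. The sketch below is one plausible reading, assuming an empty include list means "all users"; the function name `select_users` and its parameters are hypothetical and do not reflect the module's actual key names or implementation.

```python
def select_users(user_ids, only_users=None, skip_users=None):
    """Apply an optional include list, then an optional exclude list."""
    selected = []
    for user_id in user_ids:
        # An empty or missing include list is treated as "include everyone".
        if only_users and user_id not in only_users:
            continue
        if skip_users and user_id in skip_users:
            continue
        selected.append(user_id)
    return selected


users = ["alice", "bob", "carol"]
kept = select_users(users, skip_users=["bob"])
```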
stream_aggregation_options
| Key | Type | Description | Example Value | Default Value |
|---|---|---|---|---|
| | | Mode for managing user cache. Setting to | | |
| | | Minimum history to trigger a new training event | | |
| | | Maximum history to include in a new training event | | |
| | | Name of the column containing timestamps | | |
| | | Look back time span for training data in a new training event | | |
| | | Whether or not to cache streaming data to disk | | |
| | | Directory to use for caching streaming data | | |
dfencoder_options
| Parameter | Type | Description | Example Value | Default Value |
|---|---|---|---|---|
| | | List of feature columns to train on | | |
| | | Number of epochs to train for | | |
| | | Keyword arguments to pass to the model | | |
| | | Size of the validation set | | |
mlflow_writer_options
| Key | Type | Description | Example Value | Default Value |
|---|---|---|---|---|
| | | Conda environment for the model | | |
| | | Permissions for the model | - | |
| | | Formatter for the experiment name | | |
| | | Formatter for the model name | | |
| | | Name of the timestamp column | | |
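One way to think about the model-name and experiment-name formatters is as template strings filled in per user. The sketch below assumes Python `str.format`-style placeholders; the templates, the field names `user_id` and `model_name`, and the `format_name` helper are hypothetical and are not taken from this page.

```python
def format_name(template, **fields):
    """Fill a str.format-style template with the given fields."""
    return template.format(**fields)


# Both templates are assumed examples, not documented defaults.
model_name = format_name("DFP-{user_id}", user_id="alice")
experiment_name = format_name("dfp/{model_name}/training", model_name=model_name)
```

Deriving the experiment name from the model name keeps the two consistent when one model is trained per user.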