NVIDIA Morpheus (24.10.01)

DFP Training Pipe Module

This module function consolidates the DFP (Digital Fingerprinting) pipeline modules involved in the training process into a single module.

Configurable Parameters

| Key | Type | Description | Example Value | Default Value |
| --- | --- | --- | --- | --- |
| timestamp_column_name | str | Name of the timestamp column used in the data. | "timestamp" | - |
| cache_dir | str | Directory in which to cache the rolling-window data. | "/tmp/cache" | - |
| batching_options | dict | Options for batching files. | See below | - |
| user_splitting_options | dict | Options for splitting data by user. | See below | - |
| stream_aggregation_options | dict | Options for aggregating data by stream. | See below | - |
| preprocessing_options | dict | Options for preprocessing the data. | - | - |
| dfencoder_options | dict | Options for configuring the data frame encoder used to train the model. | See below | - |
| mlflow_writer_options | dict | Options for the MLflow model writer, which saves the trained model. | See below | - |
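
As an illustration, the module's configuration can be written as a nested Python dictionary whose keys mirror the parameter tables in this section. The sketch below uses the example values from those tables. The variable `training_pipe_config`, the constant `NESTED_KEYS`, and the helper `check_config` are hypothetical names, the empty `preprocessing_options` is an assumption (no example is given), and how the dictionary is handed to a Morpheus pipeline is not shown.

```python
from typing import Any, Dict, List

# Hypothetical configuration for the DFP training pipe module, built from the
# example values in the parameter tables of this section.
training_pipe_config: Dict[str, Any] = {
    "timestamp_column_name": "timestamp",
    "cache_dir": "/tmp/cache",
    "batching_options": {
        "start_time": "2023-02-01T00:00:00",
        "end_time": "2023-03-01T00:00:00",
        "iso_date_regex_pattern": r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}",
        "parser_kwargs": {},
        "period": "1min",
        "sampling_rate_s": 60,
    },
    "user_splitting_options": {
        "fallback_username": "generic",
        "include_generic": True,
        "include_individual": True,
        "only_users": [],
        "skip_users": [],
        "userid_column_name": "user_id",
    },
    "stream_aggregation_options": {
        "cache_mode": "batch",
        "min_history": 1,
        "max_history": 0,
        "timestamp_column_name": "timestamp",
        "aggregation_span": "60d",
        "cache_to_disk": False,
        "cache_dir": "./.cache",
    },
    # No example is given for preprocessing_options; left empty here (assumption).
    "preprocessing_options": {},
    "dfencoder_options": {
        "feature_columns": ["column1", "column2", "column3"],
        "epochs": 50,
        # A subset of the example model_kwargs from the dfencoder_options table.
        "model_kwargs": {
            "encoder_layers": [64, 32],
            "decoder_layers": [32, 64],
            "activation": "relu",
            "batch_size": 32,
            "device": "cpu",
        },
        "validation_size": 0.1,
    },
    "mlflow_writer_options": {
        "conda_env": "path/to/conda_env.yml",
        "databricks_permissions": None,
        "experiment_name_formatter": "experiment_name_{timestamp}",
        "model_name_formatter": "model_name_{timestamp}",
        "timestamp_column_name": "timestamp",
    },
}

# Top-level keys documented with type dict (hypothetical constant).
NESTED_KEYS = (
    "batching_options",
    "user_splitting_options",
    "stream_aggregation_options",
    "preprocessing_options",
    "dfencoder_options",
    "mlflow_writer_options",
)


def check_config(config: Dict[str, Any]) -> List[str]:
    """Return the nested-option keys that are missing or not dictionaries."""
    return [key for key in NESTED_KEYS if not isinstance(config.get(key), dict)]


print(check_config(training_pipe_config))  # []
```

A structural check like `check_config` catches a misspelled or mistyped section name early, before any data is processed.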

batching_options

| Key | Type | Description | Example Value | Default Value |
| --- | --- | --- | --- | --- |
| end_time | str | End time of the time range to process. | "2023-03-01T00:00:00" | - |
| iso_date_regex_pattern | str | Regex pattern used to match ISO 8601 dates. | "\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}" | - |
| parser_kwargs | dict | Keyword arguments to pass to the parser. | {} | - |
| period | str | Time period by which to batch the data. | "1min" | - |
| sampling_rate_s | float | Sampling rate, in seconds. | 60 | - |
| start_time | str | Start time of the time range to process. | "2023-02-01T00:00:00" | - |
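
To make the batching options concrete, the following sketch shows one plausible way `iso_date_regex_pattern`, `start_time`, `end_time`, and `period` could interact: a timestamp is extracted from each file name, files outside the time range are dropped, and the rest are grouped into period-sized buckets. It is a standard-library illustration, not the module's actual implementation. The function `batch_files`, the sample file names, treating `end_time` as exclusive, and representing the pandas-style `"1min"` period as `timedelta(minutes=1)` are all assumptions.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, List

# Same pattern as the iso_date_regex_pattern example value.
ISO_DATE_REGEX = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")


def batch_files(
    filenames: List[str],
    start_time: str,
    end_time: str,
    period: timedelta,
) -> Dict[datetime, List[str]]:
    """Group file names into period-sized buckets keyed by bucket start time.

    Hypothetical sketch: files whose name has no ISO timestamp, or whose
    timestamp falls outside [start_time, end_time), are skipped.
    """
    start = datetime.fromisoformat(start_time)
    end = datetime.fromisoformat(end_time)
    batches: Dict[datetime, List[str]] = defaultdict(list)
    for name in filenames:
        match = ISO_DATE_REGEX.search(name)
        if match is None:
            continue  # no timestamp in the file name
        ts = datetime.fromisoformat(match.group(0))
        if not start <= ts < end:
            continue  # outside the requested time range
        # Floor the timestamp to the start of its period, measured from start_time.
        offset = (ts - start) // period
        batches[start + offset * period].append(name)
    return dict(batches)


files = [
    "logs_2023-02-01T00:00:15.json",
    "logs_2023-02-01T00:00:45.json",
    "logs_2023-02-01T00:01:10.json",
    "logs_2023-03-05T00:00:00.json",  # after end_time, so it is dropped
]
batches = batch_files(files, "2023-02-01T00:00:00", "2023-03-01T00:00:00", timedelta(minutes=1))
for bucket, names in sorted(batches.items()):
    print(bucket.isoformat(), names)
```

Measuring buckets from `start_time` keeps bucket boundaries aligned to the requested range; with a midnight `start_time` and a one-minute period, that is the same as flooring to the minute.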

user_splitting_options

| Key | Type | Description | Example Value | Default Value |
| --- | --- | --- | --- | --- |
| fallback_username | str | Fallback user to use if no model is found for a user. | "generic" | - |
| include_generic | bool | Whether to include generic models in the results. | true | - |
| include_individual | bool | Whether to include individual (per-user) models in the results. | true | - |
| only_users | list[str] | List of users to include in the results. | [] | - |
| skip_users | list[str] | List of users to exclude from the results. | [] | - |
| userid_column_name | str | Name of the column containing the user ID. | "user_id" | - |
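
The user-splitting options can be read as rules for which per-user groups are produced. The sketch below is one hypothetical interpretation, not Morpheus code: rows are plain dictionaries; when `include_generic` is true, every row is collected under `fallback_username`; when `include_individual` is true, each user that passes the `only_users` and `skip_users` filters gets its own group. The function name `split_by_user`, the row format, and applying the filters to individual users only are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def split_by_user(
    rows: List[dict],
    userid_column_name: str = "user_id",
    fallback_username: str = "generic",
    include_generic: bool = True,
    include_individual: bool = True,
    only_users: Optional[List[str]] = None,
    skip_users: Optional[List[str]] = None,
) -> Dict[str, List[dict]]:
    """Hypothetical sketch of splitting rows into per-user training groups."""
    only = set(only_users or [])
    skip = set(skip_users or [])
    groups: Dict[str, List[dict]] = {}
    if include_generic:
        # Assumption: the generic group receives every row, unfiltered.
        groups[fallback_username] = list(rows)
    if include_individual:
        per_user: Dict[str, List[dict]] = defaultdict(list)
        for row in rows:
            user = row[userid_column_name]
            if only and user not in only:
                continue  # only_users is non-empty and this user is not listed
            if user in skip:
                continue  # explicitly excluded via skip_users
            per_user[user].append(row)
        groups.update(per_user)
    return groups


rows = [
    {"user_id": "alice", "value": 1},
    {"user_id": "bob", "value": 2},
    {"user_id": "alice", "value": 3},
]
groups = split_by_user(rows, skip_users=["bob"])
print(sorted(groups))  # ['alice', 'generic']
print(len(groups["generic"]), len(groups["alice"]))  # 3 2
```

In this sketch a real user whose ID equals `fallback_username` would overwrite the generic group, which is one reason to pick a fallback name that cannot collide with real user IDs.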

stream_aggregation_options

| Key | Type | Description | Example Value | Default Value |
| --- | --- | --- | --- | --- |
| cache_mode | str | Mode for managing the user cache. When set to "batch", the cache is flushed once the trigger conditions are met; otherwise, the user's history continues to aggregate. | "batch" | "batch" |
| min_history | int | Minimum history required to trigger a new training event. | 1 | 1 |
| max_history | int | Maximum history to include in a new training event. | 0 | 0 |
| timestamp_column_name | str | Name of the column containing timestamps. | "timestamp" | "timestamp" |
| aggregation_span | str | Look-back time span for training data in a new training event. | "60d" | "60d" |
| cache_to_disk | bool | Whether to cache streaming data to disk. | false | false |
| cache_dir | str | Directory to use for caching streaming data. | "./.cache" | "./.cache" |
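
The stream-aggregation options describe a per-user rolling cache. The sketch below is a simplified, hypothetical model of the `"batch"` cache mode, not the Morpheus implementation: events are appended to a user's cache, a training window is emitted once the cache holds at least `min_history` events (counted as events here, which is an assumption), only events within `aggregation_span` of the newest event are included, and in batch mode the cache is then cleared. The class name `UserHistoryCache` is hypothetical, the span is passed as a `timedelta` rather than the `"60d"` string, and `max_history`, `cache_to_disk`, and `cache_dir` are omitted.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, List, Optional


class UserHistoryCache:
    """Hypothetical in-memory per-user cache mimicking a "batch" cache mode."""

    def __init__(
        self,
        min_history: int = 1,
        aggregation_span: timedelta = timedelta(days=60),
        cache_mode: str = "batch",
    ) -> None:
        self.min_history = min_history
        self.aggregation_span = aggregation_span
        self.cache_mode = cache_mode
        self._history: Dict[str, List[datetime]] = defaultdict(list)

    def add(self, user: str, timestamps: List[datetime]) -> Optional[List[datetime]]:
        """Append events for a user; return a training window if one is triggered."""
        history = self._history[user]
        history.extend(timestamps)
        history.sort()
        if len(history) < self.min_history:
            return None  # not enough history yet
        # Keep only events within the look-back span of the newest event.
        cutoff = history[-1] - self.aggregation_span
        window = [ts for ts in history if ts >= cutoff]
        if self.cache_mode == "batch":
            history.clear()  # batch mode: flush the cache after triggering
        return window


cache = UserHistoryCache(min_history=3, aggregation_span=timedelta(days=60))
t0 = datetime(2023, 1, 1)
first = cache.add("alice", [t0, t0 + timedelta(days=1)])
print(first)  # None
window = cache.add("alice", [t0 + timedelta(days=90)])
print(len(window))  # 1 (the two January events fall outside the 60-day span)
```

In any other `cache_mode`, this sketch keeps the history after triggering, so later windows continue to aggregate the user's earlier events.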

dfencoder_options

| Key | Type | Description | Example Value | Default Value |
| --- | --- | --- | --- | --- |
| feature_columns | list[str] | List of feature columns to train on. | ["column1", "column2", "column3"] | - |
| epochs | int | Number of training epochs. | 50 | - |
| model_kwargs | dict | Keyword arguments to pass to the model. | {"encoder_layers": [64, 32], "decoder_layers": [32, 64], "activation": "relu", "swap_p": 0.1, "lr": 0.001, "lr_decay": 0.9, "batch_size": 32, "verbose": 1, "optimizer": "adam", "scaler": "min_max", "min_cats": 10, "progress_bar": false, "device": "cpu"} | - |
| validation_size | float | Fraction of the data to hold out as the validation set. | 0.1 | - |
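
Treating `validation_size` as the fraction of the training data held out for validation, the sketch below shows a simple shuffled hold-out split on a plain list of rows. The function `train_validation_split`, the fixed seed, and the rounding rule are assumptions for illustration; the split actually performed by the module or by the dfencoder model may differ.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_validation_split(
    rows: Sequence[T], validation_size: float = 0.1, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle rows and hold out round(len(rows) * validation_size) for validation."""
    if not 0.0 <= validation_size < 1.0:
        raise ValueError("validation_size must be in [0, 1)")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle for reproducibility
    n_val = round(len(shuffled) * validation_size)
    return shuffled[n_val:], shuffled[:n_val]


train, validation = train_validation_split(range(100), validation_size=0.1)
print(len(train), len(validation))  # 90 10
```

Using a seeded `random.Random` instance keeps the split reproducible across runs without touching the global random state.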

mlflow_writer_options

| Key | Type | Description | Example Value | Default Value |
| --- | --- | --- | --- | --- |
| conda_env | str | Conda environment for the model. | "path/to/conda_env.yml" | [Required] |
| databricks_permissions | dict | Permissions for the model. | - | None |
| experiment_name_formatter | str | Formatter for the experiment name. | "experiment_name_{timestamp}" | [Required] |
| model_name_formatter | str | Formatter for the model name. | "model_name_{timestamp}" | [Required] |
| timestamp_column_name | str | Name of the timestamp column. | "timestamp" | "timestamp" |
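
The MLflow writer options mark three keys as required and give defaults for the other two. The sketch below validates a user-supplied options dictionary against that table and then fills the name formatters using Python's `str.format`. The function `resolve_mlflow_writer_options`, raising `KeyError` for missing keys, and filling the `{timestamp}` placeholder via `str.format` with a timestamp string are hypothetical; the module may substitute different placeholder fields.

```python
from typing import Any, Dict

# Keys marked [Required] and the documented defaults for the remaining keys.
REQUIRED_KEYS = ("conda_env", "experiment_name_formatter", "model_name_formatter")
DEFAULTS: Dict[str, Any] = {"databricks_permissions": None, "timestamp_column_name": "timestamp"}


def resolve_mlflow_writer_options(options: Dict[str, Any]) -> Dict[str, Any]:
    """Check required keys and apply the documented defaults (hypothetical helper)."""
    missing = [key for key in REQUIRED_KEYS if key not in options]
    if missing:
        raise KeyError(f"missing required mlflow_writer_options: {missing}")
    # User-supplied values override the defaults.
    return {**DEFAULTS, **options}


options = resolve_mlflow_writer_options({
    "conda_env": "path/to/conda_env.yml",
    "experiment_name_formatter": "experiment_name_{timestamp}",
    "model_name_formatter": "model_name_{timestamp}",
})
# Assumption: the formatters are filled via str.format with a timestamp string.
stamp = "20230301T000000"
print(options["experiment_name_formatter"].format(timestamp=stamp))  # experiment_name_20230301T000000
print(options["timestamp_column_name"])  # timestamp
```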
© Copyright 2024, NVIDIA. Last updated on Dec 3, 2024.