NVIDIA Morpheus (24.10.01)

DFP Deployment Module

This module function sets up a modular Digital Fingerprinting (DFP) pipeline instance.

| Parameter | Type | Description | Example Value | Default Value |
|---|---|---|---|---|
| inference_options | dict | Options for the inference pipeline module | See below | [Required] |
| training_options | dict | Options for the training pipeline module | See below | [Required] |
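
As a rough illustration, the two required top-level keys can be laid out as a plain Python dictionary. The variable name `dfp_deployment_config` and the `missing_required` helper are hypothetical and only mirror the [Required] markers in the table; they are not part of the Morpheus API.

```python
# Hypothetical top-level configuration for the DFP deployment module.
# The nested dictionaries are left empty here; their keys are listed
# in the option tables that follow in this page.
dfp_deployment_config = {
    "inference_options": {},
    "training_options": {},
}

# Keys marked [Required] in the table above.
REQUIRED_TOP_LEVEL_KEYS = ("inference_options", "training_options")


def missing_required(config: dict) -> list:
    """Return the required top-level keys absent from config (illustrative helper)."""
    return [key for key in REQUIRED_TOP_LEVEL_KEYS if key not in config]


# An empty list means both required sections are present.
problems = missing_required(dfp_deployment_config)
```

A check like this is only a convenience before handing the dictionary to the module; how Morpheus itself reports a missing key is not covered here.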

Training Options Parameters

| Parameter | Type | Description | Example Value | Default Value |
|---|---|---|---|---|
| batching_options | dict | Options for batching the data | See below | - |
| cache_dir | str | Directory to cache the rolling window data | "/path/to/cache/dir" | "./.cache" |
| dfencoder_options | dict | Options for configuring the data frame encoder | See below | - |
| mlflow_writer_options | dict | Options for the MLflow model writer | See below | - |
| preprocessing_options | dict | Options for preprocessing the data | See below | - |
| stream_aggregation_options | dict | Options for aggregating the data by stream | See below | - |
| timestamp_column_name | str | Name of the timestamp column used in the data | "my_timestamp" | "timestamp" |
| user_splitting_options | dict | Options for splitting the data by user | See below | - |

Inference Options Parameters

| Parameter | Type | Description | Example Value | Default Value |
|---|---|---|---|---|
| batching_options | dict | Options for batching the data | See below | - |
| cache_dir | str | Directory to cache the rolling window data | "/path/to/cache/dir" | "./.cache" |
| detection_criteria | dict | Criteria for filtering detections | See below | - |
| fallback_username | str | User ID to use if the user ID is not found | "generic_user" | "generic_user" |
| inference_options | dict | Options for the inference module | See below | - |
| model_name_formatter | str | Format string for the model name | "model_{timestamp}" | [Required] |
| num_output_ports | int | Number of output ports for the module | 3 | - |
| timestamp_column_name | str | Name of the timestamp column in the input data | "timestamp" | "timestamp" |
| stream_aggregation_options | dict | Options for aggregating the data by stream | See below | - |
| user_splitting_options | dict | Options for splitting the data by user | See below | - |
| write_to_file_options | dict | Options for writing the detections to a file | See below | - |

batching_options

| Key | Type | Description | Example Value | Default Value |
|---|---|---|---|---|
| end_time | datetime/str | End of the time window | "2023-03-14T23:59:59" | |
| iso_date_regex_pattern | str | Regex pattern for ISO date matching | "\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}" | <iso_date_regex_pattern> |
| parser_kwargs | dict | Additional arguments for the parser | {} | {} |
| period | str | Time period for grouping files | "1d" | "D" |
| sampling_rate_s | int | Sampling rate in seconds | 0 | None |
| start_time | datetime/str | Start of the time window | "2023-03-01T00:00:00" | |
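
The interplay between the regex pattern and the time window can be sketched with the standard library. The snippet below uses the example values from the table; the `file_in_window` helper, the sample file names, and the inclusive window bounds are assumptions made for illustration and do not describe how Morpheus implements batching.

```python
import re
from datetime import datetime

# Example values copied from the table above.
batching_options = {
    "end_time": "2023-03-14T23:59:59",
    "iso_date_regex_pattern": "\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}",
    "parser_kwargs": {},
    "period": "1d",
    "sampling_rate_s": 0,
    "start_time": "2023-03-01T00:00:00",
}


def file_in_window(filename: str, options: dict) -> bool:
    """Return True if the ISO timestamp embedded in filename lies in the window.

    Hypothetical helper: bounds are treated as inclusive, which is an assumption.
    """
    match = re.search(options["iso_date_regex_pattern"], filename)
    if match is None:
        # No timestamp in the file name, so the file cannot be placed in time.
        return False
    file_time = datetime.fromisoformat(match.group(0))
    start = datetime.fromisoformat(options["start_time"])
    end = datetime.fromisoformat(options["end_time"])
    return start <= file_time <= end


# Hypothetical file names used only to exercise the helper.
inside = file_in_window("AUTH_LOG-2023-03-10T12:30:00.json", batching_options)
outside = file_in_window("AUTH_LOG-2023-04-01T00:00:00.json", batching_options)
```

Note that the doubled backslashes in the table are Python string escapes; the compiled pattern contains single `\d` tokens.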

dfencoder_options

| Parameter | Type | Description | Example Value | Default Value |
|---|---|---|---|---|
| feature_columns | list | List of feature columns to train on | ["column1", "column2", "column3"] | - |
| epochs | int | Number of epochs to train for | 50 | - |
| model_kwargs | dict | Keyword arguments to pass to the model | {"encoder_layers": [64, 32], "decoder_layers": [32, 64], "activation": "relu", "swap_p": 0.1, "lr": 0.001, "lr_decay": 0.9, "batch_size": 32, "verbose": 1, "optimizer": "adam", "scalar": "min_max", "min_cats": 10, "progress_bar": False, "device": "cpu"} | - |
| validation_size | float | Size of the validation set | 0.1 | - |
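
For readability, the example values above can be written out as a Python dictionary. This is only a transcription of the table's example column; the values are not recommended settings, and the final sanity check is an illustrative assumption that the validation size is meant as a fraction.

```python
# Example dfencoder_options, transcribed from the example values in the table.
dfencoder_options = {
    "feature_columns": ["column1", "column2", "column3"],
    "epochs": 50,
    "model_kwargs": {
        "encoder_layers": [64, 32],
        "decoder_layers": [32, 64],
        "activation": "relu",
        "swap_p": 0.1,
        "lr": 0.001,
        "lr_decay": 0.9,
        "batch_size": 32,
        "verbose": 1,
        "optimizer": "adam",
        "scalar": "min_max",
        "min_cats": 10,
        "progress_bar": False,
        "device": "cpu",
    },
    "validation_size": 0.1,
}

# Assumption: validation_size is a fraction of the data, so it should lie in (0, 1).
validation_size_is_fraction = 0.0 < dfencoder_options["validation_size"] < 1.0
```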

Monitor Options

| Key | Type | Description | Example Value | Default Value |
|---|---|---|---|---|
| description | str | Name to show for this Monitor Stage in the console window | "Progress" | "Progress" |
| silence_monitors | bool | Silence the monitors on the console | True | False |
| smoothing | float | Smoothing parameter that determines how much the throughput is averaged | 0.01 | 0.05 |
| unit | str | Units to show in the rate value | "messages" | "messages" |
| delayed_start | bool | When enabled, the progress bar is not shown until the first message is received. Otherwise, the progress bar is shown on pipeline startup and begins timing immediately. In large pipelines, enabling this option may give more accurate timing. | True | False |
| determine_count_fn_schema | str | Custom function for determining the count in a message | "Progress" | "Progress" |
| log_level | str | Enable this stage when the configured log level is at log_level or lower | "DEBUG" | "INFO" |

mlflow_writer_options

| Key | Type | Description | Example Value | Default Value |
|---|---|---|---|---|
| conda_env | str | Conda environment for the model | "path/to/conda_env.yml" | [Required] |
| databricks_permissions | dict | Permissions for the model | See below | None |
| experiment_name_formatter | str | Formatter for the experiment name | "experiment_name_{timestamp}" | [Required] |
| model_name_formatter | str | Formatter for the model name | "model_name_{timestamp}" | [Required] |
| timestamp_column_name | str | Name of the timestamp column | "timestamp" | "timestamp" |

stream_aggregation_options

| Parameter | Type | Description | Example Value | Default Value |
|---|---|---|---|---|
| cache_mode | str | Mode for managing the user cache. When set to batch, the cache is flushed once trigger conditions are met; otherwise, the user's history continues to aggregate. | "batch" | "batch" |
| min_history | int | Minimum history to trigger a new training event | 1 | 1 |
| max_history | int | Maximum history to include in a new training event | 0 | 0 |
| timestamp_column_name | str | Name of the column containing timestamps | "timestamp" | "timestamp" |
| aggregation_span | str | Look-back time span for training data in a new training event | "60d" | "60d" |
| cache_to_disk | bool | Whether to cache streaming data to disk | False | False |
| cache_dir | str | Directory to use for caching streaming data | "./.cache" | "./.cache" |
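
To make the look-back span concrete, the sketch below converts an `aggregation_span` such as "60d" into a start time for the training window. The `parse_day_span` and `lookback_start` helpers are hypothetical, handle only day-based spans, and are not taken from Morpheus; the library may accept other span formats.

```python
import re
from datetime import datetime, timedelta

# Default values from the table above.
stream_aggregation_options = {
    "cache_mode": "batch",
    "min_history": 1,
    "max_history": 0,
    "timestamp_column_name": "timestamp",
    "aggregation_span": "60d",
    "cache_to_disk": False,
    "cache_dir": "./.cache",
}


def parse_day_span(span: str) -> timedelta:
    """Parse a span such as "60d" into a timedelta (hypothetical helper, days only)."""
    match = re.fullmatch(r"(\d+)d", span)
    if match is None:
        raise ValueError(f"unsupported span: {span!r}")
    return timedelta(days=int(match.group(1)))


def lookback_start(latest: datetime, options: dict) -> datetime:
    """Return the earliest timestamp that falls inside the look-back window."""
    return latest - parse_day_span(options["aggregation_span"])


# Earliest timestamp considered when the newest record is dated 2023-03-14.
window_start = lookback_start(datetime(2023, 3, 14), stream_aggregation_options)
```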

user_splitting_options

| Key | Type | Description | Example Value | Default Value |
|---|---|---|---|---|
| fallback_username | str | The user ID to use if the user ID is not found | "generic_user" | "generic_user" |
| include_generic | bool | Whether to include a generic user ID in the output | False | False |
| include_individual | bool | Whether to include individual user IDs in the output | True | False |
| only_users | list | List of user IDs to include; others will be excluded | ["user1", "user2", "user3"] | [] |
| skip_users | list | List of user IDs to exclude from the output | ["user4", "user5"] | [] |
| timestamp_column_name | str | Name of the column containing timestamps | "timestamp" | "timestamp" |
| userid_column_name | str | Name of the column containing user IDs | "username" | "username" |
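
One way to read these options together is sketched below in plain Python. The `split_by_user` function is a simplified, hypothetical interpretation of the descriptions in the table (records as dictionaries, missing user IDs mapped to the fallback user); it is not the Morpheus implementation and the real module may resolve edge cases differently.

```python
from collections import defaultdict


def split_by_user(records: list, options: dict) -> dict:
    """Group records by user ID, loosely following the option descriptions above."""
    user_col = options.get("userid_column_name", "username")
    fallback = options.get("fallback_username", "generic_user")
    only_users = set(options.get("only_users", []))
    skip_users = set(options.get("skip_users", []))
    include_generic = options.get("include_generic", False)
    include_individual = options.get("include_individual", False)

    groups = defaultdict(list)
    for record in records:
        # Assumption: a record without a user ID is attributed to the fallback user.
        user = record.get(user_col, fallback)
        if user in skip_users:
            continue
        if only_users and user not in only_users:
            continue
        if include_generic:
            groups[fallback].append(record)
        # Avoid adding the same record twice when the user is the fallback user.
        if include_individual and not (include_generic and user == fallback):
            groups[user].append(record)
    return dict(groups)


# Hypothetical records used only to exercise the sketch.
records = [
    {"username": "user1", "timestamp": "2023-03-01T00:00:00"},
    {"username": "user4", "timestamp": "2023-03-01T00:05:00"},
    {"username": "user2", "timestamp": "2023-03-01T00:10:00"},
]
options = {"include_individual": True, "skip_users": ["user4"]}
groups = split_by_user(records, options)
```

With `include_generic` enabled instead, every retained record would also be collected under the fallback user, which corresponds to a single shared stream for all users.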

detection_criteria

| Key | Type | Description | Example Value | Default Value |
|---|---|---|---|---|
| threshold | float | Threshold for filtering detections | 0.5 | 0.5 |
| field_name | str | Name of the field to filter by threshold | "score" | "probs" |
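
The effect of these two keys can be illustrated with a small filter over dictionaries. The `filter_detections` function below is a hypothetical stand-in, and the strict "greater than" comparison is an assumption; check the Morpheus source for the exact boundary behavior.

```python
# Example values from the table above.
detection_criteria = {"threshold": 0.5, "field_name": "score"}


def filter_detections(rows: list, criteria: dict) -> list:
    """Keep rows whose score field exceeds the threshold (illustrative sketch)."""
    field = criteria.get("field_name", "probs")
    threshold = criteria.get("threshold", 0.5)
    # Assumption: a value exactly equal to the threshold is not a detection.
    return [row for row in rows if row[field] > threshold]


# Hypothetical scored rows used only to exercise the sketch.
rows = [
    {"username": "user1", "score": 0.91},
    {"username": "user2", "score": 0.12},
    {"username": "user3", "score": 0.5},
]
detections = filter_detections(rows, detection_criteria)
```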

inference_options

| Parameter | Type | Description | Example Value | Default Value |
|---|---|---|---|---|
| model_name_formatter | str | Formatter for model names | "user_{username}_model" | [Required] |
| fallback_username | str | Fallback user to use if no model is found for a user | "generic_user" | "generic_user" |
| timestamp_column_name | str | Name of the timestamp column | "timestamp" | "timestamp" |
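
The relationship between `model_name_formatter` and `fallback_username` can be sketched as a lookup with a fallback. The `resolve_model_name` helper and the set of available model names are hypothetical, and the `{username}` placeholder is taken from the example value in the table; the placeholders Morpheus actually supports may differ.

```python
# Example values from the table above.
inference_options = {
    "model_name_formatter": "user_{username}_model",
    "fallback_username": "generic_user",
    "timestamp_column_name": "timestamp",
}


def resolve_model_name(username: str, available_models: set, options: dict) -> str:
    """Return the per-user model name, or the fallback user's model name if absent."""
    formatter = options["model_name_formatter"]  # required, so no default is supplied
    fallback = options.get("fallback_username", "generic_user")
    candidate = formatter.format(username=username)
    if candidate in available_models:
        return candidate
    return formatter.format(username=fallback)


# Hypothetical registry contents used only to exercise the sketch.
available_models = {"user_alice_model", "user_generic_user_model"}
known = resolve_model_name("alice", available_models, inference_options)
unknown = resolve_model_name("bob", available_models, inference_options)
```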

write_to_file_options

| Key | Type | Description | Example Value | Default Value |
|---|---|---|---|---|
| filename | str | Path to the output file | "output.csv" | None |
| file_type | str | Type of file to write | "CSV" | "AUTO" |
| flush | bool | If true, flush the file after each write | False | False |
| include_index_col | bool | If true, include the index column | False | True |
| overwrite | bool | If true, overwrite the file if it exists | True | False |
© Copyright 2024, NVIDIA. Last updated on Dec 3, 2024.