`dfp_inference_pipe`

This module function consolidates the DFP pipeline modules used for inference into a single module.

Configurable Parameters

Parameter	Type	Description	Example Value	Default Value
`batching_options`	dictionary	Options for batching files.	Refer below	`-`
`cache_dir`	string	Directory used for caching intermediate results.	`"/tmp/cache"`	`-`
`detection_criteria`	dictionary	Criteria for filtering detections.	-	`-`
`inference_options`	dictionary	Options for configuring the inference process.	Refer below	`-`
`preprocessing_options`	dictionary	Options for preprocessing data.	-	`-`
`stream_aggregation_options`	dictionary	Options for aggregating data by stream.	Refer below	`-`
`timestamp_column_name`	string	Name of the column containing timestamps.	`"timestamp"`	`-`
`user_splitting_options`	dictionary	Options for splitting data by user.	Refer below	`-`
`write_to_file_options`	dictionary	Options for writing results to a file.	-	`-`
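
Because the top-level keys above are plain dictionary keys, a misspelled key is easy to miss. The sketch below compares a config dictionary against the nine keys in the table. `check_top_level_keys` is a hypothetical helper written for illustration and is not part of the module's API.

```python
# Top-level keys documented in the table above.
DOCUMENTED_KEYS = frozenset({
    "batching_options",
    "cache_dir",
    "detection_criteria",
    "inference_options",
    "preprocessing_options",
    "stream_aggregation_options",
    "timestamp_column_name",
    "user_splitting_options",
    "write_to_file_options",
})


def check_top_level_keys(config: dict) -> tuple[list[str], list[str]]:
    """Hypothetical helper: return (missing, unknown) top-level keys, each sorted."""
    present = set(config)
    missing = sorted(DOCUMENTED_KEYS - present)
    unknown = sorted(present - DOCUMENTED_KEYS)
    return missing, unknown


# "batching_opts" is a deliberate typo for "batching_options".
config = {"timestamp_column_name": "timestamp", "cache_dir": "/tmp/cache", "batching_opts": {}}
missing, unknown = check_top_level_keys(config)
print("missing:", missing)
print("unknown:", unknown)
```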

`batching_options`

Parameter	Type	Description	Example Value	Default Value
`end_time`	string	End time of the time range to process.	`"2022-01-01T00:00:00Z"`	`-`
`iso_date_regex_pattern`	string	Regular expression for matching ISO 8601 date-time strings.	`"\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}Z"`	`-`
`parser_kwargs`	dictionary	Keyword arguments to pass to the parser.	-	`-`
`period`	string	Time period by which to batch the data.	`"1D"`	`-`
`sampling_rate_s`	float	Sampling rate, in seconds.	`1.0`	`-`
`start_time`	string	Start time of the time range to process.	`"2021-01-01T00:00:00Z"`	`-`
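
To illustrate how `start_time`, `end_time`, `iso_date_regex_pattern`, and a `period` of `"1D"` fit together, the sketch below pulls a timestamp out of each input name with the example pattern, keeps the names that fall inside the time range, and groups them by day. This is a minimal sketch, not the module's implementation: treating the inputs as file names, treating `end_time` as exclusive, and grouping by calendar day are all assumptions, and `batch_by_day` is a hypothetical name.

```python
import re
from collections import defaultdict
from datetime import datetime, timezone

# The example pattern from the table, as a raw string (no JSON escaping).
ISO_DATE_REGEX = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


def parse_iso(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp ending in 'Z' as a UTC datetime."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def batch_by_day(names, start_time: str, end_time: str) -> dict:
    """Hypothetical helper: group names into one-day batches within [start_time, end_time).

    Assumptions: each name embeds a timestamp, end_time is exclusive,
    and a "1D" period means one batch per calendar day (UTC).
    """
    start, end = parse_iso(start_time), parse_iso(end_time)
    batches = defaultdict(list)
    for name in names:
        match = ISO_DATE_REGEX.search(name)
        if match is None:
            continue  # no timestamp in the name, so skip it
        ts = parse_iso(match.group(0))
        if start <= ts < end:
            batches[ts.date().isoformat()].append(name)
    return dict(sorted(batches.items()))


files = [
    "logs_2021-06-01T08:00:00Z.json",
    "logs_2021-06-01T17:30:00Z.json",
    "logs_2021-06-02T09:15:00Z.json",
    "logs_2022-03-01T00:00:00Z.json",  # after end_time
    "notes.txt",                        # no timestamp
]
result = batch_by_day(files, "2021-01-01T00:00:00Z", "2022-01-01T00:00:00Z")
print(result)
```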

`user_splitting_options`

Parameter	Type	Description	Example Value	Default Value
`fallback_username`	string	Fallback user to use if no model is found for a user.	`"generic_user"`	`"generic_user"`
`include_generic`	boolean	Include generic models in the results.	`True`	`True`
`include_individual`	boolean	Include individual models in the results.	`True`	`False`
`only_users`	list	List of users to include in the results.	`["user_a","user_b"]`	`-`
`skip_users`	list	List of users to exclude from the results.	`["user_c"]`	`-`
`userid_column_name`	string	Column name for the user ID.	`"user_id"`	`"user_id"`
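
The user-splitting options can be read as a filter followed by a grouping step. The sketch below applies `only_users` and `skip_users`, then emits a generic group under `fallback_username` and, optionally, one group per user, using the defaults from the table. It is a hypothetical helper operating on a list of row dictionaries. The assumption that the generic group contains every kept row is made for illustration and may not match the module's exact behavior.

```python
from collections import defaultdict


def split_by_user(rows, *, userid_column_name="user_id",
                  fallback_username="generic_user",
                  include_generic=True, include_individual=False,
                  only_users=None, skip_users=None):
    """Hypothetical helper: split row dicts into per-user groups.

    Assumed semantics: the generic group holds every kept row under
    fallback_username; individual groups are keyed by user ID.
    """
    only = set(only_users or [])
    skip = set(skip_users or [])
    kept = [
        row for row in rows
        if (not only or row[userid_column_name] in only)
        and row[userid_column_name] not in skip
    ]

    splits = {}
    if include_generic:
        splits[fallback_username] = kept
    if include_individual:
        per_user = defaultdict(list)
        for row in kept:
            per_user[row[userid_column_name]].append(row)
        splits.update(per_user)
    return splits


rows = [
    {"user_id": "user_a", "event": 1},
    {"user_id": "user_b", "event": 2},
    {"user_id": "user_c", "event": 3},
]
splits = split_by_user(rows, include_individual=True,
                       only_users=["user_a", "user_b"], skip_users=["user_c"])
print(sorted(splits))
```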

`stream_aggregation_options`

Parameter	Type	Description	Example Value	Default Value
`cache_mode`	string	Mode for managing the user cache. When set to `batch`, the cache is flushed once the trigger conditions are met; otherwise, the user’s history continues to accumulate.	`"batch"`	`"batch"`
`min_history`	int	Minimum history required to trigger a new training event.	`1`	`1`
`max_history`	int	Maximum history to include in a new training event.	`0`	`0`
`timestamp_column_name`	string	Name of the column containing timestamps.	`"timestamp"`	`"timestamp"`
`aggregation_span`	string	Look-back time span for training data in a new training event.	`"60d"`	`"60d"`
`cache_to_disk`	boolean	Whether to cache streaming data to disk.	`False`	`False`
`cache_dir`	string	Directory to use for caching streaming data.	`"./.cache"`	`"./.cache"`
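
The sketch below models these options as a per-user rolling window. Rows accumulate per user. Once at least `min_history` rows are cached, it emits the rows that fall within `aggregation_span` of the newest timestamp. In `batch` mode it then flushes that user's cache. Several details are assumptions made for illustration, not documented behavior: history is counted in rows, a `max_history` of `0` means "no cap", and spans use a simple `<number><unit>` form. `RollingWindow` and `parse_span` are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Assumed span syntax: an integer followed by one unit letter, e.g. "60d" or "1D".
_UNITS = {"d": "days", "h": "hours", "m": "minutes", "s": "seconds"}


def parse_span(span: str) -> timedelta:
    value, unit = span[:-1], span[-1].lower()
    return timedelta(**{_UNITS[unit]: int(value)})


@dataclass
class RollingWindow:
    min_history: int = 1
    max_history: int = 0  # assumption: 0 means no cap on emitted rows
    aggregation_span: str = "60d"
    cache_mode: str = "batch"
    _cache: dict = field(default_factory=dict, init=False, repr=False)

    def append(self, user: str, rows: list[tuple[datetime, dict]]):
        """Add (timestamp, row) pairs for `user`; return a window once triggered, else None."""
        history = self._cache.setdefault(user, [])
        history.extend(rows)
        history.sort(key=lambda pair: pair[0])
        if len(history) < self.min_history:
            return None  # not enough history yet
        cutoff = history[-1][0] - parse_span(self.aggregation_span)
        window = [pair for pair in history if pair[0] >= cutoff]
        if self.max_history > 0:
            window = window[-self.max_history:]
        if self.cache_mode == "batch":
            self._cache[user] = []  # flush once the trigger condition is met
        return window


w = RollingWindow(min_history=3, aggregation_span="2d")
t0 = datetime(2022, 1, 1)
first = w.append("user_a", [(t0, {"n": 1}), (t0 + timedelta(days=1), {"n": 2})])
window = w.append("user_a", [(t0 + timedelta(days=3), {"n": 3})])
print([row["n"] for _, row in window])
```

Here the first call returns `None` because only two rows are cached. The second call triggers, and the two-day look-back drops the oldest row.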

`inference_options`

Parameter	Type	Description	Example Value	Default Value
`model_name_formatter`	string	Format string used to build model names.	`"user_{username}_model"`	`[Required]`
`fallback_username`	string	Fallback user to use if no model is found for a user.	`"generic_user"`	`"generic_user"`
`timestamp_column_name`	string	Name of the timestamp column.	`"timestamp"`	`"timestamp"`
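
`model_name_formatter` and `fallback_username` work together during inference. The formatter produces a model name for each user, and if that model does not exist, the fallback user's model is used instead. The sketch below shows that lookup. It assumes the placeholder is named `{username}`, as in the example value, and uses a plain set, `available_models`, in place of whatever model registry the pipeline actually queries. `resolve_model_name` is a hypothetical helper.

```python
def resolve_model_name(username, available_models, *, model_name_formatter,
                       fallback_username="generic_user"):
    """Hypothetical helper: pick the model for `username`, else the fallback user's model.

    model_name_formatter has no default because the table marks it as required.
    """
    name = model_name_formatter.format(username=username)
    if name in available_models:
        return name
    fallback = model_name_formatter.format(username=fallback_username)
    if fallback in available_models:
        return fallback
    raise LookupError(f"no model for {username!r} and no fallback model {fallback!r}")


registry = {"user_alice_model", "user_generic_user_model"}
fmt = "user_{username}_model"
print(resolve_model_name("alice", registry, model_name_formatter=fmt))
print(resolve_model_name("bob", registry, model_name_formatter=fmt))
```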

Example JSON Configuration

{
  "timestamp_column_name": "timestamp",
  "cache_dir": "/tmp/cache",
  "batching_options": {
    "end_time": "2022-01-01T00:00:00Z",
    "iso_date_regex_pattern": "\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}Z",
    "parser_kwargs": {},
    "period": "1D",
    "sampling_rate_s": 1.0,
    "start_time": "2021-01-01T00:00:00Z"
  },
  "user_splitting_options": {
    "fallback_username": "generic",
    "include_generic": true,
    "include_individual": true,
    "only_users": [
      "user_a",
      "user_b"
    ],
    "skip_users": [
      "user_c"
    ],
    "userid_column_name": "user_id"
  },
  "stream_aggregation_options": {
    "timestamp_column_name": "timestamp",
    "cache_mode": "batch",
    "trigger_on_min_history": true,
    "aggregation_span": "1D",
    "trigger_on_min_increment": true,
    "cache_to_disk": false
  },
  "preprocessing_options": {},
  "inference_options": {
    "model_name_formatter": "{model_name}",
    "fallback_username": "generic",
    "timestamp_column_name": "timestamp"
  },
  "detection_criteria": {},
  "write_to_file_options": {}
}
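
A configuration like the one above only needs to set the values that differ from the defaults. The sketch below loads a partial JSON configuration and fills in the defaults listed in the `user_splitting_options`, `stream_aggregation_options`, and `inference_options` tables. It raises an error if the required `model_name_formatter` is missing. `with_defaults` is a hypothetical helper. The module may apply its defaults differently, so treat this as an illustration of the documented values rather than of the module's internals.

```python
import json

# Defaults copied from the tables above. model_name_formatter has no
# default because the table marks it as required.
DEFAULTS = {
    "user_splitting_options": {
        "fallback_username": "generic_user",
        "include_generic": True,
        "include_individual": False,
        "userid_column_name": "user_id",
    },
    "stream_aggregation_options": {
        "cache_mode": "batch",
        "min_history": 1,
        "max_history": 0,
        "timestamp_column_name": "timestamp",
        "aggregation_span": "60d",
        "cache_to_disk": False,
        "cache_dir": "./.cache",
    },
    "inference_options": {
        "fallback_username": "generic_user",
        "timestamp_column_name": "timestamp",
    },
}


def with_defaults(config: dict) -> dict:
    """Hypothetical helper: return a copy of `config` with documented defaults filled in."""
    merged = dict(config)
    for section, defaults in DEFAULTS.items():
        # User-supplied values override the defaults; unlisted keys pass through.
        merged[section] = {**defaults, **config.get(section, {})}
    if "model_name_formatter" not in merged["inference_options"]:
        raise ValueError("inference_options.model_name_formatter is required")
    return merged


raw = """
{
  "timestamp_column_name": "timestamp",
  "cache_dir": "/tmp/cache",
  "user_splitting_options": {"include_individual": true},
  "stream_aggregation_options": {"aggregation_span": "1D"},
  "inference_options": {"model_name_formatter": "user_{username}_model"}
}
"""
config = with_defaults(json.loads(raw))
print(config["stream_aggregation_options"])
```

Keys that the tables do not list, such as `trigger_on_min_history` in the example above, are copied through unchanged by this helper.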