NVIDIA Morpheus (24.06)

morpheus.stages.input.duo_source_stage.DuoSourceStage

class DuoSourceStage(c, input_glob, watch_directory=False, max_files=-1, file_type=<FileTypes.Auto: 0>, repeat=1, sort_glob=False, recursive=True, queue_max_size=128, batch_timeout=5.0)[source]

Bases: morpheus.stages.input.autoencoder_source_stage.AutoencoderSourceStage

Source stage used to load Duo Authentication messages.

Adds the following derived features:

Parameters

c : morpheus.config.Config
input_glob
watch_directory
max_files: int, default = -1
file_type : morpheus.common.FileTypes
repeat: int, default = 1
sort_glob
recursive: bool, default = True
queue_max_size: int, default = 128
batch_timeout: float, default = 5.0

Attributes

has_multi_input_ports
has_multi_output_ports
input_count
input_ports
is_built
is_pre_built
name
output_ports
unique_name

Methods

`batch_user_split`(x, userid_column_name, ...)	Creates a dataframe for each userid.
`build`(builder[, do_propagate])	Build this stage.
`can_build`([check_ports])	Determines if all inputs have been built allowing this node to be built.
`can_pre_build`([check_ports])	Determines if all inputs have been built allowing this node to be built.
`change_columns`(df)	Removes characters (_,.,{,},:) from the names of the dataframe columns.
`compute_schema`(schema)	Compute the schema for this stage based on the incoming schema from upstream stages.
`derive_features`(df, feature_columns)	Derives feature columns from the DUO (logs) source columns.
`files_to_dfs_per_user`(x, userid_column_name, ...)	After loading the input batch of DUO logs into a dataframe, this method builds a dataframe for each set of userid rows in accordance with the specified filter condition.
`get_all_input_stages`()	Get all input stages to this stage.
`get_all_inputs`()	Get all input senders to this stage.
`get_all_output_stages`()	Get all output stages from this stage.
`get_all_outputs`()	Get all output receivers from this stage.
`get_match_pattern`(glob_split)	Return a file match pattern.
`get_needed_columns`()	Stages that need columns inserted into the dataframe should populate the `self._needed_columns` dictionary with a mapping of column names to `morpheus.common.TypeId`.
`join`()	Awaitable method that stages can implement to perform cleanup steps when the pipeline is stopped.
`repeat_df`(df, repeat_count)	Repeats the same dataframe to extend small datasets for debugging, with incremental updates to the `event_dt` and `eventTime` columns.
`set_needed_columns`(needed_columns)	Sets the columns needed to perform preallocation.
`start_async`()	This function is called along with on_start during stage initialization.
`stop`()	Stages can implement this to perform cleanup steps when the pipeline is stopped.
`supports_cpp_node`()	Indicate that this stage does not support a C++ node.

_build(builder, input_nodes)[source]

This function is responsible for constructing this stage’s internal mrc.SegmentObject object. The input of this function contains the returned value from the upstream stage.

The input values are the mrc.Builder for this stage and a list of parent nodes.

Parameters

builder : mrc.Builder
input_nodes : list[mrc.SegmentObject]

Returns

list[mrc.SegmentObject]

_build_source(builder)[source]

Abstract method all derived Source classes should implement. Returns the same value as build.

Returns

mrc.SegmentObject

_build_sources(builder)[source]

Abstract method all derived Source classes should implement. Returns the same value as build.

Returns

mrc.SegmentObject

static batch_user_split(x, userid_column_name, userid_filter, datetime_column_name='event_dt')[source]

Creates a dataframe for each userid.

Parameters

x
userid_column_name
userid_filter
datetime_column_name

Returns

user_dfs
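
The per-user split described above can be illustrated with a minimal sketch. This is not the Morpheus implementation: it uses plain Python records instead of dataframes so that it runs without dependencies. The helper name `split_by_user`, the default `userid_filter=None`, and the chronological sort are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime


def split_by_user(records, userid_column_name, userid_filter=None,
                  datetime_column_name="event_dt"):
    """Group log records by user ID (hypothetical stand-in, not Morpheus code)."""
    per_user = defaultdict(list)
    # Assumption: sort once so every per-user list is in chronological order.
    for row in sorted(records, key=lambda r: r[datetime_column_name]):
        user = row[userid_column_name]
        # When a filter is given, keep only rows belonging to that user.
        if userid_filter is not None and user != userid_filter:
            continue
        per_user[user].append(row)
    return dict(per_user)


logs = [
    {"username": "alice", "event_dt": datetime(2024, 6, 1, 9, 30)},
    {"username": "bob", "event_dt": datetime(2024, 6, 1, 9, 0)},
    {"username": "alice", "event_dt": datetime(2024, 6, 1, 8, 15)},
]
user_dfs = split_by_user(logs, "username")
only_bob = split_by_user(logs, "username", userid_filter="bob")
```

Here `user_dfs` maps each user ID to that user's rows, and `only_bob` contains a single key. With real dataframes the same grouping would typically be done with a group-by on the user ID column.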

build(builder, do_propagate=True)[source]

Build this stage.

Parameters

builder : mrc.Builder
do_propagate

can_build(check_ports=False)[source]

Determines if all inputs have been built allowing this node to be built.

Parameters

check_ports

Returns

bool

can_pre_build(check_ports=False)[source]

Determines if all inputs have been built allowing this node to be built.

Parameters

check_ports

Returns

bool

static change_columns(df)[source]

Removes characters (_,.,{,},:) from the names of the dataframe columns.

Parameters

df : pd.DataFrame

Returns

df : pd.DataFrame
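
As a rough illustration of the renaming described above, the following sketch strips the listed characters from a list of column names. It is an assumption-laden stand-in rather than the library's code: `clean_column_names` is a hypothetical helper, it works on a plain list of names instead of a dataframe, and it treats the commas in the description as separators rather than as characters to remove.

```python
import re

# Characters listed in the description above: _ . { } :
# (assumption: the commas in that list are separators, not targets)
_UNWANTED = re.compile(r"[_.{}:]")


def clean_column_names(columns):
    """Return the column names with the unwanted characters removed."""
    return [_UNWANTED.sub("", name) for name in columns]


cleaned = clean_column_names(["access_device.ip", "auth_device:name", "{user}"])
```

With pandas, a list like `cleaned` could be assigned back to `df.columns` to rename the columns in place.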

compute_schema(schema)[source]

static derive_features(df, feature_columns)[source]

Derives feature columns from the DUO (logs) source columns.

Parameters

df
feature_columns

Returns

df

static files_to_dfs_per_user(x, userid_column_name, feature_columns, userid_filter=None, repeat_count=1)[source]

After loading the input batch of DUO logs into a dataframe, this method builds a dataframe for each set of userid rows in accordance with the specified filter condition.

Parameters

x
userid_column_name
feature_columns
userid_filter
repeat_count

Returns

df_per_user

get_all_input_stages()[source]

Get all input stages to this stage.

Returns

list[morpheus.pipeline.pipeline.StageBase]

get_all_inputs()[source]

Get all input senders to this stage.

Returns

list[morpheus.pipeline.pipeline.Sender]

get_all_output_stages()[source]

Get all output stages from this stage.

Returns

list[morpheus.pipeline.pipeline.StageBase]

get_all_outputs()[source]

Get all output receivers from this stage.

Returns

list[morpheus.pipeline.pipeline.Receiver]

get_match_pattern(glob_split)[source]

get_needed_columns()[source]

property has_multi_input_ports: bool

Indicates if this stage has multiple input ports.

Returns

bool

property has_multi_output_ports: bool

Indicates if this stage has multiple output ports.

Returns

bool

property input_count: int

property input_ports: list[morpheus.pipeline.receiver.Receiver]

Input ports to this stage.

Returns

list[morpheus.pipeline.pipeline.Receiver]

property is_built: bool

Indicates if this stage has been built.

Returns

bool

property is_pre_built: bool

Indicates if this stage has been built.

Returns

bool

async join()[source]

property name: str

property output_ports: list[morpheus.pipeline.sender.Sender]

Output ports from this stage.

Returns

list[morpheus.pipeline.pipeline.Sender]

static repeat_df(df, repeat_count)[source]

Repeats the same dataframe to extend small datasets for debugging, with incremental updates to the event_dt and eventTime columns.

Parameters

df
repeat_count

Returns

df_array
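
The idea of repeating a small dataset with shifted timestamps can be sketched as follows. This is only an illustration under stated assumptions, not the Morpheus implementation: `repeat_records` is a hypothetical helper operating on plain Python records, the one-day `step` per copy is an assumed increment, and `eventTime` is assumed to be an ISO 8601 string mirroring `event_dt`.

```python
from datetime import datetime, timedelta


def repeat_records(records, repeat_count, step=timedelta(days=1)):
    """Return repeat_count copies of records, shifting timestamps per copy."""
    copies = []
    for i in range(repeat_count):
        shifted = []
        for row in records:
            new_row = dict(row)  # copy so the input rows are left untouched
            new_dt = row["event_dt"] + i * step  # assumed increment: i days
            new_row["event_dt"] = new_dt
            # Assumption: eventTime is an ISO 8601 string mirroring event_dt.
            new_row["eventTime"] = new_dt.isoformat()
            shifted.append(new_row)
        copies.append(shifted)
    return copies


base = [{"event_dt": datetime(2024, 6, 1, 12, 0),
         "eventTime": "2024-06-01T12:00:00"}]
df_array = repeat_records(base, repeat_count=3)
```

Each element of `df_array` is one copy of the input, so a three-fold repeat of a one-row dataset yields three single-row copies whose timestamps differ by the assumed step.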

set_needed_columns(needed_columns)[source]

async start_async()[source]

stop()[source]

supports_cpp_node()[source]

property unique_name: str

Unique name of stage. Generated by appending stage id to stage name.

Returns

str
