morpheus.stages.input.azure_source_stage.AzureSourceStage

class AzureSourceStage(c, input_glob, watch_directory=False, max_files=-1, file_type=<FileTypes.Auto: 0>, repeat=1, sort_glob=False, recursive=True, queue_max_size=128, batch_timeout=5.0)

Bases: morpheus.stages.input.autoencoder_source_stage.AutoencoderSourceStage

Source stage used to load Azure Active Directory messages.

Adds the following derived features:

Parameters

c: morpheus.config.Config
input_glob: str
watch_directory: bool, default = False
max_files: int, default = -1
file_type: morpheus.common.FileTypes, default = ‘FileTypes.Auto’
repeat: int, default = 1
sort_glob: bool, default = False
recursive: bool, default = True
queue_max_size: int, default = 128
batch_timeout: float, default = 5.0
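
As an illustration of the signature above, the following minimal sketch collects the documented defaults into keyword arguments and passes them to the constructor. The helper names `azure_source_kwargs` and `make_azure_source` are hypothetical, and the config object is assumed to be populated already with whatever settings the stage requires; only the class path and parameter names come from this page.

```python
def azure_source_kwargs(input_glob):
    """Keyword arguments mirroring the defaults in the documented signature.

    file_type is left out on purpose so the stage keeps its own default.
    """
    return {
        "input_glob": input_glob,
        "watch_directory": False,
        "max_files": -1,
        "repeat": 1,
        "sort_glob": False,
        "recursive": True,
        "queue_max_size": 128,
        "batch_timeout": 5.0,
    }


def make_azure_source(config, input_glob):
    """Build the stage; `config` is assumed to be a ready morpheus.config.Config."""
    # Imported lazily so this snippet can be loaded without Morpheus installed.
    from morpheus.stages.input.azure_source_stage import AzureSourceStage

    # The config is passed positionally as the first parameter (`c`).
    return AzureSourceStage(config, **azure_source_kwargs(input_glob))
```

A call such as `make_azure_source(config, "/data/azure/*.json")` would then create the stage; the path is a placeholder, not one taken from the documentation.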

Attributes

has_multi_input_ports
has_multi_output_ports
input_count
input_ports
is_built
name
output_ports
unique_name

Methods

`batch_user_split`(x, userid_column_name, ...)	Creates a dataframe for each userid.
`build`(builder[, do_propagate])	Build this stage.
`can_build`([check_ports])	Determines if all inputs have been built allowing this node to be built.
`change_columns`(df)	Removes characters (_,.,{,},:) from the names of the dataframe columns.
`derive_features`(df, feature_columns)	Derives feature columns from the AzureAD (logs) source columns.
`files_to_dfs_per_user`(x, userid_column_name, ...)	After loading the input batch of AzureAD logs into a dataframe, this method builds a dataframe for each set of userid rows in accordance with the specified filter condition.
`get_all_input_stages`()	Get all input stages to this stage.
`get_all_inputs`()	Get all input senders to this stage.
`get_all_output_stages`()	Get all output stages from this stage.
`get_all_outputs`()	Get all output receivers from this stage.
`get_match_pattern`(glob_split)	Return a file match pattern.
`get_needed_columns`()	Stages that need columns inserted into the dataframe should populate the `self._needed_columns` dictionary with a mapping of column names to `morpheus.common.TypeId`.
`join`()	Awaitable method that stages can implement to perform cleanup steps when the pipeline is stopped.
`repeat_df`(df, repeat_count)	Iterates over the same dataframe to extend small datasets for debugging, incrementally updating the `event_dt` and `eventTime` columns.
`set_needed_columns`(needed_columns)	Sets the columns needed to perform preallocation.
`stop`()	Stages can implement this to perform cleanup steps when the pipeline is stopped.
`supports_cpp_node`()	Specifies whether this Stage is capable of creating C++ nodes.

_build(builder, in_ports_streams)

This function is responsible for constructing this stage’s internal mrc.SegmentObject object. Its input contains the value returned by the upstream stage.

The input values are the mrc.Builder for this stage and a StreamPair tuple, which contains the input mrc.SegmentObject object and the message data type.

Parameters

builder: mrc.Builder
in_ports_streams: morpheus.pipeline.pipeline.StreamPair

Returns

typing.List[morpheus.pipeline.pipeline.StreamPair]

_build_source(seg)

Abstract method that all derived source classes should implement. Returns the same value as build.

Returns

morpheus.pipeline.pipeline.StreamPair

static batch_user_split(x, userid_column_name, userid_filter, datetime_column_name='event_dt')

Creates a dataframe for each userid.

Parameters

x: typing.List[pd.DataFrame]
userid_column_name: str
userid_filter: str
datetime_column_name: str

Returns

user_dfs: typing.Dict[str, pd.DataFrame]
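
The per-user split can be pictured with the sketch below. It is not the Morpheus implementation: it swaps DataFrames for plain lists of dictionaries so that it runs with the standard library only, and it assumes that `userid_filter` means "keep only this user" and that each user's rows are ordered by the datetime column. The function name `split_rows_by_user` is hypothetical.

```python
from collections import defaultdict


def split_rows_by_user(batches, userid_column_name, userid_filter=None,
                       datetime_column_name="event_dt"):
    """Group rows from several batches by user ID.

    Assumptions (not confirmed by the documentation): a non-None filter keeps
    only the matching user, and rows are sorted by the datetime column.
    """
    per_user = defaultdict(list)
    for batch in batches:
        for row in batch:
            user = row[userid_column_name]
            if userid_filter is not None and user != userid_filter:
                continue  # skip rows belonging to other users
            per_user[user].append(row)

    # Return a plain dict mapping user ID -> time-ordered rows.
    return {
        user: sorted(rows, key=lambda r: r[datetime_column_name])
        for user, rows in per_user.items()
    }


# Two small batches of illustrative log rows (made-up values).
example_batches = [
    [{"user": "alice", "event_dt": "2022-01-02"},
     {"user": "bob", "event_dt": "2022-01-01"}],
    [{"user": "alice", "event_dt": "2022-01-01"}],
]
example_split = split_rows_by_user(example_batches, "user")
```

With pandas, the same grouping would typically be expressed with `pd.concat` followed by `DataFrame.groupby` on the user ID column.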

build(builder, do_propagate=True)

Build this stage.

Parameters

builder: mrc.Builder
do_propagate: bool, optional

can_build(check_ports=False)

Determines if all inputs have been built allowing this node to be built.

Parameters

check_ports: bool, optional

Returns

bool

static change_columns(df)

Removes characters (_,.,{,},:) from the names of the dataframe columns.

Parameters

df: pd.DataFrame

Returns

df: pd.DataFrame
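
The renaming described above can be sketched on a plain list of column names. This is an illustration rather than the library's code: the helper name `clean_column_names` is hypothetical, and it assumes that the commas in "(_,.,{,},:)" are separators, so only `_`, `.`, `{`, `}` and `:` are removed.

```python
import re

# Characters named in the description: _ . { } :
# (assumption: the commas in the description are separators, not targets)
_STRIP_CHARS = re.compile(r"[_.{}:]")


def clean_column_names(names):
    """Return the column names with the characters _ . { } : removed."""
    return [_STRIP_CHARS.sub("", name) for name in names]


cleaned = clean_column_names(["properties.userPrincipalName", "{key}:value_1"])
```

With a pandas DataFrame, the result could be assigned back with `df.columns = clean_column_names(df.columns)`.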

static derive_features(df, feature_columns)

Derives feature columns from the AzureAD (logs) source columns.

Parameters

df: pd.DataFrame
feature_columns: typing.List[str]

Returns

df: typing.List[pd.DataFrame]

static files_to_dfs_per_user(x, userid_column_name, feature_columns, userid_filter=None, repeat_count=1)

After loading the input batch of AzureAD logs into a dataframe, this method builds a dataframe for each set of userid rows in accordance with the specified filter condition.

Parameters

x: typing.List[str]
userid_column_name: str
feature_columns: typing.List[str]
userid_filter: str
repeat_count: int

Returns

df_per_user: typing.Dict[str, pd.DataFrame]

get_all_input_stages()

Get all input stages to this stage.

Returns

typing.List[morpheus.pipeline.pipeline.StreamWrapper]

get_all_inputs()

Get all input senders to this stage.

Returns

typing.List[morpheus.pipeline.pipeline.Sender]

get_all_output_stages()

Get all output stages from this stage.

Returns

typing.List[morpheus.pipeline.pipeline.StreamWrapper]

get_all_outputs()

Get all output receivers from this stage.

Returns

typing.List[morpheus.pipeline.pipeline.Receiver]

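get_match_pattern(glob_split)

Return a file match pattern.

get_needed_columns()

Stages that need columns inserted into the dataframe should populate the self._needed_columns dictionary with a mapping of column names to morpheus.common.TypeId.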

property has_multi_input_ports: bool

Indicates if this stage has multiple input ports.

Returns

bool

property has_multi_output_ports: bool

Indicates if this stage has multiple output ports.

Returns

bool

property input_count: int

property input_ports: List[morpheus.pipeline.receiver.Receiver]

Input ports to this stage.

Returns

typing.List[morpheus.pipeline.pipeline.Receiver]

property is_built: bool

Indicates if this stage has been built.

Returns

bool

async join()

Awaitable method that stages can implement to perform cleanup steps when the pipeline is stopped.

property name: str

The name of the stage. Used in logging. Each derived class should override this property with a unique name.

Returns

str

property output_ports: List[morpheus.pipeline.sender.Sender]

Output ports from this stage.

Returns

typing.List[morpheus.pipeline.pipeline.Sender]

static repeat_df(df, repeat_count)

Iterates over the same dataframe to extend small datasets for debugging, incrementally updating the event_dt and eventTime columns.

Parameters

df: pd.DataFrame
repeat_count: int

Returns

df_array: typing.List[pd.DataFrame]
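
The idea of repeating a dataset with shifted timestamps can be sketched as follows. This is not the Morpheus implementation: rows are plain dictionaries instead of a DataFrame, and the one-hour `step`, the `eventTime` string format, and the function name `repeat_rows` are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def repeat_rows(rows, repeat_count, step=timedelta(hours=1)):
    """Return `repeat_count` copies of `rows`, each shifted further in time.

    The first copy is the input itself; copy i has event_dt moved forward by
    i * step, and eventTime re-rendered from the new event_dt (assumed format).
    """
    copies = [rows]
    for i in range(1, repeat_count):
        shifted = []
        for row in rows:
            new_dt = row["event_dt"] + i * step
            shifted.append({
                **row,
                "event_dt": new_dt,
                "eventTime": new_dt.strftime("%Y-%m-%dT%H:%M:%SZ"),
            })
        copies.append(shifted)
    return copies


# One illustrative row (made-up values), repeated three times.
base_rows = [{
    "user": "alice",
    "event_dt": datetime(2022, 8, 1, 0, 0, 0),
    "eventTime": "2022-08-01T00:00:00Z",
}]
repeated = repeat_rows(base_rows, 3)
```

Shifting each copy keeps the repeated events from sharing timestamps, which is the effect the description above attributes to the incremental updates.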

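set_needed_columns(needed_columns)

Sets the columns needed to perform preallocation.

stop()

Stages can implement this to perform cleanup steps when the pipeline is stopped.

supports_cpp_node()

Specifies whether this Stage is capable of creating C++ nodes.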

property unique_name: str

Unique name of stage. Generated by appending stage id to stage name.

Returns

str