morpheus.stages.input.cloud_trail_source_stage.CloudTrailSourceStage

class CloudTrailSourceStage(c, input_glob, watch_directory=False, max_files=-1, file_type=<FileTypes.Auto: 0>, repeat=1, sort_glob=False, recursive=True, queue_max_size=128, batch_timeout=5.0)[source]

Bases: morpheus.stages.input.autoencoder_source_stage.AutoencoderSourceStage

Load messages from a CloudTrail directory.
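A minimal construction sketch, assuming Morpheus is installed and that the module path matches the title of this page. The helper name `make_cloudtrail_source` and the glob path are placeholders, and `config` stands for whatever object is passed as the `c` argument in the signature above. The import is deferred so the snippet can be read without a Morpheus environment; depending on the Morpheus version, the configuration may need additional settings that this page does not describe.

```python
def make_cloudtrail_source(config, input_glob="/path/to/cloudtrail/*.json"):
    """Hypothetical helper: build the stage using only documented parameters."""
    # Deferred import: this line requires a working Morpheus installation.
    from morpheus.stages.input.cloud_trail_source_stage import CloudTrailSourceStage

    # Every keyword below appears in the class signature. All values are the
    # documented defaults except sort_glob, which is switched on here.
    return CloudTrailSourceStage(
        config,
        input_glob=input_glob,
        watch_directory=False,
        max_files=-1,
        repeat=1,
        sort_glob=True,
        recursive=True,
        queue_max_size=128,
        batch_timeout=5.0,
    )
```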

Attributes

has_multi_input_ports
has_multi_output_ports
input_count
input_ports
is_built
name
output_ports
unique_name

Methods

`batch_user_split`(x, userid_column_name, ...)	Creates a dataframe for each userid.
`build`(builder[, do_propagate])	Build this stage.
`can_build`([check_ports])	Determines if all inputs have been built allowing this node to be built.
`cleanup_df`(df, feature_columns)	Cleans up certain columns in the dataframe.
`derive_features`(df, feature_columns)	Override this function to derive any additional features that are available.
`files_to_dfs_per_user`(x, userid_column_name, ...)	After loading the input batch of CloudTrail logs into a dataframe, this method builds a dataframe for each set of userid rows in accordance with the specified filter condition.
`get_all_input_stages`()	Get all input stages to this stage.
`get_all_inputs`()	Get all input senders to this stage.
`get_all_output_stages`()	Get all output stages from this stage.
`get_all_outputs`()	Get all output receivers from this stage.
`get_match_pattern`(glob_split)	Return a file match pattern.
`get_needed_columns`()	Stages that need columns inserted into the dataframe should populate the `self._needed_columns` dictionary with a mapping of column names to `morpheus.common.TypeId`.
`join`()	Awaitable method that stages can implement to perform cleanup steps when the pipeline is stopped.
`read_file`(filename, file_type)	Reads a file into a dataframe.
`repeat_df`(df, repeat_count)	Repeats the same dataframe to extend small datasets for debugging, incrementally updating the `event_dt` and `eventTime` columns.
`set_needed_columns`(needed_columns)	Sets the columns needed to perform preallocation.
`stop`()	Stages can implement this to perform cleanup steps when the pipeline is stopped.
`supports_cpp_node`()	Specifies whether this Stage is capable of creating C++ nodes.

_build(builder, in_ports_streams)[source]

This function is responsible for constructing this stage’s internal mrc.SegmentObject object. The input of this function contains the returned value from the upstream stage.

The input values are the mrc.Builder for this stage and a StreamPair tuple, which contains the input mrc.SegmentObject object and the message data type.

Parameters

builder : mrc.Builder
in_ports_streams : morpheus.pipeline.pipeline.StreamPair

Returns

typing.List[morpheus.pipeline.pipeline.StreamPair]

_build_source(seg)[source]

Abstract method all derived Source classes should implement. Returns the same value as build.

Returns

morpheus.pipeline.pipeline.StreamPair

static batch_user_split(x, userid_column_name, userid_filter, datetime_column_name='event_dt')[source]

Creates a dataframe for each userid.

Parameters

x : typing.List[pd.DataFrame]
userid_column_name : str
userid_filter : str
datetime_column_name : str

Returns

user_dfs : typing.Dict[str, pd.DataFrame]
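The per-user split can be illustrated with plain pandas. This is a sketch under assumptions, not the Morpheus implementation: the function name `split_by_user`, the sort by the datetime column, and the sample column name `user` are all hypothetical; only the parameter names and the returned mapping come from the signature above.

```python
import typing

import pandas as pd


def split_by_user(
    frames: typing.List[pd.DataFrame],
    userid_column_name: str,
    userid_filter: typing.Optional[str] = None,
    datetime_column_name: str = "event_dt",
) -> typing.Dict[str, pd.DataFrame]:
    """Hypothetical stand-in: one dataframe per user id."""
    combined = pd.concat(frames, ignore_index=True)

    # Assumption: rows are ordered by time when the datetime column exists.
    if datetime_column_name in combined.columns:
        combined = combined.sort_values(datetime_column_name, kind="stable")

    user_dfs: typing.Dict[str, pd.DataFrame] = {}
    for user, user_df in combined.groupby(userid_column_name, sort=False):
        # Assumption: a filter, when given, keeps only the matching user.
        if userid_filter is not None and user != userid_filter:
            continue
        user_dfs[str(user)] = user_df.reset_index(drop=True)
    return user_dfs


# Made-up sample data with a hypothetical "user" column.
frames = [
    pd.DataFrame({
        "user": ["alice", "bob"],
        "event_dt": pd.to_datetime(["2023-01-02", "2023-01-01"]),
    }),
    pd.DataFrame({
        "user": ["alice"],
        "event_dt": pd.to_datetime(["2023-01-01"]),
    }),
]
user_dfs = split_by_user(frames, "user")
only_bob = split_by_user(frames, "user", userid_filter="bob")
```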

build(builder, do_propagate=True)[source]

Build this stage.

Parameters

builder : mrc.Builder
do_propagate : bool, optional

can_build(check_ports=False)[source]

Determines if all inputs have been built allowing this node to be built.

Parameters

check_ports : bool, optional

Returns

bool

static cleanup_df(df, feature_columns)[source]

Cleans up certain columns in the dataframe.

Parameters

df : pd.DataFrame
feature_columns : typing.List[str]

Returns

df : typing.List[pd.DataFrame]

static derive_features(df, feature_columns)[source]

Override this function to derive any additional features that are available.

Parameters

df : pd.DataFrame
feature_columns : typing.List[str]

Returns

df : typing.List[pd.DataFrame]

static files_to_dfs_per_user(x, userid_column_name, feature_columns, userid_filter=None, repeat_count=1)[source]

After loading the input batch of CloudTrail logs into a dataframe, this method builds a dataframe for each set of userid rows in accordance with the specified filter condition.

Parameters

x : typing.List[str]
userid_column_name : str
feature_columns : typing.List[str]
userid_filter : str
repeat_count : int

Returns

df_per_user : typing.Dict[str, pd.DataFrame]

get_all_input_stages()[source]

Get all input stages to this stage.

Returns

typing.List[morpheus.pipeline.pipeline.StreamWrapper]

get_all_inputs()[source]

Get all input senders to this stage.

Returns

typing.List[morpheus.pipeline.pipeline.Sender]

get_all_output_stages()[source]

Get all output stages from this stage.

Returns

typing.List[morpheus.pipeline.pipeline.StreamWrapper]

get_all_outputs()[source]

Get all output receivers from this stage.

Returns

typing.List[morpheus.pipeline.pipeline.Receiver]

get_match_pattern(glob_split)[source]

get_needed_columns()[source]

property has_multi_input_ports: bool

Indicates if this stage has multiple input ports.

Returns

bool

property has_multi_output_ports: bool

Indicates if this stage has multiple output ports.

Returns

bool

property input_count: int

property input_ports: List[morpheus.pipeline.receiver.Receiver]

Input ports to this stage.

Returns

typing.List[morpheus.pipeline.pipeline.Receiver]

property is_built: bool

Indicates if this stage has been built.

Returns

bool

async join()[source]

property name: str

The name of the stage. Used in logging. Each derived class should override this property with a unique name.

Returns

str

property output_ports: List[morpheus.pipeline.sender.Sender]

Output ports from this stage.

Returns

typing.List[morpheus.pipeline.pipeline.Sender]

static read_file(filename, file_type)[source]

Reads a file into a dataframe.

Parameters

filename : str
file_type : morpheus.common.FileTypes

Returns

pandas.DataFrame

Raises

RuntimeError
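As an illustration of reading a log file into a dataframe, the sketch below uses plain pandas. It is not the Morpheus implementation: the function name `read_log_file` is hypothetical, the file type is inferred from the extension instead of taking a `FileTypes` value, and raising `RuntimeError` for unsupported extensions is an assumption about when the documented exception occurs. The handling of the top-level `Records` array reflects the layout of CloudTrail log files.

```python
import json
import os
import tempfile

import pandas as pd


def read_log_file(filename: str) -> pd.DataFrame:
    """Hypothetical reader: JSON or CSV file to a pandas dataframe."""
    ext = os.path.splitext(filename)[1].lower()
    if ext == ".csv":
        return pd.read_csv(filename)
    if ext == ".json":
        with open(filename, "r", encoding="utf-8") as handle:
            data = json.load(handle)
        # CloudTrail log files wrap events in a top-level "Records" array.
        if isinstance(data, dict) and "Records" in data:
            data = data["Records"]
        # Nested objects become dotted column names.
        return pd.json_normalize(data)
    # Assumption: unsupported file types are reported as RuntimeError.
    raise RuntimeError(f"Unsupported file type: {filename}")


# Demo with a tiny, made-up CloudTrail-style file.
sample = {
    "Records": [
        {"eventName": "GetObject", "userIdentity": {"type": "IAMUser"}},
    ]
}
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(sample, handle)
    df = read_log_file(path)
```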

static repeat_df(df, repeat_count)[source]

Repeats the same dataframe to extend small datasets for debugging, incrementally updating the event_dt and eventTime columns.

Parameters

df : pd.DataFrame
repeat_count : int

Returns

df_array : typing.List[pd.DataFrame]
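The repetition can be sketched with pandas as follows. The function name `repeat_with_shifted_times` is hypothetical, and two details are assumptions rather than documented behavior: each copy is shifted by the time span of the previous one, and `eventTime` is rendered as an ISO-8601 string with a trailing `Z`.

```python
import typing

import pandas as pd


def repeat_with_shifted_times(
    df: pd.DataFrame, repeat_count: int
) -> typing.List[pd.DataFrame]:
    """Hypothetical stand-in: return repeat_count time-shifted copies of df."""
    copies = [df]
    current = df
    for _ in range(1, repeat_count):
        shifted = current.copy()
        # Assumption: shift by the span between the first and last event.
        span = shifted["event_dt"].iloc[-1] - shifted["event_dt"].iloc[0]
        shifted["event_dt"] = shifted["event_dt"] + span
        # Keep the string column consistent with the shifted timestamps.
        shifted["eventTime"] = shifted["event_dt"].dt.strftime("%Y-%m-%dT%H:%M:%SZ")
        copies.append(shifted)
        current = shifted
    return copies


# Made-up two-row frame covering one hour.
base = pd.DataFrame({
    "event_dt": pd.to_datetime(["2023-01-01T00:00:00", "2023-01-01T01:00:00"]),
})
base["eventTime"] = base["event_dt"].dt.strftime("%Y-%m-%dT%H:%M:%SZ")
copies = repeat_with_shifted_times(base, 3)
```

With this input, each copy starts where the previous one ended, so three copies cover three consecutive hours.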

set_needed_columns(needed_columns)[source]

stop()[source]

supports_cpp_node()[source]

property unique_name: str

Unique name of stage. Generated by appending stage id to stage name.

Returns

str
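A small sketch of how a unique name could be derived from a stage name and a stage id. The class `StageSketch`, the hyphen separator, the id counter, and the placeholder name are all assumptions for illustration; the page only states that the stage id is appended to the stage name.

```python
import itertools


class StageSketch:
    """Hypothetical stand-in for a stage; not a Morpheus class."""

    # Assumption: ids are handed out sequentially per process.
    _ids = itertools.count()

    def __init__(self) -> None:
        self._id = next(StageSketch._ids)

    @property
    def name(self) -> str:
        # Placeholder stage name used in logging.
        return "from-cloudtrail"

    @property
    def unique_name(self) -> str:
        # Append the stage id to the stage name.
        return f"{self.name}-{self._id}"


first = StageSketch()
second = StageSketch()
```

Two instances share the same `name` but receive different `unique_name` values, which keeps log lines distinguishable when a pipeline contains the same stage type more than once.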