
morpheus.stages.input.autoencoder_source_stage.AutoencoderSourceStage

class AutoencoderSourceStage(c, input_glob, watch_directory=False, max_files=-1, file_type=<FileTypes.Auto: 0>, repeat=1, sort_glob=False, recursive=True, queue_max_size=128, batch_timeout=5.0)

Bases: morpheus.pipeline.preallocator_mixin.PreallocatorMixin, morpheus.pipeline.single_output_source.SingleOutputSource

All AutoEncoder source stages must extend this class and implement the files_to_dfs_per_user abstract method. Feature columns can be managed by overriding the derive_features method; otherwise, all columns from the input data pass through to the next stage.

Extend this class to load messages from files and emit their contents into a DFP pipeline immediately. This is useful for testing the performance and accuracy of a pipeline.
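The extension contract described above can be illustrated without Morpheus installed. The sketch below is a hypothetical stand-in, not Morpheus code: SketchAutoencoderSource only mimics the abstract name property, the pass-through derive_features default, and the abstract files_to_dfs_per_user method, while InMemorySource is an invented subclass that reads from an in-memory dict (FAKE_FILES) instead of from disk. A real stage would subclass AutoencoderSourceStage itself.

```python
import abc
import typing

import pandas as pd


class SketchAutoencoderSource(abc.ABC):
    """Hypothetical stand-in for the AutoencoderSourceStage extension contract (not Morpheus code)."""

    @property
    @abc.abstractmethod
    def name(self) -> str:
        """Unique stage name, used in logging."""

    @staticmethod
    def derive_features(df: pd.DataFrame, feature_columns: typing.List[str]) -> pd.DataFrame:
        # Default behaviour: nothing is derived, so every input column passes through.
        return df

    @staticmethod
    @abc.abstractmethod
    def files_to_dfs_per_user(x: typing.List[str],
                              userid_column_name: str,
                              feature_columns: typing.List[str],
                              userid_filter: typing.Optional[str] = None,
                              repeat_count: int = 1) -> typing.Dict[str, pd.DataFrame]:
        """Convert the messages in the files `x` into one DataFrame per user id."""


class InMemorySource(SketchAutoencoderSource):
    """Hypothetical subclass that 'reads' files from an in-memory dict instead of disk."""

    FAKE_FILES = {
        "a.csv": pd.DataFrame({"username": ["alice", "bob"], "logins": [3, 1]}),
        "b.csv": pd.DataFrame({"username": ["alice"], "logins": [5]}),
    }

    @property
    def name(self) -> str:
        return "in-memory-autoencoder-source"

    @staticmethod
    def files_to_dfs_per_user(x, userid_column_name, feature_columns, userid_filter=None, repeat_count=1):
        # repeat_count is accepted for signature compatibility but ignored in this toy.
        combined = pd.concat([InMemorySource.FAKE_FILES[path] for path in x], ignore_index=True)
        combined = InMemorySource.derive_features(combined, feature_columns)
        # One DataFrame per user, optionally restricted to a single user id.
        return {
            user: group.reset_index(drop=True)
            for user, group in combined.groupby(userid_column_name)
            if userid_filter is None or user == userid_filter
        }
```

Calling InMemorySource.files_to_dfs_per_user(["a.csv", "b.csv"], "username", ["logins"]) yields separate frames for alice and bob. Because files_to_dfs_per_user is abstract, the base class itself cannot be instantiated.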

Parameters

c : morpheus.config.Config
input_glob : str
watch_directory : bool, default = False
max_files : int, default = -1
file_type : morpheus.common.FileTypes, default = FileTypes.Auto
repeat : int, default = 1
sort_glob : bool, default = False
recursive : bool, default = True
queue_max_size : int, default = 128
batch_timeout : float, default = 5.0

Attributes

has_multi_input_ports
has_multi_output_ports
input_count
input_ports
is_built
name
output_ports
unique_name

Methods

`batch_user_split`(x, userid_column_name, ...)	Creates a dataframe for each userid.
`build`(builder[, do_propagate])	Build this stage.
`can_build`([check_ports])	Determines if all inputs have been built allowing this node to be built.
`derive_features`(df, feature_columns)	Override this method to derive additional features, if any are available.
`files_to_dfs_per_user`(x, userid_column_name, ...)	Stages that extend `AutoencoderSourceStage` must implement this abstract function in order to convert messages in the files to dataframes per userid.
`get_all_input_stages`()	Get all input stages to this stage.
`get_all_inputs`()	Get all input senders to this stage.
`get_all_output_stages`()	Get all output stages from this stage.
`get_all_outputs`()	Get all output receivers from this stage.
`get_match_pattern`(glob_split)	Return a file match pattern.
`get_needed_columns`()	Stages that need columns inserted into the dataframe should populate the `self._needed_columns` dictionary with a mapping of column names to `morpheus.common.TypeId`.
`join`()	Awaitable method that stages can implement to perform cleanup steps when the pipeline is stopped.
`repeat_df`(df, repeat_count)	Repeats the same dataframe to extend small datasets during debugging, incrementally updating the `event_dt` and `eventTime` columns.
`set_needed_columns`(needed_columns)	Sets the columns needed to perform preallocation.
`stop`()	Stages can implement this to perform cleanup steps when the pipeline is stopped.
`supports_cpp_node`()	Specifies whether this Stage is capable of creating C++ nodes.

_build(builder, in_ports_streams)

This function is responsible for constructing this stage's internal mrc.SegmentObject. Its input contains the values returned by the upstream stage.

The input values are the mrc.Builder for this stage and a StreamPair tuple, which contains the input mrc.SegmentObject and the message data type.

Parameters

builder : mrc.Builder
in_ports_streams : morpheus.pipeline.pipeline.StreamPair

Returns

typing.List[morpheus.pipeline.pipeline.StreamPair]

_build_source(seg)

Abstract method that all derived Source classes should implement. Returns the same value as build.

Returns

morpheus.pipeline.pipeline.StreamPair

static batch_user_split(x, userid_column_name, userid_filter, datetime_column_name='event_dt')

Creates a dataframe for each userid.

Parameters

x : typing.List[pd.DataFrame]
userid_column_name : str
userid_filter : str
datetime_column_name : str, default = 'event_dt'

Returns

user_dfs : typing.Dict[str, pd.DataFrame]
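A minimal sketch of what a per-user split like batch_user_split might do, assuming it concatenates the input frames, orders rows by datetime_column_name when that column is present, and honours userid_filter. batch_user_split_sketch is a hypothetical standalone function written so it runs without Morpheus; the actual Morpheus implementation may differ.

```python
import typing

import pandas as pd


def batch_user_split_sketch(x: typing.List[pd.DataFrame],
                            userid_column_name: str,
                            userid_filter: typing.Optional[str],
                            datetime_column_name: str = "event_dt") -> typing.Dict[str, pd.DataFrame]:
    """Hypothetical re-implementation: one time-ordered DataFrame per user id."""
    combined = pd.concat(x, ignore_index=True)

    # Order events chronologically when the datetime column exists; a stable sort
    # keeps the original row order for rows with equal timestamps.
    if datetime_column_name in combined.columns:
        combined = combined.sort_values(by=datetime_column_name, kind="stable")

    user_dfs = {}
    for user in combined[userid_column_name].unique():
        # When a filter is given, skip every user except the requested one.
        if userid_filter is not None and user != userid_filter:
            continue
        # .copy() detaches the per-user frame from `combined`.
        user_dfs[user] = combined[combined[userid_column_name] == user].copy()
    return user_dfs
```

Users appear in the order of their earliest event, and each user's rows stay in chronological order even when they arrive in different input frames.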

build(builder, do_propagate=True)

Build this stage.

Parameters

builder : mrc.Builder
do_propagate : bool, optional

can_build(check_ports=False)

Determines whether all inputs have been built, allowing this node to be built.

Parameters

check_ports : bool, optional

Returns

bool

static derive_features(df, feature_columns)

Override this method to derive additional features, if any are available. Without an override, all columns from the input data pass through to the next stage.

Parameters

df : pd.DataFrame
feature_columns : typing.List[str]

Returns

df : typing.List[pd.DataFrame]
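A hypothetical derive_features override, written as a standalone function so it runs without Morpheus. It assumes an event_dt datetime column and uses invented feature names (hour, weekday); it computes only the features listed in feature_columns and passes every other column through. Unlike the documented return type, this sketch returns a single DataFrame.

```python
import typing

import pandas as pd


def derive_features_sketch(df: pd.DataFrame, feature_columns: typing.List[str]) -> pd.DataFrame:
    """Hypothetical override: derive time-of-day features from an assumed `event_dt` column."""
    # Work on a copy so the caller's DataFrame is not mutated.
    df = df.copy()
    # "hour" and "weekday" are illustrative feature names, not Morpheus defaults;
    # each is computed only when it was requested.
    if "hour" in feature_columns:
        df["hour"] = df["event_dt"].dt.hour
    if "weekday" in feature_columns:
        # Monday == 0, Sunday == 6.
        df["weekday"] = df["event_dt"].dt.dayofweek
    # All original columns pass through unchanged.
    return df
```

Computing only the requested features keeps the frame small when a model uses a subset of them.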

abstract static files_to_dfs_per_user(x, userid_column_name, feature_columns, userid_filter=None, repeat_count=1)

Stages that extend AutoencoderSourceStage must implement this abstract function to convert the messages in the files into one dataframe per userid.

Parameters

x : typing.List[str]
userid_column_name : str
feature_columns : typing.List[str]
userid_filter : str, default = None
repeat_count : int, default = 1

Returns

typing.Dict[str, pd.DataFrame]
Dataframe per userid.
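One possible shape of a files_to_dfs_per_user implementation, assuming the matched files are CSV files containing a parseable event_dt column. files_to_dfs_per_user_sketch is hypothetical and runs without Morpheus: it keeps the user id, event_dt, and any requested feature columns present in the data, and it accepts repeat_count only for signature compatibility.

```python
import typing

import pandas as pd


def files_to_dfs_per_user_sketch(x: typing.List[str],
                                 userid_column_name: str,
                                 feature_columns: typing.List[str],
                                 userid_filter: typing.Optional[str] = None,
                                 repeat_count: int = 1) -> typing.Dict[str, pd.DataFrame]:
    """Hypothetical CSV implementation of the files_to_dfs_per_user contract."""
    # Assumption: every file is CSV with an "event_dt" column. repeat_count is ignored here.
    frames = [pd.read_csv(path, parse_dates=["event_dt"]) for path in x]
    combined = pd.concat(frames, ignore_index=True).sort_values("event_dt", kind="stable")

    base = [userid_column_name, "event_dt"]
    user_dfs = {}
    # sort=False keeps users in order of their earliest event.
    for user, user_df in combined.groupby(userid_column_name, sort=False):
        if userid_filter is not None and user != userid_filter:
            continue
        # Keep the id and timestamp plus requested feature columns that exist in the data.
        extra = [c for c in feature_columns if c in user_df.columns and c not in base]
        user_dfs[user] = user_df[base + extra].reset_index(drop=True)
    return user_dfs
```

Columns that are not requested (for example, noise fields in the raw logs) are dropped, so each per-user frame carries only what the model needs.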

get_all_input_stages()

Get all input stages to this stage.

Returns

typing.List[morpheus.pipeline.pipeline.StreamWrapper]

get_all_inputs()

Get all input senders to this stage.

Returns

typing.List[morpheus.pipeline.pipeline.Sender]

get_all_output_stages()

Get all output stages from this stage.

Returns

typing.List[morpheus.pipeline.pipeline.StreamWrapper]

get_all_outputs()

Get all output receivers from this stage.

Returns

typing.List[morpheus.pipeline.pipeline.Receiver]

get_match_pattern(glob_split)

Return a file match pattern.

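get_needed_columns()

Stages that need columns inserted into the dataframe should populate the self._needed_columns dictionary with a mapping of column names to morpheus.common.TypeId.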

property has_multi_input_ports: bool

Indicates if this stage has multiple input ports.

Returns

bool

property has_multi_output_ports: bool

Indicates if this stage has multiple output ports.

Returns

bool

property input_count: int

property input_ports: List[morpheus.pipeline.receiver.Receiver]

Input ports to this stage.

Returns

typing.List[morpheus.pipeline.pipeline.Receiver]

property is_built: bool

Indicates if this stage has been built.

Returns

bool

async join()

Awaitable method that stages can implement to perform cleanup steps when the pipeline is stopped.

abstract property name: str

The name of the stage. Used in logging. Each derived class should override this property with a unique name.

Returns

str

property output_ports: List[morpheus.pipeline.sender.Sender]

Output ports from this stage.

Returns

typing.List[morpheus.pipeline.pipeline.Sender]

static repeat_df(df, repeat_count)

Repeats the same dataframe to extend small datasets during debugging, incrementally updating the event_dt and eventTime columns.

Parameters

df : pd.DataFrame
repeat_count : int

Returns

df_array : typing.List[pd.DataFrame]
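A sketch of the repeat-with-incremental-timestamps idea under stated assumptions: event_dt is a sorted datetime column, each copy is shifted forward by the time span of the previous copy, and eventTime is an ISO-8601 string regenerated from the shifted event_dt. repeat_df_sketch, the shift rule, and the timestamp format are all assumptions, not confirmed Morpheus behaviour.

```python
import typing

import pandas as pd


def repeat_df_sketch(df: pd.DataFrame, repeat_count: int) -> typing.List[pd.DataFrame]:
    """Hypothetical sketch: repeat `df`, shifting timestamps forward on each copy."""
    df_array = [df]
    for _ in range(1, repeat_count):
        nxt = df.copy()
        # Assumed shift rule: move by the time span covered by the previous copy
        # (last minus first timestamp; assumes event_dt is sorted).
        span = nxt["event_dt"].iloc[-1] - nxt["event_dt"].iloc[0]
        nxt["event_dt"] = nxt["event_dt"] + span
        # Assumed format: regenerate the string eventTime column from the shifted event_dt.
        nxt["eventTime"] = nxt["event_dt"].dt.strftime("%Y-%m-%dT%H:%M:%SZ")
        df_array.append(nxt)
        # The next copy is shifted relative to this one.
        df = nxt
    return df_array
```

With this shift rule, the last event of one copy and the first event of the next share a timestamp, and a single-row frame is not shifted at all; an implementation that needs strictly increasing times would add an extra offset.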

set_needed_columns(needed_columns)

Sets the columns needed to perform preallocation.

stop()

Stages can implement this to perform cleanup steps when the pipeline is stopped.

abstract supports_cpp_node()

Specifies whether this Stage is capable of creating C++ nodes.

property unique_name: str

Unique name of stage. Generated by appending stage id to stage name.

Returns

str