morpheus.models.dfencoder

class AEModule(*args, **kwargs)[source]

Auto Encoder Pytorch Module.

Methods

__call__(*args, **kwargs)

Call self as a function.

build(numeric_fts, binary_fts, categorical_fts)

Constructs the autoencoder model.

decode(x[, layers])

Decodes the input using the decoder layers and computes the outputs.

encode(x[, layers])

Encodes the input using the encoder layers.

forward(input)

Passes the input through the model and returns the outputs.

build(numeric_fts, binary_fts, categorical_fts)[source]

Constructs the autoencoder model.

Parameters:
numeric_fts : List[str]

The names of the numeric features.

binary_fts : List[str]

The names of the binary features.

categorical_fts : Dict[str, Dict[str, List[str]]]

The dictionary mapping categorical feature names to dictionaries containing the categories of the feature.
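For illustration, a minimal sketch of the expected argument shapes. The feature names, category values, and the inner "cats" key are hypothetical, chosen to match the type hints above:

    numeric_fts = ["duration", "bytes_sent"]  # hypothetical numeric columns
    binary_fts = ["is_admin"]                 # hypothetical binary column
    categorical_fts = {                       # feature name -> dict of its categories
        "protocol": {"cats": ["tcp", "udp", "icmp"]},  # "cats" key is an assumption
    }
    # `module` is an AEModule instance built elsewhere.
    module.build(numeric_fts, binary_fts, categorical_fts)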

decode(x, layers=None)[source]

Decodes the input using the decoder layers and computes the outputs.

Parameters:
x : torch.Tensor

The encoded input tensor to decode.

layers : int, optional

The number of layers to use for decoding. Defaults to None, in which case all decoder layers are used.

Returns:
tuple of Union[torch.Tensor, List[torch.Tensor]]

A tuple containing the numeric (Tensor), binary (Tensor), and categorical outputs (List[torch.Tensor]) of the model.

encode(x, layers=None)[source]

Encodes the input using the encoder layers.

Parameters:
x : torch.Tensor

The input tensor to encode.

layers : int, optional

The number of layers to use for encoding. Defaults to None, in which case all encoder layers are used.

Returns:
torch.Tensor

The encoded output tensor.

forward(input)[source]

Passes the input through the model and returns the outputs.

Parameters:
input : torch.Tensor

The input tensor.

Returns:
tuple of Union[torch.Tensor, List[torch.Tensor]]

A tuple containing the numeric (Tensor), binary (Tensor), and categorical outputs (List[torch.Tensor]) of the model.
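A hedged sketch of the encode/decode round trip, assuming a built AEModule `module` and a preprocessed input batch `x` (a torch.Tensor):

    z = module.encode(x)                          # latent tensor from the full encoder stack
    num_out, bin_out, cat_out = module.decode(z)  # (Tensor, Tensor, List[Tensor])
    # forward() runs the same encode/decode pass end to end:
    num_out, bin_out, cat_out = module(x)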

class AutoEncoder(*args, **kwargs)[source]

Methods

__call__(*args, **kwargs)

Call self as a function.

compute_baseline_performance(in_, out_)

Computes a baseline loss from a strong identity-function prediction on a swapped (noisy) input.

compute_loss_from_targets(num, bin, cat, ...)

Computes the loss from targets.

decode_outputs_to_df(num, bin, cat)

Converts the model outputs of the numerical, binary, and categorical features back into a pandas dataframe.

df_predict(df)

Runs the end-to-end model.

encode_input(df)

Handles raw df inputs.

fit(train_data[, epochs, val_data, ...])

Runs training in the specified mode (indicated by self.distributed_training).

get_anomaly_score(df)

Returns a per-row loss of the input dataframe.

get_anomaly_score_losses(df)

Runs the input dataframe df through the autoencoder to get the recovery losses by feature type (numerical/boolean/categorical).

get_deep_stack_features(df)

Records and outputs all internal representations of the input df as row-wise vectors.

get_representation(df[, layer])

Computes latent feature vector from hidden layer given input dataframe.

get_results_from_dataset(dataset, preloaded_df)

Returns a pandas dataframe of inference results and losses for a given dataset.

prepare_df(df)

Performs data preparation on a copy of the input dataframe.

preprocess_data(df, shuffle_rows_in_batch, ...)

Preprocesses a pandas dataframe df for input into the autoencoder model.

preprocess_train_data(df[, ...])

Wrapper around self.preprocess_data that feeds in the arguments suitable for a training set.

preprocess_validation_data(df[, ...])

Wrapper around self.preprocess_data that feeds in the arguments suitable for a validation set.

train_epoch(n_updates, input_df, df[, pbar])

Runs a regular epoch.

train_megabatch_epoch(n_updates, df)

Runs an epoch of 'megabatch' updates, preprocessing data in large chunks.

build_input_tensor

compute_loss

compute_targets

create_binary_col_max

create_categorical_col_max

create_numerical_col_max

do_backward

get_anomaly_score_with_losses

get_results

get_scaler

get_variable_importance

return_feature_names

scale_losses

build_input_tensor(df)[source]

compute_baseline_performance(in_, out_)[source]

Baseline performance is computed by generating a strong prediction for the identity function (predicting input == output) with a swapped (noisy) input, and computing the loss against the unaltered original data. This should be roughly the loss we expect when the encoder degenerates into the identity-function solution.

Returns the net loss of the baseline performance computation (the sum of all losses).

compute_loss(num, bin, cat, target_df, should_log=True, _id=False)[source]

compute_loss_from_targets(num, bin, cat, num_target, bin_target, cat_target, should_log=True, _id=False)[source]

Computes the loss from targets.

Parameters:
num : torch.Tensor

Numerical data tensor.

bin : torch.Tensor

Binary data tensor.

cat : List[torch.Tensor]

List of categorical data tensors.

num_target : torch.Tensor

Target numerical data tensor.

bin_target : torch.Tensor

Target binary data tensor.

cat_target : List[torch.Tensor]

List of target categorical data tensors.

should_log : bool, optional

Whether to log the loss in self.logger, by default True.

_id : bool, optional

Whether the current step is an ID validation step (for logging), by default False.

Returns:
Tuple[Union[float, List[float]]]

A tuple containing the mean MSE/BCE losses, the list of mean CCE losses, and the mean net loss.

compute_targets(df)[source]

create_binary_col_max(bin_names, bce_loss)[source]

create_categorical_col_max(cat_names, cce_loss)[source]

create_numerical_col_max(num_names, mse_loss)[source]

decode_outputs_to_df(num, bin, cat)[source]

Converts the model outputs of the numerical, binary, and categorical features back into a pandas dataframe.

df_predict(df)[source]

Runs the end-to-end model. Interprets the output and creates a dataframe of model predictions with the same shape as the input.

do_backward(mse, bce, cce)[source]

encode_input(df)[source]

Handles raw dataframe inputs and passes categories through embedding layers.

fit(train_data, epochs=1, val_data=None, run_validation=False, use_val_for_loss_stats=False, rank=None, world_size=None)[source]

Runs training in the specified mode (indicated by self.distributed_training).

Parameters:
train_data : pandas.DataFrame (centralized) or torch.utils.data.DataLoader (distributed)

Data for training.

epochs : int, optional

Number of epochs to run training, by default 1.

val_data : pandas.DataFrame (centralized) or torch.utils.data.DataLoader (distributed), optional

Data for validation and computing loss stats, by default None.

run_validation : bool, optional

Whether to collect validation loss for each epoch during training, by default False.

use_val_for_loss_stats : bool, optional

Whether to use the validation set for loss statistics collection (for z-score calculation), by default False.

rank : int, optional

The rank of the current process, by default None. Required for distributed training.

world_size : int, optional

The total number of processes, by default None. Required for distributed training.

Raises:
TypeError

If train_data is not a pandas dataframe in centralized training mode.

ValueError

If rank and world_size are not provided in distributed training mode.

TypeError

If train_data is not a pandas dataframe, a torch.utils.data.DataLoader, or a torch.utils.data.Dataset in distributed training mode.
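A hedged end-to-end sketch of centralized training, using only methods documented on this page; the CSV paths are hypothetical, and the AutoEncoder is assumed to accept default construction (real constructor kwargs vary):

    import pandas as pd
    from morpheus.models.dfencoder import AutoEncoder

    train_df = pd.read_csv("train.csv")  # hypothetical training data
    val_df = pd.read_csv("val.csv")      # hypothetical validation data

    model = AutoEncoder()  # assumption: defaults are sufficient
    model.fit(train_df, epochs=5, val_data=val_df,
              run_validation=True, use_val_for_loss_stats=True)

    scores = model.get_anomaly_score(val_df)  # per-row loss
    preds = model.df_predict(val_df)          # predictions, same shape as val_df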

get_anomaly_score(df)[source]

Returns a per-row loss of the input dataframe. Does not corrupt inputs.

get_anomaly_score_losses(df)[source]

Runs the input dataframe df through the autoencoder to get the recovery losses by feature type (numerical/boolean/categorical).

get_anomaly_score_with_losses(df)[source]

get_deep_stack_features(df)[source]

Records and outputs all internal representations of the input df as row-wise vectors. The output is a 2-D array whose length equals len(df).

get_representation(df, layer=0)[source]

Computes latent feature vector from hidden layer given input dataframe.

The layer argument (int) specifies which layer to get. By default (layer=0), the “encoding” layer is returned. layer < 0 counts layers back from the encoding layer; layer > 0 counts layers forward from the encoding layer.
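A short sketch of the layer argument, assuming a trained AutoEncoder `model` and an input dataframe `df`:

    z = model.get_representation(df)                 # layer=0: the encoding layer
    z_prev = model.get_representation(df, layer=-1)  # one layer before the encoding layer
    z_next = model.get_representation(df, layer=1)   # one layer after the encoding layer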

get_results(df, return_abs=False)[source]

get_results_from_dataset(dataset, preloaded_df, return_abs=False)[source]

Returns a pandas dataframe of inference results and losses for a given dataset. Note: this function requires the whole inference set to be loaded into memory as a pandas dataframe.

Parameters:
dataset : torch.utils.data.Dataset

Dataset for inference.

preloaded_df : pd.DataFrame

A pandas dataframe that contains the original data.

return_abs : bool, optional

Whether the absolute value of the loss scalers should be returned, by default False.

Returns:
pd.DataFrame

Inference results with the losses of each feature.

get_scaler(name)[source]

get_variable_importance(num_names, cat_names, bin_names, mse_loss, bce_loss, cce_loss, cloudtrail_df)[source]

prepare_df(df)[source]

Performs data preparation on a copy of the input dataframe.

Parameters:
df : pandas.DataFrame

The pandas dataframe to process.

Returns:
pandas.DataFrame

A processed copy of df.

preprocess_data(df, shuffle_rows_in_batch, include_original_input_tensor, include_swapped_input_by_feature_type)[source]

Preprocesses a pandas dataframe df for input into the autoencoder model.

Parameters:
df : pandas.DataFrame

The input dataframe to preprocess.

shuffle_rows_in_batch : bool

Whether to shuffle the rows of the dataframe before processing.

include_original_input_tensor : bool

Whether to process the df into an input tensor without swapping and include it in the returned data dict. Note: training requires only the swapped input tensor, while validation can use both.

include_swapped_input_by_feature_type : bool

Whether to process the swapped df into num/bin/cat feature tensors and include them in the returned data dict. This is useful for baseline performance evaluation during validation.

Returns:
Dict[str, Union[int, torch.Tensor]]

A dict containing the preprocessed input data and targets by feature type.
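A hedged sketch of a validation-style call that keeps both the original and the swapped inputs (mirroring what preprocess_validation_data is described as doing below):

    data = model.preprocess_data(
        df,
        shuffle_rows_in_batch=False,
        include_original_input_tensor=True,
        include_swapped_input_by_feature_type=True,
    )
    # `data` is a dict of preprocessed inputs and targets keyed by feature type.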

preprocess_train_data(df, shuffle_rows_in_batch=True)[source]

Wrapper around self.preprocess_data that feeds in the arguments suitable for a training set.

preprocess_validation_data(df, shuffle_rows_in_batch=False)[source]

Wrapper around self.preprocess_data that feeds in the arguments suitable for a validation set.

return_feature_names()[source]

scale_losses(mse, bce, cce)[source]

train_epoch(n_updates, input_df, df, pbar=None)[source]

Runs a regular epoch.

train_megabatch_epoch(n_updates, df)[source]

Runs an epoch of ‘megabatch’ updates, preprocessing data in large chunks.

class BasicLogger(fts, baseline_loss=0.0)[source]

A minimal class for logging training progress.

Methods

end_epoch

id_val_step

training_step

val_step

end_epoch()[source]

id_val_step(losses)[source]

training_step(losses)[source]

val_step(losses)[source]

class CompleteLayer(*args, **kwargs)[source]

Implements a layer with a linear transformation and optional activation and dropout.

Methods

__call__(*args, **kwargs)

Call self as a function.

forward(x)

Performs a forward pass through the CompleteLayer object.

interpret_activation([act])

Interprets the name of the activation function and returns the appropriate PyTorch function.

forward(x)[source]

Performs a forward pass through the CompleteLayer object.

Parameters:
x : torch.Tensor

The input tensor to the CompleteLayer object.

Returns:
torch.Tensor

The output tensor of the CompleteLayer object after processing the input through all layers.

interpret_activation(act=None)[source]

Interprets the name of the activation function and returns the appropriate PyTorch function.

Parameters:
act : str, optional

The name of the activation function to interpret. Defaults to None if no activation function is desired.

Returns:
PyTorch function

The PyTorch activation function that corresponds to the given name.

Raises:
Exception

If the activation function name is not recognized.
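A minimal sketch, assuming a constructed CompleteLayer `layer` and that "relu" is among the recognized activation names (an assumption; the supported set is not documented here):

    import torch

    # "relu" is assumed to be a recognized name.
    relu = layer.interpret_activation("relu")
    print(relu(torch.tensor([-1.0, 2.0])))  # tensor([0., 2.])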

class DFEncoderDataLoader(*args, **kwargs)[source]

Methods

__call__(*args, **kwargs)

Call self as a function.

get_distributed_training_dataloader_from_dataset(...)

Returns a distributed training DataLoader given a dataset and other arguments.

get_distributed_training_dataloader_from_df(...)

A helper function to get a distributed training DataLoader given a pandas dataframe.

get_distributed_training_dataloader_from_path(...)

A helper function to get a distributed training DataLoader given a path to a folder containing data.

static get_distributed_training_dataloader_from_dataset(dataset, rank, world_size, pin_memory=False, num_workers=0)[source]

Returns a distributed training DataLoader given a dataset and other arguments.

Parameters:
dataset : Dataset

The dataset to load the data from.

rank : int

The rank of the current process.

world_size : int

The number of processes to distribute the data across.

pin_memory : bool, optional

Whether to pin memory when loading data, by default False.

num_workers : int, optional

The number of worker processes to use for loading data, by default 0.

Returns:
DataLoader

The training DataLoader with DistributedSampler for distributed training.

static get_distributed_training_dataloader_from_df(model, df, rank, world_size, pin_memory=False, num_workers=0)[source]

A helper function to get a distributed training DataLoader given a pandas dataframe.

Parameters:
model : AutoEncoder

The autoencoder model used to get relevant params and the preprocessing func.

df : pandas.DataFrame

The pandas dataframe containing the data.

rank : int

The rank of the current process.

world_size : int

The number of processes to distribute the data across.

pin_memory : bool, optional

Whether to pin memory when loading data, by default False.

num_workers : int, optional

The number of worker processes to use for loading data, by default 0.

Returns:
DFEncoderDataLoader

The training DataLoader with DistributedSampler for distributed training.

static get_distributed_training_dataloader_from_path(model, data_folder, rank, world_size, load_data_fn=pandas.read_csv, pin_memory=False, num_workers=0)[source]

A helper function to get a distributed training DataLoader given a path to a folder containing data.

Parameters:
model : AutoEncoder

The autoencoder model used to get relevant params and the preprocessing func.

data_folder : str

The path to the folder containing the data.

rank : int

The rank of the current process.

world_size : int

The number of processes to distribute the data across.

load_data_fn : function, optional

A function for loading data from a provided file path into a pandas.DataFrame, by default pd.read_csv.

pin_memory : bool, optional

Whether to pin memory when loading data, by default False.

num_workers : int, optional

The number of worker processes to use for loading data, by default 0.

Returns:
DFEncoderDataLoader

The training DataLoader with DistributedSampler for distributed training.
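A hedged sketch of one process in a distributed run. It assumes torch.distributed has already been initialized elsewhere, and that `model` and `train_df` are defined as in the earlier training sketch:

    rank, world_size = 0, 2  # hypothetical values for a two-process job

    loader = DFEncoderDataLoader.get_distributed_training_dataloader_from_df(
        model, train_df, rank, world_size, pin_memory=True, num_workers=2)

    # In distributed mode, fit() accepts a DataLoader and requires rank/world_size.
    model.fit(loader, epochs=5, rank=rank, world_size=world_size)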

class DatasetFromDataframe(*args, **kwargs)[source]
Attributes:
num_samples

Returns the number of samples in the dataset.

Methods

__call__(*args, **kwargs)

Call self as a function.

convert_to_validation(model)

Converts the dataset to validation mode by resetting instance variables.

get_train_dataset(model, df)

A helper function to get a train dataset with the provided parameters.

get_validation_dataset(model, df)

A helper function to get a validation dataset with the provided parameters.

convert_to_validation(model)[source]

Converts the dataset to validation mode by resetting instance variables.

Parameters:
model : AutoEncoder

The autoencoder model used to get relevant params and the preprocessing func.

static get_train_dataset(model, df)[source]

A helper function to get a train dataset with the provided parameters.

Parameters:
model : AutoEncoder

The autoencoder model used to get relevant params and the preprocessing func.

df : pandas.DataFrame

Input dataframe used for the dataset.

Returns:
DatasetFromDataframe

Training Dataset set up to load from the dataframe.

static get_validation_dataset(model, df)[source]

A helper function to get a validation dataset with the provided parameters.

Parameters:
model : AutoEncoder

The autoencoder model used to get relevant params and the preprocessing func.

df : pandas.DataFrame

Input dataframe used for the dataset.

Returns:
DatasetFromDataframe

Validation Dataset set up to load from the dataframe.

property num_samples

Returns the number of samples in the dataset.

class DatasetFromPath(*args, **kwargs)[source]

A dataset class that reads data in batches from a folder and applies preprocessing to each batch. This class assumes the data is saved as small CSV files in a single folder.

Attributes:
num_samples

Returns the number of samples in the dataset.

Methods

__call__(*args, **kwargs)

Call self as a function.

convert_to_validation(model)

Converts the dataset to validation mode by resetting instance variables.

get_preloaded_data()

Loads all data from the files into memory and returns it as a pandas.DataFrame.

get_train_dataset(model, data_folder[, ...])

A helper function to get a train dataset with the provided parameters.

get_validation_dataset(model, data_folder[, ...])

A helper function to get a validation dataset with the provided parameters.

convert_to_validation(model)[source]

Converts the dataset to validation mode by resetting instance variables.

Parameters:
model : AutoEncoder

The autoencoder model used to get relevant params and the preprocessing func.

get_preloaded_data()[source]

Loads all data from the files into memory and returns it as a pandas.DataFrame.

static get_train_dataset(model, data_folder, load_data_fn=pandas.read_csv, preload_data_into_memory=False)[source]

A helper function to get a train dataset with the provided parameters.

Parameters:
model : AutoEncoder

The autoencoder model used to get relevant params and the preprocessing func.

data_folder : str

The path to the folder containing the data.

load_data_fn : function, optional

A function for loading data from a provided file path into a pandas.DataFrame, by default pd.read_csv.

preload_data_into_memory : bool, optional

Whether to preload all the data into memory, by default False.

Returns:
DatasetFromPath

Training Dataset set up to load from the path.

static get_validation_dataset(model, data_folder, load_data_fn=pandas.read_csv, preload_data_into_memory=True)[source]

A helper function to get a validation dataset with the provided parameters.

Parameters:
model : AutoEncoder

The autoencoder model used to get relevant params and the preprocessing func.

data_folder : str

The path to the folder containing the data.

load_data_fn : function, optional

A function for loading data from a provided file path into a pandas.DataFrame, by default pd.read_csv.

preload_data_into_memory : bool, optional

Whether to preload all the data into memory, by default True (preloading can speed up data loading if the data fits into memory).

Returns:
DatasetFromPath

Validation Dataset set up to load from the path.

property num_samples

Returns the number of samples in the dataset.
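A hedged sketch tying a path-backed dataset to inference; it assumes `model` is a trained AutoEncoder and the folder paths are hypothetical:

    train_ds = DatasetFromPath.get_train_dataset(model, "data/train/")
    val_ds = DatasetFromPath.get_validation_dataset(model, "data/val/",
                                                    preload_data_into_memory=True)

    val_df = val_ds.get_preloaded_data()  # full validation set as a pandas DataFrame
    results = model.get_results_from_dataset(val_ds, val_df)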

class DistributedAutoEncoder(*args, **kwargs)[source]

Methods

__call__(*args, **kwargs)

Call self as a function.

class EncoderDataFrame(*args, **kwargs)[source]

Methods

__call__(*args, **kwargs)

Call self as a function.

swap([likelihood])

Performs random swapping of data.

swap(likelihood=0.15)[source]

Performs random swapping of data.

Parameters:
likelihood : float, optional

The probability of a value being randomly replaced with a value from a different row. By default 0.15.

Returns:
pandas.DataFrame

A copy of the dataframe of equal size with randomly swapped values.
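A brief sketch, assuming EncoderDataFrame can be constructed from an ordinary pandas dataframe (it is used here only to call swap):

    edf = EncoderDataFrame(df)          # df is an ordinary pandas DataFrame
    noisy = edf.swap(likelihood=0.15)   # ~15% of values replaced from other rows
    assert noisy.shape == df.shape      # same shape, corrupted values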

class GaussRankScaler[source]

So-called “Gauss Rank” scaling. Forces the data toward a Gaussian distribution and uses bins to perform the inverse mapping.

Built on sklearn's QuantileTransformer.

Methods

fit

fit_transform

inverse_transform

transform

fit(x)[source]

fit_transform(x)[source]

inverse_transform(x)[source]

transform(x)[source]
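A small usage sketch on synthetic skewed data; the exponential sample is illustrative only:

    import numpy as np

    scaler = GaussRankScaler()
    x = np.random.exponential(size=1000)  # skewed input
    z = scaler.fit_transform(x)           # approximately Gaussian output
    x_back = scaler.inverse_transform(z)  # approximate reconstruction via bins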

class IpynbLogger(*args, **kwargs)[source]

Plots logging data in a Jupyter notebook.

Methods

end_epoch

id_val_step

plot_progress

training_step

val_step

end_epoch(val_losses=None)[source]

plot_progress()[source]

class ModifiedScaler[source]

Implements scaling using the modified z-score. Reference: https://www.ibm.com/docs/el/cognos-analytics/11.1.0?topic=terms-modified-z-score

Methods

fit

fit_transform

inverse_transform

transform

MAD_SCALING_FACTOR = 1.486

MEANAD_SCALING_FACTOR = 1.253314

fit(x)[source]

fit_transform(x)[source]

inverse_transform(x)[source]

transform(x)[source]
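The scaling factors follow the modified z-score convention: for normal data, MAD * 1.486 and MeanAD * 1.253314 each approximate one standard deviation. A hedged sketch of the transform these constants suggest (an illustration, not necessarily the exact internal implementation):

    import numpy as np

    def modified_z(x):
        # Hypothetical helper showing the modified z-score the constants imply.
        med = np.median(x)
        mad = np.median(np.abs(x - med))
        if mad != 0:
            return (x - med) / (ModifiedScaler.MAD_SCALING_FACTOR * mad)
        # Fall back to the mean absolute deviation when MAD is zero.
        mean_ad = np.mean(np.abs(x - med))
        return (x - med) / (ModifiedScaler.MEANAD_SCALING_FACTOR * mean_ad)

    scaler = ModifiedScaler()
    z = scaler.fit_transform(np.random.exponential(size=1000))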

class NullScaler[source]

Methods

fit

fit_transform

inverse_transform

transform

fit(x)[source]

fit_transform(x)[source]

inverse_transform(x)[source]

transform(x)[source]

class StandardScaler[source]

Implements standard (mean/std) scaling.

Methods

fit

fit_transform

inverse_transform

transform

fit(x)[source]

fit_transform(x)[source]

inverse_transform(x)[source]

transform(x)[source]

class TensorboardXLogger(logdir='logdir/', run=None, *args, **kwargs)[source]

Methods

end_epoch

id_val_step

show_embeddings

training_step

val_step

end_epoch(val_losses=None)[source]

id_val_step(losses)[source]

show_embeddings(categories)[source]

training_step(losses)[source]

val_step(losses)[source]

Modules

morpheus.models.dfencoder.ae_module

morpheus.models.dfencoder.autoencoder

morpheus.models.dfencoder.dataframe

morpheus.models.dfencoder.dataloader

morpheus.models.dfencoder.distributed_ae

morpheus.models.dfencoder.logging

morpheus.models.dfencoder.multiprocessing

morpheus.models.dfencoder.scalers
