morpheus.utils.compare_df

Functions

compare_df(df_a, df_b[, include_columns, ...])

Compares two pandas DataFrames, returning a comparison summary as a dict.

filter_df(df, include_columns, exclude_columns)

Filters the dataframe df including and excluding the columns specified by include_columns and exclude_columns respectively.

compare_df(
df_a,
df_b,
include_columns=None,
exclude_columns=None,
replace_idx=None,
abs_tol=0.001,
rel_tol=0.005,
dfa_name='val',
dfb_name='res',
show_report=False,
)

Compares two pandas DataFrames, returning a comparison summary as a dict of the form:

{
    "total_rows": <int>,
    "matching_rows": <int>,
    "diff_rows": <int>,
    "matching_cols": <[str]>,
    "extra_cols": <[str]>,
    "missing_cols": <[str]>,
}
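The summary fields above can be illustrated with a small pandas sketch. It is not the Morpheus implementation: the function name `summarize_comparison` is hypothetical, and it assumes that rows are aligned by index, that `df_a` is the reference frame, that `extra_cols` lists columns found only in `df_b` and `missing_cols` those found only in `df_a`, that `diff_rows` is `total_rows - matching_rows`, and that numeric cells are compared with `numpy.isclose` using `abs_tol` and `rel_tol`.

```python
import numpy as np
import pandas as pd


def summarize_comparison(df_a: pd.DataFrame,
                         df_b: pd.DataFrame,
                         abs_tol: float = 0.001,
                         rel_tol: float = 0.005) -> dict:
    """Hypothetical sketch of a compare_df-style summary (not the Morpheus source)."""
    # Column bookkeeping under the assumed semantics: df_a is the reference frame.
    cols_a, cols_b = set(df_a.columns), set(df_b.columns)
    matching_cols = sorted(cols_a & cols_b)
    extra_cols = sorted(cols_b - cols_a)    # assumed: present only in df_b
    missing_cols = sorted(cols_a - cols_b)  # assumed: present only in df_a

    # Assumption: rows are aligned by index label; rows in only one frame never match.
    common_idx = df_a.index.intersection(df_b.index)
    a = df_a.loc[common_idx, matching_cols]
    b = df_b.loc[common_idx, matching_cols]

    row_ok = np.ones(len(common_idx), dtype=bool)
    for col in matching_cols:
        if (pd.api.types.is_numeric_dtype(a[col])
                and pd.api.types.is_numeric_dtype(b[col])):
            # Numeric cells match when |a - b| <= abs_tol + rel_tol * |b|;
            # two NaNs count as equal.
            ok = np.isclose(a[col].astype(float), b[col].astype(float),
                            atol=abs_tol, rtol=rel_tol, equal_nan=True)
        else:
            # Non-numeric cells must be exactly equal; two nulls count as equal.
            ok = ((a[col] == b[col]) | (a[col].isna() & b[col].isna())).to_numpy()
        row_ok &= np.asarray(ok, dtype=bool)

    total_rows = len(df_a)
    matching_rows = int(row_ok.sum())
    return {
        "total_rows": total_rows,
        "matching_rows": matching_rows,
        "diff_rows": total_rows - matching_rows,
        "matching_cols": matching_cols,
        "extra_cols": extra_cols,
        "missing_cols": missing_cols,
    }
```

Note that `numpy.isclose` scales the relative tolerance by the second argument only, so in this sketch the tolerance for a cell depends on the value in `df_b`.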

filter_df(df, include_columns, exclude_columns, replace_idx=None)

Filters the dataframe df including and excluding the columns specified by include_columns and exclude_columns respectively. If a column is matched by both include_columns and exclude_columns, it will be excluded.

Parameters:
df: pd.DataFrame

Dataframe to filter.

include_columns: typing.List[str]

List of regular expression strings of columns to be included.

exclude_columns: typing.List[str]

List of regular expression strings of columns to be excluded.

replace_idx: str, optional

When replace_idx is not None and exists in the dataframe, it is set as the index.

Returns:
pd.DataFrame

Filtered slice of df.
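A minimal sketch of the filtering rules described above, using plain pandas and `re`. The name `filter_columns` is hypothetical, and several details are assumptions rather than the Morpheus source: patterns are tested with `re.match` (anchored at the start of the column name), a `None` or empty include list keeps every column, exclusion runs after inclusion so a column matched by both is dropped, and `replace_idx` is only promoted to the index if it is still present after filtering.

```python
import re
from typing import List, Optional

import pandas as pd


def filter_columns(df: pd.DataFrame,
                   include_columns: Optional[List[str]],
                   exclude_columns: Optional[List[str]],
                   replace_idx: Optional[str] = None) -> pd.DataFrame:
    """Hypothetical sketch of filter_df-style column filtering (not the Morpheus source)."""
    columns = list(df.columns)

    # Assumption: no include patterns means every column is kept.
    if include_columns:
        columns = [c for c in columns if any(re.match(p, c) for p in include_columns)]

    # Exclusion runs after inclusion, so a column matched by both lists is dropped.
    if exclude_columns:
        columns = [c for c in columns if not any(re.match(p, c) for p in exclude_columns)]

    filtered = df[columns]

    # Assumption: replace_idx becomes the index only if it survived filtering.
    if replace_idx is not None and replace_idx in filtered.columns:
        filtered = filtered.set_index(replace_idx)

    return filtered
```

Because `re.match` anchors only at the start, the pattern `id` would also keep a column named `identifier`; `re.fullmatch` is the stricter choice when exact column names are intended.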