nat.eval.dataset_handler.dataset_filter#

Classes#

DatasetFilter

Apply allowlist and denylist filters to the DataFrame based on specified column filters.

Module Contents#

class DatasetFilter(
filter_config: nat.data_models.dataset_handler.EvalFilterConfig,
)#
Apply allowlist and denylist filters to the DataFrame based on specified column filters.
  • If an allowlist is provided, only keep rows matching the filter values.

  • If a denylist is provided, remove rows matching the filter values.

  • If the filter column does not exist in the DataFrame, the filtering is skipped for that column.

  • Supports Unix shell-style wildcards (*, ?, [seq], [!seq]) for string matching.

This is a utility class that is dataset agnostic and can be used to filter any DataFrame based on the provided filter configuration.
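The rules above can be sketched in plain pandas. This is a minimal illustration, not the class's actual implementation: the function names `filter_rows` and `_matches` are hypothetical, and plain dictionaries mapping a column name to a list of patterns stand in for the `EvalFilterConfig` object that `DatasetFilter` really takes. Wildcard matching is done here with the standard library's `fnmatch` module, which supports the same shell-style syntax.

```python
from fnmatch import fnmatchcase

import pandas as pd


def _matches(series: pd.Series, patterns: list) -> pd.Series:
    """Boolean mask: True where the value matches at least one pattern."""
    as_text = series.astype(str)
    return as_text.map(
        lambda value: any(fnmatchcase(value, str(p)) for p in patterns)
    ).astype(bool)


def filter_rows(df: pd.DataFrame, allowlist=None, denylist=None) -> pd.DataFrame:
    """Hypothetical stand-in for the allowlist/denylist behavior described above."""
    result = df
    # Allowlist: keep only rows whose column value matches a pattern.
    for column, patterns in (allowlist or {}).items():
        if column not in result.columns:
            continue  # a missing column is skipped, not treated as an error
        result = result[_matches(result[column], patterns)]
    # Denylist: drop rows whose column value matches a pattern.
    for column, patterns in (denylist or {}).items():
        if column not in result.columns:
            continue
        result = result[~_matches(result[column], patterns)]
    return result


# Example data (made up for illustration).
df = pd.DataFrame(
    {
        "repo": ["django", "flask", "django-cms"],
        "difficulty": ["easy", "hard", "hard"],
    }
)
kept = filter_rows(
    df,
    allowlist={"repo": ["django*"]},
    denylist={"difficulty": ["hard"], "missing_column": ["x"]},
)
print(kept)
```

In this sketch the allowlist keeps the two `django*` rows, the denylist then removes the `hard` one, and the filter on `missing_column` is silently skipped because that column does not exist.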

filter_config#
static _match_wildcard_patterns(
series: pandas.Series,
patterns: list[str | int | float],
) → pandas.Series#

Match series values against wildcard patterns and exact values.

Args:

series (pd.Series): pandas Series to match against

patterns (list[str | int | float]): List of patterns/values

Returns:

pd.Series: Boolean Series indicating matches
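One way such a matcher could work is sketched below. It is an assumption-laden illustration rather than the library's code: `match_patterns` is a hypothetical name, string patterns that contain wildcard characters are matched with `fnmatch.fnmatchcase` against the string form of each value, and every other pattern (plain strings, ints, floats) is compared for exact equality.

```python
from fnmatch import fnmatchcase

import pandas as pd

WILDCARD_CHARS = "*?["


def match_patterns(series: pd.Series, patterns: list) -> pd.Series:
    """Return a boolean Series, True where a value matches any pattern."""
    mask = pd.Series(False, index=series.index)
    as_text = series.astype(str)
    for pattern in patterns:
        is_wildcard = isinstance(pattern, str) and any(
            ch in pattern for ch in WILDCARD_CHARS
        )
        if is_wildcard:
            # Shell-style match against the string form of each value.
            hits = as_text.map(lambda value, p=pattern: fnmatchcase(value, p))
            mask |= hits.astype(bool)
        else:
            # Exact-value match for plain strings and numbers.
            mask |= series == pattern
    return mask


# Example data (made up for illustration).
ids = pd.Series(["math_001", "math_002", "code_001"])
print(match_patterns(ids, ["math_*"]))
print(match_patterns(ids, ["math_00[!1]", "code_001"]))
```

The resulting mask can be used directly for an allowlist (`df[mask]`) or negated for a denylist (`df[~mask]`).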

apply_filters(df) → pandas.DataFrame#