utils.fuzzy_dedup_utils.output_map_utils

Module Contents

Functions

build_partition

Given an array of item sizes and a maximum bin size, attempts to group the items so that no group exceeds the maximum bin size, using the next-fit-decreasing bin-packing approach.

get_agg_text_bytes_df

Group by bucket and calculate the total bytes for each bucket.

Data

API

utils.fuzzy_dedup_utils.output_map_utils.build_partition(sizes: numpy.ndarray, max_size: int) → numpy.ndarray

Given an array of item sizes and a maximum bin size, attempts to group the items so that no group exceeds the maximum bin size, using the next-fit-decreasing bin-packing approach.
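The following is a minimal sketch of next-fit-decreasing bin packing, included only to illustrate the idea behind the description above. It is not the module's implementation: the function name `next_fit_decreasing_sketch` is hypothetical, and both the return convention (one integer group label per input item) and the handling of the exact-fit boundary are assumptions.

```python
import numpy as np


def next_fit_decreasing_sketch(sizes: np.ndarray, max_size: int) -> np.ndarray:
    """Hypothetical illustration; returns one group label per item (assumed convention)."""
    # Visit items from largest to smallest (the "decreasing" part).
    order = np.argsort(-sizes, kind="stable")
    labels = np.empty(len(sizes), dtype=np.int64)
    bin_id = 0  # index of the single bin that is currently open
    used = 0    # total size already placed in the open bin
    for idx in order:
        size = int(sizes[idx])
        # Next-fit: if the item does not fit, close the open bin for good
        # and start a new one. Earlier bins are never revisited.
        if used > 0 and used + size > max_size:
            bin_id += 1
            used = 0
        labels[idx] = bin_id
        used += size
    return labels


sizes = np.array([4, 8, 1, 4, 2, 1])
labels = next_fit_decreasing_sketch(sizes, max_size=10)
print(labels)
```

In this sketch, an item larger than `max_size` still gets a bin of its own, which is one way to read "attempts" in the description: the size limit cannot be guaranteed when a single item already exceeds it.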

utils.fuzzy_dedup_utils.output_map_utils.dask_cudf

‘gpu_only_import(…)’

utils.fuzzy_dedup_utils.output_map_utils.get_agg_text_bytes_df(
df: utils.fuzzy_dedup_utils.output_map_utils.dask_cudf,
agg_column: str,
bytes_column: str,
n_partitions: int,
shuffle: bool = False,
) → tuple[utils.fuzzy_dedup_utils.output_map_utils.dask_cudf, int]

Group by bucket and calculate the total bytes for each bucket.
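The aggregation described above can be sketched on CPU with pandas as a stand-in for `dask_cudf`. This is an illustration under stated assumptions, not the function's implementation: the name `agg_text_bytes_sketch` and the column names `bucket` and `text_bytes` are hypothetical, the `n_partitions` and `shuffle` parameters are left out, and treating the returned integer as the number of distinct buckets is a guess, since the page does not document it.

```python
import pandas as pd


def agg_text_bytes_sketch(
    df: pd.DataFrame, agg_column: str, bytes_column: str
) -> tuple[pd.DataFrame, int]:
    """Hypothetical CPU stand-in: sum a byte-count column per bucket."""
    # One output row per distinct value of agg_column, holding the summed bytes.
    agg_df = df.groupby(agg_column, sort=True)[bytes_column].sum().reset_index()
    # Assumption: the second element of the tuple is the number of buckets.
    return agg_df, len(agg_df)


df = pd.DataFrame(
    {
        "bucket": [0, 1, 0, 2, 1],
        "text_bytes": [100, 250, 50, 75, 25],
    }
)
agg_df, n_buckets = agg_text_bytes_sketch(df, "bucket", "text_bytes")
print(agg_df)
print(n_buckets)
```

Per-bucket byte totals of this kind are the sort of input that a size-bounded grouping such as `build_partition` operates on.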