utils.fuzzy_dedup_utils.output_map_utils#
Module Contents#
Functions#
build_partition | Given an array of item sizes and a maximum bin size, attempts to group the items so that no group exceeds the maximum bin size, using the next-fit-decreasing bin-packing approach. |
get_agg_text_bytes_df | Group by bucket and calculate the total bytes for each bucket. |
Data#
API#
- utils.fuzzy_dedup_utils.output_map_utils.build_partition(sizes: numpy.ndarray, max_size: int) → numpy.ndarray#
Given an array of item sizes and a maximum bin size, this function attempts to group the items so that no group exceeds the maximum bin size, using the next-fit-decreasing bin-packing approach.
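To make the next-fit-decreasing idea concrete, here is a minimal sketch in plain NumPy. It is not the library's implementation: the helper name `next_fit_decreasing`, the choice to sort inside the helper, and the choice to return one bin id per item in the original order are all assumptions made for illustration.

```python
import numpy as np


def next_fit_decreasing(sizes: np.ndarray, max_size: int) -> np.ndarray:
    """Hypothetical helper: assign a bin id to each item (illustration only)."""
    # "Decreasing": visit items from largest to smallest.
    order = np.argsort(-sizes, kind="stable")
    bins = np.empty(len(sizes), dtype=np.int32)
    current_bin = 0
    current_total = 0
    for idx in order:
        size = int(sizes[idx])
        # "Next-fit": only the currently open bin is considered. If the item
        # does not fit, close that bin and open a new one.
        if current_total > 0 and current_total + size > max_size:
            current_bin += 1
            current_total = 0
        bins[idx] = current_bin
        current_total += size
    return bins


sizes = np.array([4, 8, 1, 4, 2, 1])
bins = next_fit_decreasing(sizes, max_size=10)
# Total size placed in each bin, indexed by bin id.
bin_totals = np.bincount(bins, weights=sizes)
```

In this sketch, an item larger than `max_size` still receives a bin of its own, which is one way to read "attempts" in the description above. Whether the library function handles oversized items the same way is not stated in this page.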
- utils.fuzzy_dedup_utils.output_map_utils.dask_cudf#
‘gpu_only_import(…)’
- utils.fuzzy_dedup_utils.output_map_utils.get_agg_text_bytes_df(df: utils.fuzzy_dedup_utils.output_map_utils.dask_cudf, agg_column: str, bytes_column: str, n_partitions: int, shuffle: bool = False)
Group by bucket and calculate the total bytes for each bucket.
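The aggregation described above can be sketched on the CPU with pandas as a stand-in for `dask_cudf`. This is an illustration under stated assumptions, not the library's code: the helper name `agg_text_bytes`, the column names `bucket` and `text_bytes`, and the sample data are hypothetical, and the `n_partitions` and `shuffle` parameters from the signature are not modelled here.

```python
import pandas as pd


def agg_text_bytes(df: pd.DataFrame, agg_column: str, bytes_column: str) -> pd.DataFrame:
    """Hypothetical helper: total bytes per bucket (illustration only)."""
    # Sum the byte counts of all rows that share the same bucket value.
    return df.groupby(agg_column, as_index=False)[bytes_column].sum()


# Hypothetical input: one row per document, with its bucket and text size.
docs = pd.DataFrame(
    {
        "bucket": [0, 1, 0, 2, 1],
        "text_bytes": [100, 250, 50, 10, 30],
    }
)
totals = agg_text_bytes(docs, "bucket", "text_bytes")
```

Each row of `totals` holds one bucket and the summed bytes of its documents. Per-bucket totals of this kind are the sort of size array that a bin-packing step could then consume.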