IO utilities.
Functions
cudf_string_cols_exceed_max_bytes(df, ...)
Checks a cudf DataFrame for string columns that exceed a maximum number of bytes and thus need to be truncated by calling truncate_string_cols_by_bytes.
filter_null_data(x[, column_name])
Filters out null rows in a dataframe's 'data' column if it exists.
truncate_string_cols_by_bytes(df, ...[, ...])
Truncates all string columns in a dataframe to a maximum number of bytes.
- cudf_string_cols_exceed_max_bytes(df, column_max_bytes)
Checks a cudf DataFrame for string columns that exceed a maximum number of bytes and thus need to be truncated by calling truncate_string_cols_by_bytes. This function uses the cudf method Series.str.byte_count(), which pandas lacks, so that a costly call to truncate_string_cols_by_bytes can be avoided.
- Parameters
- df
The dataframe to check.
- column_max_bytes: dict[str, int]
A mapping of string column names to the maximum number of bytes for each column.
- Returns
- bool
True if truncation is needed, False otherwise.
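The check above can be illustrated without a GPU. The following is a minimal sketch in plain Python, not the library's implementation: it assumes a dict of column name to list of strings as a stand-in for a dataframe, and the helper name string_cols_exceed_max_bytes is hypothetical. It shows why the limit is measured in UTF-8 bytes rather than characters.

```python
def string_cols_exceed_max_bytes(columns, column_max_bytes):
    """Return True if any listed column holds a string over its byte limit.

    `columns` is a hypothetical stand-in for a dataframe:
    a dict mapping column names to lists of str (or None).
    """
    for name, max_bytes in column_max_bytes.items():
        for value in columns[name]:
            # Byte length is taken from the UTF-8 encoding, not len(value).
            if value is not None and len(value.encode("utf-8")) > max_bytes:
                return True
    return False


columns = {"name": ["abc", "héllo"], "city": ["Oslo", None]}

# "héllo" has 5 characters but 6 UTF-8 bytes, so a 5-byte limit is exceeded.
needs_truncation = string_cols_exceed_max_bytes(columns, {"name": 5})
```

Only the columns named in the mapping are inspected, so columns without a byte limit cost nothing to check.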
- filter_null_data(x, column_name='data')
Filters out null rows in a dataframe's 'data' column if it exists.
- Parameters
- x
The dataframe to fix.
- column_name
The column name to filter on.
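As a rough illustration of the behavior described above, the sketch below filters a list of row dicts, which is an assumed stand-in for a dataframe. The helper name filter_null_rows is hypothetical, and returning the input unchanged when the column is absent is an assumption drawn from the phrase "if it exists".

```python
def filter_null_rows(rows, column_name="data"):
    """Drop rows whose `column_name` value is None, if that column exists.

    `rows` is a hypothetical stand-in for a dataframe: a list of dicts.
    """
    # Assumption: when no row has the column, the input is returned as-is.
    if not any(column_name in row for row in rows):
        return rows
    return [row for row in rows if row.get(column_name) is not None]


rows = [
    {"id": 1, "data": "payload"},
    {"id": 2, "data": None},
    {"id": 3, "data": "more"},
]
filtered = filter_null_rows(rows)
```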
- truncate_string_cols_by_bytes(df, column_max_bytes, warn_on_truncate=True)
Truncates all string columns in a dataframe to a maximum number of bytes. This operation is performed in-place on the dataframe.
- Parameters
- df
The dataframe to truncate.
- column_max_bytes: dict[str, int]
A mapping of string column names to the maximum number of bytes for each column.
- warn_on_truncate: bool, default True
Whether to log a warning when truncating a column.
- Returns
- bool
True if truncation was performed, False otherwise.
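The in-place truncation and its boolean result can be sketched in plain Python. This is an illustration under stated assumptions rather than the library's code: a dict of lists stands in for the dataframe, the helper name truncate_cols_by_bytes is hypothetical, and dropping a multi-byte character that would be cut in half is one possible policy that the text above does not specify.

```python
import logging

logger = logging.getLogger(__name__)


def truncate_cols_by_bytes(columns, column_max_bytes, warn_on_truncate=True):
    """Truncate strings in place to a UTF-8 byte limit; return True if any changed.

    `columns` is a hypothetical stand-in for a dataframe:
    a dict mapping column names to lists of str (or None).
    """
    performed = False
    for name, max_bytes in column_max_bytes.items():
        values = columns[name]
        truncated_here = False
        for i, value in enumerate(values):
            if value is None:
                continue
            encoded = value.encode("utf-8")
            if len(encoded) > max_bytes:
                # errors="ignore" drops a trailing partial multi-byte character
                # (an assumed policy, chosen so the result stays valid text).
                values[i] = encoded[:max_bytes].decode("utf-8", errors="ignore")
                truncated_here = True
        if truncated_here:
            performed = True
            if warn_on_truncate:
                logger.warning("Column %s truncated to %d bytes", name, max_bytes)
    return performed


columns = {"name": ["abc", "héllo"]}
changed = truncate_cols_by_bytes(columns, {"name": 2})
```

With a 2-byte limit, "héllo" is cut to "h" because the second byte falls in the middle of the two-byte "é". Calling the function again on the same data makes no further changes and returns False.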