morpheus.io.utils


IO utilities.

Functions

cudf_string_cols_exceed_max_bytes(df, ...) Checks a cudf DataFrame for string columns that exceed a maximum number of bytes and thus need to be truncated by calling truncate_string_cols_by_bytes.
filter_null_data(x[, column_name]) Filters out rows with null values in a dataframe's 'data' column, if that column exists.
truncate_string_cols_by_bytes(df, ...[, ...]) Truncates the specified string columns in a dataframe to a maximum number of bytes per column.
cudf_string_cols_exceed_max_bytes(df, column_max_bytes)

Checks a cudf DataFrame for string columns that exceed a maximum number of bytes and thus need to be truncated by calling truncate_string_cols_by_bytes.

This function uses the cudf Series.str.byte_count() method, which pandas lacks, so it can detect whether truncation is needed without a costly call to truncate_string_cols_by_bytes.

Parameters
df: cudf.DataFrame

The dataframe to check.

column_max_bytes: dict[str, int]

A mapping of string column names to the maximum number of bytes for each column.

Returns
bool

True if truncation is needed, False otherwise.
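The check only has to compare each column's longest value, measured in UTF-8 bytes, against that column's limit. Byte length rather than character count is what matters: "é" is one character but two bytes. The stdlib-only sketch below illustrates this logic; the helper `string_cols_exceed_max_bytes` and the dict-of-lists table standing in for a cudf DataFrame are hypothetical and not part of the Morpheus API. In cudf, the per-value byte count corresponds to what Series.str.byte_count() computes.

```python
# Hypothetical, stdlib-only sketch of the check. A dict mapping column name -> list
# of strings stands in for a cudf DataFrame (assumption, not the Morpheus API).


def utf8_len(value: str) -> int:
    """Number of bytes needed to encode `value` as UTF-8."""
    return len(value.encode("utf-8"))


def string_cols_exceed_max_bytes(table: dict[str, list[str]],
                                 column_max_bytes: dict[str, int]) -> bool:
    """Return True if any listed column holds a value longer than its byte limit."""
    for col, max_bytes in column_max_bytes.items():
        # Only the longest value in each column needs to be compared.
        if max((utf8_len(v) for v in table[col]), default=0) > max_bytes:
            return True
    return False


table = {"name": ["abc", "café"], "note": ["ok"]}
print(utf8_len("café"))                                              # 5 bytes for 4 characters
print(string_cols_exceed_max_bytes(table, {"name": 4}))              # True
print(string_cols_exceed_max_bytes(table, {"name": 5, "note": 2}))   # False
```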

filter_null_data(x, column_name='data')

Filters out rows that contain a null value in a dataframe’s ‘data’ column, if that column exists.

Parameters
x

The dataframe to filter.

column_name: str, default 'data'

The name of the column to check for null values.
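The filtering rule can be sketched without any dataframe library, assuming a table represented as a dict of equal-length column lists with None marking a null: rows whose value in the chosen column is null are dropped, and the table is returned unchanged when that column is absent. The helper `filter_null_rows` is hypothetical, not the Morpheus implementation; with pandas or cudf the same selection is commonly written as `df[~df[column_name].isna()]`.

```python
# Hypothetical sketch: a dict of equal-length column lists stands in for a
# dataframe, and None stands in for a null value.


def filter_null_rows(table: dict[str, list], column_name: str = "data") -> dict[str, list]:
    """Drop rows whose `column_name` value is None; return `table` as-is if the column is missing."""
    if column_name not in table:
        return table

    # Indices of the rows whose value in `column_name` is not null.
    keep = [i for i, v in enumerate(table[column_name]) if v is not None]
    # Build a new table rather than modifying the input.
    return {col: [values[i] for i in keep] for col, values in table.items()}


table = {"id": [1, 2, 3], "data": ["a", None, "c"]}
print(filter_null_rows(table))             # {'id': [1, 3], 'data': ['a', 'c']}
print(filter_null_rows(table, "missing"))  # returned unchanged
```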

truncate_string_cols_by_bytes(df, column_max_bytes, warn_on_truncate=True)

Truncates each string column named in column_max_bytes to that column's maximum number of bytes. This operation is performed in-place on the dataframe.

Parameters
df

The dataframe to truncate.

column_max_bytes: dict[str, int]

A mapping of string column names to the maximum number of bytes for each column.

warn_on_truncate: bool, default True

Whether to log a warning when truncating a column.

Returns
bool

True if truncation was performed, False otherwise.
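Truncating by bytes is subtler than truncating by characters, because cutting a UTF-8 byte string at an arbitrary offset can split a multi-byte character. One common approach, sketched below, is to encode, slice the bytes, and decode with errors="ignore" so that the incomplete trailing character is dropped. The helpers `truncate_utf8` and `truncate_cols_by_bytes` and the dict-of-lists table are hypothetical stand-ins, not the Morpheus implementation; like the documented function, the sketch modifies its input in place, optionally logs a warning, and reports whether anything was truncated.

```python
import logging

logger = logging.getLogger(__name__)


def truncate_utf8(value: str, max_bytes: int) -> str:
    """Cut `value` to at most `max_bytes` UTF-8 bytes without splitting a character."""
    # Slicing the encoded bytes may leave a partial multi-byte character at the end;
    # errors="ignore" drops those dangling bytes when decoding.
    return value.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")


def truncate_cols_by_bytes(table: dict[str, list[str]],
                           column_max_bytes: dict[str, int],
                           warn_on_truncate: bool = True) -> bool:
    """Truncate the listed columns in place; return True if any value was shortened."""
    performed_truncation = False
    for col, max_bytes in column_max_bytes.items():
        values = table[col]
        if any(len(v.encode("utf-8")) > max_bytes for v in values):
            performed_truncation = True
            if warn_on_truncate:
                logger.warning("Truncating column '%s' to %d bytes", col, max_bytes)
            # Overwrite the existing list's contents so the change is visible in place.
            values[:] = [truncate_utf8(v, max_bytes) for v in values]
    return performed_truncation


table = {"name": ["café", "tea"]}
print(truncate_cols_by_bytes(table, {"name": 4}, warn_on_truncate=False))  # True
print(table)  # {'name': ['caf', 'tea']}
```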

© Copyright 2024, NVIDIA. Last updated on Jul 8, 2024.