NVIDIA Morpheus (25.02.01)

morpheus.io.utils

IO utilities.

Functions

cudf_string_cols_exceed_max_bytes(df, ...) Checks a cudf DataFrame for string columns that exceed a maximum number of bytes and thus need to be truncated by calling truncate_string_cols_by_bytes.
filter_null_data(x[, column_name]) Filters out rows with a null value in the given column of a dataframe ('data' by default), if that column exists.
get_csv_reader(selector) Return the appropriate CSV reader based on the execution mode.
get_json_reader(selector) Return the appropriate JSON reader based on the execution mode.
get_parquet_reader(selector) Return the appropriate Parquet reader based on the execution mode.
truncate_string_cols_by_bytes(df, ...[, ...]) Truncates all string columns in a dataframe to a maximum number of bytes.

cudf_string_cols_exceed_max_bytes(df, column_max_bytes)

Checks a cudf DataFrame for string columns that exceed a maximum number of bytes and thus need to be truncated by calling truncate_string_cols_by_bytes.

This function relies on the cudf method Series.str.byte_count(), which pandas lacks. Checking byte counts first can avoid a costly call to truncate_string_cols_by_bytes when no column exceeds its limit.

Parameters
df: DataFrameType

The dataframe to check.

column_max_bytes: dict[str, int]

A mapping of string column names to the maximum number of bytes for each column.

Returns
bool

True if truncation is needed, False otherwise.
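The check described above can be sketched in plain Python. This is not the Morpheus implementation: it models a dataframe as a dict of column lists, the helper name string_cols_exceed_max_bytes is hypothetical, and it assumes byte counts refer to UTF-8 encoded length. The point it illustrates is that the limit is measured in bytes, so a string can fit by character count and still exceed the limit.

```python
def string_cols_exceed_max_bytes(
    columns: dict[str, list], column_max_bytes: dict[str, int]
) -> bool:
    """Hypothetical stand-in: True if any listed column holds an over-long string."""
    for name, max_bytes in column_max_bytes.items():
        for value in columns.get(name, []):
            # Skip nulls; compare the UTF-8 encoded length, not the character count.
            if value is not None and len(value.encode("utf-8")) > max_bytes:
                return True
    return False


# "héllo" has 5 characters but 6 UTF-8 bytes, because "é" encodes to 2 bytes.
columns = {"name": ["ab", "héllo"], "note": ["ok", None]}

needs_truncation = string_cols_exceed_max_bytes(columns, {"name": 5})
fits = string_cols_exceed_max_bytes(columns, {"name": 6, "note": 2})
```

Here needs_truncation is True and fits is False, so only the first mapping would call for a follow-up truncation pass.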

filter_null_data(x, column_name='data')

Filters out rows with a null value in the given column of a dataframe (‘data’ by default), if that column exists.

Parameters
x: DataFrameType

The dataframe to filter.

column_name: str, default ‘data’

The column name to filter on.
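As a rough illustration of the filtering behavior, the sketch below uses a dict of column lists in place of a dataframe. The helper name filter_null_rows is hypothetical, and the sketch returns a new table, which is an assumption: this page does not state what filter_null_data returns. When the column is absent, the input is passed through untouched.

```python
def filter_null_rows(columns: dict[str, list], column_name: str = "data") -> dict[str, list]:
    """Hypothetical stand-in: drop every row whose `column_name` entry is None."""
    if column_name not in columns:
        # Nothing to filter on, so hand the input back unchanged.
        return columns
    keep = [value is not None for value in columns[column_name]]
    # Apply the same row mask to every column so rows stay aligned.
    return {
        name: [value for value, kept in zip(values, keep) if kept]
        for name, values in columns.items()
    }


table = {"data": ["a", None, "c"], "id": [1, 2, 3]}
filtered = filter_null_rows(table)
untouched = filter_null_rows({"id": [1, 2]})
```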

get_csv_reader(selector: Literal['cudf', 'pandas']) -> Callable[..., DataFrameType]
get_csv_reader(selector: morpheus.config.ExecutionMode) -> Callable[..., DataFrameType]

Return the appropriate CSV reader based on the execution mode.

get_json_reader(selector: Literal['cudf', 'pandas']) -> Callable[..., DataFrameType]
get_json_reader(selector: morpheus.config.ExecutionMode) -> Callable[..., DataFrameType]

Return the appropriate JSON reader based on the execution mode.

get_parquet_reader(selector: Literal['cudf', 'pandas']) -> Callable[..., DataFrameType]
get_parquet_reader(selector: morpheus.config.ExecutionMode) -> Callable[..., DataFrameType]

Return the appropriate Parquet reader based on the execution mode.
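The three reader getters share one pattern: a selector goes in and a reader callable comes out. The sketch below shows that dispatch with stand-ins only. Mode is a hypothetical substitute for morpheus.config.ExecutionMode, the two fake readers replace the real cudf and pandas readers, and the GPU-to-cudf and CPU-to-pandas mapping is an assumption rather than something this page states.

```python
from enum import Enum
from typing import Callable, Union


class Mode(Enum):
    """Hypothetical stand-in for morpheus.config.ExecutionMode."""
    GPU = "GPU"
    CPU = "CPU"


def fake_cudf_reader(path: str) -> str:
    # Placeholder for a GPU dataframe reader.
    return f"cudf:{path}"


def fake_pandas_reader(path: str) -> str:
    # Placeholder for a CPU dataframe reader.
    return f"pandas:{path}"


def get_reader(selector: Union[str, Mode]) -> Callable[..., str]:
    """Map a selector ('cudf', 'pandas', or a Mode) to a reader callable."""
    if isinstance(selector, Mode):
        # Assumed mapping: GPU execution uses cudf, CPU execution uses pandas.
        selector = "cudf" if selector is Mode.GPU else "pandas"
    readers = {"cudf": fake_cudf_reader, "pandas": fake_pandas_reader}
    if selector not in readers:
        raise ValueError(f"Unknown selector: {selector!r}")
    return readers[selector]


reader = get_reader(Mode.CPU)
result = reader("events.csv")
```

Returning the callable rather than calling it lets the caller pass reader-specific keyword arguments afterwards, which matches the Callable[..., DataFrameType] return type in the signatures above.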

truncate_string_cols_by_bytes(df, column_max_bytes, warn_on_truncate=True)

Truncates all string columns in a dataframe to a maximum number of bytes. This operation is performed in-place on the dataframe.

Parameters
df: DataFrameType

The dataframe to truncate.

column_max_bytes: dict[str, int]

A mapping of string column names to the maximum number of bytes for each column.

warn_on_truncate: bool, default True

Whether to log a warning when truncating a column.

Returns
bool

True if truncation was performed, False otherwise.
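Truncating by bytes rather than by characters has one subtlety: a cut can land in the middle of a multi-byte character. The sketch below shows one way to handle that in plain Python, assuming UTF-8 and dropping any partial trailing character. It is not the Morpheus implementation: the helper names are hypothetical, a dict of column lists stands in for the dataframe, and the warn_on_truncate logging is omitted.

```python
def truncate_to_bytes(value: str, max_bytes: int) -> str:
    """Cut `value` to at most `max_bytes` UTF-8 bytes without splitting a character."""
    encoded = value.encode("utf-8")
    if len(encoded) <= max_bytes:
        return value
    # errors="ignore" discards an incomplete multi-byte sequence left at the cut.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


def truncate_string_cols(columns: dict[str, list], column_max_bytes: dict[str, int]) -> bool:
    """Hypothetical stand-in: truncate listed columns in place, report if anything changed."""
    performed = False
    for name, max_bytes in column_max_bytes.items():
        values = columns.get(name)
        if values is None:
            continue
        for i, value in enumerate(values):
            if value is None:
                continue
            truncated = truncate_to_bytes(value, max_bytes)
            if truncated != value:
                values[i] = truncated  # modify the existing list in place
                performed = True
    return performed


table = {"name": ["ab", "héllo"], "id": ["1", "2"]}
changed = truncate_string_cols(table, {"name": 2})
```

After this call, changed is True and the second name becomes "h": the two-byte cut falls inside "é", so that partial character is dropped instead of producing invalid text.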

© Copyright 2024, NVIDIA. Last updated on Mar 3, 2025.