NVIDIA Morpheus (25.02.01)

morpheus.io.utils

IO utilities.

Functions

`cudf_string_cols_exceed_max_bytes`(df, ...)	Checks a cudf DataFrame for string columns that exceed a maximum number of bytes and thus need to be truncated by calling `truncate_string_cols_by_bytes`.
`filter_null_data`(x[, column_name])	Filters out rows with a null value in a dataframe's 'data' column, if that column exists.
`get_csv_reader`()	Return the appropriate CSV reader based on the execution mode.
`get_json_reader`()	Return the appropriate JSON reader based on the execution mode.
`get_parquet_reader`()	Return the appropriate Parquet reader based on the execution mode.
`truncate_string_cols_by_bytes`(df, ...[, ...])	Truncates all string columns in a dataframe to a maximum number of bytes.

cudf_string_cols_exceed_max_bytes(df, column_max_bytes)

Checks a cudf DataFrame for string columns that exceed a maximum number of bytes and thus need to be truncated by calling truncate_string_cols_by_bytes.

This function relies on the cudf method Series.str.byte_count(), which pandas lacks. Running this check first can avoid a costly call to truncate_string_cols_by_bytes when no column exceeds its limit.

Parameters

df: DataFrameType: The dataframe to check.
column_max_bytes: dict[str, int]: A mapping of string column names to the maximum number of bytes for each column.

Returns

bool: True if truncation is needed, False otherwise.
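The check described above can be sketched with pandas alone. This is a minimal illustration, not the Morpheus implementation: the helper name `string_cols_exceed_max_bytes` is hypothetical, and because pandas has no `byte_count()`, the sketch encodes each value to UTF-8 and measures the length of the resulting bytes.

```python
import pandas as pd


def string_cols_exceed_max_bytes(df: pd.DataFrame, column_max_bytes: dict[str, int]) -> bool:
    """Hypothetical pandas stand-in: True if any listed column holds a value over its byte limit."""
    for col, max_bytes in column_max_bytes.items():
        # UTF-8 byte length per row; cudf exposes this directly as Series.str.byte_count()
        byte_counts = df[col].str.encode("utf-8").str.len()
        if (byte_counts > max_bytes).any():
            return True
    return False


df = pd.DataFrame({"name": ["abc", "héllo"], "city": ["Oslo", "Rome"]})

# "héllo" has 5 characters but 6 UTF-8 bytes, because "é" encodes to 2 bytes
within_limit = string_cols_exceed_max_bytes(df, {"name": 6, "city": 4})
over_limit = string_cols_exceed_max_bytes(df, {"name": 5, "city": 4})
print(within_limit, over_limit)
```

The limits are in bytes rather than characters, so a column of non-ASCII text can exceed its limit even when every value looks short enough by character count.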

filter_null_data(x, column_name='data')

Filters out rows with a null value in a dataframe’s ‘data’ column, if that column exists.

Parameters

x: DataFrameType: The dataframe to fix.
column_name: str, default ‘data’: The column name to filter on.
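A rough pandas sketch of the filtering behavior described above. The helper name `filter_null_rows` is hypothetical, and returning the input unchanged when the column is absent is an assumption based on the phrase "if it exists"; the actual Morpheus function may differ in detail.

```python
import pandas as pd


def filter_null_rows(x: pd.DataFrame, column_name: str = "data") -> pd.DataFrame:
    """Hypothetical stand-in: drop rows whose `column_name` value is null."""
    if column_name not in x.columns:
        # Assumption: nothing to filter when the column is absent
        return x
    return x[x[column_name].notna()]


df = pd.DataFrame({"data": ["a", None, "c"], "id": [1, 2, 3]})
filtered = filter_null_rows(df)
print(filtered["id"].tolist())
```

The row with `id` 2 is dropped because its `data` value is null; the other rows keep their original index labels.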

get_csv_reader(selector: Literal['cudf', 'pandas']) → Callable[..., DataFrameType]
get_csv_reader(selector: morpheus.config.ExecutionMode) → Callable[..., DataFrameType]

Return the appropriate CSV reader based on the execution mode.

get_json_reader(selector: Literal['cudf', 'pandas']) → Callable[..., DataFrameType]
get_json_reader(selector: morpheus.config.ExecutionMode) → Callable[..., DataFrameType]

Return the appropriate JSON reader based on the execution mode.

get_parquet_reader(selector: Literal['cudf', 'pandas']) → Callable[..., DataFrameType]
get_parquet_reader(selector: morpheus.config.ExecutionMode) → Callable[..., DataFrameType]

Return the appropriate Parquet reader based on the execution mode.
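The three reader getters share one pattern: a selector picks the dataframe library, and the matching read function is returned. The sketch below illustrates that dispatch for CSV using the string selectors only. The helper name `csv_reader_for` is hypothetical, and mapping the selectors to `pandas.read_csv` and `cudf.read_csv` is an assumption about what "appropriate reader" means, not the confirmed Morpheus implementation.

```python
import io
from typing import Callable

import pandas as pd


def csv_reader_for(selector: str) -> Callable:
    """Hypothetical dispatcher mirroring the 'cudf' / 'pandas' selector strings."""
    if selector == "pandas":
        return pd.read_csv
    if selector == "cudf":
        # Imported lazily so that CPU-only environments can still use the pandas path
        import cudf
        return cudf.read_csv
    raise ValueError(f"Unknown selector: {selector!r}")


reader = csv_reader_for("pandas")
df = reader(io.StringIO("id,name\n1,a\n2,b\n"))
print(df.shape)
```

Returning the callable rather than a loaded dataframe lets calling code stay the same for both libraries and pass reader-specific keyword arguments at the call site.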

truncate_string_cols_by_bytes(df, column_max_bytes, warn_on_truncate=True)

Truncates all string columns in a dataframe to a maximum number of bytes. This operation is performed in-place on the dataframe.

Parameters

df: DataFrameType: The dataframe to truncate.
column_max_bytes: dict[str, int]: A mapping of string column names to the maximum number of bytes for each column.
warn_on_truncate: bool, default True: Whether to log a warning when truncating a column.

Returns

bool: True if truncation was performed, False otherwise.
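Truncating by bytes differs from truncating by characters because a cut can land in the middle of a multi-byte UTF-8 character. The pandas sketch below shows one way to handle this, under stated assumptions: the helper name `truncate_cols_by_bytes` is hypothetical, the `warn_on_truncate` logging is omitted, and discarding a partial trailing character with `errors="ignore"` is an assumption rather than confirmed Morpheus behavior.

```python
import pandas as pd


def truncate_cols_by_bytes(df: pd.DataFrame, column_max_bytes: dict[str, int]) -> bool:
    """Hypothetical stand-in: truncate listed string columns in place; True if anything changed."""
    truncated = False
    for col, max_bytes in column_max_bytes.items():
        encoded = df[col].str.encode("utf-8")
        if (encoded.str.len() > max_bytes).any():
            truncated = True
            # Slice the bytes, then drop any partial multi-byte character left at the end
            df[col] = encoded.str.slice(0, max_bytes).str.decode("utf-8", errors="ignore")
    return truncated


df = pd.DataFrame({"name": ["abc", "héllo"]})
changed = truncate_cols_by_bytes(df, {"name": 2})
print(changed, df["name"].tolist())
```

With a 2-byte limit, "abc" becomes "ab", while "héllo" becomes "h": the second byte of the slice is only the first half of "é", so it is discarded instead of producing an invalid string. The column is reassigned on the same dataframe object, which matches the in-place behavior described above.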
