morpheus.controllers.file_to_df_controller

Morpheus pipeline module for fetching files and emitting them as DataFrames.

Functions

single_object_to_dataframe(file_object, ...)

Converts a file object into a Pandas DataFrame with optional preprocessing.

Classes

FileToDFController(schema, filter_null, ...)

Controller class for converting file objects to Pandas DataFrames with optional preprocessing.

single_object_to_dataframe(file_object, schema, file_type, filter_null, parser_kwargs)

Converts a file object into a Pandas DataFrame with optional preprocessing.

Parameters:
file_object : fsspec.core.OpenFile

A file object, typically from a remote storage system.

schema : morpheus.utils.column_info.DataFrameInputSchema

A schema defining how to process the data.

file_type : morpheus.common.FileTypes

The type of the file being processed (e.g., CSV, Parquet).

filter_null : bool

Flag to indicate whether to filter out null values.

parser_kwargs : dict

Additional keyword arguments to pass to the file parser.

Returns:
pd.DataFrame: The resulting Pandas DataFrame after processing and optional preprocessing.
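
Example (illustrative only):
The sketch below uses plain pandas to show roughly how the file type, null filtering, and parser keyword arguments might interact when a file object is read. It is not the Morpheus implementation. `read_to_dataframe` is a hypothetical helper, the file type is a plain string rather than a `morpheus.common.FileTypes` value, and schema processing is left out. Null filtering is assumed here to mean dropping rows that contain any null value, which may differ from what `filter_null` actually does.

```python
import io

import pandas as pd


def read_to_dataframe(file_obj, file_type, filter_null, parser_kwargs=None):
    """Hypothetical stand-in for the read-then-filter flow (not the Morpheus function)."""
    kwargs = dict(parser_kwargs or {})

    # Pick a pandas reader based on the declared file type and
    # forward the extra parser arguments unchanged.
    if file_type == "csv":
        df = pd.read_csv(file_obj, **kwargs)
    elif file_type == "json":
        df = pd.read_json(file_obj, **kwargs)
    else:
        raise ValueError(f"Unsupported file type: {file_type}")

    if filter_null:
        # Assumption: "filtering nulls" means dropping rows with any null value.
        df = df.dropna().reset_index(drop=True)

    return df


# A small in-memory file; the second data row has a missing count.
raw = "user;count\nalice;3\nbob;\ncarol;7\n"

df_all = read_to_dataframe(
    io.StringIO(raw), "csv", filter_null=False, parser_kwargs={"sep": ";"}
)
df_clean = read_to_dataframe(
    io.StringIO(raw), "csv", filter_null=True, parser_kwargs={"sep": ";"}
)

print(df_all)
print(df_clean)
```

In this sketch the `sep` entry reaches the CSV reader through `parser_kwargs`, and the row with the missing `count` appears only in `df_all`.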