morpheus.controllers.file_to_df_controller.FileToDFController

class FileToDFController(
    schema,
    filter_null,
    file_type,
    parser_kwargs,
    cache_dir,
    timestamp_column_name,
    download_method=DownloadMethods.DASK_THREAD,
)

Bases: object

Controller class for converting file objects to Pandas DataFrames with optional preprocessing.

Parameters:
schema : DataFrameInputSchema

A schema defining how to process the data.

filter_null : bool

Flag to indicate whether to filter out null values.

file_type : FileTypes

The type of the file being processed (e.g., CSV, Parquet).

parser_kwargs : dict

Additional keyword arguments to pass to the file parser.

cache_dir : str

Directory where cache will be stored.

timestamp_column_name : str

Name of the timestamp column.

download_method : typing.Union[DownloadMethods, str], optional, default = DownloadMethods.DASK_THREAD

The download method to use. If the MORPHEUS_FILE_DOWNLOAD_TYPE environment variable is set, it takes precedence over this argument.
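
The precedence rule for download_method can be illustrated with a minimal sketch. This is not Morpheus code: the DownloadMethods enum below is a local stand-in whose member values are assumed, and resolve_download_method is a hypothetical helper that only mimics the documented behavior (the environment variable, when set, wins over the argument).

```python
import os
from enum import Enum
from typing import Mapping, Optional, Union


class DownloadMethods(str, Enum):
    """Local stand-in for the Morpheus enum; the string values are assumed."""
    SINGLE_THREAD = "single_thread"
    DASK = "dask"
    DASK_THREAD = "dask_thread"


def resolve_download_method(
    download_method: Union[DownloadMethods, str] = DownloadMethods.DASK_THREAD,
    environ: Optional[Mapping[str, str]] = None,
) -> DownloadMethods:
    """Hypothetical helper: pick the effective download method.

    MORPHEUS_FILE_DOWNLOAD_TYPE, if set and non-empty, overrides the argument.
    """
    env = os.environ if environ is None else environ
    override = env.get("MORPHEUS_FILE_DOWNLOAD_TYPE")
    chosen = override if override else download_method
    if isinstance(chosen, DownloadMethods):
        return chosen
    # Plain strings are normalized to lower case before the enum lookup;
    # an unknown value raises ValueError.
    return DownloadMethods(chosen.lower())


if __name__ == "__main__":
    # No override: the argument is used.
    print(resolve_download_method("dask", environ={}).name)
    # Override present: the environment variable wins.
    print(
        resolve_download_method(
            "dask", environ={"MORPHEUS_FILE_DOWNLOAD_TYPE": "single_thread"}
        ).name
    )
```

Passing environ explicitly keeps the sketch easy to test; real code would normally read os.environ directly.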

Methods

close()

Close the resources used by the controller.

convert_to_dataframe(file_object_batch)

Convert a batch of file objects to a DataFrame.

close()

Close the resources used by the controller.
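
Because the controller exposes a plain close() method, one way to guarantee cleanup is the standard library's contextlib.closing. The sketch below uses a hypothetical StubController in place of the real class (so it runs without Morpheus installed) and returns a dict instead of a DataFrame. Only the close() and convert_to_dataframe() method names are taken from this page.

```python
from contextlib import closing


class StubController:
    """Hypothetical stand-in with the same method names as FileToDFController."""

    def __init__(self):
        self.closed = False

    def convert_to_dataframe(self, file_object_batch):
        if self.closed:
            raise RuntimeError("controller is already closed")
        # The batch is a (files, batch_count) pair, as documented for the real method.
        files, batch_count = file_object_batch
        return {"files": list(files), "batch_count": batch_count}

    def close(self):
        self.closed = True


def process(controller, batches):
    """Convert every batch, closing the controller even if a conversion fails."""
    with closing(controller):
        return [controller.convert_to_dataframe(batch) for batch in batches]


if __name__ == "__main__":
    stub = StubController()
    results = process(stub, [(["a.csv", "b.csv"], 2)])
    print(results, stub.closed)
```

The same process function should work with any object that provides these two methods, under the assumption that close() is safe to call once after the last conversion.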

convert_to_dataframe(file_object_batch)

Convert a batch of file objects to a DataFrame.

Parameters:
file_object_batch : typing.Tuple[fsspec.core.OpenFiles, int]

A batch of file objects and batch count.

Returns:
pd.DataFrame

The resulting DataFrame.