morpheus.controllers.file_to_df_controller.FileToDFController
- class FileToDFController(schema, filter_null, file_type, parser_kwargs, cache_dir, timestamp_column_name, download_method=DownloadMethods.DASK_THREAD)
Bases: object
Controller class for converting file objects to Pandas DataFrames with optional preprocessing.
- Parameters
- schema : DataFrameInputSchema
A schema defining how to process the data.
- filter_null : bool
Flag to indicate whether to filter out null values.
- file_type : FileTypes
The type of the file being processed (e.g., CSV, Parquet).
- parser_kwargs : dict
Additional keyword arguments to pass to the file parser.
- cache_dir : str
Directory where cache will be stored.
- timestamp_column_name : str
Name of the timestamp column.
- download_method : typing.Union[DownloadMethods, str], optional, default = DownloadMethods.DASK_THREAD
The download method to use. If the MORPHEUS_FILE_DOWNLOAD_TYPE environment variable is set, it takes precedence over this argument.
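The precedence rule for download_method can be sketched in plain Python. This is only an illustration of the documented behavior, not the library's implementation: the DownloadMethods enum below is a stand-in whose string values and SINGLE_THREAD member are assumptions, and resolve_download_method is a hypothetical helper name.

```python
import os
from enum import Enum


class DownloadMethods(Enum):
    """Stand-in enum for illustration; the string values are assumptions."""
    DASK_THREAD = "dask_thread"
    SINGLE_THREAD = "single_thread"  # hypothetical second member


def resolve_download_method(download_method=DownloadMethods.DASK_THREAD, environ=None):
    """Return the effective download method (hypothetical helper).

    If MORPHEUS_FILE_DOWNLOAD_TYPE is set, it overrides the argument;
    otherwise the argument is used, accepting an enum member or a string.
    """
    environ = os.environ if environ is None else environ
    override = environ.get("MORPHEUS_FILE_DOWNLOAD_TYPE")
    if override:
        return DownloadMethods(override.lower())
    if isinstance(download_method, str):
        return DownloadMethods(download_method.lower())
    return download_method
```

Passing environ explicitly keeps the sketch easy to test; with no argument it reads the real process environment.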
Methods
- close(): Close the resources used by the controller.
- convert_to_dataframe(file_object_batch): Convert a batch of file objects to a DataFrame.
- close()
Close the resources used by the controller.
- convert_to_dataframe(file_object_batch)
Convert a batch of file objects to a DataFrame.
- Parameters
- file_object_batch : typing.Tuple[fsspec.core.OpenFiles, int]
A batch of file objects and the batch count.
- Returns
- pd.DataFrame
The resulting DataFrame.
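Because the controller exposes close(), one reasonable pattern is to wrap it in contextlib.closing so resources are released even if a conversion raises. The sketch below uses a hypothetical StubController in place of FileToDFController so it runs without Morpheus installed; the stub returns plain dictionaries rather than a DataFrame, and process_batches is an illustrative helper, not part of the library.

```python
from contextlib import closing


class StubController:
    """Hypothetical stand-in with the same two methods as FileToDFController."""

    def __init__(self):
        self.closed = False

    def convert_to_dataframe(self, file_object_batch):
        # The batch is a (file objects, batch count) tuple, as documented above.
        files, batch_count = file_object_batch
        # A real controller returns a DataFrame; the stub returns row dicts.
        return [{"file": name, "batch": batch_count} for name in files]

    def close(self):
        self.closed = True


def process_batches(controller, batches):
    """Convert each batch, closing the controller even if a conversion fails."""
    with closing(controller):
        return [controller.convert_to_dataframe(batch) for batch in batches]


controller = StubController()
results = process_batches(controller, [(["a.csv", "b.csv"], 2)])
```

closing() only requires that the wrapped object has a close() method, so the same helper should accept a real controller instance unchanged.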