File to DataFrame Module
This module receives batches of files from the “FileBatcher” module and loads their contents into a DataFrame. Besides reading from local disk, it can also load file content from S3 buckets.
Parameter | Type | Description | Example Value | Default Value |
---|---|---|---|---|
cache_dir | string | Directory to cache the rolling window data | "/path/to/cache" | - |
file_type | string | Type of the input file | "csv" | "JSON" |
filter_null | boolean | Whether to filter out null values | true | false |
parser_kwargs | dictionary | Keyword arguments to pass to the parser | {"delimiter": ","} | - |
schema | dictionary | Schema of the input data | See Below | - |
timestamp_column_name | string | Name of the timestamp column | "timestamp" | - |
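The documented defaults and types can be checked with a small helper before a configuration is handed to the module. The sketch below is illustrative only: `apply_defaults`, `DEFAULTS`, and `EXPECTED_TYPES` are hypothetical names that are not part of the module, and it assumes that a `-` in the Default Value column means no default is defined.

```python
# Hypothetical helper, not part of the module.
# Only `file_type` and `filter_null` have documented defaults.
DEFAULTS = {"file_type": "JSON", "filter_null": False}

# Python type assumed for each documented parameter.
EXPECTED_TYPES = {
    "cache_dir": str,
    "file_type": str,
    "filter_null": bool,
    "parser_kwargs": dict,
    "schema": dict,
    "timestamp_column_name": str,
}


def apply_defaults(config: dict) -> dict:
    """Return a copy of `config` with the documented defaults filled in."""
    merged = {**DEFAULTS, **config}
    for name, value in merged.items():
        expected = EXPECTED_TYPES.get(name)
        # Parameters not listed in the table are passed through unchecked.
        if expected is not None and not isinstance(value, expected):
            raise TypeError(f"{name} must be of type {expected.__name__}")
    return merged


config = apply_defaults({"cache_dir": "/path/to/cache", "file_type": "csv"})
print(config)
```

Values supplied by the caller take precedence over the defaults, so `file_type` stays `"csv"` here while `filter_null` falls back to `False`.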
Example JSON configuration:

{
  "cache_dir": "/path/to/cache",
  "file_type": "csv",
  "filter_null": true,
  "parser_kwargs": {
    "delimiter": ","
  },
  "schema": {
    "schema_str": "string",
    "encoding": "latin1"
  },
  "timestamp_column_name": "timestamp"
}
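As a rough illustration of how these settings might interact, the stdlib-only sketch below parses the example configuration and applies it to a small in-memory CSV. It is not the module's implementation: the real module produces a DataFrame, `load_rows` is a hypothetical name, and the assumption that `filter_null` drops rows with an empty timestamp field is made here purely for illustration.

```python
import csv
import io
import json

CONFIG_JSON = """
{
  "cache_dir": "/path/to/cache",
  "file_type": "csv",
  "filter_null": true,
  "parser_kwargs": {"delimiter": ","},
  "schema": {"schema_str": "string", "encoding": "latin1"},
  "timestamp_column_name": "timestamp"
}
"""


def load_rows(text: str, config: dict) -> list:
    """Hypothetical stand-in for the loader: returns rows as dicts."""
    if config.get("file_type", "JSON").lower() != "csv":
        raise ValueError("this sketch only handles csv input")
    # parser_kwargs are forwarded to the parser unchanged.
    reader = csv.DictReader(io.StringIO(text), **config.get("parser_kwargs", {}))
    rows = list(reader)
    if config.get("filter_null", False):
        # Assumption: a "null" is an empty value in the timestamp column.
        ts = config["timestamp_column_name"]
        rows = [row for row in rows if row.get(ts)]
    return rows


config = json.loads(CONFIG_JSON)
sample = "timestamp,user\n2023-01-01T00:00:00,alice\n,bob\n"
rows = load_rows(sample, config)
print(rows)
```

With `filter_null` set to `true`, the second sample row is dropped because its timestamp field is empty; setting it to `false` would keep both rows.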