File to DataFrame Module
This module receives input from the FileBatcher module and reads the batched files into a DataFrame. In addition to loading data from local disk, it can load file content from S3 buckets.
Configurable Parameters
| Parameter | Type | Description | Example Value | Default Value |
|---|---|---|---|---|
| `cache_dir` | string | Directory to cache the rolling window data | `"/path/to/cache"` | |
| `file_type` | string | Type of the input file | `"csv"` | |
| `filter_null` | boolean | Whether to filter out null values | `true` | |
| `parser_kwargs` | dictionary | Keyword arguments to pass to the parser | `{"delimiter": ","}` | |
| `schema` | dictionary | Schema of the input data | See below | |
| `timestamp_column_name` | string | Name of the timestamp column | `"timestamp"` | |
Example JSON Configuration
{
    "cache_dir": "/path/to/cache",
    "file_type": "csv",
    "filter_null": true,
    "parser_kwargs": {
        "delimiter": ","
    },
    "schema": {
        "schema_str": "string",
        "encoding": "latin1"
    },
    "timestamp_column_name": "timestamp"
}
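
Before passing a configuration like the one above to the module, it can help to check that each value has the type listed in the parameter table. The following is a minimal sketch using only the Python standard library. The `EXPECTED_TYPES` mapping and the `validate_config` helper are hypothetical names introduced for illustration and are not part of the module; the module itself may validate its configuration differently.

```python
import json

# Expected Python type for each parameter, taken from the "Type" column
# of the parameter table. This mapping is an illustration, not module code.
EXPECTED_TYPES = {
    "cache_dir": str,
    "file_type": str,
    "filter_null": bool,
    "parser_kwargs": dict,
    "schema": dict,
    "timestamp_column_name": str,
}


def validate_config(config):
    """Return a list of problems found in `config` (hypothetical helper)."""
    problems = []
    for key, value in config.items():
        expected = EXPECTED_TYPES.get(key)
        if expected is None:
            problems.append(f"unknown parameter: {key}")
        elif not isinstance(value, expected):
            problems.append(
                f"{key}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems


# The example configuration from this page, as JSON text.
CONFIG_TEXT = """
{
    "cache_dir": "/path/to/cache",
    "file_type": "csv",
    "filter_null": true,
    "parser_kwargs": {"delimiter": ","},
    "schema": {"schema_str": "string", "encoding": "latin1"},
    "timestamp_column_name": "timestamp"
}
"""

config = json.loads(CONFIG_TEXT)
problems = validate_config(config)
print(problems if problems else "configuration looks well-typed")
```

A check like this catches common mistakes early, such as writing `"filter_null": "true"` (a string) where a JSON boolean is expected.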