This module receives batches of files from the `FileBatcher` module and reads their contents into a DataFrame. Besides loading files from local disk, it can also load file content from S3 buckets.
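The batch-to-DataFrame flow described above can be sketched with pandas. This is a minimal illustration, not the module's implementation: the function `batch_to_df` and the `_READERS` mapping are hypothetical, and `filter_null` is assumed here to mean "drop rows containing any null value", which may not match the module's exact rule. pandas can read `s3://` paths directly when the optional `s3fs` package is installed, which mirrors the local-disk-or-S3 behavior.

```python
import pandas as pd

# Hypothetical mapping from the `file_type` option to a pandas reader function.
_READERS = {
    "csv": pd.read_csv,
    "json": pd.read_json,
    "parquet": pd.read_parquet,
}


def batch_to_df(paths, file_type="csv", parser_kwargs=None, filter_null=False):
    """Read every file in one batch and concatenate them into a single DataFrame.

    Each path may be local or an s3:// URL (the latter needs s3fs installed).
    """
    reader = _READERS[file_type.lower()]
    kwargs = parser_kwargs or {}
    # Parse each file with the same reader and keyword arguments.
    frames = [reader(path, **kwargs) for path in paths]
    df = pd.concat(frames, ignore_index=True)
    if filter_null:
        # Assumed interpretation: drop any row that contains a null value.
        df = df.dropna()
    return df
```

For example, `batch_to_df(["a.csv", "s3://bucket/b.csv"], parser_kwargs={"delimiter": ","})` would return one DataFrame holding the rows of both files (bucket and file names are placeholders).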
| Parameter | Type | Description | Example Value | Default Value |
|---|---|---|---|---|
| `cache_dir` | string | Directory in which to cache the rolling window data | `"/path/to/cache"` | |
| `file_type` | string | Type of the input file | `"csv"` | |
| `filter_null` | boolean | Whether to filter out null values | `true` | |
| `parser_kwargs` | dictionary | Keyword arguments to pass to the parser | `{"delimiter": ","}` | |
| `schema` | dictionary | Schema of the input data | See below | |
| `timestamp_column_name` | string | Name of the timestamp column | `"timestamp"` | |
Example JSON configuration:

```json
{
  "cache_dir": "/path/to/cache",
  "file_type": "csv",
  "filter_null": true,
  "parser_kwargs": {
    "delimiter": ","
  },
  "schema": {
    "schema_str": "string",
    "encoding": "latin1"
  },
  "timestamp_column_name": "timestamp"
}
```
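The `schema` entry pairs a `schema_str` with an `encoding`. One plausible reading, assumed here rather than confirmed by this page, is that `schema_str` holds a pickled schema object stored as text, and `encoding` names the codec used to turn that text back into bytes. `latin1` suits this because it maps each of the 256 byte values to exactly one character, so any byte string round-trips losslessly. The helpers `encode_schema` and `decode_schema` below are hypothetical.

```python
import pickle


def encode_schema(schema_obj, encoding="latin1"):
    """Serialize an object into a str that can be stored in JSON as schema_str."""
    # latin1 decodes any byte sequence, so the pickle bytes are preserved exactly.
    return pickle.dumps(schema_obj).decode(encoding)


def decode_schema(schema_cfg):
    """Rebuild the object from a {"schema_str": ..., "encoding": ...} dict."""
    raw = schema_cfg["schema_str"].encode(schema_cfg["encoding"])
    # Only unpickle data from trusted sources: pickle can execute arbitrary code.
    return pickle.loads(raw)
```

Under this assumption, the placeholder `"string"` in the example above would be replaced by the output of `encode_schema(...)` for the real schema object.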