File to DataFrame Module

After receiving input from the FileBatcher module, this module reads the batched files into a DataFrame. It can load file content from local disk as well as from S3 buckets.

Configurable Parameters

| Parameter | Type | Description | Example Value | Default Value |
|---|---|---|---|---|
| cache_dir | string | Directory to cache the rolling window data | "/path/to/cache" | - |
| file_type | string | Type of the input file | "csv" | "JSON" |
| filter_null | boolean | Whether to filter out null values | True | False |
| parser_kwargs | dictionary | Keyword arguments to pass to the parser | {"delimiter": ","} | - |
| schema | dictionary | Schema of the input data | Refer Below | - |
| timestamp_column_name | string | Name of the timestamp column | "timestamp" | - |

Example JSON Configuration

{
  "cache_dir": "/path/to/cache",
  "file_type": "csv",
  "filter_null": true,
  "parser_kwargs": {
    "delimiter": ","
  },
  "schema": {
    "schema_str": "string",
    "encoding": "latin1"
  },
  "timestamp_column_name": "timestamp"
}
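
As a rough illustration of how the parameters above fit together, the following Python sketch parses a JSON configuration, fills in the two defaults listed in the parameter table, and checks that each value has the documented type. The names `load_module_config`, `DOCUMENTED_DEFAULTS`, and `EXPECTED_TYPES` are hypothetical helpers written for this example; they are not part of the module's API, and the module may validate its configuration differently.

```python
import json

# Defaults copied from the parameter table; parameters listed with "-"
# have no documented default and are therefore left out here.
DOCUMENTED_DEFAULTS = {"file_type": "JSON", "filter_null": False}

# Expected Python type for each documented parameter (hypothetical mapping
# derived from the "Type" column of the table).
EXPECTED_TYPES = {
    "cache_dir": str,
    "file_type": str,
    "filter_null": bool,
    "parser_kwargs": dict,
    "schema": dict,
    "timestamp_column_name": str,
}


def load_module_config(raw_json: str) -> dict:
    """Parse a JSON config, apply documented defaults, and check value types."""
    # Values supplied in the JSON override the documented defaults.
    config = {**DOCUMENTED_DEFAULTS, **json.loads(raw_json)}
    for name, value in config.items():
        expected = EXPECTED_TYPES.get(name)
        if expected is None:
            raise KeyError(f"Unknown parameter: {name}")
        if not isinstance(value, expected):
            raise TypeError(
                f"{name} must be {expected.__name__}, got {type(value).__name__}"
            )
    return config


# The example configuration from this page.
EXAMPLE_JSON = """
{
  "cache_dir": "/path/to/cache",
  "file_type": "csv",
  "filter_null": true,
  "parser_kwargs": {"delimiter": ","},
  "schema": {"schema_str": "string", "encoding": "latin1"},
  "timestamp_column_name": "timestamp"
}
"""

config = load_module_config(EXAMPLE_JSON)
```

Checking the configuration before it reaches the module surfaces mistakes such as a string where a boolean is expected at load time. An empty configuration (`"{}"`) yields only the two documented defaults under this sketch.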