File to DataFrame Module

This module reads batched files into a DataFrame after receiving input from the "FileBatcher" module. In addition to loading files from disk, it can load file content from S3 buckets.

| Parameter | Type | Description | Example Value | Default Value |
| --- | --- | --- | --- | --- |
| cache_dir | string | Directory to cache the rolling window data | "/path/to/cache" | - |
| file_type | string | Type of the input file | "csv" | "JSON" |
| filter_null | boolean | Whether to filter out null values | true | false |
| parser_kwargs | dictionary | Keyword arguments to pass to the parser | {"delimiter": ","} | - |
| schema | dictionary | Schema of the input data | See below | - |
| timestamp_column_name | string | Name of the timestamp column | "timestamp" | - |
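
The defaults listed in the table can be applied to a partial configuration before it is passed on. The following is a minimal sketch in plain Python, not Morpheus code: `DEFAULTS` and `with_defaults` are hypothetical names, and treating the `-` entries as "no default" is an assumption.

```python
# Hypothetical helper, not part of Morpheus: fills in the defaults from the table.
DEFAULTS = {
    "file_type": "JSON",   # default listed in the table
    "filter_null": False,  # default listed in the table
}


def with_defaults(config: dict) -> dict:
    """Return a copy of config with the documented defaults filled in."""
    merged = dict(DEFAULTS)
    merged.update(config)  # explicitly supplied values win over defaults
    return merged


partial = {"cache_dir": "/path/to/cache", "timestamp_column_name": "timestamp"}
full = with_defaults(partial)
print(full["file_type"], full["filter_null"])
```

Parameters without a documented default (such as `cache_dir`) are left untouched, so a missing value stays missing rather than being guessed.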

Example JSON configuration:

{
  "cache_dir": "/path/to/cache",
  "file_type": "csv",
  "filter_null": true,
  "parser_kwargs": {
    "delimiter": ","
  },
  "schema": {
    "schema_str": "string",
    "encoding": "latin1"
  },
  "timestamp_column_name": "timestamp"
}
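
To illustrate how `file_type`, `parser_kwargs`, `filter_null`, and `timestamp_column_name` could work together, here is a rough sketch that uses only the Python standard library. It is not the module's implementation: `read_rows` is a hypothetical function, a list of row dictionaries stands in for the DataFrame, and reading "null" as "empty timestamp field" is an assumption.

```python
import csv
import io
import json

# A subset of the configuration above, parsed from JSON text.
config = json.loads("""
{
  "file_type": "csv",
  "filter_null": true,
  "parser_kwargs": {"delimiter": ","},
  "timestamp_column_name": "timestamp"
}
""")


def read_rows(text: str, config: dict) -> list[dict]:
    """Parse CSV text into a list of row dicts (a stand-in for a DataFrame)."""
    if config["file_type"].lower() != "csv":
        raise ValueError("this sketch only handles csv input")
    # parser_kwargs is forwarded unchanged to the parser.
    reader = csv.DictReader(io.StringIO(text), **config.get("parser_kwargs", {}))
    rows = list(reader)
    if config.get("filter_null", False):
        # Assumption: a "null" row is one whose timestamp field is empty.
        ts = config["timestamp_column_name"]
        rows = [row for row in rows if row[ts]]
    return rows


sample = "timestamp,user\n2024-02-02T00:00:00Z,alice\n,bob\n"
rows = read_rows(sample, config)
print(rows)
```

With `filter_null` set to true, the second sample row is dropped because its timestamp field is empty; with it set to false, both rows would be kept.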

© Copyright 2023, NVIDIA. Last updated on Feb 2, 2024.