File to DataFrame Module

This module receives batched file input from the "FileBatcher" module and reads the file contents into a DataFrame. It can load files from local disk as well as from S3 buckets.

| Parameter | Type | Description | Example Value | Default Value |
| --- | --- | --- | --- | --- |
| cache_dir | string | Directory to cache the rolling window data | "/path/to/cache" | - |
| file_type | string | Type of the input file | "csv" | "JSON" |
| filter_null | boolean | Whether to filter out null values | true | false |
| parser_kwargs | dictionary | Keyword arguments to pass to the parser | {"delimiter": ","} | - |
| schema | dictionary | Schema of the input data | See below | - |
| timestamp_column_name | string | Name of the timestamp column | "timestamp" | - |
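To illustrate how `file_type`, `parser_kwargs`, and `filter_null` might interact, here is a minimal sketch using only the Python standard library. It is not the module's implementation: the helper name `load_records` is hypothetical, the JSON branch assumes JSON Lines input, and treating `filter_null` as "drop any row with a missing value" is an assumption about its semantics.

```python
import csv
import io
import json


def load_records(text, file_type="JSON", parser_kwargs=None, filter_null=False):
    """Hypothetical helper: parse file content into a list of dict records."""
    parser_kwargs = parser_kwargs or {}
    kind = file_type.lower()

    if kind == "csv":
        # parser_kwargs such as {"delimiter": ","} are forwarded unchanged.
        records = list(csv.DictReader(io.StringIO(text), **parser_kwargs))
    elif kind == "json":
        # Assumption: JSON Lines input, one object per non-empty line.
        records = [
            json.loads(line, **parser_kwargs)
            for line in text.splitlines()
            if line.strip()
        ]
    else:
        raise ValueError(f"Unsupported file_type: {file_type!r}")

    if filter_null:
        # Assumption: a row is dropped if any field is None or empty.
        records = [
            row for row in records
            if all(value not in (None, "") for value in row.values())
        ]
    return records
```

For example, `load_records("a;b\n1;2\n3;\n", file_type="csv", parser_kwargs={"delimiter": ";"}, filter_null=True)` keeps only the first data row, because the second row has an empty `b` field.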


{
  "cache_dir": "/path/to/cache",
  "file_type": "csv",
  "filter_null": true,
  "parser_kwargs": {
    "delimiter": ","
  },
  "schema": {
    "schema_str": "string",
    "encoding": "latin1"
  },
  "timestamp_column_name": "timestamp"
}
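A configuration like the one above could be checked before it is handed to the module. The following is a rough sketch, not part of the module's API: `resolve_config`, `DEFAULTS`, and `EXPECTED_TYPES` are hypothetical names, and the only defaults applied are the two listed in the parameter table (`file_type` and `filter_null`).

```python
import json

# Defaults taken from the parameter table; other parameters list no default.
DEFAULTS = {"file_type": "JSON", "filter_null": False}

# Expected Python types for each documented parameter.
EXPECTED_TYPES = {
    "cache_dir": str,
    "file_type": str,
    "filter_null": bool,
    "parser_kwargs": dict,
    "schema": dict,
    "timestamp_column_name": str,
}


def resolve_config(raw_json):
    """Hypothetical helper: parse a JSON config, apply defaults, check types."""
    config = {**DEFAULTS, **json.loads(raw_json)}
    for key, expected in EXPECTED_TYPES.items():
        if key in config and not isinstance(config[key], expected):
            raise TypeError(
                f"{key} must be {expected.__name__}, "
                f"got {type(config[key]).__name__}"
            )
    return config
```

Calling `resolve_config('{"cache_dir": "/path/to/cache"}')` returns a dictionary in which `file_type` is `"JSON"` and `filter_null` is `False`, while a value such as `"filter_null": "true"` (a string rather than a boolean) raises a `TypeError`.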

© Copyright 2023, NVIDIA. Last updated on Apr 11, 2023.