morpheus.io.serializers

DataFrame serializers.

Functions

`df_to_csv`(df[, include_header, ...])	Serializes a DataFrame into CSV and returns the serialized output separated by lines.
`df_to_json`(df[, strip_newlines, ...])	Serializes a DataFrame into JSON and returns the serialized output separated by lines.
`df_to_parquet`(df[, strip_newlines])	Serializes a DataFrame into Parquet and returns the serialized output separated by lines.
`df_to_stream_csv`(df, stream[, ...])	Serializes a DataFrame into CSV into the provided stream object.
`df_to_stream_json`(df, stream[, ...])	Serializes a DataFrame into JSON into the provided stream object.
`df_to_stream_parquet`(df, stream)	Serializes a DataFrame into Parquet format into the provided stream object.
`write_df_to_file`(df, file_name[, file_type])	Writes the provided DataFrame into the file specified using the specified format.

df_to_csv(df, include_header=False, strip_newlines=False, include_index_col=True)

Serializes a DataFrame into CSV and returns the serialized output separated by lines.

Parameters

df: DataFrameType: Input DataFrame to serialize.
include_header: bool, optional: Whether or not to include the header, by default False.
strip_newlines: bool, optional: Whether or not to strip the newline characters from each string, by default False.
include_index_col: bool, optional: Write out the index as a column, by default True.

Returns

typing.List[str]: List of strings for each line.

df_to_json(df, strip_newlines=False, include_index_col=True)

Serializes a DataFrame into JSON and returns the serialized output separated by lines.

Parameters

df: DataFrameType: Input DataFrame to serialize.
strip_newlines: bool, optional: Whether or not to strip the newline characters from each string, by default False.
include_index_col: bool, optional: Write out the index as a column, by default True. Note: this value is currently ignored due to a known issue in pandas: https://github.com/pandas-dev/pandas/issues/37600

Returns

typing.List[str]: List of strings for each line.
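
The "one string per line" return shape can be illustrated with plain pandas. This is a minimal sketch, not the Morpheus implementation: `json_lines_sketch` is a hypothetical helper, and it assumes the JSON is produced in lines format with one record per row. With `orient="records"`, pandas leaves the index out of each record, so the sketch has no index option.

```python
import io
import json
import typing

import pandas as pd


def json_lines_sketch(df: pd.DataFrame, strip_newlines: bool = False) -> typing.List[str]:
    """Hypothetical stand-in for df_to_json, written against plain pandas."""
    buffer = io.StringIO()
    # orient="records" with lines=True emits one JSON object per row.
    df.to_json(buffer, orient="records", lines=True)
    buffer.seek(0)
    lines = buffer.readlines()
    if strip_newlines:
        # Drop the trailing newline that readlines() keeps on each entry.
        lines = [line.rstrip("\n") for line in lines]
    return lines


df = pd.DataFrame({"user": ["alice", "bob"], "score": [1, 2]})
json_lines = json_lines_sketch(df, strip_newlines=True)

# Each entry is a standalone JSON document and can be parsed on its own.
records = [json.loads(line) for line in json_lines]
```

Because every list entry is a complete JSON object, callers can forward the lines individually, for example to a message queue or a line-oriented log.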

df_to_parquet(df, strip_newlines=False)

Serializes a DataFrame into Parquet and returns the serialized output separated by lines.

Parameters

df: DataFrameType: Input DataFrame to serialize.
strip_newlines: bool, optional: Whether or not to strip the newline characters from each string, by default False.

Returns

typing.List[str]: List of strings for each line.

df_to_stream_csv(df, stream, include_header=False, include_index_col=True)

Serializes a DataFrame into CSV into the provided stream object.

Parameters

df: DataFrameType: Input DataFrame to serialize.
stream: IOBase: The stream to which the serialized DataFrame is written.
include_header: bool, optional: Whether or not to include the header, by default False.
include_index_col: bool, optional: Write out the index as a column, by default True.
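
To show how the two flags shape the output, here is a minimal sketch using plain pandas and an in-memory text stream. `stream_csv_sketch` is a hypothetical stand-in, not the Morpheus function; it assumes `include_header` and `include_index_col` correspond to the `header` and `index` arguments of `DataFrame.to_csv`.

```python
import io

import pandas as pd


def stream_csv_sketch(df, stream, include_header=False, include_index_col=True):
    """Hypothetical stand-in for df_to_stream_csv using plain pandas."""
    # Assumption: the two flags map directly onto to_csv's header/index arguments.
    df.to_csv(stream, header=include_header, index=include_index_col)


df = pd.DataFrame({"user": ["alice", "bob"], "score": [1, 2]})

# Any writable text stream works; StringIO keeps the example in memory.
buffer = io.StringIO()
stream_csv_sketch(df, buffer, include_header=True, include_index_col=False)

csv_lines = buffer.getvalue().splitlines()
```

Here `csv_lines` holds the header row followed by one row per record. Leaving `include_index_col` at its default of True would add the index as the first column.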

df_to_stream_json(df, stream, include_index_col=True, lines=True)

Serializes a DataFrame into JSON into the provided stream object.

Parameters

df: DataFrameType: Input DataFrame to serialize.
stream: IOBase: The stream to which the serialized DataFrame is written.
include_index_col: bool, optional: Write out the index as a column, by default True.
lines: bool, optional: Write out the JSON in lines format, by default True.

df_to_stream_parquet(df, stream)

Serializes a DataFrame into Parquet format into the provided stream object.

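Parameters

df: DataFrameType: Input DataFrame to serialize.
stream: IOBase: The stream to which the serialized DataFrame is written.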

write_df_to_file(df, file_name, file_type=<FileTypes.Auto: 0>, **kwargs)

Writes the provided DataFrame into the file specified using the specified format.

Parameters

df: DataFrameType: The DataFrame to serialize.
file_name: str: The location to store the DataFrame.
file_type: FileTypes, optional: The type of serialization to use. By default this is FileTypes.Auto, which determines the type from the filename extension.
**kwargs: dict: Additional arguments forwarded to the underlying serialization function, which is one of write_df_to_file_cpp, df_to_stream_csv, or df_to_stream_json.
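
The extension-based resolution described for FileTypes.Auto can be sketched as follows. Everything in this block is hypothetical: `SketchFileTypes`, `resolve_file_type`, the member values other than `Auto = 0`, and the extension table are assumptions made for illustration, not the Morpheus definitions.

```python
import enum
import os


class SketchFileTypes(enum.Enum):
    """Hypothetical stand-in for FileTypes; only Auto = 0 comes from the signature."""
    Auto = 0
    JSON = 1
    CSV = 2
    PARQUET = 3


# Assumed extension table; the real mapping may differ.
_EXTENSIONS = {
    ".json": SketchFileTypes.JSON,
    ".jsonlines": SketchFileTypes.JSON,
    ".csv": SketchFileTypes.CSV,
    ".parquet": SketchFileTypes.PARQUET,
}


def resolve_file_type(file_name, file_type=SketchFileTypes.Auto):
    """Return the explicit type, or infer one from the filename extension."""
    if file_type is not SketchFileTypes.Auto:
        # An explicit type always wins over the extension.
        return file_type
    extension = os.path.splitext(file_name)[1].lower()
    try:
        return _EXTENSIONS[extension]
    except KeyError:
        raise ValueError(f"Cannot infer a file type from {file_name!r}") from None


resolved = [resolve_file_type(name) for name in ("out.csv", "out.JSON", "out.parquet")]
explicit = resolve_file_type("out.dat", SketchFileTypes.CSV)
```

Passing an explicit type is useful when the filename carries no recognizable extension. In this sketch, an unknown extension combined with Auto raises a ValueError rather than guessing.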