morpheus.io.serializers

DataFrame serializers.

Functions

df_to_csv(df[, include_header, ...])

Serializes a DataFrame into CSV and returns the serialized output as a list of lines.

df_to_json(df[, strip_newlines, ...])

Serializes a DataFrame into JSON and returns the serialized output as a list of lines.

df_to_parquet(df[, strip_newlines, ...])

Serializes a DataFrame into Parquet and returns the serialized output as a list of lines.

df_to_stream_csv(df, stream[, ...])

Serializes a DataFrame into CSV and writes it to the provided stream object.

df_to_stream_json(df, stream[, ...])

Serializes a DataFrame into JSON and writes it to the provided stream object.

df_to_stream_parquet(df, stream[, ...])

Serializes a DataFrame into Parquet format and writes it to the provided stream object.

write_df_to_file(df, file_name[, file_type])

Writes the provided DataFrame to the specified file using the specified format.

df_to_csv(df, include_header=False, strip_newlines=False, include_index_col=True)

Serializes a DataFrame into CSV and returns the serialized output as a list of lines.

Parameters:
df: DataFrameType

Input DataFrame to serialize.

include_header: bool, optional

Whether or not to include the header, by default False.

strip_newlines: bool, optional

Whether or not to strip the newline characters from each string, by default False.

include_index_col: bool, optional

Write out the index as a column, by default True.

Returns:
typing.List[str]

List of strings, one for each line.
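The return shape described above can be illustrated with a minimal sketch that uses only the Python standard library. It is not the Morpheus implementation: `rows_to_csv_lines` is a hypothetical helper, and a list of dictionaries stands in for the DataFrame so that the example runs without extra dependencies.

```python
import csv
import io


def rows_to_csv_lines(rows, include_header=False, strip_newlines=False):
    """Hypothetical stand-in: serialize rows to CSV and return one string per line."""
    buf = io.StringIO()
    # lineterminator="\n" avoids the csv module's default "\r\n" line ending.
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    if include_header:
        writer.writeheader()
    writer.writerows(rows)

    # Split the buffer into lines; each line keeps its trailing newline.
    buf.seek(0)
    lines = buf.readlines()
    if strip_newlines:
        lines = [line.rstrip("\n") for line in lines]
    return lines


# Stand-in data; a real call would pass a DataFrame instead.
rows = [{"name": "a", "value": 1}, {"name": "b", "value": 2}]
default_lines = rows_to_csv_lines(rows)
clean_lines = rows_to_csv_lines(rows, include_header=True, strip_newlines=True)
```

With the defaults, each returned string still ends in a newline and no header line is present. Setting both flags yields a header line followed by the data lines with the newlines removed.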

df_to_json(df, strip_newlines=False, include_index_col=True)

Serializes a DataFrame into JSON and returns the serialized output as a list of lines.

Parameters:
df: DataFrameType

Input DataFrame to serialize.

strip_newlines: bool, optional

Whether or not to strip the newline characters from each string, by default False.

include_index_col: bool, optional

Write out the index as a column, by default True. Note: This value is currently being ignored due to a known issue in Pandas: pandas-dev/pandas#37600

Returns:
typing.List[str]

List of strings for each line.

df_to_parquet(df, strip_newlines=False, include_index_col=True)

Serializes a DataFrame into Parquet and returns the serialized output as a list of lines.

Parameters:
df: DataFrameType

Input DataFrame to serialize.

strip_newlines: bool, default False

Whether or not to strip the newline characters from each string, by default False.

include_index_col: bool, default True

Write out the index as a column, by default True.

Returns:
typing.List[str]

List of strings for each line.

df_to_stream_csv(df, stream, include_header=False, include_index_col=True)

Serializes a DataFrame into CSV and writes it to the provided stream object.

Parameters:
df: DataFrameType

Input DataFrame to serialize.

stream: IOBase

The stream to which the serialized DataFrame will be written.

include_header: bool, optional

Whether or not to include the header, by default False.

include_index_col: bool, optional

Write out the index as a column, by default True.
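The effect of `include_header` and `include_index_col` on stream output can be sketched with the standard library alone. This is an illustration under stated assumptions, not the Morpheus code: `rows_to_stream_csv` is a hypothetical helper, a list of dictionaries stands in for the DataFrame, the row position stands in for the index, and the empty header cell for the index column is an assumption about layout.

```python
import csv
import io


def rows_to_stream_csv(rows, stream, include_header=False, include_index_col=True):
    """Hypothetical stand-in: write rows as CSV into a caller-supplied text stream."""
    columns = list(rows[0].keys())
    writer = csv.writer(stream, lineterminator="\n")
    if include_header:
        # Assumption: the index column gets an empty header cell.
        writer.writerow(([""] if include_index_col else []) + columns)
    for index, row in enumerate(rows):
        values = [row[column] for column in columns]
        # Prepend the row position when the index column is requested.
        writer.writerow(([index] if include_index_col else []) + values)


# Stand-in data; a real call would pass a DataFrame instead.
rows = [{"name": "a", "value": 1}, {"name": "b", "value": 2}]

indexed_stream = io.StringIO()
rows_to_stream_csv(rows, indexed_stream, include_header=True)
with_index = indexed_stream.getvalue()

plain_stream = io.StringIO()
rows_to_stream_csv(rows, plain_stream, include_index_col=False)
without_index = plain_stream.getvalue()
```

Because the caller owns the stream, the same pattern applies to an open file or any other writable text stream in place of `io.StringIO`.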

df_to_stream_json(df, stream, include_index_col=True, lines=True)

Serializes a DataFrame into JSON and writes it to the provided stream object.

Parameters:
df: DataFrameType

Input DataFrame to serialize.

stream: IOBase

The stream to which the serialized DataFrame will be written.

include_index_col: bool, optional

Write out the index as a column, by default True.

lines: bool, optional

Write out the JSON in lines format, by default True.
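The difference that the `lines` flag makes can be sketched as follows. This is a standard-library illustration only: `records_to_stream_json` is a hypothetical helper, a list of dictionaries stands in for the DataFrame, and the assumption that the non-lines form is a single JSON array of records is not confirmed by this page.

```python
import io
import json


def records_to_stream_json(records, stream, lines=True):
    """Hypothetical stand-in: write records as JSON Lines or as one JSON array."""
    if lines:
        # Lines format: one self-contained JSON object per line.
        for record in records:
            stream.write(json.dumps(record) + "\n")
    else:
        # Assumed alternative: a single JSON array holding every record.
        stream.write(json.dumps(records))


# Stand-in data; a real call would pass a DataFrame instead.
records = [{"name": "a", "value": 1}, {"name": "b", "value": 2}]

lines_stream = io.StringIO()
records_to_stream_json(records, lines_stream, lines=True)
lines_output = lines_stream.getvalue()

array_stream = io.StringIO()
records_to_stream_json(records, array_stream, lines=False)
array_output = array_stream.getvalue()
```

In the lines form a consumer can parse the output one line at a time, whereas the array form has to be parsed as a whole document.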

df_to_stream_parquet(df, stream, include_index_col=True)

Serializes a DataFrame into Parquet format and writes it to the provided stream object.

Parameters:
df: DataFrameType

Input DataFrame to serialize.

stream: IOBase

The stream to which the serialized DataFrame will be written.

include_index_col: bool, default True

Write out the index as a column.

write_df_to_file(df, file_name, file_type=<FileTypes.Auto: 0>, **kwargs)

Writes the provided DataFrame to the specified file using the specified format.

Parameters:
df: DataFrameType

The DataFrame to serialize.

file_name: str

The location to store the DataFrame.

file_type: FileTypes, optional

The type of serialization to use. By default this is FileTypes.Auto, which determines the type from the filename extension.

**kwargs: dict

Additional arguments forwarded to the underlying serialization function, which is one of write_df_to_file_cpp, df_to_stream_csv, or df_to_stream_json.
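The extension-based selection described for FileTypes.Auto can be sketched roughly as below. Everything here is illustrative: `FileKind` is a hypothetical stand-in for FileTypes (only the Auto member with value 0 appears on this page), and the extension mapping and error behaviour are assumptions rather than the library's actual rules.

```python
import os
from enum import Enum


class FileKind(Enum):
    """Hypothetical stand-in for FileTypes; members other than AUTO are assumed."""
    AUTO = 0
    CSV = 1
    JSON = 2
    PARQUET = 3


# Assumed extension mapping, for illustration only.
_EXTENSION_TO_KIND = {
    ".csv": FileKind.CSV,
    ".json": FileKind.JSON,
    ".parquet": FileKind.PARQUET,
}


def resolve_file_kind(file_name, file_kind=FileKind.AUTO):
    """Return the explicit kind, or infer it from the extension when AUTO is given."""
    if file_kind is not FileKind.AUTO:
        return file_kind
    extension = os.path.splitext(file_name)[1].lower()
    try:
        return _EXTENSION_TO_KIND[extension]
    except KeyError:
        raise ValueError(f"Cannot infer a file type from extension {extension!r}") from None


# The extension decides when no explicit kind is given.
inferred = resolve_file_kind("out/results.CSV")
# An explicit kind takes precedence over the extension.
explicit = resolve_file_kind("results.dat", FileKind.JSON)
```

Passing an explicit type is the safer choice when the file name carries no extension or an unconventional one, since nothing has to be inferred.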