stages.text.io.writer.parquet

Module Contents

Classes

ParquetWriter

Writer that writes a DocumentBatch to a Parquet file using pandas.

API

class stages.text.io.writer.parquet.ParquetWriter

Bases: stages.text.io.writer.base.BaseWriter

Writer that writes a DocumentBatch to a Parquet file using pandas.

file_extension: str

'parquet'

write_data(
task: nemo_curator.tasks.DocumentBatch,
file_path: str,
) -> None

Write data to a Parquet file using pandas DataFrame.to_parquet.

write_kwargs: dict[str, Any]

'field(…)'
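
The relationship between write_kwargs and write_data can be illustrated with a minimal sketch. This is not the library's implementation: SketchParquetWriter is a hypothetical stand-in class, the to_pandas() conversion on the batch is an assumption, and the default_factory=dict default for write_kwargs is a guess at what the elided field(…) contains. The only behavior taken from the reference above is that the data ends up in a call to pandas DataFrame.to_parquet.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SketchParquetWriter:
    """Hypothetical stand-in for illustration; not the library's ParquetWriter."""

    # Assumption: extra options default to an empty dict.
    write_kwargs: dict[str, Any] = field(default_factory=dict)
    file_extension: str = "parquet"

    def write_data(self, task: Any, file_path: str) -> None:
        # Assumption: the batch exposes a to_pandas() conversion.
        df = task.to_pandas()
        # Forward the user-supplied options unchanged to DataFrame.to_parquet.
        df.to_parquet(file_path, **self.write_kwargs)
```

With pandas and a Parquet engine such as pyarrow installed, a call like SketchParquetWriter(write_kwargs={"compression": "snappy", "index": False}) would pass both options straight through to DataFrame.to_parquet; compression and index are existing parameters of that pandas method.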