nemo_curator.stages.interleaved.io.writers.tabular

Module Contents

Classes

Name | Description
InterleavedParquetWriterStage | Write interleaved rows to Parquet with optional binary materialization.

API

class nemo_curator.stages.interleaved.io.writers.tabular.InterleavedParquetWriterStage(
    path: str,
    file_extension: str = 'parquet',
    write_kwargs: dict[str, typing.Any] = dict(),
    materialize_on_write: bool = True,
    name: str = 'interleaved_parquet_writer',
    mode: typing.Literal['ignore', 'overwrite', 'append', 'error'] = 'ignore',
    append_mode_implemented: bool = False
)
Dataclass

Bases: BaseInterleavedWriter

Write interleaved rows to Parquet with optional binary materialization.
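
The signature above lists `write_kwargs` with a default of `dict()` on a dataclass. Python dataclasses reject a literal mutable default, so this is presumably rendered from a default factory. The sketch below illustrates that pattern with a hypothetical stand-in class, `WriterConfigSketch`, which only mirrors the documented field names and defaults. It is not the real `InterleavedParquetWriterStage`, and the paths and keyword values are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Literal


@dataclass
class WriterConfigSketch:
    """Hypothetical stand-in mirroring the documented fields; not the real class."""

    path: str
    file_extension: str = "parquet"
    # Assumption: the `dict()` default shown in the signature is modeled here
    # with a default factory, since dataclasses disallow a bare mutable default.
    write_kwargs: dict[str, Any] = field(default_factory=dict)
    materialize_on_write: bool = True
    name: str = "interleaved_parquet_writer"
    mode: Literal["ignore", "overwrite", "append", "error"] = "ignore"
    append_mode_implemented: bool = False


# Only `path` is required; every other field falls back to its default.
a = WriterConfigSketch(path="out/a")

# Illustrative overrides; the keyword values are placeholders.
b = WriterConfigSketch(
    path="out/b",
    write_kwargs={"compression": "zstd"},
    mode="overwrite",
)

# Each instance gets its own dict, so mutating one does not leak into another.
a.write_kwargs["index"] = False
c = WriterConfigSketch(path="out/c")
```

With a default factory, `c.write_kwargs` stays empty even after `a.write_kwargs` is mutated, which is the behavior a shared `dict()` default would not give.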

file_extension: str = 'parquet'
name: str = 'interleaved_parquet_writer'
nemo_curator.stages.interleaved.io.writers.tabular.InterleavedParquetWriterStage._write_dataframe(
    df: pandas.DataFrame,
    file_path: str,
    write_kwargs: dict[str, typing.Any]
) -> None