nemo_curator.stages.text.io.writer.base
Module Contents
Classes
API
Dataclass, Abstract
Bases: ProcessingStage[DocumentBatch, FileGroupTask]
Base class for all writer stages.
This abstract base class provides common functionality for writing DocumentBatch tasks to files, including file naming, metadata handling, and filesystem operations.
append_mode_implemented
fields
file_extension
mode
name
path
write_kwargs
Return the file extension for this writer format.
Process a DocumentBatch and write to files.
Parameters:
task: DocumentBatch containing data to write
Returns: FileGroupTask
Task containing paths to written files
abstract
Write data to file using format-specific implementation.
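The division of labor described above (a shared processing step that handles file naming and filesystem setup, plus an abstract, format-specific write step) can be illustrated with a minimal standalone sketch. It does not import or reproduce the NeMo Curator classes: `SketchBatch`, `SketchFileGroup`, `SketchWriter`, `SketchJsonlWriter`, and the method names `process` and `write_data` are hypothetical stand-ins chosen for illustration, and the file-naming scheme is an assumption.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SketchBatch:
    """Hypothetical stand-in for DocumentBatch: a task id plus records."""
    task_id: str
    records: list[dict[str, Any]]


@dataclass
class SketchFileGroup:
    """Hypothetical stand-in for FileGroupTask: paths to written files."""
    task_id: str
    paths: list[str]


@dataclass
class SketchWriter(ABC):
    """Hypothetical abstract writer; not the library's actual base class."""
    path: str
    file_extension: str = "txt"
    write_kwargs: dict[str, Any] = field(default_factory=dict)

    @abstractmethod
    def write_data(self, batch: SketchBatch, file_path: str) -> None:
        """Format-specific write, supplied by each subclass."""

    def process(self, batch: SketchBatch) -> SketchFileGroup:
        # Shared logic: create the output directory and derive a file name.
        # Naming files after the task id is an assumption of this sketch.
        os.makedirs(self.path, exist_ok=True)
        file_path = os.path.join(
            self.path, f"{batch.task_id}.{self.file_extension}"
        )
        self.write_data(batch, file_path)
        return SketchFileGroup(task_id=batch.task_id, paths=[file_path])


@dataclass
class SketchJsonlWriter(SketchWriter):
    """Concrete example: one JSON object per line."""
    file_extension: str = "jsonl"

    def write_data(self, batch: SketchBatch, file_path: str) -> None:
        with open(file_path, "w", encoding="utf-8") as f:
            for record in batch.records:
                f.write(json.dumps(record, **self.write_kwargs) + "\n")


# Usage: write one small batch into a temporary directory.
out_dir = tempfile.mkdtemp()
writer = SketchJsonlWriter(path=out_dir)
result = writer.process(
    SketchBatch(task_id="batch_0", records=[{"text": "hello"}])
)
```

In this sketch the base class cannot be instantiated directly because `write_data` is abstract; only subclasses that supply a format-specific implementation can be constructed, which mirrors the abstract marker on the write method above.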