nemo_curator.stages.text.modifiers.string.newline_normalizer


Module Contents

Classes

Name | Description
--- | ---
NewlineNormalizer | Replaces 3 or more consecutive newline characters with only 2 newline characters.

Data

THREE_OR_MORE_NEWLINES_REGEX

THREE_OR_MORE_WINDOWS_NEWLINES_REGEX

API

class nemo_curator.stages.text.modifiers.string.newline_normalizer.NewlineNormalizer()

Bases: DocumentModifier

Replaces 3 or more consecutive newline characters with only 2 newline characters.
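The behavior described above can be sketched with Python's standard `re` module. This is a minimal standalone sketch, not the library's implementation. The function name `normalize_newlines` is hypothetical. The sketch assumes that runs of three or more `\r\n` pairs collapse to `\r\n\r\n` and that runs of three or more `\n` collapse to `\n\n`, using pattern strings equivalent to the module's two regex constants.

```python
import re

# Same pattern strings as the module-level constants documented on this page.
THREE_OR_MORE_NEWLINES_REGEX = re.compile(r"(\n){3,}")
THREE_OR_MORE_WINDOWS_NEWLINES_REGEX = re.compile(r"(\r\n){3,}")


def normalize_newlines(text: str) -> str:
    """Hypothetical stand-in for NewlineNormalizer.modify_document.

    Assumption: CRLF runs collapse to two CRLF pairs and LF runs
    collapse to two LF characters.
    """
    # Collapse 3+ consecutive "\r\n" pairs into exactly two.
    text = THREE_OR_MORE_WINDOWS_NEWLINES_REGEX.sub("\r\n\r\n", text)
    # Collapse 3+ consecutive "\n" characters into exactly two.
    text = THREE_OR_MORE_NEWLINES_REGEX.sub("\n\n", text)
    return text


print(repr(normalize_newlines("para one\n\n\n\npara two")))
# 'para one\n\npara two'
print(repr(normalize_newlines("a\r\n\r\n\r\nb")))
# 'a\r\n\r\nb'
```

A run of exactly two newlines is below the `{3,}` threshold, so a single blank line between paragraphs is left unchanged.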

nemo_curator.stages.text.modifiers.string.newline_normalizer.NewlineNormalizer.modify_document(
text: str
) -> str

Applies the newline normalization described above to `text` and returns the modified string.

nemo_curator.stages.text.modifiers.string.newline_normalizer.THREE_OR_MORE_NEWLINES_REGEX = re.compile('(\\n){3,}')
nemo_curator.stages.text.modifiers.string.newline_normalizer.THREE_OR_MORE_WINDOWS_NEWLINES_REGEX = re.compile('(\\r\\n){3,}')
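Two separate patterns are needed because the LF-only pattern cannot see a Windows-style run of blank lines. In CRLF text each `\n` is separated from the next one by `\r`. The check below is a quick illustration. It compiles the same pattern strings locally rather than importing them from the library, and shows which pattern matches which input.

```python
import re

# Same pattern strings as the documented module constants.
THREE_OR_MORE_NEWLINES_REGEX = re.compile(r"(\n){3,}")
THREE_OR_MORE_WINDOWS_NEWLINES_REGEX = re.compile(r"(\r\n){3,}")

unix_text = "a\n\n\nb"
windows_text = "a\r\n\r\n\r\nb"

# The LF pattern needs three adjacent "\n" characters.
print(bool(THREE_OR_MORE_NEWLINES_REGEX.search(unix_text)))     # True
# In CRLF text every "\n" is preceded by "\r", so the LF pattern finds no run.
print(bool(THREE_OR_MORE_NEWLINES_REGEX.search(windows_text)))  # False
# group(0) is the whole matched run; the capture group holds only the last "\r\n".
match = THREE_OR_MORE_WINDOWS_NEWLINES_REGEX.search(windows_text)
print(match.group(0) == "\r\n\r\n\r\n")                         # True
# Two consecutive newlines are below the {3,} threshold.
print(bool(THREE_OR_MORE_NEWLINES_REGEX.search("a\n\nb")))      # False
```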