nemo_curator.stages.text.modifiers.string.c4
Module Contents
Classes
API
Bases: DocumentModifier
If a sentence contains any of the boilerplate strings, it is discarded. Boilerplate strings include phrases such as “terms of use” and “privacy policy”. Source: adapted significantly from Google's C4 processing.
_boilerplate_paragraph_indices
_name
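The following is a minimal sketch of the filtering idea described above, not the nemo_curator implementation. The `_boilerplate_paragraph_indices` attribute name suggests that the modifier records which paragraphs matched. The sketch therefore assumes that a document is split into newline-separated paragraphs and that matching is a case-insensitive substring test. The names `BOILERPLATE_STRINGS`, `find_boilerplate_paragraph_indices`, and `remove_boilerplate` are hypothetical, and the string list is limited to the two examples given above.

```python
# Hypothetical sketch of C4-style boilerplate removal; not the nemo_curator API.
# Assumption: only the two example phrases from the docstring are used here.
BOILERPLATE_STRINGS = ("terms of use", "privacy policy")


def find_boilerplate_paragraph_indices(text: str) -> list[int]:
    """Return indices of newline-separated paragraphs containing a boilerplate string."""
    paragraphs = text.split("\n")
    return [
        i
        for i, paragraph in enumerate(paragraphs)
        # Case-insensitive substring match against each boilerplate phrase.
        if any(s in paragraph.lower() for s in BOILERPLATE_STRINGS)
    ]


def remove_boilerplate(text: str) -> str:
    """Drop every paragraph that contains a boilerplate string."""
    paragraphs = text.split("\n")
    drop = set(find_boilerplate_paragraph_indices(text))
    return "\n".join(p for i, p in enumerate(paragraphs) if i not in drop)


doc = "Welcome to our recipe blog.\nMix flour and water.\nRead our Privacy Policy for details."
print(find_boilerplate_paragraph_indices(doc))  # [2]
print(remove_boilerplate(doc))  # the first two paragraphs only
```

Substring matching makes the filter cheap, but it is coarse. A legitimate paragraph that merely discusses a privacy policy is removed along with real boilerplate, which is the trade-off the original C4 heuristics accept.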