modifiers.c4#

Module Contents#

Classes#

BoilerPlateStringModifier

Discards a sentence if it contains any of the boilerplate strings, such as “terms of use” or “privacy policy”. Source: adapted significantly from Google's C4 processing.

API#

class modifiers.c4.BoilerPlateStringModifier(remove_if_at_top_or_bottom: bool = True)#

Bases: nemo_curator.modifiers.doc_modifier.DocumentModifier

Discards a sentence if it contains any of the boilerplate strings, such as “terms of use” or “privacy policy”. Source: adapted significantly from Google's C4 processing.

Initialization

modify_document(text: str) → str#
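The sentence-level filtering described above can be illustrated with a small, self-contained sketch. This is not the library's implementation: the class name `SketchBoilerplateModifier`, the phrase list `BOILERPLATE_STRINGS`, and the naive regex sentence splitter are all assumptions made for illustration. The `remove_if_at_top_or_bottom` option is deliberately not modeled, because this reference does not describe its behavior.

```python
import re

# Hypothetical phrase list, for illustration only; the strings the
# library actually checks are not listed in this reference.
BOILERPLATE_STRINGS = ("terms of use", "privacy policy")


class SketchBoilerplateModifier:
    """Simplified stand-in that mirrors the modify_document(text) -> str interface."""

    def __init__(self, boilerplate_strings=BOILERPLATE_STRINGS):
        # Lowercase once so matching is case-insensitive.
        self._boilerplate_strings = tuple(s.lower() for s in boilerplate_strings)

    def modify_document(self, text: str) -> str:
        # Naive sentence split: break on whitespace that follows ., ! or ?
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        # Keep only sentences that contain none of the boilerplate strings.
        kept = [
            s for s in sentences
            if not any(b in s.lower() for b in self._boilerplate_strings)
        ]
        return " ".join(kept)


modifier = SketchBoilerplateModifier()
doc = "Welcome to the site. Please read our Privacy Policy. Enjoy the articles."
print(modifier.modify_document(doc))
# prints: Welcome to the site. Enjoy the articles.
```

With the real class, the equivalent call would be `modify_document(text)` on a `BoilerPlateStringModifier` instance; which part of the text is actually dropped, and how `remove_if_at_top_or_bottom` changes that, should be checked against the library source.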