modifiers.quotation_remover

Module Contents

Classes

QuotationRemover

Removes enclosing quotation marks from a document according to a few rules.

API

class modifiers.quotation_remover.QuotationRemover

Bases: nemo_curator.modifiers.DocumentModifier

Removes the enclosing quotation marks from a document according to the following rules:

  • If the document is shorter than 2 characters, it is returned unchanged.

  • If the document starts and ends with a quotation mark and there are no newlines in the document, the quotation marks are removed.

  • If the document starts and ends with a quotation mark and there are newlines in the document, the quotation marks are removed only if the first line does not end with a quotation mark.
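
The three rules above can be sketched as a standalone function. This is a minimal illustration, not the library's implementation: the helper name `remove_enclosing_quotes` is hypothetical, and it assumes that "quotation mark" means the ASCII double quote (`"`), which the rules do not state explicitly.

```python
QUOTE = '"'  # assumption: only the ASCII double quote counts as a quotation mark


def remove_enclosing_quotes(text: str) -> str:
    """Hypothetical helper that mirrors the three documented rules."""
    # Rule 1: documents shorter than 2 characters are returned unchanged.
    if len(text) < 2:
        return text

    # Rules 2 and 3 apply only when the document both starts and ends
    # with a quotation mark.
    if not (text.startswith(QUOTE) and text.endswith(QUOTE)):
        return text

    # Rule 2: no newlines, so strip the enclosing quotation marks.
    if "\n" not in text:
        return text[1:-1]

    # Rule 3: with newlines, strip only if the first line does not
    # itself end with a quotation mark.
    first_line = text.split("\n", 1)[0]
    if first_line.endswith(QUOTE):
        return text
    return text[1:-1]
```

Under these assumptions, `remove_enclosing_quotes('"hello"')` returns `hello`, while `'"quoted"\nrest"'` comes back unchanged because its first line already ends with a quotation mark.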

Initialization

modify_document(text: str) → str