nemo_curator.stages.text.modifiers.string.url_remover


Module Contents

Classes

| Name | Description |
|---|---|
| UrlRemover | Removes all URLs in a document. |

Data

URL_REGEX

API

class nemo_curator.stages.text.modifiers.string.url_remover.UrlRemover()

Bases: DocumentModifier

Removes all URLs from a document.

nemo_curator.stages.text.modifiers.string.url_remover.UrlRemover.modify_document(text: str) -> str

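The page documents only the signature of `modify_document`. A minimal sketch of its likely behavior, assuming it replaces every `URL_REGEX` match with an empty string, is shown below. `UrlRemoverSketch` is a hypothetical name, and it deliberately does not subclass `DocumentModifier` so the snippet runs without NeMo Curator installed.

```python
import re

# Same pattern and flag as the module's URL_REGEX, rebuilt locally.
URL_REGEX = re.compile(r"https?://\S+|www\.\S+", flags=re.IGNORECASE)


class UrlRemoverSketch:
    """Hypothetical stand-in for UrlRemover (no DocumentModifier base)."""

    def modify_document(self, text: str) -> str:
        # Assumption: every URL match is replaced with the empty string;
        # the whitespace around each removed URL is left as-is.
        return URL_REGEX.sub("", text)


remover = UrlRemoverSketch()
cleaned = remover.modify_document("Docs at https://example.com/guide and www.example.org today.")
print(repr(cleaned))
# 'Docs at  and  today.'
```

Under this assumption, removing a URL leaves its neighbouring spaces behind, so a later whitespace-normalization step may be needed if doubled spaces matter.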
nemo_curator.stages.text.modifiers.string.url_remover.URL_REGEX = re.compile('https?://\\S+|www\\.\\S+', flags=(re.IGNORECASE))
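`URL_REGEX` matches `http://` or `https://`, or a literal `www.`, followed by a run of non-whitespace characters, ignoring case. Because `\S+` runs until the next whitespace, trailing punctuation such as a comma or full stop is captured as part of the URL. Bare domains (`example.com`) and other schemes (`ftp://`) are not matched. The snippet below recompiles the same pattern with Python's standard `re` module to illustrate this; the sample string is invented for the example.

```python
import re

# Same pattern and flag as the module's URL_REGEX, rebuilt locally.
URL_REGEX = re.compile(r"https?://\S+|www\.\S+", flags=re.IGNORECASE)

sample = "See https://example.com/docs, or WWW.Example.org. Plain example.com and ftp://host/file stay."
matches = URL_REGEX.findall(sample)
print(matches)
# ['https://example.com/docs,', 'WWW.Example.org.']
```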