nemo_curator.stages.text.modifiers.string.markdown_remover


Module Contents

Classes

Name | Description
MarkdownRemover | Removes Markdown formatting in a document including bold, italic, underline, and URL text.

Data

MARKDOWN_BOLD_REGEX

MARKDOWN_ITALIC_REGEX

MARKDOWN_LINK_REGEX

MARKDOWN_UNDERLINE_REGEX

API

class nemo_curator.stages.text.modifiers.string.markdown_remover.MarkdownRemover()

Bases: DocumentModifier

Removes Markdown formatting in a document including bold, italic, underline, and URL text.

nemo_curator.stages.text.modifiers.string.markdown_remover.MarkdownRemover.modify_document(
text: str
) -> str
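
The signature above takes a document string and returns the modified string. This page does not show the method body, so the following is only a minimal sketch of how such a modifier could behave. It assumes each pattern from the Data section is applied with `re.sub` and that every match is replaced by its first captured group. The function name `strip_markdown` is hypothetical and is not part of the library.

```python
import re

# Patterns copied from this module's Data section, written as raw strings.
MARKDOWN_BOLD_REGEX = r"\*\*(.*?)\*\*"
MARKDOWN_ITALIC_REGEX = r"\*(.*?)\*"
MARKDOWN_UNDERLINE_REGEX = r"_(.*?)_"
MARKDOWN_LINK_REGEX = r"\[.*?\]\((.*?)\)"


def strip_markdown(text: str) -> str:
    """Hypothetical stand-in for MarkdownRemover.modify_document."""
    # Bold runs before italic so that ** pairs are consumed first.
    for pattern in (
        MARKDOWN_BOLD_REGEX,
        MARKDOWN_ITALIC_REGEX,
        MARKDOWN_UNDERLINE_REGEX,
        MARKDOWN_LINK_REGEX,
    ):
        # Assumption: each match is replaced by its first captured group.
        text = re.sub(pattern, r"\1", text)
    return text


cleaned = strip_markdown("**Bold**, *italic*, _underline_, [docs](https://example.com)")
print(cleaned)  # Bold, italic, underline, https://example.com
```

Under this assumption a link collapses to its URL rather than its label, because the capture group in `MARKDOWN_LINK_REGEX` sits inside the parentheses.
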
nemo_curator.stages.text.modifiers.string.markdown_remover.MARKDOWN_BOLD_REGEX = '\\*\\*(.*?)\\*\\*'
nemo_curator.stages.text.modifiers.string.markdown_remover.MARKDOWN_ITALIC_REGEX = '\\*(.*?)\\*'
nemo_curator.stages.text.modifiers.string.markdown_remover.MARKDOWN_LINK_REGEX = '\\[.*?\\]\\((.*?)\\)'
nemo_curator.stages.text.modifiers.string.markdown_remover.MARKDOWN_UNDERLINE_REGEX = '_(.*?)_'
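
To make the capture groups in these patterns concrete, the short sketch below runs three of them through `re.findall` from the standard library. The sample strings are made up for illustration, and the patterns are retyped here as raw strings, which are equivalent to the escaped literals shown above.

```python
import re

# Patterns retyped from the listing above as raw strings.
MARKDOWN_ITALIC_REGEX = r"\*(.*?)\*"
MARKDOWN_LINK_REGEX = r"\[.*?\]\((.*?)\)"
MARKDOWN_UNDERLINE_REGEX = r"_(.*?)_"

# The link pattern captures the URL in parentheses, not the bracketed label.
link_groups = re.findall(MARKDOWN_LINK_REGEX, "[docs](https://example.com)")
print(link_groups)  # ['https://example.com']

# The underline pattern also matches underscores inside identifiers.
underline_groups = re.findall(MARKDOWN_UNDERLINE_REGEX, "call my_var_name here")
print(underline_groups)  # ['var']

# On bold text, the italic pattern pairs up adjacent asterisks and
# captures two empty strings, so the order of application matters.
italic_groups = re.findall(MARKDOWN_ITALIC_REGEX, "**bold**")
print(italic_groups)  # ['', '']
```

Two practical points follow. Text containing snake_case identifiers can lose underscores if the underline pattern is applied to it, and applying the italic pattern before the bold one would strip the asterisks from `**bold**` without treating it as a bold span.
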