> For clean Markdown of any page, append .md to the page URL.
> For a complete documentation index, see https://docs.nvidia.com/nemo/curator/llms.txt.
> For full documentation content, see https://docs.nvidia.com/nemo/curator/llms-full.txt.

# nemo_curator.stages.text.download.html_extractors

## Subpackages

* **[`nemo_curator.stages.text.download.html_extractors.utils`](/nemo-curator/nemo_curator/stages/text/download/html_extractors/utils)**

## Submodules

* **[`nemo_curator.stages.text.download.html_extractors.base`](/nemo-curator/nemo_curator/stages/text/download/html_extractors/base)**
* **[`nemo_curator.stages.text.download.html_extractors.justext`](/nemo-curator/nemo_curator/stages/text/download/html_extractors/justext)**
* **[`nemo_curator.stages.text.download.html_extractors.resiliparse`](/nemo-curator/nemo_curator/stages/text/download/html_extractors/resiliparse)**
* **[`nemo_curator.stages.text.download.html_extractors.trafilatura`](/nemo-curator/nemo_curator/stages/text/download/html_extractors/trafilatura)**

## Package Contents

### Data

[`__all__`](#nemo_curator-stages-text-download-html_extractors-__all__)

### API

<Anchor id="nemo_curator-stages-text-download-html_extractors-__all__">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.stages.text.download.html_extractors.__all__ = ['HTMLExtractorAlgorithm', 'JusTextExtractor', 'ResiliparseExtractor', 'Trafilat...
    ```
  </CodeBlock>
</Anchor>
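Because `__all__` is defined, a star import of this package (`from nemo_curator.stages.text.download.html_extractors import *`) should bind only the names listed in it. The sketch below shows that standard Python behavior without needing NeMo Curator installed. It uses a throwaway in-memory module named `fake_html_extractors`, which is hypothetical. Its classes are empty placeholders that borrow some of the exported names; they are not the NeMo Curator implementations. `Helper` is a made-up name included only to show that public names left out of `__all__` are not imported.

```python
import sys
import types

# Hypothetical stand-in module: empty placeholder classes, NOT the real
# NeMo Curator extractors. Only the __all__ mechanics are being illustrated.
fake = types.ModuleType("fake_html_extractors")
exec(
    "class HTMLExtractorAlgorithm: pass\n"
    "class JusTextExtractor(HTMLExtractorAlgorithm): pass\n"
    "class ResiliparseExtractor(HTMLExtractorAlgorithm): pass\n"
    "class Helper: pass\n"  # public name deliberately left out of __all__
    "__all__ = ['HTMLExtractorAlgorithm', 'JusTextExtractor', 'ResiliparseExtractor']\n",
    fake.__dict__,
)
# Register the module so the import statement below can find it.
sys.modules["fake_html_extractors"] = fake

# A star import binds exactly the names listed in __all__.
namespace: dict = {}
exec("from fake_html_extractors import *", namespace)
exported = sorted(name for name in namespace if name != "__builtins__")
print(exported)
# ['HTMLExtractorAlgorithm', 'JusTextExtractor', 'ResiliparseExtractor']
```

Explicit imports such as `from nemo_curator.stages.text.download.html_extractors import JusTextExtractor` are unaffected by `__all__`. The list only governs star imports and signals which names form the package's public surface.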