# nemo_curator.stages.text.modifiers.string.c4

## Module Contents

### Classes

| Name                                                                                                   | Description                                                           |
| ------------------------------------------------------------------------------------------------------ | --------------------------------------------------------------------- |
| [`BoilerPlateStringModifier`](#nemo_curator-stages-text-modifiers-string-c4-BoilerPlateStringModifier) | Discards sentences that contain any of the known boilerplate strings. |

### API

<Anchor id="nemo_curator-stages-text-modifiers-string-c4-BoilerPlateStringModifier">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    class nemo_curator.stages.text.modifiers.string.c4.BoilerPlateStringModifier(
        remove_if_at_top_or_bottom: bool = True
    )
    ```
  </CodeBlock>
</Anchor>

<Indent>
  **Bases:** [DocumentModifier](/nemo-curator/nemo_curator/stages/text/modifiers/doc_modifier#nemo_curator-stages-text-modifiers-doc_modifier-DocumentModifier)

  Discards sentences that contain any of the boilerplate strings,
  such as "terms of use" or "privacy policy".
  Source: adapted significantly from Google's C4 processing.

  <ParamField path="_boilerplate_paragraph_indices" type="= []" />

  <ParamField path="_name" type="= 'boilerplate_string_ratio'" />

  <Anchor id="nemo_curator-stages-text-modifiers-string-c4-BoilerPlateStringModifier-modify_document">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.stages.text.modifiers.string.c4.BoilerPlateStringModifier.modify_document(
          text: str
      ) -> str
      ```
    </CodeBlock>
  </Anchor>
</Indent>
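
The filtering idea described for `BoilerPlateStringModifier` can be illustrated with a minimal, self-contained sketch. This is not NeMo Curator's implementation. The phrase list, the helper names `is_boilerplate` and `strip_boilerplate`, the newline-based paragraph split, and the `only_top_or_bottom` flag (one plausible reading of `remove_if_at_top_or_bottom`) are all assumptions made for illustration.

```python
# Hypothetical sketch of boilerplate-string filtering; not the library's code.
# The phrase list is an assumption chosen for illustration only.
BOILERPLATE_PHRASES = ("terms of use", "privacy policy", "cookie policy")


def is_boilerplate(paragraph: str) -> bool:
    """Return True if the paragraph contains any boilerplate phrase."""
    lowered = paragraph.lower()
    return any(phrase in lowered for phrase in BOILERPLATE_PHRASES)


def strip_boilerplate(text: str, only_top_or_bottom: bool = True) -> str:
    """Drop boilerplate paragraphs from newline-separated text.

    With only_top_or_bottom=True, only the first and last paragraphs
    are candidates for removal; otherwise every paragraph is checked.
    """
    paragraphs = text.split("\n")
    last = len(paragraphs) - 1
    kept = []
    for index, paragraph in enumerate(paragraphs):
        at_edge = index in (0, last)
        if is_boilerplate(paragraph) and (at_edge or not only_top_or_bottom):
            continue  # skip the boilerplate paragraph
        kept.append(paragraph)
    return "\n".join(kept)


doc = "Privacy Policy | Terms of Use\nThe article body.\nSee our privacy policy."
print(strip_boilerplate(doc))
```

With the real class, the equivalent call would presumably be `BoilerPlateStringModifier().modify_document(text)`, which takes a `str` and returns a `str` per the signature above. Its exact removal rules may differ from this sketch.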