---
layout: overview
slug: nemo-curator/nemo_curator/utils/split_large_files
title: nemo_curator.utils.split_large_files
---

## Module Contents

### Functions

| Name                                                                                             | Description |
| ------------------------------------------------------------------------------------------------ | ----------- |
| [`_split_table`](#nemo_curator-utils-split_large_files-_split_table)                             | -           |
| [`_write_table_to_file`](#nemo_curator-utils-split_large_files-_write_table_to_file)             | -           |
| [`main`](#nemo_curator-utils-split_large_files-main)                                             | -           |
| [`parse_args`](#nemo_curator-utils-split_large_files-parse_args)                                 | -           |
| [`split_parquet_file_by_size`](#nemo_curator-utils-split_large_files-split_parquet_file_by_size) | -           |

### API

<Anchor id="nemo_curator-utils-split_large_files-_split_table">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.split_large_files._split_table(
        table: pyarrow.Table,
        target_size: int
    ) -> list[pyarrow.Table]
    ```
  </CodeBlock>
</Anchor>

<Indent />
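The signature above takes a table and a byte budget and returns a list of smaller tables. This page does not describe the splitting strategy, so the following is only a minimal sketch of one plausible approach: recursive halving by row count until each piece fits the budget. The helper name `plan_row_slices` is hypothetical, it works on plain integers rather than `pyarrow.Table` objects, and it assumes bytes are spread evenly across rows.

```python
# Hypothetical illustration only: NOT the library's implementation.
# Plans (offset, length) row slices by halving until each slice's
# estimated size is at or below target_size (in bytes).


def plan_row_slices(
    offset: int, num_rows: int, nbytes: int, target_size: int
) -> list[tuple[int, int]]:
    # Stop when the slice fits, or when it cannot be divided further.
    if nbytes <= target_size or num_rows <= 1:
        return [(offset, num_rows)]

    half = num_rows // 2
    # Assumption: size is proportional to row count.
    left_bytes = nbytes * half // num_rows
    right_bytes = nbytes - left_bytes

    left = plan_row_slices(offset, half, left_bytes, target_size)
    right = plan_row_slices(offset + half, num_rows - half, right_bytes, target_size)
    return left + right


if __name__ == "__main__":
    # 1000 rows, 400 bytes total, 100-byte budget.
    for start, length in plan_row_slices(0, 1000, 400, 100):
        print(start, length)
```

With a real `pyarrow.Table`, each planned pair could be passed to `table.slice(offset, length)`, and `table.nbytes` could replace the proportional estimate. Halving keeps the logic simple but can produce pieces well below the budget.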

<Anchor id="nemo_curator-utils-split_large_files-_write_table_to_file">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.split_large_files._write_table_to_file(
        table: pyarrow.Table,
        outdir: str,
        output_prefix: str,
        ext: str,
        file_idx: int
    ) -> int
    ```
  </CodeBlock>
</Anchor>

<Indent />

<Anchor id="nemo_curator-utils-split_large_files-main">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.split_large_files.main(
        args: argparse.ArgumentParser | None = None
    ) -> None
    ```
  </CodeBlock>
</Anchor>

<Indent />

<Anchor id="nemo_curator-utils-split_large_files-parse_args">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.split_large_files.parse_args(
        args: argparse.ArgumentParser | None = None
    ) -> argparse.Namespace
    ```
  </CodeBlock>
</Anchor>

<Indent />

<Anchor id="nemo_curator-utils-split_large_files-split_parquet_file_by_size">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.split_large_files.split_parquet_file_by_size(
        input_file: str,
        outdir: str,
        target_size_mb: int
    ) -> None
    ```
  </CodeBlock>
</Anchor>

<Indent />
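As a usage sketch for the signature above, the snippet below wraps the call in a small helper. The wrapper name `split_if_installed`, the default of 128 MB, and the creation of the output directory beforehand are assumptions made for illustration; this page does not state whether the function creates `outdir` itself or how output files are named.

```python
import importlib.util
from pathlib import Path


def split_if_installed(input_file: str, outdir: str, target_size_mb: int = 128) -> bool:
    """Hypothetical wrapper: split a Parquet file if nemo_curator is available."""
    # Skip quietly when the package is not installed in this environment.
    if importlib.util.find_spec("nemo_curator") is None:
        return False

    from nemo_curator.utils.split_large_files import split_parquet_file_by_size

    # Assumption: creating the directory up front is harmless either way.
    Path(outdir).mkdir(parents=True, exist_ok=True)

    # Keyword arguments mirror the documented signature.
    split_parquet_file_by_size(
        input_file=input_file,
        outdir=outdir,
        target_size_mb=target_size_mb,
    )
    return True
```

The helper returns `False` instead of raising when the package is missing, which can be convenient in scripts that run in mixed environments.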
