---
layout: overview
slug: nemo-curator/nemo_curator/utils/split_large_files
title: nemo_curator.utils.split_large_files
---

## Module Contents

### Functions

| Name | Description |
| --- | --- |
| [`_split_table`](#nemo_curator-utils-split_large_files-_split_table) | - |
| [`_write_table_to_file`](#nemo_curator-utils-split_large_files-_write_table_to_file) | - |
| [`main`](#nemo_curator-utils-split_large_files-main) | - |
| [`parse_args`](#nemo_curator-utils-split_large_files-parse_args) | - |
| [`split_parquet_file_by_size`](#nemo_curator-utils-split_large_files-split_parquet_file_by_size) | - |

### API

```python
nemo_curator.utils.split_large_files._split_table(
    table: pyarrow.Table,
    target_size: int
) -> list[pyarrow.Table]
```

```python
nemo_curator.utils.split_large_files._write_table_to_file(
    table: pyarrow.Table,
    outdir: str,
    output_prefix: str,
    ext: str,
    file_idx: int
) -> int
```

```python
nemo_curator.utils.split_large_files.main(
    args: argparse.ArgumentParser | None = None
) -> None
```

```python
nemo_curator.utils.split_large_files.parse_args(
    args: argparse.ArgumentParser | None = None
) -> argparse.Namespace
```

```python
nemo_curator.utils.split_large_files.split_parquet_file_by_size(
    input_file: str,
    outdir: str,
    target_size_mb: int
) -> None
```
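The signatures above do not document how a table is divided, so the following is only a minimal sketch of one plausible strategy: recursively halving a collection of rows until each chunk fits within a byte budget. It uses plain Python lists rather than `pyarrow.Table` so that it runs without extra dependencies. The names `estimate_size` and `split_rows` are hypothetical, and the JSON-length size estimate is an assumption standing in for a table's in-memory byte size. None of this is taken from the module's implementation.

```python
import json


def estimate_size(rows: list[dict]) -> int:
    """Hypothetical size estimate: byte length of the JSON encoding.

    Assumption: this stands in for the in-memory size of a pyarrow.Table.
    """
    return len(json.dumps(rows).encode("utf-8"))


def split_rows(rows: list[dict], target_size: int) -> list[list[dict]]:
    """Recursively halve `rows` until each chunk is at most `target_size` bytes.

    A chunk holding a single row (or none) cannot be split further, so it is
    returned as-is even if it exceeds the target.
    """
    if len(rows) <= 1 or estimate_size(rows) <= target_size:
        return [rows]
    mid = len(rows) // 2
    # Split both halves independently; row order is preserved.
    return split_rows(rows[:mid], target_size) + split_rows(rows[mid:], target_size)


# Illustrative data: 100 small rows, split into chunks of at most 1 KiB each.
rows = [{"id": i, "text": "x" * 50} for i in range(100)]
chunks = split_rows(rows, target_size=1024)
```

Halving keeps the logic simple and preserves row order, but chunks can end up well below the target size. A real implementation might instead compute the chunk count from the total size up front.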