---
layout: overview
slug: nemo-curator/nemo_curator/utils/grouping
title: nemo_curator.utils.grouping
---

Utility functions for grouping iterables.

This module provides utility functions for manipulating and transforming Python iterables.

The utilities are generic and work with any iterable type. They are particularly useful for data processing,
batching, and other tasks that divide data into groups.

## Module Contents

### Functions

| Name                                                                      | Description                                                |
| ------------------------------------------------------------------------- | ---------------------------------------------------------- |
| [`pairwise`](#nemo_curator-utils-grouping-pairwise)                       | Return pairs of consecutive items from the input iterable. |
| [`split_by_chunk_size`](#nemo_curator-utils-grouping-split_by_chunk_size) | Split an iterable into chunks of the specified size.       |
| [`split_into_n_chunks`](#nemo_curator-utils-grouping-split_into_n_chunks) | Split an iterable into a specified number of chunks.       |

### Data

[`T`](#nemo_curator-utils-grouping-T)

### API

<Anchor id="nemo_curator-utils-grouping-pairwise">
  <CodeBlock links={{"nemo_curator.utils.grouping.T":"#nemo_curator-utils-grouping-T"}} showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.grouping.pairwise(
        iterable: collections.abc.Iterable[nemo_curator.utils.grouping.T]
    ) -> collections.abc.Iterable[tuple[nemo_curator.utils.grouping.T, nemo_curator.utils.grouping.T]]
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Return pairs of consecutive items from the input iterable.

  **Parameters:**

  <ParamField path="iterable" type="Iterable[T]">
    The input iterable.
  </ParamField>

  **Returns:** `Iterable[tuple[T, T]]`

  Pairs of consecutive items.
</Indent>
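The sketch below illustrates what `pairwise` is described as returning. It is a hedged stand-in, not the library's source: `pairwise_sketch` is a hypothetical name, and it assumes "consecutive items" means overlapping pairs, as in the standard library's `itertools.pairwise`.

```python
from collections.abc import Iterable, Iterator
from itertools import tee
from typing import TypeVar

T = TypeVar("T")


def pairwise_sketch(iterable: Iterable[T]) -> Iterator[tuple[T, T]]:
    """Hypothetical stand-in: yield (item, next_item) for consecutive items."""
    first, second = tee(iterable)
    next(second, None)  # advance the second iterator by one item
    return zip(first, second)


# Assumption: pairs overlap, so n items produce n - 1 pairs.
pairs = list(pairwise_sketch([1, 2, 3, 4]))
print(pairs)  # [(1, 2), (2, 3), (3, 4)]
```

Under this assumption, an input with fewer than two items produces no pairs.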

<Anchor id="nemo_curator-utils-grouping-split_by_chunk_size">
  <CodeBlock links={{"nemo_curator.utils.grouping.T":"#nemo_curator-utils-grouping-T"}} showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.grouping.split_by_chunk_size(
        iterable: collections.abc.Iterable[nemo_curator.utils.grouping.T],
        chunk_size: int,
        custom_size_func: typing.Callable[[T], int] = lambda x: 1,
        drop_incomplete_chunk: bool = False
    ) -> collections.abc.Generator[list[nemo_curator.utils.grouping.T], None, None]
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Split an iterable into chunks of the specified size.

  **Yields:** `list[T]`

  Chunks of the input iterable.

  **Parameters:**

  <ParamField path="iterable" type="Iterable[T]">
    The input iterable to be split.
  </ParamField>

  <ParamField path="chunk_size" type="int">
    Size of each chunk.
  </ParamField>

  <ParamField path="custom_size_func" type="typing.Callable" default="lambda x: 1">
    Function that returns the size of an item as an integer. The default counts every item as 1.
  </ParamField>

  <ParamField path="drop_incomplete_chunk" type="bool" default="False">
    If True, drops the last chunk if its size is less than the
    specified chunk size. Defaults to False.
  </ParamField>
</Indent>
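To make the parameters of `split_by_chunk_size` concrete, here is a minimal sketch of one plausible reading of the signature. `split_by_chunk_size_sketch` is a hypothetical name, and the accumulation rule (a chunk is emitted once the summed item sizes reach `chunk_size`) is an assumption rather than the library's confirmed behavior.

```python
from collections.abc import Callable, Generator, Iterable
from typing import TypeVar

T = TypeVar("T")


def split_by_chunk_size_sketch(
    iterable: Iterable[T],
    chunk_size: int,
    custom_size_func: Callable[[T], int] = lambda _item: 1,
    drop_incomplete_chunk: bool = False,
) -> Generator[list[T], None, None]:
    """Hypothetical stand-in: emit a chunk once its summed size reaches chunk_size."""
    chunk: list[T] = []
    current_size = 0
    for item in iterable:
        chunk.append(item)
        current_size += custom_size_func(item)
        if current_size >= chunk_size:  # assumption: ">=" closes the chunk
            yield chunk
            chunk, current_size = [], 0
    # A trailing partial chunk is kept unless the caller asks to drop it.
    if chunk and not drop_incomplete_chunk:
        yield chunk


by_count = list(split_by_chunk_size_sketch(range(7), 3))
dropped = list(split_by_chunk_size_sketch(range(7), 3, drop_incomplete_chunk=True))
# Size each word by its length instead of counting items.
by_length = list(split_by_chunk_size_sketch(["aa", "bbb", "c", "dddd"], 4, custom_size_func=len))
print(by_count)   # [[0, 1, 2], [3, 4, 5], [6]]
print(dropped)    # [[0, 1, 2], [3, 4, 5]]
print(by_length)  # [['aa', 'bbb'], ['c', 'dddd']]
```

With a custom size function, a chunk in this sketch can exceed `chunk_size`, because the item that crosses the threshold is still included in the chunk.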

<Anchor id="nemo_curator-utils-grouping-split_into_n_chunks">
  <CodeBlock links={{"nemo_curator.utils.grouping.T":"#nemo_curator-utils-grouping-T"}} showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.grouping.split_into_n_chunks(
        iterable: collections.abc.Iterable[nemo_curator.utils.grouping.T],
        num_chunks: int
    ) -> collections.abc.Generator[list[nemo_curator.utils.grouping.T], None, None]
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Split an iterable into a specified number of chunks.

  **Yields:** `list[T]`

  Chunks of the input iterable.

  **Parameters:**

  <ParamField path="iterable" type="Iterable[T]">
    The input iterable to be split.
  </ParamField>

  <ParamField path="num_chunks" type="int">
    The desired number of chunks.
  </ParamField>
</Indent>
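The following sketch shows one way a function like `split_into_n_chunks` could divide its input. `split_into_n_chunks_sketch` is a hypothetical name, and two details are assumptions rather than documented behavior: chunk sizes differ by at most one with the larger chunks first, and no empty chunks are produced when there are fewer items than `num_chunks`.

```python
from collections.abc import Generator, Iterable
from typing import TypeVar

T = TypeVar("T")


def split_into_n_chunks_sketch(
    iterable: Iterable[T], num_chunks: int
) -> Generator[list[T], None, None]:
    """Hypothetical stand-in: split into at most num_chunks near-equal lists."""
    items = list(iterable)  # the total length is needed, so materialize the input
    base, extra = divmod(len(items), num_chunks)
    start = 0
    for index in range(num_chunks):
        # Assumption: the first `extra` chunks each take one additional item.
        size = base + (1 if index < extra else 0)
        if size == 0:
            break  # assumption: stop instead of yielding empty chunks
        yield items[start:start + size]
        start += size


even_split = list(split_into_n_chunks_sketch(range(10), 3))
short_input = list(split_into_n_chunks_sketch([1, 2], 5))
print(even_split)   # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(short_input)  # [[1], [2]]
```

The sketch expects `num_chunks` to be a positive integer; a value of 0 raises `ZeroDivisionError` in `divmod`.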

<Anchor id="nemo_curator-utils-grouping-T">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.grouping.T = typing.TypeVar('T')
    ```
  </CodeBlock>
</Anchor>
