
Copyright © 2026, NVIDIA Corporation.

nemo_curator.utils.grouping


Utility functions for grouping iterables.

This module provides a collection of utility functions designed to assist with common tasks related to manipulating and transforming iterables in Python.

These utilities are generic and work with any iterable type. They are particularly useful for data processing, batching, and other cases where data must be divided into groups.

Module Contents

Functions

Name                   Description
pairwise               Return pairs of consecutive items from the input iterable.
split_by_chunk_size    Split an iterable into chunks of the specified size.
split_into_n_chunks    Split an iterable into a specified number of chunks.

Data

T

API

nemo_curator.utils.grouping.pairwise(
iterable: collections.abc.Iterable[nemo_curator.utils.grouping.T]
) -> collections.abc.Iterable[tuple[nemo_curator.utils.grouping.T, nemo_curator.utils.grouping.T]]

Return pairs of consecutive items from the input iterable.

Parameters:

iterable
Iterable[T]

The input iterable.

Returns: Iterable[tuple[T, T]]

Pairs of consecutive items.
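To make "pairs of consecutive items" concrete, here is a minimal standard-library sketch. It assumes the pairs overlap, as in (x0, x1), (x1, x2), and so on, which is the usual meaning of the name but is not confirmed by this page. `pairwise_sketch` is a hypothetical stand-in, not the library's implementation.

```python
from collections.abc import Iterable, Iterator
from itertools import tee
from typing import TypeVar

T = TypeVar("T")


def pairwise_sketch(iterable: Iterable[T]) -> Iterator[tuple[T, T]]:
    """Hypothetical stand-in: yield overlapping pairs (x0, x1), (x1, x2), ..."""
    # tee() makes two independent iterators over the same input.
    first, second = tee(iterable)
    # Advance the second iterator by one item so the two are offset.
    next(second, None)
    # zip() stops at the shorter iterator, so inputs with fewer than
    # two items produce no pairs.
    return zip(first, second)


pairs = list(pairwise_sketch([1, 2, 3, 4]))
print(pairs)  # [(1, 2), (2, 3), (3, 4)]
```

Under this reading, an input of n items produces n - 1 pairs, and an empty or single-item input produces none.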

nemo_curator.utils.grouping.split_by_chunk_size(
iterable: collections.abc.Iterable[nemo_curator.utils.grouping.T],
chunk_size: int,
custom_size_func: typing.Callable[[T], int] = lambda x: 1,
drop_incomplete_chunk: bool = False
) -> collections.abc.Generator[list[nemo_curator.utils.grouping.T], None, None]

Split an iterable into chunks of the specified size.

Yields:

  • Generator[list[T], None, None]: Chunks of the input iterable.

Parameters:

iterable
Iterable[T]

The input iterable to be split.

chunk_size
int

Size of each chunk.

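custom_size_func
typing.Callable[[T], int], defaults to lambda x: 1

Function that returns the size of a single item. With the default, every item counts as 1, so chunk_size is measured in items.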

drop_incomplete_chunk
bool, defaults to False

If True, drops the last chunk if its size is less than the specified chunk size.
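The interaction between chunk_size, custom_size_func, and drop_incomplete_chunk is easier to see in code. The sketch below is a hypothetical stand-in (`chunk_by_size_sketch`) written from the parameter descriptions above. It assumes a chunk is closed as soon as its accumulated size reaches or exceeds chunk_size; the library's exact boundary rule is not stated on this page and may differ.

```python
from collections.abc import Callable, Generator, Iterable
from typing import TypeVar

T = TypeVar("T")


def chunk_by_size_sketch(
    iterable: Iterable[T],
    chunk_size: int,
    custom_size_func: Callable[[T], int] = lambda x: 1,
    drop_incomplete_chunk: bool = False,
) -> Generator[list[T], None, None]:
    """Hypothetical stand-in for size-based chunking."""
    chunk: list[T] = []
    current_size = 0
    for item in iterable:
        chunk.append(item)
        current_size += custom_size_func(item)
        # Assumption: close the chunk once the accumulated size
        # reaches (or overshoots) chunk_size.
        if current_size >= chunk_size:
            yield chunk
            chunk, current_size = [], 0
    # Whatever is left over is the incomplete final chunk.
    if chunk and not drop_incomplete_chunk:
        yield chunk


# Default size function: every item counts as 1.
by_count = list(chunk_by_size_sketch(range(5), 2))
print(by_count)  # [[0, 1], [2, 3], [4]]

# Same input, but the short trailing chunk is dropped.
dropped = list(chunk_by_size_sketch(range(5), 2, drop_incomplete_chunk=True))
print(dropped)  # [[0, 1], [2, 3]]

# Size each string by its length instead of counting items.
by_length = list(
    chunk_by_size_sketch(["aa", "bbb", "c", "dddd"], 4, custom_size_func=len)
)
print(by_length)  # [['aa', 'bbb'], ['c', 'dddd']]
```

With a custom size function, chunks in this sketch can overshoot chunk_size, because an item is added before the size check. A size function such as `len` is useful when batching by total payload (characters, bytes, tokens) rather than by item count.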

nemo_curator.utils.grouping.split_into_n_chunks(
iterable: collections.abc.Iterable[nemo_curator.utils.grouping.T],
num_chunks: int
) -> collections.abc.Generator[list[nemo_curator.utils.grouping.T], None, None]

Split an iterable into a specified number of chunks.

Yields:

  • Generator[list[T], None, None]: Chunks of the input iterable.

Parameters:

iterable
Iterable[T]

The input iterable to be split.

num_chunks
int

The desired number of chunks.
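For contrast with size-based chunking, the following sketch fixes the number of chunks rather than their size. `split_into_n_chunks_sketch` is a hypothetical stand-in. It assumes chunks are contiguous and as even as possible, with earlier chunks taking any remainder, and that no empty chunks are emitted when there are fewer items than chunks. None of these details are confirmed by this page.

```python
from collections.abc import Generator, Iterable
from typing import TypeVar

T = TypeVar("T")


def split_into_n_chunks_sketch(
    iterable: Iterable[T], num_chunks: int
) -> Generator[list[T], None, None]:
    """Hypothetical stand-in: split into at most num_chunks near-equal chunks."""
    if num_chunks <= 0:
        raise ValueError("num_chunks must be positive")
    # The total length is needed up front, so the input is materialized.
    items = list(iterable)
    base, extra = divmod(len(items), num_chunks)
    start = 0
    for i in range(num_chunks):
        # Assumption: the first `extra` chunks get one additional item.
        size = base + (1 if i < extra else 0)
        if size == 0:
            # Assumption: stop instead of yielding empty chunks.
            break
        yield items[start : start + size]
        start += size


even = list(split_into_n_chunks_sketch(range(7), 3))
print(even)  # [[0, 1, 2], [3, 4], [5, 6]]

short = list(split_into_n_chunks_sketch(range(2), 5))
print(short)  # [[0], [1]]
```

Unlike the size-based splitter, this approach must consume the whole iterable before yielding anything, so it does not suit unbounded or very large streams.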

nemo_curator.utils.grouping.T = typing.TypeVar('T')