nemo_rl.data.packing.algorithms#

Sequence packing algorithms for efficient batching of variable-length sequences.

Module Contents#

Classes#

PackingAlgorithm

Enum for supported sequence packing algorithms.

SequencePacker

Abstract base class for sequence packing algorithms.

ConcatenativePacker

Concatenative packing algorithm.

FirstFitPacker

Base class for First-Fit algorithms.

FirstFitDecreasingPacker

First-Fit Decreasing (FFD) algorithm for sequence packing.

FirstFitShufflePacker

First-Fit Shuffle algorithm for sequence packing.

ModifiedFirstFitDecreasingPacker

Modified First-Fit Decreasing (MFFD) algorithm for sequence packing.

Functions#

get_packer

Factory function to get a sequence packer based on the algorithm.

API#

class nemo_rl.data.packing.algorithms.PackingAlgorithm(*args, **kwds)[source]#

Bases: enum.Enum

Enum for supported sequence packing algorithms.

Initialization

CONCATENATIVE#

β€˜concatenative’

FIRST_FIT_DECREASING#

β€˜first_fit_decreasing’

FIRST_FIT_SHUFFLE#

β€˜first_fit_shuffle’

MODIFIED_FIRST_FIT_DECREASING#

β€˜modified_first_fit_decreasing’

class nemo_rl.data.packing.algorithms.SequencePacker(
bin_capacity: int,
collect_metrics: bool = False,
min_bin_count: Optional[int] = None,
bin_count_multiple: Optional[int] = None,
)[source]#

Bases: abc.ABC

Abstract base class for sequence packing algorithms.

Sequence packing is the process of efficiently arranging sequences of different lengths into fixed-capacity bins (batches) to maximize computational efficiency.

Initialization

Initialize the sequence packer.

Parameters:
  • bin_capacity – The maximum capacity of each bin.

  • collect_metrics – Whether to collect metrics across multiple packing operations.

  • min_bin_count – Minimum number of bins to create, even if fewer would suffice. If None, no minimum is enforced.

  • bin_count_multiple – The total number of bins must be a multiple of this value. If None, no multiple constraint is enforced.

Raises:

ValueError – If min_bin_count or bin_count_multiple are invalid.

abstractmethod _pack_implementation(
sequence_lengths: List[int],
) List[List[int]][source]#

Implementation of the packing algorithm.

Parameters:

sequence_lengths – A list of sequence lengths to pack.

Returns:

A list of bins, where each bin is a list of indices into the original sequence_lengths list.

_adjust_bin_count(
bins: List[List[int]],
) List[List[int]][source]#

Adjust the number of bins to meet minimum and multiple constraints.

This method preserves the existing bin packing as much as possible and only moves sequences one at a time to create additional bins when needed.

Parameters:

bins – The original bins from the packing algorithm.

Returns:

Adjusted bins with minimal changes to meet constraints.

Raises:

ValueError – If there aren’t enough sequences to fill the required number of bins.

pack(
sequence_lengths: List[int],
) List[List[int]][source]#

Pack sequences into bins and update metrics if enabled.

Parameters:

sequence_lengths – A list of sequence lengths to pack.

Returns:

A list of bins, where each bin is a list of indices into the original sequence_lengths list. The number of bins will satisfy min_bin_count and bin_count_multiple constraints if specified.

reset_metrics() None[source]#

Reset collected metrics.

compute_metrics(
sequence_lengths: List[int],
bins: List[List[int]],
) Dict[str, float][source]#

Calculate metrics for a packing solution without updating the metrics tracker.

Parameters:
  • sequence_lengths – List of sequence lengths

  • bins – List of bins, where each bin is a list of indices

Returns:

Dictionary of packing metrics

get_aggregated_metrics() Dict[str, float][source]#

Get aggregated metrics across all packing operations.

Returns:

Dictionary of aggregated metrics, or empty dict if not collecting

print_metrics() None[source]#

Print the current metrics in a formatted way.

_validate_sequence_lengths(sequence_lengths: List[int]) None[source]#

Validate that all sequence lengths are within bin capacity.

Parameters:

sequence_lengths – A list of sequence lengths to validate.

Raises:

ValueError – If any sequence length exceeds bin capacity.

_create_indexed_lengths(
sequence_lengths: List[int],
reverse: bool = False,
) List[Tuple[int, int]][source]#

Create a list of (length, index) pairs from sequence lengths.

Parameters:
  • sequence_lengths – A list of sequence lengths.

  • reverse – Whether to sort in descending order (True) or ascending order (False).

Returns:

A list of (length, index) pairs, optionally sorted.

_estimate_bins_needed(sequence_lengths: List[int]) int[source]#

Estimate the number of bins needed based on total length.

Parameters:

sequence_lengths – A list of sequence lengths.

Returns:

Estimated number of bins needed.

class nemo_rl.data.packing.algorithms.ConcatenativePacker(
bin_capacity: int,
collect_metrics: bool = False,
min_bin_count: Optional[int] = None,
bin_count_multiple: Optional[int] = None,
)[source]#

Bases: nemo_rl.data.packing.algorithms.SequencePacker

Concatenative packing algorithm.

This algorithm simply concatenates sequences in order until reaching the bin capacity, then starts a new bin. It doesn’t try to optimize the packing in any way.

Time complexity: O(n) where n is the number of sequences.

Example:

>>> examples = {
...     "sequence_lengths": [4, 1, 3, 2, 1, 3, 4, 5]
... }
>>> # If packed with seq_length=5:
... {"bins": [ [0, 1], [2, 3], [4, 5], [6], [7] ]}
>>> # If packed with seq_length=8:
... {"bins": [ [0, 1, 2], [3, 4, 5], [6], [7] ]}

Initialization

Initialize the sequence packer.

Parameters:
  • bin_capacity – The maximum capacity of each bin.

  • collect_metrics – Whether to collect metrics across multiple packing operations.

  • min_bin_count – Minimum number of bins to create, even if fewer would suffice. If None, no minimum is enforced.

  • bin_count_multiple – The total number of bins must be a multiple of this value. If None, no multiple constraint is enforced.

Raises:

ValueError – If min_bin_count or bin_count_multiple are invalid.

max_sequences_per_bin#

None

_pack_implementation(
sequence_lengths: List[int],
) List[List[int]][source]#

Pack sequences using the Concatenative algorithm.

Parameters:

sequence_lengths – A list of sequence lengths to pack.

Returns:

A list of bins, where each bin is a list of indices into the original sequence_lengths list.

class nemo_rl.data.packing.algorithms.FirstFitPacker(
bin_capacity: int,
collect_metrics: bool = False,
min_bin_count: Optional[int] = None,
bin_count_multiple: Optional[int] = None,
)[source]#

Bases: nemo_rl.data.packing.algorithms.SequencePacker

Base class for First-Fit algorithms.

First-Fit algorithms place each sequence into the first bin where it fits. If no bin can fit the sequence, a new bin is created.

This is an abstract base class that provides the common implementation for First-Fit variants. Subclasses must implement the _prepare_sequences method to determine the order in which sequences are processed.

Initialization

Initialize the sequence packer.

Parameters:
  • bin_capacity – The maximum capacity of each bin.

  • collect_metrics – Whether to collect metrics across multiple packing operations.

  • min_bin_count – Minimum number of bins to create, even if fewer would suffice. If None, no minimum is enforced.

  • bin_count_multiple – The total number of bins must be a multiple of this value. If None, no multiple constraint is enforced.

Raises:

ValueError – If min_bin_count or bin_count_multiple are invalid.

abstractmethod _prepare_sequences(
sequence_lengths: List[int],
) List[Tuple[int, int]][source]#

Prepare sequences for packing.

This method determines the order in which sequences are processed. Subclasses must override this method.

Parameters:

sequence_lengths – A list of sequence lengths to pack.

Returns:

A list of (length, index) pairs.

_pack_implementation(
sequence_lengths: List[int],
) List[List[int]][source]#

Pack sequences using the First-Fit algorithm.

Parameters:

sequence_lengths – A list of sequence lengths to pack.

Returns:

A list of bins, where each bin is a list of indices into the original sequence_lengths list.

class nemo_rl.data.packing.algorithms.FirstFitDecreasingPacker(
bin_capacity: int,
collect_metrics: bool = False,
min_bin_count: Optional[int] = None,
bin_count_multiple: Optional[int] = None,
)[source]#

Bases: nemo_rl.data.packing.algorithms.FirstFitPacker

First-Fit Decreasing (FFD) algorithm for sequence packing.

This algorithm sorts sequences by length in descending order and then places each sequence into the first bin where it fits.

Time complexity: O(n log n) for sorting + O(n * m) for packing, where n is the number of sequences and m is the number of bins.

Initialization

Initialize the sequence packer.

Parameters:
  • bin_capacity – The maximum capacity of each bin.

  • collect_metrics – Whether to collect metrics across multiple packing operations.

  • min_bin_count – Minimum number of bins to create, even if fewer would suffice. If None, no minimum is enforced.

  • bin_count_multiple – The total number of bins must be a multiple of this value. If None, no multiple constraint is enforced.

Raises:

ValueError – If min_bin_count or bin_count_multiple are invalid.

_prepare_sequences(
sequence_lengths: List[int],
) List[Tuple[int, int]][source]#

Prepare sequences for packing by sorting them in descending order.

Parameters:

sequence_lengths – A list of sequence lengths to pack.

Returns:

A list of (length, index) pairs sorted by length in descending order.

class nemo_rl.data.packing.algorithms.FirstFitShufflePacker(
bin_capacity: int,
collect_metrics: bool = False,
min_bin_count: Optional[int] = None,
bin_count_multiple: Optional[int] = None,
)[source]#

Bases: nemo_rl.data.packing.algorithms.FirstFitPacker

First-Fit Shuffle algorithm for sequence packing.

This algorithm randomly shuffles the sequences and then places each sequence into the first bin where it fits.

Time complexity: O(n * m) for packing, where n is the number of sequences and m is the number of bins.

Initialization

Initialize the sequence packer.

Parameters:
  • bin_capacity – The maximum capacity of each bin.

  • collect_metrics – Whether to collect metrics across multiple packing operations.

  • min_bin_count – Minimum number of bins to create, even if fewer would suffice. If None, no minimum is enforced.

  • bin_count_multiple – The total number of bins must be a multiple of this value. If None, no multiple constraint is enforced.

Raises:

ValueError – If min_bin_count or bin_count_multiple are invalid.

_prepare_sequences(
sequence_lengths: List[int],
) List[Tuple[int, int]][source]#

Prepare sequences for packing by randomly shuffling them.

Parameters:

sequence_lengths – A list of sequence lengths to pack.

Returns:

A list of (length, index) pairs in random order.

class nemo_rl.data.packing.algorithms.ModifiedFirstFitDecreasingPacker(
bin_capacity: int,
collect_metrics: bool = False,
min_bin_count: Optional[int] = None,
bin_count_multiple: Optional[int] = None,
)[source]#

Bases: nemo_rl.data.packing.algorithms.SequencePacker

Modified First-Fit Decreasing (MFFD) algorithm for sequence packing.

This algorithm implements the Johnson & Garey (1985) Modified First-Fit-Decreasing heuristic. It classifies items into four categories (large, medium, small, tiny) and uses a sophisticated 5-phase packing strategy to achieve better bin utilization than standard First-Fit Decreasing.

The algorithm phases:

  1. Classify items by size relative to bin capacity

  2. Create one bin per large item

  3. Add medium items to large bins (forward pass)

  4. Add pairs of small items to bins with medium items (backward pass)

  5. Greedily fit remaining items

  6. Apply FFD to any leftovers

Time complexity: O(n log n) for sorting + O(n * m) for packing, where n is the number of sequences and m is the number of bins.

Initialization

Initialize the sequence packer.

Parameters:
  • bin_capacity – The maximum capacity of each bin.

  • collect_metrics – Whether to collect metrics across multiple packing operations.

  • min_bin_count – Minimum number of bins to create, even if fewer would suffice. If None, no minimum is enforced.

  • bin_count_multiple – The total number of bins must be a multiple of this value. If None, no multiple constraint is enforced.

Raises:

ValueError – If min_bin_count or bin_count_multiple are invalid.

_classify_items(
items: List[Tuple[int, int]],
) Tuple[List[Tuple[int, int]], List[Tuple[int, int]], List[Tuple[int, int]], List[Tuple[int, int]]][source]#

Split items into large / medium / small / tiny classes.

Follows the classification used by Johnson & Garey: large : (C/2, C] medium : (C/3, C/2] small : (C/6, C/3] tiny : (0 , C/6]

Parameters:

items – List of (index, size) tuples

Returns:

Tuple of four lists (large, medium, small, tiny) without additional sorting.

_pack_implementation(
sequence_lengths: List[int],
) List[List[int]][source]#

Pack sequences using the Modified First-Fit Decreasing algorithm.

Parameters:

sequence_lengths – A list of sequence lengths to pack.

Returns:

A list of bins, where each bin is a list of indices into the original sequence_lengths list.

nemo_rl.data.packing.algorithms.get_packer(
algorithm: Union[nemo_rl.data.packing.algorithms.PackingAlgorithm, str],
bin_capacity: int,
collect_metrics: bool = False,
min_bin_count: Optional[int] = None,
bin_count_multiple: Optional[int] = None,
) nemo_rl.data.packing.algorithms.SequencePacker[source]#

Factory function to get a sequence packer based on the algorithm.

Parameters:
  • algorithm – The packing algorithm to use. Can be either a PackingAlgorithm enum value or a string (case-insensitive) matching one of the enum names.

  • bin_capacity – The maximum capacity of each bin.

  • collect_metrics – Whether to collect metrics across multiple packing operations.

  • min_bin_count – Minimum number of bins to create, even if fewer would suffice. If None, no minimum is enforced.

  • bin_count_multiple – The total number of bins must be a multiple of this value. If None, no multiple constraint is enforced.

Returns:

A SequencePacker instance for the specified algorithm.

Raises:

ValueError – If the algorithm is not recognized.