nemo_rl.data.packing.algorithms
#
Sequence packing algorithms for efficient batching of variable-length sequences.
Module Contents#
Classes#
Enum for supported sequence packing algorithms. |
|
Abstract base class for sequence packing algorithms. |
|
Concatenative packing algorithm. |
|
Base class for First-Fit algorithms. |
|
First-Fit Decreasing (FFD) algorithm for sequence packing. |
|
First-Fit Shuffle algorithm for sequence packing. |
|
Modified First-Fit Decreasing (MFFD) algorithm for sequence packing. |
Functions#
Factory function to get a sequence packer based on the algorithm. |
API#
- class nemo_rl.data.packing.algorithms.PackingAlgorithm(*args, **kwds)[source]#
Bases:
enum.Enum
Enum for supported sequence packing algorithms.
Initialization
- CONCATENATIVE#
βconcatenativeβ
- FIRST_FIT_DECREASING#
βfirst_fit_decreasingβ
- FIRST_FIT_SHUFFLE#
βfirst_fit_shuffleβ
- MODIFIED_FIRST_FIT_DECREASING#
βmodified_first_fit_decreasingβ
- class nemo_rl.data.packing.algorithms.SequencePacker(
- bin_capacity: int,
- collect_metrics: bool = False,
- min_bin_count: Optional[int] = None,
- bin_count_multiple: Optional[int] = None,
Bases:
abc.ABC
Abstract base class for sequence packing algorithms.
Sequence packing is the process of efficiently arranging sequences of different lengths into fixed-capacity bins (batches) to maximize computational efficiency.
Initialization
Initialize the sequence packer.
- Parameters:
bin_capacity β The maximum capacity of each bin.
collect_metrics β Whether to collect metrics across multiple packing operations.
min_bin_count β Minimum number of bins to create, even if fewer would suffice. If None, no minimum is enforced.
bin_count_multiple β The total number of bins must be a multiple of this value. If None, no multiple constraint is enforced.
- Raises:
ValueError β If min_bin_count or bin_count_multiple are invalid.
- abstractmethod _pack_implementation(
- sequence_lengths: List[int],
Implementation of the packing algorithm.
- Parameters:
sequence_lengths β A list of sequence lengths to pack.
- Returns:
A list of bins, where each bin is a list of indices into the original sequence_lengths list.
- _adjust_bin_count(
- bins: List[List[int]],
Adjust the number of bins to meet minimum and multiple constraints.
This method preserves the existing bin packing as much as possible and only moves sequences one at a time to create additional bins when needed.
- Parameters:
bins β The original bins from the packing algorithm.
- Returns:
Adjusted bins with minimal changes to meet constraints.
- Raises:
ValueError β If there arenβt enough sequences to fill the required number of bins.
- pack(
- sequence_lengths: List[int],
Pack sequences into bins and update metrics if enabled.
- Parameters:
sequence_lengths β A list of sequence lengths to pack.
- Returns:
A list of bins, where each bin is a list of indices into the original sequence_lengths list. The number of bins will satisfy min_bin_count and bin_count_multiple constraints if specified.
- compute_metrics(
- sequence_lengths: List[int],
- bins: List[List[int]],
Calculate metrics for a packing solution without updating the metrics tracker.
- Parameters:
sequence_lengths β List of sequence lengths
bins β List of bins, where each bin is a list of indices
- Returns:
Dictionary of packing metrics
- get_aggregated_metrics() Dict[str, float] [source]#
Get aggregated metrics across all packing operations.
- Returns:
Dictionary of aggregated metrics, or empty dict if not collecting
- _validate_sequence_lengths(sequence_lengths: List[int]) None [source]#
Validate that all sequence lengths are within bin capacity.
- Parameters:
sequence_lengths β A list of sequence lengths to validate.
- Raises:
ValueError β If any sequence length exceeds bin capacity.
- _create_indexed_lengths(
- sequence_lengths: List[int],
- reverse: bool = False,
Create a list of (length, index) pairs from sequence lengths.
- Parameters:
sequence_lengths β A list of sequence lengths.
reverse β Whether to sort in descending order (True) or ascending order (False).
- Returns:
A list of (length, index) pairs, optionally sorted.
- class nemo_rl.data.packing.algorithms.ConcatenativePacker(
- bin_capacity: int,
- collect_metrics: bool = False,
- min_bin_count: Optional[int] = None,
- bin_count_multiple: Optional[int] = None,
Bases:
nemo_rl.data.packing.algorithms.SequencePacker
Concatenative packing algorithm.
This algorithm simply concatenates sequences in order until reaching the bin capacity, then starts a new bin. It doesnβt try to optimize the packing in any way.
Time complexity: O(n) where n is the number of sequences.
Example:
>>> examples = { ... "sequence_lengths": [4, 1, 3, 2, 1, 3, 4, 5] ... } >>> # If packed with seq_length=5: ... {"bins": [ [0, 1], [2, 3], [4, 5], [6], [7] ]} >>> # If packed with seq_length=8: ... {"bins": [ [0, 1, 2], [3, 4, 5], [6], [7] ]}
Initialization
Initialize the sequence packer.
- Parameters:
bin_capacity β The maximum capacity of each bin.
collect_metrics β Whether to collect metrics across multiple packing operations.
min_bin_count β Minimum number of bins to create, even if fewer would suffice. If None, no minimum is enforced.
bin_count_multiple β The total number of bins must be a multiple of this value. If None, no multiple constraint is enforced.
- Raises:
ValueError β If min_bin_count or bin_count_multiple are invalid.
- max_sequences_per_bin#
None
- class nemo_rl.data.packing.algorithms.FirstFitPacker(
- bin_capacity: int,
- collect_metrics: bool = False,
- min_bin_count: Optional[int] = None,
- bin_count_multiple: Optional[int] = None,
Bases:
nemo_rl.data.packing.algorithms.SequencePacker
Base class for First-Fit algorithms.
First-Fit algorithms place each sequence into the first bin where it fits. If no bin can fit the sequence, a new bin is created.
This is an abstract base class that provides the common implementation for First-Fit variants. Subclasses must implement the _prepare_sequences method to determine the order in which sequences are processed.
Initialization
Initialize the sequence packer.
- Parameters:
bin_capacity β The maximum capacity of each bin.
collect_metrics β Whether to collect metrics across multiple packing operations.
min_bin_count β Minimum number of bins to create, even if fewer would suffice. If None, no minimum is enforced.
bin_count_multiple β The total number of bins must be a multiple of this value. If None, no multiple constraint is enforced.
- Raises:
ValueError β If min_bin_count or bin_count_multiple are invalid.
- abstractmethod _prepare_sequences(
- sequence_lengths: List[int],
Prepare sequences for packing.
This method determines the order in which sequences are processed. Subclasses must override this method.
- Parameters:
sequence_lengths β A list of sequence lengths to pack.
- Returns:
A list of (length, index) pairs.
- class nemo_rl.data.packing.algorithms.FirstFitDecreasingPacker(
- bin_capacity: int,
- collect_metrics: bool = False,
- min_bin_count: Optional[int] = None,
- bin_count_multiple: Optional[int] = None,
Bases:
nemo_rl.data.packing.algorithms.FirstFitPacker
First-Fit Decreasing (FFD) algorithm for sequence packing.
This algorithm sorts sequences by length in descending order and then places each sequence into the first bin where it fits.
Time complexity: O(n log n) for sorting + O(n * m) for packing, where n is the number of sequences and m is the number of bins.
Initialization
Initialize the sequence packer.
- Parameters:
bin_capacity β The maximum capacity of each bin.
collect_metrics β Whether to collect metrics across multiple packing operations.
min_bin_count β Minimum number of bins to create, even if fewer would suffice. If None, no minimum is enforced.
bin_count_multiple β The total number of bins must be a multiple of this value. If None, no multiple constraint is enforced.
- Raises:
ValueError β If min_bin_count or bin_count_multiple are invalid.
- class nemo_rl.data.packing.algorithms.FirstFitShufflePacker(
- bin_capacity: int,
- collect_metrics: bool = False,
- min_bin_count: Optional[int] = None,
- bin_count_multiple: Optional[int] = None,
Bases:
nemo_rl.data.packing.algorithms.FirstFitPacker
First-Fit Shuffle algorithm for sequence packing.
This algorithm randomly shuffles the sequences and then places each sequence into the first bin where it fits.
Time complexity: O(n * m) for packing, where n is the number of sequences and m is the number of bins.
Initialization
Initialize the sequence packer.
- Parameters:
bin_capacity β The maximum capacity of each bin.
collect_metrics β Whether to collect metrics across multiple packing operations.
min_bin_count β Minimum number of bins to create, even if fewer would suffice. If None, no minimum is enforced.
bin_count_multiple β The total number of bins must be a multiple of this value. If None, no multiple constraint is enforced.
- Raises:
ValueError β If min_bin_count or bin_count_multiple are invalid.
- class nemo_rl.data.packing.algorithms.ModifiedFirstFitDecreasingPacker(
- bin_capacity: int,
- collect_metrics: bool = False,
- min_bin_count: Optional[int] = None,
- bin_count_multiple: Optional[int] = None,
Bases:
nemo_rl.data.packing.algorithms.SequencePacker
Modified First-Fit Decreasing (MFFD) algorithm for sequence packing.
This algorithm implements the Johnson & Garey (1985) Modified First-Fit-Decreasing heuristic. It classifies items into four categories (large, medium, small, tiny) and uses a sophisticated 5-phase packing strategy to achieve better bin utilization than standard First-Fit Decreasing.
The algorithm phases:
Classify items by size relative to bin capacity
Create one bin per large item
Add medium items to large bins (forward pass)
Add pairs of small items to bins with medium items (backward pass)
Greedily fit remaining items
Apply FFD to any leftovers
Time complexity: O(n log n) for sorting + O(n * m) for packing, where n is the number of sequences and m is the number of bins.
Initialization
Initialize the sequence packer.
- Parameters:
bin_capacity β The maximum capacity of each bin.
collect_metrics β Whether to collect metrics across multiple packing operations.
min_bin_count β Minimum number of bins to create, even if fewer would suffice. If None, no minimum is enforced.
bin_count_multiple β The total number of bins must be a multiple of this value. If None, no multiple constraint is enforced.
- Raises:
ValueError β If min_bin_count or bin_count_multiple are invalid.
- _classify_items(
- items: List[Tuple[int, int]],
Split items into large / medium / small / tiny classes.
Follows the classification used by Johnson & Garey: large : (C/2, C] medium : (C/3, C/2] small : (C/6, C/3] tiny : (0 , C/6]
- Parameters:
items β List of (index, size) tuples
- Returns:
Tuple of four lists (large, medium, small, tiny) without additional sorting.
- _pack_implementation(
- sequence_lengths: List[int],
Pack sequences using the Modified First-Fit Decreasing algorithm.
- Parameters:
sequence_lengths β A list of sequence lengths to pack.
- Returns:
A list of bins, where each bin is a list of indices into the original sequence_lengths list.
- nemo_rl.data.packing.algorithms.get_packer(
- algorithm: Union[nemo_rl.data.packing.algorithms.PackingAlgorithm, str],
- bin_capacity: int,
- collect_metrics: bool = False,
- min_bin_count: Optional[int] = None,
- bin_count_multiple: Optional[int] = None,
Factory function to get a sequence packer based on the algorithm.
- Parameters:
algorithm β The packing algorithm to use. Can be either a PackingAlgorithm enum value or a string (case-insensitive) matching one of the enum names.
bin_capacity β The maximum capacity of each bin.
collect_metrics β Whether to collect metrics across multiple packing operations.
min_bin_count β Minimum number of bins to create, even if fewer would suffice. If None, no minimum is enforced.
bin_count_multiple β The total number of bins must be a multiple of this value. If None, no multiple constraint is enforced.
- Returns:
A SequencePacker instance for the specified algorithm.
- Raises:
ValueError β If the algorithm is not recognized.