bridge.data.vlm_datasets.step37_flickr8k.packing

Greedy non-truncation packing.

Walks the sample sizes in order and greedily fills each pack up to max_len without ever truncating a sample. Do NOT modify the loop arithmetic or the flush() semantics — the exact sequence of “drop” vs “extend” decisions determines the contents of every pack, and any reordering would shift the entire downstream packing layout.
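The greedy loop described above can be illustrated with a minimal Python sketch. This is not the module's code. The names `greedy_pack_sketch` and `PackingResultSketch` are hypothetical, and the sketch makes four assumptions:

- (offset, count) pairs index into `sizes`.
- An oversize sample (longer than `max_len`) closes the pack being filled.
- Under `'extend'`, an oversize sample becomes a pack of its own.
- `num_packed_samples` counts packs.

The real `flush()` and `'extend'` behavior may differ.

```python
from dataclasses import dataclass, field
from typing import Literal


@dataclass
class PackingResultSketch:
    """Hypothetical stand-in with the same fields as the documented PackingResult."""

    num_packed_samples: int = 0
    num_droped: int = 0  # spelled as in the documented API
    packed_sample_ranges: list[tuple[int, int]] = field(default_factory=list)


def greedy_pack_sketch(
    sizes: list[int],
    max_len: int,
    oversize_policy: Literal["drop", "extend"],
) -> PackingResultSketch:
    """Greedy, order-preserving packing that never truncates a sample (sketch)."""
    result = PackingResultSketch()
    start = count = used = 0  # open pack: first index, number of samples, tokens

    def flush() -> None:
        # Close the open pack (if it holds anything) and start an empty one.
        nonlocal count, used
        if count:
            result.packed_sample_ranges.append((start, count))
        count = used = 0

    for i, size in enumerate(sizes):
        if size > max_len:
            # Assumption: an oversize sample ends the pack being filled.
            flush()
            if oversize_policy == "drop":
                result.num_droped += 1
            else:
                # Assumption: under "extend" it forms its own over-length pack.
                result.packed_sample_ranges.append((i, 1))
            continue
        if used + size > max_len:
            flush()  # the sample does not fit, so it starts the next pack
        if count == 0:
            start = i
        count += 1
        used += size
    flush()
    # Assumption: num_packed_samples counts packs, i.e. len(packed_sample_ranges).
    result.num_packed_samples = len(result.packed_sample_ranges)
    return result
```

With sizes `[3, 4, 9, 2, 5]` and `max_len=8`, the sketch behaves as follows:

- Under `"drop"`, it returns the ranges `[(0, 2), (3, 2)]` with one dropped sample. This has the same shape as the example given for `packed_sample_ranges`.
- Under `"extend"`, the 9-token sample becomes the extra range `(2, 1)`.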

Module Contents

Classes

PackingResult

Result metadata from greedy sample packing.

Functions

pack

Pack ordered sample lengths into contiguous groups without truncation.

API

class bridge.data.vlm_datasets.step37_flickr8k.packing.PackingResult

Result metadata from greedy sample packing.

num_packed_samples: int

num_droped: int

packed_sample_ranges: list[tuple[int, int]]

Offset and number of samples for each packed sample, e.g. [(0, 2), (3, 2), …]
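Under the assumption that each offset indexes into the original `sizes` list, the ranges can be expanded into three things: per-pack sample indices, per-pack token totals, and the indices left out of every pack. The helper names and the token lengths below are hypothetical. Under this reading, the gap between the two example ranges is sample 2, which could be a sample dropped for being oversize.

```python
def expand_ranges(ranges: list[tuple[int, int]]) -> list[list[int]]:
    """Turn (offset, count) pairs into the explicit sample indices of each pack."""
    return [list(range(offset, offset + count)) for offset, count in ranges]


def pack_token_totals(sizes: list[int], ranges: list[tuple[int, int]]) -> list[int]:
    """Sum the token lengths of the samples that make up each pack."""
    return [sum(sizes[offset:offset + count]) for offset, count in ranges]


def unpacked_indices(num_samples: int, ranges: list[tuple[int, int]]) -> list[int]:
    """Indices not covered by any pack (assumed to be the dropped samples)."""
    covered = {i for pack in expand_ranges(ranges) for i in pack}
    return [i for i in range(num_samples) if i not in covered]


ranges = [(0, 2), (3, 2)]  # the documented example, without the trailing "…"
sizes = [3, 4, 9, 2, 5]    # hypothetical token lengths
print(expand_ranges(ranges))                 # [[0, 1], [3, 4]]
print(pack_token_totals(sizes, ranges))      # [7, 7]
print(unpacked_indices(len(sizes), ranges))  # [2]
```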

bridge.data.vlm_datasets.step37_flickr8k.packing.pack(
    sizes: list[int],
    max_len: int,
    oversize_policy: Literal['drop', 'extend'],
) → bridge.data.vlm_datasets.step37_flickr8k.packing.PackingResult

Pack ordered sample lengths into contiguous groups without truncation.

Parameters:
  • sizes – Token lengths of the samples to pack, in order.

  • max_len – Maximum length, in tokens, of a packed sequence.

  • oversize_policy – 'drop' to discard samples longer than max_len, or 'extend' to keep them in packs that are allowed to exceed max_len.

Returns:

Metadata describing the packed sample ranges and dropped sample count.
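A small invariant checker offers one way to sanity-check a returned result against the documented contract. It tests three properties: ordered contiguous groups, no truncation, and no pack above `max_len` when oversize samples are dropped. `packing_problems` is a hypothetical helper. It assumes that the ranges index into `sizes` and that every sample either sits in exactly one pack or is counted in `num_droped`.

```python
from typing import Literal


def packing_problems(
    sizes: list[int],
    max_len: int,
    oversize_policy: Literal["drop", "extend"],
    ranges: list[tuple[int, int]],
    num_droped: int,
) -> list[str]:
    """Return human-readable invariant violations (empty list if none found)."""
    problems: list[str] = []
    prev_end = 0
    num_covered = 0
    for offset, count in ranges:
        end = offset + count
        if count <= 0:
            problems.append(f"empty pack {(offset, count)}")
        if offset < prev_end:  # packs must be ordered and non-overlapping
            problems.append(f"pack {(offset, count)} overlaps or is out of order")
        if end > len(sizes):
            problems.append(f"pack {(offset, count)} runs past the last sample")
        total = sum(sizes[offset:end])
        if oversize_policy == "drop" and total > max_len:
            problems.append(f"pack {(offset, count)} holds {total} tokens > max_len={max_len}")
        num_covered += max(count, 0)
        prev_end = max(prev_end, end)
    # No truncation (assumed): every sample is packed whole or counted as dropped.
    if num_covered + num_droped != len(sizes):
        problems.append(f"{num_covered} packed + {num_droped} dropped != {len(sizes)} samples")
    return problems
```

To check a real result, you would call `packing_problems(sizes, max_len, oversize_policy, result.packed_sample_ranges, result.num_droped)`. An empty list means none of the assumed invariants was violated.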