`bridge.data.vlm_datasets.step37_flickr8k.packing`#

Greedy non-truncation packing.

Walks the sample sizes in order and greedily fills each pack up to `max_len` without ever truncating a sample. Do NOT modify the loop arithmetic or the `flush()` semantics: the exact sequence of "drop" vs "extend" decisions determines the contents of every pack, and any reordering would shift the entire downstream packing layout.
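The greedy walk described above can be illustrated with a minimal sketch. This is not the module's implementation: `pack_sketch` and its tuple return value are hypothetical, and the sketch assumes that an oversize sample closes the open pack and is then either dropped or placed alone in a pack longer than `max_len`.

```python
from typing import Literal


def pack_sketch(
    sizes: list[int],
    max_len: int,
    oversize_policy: Literal["drop", "extend"],
) -> tuple[list[tuple[int, int]], int]:
    """Hypothetical greedy packer returning ((offset, count) ranges, dropped count)."""
    ranges: list[tuple[int, int]] = []
    dropped = 0
    start = 0  # index of the first sample in the open pack
    count = 0  # number of samples in the open pack
    used = 0   # total length of the open pack

    def flush() -> None:
        # Close the open pack (if it holds anything) and reset the counters.
        nonlocal count, used
        if count:
            ranges.append((start, count))
        count = 0
        used = 0

    for i, size in enumerate(sizes):
        if size > max_len:
            # Assumption: an oversize sample never shares a pack with others.
            flush()
            if oversize_policy == "drop":
                dropped += 1
            else:
                ranges.append((i, 1))  # a single-sample pack longer than max_len
            continue
        if used + size > max_len:
            flush()  # the sample does not fit, so start a new pack
        if count == 0:
            start = i
        count += 1
        used += size

    flush()  # emit the last open pack
    return ranges, dropped
```

With sizes `[4, 3, 9, 2, 5]` and `max_len=8`, this sketch yields `[(0, 2), (3, 2)]` with one dropped sample under "drop", and `[(0, 2), (2, 1), (3, 2)]` with none dropped under "extend".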

Module Contents#

Classes#

PackingResult

Result metadata from greedy sample packing.

Functions#

pack

Pack ordered sample lengths into contiguous groups without truncation.

API#

class bridge.data.vlm_datasets.step37_flickr8k.packing.PackingResult#

Result metadata from greedy sample packing.

num_packed_samples: int#

num_droped: int#

packed_sample_ranges: list[tuple[int, int]]#

One (offset, count) pair per pack: the index of the pack's first sample and the number of samples it contains, e.g. [(0, 2), (3, 2), …]
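To make the range format concrete, the sketch below expands each pair into explicit sample indices. The helper `expand_ranges` is hypothetical and assumes each tuple is (offset, count), as the description states.

```python
def expand_ranges(ranges: list[tuple[int, int]]) -> list[list[int]]:
    """Turn (offset, count) pairs into the list of sample indices in each pack."""
    return [list(range(offset, offset + count)) for offset, count in ranges]


# Two packs of two samples each; index 2 belongs to neither pack.
print(expand_ranges([(0, 2), (3, 2)]))  # [[0, 1], [3, 4]]
```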

bridge.data.vlm_datasets.step37_flickr8k.packing.pack(sizes: list[int], max_len: int, oversize_policy: Literal["drop", "extend"]) → bridge.data.vlm_datasets.step37_flickr8k.packing.PackingResult#

Pack ordered sample lengths into contiguous groups without truncation.

Parameters:

sizes – Token lengths for the samples to pack.
max_len – Maximum packed sequence length.
oversize_policy – Whether to drop oversize samples or keep them in extended packs.

Returns:

Metadata describing the packed sample ranges and dropped sample count.
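A caller may want to sanity-check the returned ranges against the original sizes. The helper below is a hypothetical sketch, not part of the module; it assumes each range is an (offset, count) pair as described for `packed_sample_ranges`.

```python
def summarize_packing(
    sizes: list[int],
    ranges: list[tuple[int, int]],
    max_len: int,
) -> tuple[list[int], list[int], list[int]]:
    """Return (pack lengths, sample indices in no pack, indices of packs over max_len)."""
    pack_lengths = [sum(sizes[offset:offset + count]) for offset, count in ranges]
    covered = {i for offset, count in ranges for i in range(offset, offset + count)}
    unpacked = [i for i in range(len(sizes)) if i not in covered]
    overlong = [k for k, total in enumerate(pack_lengths) if total > max_len]
    return pack_lengths, unpacked, overlong


lengths, unpacked, overlong = summarize_packing([4, 3, 9, 2, 5], [(0, 2), (3, 2)], 8)
print(lengths, unpacked, overlong)  # [7, 7] [2] []
```

Under the "drop" policy one would expect the number of unpacked indices to match `num_droped` and no pack to exceed `max_len`; under "extend", any overlong packs would correspond to oversize samples. Both expectations are inferred from the descriptions on this page rather than from the implementation.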
