bridge.data.vlm_datasets.step37_flickr8k.packing#
Greedy non-truncation packing.
Walks the sample sizes in order and greedily fills each pack up to
max_len without ever truncating a sample. Do NOT modify the loop
arithmetic or the flush() semantics — the exact sequence of “drop” vs
“extend” decisions determines the contents of every pack, and any
reordering would shift the entire downstream packing layout.
Module Contents#
Classes#
PackingResult | Result metadata from greedy sample packing.
Functions#
pack | Pack ordered sample lengths into contiguous groups without truncation.
API#
- class bridge.data.vlm_datasets.step37_flickr8k.packing.PackingResult#
Result metadata from greedy sample packing.
- num_packed_samples: int#
None
- num_droped: int#
None
- packed_sample_ranges: list[tuple[int, int]]#
None
Offset and number of samples in each pack, e.g. [(0, 2), (3, 2), …]
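As a reading aid, the sketch below expands such ranges into explicit sample indices. It assumes each tuple is (offset, count), as the description suggests; `expand_ranges` is a hypothetical helper and not part of this module.

```python
def expand_ranges(packed_sample_ranges: list[tuple[int, int]]) -> list[list[int]]:
    """Turn (offset, count) pairs into explicit lists of sample indices.

    Assumption: each tuple is (index of first sample, number of samples).
    """
    return [list(range(offset, offset + count))
            for offset, count in packed_sample_ranges]


# Using the ranges from the example above.
print(expand_ranges([(0, 2), (3, 2)]))  # [[0, 1], [3, 4]]
```

Under this reading, index 2 appears in no pack, which would be consistent with that sample having been dropped.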
- bridge.data.vlm_datasets.step37_flickr8k.packing.pack(
- sizes: list[int],
- max_len: int,
- oversize_policy: Literal['drop', 'extend'],
- )
Pack ordered sample lengths into contiguous groups without truncation.
- Parameters:
sizes – Token lengths for the samples to pack.
max_len – Maximum packed sequence length.
oversize_policy – Whether to drop oversize samples or keep them in extended packs.
- Returns:
Metadata describing the packed sample ranges and dropped sample count.
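The following is a minimal sketch of greedy non-truncating packing, written only from the description on this page. It is not the module's implementation. `pack_sketch` and `SketchResult` are hypothetical names, and three behaviors are assumptions: a sample counts as oversize when its length exceeds max_len, the "extend" policy places such a sample alone in a pack longer than max_len, and num_packed_samples counts packs rather than individual samples.

```python
from dataclasses import dataclass, field
from typing import Literal


@dataclass
class SketchResult:
    # Hypothetical stand-in for PackingResult; field names copied from this page.
    num_packed_samples: int = 0  # assumption: number of packs produced
    num_droped: int = 0
    packed_sample_ranges: list[tuple[int, int]] = field(default_factory=list)


def pack_sketch(
    sizes: list[int],
    max_len: int,
    oversize_policy: Literal["drop", "extend"],
) -> SketchResult:
    result = SketchResult()
    start, count, used = 0, 0, 0  # state of the pack currently being filled

    def flush() -> None:
        # Close the current pack, if it holds any samples.
        nonlocal count, used
        if count:
            result.packed_sample_ranges.append((start, count))
        count, used = 0, 0

    for i, size in enumerate(sizes):
        if size > max_len:
            # Oversize sample: close the open pack so ranges stay contiguous.
            flush()
            if oversize_policy == "drop":
                result.num_droped += 1
            else:
                # Assumption: "extend" keeps the sample alone in an over-long pack.
                result.packed_sample_ranges.append((i, 1))
            continue
        if count and used + size > max_len:
            flush()  # sample does not fit; start a new pack
        if count == 0:
            start = i
        count += 1
        used += size

    flush()
    result.num_packed_samples = len(result.packed_sample_ranges)
    return result


dropped = pack_sketch([4, 3, 9, 2, 5], max_len=8, oversize_policy="drop")
print(dropped.packed_sample_ranges, dropped.num_droped)  # [(0, 2), (3, 2)] 1
```

With these inputs the sketch reproduces the ranges shown in the packed_sample_ranges example: the length-9 sample at index 2 is dropped, so the second pack starts at offset 3. Because each decision depends on the running total, changing the order of sizes changes every later pack.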