nemo_automodel.components.datasets.vlm.samplers
API
Bases: Sampler
Sampler that groups samples by total token count for balanced distributed training.
With shard_data=True, each rank owns a different subset of the data.
The sampler sorts every rank's indices by total token count
(text_tokens + media_tokens, descending). All ranks share the
same seed and epoch, so position N on every rank corresponds to a
sample of similar length, keeping cross-rank padding minimal.
Per-epoch randomness is achieved by rotating the sorted order by a deterministic pseudo-random offset (the same on every rank).
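The sort-then-rotate idea can be sketched in a few lines. This is a minimal illustration, not the sampler's actual code: the helper name balanced_order is hypothetical, and deriving the offset from random.Random(seed + epoch) as a fraction of the shard size is an assumption about how a rank-independent rotation might be computed.

```python
import random
from typing import List, Sequence


def balanced_order(lengths: Sequence[int], seed: int, epoch: int) -> List[int]:
    """Hypothetical helper: sort indices by length (descending), then rotate.

    Assumption: the rotation point comes from random.Random(seed + epoch);
    the real sampler may combine seed and epoch differently.
    """
    # Descending by total tokens; sorted() is stable, so ties keep index order.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    if not order:
        return order
    # Same seed and epoch on every rank -> same fraction -> same relative
    # rotation point, so position N stays length-aligned across ranks.
    frac = random.Random(seed + epoch).random()
    offset = int(frac * len(order))
    return order[offset:] + order[:offset]


# Two ranks with different data but the same seed/epoch rotate identically.
rank0 = balanced_order([10, 20, 30], seed=7, epoch=3)
rank1 = balanced_order([11, 21, 31], seed=7, epoch=3)
print(rank0, rank1)
```

Rotating instead of fully shuffling keeps neighbouring positions length-similar while still changing which samples come first each epoch.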
Parameters:
The dataset to sample from.
Base random seed (same value on every rank).
Optional Hugging Face processor (e.g., Qwen2VLProcessor).
Its image_processor / video_processor attributes are read
for accurate media-token estimation via smart_resize.
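One way to read resize settings off an optional processor is sketched below. The helper read_image_params, the attribute names looked up, and the fallback values (Qwen2-VL-style) are all assumptions for illustration; a SimpleNamespace stands in for a real processor so the example needs no external library.

```python
from types import SimpleNamespace
from typing import Any, Dict, Optional

# Assumed fallback values (Qwen2-VL-style); not taken from the sampler itself.
DEFAULTS = {"patch_size": 14, "merge_size": 2,
            "min_pixels": 56 * 56, "max_pixels": 28 * 28 * 1280}


def read_image_params(processor: Optional[Any]) -> Dict[str, int]:
    """Hypothetical helper: pull patch/resize settings from processor.image_processor."""
    image_proc = getattr(processor, "image_processor", None)
    params = {}
    for name, default in DEFAULTS.items():
        value = getattr(image_proc, name, None)
        # Fall back when there is no processor or the attribute is missing/None.
        params[name] = default if value is None else value
    return params


fake = SimpleNamespace(image_processor=SimpleNamespace(patch_size=16, merge_size=2))
print(read_image_params(fake))
# {'patch_size': 16, 'merge_size': 2, 'min_pixels': 3136, 'max_pixels': 1003520}
```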
Compute token lengths with direct list access for speed.
Estimate token count for one image from its [height, width] metadata.
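Once an image's height and width have been snapped by smart_resize, the token count is plain arithmetic: the resized image is cut into patch_size × patch_size patches, and merge_size × merge_size neighbouring patches are merged into one token. The sketch below assumes Qwen2-VL-style defaults (patch 14, merge 2) and takes already-resized dimensions; image_tokens is a hypothetical name.

```python
def image_tokens(resized_h: int, resized_w: int,
                 patch_size: int = 14, merge_size: int = 2) -> int:
    """Hypothetical: tokens for an image whose sides are already multiples of
    patch_size * merge_size (as smart_resize guarantees)."""
    grid_h = resized_h // patch_size
    grid_w = resized_w // patch_size
    # Each merge_size x merge_size block of patches becomes one token.
    return (grid_h * grid_w) // (merge_size ** 2)


print(image_tokens(448, 448))  # 32 * 32 / 4 = 256
print(image_tokens(476, 644))  # 34 * 46 / 4 = 391
```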
Return (text_tokens, media_tokens) for one example.
Uses pre-computed _text_tokens / _media_tokens when available
(written by scripts/precompute_tokens.py). Otherwise falls back
to heuristic estimation.
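The "pre-computed first, heuristic otherwise" lookup can be sketched as follows. Only the _text_tokens / _media_tokens keys come from the description above; the "text" and "images" field names, the 4-characters-per-token ratio, and the fixed per-image cost are hypothetical placeholders for whatever heuristic the real sampler uses.

```python
from typing import Any, Dict, Tuple


def example_tokens(example: Dict[str, Any], chars_per_token: float = 4.0,
                   tokens_per_image: int = 256) -> Tuple[int, int]:
    """Hypothetical: return (text_tokens, media_tokens) for one example."""
    # Prefer values written ahead of time (e.g. by a precompute script).
    if "_text_tokens" in example and "_media_tokens" in example:
        return int(example["_text_tokens"]), int(example["_media_tokens"])
    # Heuristic fallback (assumed): rough chars-per-token ratio for text and
    # a flat cost per image for media.
    text_tokens = int(len(example.get("text", "")) / chars_per_token)
    media_tokens = tokens_per_image * len(example.get("images", []))
    return text_tokens, media_tokens


print(example_tokens({"_text_tokens": 10, "_media_tokens": 20}))  # (10, 20)
print(example_tokens({"text": "a" * 40, "images": ["x", "y"]}))  # (10, 512)
```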
Estimate token count for one video from its
[total_frames, height, width, fps, duration] metadata.
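A video estimate combines a frame count with the per-frame spatial cost. The sketch below is an assumption-laden illustration, not the sampler's formula: it samples frames at a hypothetical sample_fps over the clip's duration, caps by total_frames and a hypothetical max_frames, groups frames into temporal patches of 2, and uses Qwen3-VL-style patch 16 / merge 2 on already-resized dimensions. The fps metadata field is not used here.

```python
import math


def video_tokens(total_frames: int, resized_h: int, resized_w: int,
                 duration: float, sample_fps: float = 2.0,
                 temporal_patch_size: int = 2, patch_size: int = 16,
                 merge_size: int = 2, max_frames: int = 768) -> int:
    """Hypothetical token estimate for one video (all defaults assumed)."""
    # Frames actually sampled: target fps over the clip, capped by what exists.
    n = min(total_frames, max_frames, max(1, round(duration * sample_fps)))
    # Frames are consumed in groups of temporal_patch_size (round up).
    t_groups = math.ceil(n / temporal_patch_size)
    # Spatial tokens per temporal group, as for a single image.
    spatial = (resized_h // patch_size) * (resized_w // patch_size) // (merge_size ** 2)
    return t_groups * spatial


print(video_tokens(300, 448, 448, duration=10.0))  # 20 frames -> 10 groups * 196 = 1960
```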
Unwrap dataset wrappers to get the underlying list for direct access.
Set the epoch for deterministic shuffling (standard PyTorch pattern).
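The standard PyTorch pattern is to call set_epoch on the sampler at the start of every epoch so that all ranks derive the same per-epoch ordering. ToyLengthSampler below is a hypothetical, dependency-free stand-in (not the real class) that reuses the sort-then-rotate idea under the same seed + epoch assumption.

```python
import random
from typing import Iterator, Sequence


class ToyLengthSampler:
    """Hypothetical stand-in illustrating the set_epoch contract."""

    def __init__(self, lengths: Sequence[int], seed: int = 0) -> None:
        self.lengths = list(lengths)
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch: int) -> None:
        # Stored only; the ordering is recomputed lazily in __iter__.
        self.epoch = epoch

    def __len__(self) -> int:
        return len(self.lengths)

    def __iter__(self) -> Iterator[int]:
        order = sorted(range(len(self.lengths)), key=lambda i: self.lengths[i], reverse=True)
        if not order:
            return iter(order)
        offset = int(random.Random(self.seed + self.epoch).random() * len(order))
        return iter(order[offset:] + order[:offset])


sampler = ToyLengthSampler([5, 1, 9, 3], seed=42)
for epoch in range(3):
    sampler.set_epoch(epoch)  # call before iterating each epoch
    print(epoch, list(sampler))
```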
Compute the resized (height, width) for an image, matching
transformers.models.qwen2_vl.image_processing_qwen2_vl.smart_resize.
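For reference, the image resizing rule can be sketched as below. It is modeled on the Qwen2-VL smart_resize logic as we understand it (snap to multiples of factor, then rescale to stay within a pixel budget); the function name smart_resize_image and the default factor/min_pixels/max_pixels values are assumptions, so check the transformers source for the authoritative version.

```python
import math
from typing import Tuple


def smart_resize_image(height: int, width: int, factor: int = 28,
                       min_pixels: int = 56 * 56,
                       max_pixels: int = 28 * 28 * 1280) -> Tuple[int, int]:
    """Hypothetical re-implementation; defaults are assumed, not authoritative."""
    if max(height, width) / min(height, width) > 200:
        raise ValueError("aspect ratio too extreme")
    # Snap each side to the nearest multiple of `factor`.
    h_bar = round(height / factor) * factor
    w_bar = round(width / factor) * factor
    if h_bar * w_bar > max_pixels:
        # Too many pixels: shrink both sides by the same ratio, flooring to `factor`.
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = max(factor, math.floor(height / beta / factor) * factor)
        w_bar = max(factor, math.floor(width / beta / factor) * factor)
    elif h_bar * w_bar < min_pixels:
        # Too few pixels: grow both sides by the same ratio, ceiling to `factor`.
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar


print(smart_resize_image(480, 640))  # (476, 644)
```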
Compute the resized (height, width) for a video, matching
transformers.models.qwen3_vl.video_processing_qwen3_vl.smart_resize.
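The video variant differs mainly in that the pixel budget is checked across frames (t × h × w) rather than per image. The sketch below follows that idea under loudly stated assumptions: smart_resize_video is a hypothetical name, the temporal rounding and the default factor, min_pixels, and max_pixels values are assumed, and the transformers Qwen3-VL source remains the reference.

```python
import math
from typing import Tuple


def smart_resize_video(num_frames: int, height: int, width: int,
                       temporal_factor: int = 2, factor: int = 32,
                       min_pixels: int = 128 * 128,
                       max_pixels: int = 16 * 16 * 2 * 2 * 2 * 6144) -> Tuple[int, int]:
    """Hypothetical sketch; all defaults are assumptions."""
    if num_frames < temporal_factor:
        raise ValueError("need at least temporal_factor frames")
    if height < factor or width < factor:
        raise ValueError("height and width must be at least `factor`")
    h_bar = round(height / factor) * factor
    w_bar = round(width / factor) * factor
    t_bar = round(num_frames / temporal_factor) * temporal_factor
    # The budget covers all frames, so long clips get smaller spatial sizes.
    if t_bar * h_bar * w_bar > max_pixels:
        beta = math.sqrt((num_frames * height * width) / max_pixels)
        h_bar = max(factor, math.floor(height / beta / factor) * factor)
        w_bar = max(factor, math.floor(width / beta / factor) * factor)
    elif t_bar * h_bar * w_bar < min_pixels:
        beta = math.sqrt(min_pixels / (num_frames * height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar


print(smart_resize_video(20, 480, 640))  # (480, 640): within budget, unchanged
```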