# nemo_automodel.components.datasets.vlm.samplers

## Module Contents

### Classes

| Name                                                                                            | Description                                                                        |
| ----------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------- |
| [`LengthGroupedSampler`](#nemo_automodel-components-datasets-vlm-samplers-LengthGroupedSampler) | Sampler that groups samples by total token count for balanced distributed training |

### Functions

| Name                                                                                          | Description                                                                                                                       |
| --------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------- |
| [`_smart_resize_image`](#nemo_automodel-components-datasets-vlm-samplers-_smart_resize_image) | Compute the resized (height, width) for an image, matching `transformers.models.qwen2_vl.image_processing_qwen2_vl.smart_resize` |
| [`_smart_resize_video`](#nemo_automodel-components-datasets-vlm-samplers-_smart_resize_video) | Compute the resized (height, width) for a video, matching `transformers.models.qwen3_vl.video_processing_qwen3_vl.smart_resize`  |

### Data

[`logger`](#nemo_automodel-components-datasets-vlm-samplers-logger)

### API

```python
class nemo_automodel.components.datasets.vlm.samplers.LengthGroupedSampler(
    dataset,
    seed = 42,
    processor = None,
    max_length = None,
    batch_size = 1
)
```

**Bases:** `Sampler`

Sampler that groups samples by total token count for balanced
distributed training.

With `shard_data=True`, each rank owns a different subset of the data.
This sampler sorts each rank's indices by **total tokens**
(`text_tokens + media_tokens`, descending). All ranks share the
same `seed + epoch`, so position *N* on every rank corresponds to a
sample of similar length, which keeps cross-rank padding minimal.

Per-epoch randomness is achieved by rotating the sorted order by a
deterministic random offset (same on every rank).
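The sort-then-rotate idea described above can be sketched in a few lines. This is a minimal illustration only, not the library's implementation: the helper name `rotated_length_order` and the use of Python's `random.Random` are assumptions made for the example.

```python
import random


def rotated_length_order(lengths, seed=42, epoch=0):
    """Hypothetical helper: sort indices by length (descending), then rotate.

    `lengths[i]` is the total token count of sample `i` on this rank.
    """
    # Longest samples first, as described for the sampler.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    if not order:
        return order
    # Seeding with seed + epoch gives every rank the same offset for a given epoch.
    rng = random.Random(seed + epoch)
    offset = rng.randrange(len(order))
    # Rotation preserves the relative (sorted) order of neighbouring samples.
    return order[offset:] + order[:offset]


lengths = [5, 50, 20, 10]
epoch0 = rotated_length_order(lengths, seed=42, epoch=0)
epoch1 = rotated_length_order(lengths, seed=42, epoch=1)
```

Because the offset depends only on `seed + epoch`, two ranks that call this with the same arguments rotate by the same amount, so position *N* still pairs samples of similar rank-local length.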

**Parameters:**

`dataset`: The dataset to sample from.

`seed`: Base random seed (same value on every rank).

`processor`: Optional HuggingFace processor (e.g. `Qwen2VLProcessor`).
Used to read `image_processor` / `video_processor` attributes
for accurate media token estimation via `smart_resize`.

```python
nemo_automodel.components.datasets.vlm.samplers.LengthGroupedSampler.__iter__()
```

```python
nemo_automodel.components.datasets.vlm.samplers.LengthGroupedSampler.__len__()
```

```python
nemo_automodel.components.datasets.vlm.samplers.LengthGroupedSampler._compute_or_load_lengths(
    dataset
)
```

Compute token lengths with direct list access for speed.

```python
nemo_automodel.components.datasets.vlm.samplers.LengthGroupedSampler._estimate_image_tokens(
    img_meta
)
```

Estimate token count for one image from its `[height, width]` metadata.
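As a rough illustration of how a token count can follow from `[height, width]`, the sketch below assumes a Qwen2-VL-style layout: 14-pixel patches merged 2x2, so one token covers a 28x28 pixel region. The function name, the `patch_size` and `merge_size` defaults, and the simplified rounding (no pixel-budget clamping) are assumptions for the example, not the method's actual logic.

```python
def estimate_image_tokens(img_meta, patch_size=14, merge_size=2):
    """Hypothetical estimate of image tokens from [height, width] metadata."""
    height, width = img_meta
    factor = patch_size * merge_size  # pixels covered by one merged token side
    # Snap each side to a multiple of `factor` (simplified; no min/max pixel clamp).
    h_bar = max(factor, round(height / factor) * factor)
    w_bar = max(factor, round(width / factor) * factor)
    # Patch grid, then merge_size x merge_size patches collapse into one token.
    grid_h = h_bar // patch_size
    grid_w = w_bar // patch_size
    return (grid_h * grid_w) // (merge_size ** 2)


tokens = estimate_image_tokens([224, 224])  # 16 x 16 patches / 4 = 64 tokens
```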

```python
nemo_automodel.components.datasets.vlm.samplers.LengthGroupedSampler._estimate_tokens(
    example
)
```

Return `(text_tokens, media_tokens)` for one example.

Uses pre-computed `_text_tokens` / `_media_tokens` when available
(written by `scripts/precompute_tokens.py`).  Otherwise falls back
to heuristic estimation.
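The precomputed-first, heuristic-fallback pattern might look like the sketch below. Only the `_text_tokens` and `_media_tokens` keys come from the description above; the `text` field, the characters-per-token ratio, and the zero media estimate are placeholders assumed for illustration.

```python
def estimate_tokens(example, chars_per_token=4):
    """Hypothetical sketch: return (text_tokens, media_tokens) for one example."""
    # Prefer exact counts when a precompute step has stored them.
    if "_text_tokens" in example and "_media_tokens" in example:
        return example["_text_tokens"], example["_media_tokens"]
    # Fallback heuristic (assumed): roughly `chars_per_token` characters per token.
    text = example.get("text", "")  # "text" is a hypothetical field name
    text_tokens = len(text) // chars_per_token
    media_tokens = 0  # placeholder; a real estimate would use image/video metadata
    return text_tokens, media_tokens


precomputed = estimate_tokens({"_text_tokens": 10, "_media_tokens": 5})
fallback = estimate_tokens({"text": "a" * 40})
```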

```python
nemo_automodel.components.datasets.vlm.samplers.LengthGroupedSampler._estimate_video_tokens(
    vid_meta
)
```

Estimate token count for one video from its
`[total_frames, height, width, fps, duration]` metadata.

```python
nemo_automodel.components.datasets.vlm.samplers.LengthGroupedSampler._extract_image_config(
    processor
)
```

staticmethod

```python
nemo_automodel.components.datasets.vlm.samplers.LengthGroupedSampler._extract_video_config(
    processor
)
```

staticmethod

```python
nemo_automodel.components.datasets.vlm.samplers.LengthGroupedSampler._get_raw_samples(
    dataset
)
```

staticmethod

Unwrap dataset wrappers to get the underlying list for direct access.

```python
nemo_automodel.components.datasets.vlm.samplers.LengthGroupedSampler.set_epoch(
    epoch
)
```

Set the epoch for deterministic shuffling (standard PyTorch pattern).
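The usual calling pattern is to invoke `set_epoch` once per epoch, before iterating. The sketch below uses a small stand-in class (`ToyEpochSampler`, hypothetical) instead of `LengthGroupedSampler` so that it runs without any dependencies; only the `set_epoch` hook mirrors the documented interface.

```python
import random


class ToyEpochSampler:
    """Hypothetical stand-in that exposes the same set_epoch hook."""

    def __init__(self, num_samples, seed=42):
        self.num_samples = num_samples
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch):
        self.epoch = epoch

    def __len__(self):
        return self.num_samples

    def __iter__(self):
        order = list(range(self.num_samples))
        if not order:
            return iter(order)
        # Same seed + epoch on every rank gives the same rotation everywhere.
        offset = random.Random(self.seed + self.epoch).randrange(self.num_samples)
        return iter(order[offset:] + order[:offset])


def run_epochs(sampler, num_epochs):
    """Collect the index order seen in each epoch."""
    orders = []
    for epoch in range(num_epochs):
        sampler.set_epoch(epoch)  # call before iterating, once per epoch
        orders.append(list(sampler))
    return orders


orders = run_epochs(ToyEpochSampler(8), num_epochs=3)
```

Skipping the `set_epoch` call would leave the epoch at its initial value, so every epoch would replay the same order.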

```python
nemo_automodel.components.datasets.vlm.samplers._smart_resize_image(
    height,
    width,
    factor = 28,
    min_pixels = 56 * 56,
    max_pixels = 14 * 14 * 4 * 1280
)
```

Compute the resized (height, width) for an image, matching
`transformers.models.qwen2_vl.image_processing_qwen2_vl.smart_resize`.
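For orientation, the following sketch shows the general shape of a Qwen2-VL-style smart resize: snap both sides to a multiple of `factor`, then rescale if the pixel count falls outside `[min_pixels, max_pixels]`. It is written from a general understanding of that approach and may differ in details from the referenced `transformers` function and from `_smart_resize_image`; the name `smart_resize_sketch` is hypothetical.

```python
import math


def smart_resize_sketch(height, width, factor=28,
                        min_pixels=56 * 56, max_pixels=14 * 14 * 4 * 1280):
    """Hypothetical sketch of an aspect-preserving resize to multiples of `factor`."""
    if max(height, width) / min(height, width) > 200:
        raise ValueError("aspect ratio too extreme for this sketch")
    # Snap each side to the nearest multiple of `factor`.
    h_bar = round(height / factor) * factor
    w_bar = round(width / factor) * factor
    if h_bar * w_bar > max_pixels:
        # Too large: shrink both sides by the same ratio, rounding down.
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = math.floor(height / beta / factor) * factor
        w_bar = math.floor(width / beta / factor) * factor
    elif h_bar * w_bar < min_pixels:
        # Too small: grow both sides by the same ratio, rounding up.
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar


unchanged = smart_resize_sketch(224, 224)   # already a multiple of 28 and in range
shrunk = smart_resize_sketch(4000, 3000)    # scaled down to fit max_pixels
```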

```python
nemo_automodel.components.datasets.vlm.samplers._smart_resize_video(
    num_frames,
    height,
    width,
    temporal_factor = 2,
    factor = 32,
    min_pixels = 128 * 128,
    max_pixels = 16 * 16 * 2 * 2 * 2 * 6144
)
```

Compute the resized (height, width) for a video, matching
`transformers.models.qwen3_vl.video_processing_qwen3_vl.smart_resize`.

```python
nemo_automodel.components.datasets.vlm.samplers.logger = logging.getLogger(__name__)
```