
# nemo_automodel.components.datasets.llm.packed_sequence

## Module Contents

### Functions

| Name                                                                                                                                               | Description                                                                                 |
| -------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------- |
| [`_convert_to_tensors`](#nemo_automodel-components-datasets-llm-packed_sequence-_convert_to_tensors)                                               | Converts a pack into tensors. Pack comes in as a dict of lists and is converted to tensors. |
| [`_fill_labels_with_cross_entropy_ignore_idx`](#nemo_automodel-components-datasets-llm-packed_sequence-_fill_labels_with_cross_entropy_ignore_idx) | -                                                                                           |
| [`_pad_pack`](#nemo_automodel-components-datasets-llm-packed_sequence-_pad_pack)                                                                   | Pads a pack to `packed_sequence_size`.                                                      |
| [`_should_stop_packing`](#nemo_automodel-components-datasets-llm-packed_sequence-_should_stop_packing)                                             | If `max_packs` is set, stop packing once that many packs have been built.                  |
| [`_split_and_add_pack`](#nemo_automodel-components-datasets-llm-packed_sequence-_split_and_add_pack)                                               | Splits the current pack at the boundary, processes it, adds it to `packs`.                  |
| [`_tensorize_and_pad_pack`](#nemo_automodel-components-datasets-llm-packed_sequence-_tensorize_and_pad_pack)                                       | Converts a pack to tensors, pads it, and returns it.                                        |
| [`build_block_causal_additive_mask`](#nemo_automodel-components-datasets-llm-packed_sequence-build_block_causal_additive_mask)                     | Build a `[B, 1, T, T]` additive block-causal mask directly on `device`.                     |
| [`create_block_causal_mask`](#nemo_automodel-components-datasets-llm-packed_sequence-create_block_causal_mask)                                     | Creates a block causal mask for the specified lengths.                                      |
| [`pack_dataset`](#nemo_automodel-components-datasets-llm-packed_sequence-pack_dataset)                                                             | Pack the dataset to defined length.                                                         |
| [`packed_block_causal_mask`](#nemo_automodel-components-datasets-llm-packed_sequence-packed_block_causal_mask)                                     | Create a 2D block causal document mask for a batch of packed sequences.                     |

### Data

[`CROSS_ENTROPY_IGNORE_IDX`](#nemo_automodel-components-datasets-llm-packed_sequence-CROSS_ENTROPY_IGNORE_IDX)

[`PACK_TYPE`](#nemo_automodel-components-datasets-llm-packed_sequence-PACK_TYPE)

[`logger`](#nemo_automodel-components-datasets-llm-packed_sequence-logger)

### API

```python
nemo_automodel.components.datasets.llm.packed_sequence._convert_to_tensors(
    pack: nemo_automodel.components.datasets.llm.packed_sequence.PACK_TYPE
) -> nemo_automodel.components.datasets.llm.packed_sequence.PACK_TYPE
```

Converts a pack, which comes in as a dict of lists, into tensors.

```python
nemo_automodel.components.datasets.llm.packed_sequence._fill_labels_with_cross_entropy_ignore_idx(
    labels: list[int],
    loss_mask: list[int]
) -> list[int]
```
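The source gives no description for this helper. Judging only from its name and signature, a plausible reading is that each label whose `loss_mask` entry is 0 is replaced with `CROSS_ENTROPY_IGNORE_IDX` (`-100`), so the loss skips those tokens. A minimal sketch under that assumption; `fill_labels_sketch` is a hypothetical name, not part of the module:

```python
CROSS_ENTROPY_IGNORE_IDX = -100


def fill_labels_sketch(labels: list[int], loss_mask: list[int]) -> list[int]:
    """Hypothetical: replace labels with the ignore index where loss_mask is 0 (assumed semantics)."""
    if len(labels) != len(loss_mask):
        raise ValueError("labels and loss_mask must have the same length")
    # Keep the label where the mask is truthy; otherwise use the ignore index.
    return [lab if keep else CROSS_ENTROPY_IGNORE_IDX for lab, keep in zip(labels, loss_mask)]


print(fill_labels_sketch([11, 12, 13, 14], [0, 0, 1, 1]))  # [-100, -100, 13, 14]
```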

```python
nemo_automodel.components.datasets.llm.packed_sequence._pad_pack(
    pack: nemo_automodel.components.datasets.llm.packed_sequence.PACK_TYPE,
    padding_idx: int,
    packed_sequence_size: int,
    cross_entropy_ignore_idx: int = CROSS_ENTROPY_IGNORE_IDX,
    cp_size: int = 1
) -> nemo_automodel.components.datasets.llm.packed_sequence.PACK_TYPE
```

Pads a pack to `packed_sequence_size`.

`seq_lens` contains the original lengths.
`seq_lens_padded` applies CP padding (if `cp_size > 1`) and pack-level padding.
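As a hedged illustration of the length bookkeeping, the sketch below derives `seq_lens_padded` from `seq_lens`. It assumes (the source does not confirm this) that CP padding rounds each length up to a multiple of `2 * cp_size`, as described for `cp_size` under `pack_dataset`, and that the remaining pack-level padding is added to the last sequence, as described for `build_block_causal_additive_mask`. `padded_seq_lens_sketch` is a hypothetical name; it only computes lengths and does not pad any token lists.

```python
def padded_seq_lens_sketch(
    seq_lens: list[int], packed_sequence_size: int, cp_size: int = 1
) -> list[int]:
    """Hypothetical helper: derive seq_lens_padded for one pack."""
    if not seq_lens:
        raise ValueError("a pack must contain at least one sequence")
    if cp_size > 1:
        multiple = 2 * cp_size
        # Round each length up to the next multiple of 2 * cp_size (ceiling division).
        padded = [-(-n // multiple) * multiple for n in seq_lens]
    else:
        padded = list(seq_lens)
    total = sum(padded)
    if total > packed_sequence_size:
        raise ValueError("padded lengths exceed packed_sequence_size")
    # Assumption: the pack-level padding is folded into the last sequence.
    padded[-1] += packed_sequence_size - total
    return padded


print(padded_seq_lens_sketch([5, 3], packed_sequence_size=16, cp_size=2))  # [8, 8]
print(padded_seq_lens_sketch([5, 3], packed_sequence_size=16))  # [5, 11]
```

In both calls `seq_lens` itself is unchanged and the padded lengths sum to `packed_sequence_size`.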

```python
nemo_automodel.components.datasets.llm.packed_sequence._should_stop_packing(
    max_packs: int,
    packs: list[nemo_automodel.components.datasets.llm.packed_sequence.PACK_TYPE]
) -> bool
```

If `max_packs` is set, stop packing once that many packs have been built.

```python
nemo_automodel.components.datasets.llm.packed_sequence._split_and_add_pack(
    current_pack: nemo_automodel.components.datasets.llm.packed_sequence.PACK_TYPE,
    packs: list[nemo_automodel.components.datasets.llm.packed_sequence.PACK_TYPE],
    previous_sample_boundary: int,
    packed_sequence_size: int,
    padding_idx: int,
    cross_entropy_ignore_idx = CROSS_ENTROPY_IGNORE_IDX,
    cp_size: int = 1
) -> nemo_automodel.components.datasets.llm.packed_sequence.PACK_TYPE
```

Splits the current pack at the boundary, processes it, adds it to `packs`, and returns the start of the next pack.

TODO(@akoumparouli): refactor.

```python
nemo_automodel.components.datasets.llm.packed_sequence._tensorize_and_pad_pack(
    pack: nemo_automodel.components.datasets.llm.packed_sequence.PACK_TYPE,
    padding_idx: int,
    packed_sequence_size: int,
    cross_entropy_ignore_idx: int = CROSS_ENTROPY_IGNORE_IDX,
    cp_size: int = 1
) -> None
```

Converts a pack to tensors, pads it, and returns it.

```python
nemo_automodel.components.datasets.llm.packed_sequence.build_block_causal_additive_mask(
    seq_lens: torch.Tensor,
    seq_length: int,
    dtype: torch.dtype,
    device: torch.device
) -> torch.Tensor
```

Build a `[B, 1, T, T]` additive block-causal mask directly on `device`.

In-document causal attention is allowed (`0`); cross-document and padding
positions are `finfo(dtype).min`. `seq_lens` is the `[B, max_docs]`
0-padded per-document length tensor; each row's non-zero entries sum to
`seq_length` (trailing pad folded into the final document).
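To make the mask layout concrete, here is a pure-Python sketch for a single batch row, without torch. It is an illustration, not the library's implementation: `additive_block_causal_mask_sketch` is a hypothetical name, it returns nested lists instead of a `[B, 1, T, T]` tensor, and the `neg` argument stands in for `finfo(dtype).min`. Zero entries in `seq_lens` (the padding in the `max_docs` dimension) are skipped.

```python
import math


def additive_block_causal_mask_sketch(
    seq_lens: list[int], seq_length: int, neg: float = -math.inf
) -> list[list[float]]:
    """Hypothetical: [T, T] additive block-causal mask for one batch row."""
    if sum(seq_lens) != seq_length:
        raise ValueError("non-zero seq_lens entries must sum to seq_length")
    # doc_id[t] is the index of the document that token t belongs to.
    doc_id: list[int] = []
    nonzero = [n for n in seq_lens if n > 0]
    for d, n in enumerate(nonzero):
        doc_id.extend([d] * n)
    # Start fully masked, then open in-document causal positions with 0.0.
    mask = [[neg] * seq_length for _ in range(seq_length)]
    for q in range(seq_length):
        for k in range(q + 1):  # keys at or before the query position
            if doc_id[k] == doc_id[q]:
                mask[q][k] = 0.0
    return mask


for row in additive_block_causal_mask_sketch([2, 1, 0], seq_length=3, neg=-1.0):
    print(row)
```

Adding such a mask to the attention logits before the softmax leaves the allowed scores unchanged and pushes the masked ones toward zero probability.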

```python
nemo_automodel.components.datasets.llm.packed_sequence.create_block_causal_mask(
    seq_lens: list[torch.Tensor]
) -> torch.Tensor
```

Creates a block causal mask for the specified lengths.

Given a batch tensor of sequence lengths that define the lengths of the samples in each pack, constructs a 2D block causal mask for each pack in the batch. For example, if a single sample's `seq_lens` is `[3, 2, 1]`, the mask is:

    mask = [
        [1, 0, 0, 0, 0, 0],
        [1, 1, 0, 0, 0, 0],
        [1, 1, 1, 0, 0, 0],
        [0, 0, 0, 1, 0, 0],
        [0, 0, 0, 1, 1, 0],
        [0, 0, 0, 0, 0, 1],
    ]

**Parameters:**

`seq_lens`: Sequence lengths of the samples in each pack in the batch, with shape `(batch_size, n)`, where `n` is the maximum number of sequences in a pack (the number of sequences can vary across packs).

**Returns:** `torch.Tensor`

Block causal mask of shape `(batch_size, packed_sequence_size, packed_sequence_size)`.
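The rule behind the example mask is that token `i` may attend to token `j` only if both belong to the same sample and `j <= i`. A pure-Python sketch for a single pack follows; the name `block_causal_mask_sketch` is hypothetical, and the real function operates on batched tensors rather than nested lists.

```python
def block_causal_mask_sketch(seq_lens: list[int]) -> list[list[int]]:
    """Hypothetical: 0/1 block causal mask for a single pack."""
    total = sum(seq_lens)
    mask = [[0] * total for _ in range(total)]
    start = 0
    for n in seq_lens:
        # Fill a lower-triangular block for the sample occupying [start, start + n).
        for i in range(n):
            for j in range(i + 1):
                mask[start + i][start + j] = 1
        start += n
    return mask


for row in block_causal_mask_sketch([3, 2, 1]):
    print(row)
```

Calling it with `[3, 2, 1]` prints the six rows of the example mask for `create_block_causal_mask`.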

```python
nemo_automodel.components.datasets.llm.packed_sequence.pack_dataset(
    dataset,
    split,
    packed_sequence_size,
    max_packs = None,
    padding_idx = 0,
    drop_long_samples = True,
    cp_size = 1
)
```

Pack the dataset to a defined length.

In particular, it iterates through the dataset, holding samples in a buffer until it reaches
`packed_sequence_size`, then appends the buffer to `packs` as a single "packed" sample.
It continues until `max_packs` packs have been built or the dataset is exhausted.

**Parameters:**

`dataset`: The dataset to pack (for example, the 'train', 'val' or 'test' split).

`split`: Which split the dataset is: 'train', 'val' or 'test'.

`packed_sequence_size`: Number of tokens in a pack.

`max_packs`: Maximum number of packs. Default: `None`.

`drop_long_samples`: If `True`, drop samples that are longer than `packed_sequence_size`. Default: `True`.

`cp_size`: Context-parallel size. When greater than 1, each sequence is padded to a length divisible by
`2 * cp_size` for context-parallel processing. Default: `1` (no CP).
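The buffering loop described for `pack_dataset` can be sketched as a greedy packer over lists of token IDs. This is a simplified, hypothetical illustration (`pack_sketch` is not part of the module): it starts a new pack whenever the next sample would overflow the buffer, instead of the split-at-boundary step performed by `_split_and_add_pack`. It also ignores labels and padding, and with `drop_long_samples=False` it simply raises, which may not match the real function.

```python
from typing import Iterable, Optional


def pack_sketch(
    samples: Iterable[list[int]],
    packed_sequence_size: int,
    max_packs: Optional[int] = None,
    drop_long_samples: bool = True,
) -> list[dict[str, list[int]]]:
    """Hypothetical greedy packer: buffer samples until the next one would not fit."""
    packs: list[dict[str, list[int]]] = []
    buf_ids: list[int] = []
    buf_lens: list[int] = []

    def flush() -> None:
        # Move the buffered samples into a new pack and reset the buffer.
        if buf_ids:
            packs.append({"input_ids": list(buf_ids), "seq_lens": list(buf_lens)})
            buf_ids.clear()
            buf_lens.clear()

    for ids in samples:
        if max_packs is not None and len(packs) >= max_packs:
            break
        if len(ids) > packed_sequence_size:
            if drop_long_samples:
                continue  # This sample can never fit in a pack.
            raise ValueError("sample longer than packed_sequence_size")
        # Close the current pack if this sample would overflow it.
        if len(buf_ids) + len(ids) > packed_sequence_size:
            flush()
            if max_packs is not None and len(packs) >= max_packs:
                break
        buf_ids.extend(ids)
        buf_lens.append(len(ids))

    if max_packs is None or len(packs) < max_packs:
        flush()  # Keep the final, partially filled pack.
    return packs


data = [[1, 2, 3], [4, 5], [6, 7, 8, 9], [10] * 10, [11]]
print(pack_sketch(data, packed_sequence_size=5))
```

Each resulting pack would still need to be padded to `packed_sequence_size` (the role `_pad_pack` plays in the module).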

```python
nemo_automodel.components.datasets.llm.packed_sequence.packed_block_causal_mask(
    seq_lens: list[torch.Tensor]
)
```

Create a 2D block causal document mask for a batch of packed sequences.

**Parameters:**

`seq_lens`: Sequence lengths of the samples in each pack in the batch, with shape `(batch_size, n)`, where `n` is the maximum number of sequences in a pack (the number of sequences can vary across packs).

**Returns:**

A `BlockMask`, or a `torch.Tensor` if the torch version is earlier than 2.5.0.

```python
nemo_automodel.components.datasets.llm.packed_sequence.CROSS_ENTROPY_IGNORE_IDX = -100
```

```python
nemo_automodel.components.datasets.llm.packed_sequence.PACK_TYPE = dict[str, torch.Tensor | list[int]]
```

```python
nemo_automodel.components.datasets.llm.packed_sequence.logger = logging.getLogger(__name__)
```