> For clean Markdown of any page, append .md to the page URL.
> For a complete documentation index, see https://docs.nvidia.com/nemo/automodel/llms.txt.
> For AI client integration (Claude Code, Cursor, etc.), connect to the MCP server at https://docs.nvidia.com/nemo/automodel/_mcp/server.

# nemo_automodel.components.datasets.llm.formatting_utils

## Module Contents

### Functions

| Name                                                                                                                                | Description                                                             |
| ----------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------- |
| [`_add_pad_token`](#nemo_automodel-components-datasets-llm-formatting_utils-_add_pad_token)                                         | Add pad token to tokenizer if not present.                              |
| [`_build_multiturn_assistant_mask`](#nemo_automodel-components-datasets-llm-formatting_utils-_build_multiturn_assistant_mask)       | Build a fallback loss mask that supervises every assistant turn.        |
| [`_build_reasoning_mask`](#nemo_automodel-components-datasets-llm-formatting_utils-_build_reasoning_mask)                           | Build a token mask for reasoning\_content spans inside assistant turns. |
| [`_find_reasoning_span`](#nemo_automodel-components-datasets-llm-formatting_utils-_find_reasoning_span)                             | Locate the contiguous token span attributable to reasoning content.     |
| [`_get_right_trailing_pad_mask`](#nemo_automodel-components-datasets-llm-formatting_utils-_get_right_trailing_pad_mask)             | Boolean mask identifying right-trailing padding positions.              |
| [`_has_chat_template`](#nemo_automodel-components-datasets-llm-formatting_utils-_has_chat_template)                                 | Check if the tokenizer supports a chat template.                        |
| [`_mask_labels_to_last_turn`](#nemo_automodel-components-datasets-llm-formatting_utils-_mask_labels_to_last_turn)                   | Restrict supervision to the final assistant turn (`mask_history`).      |
| [`_masked_reasoning_message`](#nemo_automodel-components-datasets-llm-formatting_utils-_masked_reasoning_message)                   | Return a copy of a message with reasoning\_content removed.             |
| [`_maybe_shift_mask_for_left_padding`](#nemo_automodel-components-datasets-llm-formatting_utils-_maybe_shift_mask_for_left_padding) | Shift a token-level mask right when the tokenizer uses left padding.    |
| [`_package_tokenized_example`](#nemo_automodel-components-datasets-llm-formatting_utils-_package_tokenized_example)                 | Package a tokenized example with proper masking and padding.            |
| [`_pad_to_seq_length`](#nemo_automodel-components-datasets-llm-formatting_utils-_pad_to_seq_length)                                 | Pad a sample to a specific sequence length.                             |
| [`_resolve_chat_template`](#nemo_automodel-components-datasets-llm-formatting_utils-_resolve_chat_template)                         | Resolve a chat template string that may be a file path.                 |
| [`_tokenize_chat`](#nemo_automodel-components-datasets-llm-formatting_utils-_tokenize_chat)                                         | Tokenize chat messages without padding and return input ids.            |
| [`_tokenized_chat_length`](#nemo_automodel-components-datasets-llm-formatting_utils-_tokenized_chat_length)                         | Return the tokenized chat length for a message prefix without padding.  |
| [`format_chat_template`](#nemo_automodel-components-datasets-llm-formatting_utils-format_chat_template)                             | Format a chat template style example.                                   |
| [`format_prompt_completion`](#nemo_automodel-components-datasets-llm-formatting_utils-format_prompt_completion)                     | Format a prompt-completion style example (without chat template).       |

### Data

[`GENERATION_REGEX`](#nemo_automodel-components-datasets-llm-formatting_utils-GENERATION_REGEX)

[`_warned_add_pad_token`](#nemo_automodel-components-datasets-llm-formatting_utils-_warned_add_pad_token)

[`logger`](#nemo_automodel-components-datasets-llm-formatting_utils-logger)

### API

```python
nemo_automodel.components.datasets.llm.formatting_utils._add_pad_token(
    tokenizer
)
```

Add pad token to tokenizer if not present.

```python
nemo_automodel.components.datasets.llm.formatting_utils._build_multiturn_assistant_mask(
    tokenizer: transformers.PreTrainedTokenizer,
    formatted_text: typing.List[typing.Dict[str, typing.Any]],
    input_ids: typing.List[int],
    tools: typing.Optional[typing.List[typing.Dict]] = None,
    truncation: typing.Union[str, bool] = 'do_not_truncate',
    seq_length: typing.Optional[int] = None,
    full_length: typing.Optional[int] = None
) -> typing.List[int]
```

Build a fallback loss mask that supervises every assistant turn.

Each assistant span is located by tokenizing the conversation prefixes
before and after the turn, which is O(turns) `apply_chat_template` calls.
Two reductions keep that from re-doing work:

* `full_length` is the caller's already-known unpadded token count for the
  whole conversation (`sum(attention_mask)`). When the dialogue ends on an
  assistant turn its closing boundary is the full conversation, so passing
  `full_length` skips re-tokenizing the entire prefix — the single most
  expensive call in the loop.
* Prefix lengths are memoized so a boundary shared by adjacent turns (a
  turn's end and the next turn's start) is tokenized at most once.

Both are exact: `full_length` and the memoized values equal what
:func:`_tokenized_chat_length` would return, so the mask is unchanged.

```python
nemo_automodel.components.datasets.llm.formatting_utils._build_reasoning_mask(
    tokenizer: transformers.PreTrainedTokenizer,
    formatted_text: typing.List[typing.Dict[str, typing.Any]],
    input_ids: typing.List[int],
    tools: typing.Optional[typing.List[typing.Dict]] = None,
    truncation: typing.Union[str, bool] = 'do_not_truncate',
    seq_length: typing.Optional[int] = None
) -> typing.List[int]
```

Build a token mask for reasoning\_content spans inside assistant turns.

```python
nemo_automodel.components.datasets.llm.formatting_utils._find_reasoning_span(
    full_segment: typing.List[int],
    masked_segment: typing.List[int]
) -> typing.Optional[tuple[int, int]]
```

Locate the contiguous token span attributable to reasoning content.

```python
nemo_automodel.components.datasets.llm.formatting_utils._get_right_trailing_pad_mask(
    sequence: torch.Tensor,
    pad_token_id: int,
    eos_token_id: int
) -> torch.Tensor
```

Boolean mask identifying right-trailing padding positions.

When *pad\_token\_id != eos\_token\_id*, it is simply `sequence == pad_token_id`.

When the two IDs collide, a plain equality check would also match real EOS
tokens inside the content.  In that case the function locates the trailing
contiguous run of the shared token and treats all positions **after the
first one** in that run as padding.  The first token in the trailing run is
the real EOS and is kept unmasked so the model still learns to predict
end-of-sequence.

**Parameters:**

1-D token id tensor.

The token id used for padding.

The token id used for end-of-sequence.  When equal to
*pad\_token\_id* the positional trailing-run logic is used.

**Returns:** `torch.Tensor`

Boolean tensor (same shape as *sequence*) where `True` = padding.

```python
nemo_automodel.components.datasets.llm.formatting_utils._has_chat_template(
    tokenizer: transformers.PreTrainedTokenizer
) -> bool
```

Check if the tokenizer supports a chat template.

**Parameters:**

The tokenizer to check.

**Returns:** `bool`

True if the tokenizer supports a chat template, False otherwise.

```python
nemo_automodel.components.datasets.llm.formatting_utils._mask_labels_to_last_turn(
    mask: typing.List[int],
    ignore_index: int = -100
) -> typing.List[int]
```

Restrict supervision to the final assistant turn (`mask_history`).

Operates on any per-token sequence where `ignore_index` marks
unsupervised positions: a label list (`ignore_index=-100`) or a 0/1
assistant mask (`ignore_index=0`). Each assistant turn renders as a
single contiguous supervised span, so this keeps only the last such run
and rewrites every earlier supervised position to `ignore_index`.

Apply this to the assistant mask **before** any reasoning\_content holes are
punched into it; running it on already-holed labels would treat the
reasoning gap as a turn boundary and drop in-turn content before the hole.

**Parameters:**

per-token labels or 0/1 mask (`ignore_index` marks unsupervised).

the value marking unsupervised positions.

**Returns:** `List[int]`

The same list, mutated so only the final supervised run is kept.

```python
nemo_automodel.components.datasets.llm.formatting_utils._masked_reasoning_message(
    message: typing.Dict[str, typing.Any]
) -> typing.Dict[str, typing.Any]
```

Return a copy of a message with reasoning\_content removed.

```python
nemo_automodel.components.datasets.llm.formatting_utils._maybe_shift_mask_for_left_padding(
    mask: typing.List[int],
    tokenizer: transformers.PreTrainedTokenizer,
    attention_mask: typing.Optional[typing.List[int]]
) -> typing.List[int]
```

Shift a token-level mask right when the tokenizer uses left padding.

`_build_multiturn_assistant_mask` and `_build_reasoning_mask` compute
span indices from **unpadded** (left-aligned) tokenizations.  When the
tokenizer pads on the left, actual content is right-aligned in
`input_ids`, so the mask must be shifted right by the padding offset to
keep positions aligned.

For right-padding tokenizers (the majority) this is a no-op.

```python
nemo_automodel.components.datasets.llm.formatting_utils._package_tokenized_example(
    tokenizer,
    input_ids,
    assistant_masks,
    eos_token_id,
    pad_token_id,
    seq_length,
    truncation = 'do_not_truncate',
    padding = 'do_not_pad',
    unshifted = False
)
```

Package a tokenized example with proper masking and padding.

Returns:
A dictionary with input\_ids, labels, and attention\_mask.
When *unshifted* is True, `labels` is replaced by `loss_mask`.

**Parameters:**

The tokenizer to use.

The tokenized input ids.

Boolean mask indicating which tokens are assistant/answer tokens (1) vs prompt tokens (0).

The end-of-sequence token id.

The padding token id.

Optional sequence length for padding.

Optional truncation strategy.

Optional padding strategy.

If True, return unshifted format for dLLM training
(`input_ids` at full length with `loss_mask` instead of
shifted `input_ids`/`labels`).

```python
nemo_automodel.components.datasets.llm.formatting_utils._pad_to_seq_length(
    sample,
    pad_token_id,
    seq_length
)
```

Pad a sample to a specific sequence length.

```python
nemo_automodel.components.datasets.llm.formatting_utils._resolve_chat_template(
    chat_template: typing.Optional[str]
) -> typing.Optional[str]
```

Resolve a chat template string that may be a file path.

If *chat\_template* points to an existing file, its contents are returned.
If opening it as a file fails and the string contains Jinja-like characters
(`&#123;`, `&#125;`, or newlines) it is treated as a literal template.  Otherwise
a :class:`ValueError` is raised so the caller knows the path was invalid.

**Parameters:**

A Jinja template string or path to a template file.

**Returns:** `Optional[str]`

The resolved template string, or `None` when the input is `None`.

```python
nemo_automodel.components.datasets.llm.formatting_utils._tokenize_chat(
    tokenizer: transformers.PreTrainedTokenizer,
    messages: typing.List[typing.Dict[str, typing.Any]],
    tools: typing.Optional[typing.List[typing.Dict]] = None,
    truncation: typing.Union[str, bool] = 'do_not_truncate',
    seq_length: typing.Optional[int] = None
) -> typing.List[int]
```

Tokenize chat messages without padding and return input ids.

```python
nemo_automodel.components.datasets.llm.formatting_utils._tokenized_chat_length(
    tokenizer: transformers.PreTrainedTokenizer,
    messages: typing.List[typing.Dict[str, str]],
    tools: typing.Optional[typing.List[typing.Dict]] = None,
    truncation: typing.Union[str, bool] = 'do_not_truncate',
    seq_length: typing.Optional[int] = None
) -> int
```

Return the tokenized chat length for a message prefix without padding.

```python
nemo_automodel.components.datasets.llm.formatting_utils.format_chat_template(
    tokenizer: transformers.PreTrainedTokenizer,
    formatted_text: typing.List[typing.Dict[str, typing.Any]],
    eos_token_id: int,
    pad_token_id: int,
    seq_length: typing.Optional[int] = None,
    padding: typing.Union[str, bool] = 'do_not_pad',
    truncation: typing.Union[str, bool] = 'do_not_truncate',
    tools: typing.Optional[typing.List[typing.Dict]] = None,
    answer_only_loss_mask: bool = True,
    mask_reasoning_content: bool = False,
    train_on_last_turn_only: bool = False,
    unshifted: bool = False
) -> typing.Dict[str, typing.List[int]]
```

Format a chat template style example.

**Parameters:**

The tokenizer to use.

The formatted text, with role tags embedded in the content.

The end-of-sequence token id.

The padding token id.

Optional sequence length for padding.

Optional list of tool definitions for function calling.

Whether to compute the loss mask only on the answer tokens.

Whether to exclude rendered reasoning\_content tokens from loss.

Whether to supervise only the final assistant turn,
masking every earlier assistant turn (`mask_history`). Applied to the
assistant mask before reasoning\_content is masked out.

**Returns:** `Dict[str, List[int]]`

A dictionary with the formatted example.

```python
nemo_automodel.components.datasets.llm.formatting_utils.format_prompt_completion(
    tokenizer: transformers.PreTrainedTokenizer,
    prompt: str,
    answer: str,
    eos_token_id: int,
    pad_token_id: int,
    seq_length: typing.Optional[int] = None,
    padding: typing.Union[str, bool] = 'do_not_pad',
    truncation: typing.Union[str, bool] = 'do_not_truncate',
    answer_only_loss_mask: bool = True,
    unshifted: bool = False
) -> typing.Dict[str, typing.List[int]]
```

Format a prompt-completion style example (without chat template).

**Parameters:**

The tokenizer to use.

The prompt string (e.g. context + question).

The answer string.

The end-of-sequence token id.

The padding token id.

Optional sequence length for padding.

**Returns:** `Dict[str, List[int]]`

A dictionary with the formatted example.

```python
nemo_automodel.components.datasets.llm.formatting_utils.GENERATION_REGEX = re.compile('\\{%-?\\s+generation\\s+-?%\\}')
```

```python
nemo_automodel.components.datasets.llm.formatting_utils._warned_add_pad_token = set()
```

```python
nemo_automodel.components.datasets.llm.formatting_utils.logger = logging.getLogger(__name__)
```