
# nemo_automodel.components.datasets.llm.agent_chat

Multi-turn agent SFT dataset adapter.

Loads function-calling chat datasets where each example contains tool
definitions and a multi-turn message list that includes tool calls and
tool responses, then renders them through the tokenizer's chat template
with `answer_only_loss_mask=True` so that `user` and `tool` tokens
are excluded from the loss.

Two input schemas are accepted:

1. Swift / chatml `messages` schema (used by
   `AI-ModelScope/function-calling-chatml`):

   ```
   {
     "tools": "[{...openai tool schema...}]",
     "messages": [
       {"role": "user",          "content": "..."},
       {"role": "tool_call",     "content": "{\"name\": ..., \"arguments\": ...}"},
       {"role": "tool_response", "content": "..."},
       {"role": "assistant",     "content": "..."}
     ]
   }
   ```

2. ShareGPT `conversations` schema (used by
   `llamafactory/glaive_toolcall_en` and similar):

   ```
   {
     "tools": "[...]",
     "conversations": [
       {"from": "human",         "value": "..."},
       {"from": "function_call", "value": "..."},
       {"from": "observation",   "value": "..."},
       {"from": "gpt",           "value": "..."}
     ]
   }
   ```

Consecutive `tool_call` entries are merged into a single assistant
message with parallel `tool_calls`; the following `tool_response`
entries are paired with those calls in order.

## Module Contents

### Functions

| Name                                                                                                                          | Description                                                                             |
| ----------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------- |
| [`_convert_messages`](#nemo_automodel-components-datasets-llm-agent_chat-_convert_messages)                                   | Convert chatml-style agent messages to OpenAI chat-completions format.                  |
| [`_extract_eval_samples_from_example`](#nemo_automodel-components-datasets-llm-agent_chat-_extract_eval_samples_from_example) | Extract one eval sample per assistant tool-call position.                               |
| [`_format_example`](#nemo_automodel-components-datasets-llm-agent_chat-_format_example)                                       | Render one agent example into tokenized `input_ids` / `labels`, re-raising failures as a `ValueError` tagged with the example id. |
| [`_format_example_impl`](#nemo_automodel-components-datasets-llm-agent_chat-_format_example_impl)                             | Render one agent example into tokenized `input_ids` / `labels` (without the error tagging of `_format_example`). |
| [`_json_load_if_str`](#nemo_automodel-components-datasets-llm-agent_chat-_json_load_if_str)                                   | Parse `value` as JSON if it is a string, otherwise return as-is.                        |
| [`_reasoning_content`](#nemo_automodel-components-datasets-llm-agent_chat-_reasoning_content)                                 | Return a turn's reasoning/thinking trace as a string, or `None` if absent.              |
| [`_sharegpt_to_chatml`](#nemo_automodel-components-datasets-llm-agent_chat-_sharegpt_to_chatml)                               | Convert ShareGPT `{from, value}` turns to chatml `{role, content}`.                     |
| [`_truncate_messages_to_fit`](#nemo_automodel-components-datasets-llm-agent_chat-_truncate_messages_to_fit)                   | Drop the oldest conversation exchanges until the dialogue fits `seq_length`.            |
| [`make_agent_chat_dataset`](#nemo_automodel-components-datasets-llm-agent_chat-make_agent_chat_dataset)                       | Load a multi-turn function-calling SFT dataset.                                         |
| [`make_agent_chat_eval_samples`](#nemo_automodel-components-datasets-llm-agent_chat-make_agent_chat_eval_samples)             | Build a flat list of tool-call eval samples from an agent SFT dataset.                  |

### Data

[`_SHAREGPT_ROLE_MAP`](#nemo_automodel-components-datasets-llm-agent_chat-_SHAREGPT_ROLE_MAP)

[`_VALID_MESSAGE_ROLES`](#nemo_automodel-components-datasets-llm-agent_chat-_VALID_MESSAGE_ROLES)

[`logger`](#nemo_automodel-components-datasets-llm-agent_chat-logger)

### API

```python
nemo_automodel.components.datasets.llm.agent_chat._convert_messages(
    messages: typing.List[typing.Dict[str, typing.Any]],
    example_id: typing.Optional[typing.Union[int, str]] = None,
    drop_history_reasoning_content: bool = False
) -> typing.List[typing.Dict[str, typing.Any]]
```

Convert chatml-style agent messages to OpenAI chat-completions format.

Consecutive `tool_call` entries collapse into one assistant message
with parallel `tool_calls`. When the preceding emitted turn is an
assistant message without `tool_calls` (e.g. an assistant `content`
that reasons before calling tools), the `tool_calls` attach to that
message instead of creating a second consecutive assistant turn.
This preserves the natural single-turn shape the model produces at
inference. `tool_response` (or `tool`) entries that follow are
paired with those `tool_call` ids in order.
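The merging and pairing rule above can be sketched in plain Python. This is a simplified illustration, not the module's implementation: the function name `merge_tool_calls`, the `call_<example_id>_<n>` id scheme, and the empty-string `content` on synthesized assistant turns are all assumptions, and reasoning fields are ignored.

```python
import json
from typing import Any, Dict, List, Optional


def merge_tool_calls(messages: List[Dict[str, Any]],
                     example_id: Optional[str] = None) -> List[Dict[str, Any]]:
    """Simplified sketch of the tool_call merging/pairing rule (hypothetical)."""
    out: List[Dict[str, Any]] = []
    pending_ids: List[str] = []  # ids of the current batch, awaiting responses
    counter = 0
    for turn in messages:
        role, content = turn["role"], turn["content"]
        if role == "tool_call":
            call = json.loads(content) if isinstance(content, str) else content
            call_id = f"call_{example_id}_{counter}"  # assumed id scheme
            counter += 1
            args = call.get("arguments", {})
            tool_call = {
                "id": call_id,
                "type": "function",
                "function": {
                    "name": call["name"],
                    # OpenAI format carries arguments as a JSON string.
                    "arguments": args if isinstance(args, str) else json.dumps(args),
                },
            }
            prev = out[-1] if out else None
            if prev is None or prev["role"] != "assistant":
                # No assistant turn to attach to: synthesize one.
                prev = {"role": "assistant", "content": ""}
                out.append(prev)
            if not prev.get("tool_calls"):
                pending_ids = []  # a new batch of parallel calls starts here
            prev.setdefault("tool_calls", []).append(tool_call)
            pending_ids.append(call_id)
        elif role in ("tool_response", "tool"):
            # Pair responses with the batch's call ids in order.
            call_id = pending_ids.pop(0) if pending_ids else None
            out.append({"role": "tool", "tool_call_id": call_id, "content": content})
        else:
            out.append({"role": role, "content": content})
    return out


example = [
    {"role": "user", "content": "Weather in Paris and Rome?"},
    {"role": "tool_call", "content": '{"name": "get_weather", "arguments": {"city": "Paris"}}'},
    {"role": "tool_call", "content": '{"name": "get_weather", "arguments": {"city": "Rome"}}'},
    {"role": "tool_response", "content": "18C"},
    {"role": "tool_response", "content": "22C"},
    {"role": "assistant", "content": "Paris is 18C, Rome is 22C."},
]
converted = merge_tool_calls(example, example_id="7")
print([m["role"] for m in converted])  # ['user', 'assistant', 'tool', 'tool', 'assistant']
```

The two consecutive `tool_call` rows become one assistant turn with two parallel calls, and the two responses are matched to `call_7_0` and `call_7_1` by position.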

**Parameters:**

`messages`: Chatml-style turns with roles in `_VALID_MESSAGE_ROLES`.

`example_id`: Optional identifier used to derive unique `tool_call` ids.

`drop_history_reasoning_content`: If True, strip `reasoning_content`
from every assistant turn except the final one, so historical
thinking traces are not rendered into the prompt. This matches
inference, where the model never sees its own prior-turn thinking.

**Returns:** `List[Dict[str, Any]]`

A list of OpenAI-format messages suitable for `apply_chat_template`.
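A minimal sketch of the `drop_history_reasoning_content` behavior described above, assuming messages are plain dicts with reasoning stored under a `reasoning_content` key; the helper name `drop_history_reasoning` is hypothetical.

```python
from typing import Any, Dict, List


def drop_history_reasoning(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: keep `reasoning_content` only on the final assistant turn."""
    assistant_positions = [i for i, m in enumerate(messages) if m.get("role") == "assistant"]
    last = assistant_positions[-1] if assistant_positions else -1
    out = []
    for i, m in enumerate(messages):
        turn = dict(m)  # shallow copy so the caller's messages are not mutated
        if turn.get("role") == "assistant" and i != last:
            turn.pop("reasoning_content", None)
        out.append(turn)
    return out


history = [
    {"role": "user", "content": "Find flights"},
    {"role": "assistant", "content": "", "reasoning_content": "search first",
     "tool_calls": [{"id": "c0", "type": "function",
                     "function": {"name": "search", "arguments": "{}"}}]},
    {"role": "tool", "tool_call_id": "c0", "content": "2 flights"},
    {"role": "assistant", "content": "Two options.", "reasoning_content": "summarize"},
]
cleaned = drop_history_reasoning(history)
print(["reasoning_content" in m for m in cleaned])  # [False, False, False, True]
```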

```python
nemo_automodel.components.datasets.llm.agent_chat._extract_eval_samples_from_example(
    example: typing.Dict[str, typing.Any]
) -> typing.List[typing.Dict[str, typing.Any]]
```

Extract one eval sample per assistant tool-call position.

For each assistant turn in `example` that issues `tool_calls`, emit
a sample whose `prompt_messages` are all messages strictly before
that turn and whose `gt_tool_calls` are the `tool_calls` from that
turn. This lets the evaluator measure tool-call accuracy at every
position where the model is expected to act, not just the first one.

Tool-call arguments are normalized from JSON-encoded strings (as
produced by `_convert_messages`) back to dicts so callers can
compare them against parser output directly.
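The extraction can be pictured with the following sketch, which operates on already-converted OpenAI-format messages. The name `extract_eval_samples` and the `{"name", "arguments"}` shape of each `gt_tool_calls` entry are assumptions for illustration; the real helper starts from a raw dataset row.

```python
import json
from typing import Any, Dict, List


def extract_eval_samples(messages: List[Dict[str, Any]],
                         tools: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical sketch: one sample per assistant turn that issues tool_calls."""
    samples = []
    for i, msg in enumerate(messages):
        calls = msg.get("tool_calls") if msg.get("role") == "assistant" else None
        if not calls:
            continue
        gt = []
        for call in calls:
            fn = call["function"]
            args = fn.get("arguments", {})
            if isinstance(args, str):
                args = json.loads(args)  # JSON string -> dict for direct comparison
            gt.append({"name": fn["name"], "arguments": args})
        samples.append({
            "prompt_messages": messages[:i],  # everything strictly before this turn
            "tools": tools,
            "gt_tool_calls": gt,
        })
    return samples


dialogue = [
    {"role": "user", "content": "Book a sushi place"},
    {"role": "assistant", "content": "", "tool_calls": [
        {"id": "c0", "type": "function",
         "function": {"name": "search", "arguments": '{"q": "sushi"}'}}]},
    {"role": "tool", "tool_call_id": "c0", "content": "3 results"},
    {"role": "assistant", "content": "", "tool_calls": [
        {"id": "c1", "type": "function",
         "function": {"name": "book", "arguments": '{"id": 2}'}}]},
    {"role": "tool", "tool_call_id": "c1", "content": "ok"},
    {"role": "assistant", "content": "Booked."},
]
samples = extract_eval_samples(dialogue, tools=[])
print(len(samples))  # 2
```

Both tool-call positions become samples; the final plain-text answer does not, because it issues no `tool_calls`.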

**Parameters:**

`example`: A raw row from the agent SFT dataset (chatml `messages`
or ShareGPT `conversations` schema, with a `tools` field).

**Returns:** `List[Dict[str, Any]]`

A list of eval samples, each with keys `prompt_messages`,

```python
nemo_automodel.components.datasets.llm.agent_chat._format_example(
    example: typing.Dict[str, typing.Any],
    tokenizer,
    eos_token_id: int,
    pad_token_id: int,
    seq_length: typing.Optional[int] = None,
    padding: typing.Union[str, bool] = False,
    truncation: typing.Union[str, bool] = False,
    mask_reasoning_content: bool = False,
    train_on_last_turn_only: bool = False,
    drop_history_reasoning_content: bool = False,
    truncate_history: bool = False
) -> typing.Dict[str, typing.List[int]]
```

Render one agent example into tokenized `input_ids` / `labels`.

Thin wrapper that re-raises any parsing or rendering failure as a `ValueError`
tagged with the example id. Rows are rendered lazily inside the dataloader,
so without this wrapper a single malformed row would surface as an opaque
`JSONDecodeError` or `AssertionError` deep in the stack, give no hint as to
which row caused it, and abort the whole training run.
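The error-tagging pattern can be sketched as follows. The wrapper name `with_example_context` and reading the row id from an `"id"` key are assumptions; the point is the `raise ... from exc` chaining, which keeps the original exception attached as `__cause__` while naming the offending row.

```python
import json
from typing import Any, Callable, Dict


def with_example_context(render: Callable[[Dict[str, Any]], Any],
                         example: Dict[str, Any]) -> Any:
    """Hypothetical wrapper: re-raise any rendering failure as a ValueError
    that names the offending row, chaining the original exception."""
    try:
        return render(example)
    except Exception as exc:
        example_id = example.get("id", "<unknown>")  # assumed id key
        raise ValueError(
            f"Failed to render agent example id={example_id!r}: "
            f"{type(exc).__name__}: {exc}"
        ) from exc


def render_tools(example: Dict[str, Any]) -> Any:
    # Stand-in for real rendering: just decode the JSON `tools` string.
    return json.loads(example["tools"])


bad_row = {"id": 42, "tools": "[{not json"}
try:
    with_example_context(render_tools, bad_row)
except ValueError as err:
    print(err)
```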

```python
nemo_automodel.components.datasets.llm.agent_chat._format_example_impl(
    example: typing.Dict[str, typing.Any],
    tokenizer,
    eos_token_id: int,
    pad_token_id: int,
    seq_length: typing.Optional[int] = None,
    padding: typing.Union[str, bool] = False,
    truncation: typing.Union[str, bool] = False,
    mask_reasoning_content: bool = False,
    train_on_last_turn_only: bool = False,
    drop_history_reasoning_content: bool = False,
    truncate_history: bool = False
) -> typing.Dict[str, typing.List[int]]
```

Render one agent example into tokenized `input_ids` / `labels`. Unlike
`_format_example`, failures are not re-raised with the example id attached.

```python
nemo_automodel.components.datasets.llm.agent_chat._json_load_if_str(
    value: typing.Any
) -> typing.Any
```

Parse `value` as JSON if it is a string, otherwise return as-is.
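A sketch of the documented behavior. This matters because `tools` may arrive as a JSON-encoded string (as in the schema examples at the top of this page) or already parsed. How the real helper treats empty or malformed strings is not documented here; this version simply lets `json.JSONDecodeError` propagate.

```python
import json
from typing import Any


def json_load_if_str(value: Any) -> Any:
    """Sketch: decode strings as JSON, pass every other value through unchanged."""
    if isinstance(value, str):
        return json.loads(value)
    return value


print(json_load_if_str('[{"type": "function"}]'))  # [{'type': 'function'}]
already_parsed = [{"type": "function"}]
print(json_load_if_str(already_parsed) is already_parsed)  # True
```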

```python
nemo_automodel.components.datasets.llm.agent_chat._reasoning_content(
    turn: typing.Dict[str, typing.Any]
) -> typing.Optional[str]
```

Return a turn's reasoning/thinking trace as a string, or `None` if absent.

Shared by every conversion path so the `reasoning_content` field is read
and coerced identically (a falsy/empty trace is treated as absent).
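A sketch of the read-and-coerce rule, assuming the trace lives under the `reasoning_content` key and that non-string values are coerced with `str()`; both details are assumptions.

```python
from typing import Any, Dict, Optional


def reasoning_content(turn: Dict[str, Any]) -> Optional[str]:
    """Sketch: return the trace as a string, treating missing/falsy values as absent."""
    value = turn.get("reasoning_content")
    if not value:  # None, "", and other falsy values all mean "no trace"
        return None
    return value if isinstance(value, str) else str(value)


print(reasoning_content({"role": "assistant", "reasoning_content": ""}))  # None
print(reasoning_content({"role": "assistant", "reasoning_content": "check the docs"}))  # check the docs
```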

```python
nemo_automodel.components.datasets.llm.agent_chat._sharegpt_to_chatml(
    conversations: typing.List[typing.Dict[str, typing.Any]]
) -> typing.List[typing.Dict[str, typing.Any]]
```

Convert ShareGPT `{from, value}` turns to chatml `{role, content}`.
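A minimal sketch of the conversion. The role table is an assumption assembled from the visible prefix of `_SHAREGPT_ROLE_MAP` and the `function_call` / `observation` roles in the ShareGPT schema example; the real table is truncated in this reference, and raising on unknown roles is also assumed.

```python
from typing import Any, Dict, List

# Assumed mapping: visible prefix of _SHAREGPT_ROLE_MAP plus the
# function_call / observation roles from the ShareGPT schema example.
SHAREGPT_ROLE_MAP = {
    "system": "system",
    "human": "user",
    "user": "user",
    "gpt": "assistant",
    "assistant": "assistant",
    "function_call": "tool_call",
    "observation": "tool_response",
}


def sharegpt_to_chatml(conversations: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical sketch: rename {from, value} turns to {role, content}."""
    out = []
    for turn in conversations:
        src = turn["from"]
        if src not in SHAREGPT_ROLE_MAP:
            raise ValueError(f"unknown ShareGPT role: {src!r}")
        out.append({"role": SHAREGPT_ROLE_MAP[src], "content": turn["value"]})
    return out


convs = [
    {"from": "human", "value": "What's 2+2?"},
    {"from": "function_call", "value": '{"name": "calc", "arguments": {"expr": "2+2"}}'},
    {"from": "observation", "value": "4"},
    {"from": "gpt", "value": "It is 4."},
]
print([t["role"] for t in sharegpt_to_chatml(convs)])
# ['user', 'tool_call', 'tool_response', 'assistant']
```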

```python
nemo_automodel.components.datasets.llm.agent_chat._truncate_messages_to_fit(
    tokenizer,
    messages: typing.List[typing.Dict[str, typing.Any]],
    tools: typing.Optional[typing.List[typing.Dict[str, typing.Any]]],
    seq_length: int
) -> typing.List[typing.Dict[str, typing.Any]]
```

Drop the oldest conversation exchanges until the dialogue fits `seq_length`.

Unlike token-level `truncation`, which clips from the tokenizer's
`truncation_side` and therefore typically drops the final assistant answer
(the supervised target), this helper drops whole oldest *exchanges* while
keeping any leading `system` message, the tool definitions, and the final
exchange (the user turn that produces the last assistant answer).
Tool-call/response pairing among the surviving messages is preserved because
exchanges are cut only at `user` boundaries.
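The exchange-dropping strategy can be sketched with a pluggable token counter standing in for the tokenizer's chat-template rendering. `truncate_to_fit` and `count_tokens` are hypothetical names, and the behavior when even the final exchange is over budget (returning it anyway) is an assumption.

```python
from typing import Any, Callable, Dict, List, Optional

Message = Dict[str, Any]
Tools = Optional[List[Dict[str, Any]]]


def truncate_to_fit(messages: List[Message], tools: Tools, seq_length: int,
                    count_tokens: Callable[[List[Message], Tools], int]) -> List[Message]:
    """Hypothetical sketch: drop whole oldest exchanges, cutting only at `user` turns."""
    if count_tokens(messages, tools) <= seq_length:
        return messages
    # Keep a leading system message no matter what.
    head = messages[:1] if messages and messages[0].get("role") == "system" else []
    body = messages[len(head):]
    starts = [i for i, m in enumerate(body) if m.get("role") == "user"]
    if not starts:
        return messages  # no user boundary to cut at
    # Try each exchange boundary, oldest first; the last one is the final exchange.
    for start in starts:
        candidate = head + body[start:]
        if count_tokens(candidate, tools) <= seq_length:
            return candidate
    # Assumed fallback: even the final exchange is too long; keep it anyway.
    return head + body[starts[-1]:]


def count_words(msgs: List[Message], tools: Tools) -> int:
    # Toy "tokenizer": one token per whitespace-separated word of content.
    return sum(len(str(m.get("content", "")).split()) for m in msgs)


dialogue = [
    {"role": "system", "content": "be brief"},
    {"role": "user", "content": "first question here"},
    {"role": "assistant", "content": "first answer"},
    {"role": "user", "content": "second question"},
    {"role": "assistant", "content": "second answer"},
]
print([m["content"] for m in truncate_to_fit(dialogue, None, 6, count_words)])
# ['be brief', 'second question', 'second answer']
```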

**Parameters:**

`tokenizer`: Tokenizer with a chat template.

`messages`: OpenAI-format messages from `_convert_messages`.

`tools`: Tool definitions rendered alongside the messages.

`seq_length`: The token budget the rendered dialogue must fit.

**Returns:** `List[Dict[str, Any]]`

`messages` unchanged when it already fits or has no `user` boundary

```python
nemo_automodel.components.datasets.llm.agent_chat.make_agent_chat_dataset(
    tokenizer,
    dataset_name: typing.Optional[str] = None,
    path: typing.Optional[typing.Union[str, typing.List[str]]] = None,
    split: str = 'train',
    seq_length: typing.Optional[int] = None,
    limit_dataset_samples: typing.Optional[int] = None,
    padding: typing.Union[str, bool] = False,
    truncation: typing.Union[str, bool] = False,
    mask_reasoning_content: bool = False,
    train_on_last_turn_only: bool = False,
    drop_history_reasoning_content: bool = False,
    truncate_history: bool = False
) -> nemo_automodel.components.datasets.lazy_mapped_dataset.LazyMappedDataset
```

Load a multi-turn function-calling SFT dataset.

Exactly one of `dataset_name` (HuggingFace Hub id) or `path` (local
JSON/JSONL file or list of files) must be provided. The loaded examples
are lazily rendered through the tokenizer's chat template; tool/user
tokens are excluded from the loss via `answer_only_loss_mask=True`.
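The "exactly one of" rule amounts to the check below; `check_source` is a hypothetical helper, and the real function may word its error differently.

```python
from typing import List, Optional, Union


def check_source(dataset_name: Optional[str],
                 path: Optional[Union[str, List[str]]]) -> None:
    """Hypothetical sketch: require exactly one of a Hub id or local path(s)."""
    if (dataset_name is None) == (path is None):  # both set, or neither set
        raise ValueError("Provide exactly one of `dataset_name` or `path`.")


check_source("llamafactory/glaive_toolcall_en", None)  # OK: Hub dataset
check_source(None, ["train_a.jsonl", "train_b.jsonl"])  # OK: local files
try:
    check_source(None, None)
except ValueError as err:
    print(err)  # Provide exactly one of `dataset_name` or `path`.
```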

**Parameters:**

`tokenizer`: HuggingFace tokenizer with a chat template.

`dataset_name`: HF Hub dataset id, e.g. `llamafactory/glaive_toolcall_en`.

`path`: Local JSON/JSONL file path or list of paths.

`split`: Dataset split (only used with `dataset_name`).

`seq_length`: Optional maximum sequence length for the tokenizer.

`limit_dataset_samples`: If set, keep only the first N examples.

`padding`: Padding strategy forwarded to the tokenizer.

`truncation`: Truncation strategy forwarded to the tokenizer.

`mask_reasoning_content`: If True, exclude assistant `reasoning_content`
(thinking) tokens from the loss while still rendering them into the
prompt. Requires a chat template that emits `reasoning_content`.
Defaults to False, which trains on reasoning tokens like any other
assistant content.

`train_on_last_turn_only`: If True, supervise only the final assistant
turn of each dialogue (`mask_history`); all earlier assistant
turns are excluded from the loss. Defaults to False, which
supervises every assistant turn.

`drop_history_reasoning_content`: If True, strip `reasoning_content`
from all but the final assistant turn so historical thinking is not
rendered into the prompt (matching inference, where prior-turn
thinking is not visible). This is orthogonal to `mask_reasoning_content`:
this flag controls whether history thinking appears in the prompt at all,
while `mask_reasoning_content` controls whether rendered thinking
contributes to the loss. Defaults to False, which keeps every turn's
`reasoning_content`.

`truncate_history`: If True and `seq_length` is set, drop the oldest
conversation exchanges (keeping any leading system message, the tool
definitions, and the final exchange) until the dialogue fits
`seq_length`. Unlike token-level `truncation`, which clips from
the tokenizer's truncation side and usually drops the final assistant
answer, this preserves the supervised target. Defaults to False.
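To show how `mask_reasoning_content` and `train_on_last_turn_only` combine with answer-only masking, here is a sketch over pre-rendered segments. The `Segment` type, `build_labels`, and the `-100` ignore label (PyTorch's default `ignore_index` for cross-entropy) are assumptions; the module itself obtains its mask by rendering through the chat template with `answer_only_loss_mask=True`.

```python
from dataclasses import dataclass
from typing import List, Optional

IGNORE_INDEX = -100  # assumption: label value skipped by the loss


@dataclass
class Segment:
    """Hypothetical rendered span: the role that produced it and its token ids."""
    role: str
    tokens: List[int]
    turn: Optional[int] = None   # assistant turn index; None for other roles
    is_reasoning: bool = False   # True for rendered reasoning_content tokens


def build_labels(segments: List[Segment],
                 mask_reasoning_content: bool = False,
                 train_on_last_turn_only: bool = False) -> List[int]:
    last_turn = max((s.turn for s in segments if s.turn is not None), default=None)
    labels: List[int] = []
    for seg in segments:
        supervised = seg.role == "assistant"  # user/tool/system never supervised
        if supervised and mask_reasoning_content and seg.is_reasoning:
            supervised = False  # thinking stays in the prompt but not in the loss
        if supervised and train_on_last_turn_only and seg.turn != last_turn:
            supervised = False  # only the final assistant turn is trained on
        labels.extend(seg.tokens if supervised else [IGNORE_INDEX] * len(seg.tokens))
    return labels


segs = [
    Segment("user", [1, 2]),
    Segment("assistant", [3], turn=0, is_reasoning=True),
    Segment("assistant", [4], turn=0),
    Segment("tool", [5]),
    Segment("assistant", [6], turn=1, is_reasoning=True),
    Segment("assistant", [7], turn=1),
]
print(build_labels(segs))  # [-100, -100, 3, 4, -100, 6, 7]
print(build_labels(segs, mask_reasoning_content=True))  # [-100, -100, -100, 4, -100, -100, 7]
```

The two flags act independently: one removes thinking tokens from the loss, the other removes every assistant turn but the last.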

**Returns:** `LazyMappedDataset`

A `LazyMappedDataset` yielding dicts with `input_ids`, `labels`

```python
nemo_automodel.components.datasets.llm.agent_chat.make_agent_chat_eval_samples(
    dataset_name: typing.Optional[str] = None,
    path: typing.Optional[typing.Union[str, typing.List[str]]] = None,
    split: str = 'train',
    limit_dataset_samples: typing.Optional[int] = None,
    max_eval_samples: typing.Optional[int] = None
) -> typing.List[typing.Dict[str, typing.Any]]
```

Build a flat list of tool-call eval samples from an agent SFT dataset.

Each dialogue is expanded into one sample per assistant tool-call
position via `_extract_eval_samples_from_example`. The result
is a plain list (not a HuggingFace `Dataset`) because evaluation
iterates linearly and needs the raw structured fields, not tokenized
tensors.

Exactly one of `dataset_name` or `path` must be provided.
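The two limits apply at different stages: `limit_dataset_samples` bounds dialogues before expansion and `max_eval_samples` bounds samples after it. Below is a sketch with a caller-supplied `extract` function standing in for `_extract_eval_samples_from_example`; the name `expand_eval_samples` and the early stop once the cap is reached are assumptions.

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterable, List, Optional

Sample = Dict[str, Any]


def expand_eval_samples(rows: Iterable[Dict[str, Any]],
                        extract: Callable[[Dict[str, Any]], List[Sample]],
                        limit_dataset_samples: Optional[int] = None,
                        max_eval_samples: Optional[int] = None) -> List[Sample]:
    """Hypothetical sketch: cap dialogues before expansion, samples after."""
    if limit_dataset_samples is not None:
        rows = islice(rows, limit_dataset_samples)  # first N dialogues only
    samples: List[Sample] = []
    for row in rows:
        samples.extend(extract(row))
        if max_eval_samples is not None and len(samples) >= max_eval_samples:
            break  # stop reading once the sample cap is reached
    return samples if max_eval_samples is None else samples[:max_eval_samples]


dialogues = [{"n": 2}, {"n": 3}, {"n": 1}]  # n = tool-call positions per dialogue
fake_extract = lambda row: [{"pos": i} for i in range(row["n"])]
print(len(expand_eval_samples(dialogues, fake_extract)))  # 6
print(len(expand_eval_samples(dialogues, fake_extract, max_eval_samples=4)))  # 4
```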

**Parameters:**

`dataset_name`: HF Hub dataset id, e.g. `llamafactory/glaive_toolcall_en`.

`path`: Local JSON/JSONL file path or list of paths.

`split`: Dataset split (only used with `dataset_name`).

`limit_dataset_samples`: If set, read only the first N dialogues
before expansion. Useful to bound evaluation cost.

`max_eval_samples`: If set, cap the total number of expanded samples.

**Returns:** `List[Dict[str, Any]]`

A list of dicts with keys `prompt_messages`, `tools`,

```python
nemo_automodel.components.datasets.llm.agent_chat._SHAREGPT_ROLE_MAP = {'system': 'system', 'human': 'user', 'user': 'user', 'gpt': 'assistant', 'assis...
```

```python
nemo_automodel.components.datasets.llm.agent_chat._VALID_MESSAGE_ROLES = {'system', 'user', 'assistant', 'tool_call', 'tool_response', 'tool'}
```

```python
nemo_automodel.components.datasets.llm.agent_chat.logger = logging.getLogger(__name__)
```