nemo_automodel.components.datasets.llm.agent_chat

View as Markdown

Multi-turn agent SFT dataset adapter.

Loads function-calling chat datasets where each example contains tool definitions and a multi-turn message list that includes tool calls and tool responses, then renders them through the tokenizer’s chat template with answer_only_loss_mask=True so that user and tool tokens are excluded from the loss.

Two input schemas are accepted:

  1. Swift / chatml messages schema (used by AI-ModelScope/function-calling-chatml)::

    { “tools”: ”[{…openai tool schema…}]”, “messages”: [ {“role”: “user”, “content”: ”…”}, {“role”: “tool_call”, “content”: ”{“name”: …, “arguments”: …}”}, {“role”: “tool_response”, “content”: ”…”}, {“role”: “assistant”, “content”: ”…”} ] }

  2. ShareGPT conversations schema (used by llamafactory/glaive_toolcall_en and similar)::

    { “tools”: ”[…]”, “conversations”: [ {“from”: “human”, “value”: ”…”}, {“from”: “function_call”, “value”: ”…”}, {“from”: “observation”, “value”: ”…”}, {“from”: “gpt”, “value”: ”…”} ] }

Consecutive tool_call entries are merged into a single assistant message with parallel tool_calls; the following tool_response entries are paired with those calls in order.

Module Contents

Functions

NameDescription
_convert_messagesConvert chatml-style agent messages to OpenAI chat-completions format.
_extract_eval_samples_from_exampleExtract one eval sample per assistant tool-call position.
_format_exampleRender one agent example into tokenized input_ids / labels.
_format_example_implRender one agent example into tokenized input_ids / labels.
_json_load_if_strParse value as JSON if it is a string, otherwise return as-is.
_reasoning_contentReturn a turn’s reasoning/thinking trace as a string, or None if absent.
_sharegpt_to_chatmlConvert ShareGPT {from, value} turns to chatml {role, content}.
_truncate_messages_to_fitDrop the oldest conversation exchanges until the dialogue fits seq_length.
make_agent_chat_datasetLoad a multi-turn function-calling SFT dataset.
make_agent_chat_eval_samplesBuild a flat list of tool-call eval samples from an agent SFT dataset.

Data

_SHAREGPT_ROLE_MAP

_VALID_MESSAGE_ROLES

logger

API

nemo_automodel.components.datasets.llm.agent_chat._convert_messages(
messages: typing.List[typing.Dict[str, typing.Any]],
example_id: typing.Optional[typing.Union[int, str]] = None,
drop_history_reasoning_content: bool = False
) -> typing.List[typing.Dict[str, typing.Any]]

Convert chatml-style agent messages to OpenAI chat-completions format.

Consecutive tool_call entries collapse into one assistant message with parallel tool_calls. When the preceding emitted turn is an assistant message without tool_calls (e.g. an assistant content that reasons before calling tools), the tool_calls attach to that message instead of creating a second consecutive assistant turn — this preserves the natural single-turn shape the model produces at inference. tool_response (or tool) entries that follow are paired with those tool_call ids in order.

Parameters:

messages
List[Dict[str, Any]]

chatml-style turns with roles in _VALID_MESSAGE_ROLES.

example_id
Optional[Union[int, str]]Defaults to None

Optional identifier used to derive unique tool_call ids.

drop_history_reasoning_content
boolDefaults to False

If True, strip reasoning_content from every assistant turn except the final one, so historical thinking traces are not rendered into the prompt. This matches inference, where the model never sees its own prior-turn thinking.

Returns: List[Dict[str, Any]]

A list of OpenAI-format messages suitable for apply_chat_template.

nemo_automodel.components.datasets.llm.agent_chat._extract_eval_samples_from_example(
example: typing.Dict[str, typing.Any]
) -> typing.List[typing.Dict[str, typing.Any]]

Extract one eval sample per assistant tool-call position.

For each assistant turn in example that issues tool_calls, emit a sample whose prompt_messages are all messages strictly before that turn and whose gt_tool_calls are the tool_calls from that turn. This lets the evaluator measure tool-call accuracy at every position the model is expected to act, not just the first one.

Tool-call arguments are normalized from JSON-encoded strings (as produced by :func:_convert_messages) back to dicts so callers can compare against parser output directly.

Parameters:

example
Dict[str, Any]

a raw row from the agent SFT dataset (chatml messages or ShareGPT conversations schema, with a tools field).

Returns: List[Dict[str, Any]]

A list of eval samples, each with keys prompt_messages,

nemo_automodel.components.datasets.llm.agent_chat._format_example(
example: typing.Dict[str, typing.Any],
tokenizer,
eos_token_id: int,
pad_token_id: int,
seq_length: typing.Optional[int] = None,
padding: typing.Union[str, bool] = False,
truncation: typing.Union[str, bool] = False,
mask_reasoning_content: bool = False,
train_on_last_turn_only: bool = False,
drop_history_reasoning_content: bool = False,
truncate_history: bool = False
) -> typing.Dict[str, typing.List[int]]

Render one agent example into tokenized input_ids / labels.

Thin wrapper that re-raises any parsing/rendering failure as a ValueError tagged with the example id. Rows are rendered lazily inside the dataloader, so without this a single malformed row surfaces as an opaque JSONDecodeError/AssertionError deep in the stack — with no hint as to which row caused it — and aborts the whole training run.

nemo_automodel.components.datasets.llm.agent_chat._format_example_impl(
example: typing.Dict[str, typing.Any],
tokenizer,
eos_token_id: int,
pad_token_id: int,
seq_length: typing.Optional[int] = None,
padding: typing.Union[str, bool] = False,
truncation: typing.Union[str, bool] = False,
mask_reasoning_content: bool = False,
train_on_last_turn_only: bool = False,
drop_history_reasoning_content: bool = False,
truncate_history: bool = False
) -> typing.Dict[str, typing.List[int]]

Render one agent example into tokenized input_ids / labels.

nemo_automodel.components.datasets.llm.agent_chat._json_load_if_str(
value: typing.Any
) -> typing.Any

Parse value as JSON if it is a string, otherwise return as-is.

nemo_automodel.components.datasets.llm.agent_chat._reasoning_content(
turn: typing.Dict[str, typing.Any]
) -> typing.Optional[str]

Return a turn’s reasoning/thinking trace as a string, or None if absent.

Shared by every conversion path so the reasoning_content field is read and coerced identically (a falsy/empty trace is treated as absent).

nemo_automodel.components.datasets.llm.agent_chat._sharegpt_to_chatml(
conversations: typing.List[typing.Dict[str, typing.Any]]
) -> typing.List[typing.Dict[str, typing.Any]]

Convert ShareGPT {from, value} turns to chatml {role, content}.

nemo_automodel.components.datasets.llm.agent_chat._truncate_messages_to_fit(
tokenizer,
messages: typing.List[typing.Dict[str, typing.Any]],
tools: typing.Optional[typing.List[typing.Dict[str, typing.Any]]],
seq_length: int
) -> typing.List[typing.Dict[str, typing.Any]]

Drop the oldest conversation exchanges until the dialogue fits seq_length.

Unlike token-level truncation (which clips from the tokenizer’s truncation_side and so typically drops the final assistant answer — the supervised target), this drops whole oldest exchanges while keeping any leading system message, the tool definitions, and the final exchange (the user turn that produces the last assistant answer). Tool-call/response pairing among the survivors is preserved because exchanges are cut only at user boundaries.

Parameters:

tokenizer

tokenizer with a chat template.

messages
List[Dict[str, Any]]

OpenAI-format messages from :func:_convert_messages.

tools
Optional[List[Dict[str, Any]]]

tool definitions rendered alongside the messages.

seq_length
int

the token budget the rendered dialogue must fit.

Returns: List[Dict[str, Any]]

messages unchanged when it already fits or has no user boundary

nemo_automodel.components.datasets.llm.agent_chat.make_agent_chat_dataset(
tokenizer,
dataset_name: typing.Optional[str] = None,
path: typing.Optional[typing.Union[str, typing.List[str]]] = None,
split: str = 'train',
seq_length: typing.Optional[int] = None,
limit_dataset_samples: typing.Optional[int] = None,
padding: typing.Union[str, bool] = False,
truncation: typing.Union[str, bool] = False,
mask_reasoning_content: bool = False,
train_on_last_turn_only: bool = False,
drop_history_reasoning_content: bool = False,
truncate_history: bool = False
) -> nemo_automodel.components.datasets.lazy_mapped_dataset.LazyMappedDataset

Load a multi-turn function-calling SFT dataset.

Exactly one of dataset_name (HuggingFace Hub id) or path (local JSON/JSONL file or list of files) must be provided. The loaded examples are lazily rendered through the tokenizer’s chat template; tool/user tokens are excluded from the loss via answer_only_loss_mask=True.

Parameters:

tokenizer

HuggingFace tokenizer with a chat template.

dataset_name
Optional[str]Defaults to None

HF Hub dataset id, e.g. llamafactory/glaive_toolcall_en.

path
Optional[Union[str, List[str]]]Defaults to None

Local JSON/JSONL file path or list of paths.

split
strDefaults to 'train'

Dataset split (only used with dataset_name).

seq_length
Optional[int]Defaults to None

Optional max sequence length for the tokenizer.

limit_dataset_samples
Optional[int]Defaults to None

If set, keep only the first N examples.

padding
Union[str, bool]Defaults to False

Padding strategy forwarded to the tokenizer.

truncation
Union[str, bool]Defaults to False

Truncation strategy forwarded to the tokenizer.

mask_reasoning_content
boolDefaults to False

If True, exclude assistant reasoning_content (thinking) tokens from the loss while still rendering them into the prompt. Requires a chat template that emits reasoning_content. Defaults to False, which trains on reasoning tokens like any other assistant content.

train_on_last_turn_only
boolDefaults to False

If True, supervise only the final assistant turn of each dialogue (mask_history); all earlier assistant turns are excluded from the loss. Defaults to False, which supervises every assistant turn.

drop_history_reasoning_content
boolDefaults to False

If True, strip reasoning_content from all but the final assistant turn so historical thinking is not rendered into the prompt (matching inference, where prior-turn thinking is not visible). Orthogonal to mask_reasoning_content: this controls whether history thinking appears in the prompt at all, the latter controls whether rendered thinking contributes to loss. Defaults to False, which keeps every turn’s reasoning_content.

truncate_history
boolDefaults to False

If True and seq_length is set, drop the oldest conversation exchanges (keeping any leading system message, the tool definitions, and the final exchange) until the dialogue fits seq_length. Unlike token-level truncation, which clips from the tokenizer side and usually drops the final assistant answer, this preserves the supervised target. Defaults to False.

Returns: LazyMappedDataset

A LazyMappedDataset yielding dicts with input_ids, labels

nemo_automodel.components.datasets.llm.agent_chat.make_agent_chat_eval_samples(
dataset_name: typing.Optional[str] = None,
path: typing.Optional[typing.Union[str, typing.List[str]]] = None,
split: str = 'train',
limit_dataset_samples: typing.Optional[int] = None,
max_eval_samples: typing.Optional[int] = None
) -> typing.List[typing.Dict[str, typing.Any]]

Build a flat list of tool-call eval samples from an agent SFT dataset.

Each dialogue is expanded into one sample per assistant tool-call position via :func:_extract_eval_samples_from_example. The result is a plain list (not a HuggingFace Dataset) because evaluation iterates linearly and needs the raw structured fields, not tokenized tensors.

Exactly one of dataset_name or path must be provided.

Parameters:

dataset_name
Optional[str]Defaults to None

HF Hub dataset id, e.g. llamafactory/glaive_toolcall_en.

path
Optional[Union[str, List[str]]]Defaults to None

Local JSON/JSONL file path or list of paths.

split
strDefaults to 'train'

Dataset split (only used with dataset_name).

limit_dataset_samples
Optional[int]Defaults to None

If set, read only the first N dialogues before expansion. Useful to bound evaluation cost.

max_eval_samples
Optional[int]Defaults to None

If set, cap the total expanded sample count.

Returns: List[Dict[str, Any]]

A list of dicts with keys prompt_messages, tools,

nemo_automodel.components.datasets.llm.agent_chat._SHAREGPT_ROLE_MAP = {'system': 'system', 'human': 'user', 'user': 'user', 'gpt': 'assistant', 'assis...
nemo_automodel.components.datasets.llm.agent_chat._VALID_MESSAGE_ROLES = {'system', 'user', 'assistant', 'tool_call', 'tool_response', 'tool'}
nemo_automodel.components.datasets.llm.agent_chat.logger = logging.getLogger(__name__)