nemo_automodel.components.datasets.llm.agent_chat
nemo_automodel.components.datasets.llm.agent_chat
Multi-turn agent SFT dataset adapter.
Loads function-calling chat datasets where each example contains tool
definitions and a multi-turn message list that includes tool calls and
tool responses, then renders them through the tokenizer’s chat template
with answer_only_loss_mask=True so that user and tool tokens
are excluded from the loss.
Two input schemas are accepted:
-
Swift / chatml
messagesschema (used byAI-ModelScope/function-calling-chatml)::{ “tools”: ”[{…openai tool schema…}]”, “messages”: [ {“role”: “user”, “content”: ”…”}, {“role”: “tool_call”, “content”: ”{“name”: …, “arguments”: …}”}, {“role”: “tool_response”, “content”: ”…”}, {“role”: “assistant”, “content”: ”…”} ] }
-
ShareGPT
conversationsschema (used byllamafactory/glaive_toolcall_enand similar)::{ “tools”: ”[…]”, “conversations”: [ {“from”: “human”, “value”: ”…”}, {“from”: “function_call”, “value”: ”…”}, {“from”: “observation”, “value”: ”…”}, {“from”: “gpt”, “value”: ”…”} ] }
Consecutive tool_call entries are merged into a single assistant
message with parallel tool_calls; the following tool_response
entries are paired with those calls in order.
Module Contents
Functions
Data
API
Convert chatml-style agent messages to OpenAI chat-completions format.
Consecutive tool_call entries collapse into one assistant message
with parallel tool_calls. When the preceding emitted turn is an
assistant message without tool_calls (e.g. an assistant content
that reasons before calling tools), the tool_calls attach to that
message instead of creating a second consecutive assistant turn —
this preserves the natural single-turn shape the model produces at
inference. tool_response (or tool) entries that follow are
paired with those tool_call ids in order.
Parameters:
chatml-style turns with roles in _VALID_MESSAGE_ROLES.
Optional identifier used to derive unique tool_call ids.
If True, strip reasoning_content
from every assistant turn except the final one, so historical
thinking traces are not rendered into the prompt. This matches
inference, where the model never sees its own prior-turn thinking.
Returns: List[Dict[str, Any]]
A list of OpenAI-format messages suitable for apply_chat_template.
Extract one eval sample per assistant tool-call position.
For each assistant turn in example that issues tool_calls, emit
a sample whose prompt_messages are all messages strictly before
that turn and whose gt_tool_calls are the tool_calls from that
turn. This lets the evaluator measure tool-call accuracy at every
position the model is expected to act, not just the first one.
Tool-call arguments are normalized from JSON-encoded strings (as
produced by :func:_convert_messages) back to dicts so callers can
compare against parser output directly.
Parameters:
a raw row from the agent SFT dataset (chatml messages
or ShareGPT conversations schema, with a tools field).
Returns: List[Dict[str, Any]]
A list of eval samples, each with keys prompt_messages,
Render one agent example into tokenized input_ids / labels.
Thin wrapper that re-raises any parsing/rendering failure as a ValueError
tagged with the example id. Rows are rendered lazily inside the dataloader,
so without this a single malformed row surfaces as an opaque
JSONDecodeError/AssertionError deep in the stack — with no hint as to
which row caused it — and aborts the whole training run.
Render one agent example into tokenized input_ids / labels.
Parse value as JSON if it is a string, otherwise return as-is.
Return a turn’s reasoning/thinking trace as a string, or None if absent.
Shared by every conversion path so the reasoning_content field is read
and coerced identically (a falsy/empty trace is treated as absent).
Convert ShareGPT {from, value} turns to chatml {role, content}.
Drop the oldest conversation exchanges until the dialogue fits seq_length.
Unlike token-level truncation (which clips from the tokenizer’s
truncation_side and so typically drops the final assistant answer — the
supervised target), this drops whole oldest exchanges while keeping any
leading system message, the tool definitions, and the final exchange
(the user turn that produces the last assistant answer). Tool-call/response
pairing among the survivors is preserved because exchanges are cut only at
user boundaries.
Parameters:
tokenizer with a chat template.
OpenAI-format messages from :func:_convert_messages.
tool definitions rendered alongside the messages.
the token budget the rendered dialogue must fit.
Returns: List[Dict[str, Any]]
messages unchanged when it already fits or has no user boundary
Load a multi-turn function-calling SFT dataset.
Exactly one of dataset_name (HuggingFace Hub id) or path (local
JSON/JSONL file or list of files) must be provided. The loaded examples
are lazily rendered through the tokenizer’s chat template; tool/user
tokens are excluded from the loss via answer_only_loss_mask=True.
Parameters:
HuggingFace tokenizer with a chat template.
HF Hub dataset id, e.g. llamafactory/glaive_toolcall_en.
Local JSON/JSONL file path or list of paths.
Dataset split (only used with dataset_name).
Optional max sequence length for the tokenizer.
If set, keep only the first N examples.
Padding strategy forwarded to the tokenizer.
Truncation strategy forwarded to the tokenizer.
If True, exclude assistant reasoning_content
(thinking) tokens from the loss while still rendering them into the
prompt. Requires a chat template that emits reasoning_content.
Defaults to False, which trains on reasoning tokens like any other
assistant content.
If True, supervise only the final assistant
turn of each dialogue (mask_history); all earlier assistant
turns are excluded from the loss. Defaults to False, which
supervises every assistant turn.
If True, strip reasoning_content
from all but the final assistant turn so historical thinking is not
rendered into the prompt (matching inference, where prior-turn
thinking is not visible). Orthogonal to mask_reasoning_content:
this controls whether history thinking appears in the prompt at all,
the latter controls whether rendered thinking contributes to loss.
Defaults to False, which keeps every turn’s reasoning_content.
If True and seq_length is set, drop the oldest
conversation exchanges (keeping any leading system message, the tool
definitions, and the final exchange) until the dialogue fits
seq_length. Unlike token-level truncation, which clips from
the tokenizer side and usually drops the final assistant answer,
this preserves the supervised target. Defaults to False.
Returns: LazyMappedDataset
A LazyMappedDataset yielding dicts with input_ids, labels
Build a flat list of tool-call eval samples from an agent SFT dataset.
Each dialogue is expanded into one sample per assistant tool-call
position via :func:_extract_eval_samples_from_example. The result
is a plain list (not a HuggingFace Dataset) because evaluation
iterates linearly and needs the raw structured fields, not tokenized
tensors.
Exactly one of dataset_name or path must be provided.
Parameters:
HF Hub dataset id, e.g. llamafactory/glaive_toolcall_en.
Local JSON/JSONL file path or list of paths.
Dataset split (only used with dataset_name).
If set, read only the first N dialogues before expansion. Useful to bound evaluation cost.
If set, cap the total expanded sample count.
Returns: List[Dict[str, Any]]
A list of dicts with keys prompt_messages, tools,