nemo_automodel.components.eval.tool_call_parser#
Generic tool-call parser for evaluating agent SFT outputs.
Different chat templates wrap tool calls in different syntax:
Qwen / Hermes / FunctionGemma / Gemma 3 / GLM-4:
<tool_call>{"name": ..., "arguments": ...}</tool_call>
Llama 3.1+:
<|python_tag|>{"name": ..., "parameters": ...}<|eom_id|>
Mistral:
[TOOL_CALLS][{...}, {...}]
Harmony / GPT-OSS:
<|channel|>commentary to=functions.NAME<|message|>{...}
This parser tries each known wrapper, then falls back to a generic JSON object scan. It is intentionally permissive: malformed JSON, missing wrappers, or unknown formats degrade gracefully and never raise.
The companion function compute_sample_metrics compares parser output
against ground-truth tool calls and produces 0/1 (or fractional)
indicators that average cleanly across a dataset.
Module Contents#
Classes#
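ParsedToolCall | One tool call extracted from generated text.
Functions#
_extract_balanced | Return the substring from text[start] (which must be opener) through its matching closer, skipping over chars inside JSON strings.
_coerce_args | Normalize an arguments field to a dict.
_from_call_object | Build a ParsedToolCall from a {"name": ..., "arguments": ...} dict.
_iter_balanced_json_objects | Yield substrings that look like balanced top-level JSON objects.
_parse_generic_json | Last-resort fallback: scan for any JSON object with a name field.
parse_tool_calls | Extract every tool call from a generated model response.
_coerce_gt_args | Normalize a ground-truth arguments field to a dict.
_score_one_pair | Score a single (pred, gt) tool-call pair.
compute_sample_metrics | Compute per-sample tool-call metrics across all GT positions.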
Data#
API#
- nemo_automodel.components.eval.tool_call_parser.logger#
‘getLogger(…)’
- class nemo_automodel.components.eval.tool_call_parser.ParsedToolCall[source]#
One tool call extracted from generated text.
Attributes:
name – function name if extracted, otherwise None.
arguments – parsed arguments dict; empty when JSON is invalid.
arguments_valid_json – True if arguments parsed cleanly.
raw – the source substring this was parsed from.
- name: Optional[str]#
None
- arguments: Dict[str, Any]#
None
- arguments_valid_json: bool#
None
- raw: str#
None
- nemo_automodel.components.eval.tool_call_parser._HARMONY_ANCHOR_RE#
‘compile(…)’
- nemo_automodel.components.eval.tool_call_parser._QWEN_RE#
‘compile(…)’
- nemo_automodel.components.eval.tool_call_parser._LLAMA_ANCHOR_RE#
‘compile(…)’
- nemo_automodel.components.eval.tool_call_parser._MISTRAL_ANCHOR_RE#
‘compile(…)’
- nemo_automodel.components.eval.tool_call_parser._extract_balanced(
- text: str,
- start: int,
- opener: str,
- closer: str,
Return the substring from text[start] (which must be opener) through its matching closer, skipping over chars inside JSON strings.
Returns None if text[start] is not opener or the span is unbalanced.
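The balanced-span scan described above can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the function name extract_balanced and the exact handling of escapes are assumptions.

```python
from typing import Optional


def extract_balanced(text: str, start: int, opener: str, closer: str) -> Optional[str]:
    """Hypothetical sketch: return text[start:end + 1], where end is the
    closer that matches the opener at text[start]."""
    if start < 0 or start >= len(text) or text[start] != opener:
        return None
    depth = 0
    in_string = False  # True while inside a JSON string literal
    escaped = False    # True right after a backslash inside a string
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue  # delimiters inside strings never change the depth
        if ch == '"':
            in_string = True
        elif ch == opener:
            depth += 1
        elif ch == closer:
            depth -= 1
            if depth == 0:
                return text[start:i + 1]
    return None  # ran out of text before the span closed
```

Tracking the escape flag separately keeps an escaped quote inside a string from ending the string early.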
- nemo_automodel.components.eval.tool_call_parser._coerce_args(
- args_value: Any,
Normalize an arguments field to a dict.
Accepts a dict (passthrough) or a JSON-encoded string. Returns the parsed dict alongside a flag indicating whether the source was a well-formed JSON object.
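A minimal sketch of this normalization, assuming the result is a (dict, flag) tuple in that order and that anything other than a dict or a JSON-object string maps to an empty dict with the flag cleared. The name coerce_args is hypothetical.

```python
import json
from typing import Any, Dict, Tuple


def coerce_args(args_value: Any) -> Tuple[Dict[str, Any], bool]:
    """Hypothetical sketch: return (arguments, was_well_formed_json_object)."""
    if isinstance(args_value, dict):
        return args_value, True  # passthrough
    if isinstance(args_value, str):
        try:
            parsed = json.loads(args_value)
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            return {}, False
        if isinstance(parsed, dict):
            return parsed, True
    # Assumption: lists, numbers, None, and non-object JSON count as invalid.
    return {}, False
```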
- nemo_automodel.components.eval.tool_call_parser._from_call_object(
- obj: Dict[str, Any],
- raw: str,
Build a ParsedToolCall from a {"name": ..., "arguments": ...} dict.
Llama 3.1 emits parameters instead of arguments; both are accepted. Returns None when name is missing or non-string.
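The construction rules above might look like the sketch below. ToolCallSketch is a hypothetical stand-in for ParsedToolCall (the source does not say whether the real class is a dataclass), and treating a call with neither key as having empty, valid arguments is an assumption.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ToolCallSketch:
    """Hypothetical stand-in mirroring the documented ParsedToolCall fields."""
    name: Optional[str]
    arguments: Dict[str, Any] = field(default_factory=dict)
    arguments_valid_json: bool = False
    raw: str = ""


def from_call_object(obj: Dict[str, Any], raw: str) -> Optional[ToolCallSketch]:
    name = obj.get("name")
    if not isinstance(name, str):
        return None  # missing or non-string name
    # Accept "arguments" first, then the Llama 3.1 "parameters" spelling.
    # Assumption: a call with neither key gets an empty dict.
    args_value = obj["arguments"] if "arguments" in obj else obj.get("parameters", {})
    if isinstance(args_value, dict):
        args, valid = args_value, True
    else:
        try:
            parsed = json.loads(args_value) if isinstance(args_value, str) else None
        except ValueError:
            parsed = None
        valid = isinstance(parsed, dict)
        args = parsed if valid else {}
    return ToolCallSketch(name=name, arguments=args, arguments_valid_json=valid, raw=raw)
```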
- nemo_automodel.components.eval.tool_call_parser._iter_balanced_json_objects(text: str) Iterator[str][source]#
Yield substrings that look like balanced top-level JSON objects.
Skips characters inside JSON string literals (so braces inside strings don’t unbalance the count). Designed for fallback extraction when no known wrapper matches.
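One way to write such a generator is sketched below. The name iter_balanced_json_objects_sketch is hypothetical, and the choice to track string state only while inside an object (so a stray quote in surrounding prose cannot swallow a later object) is an assumption rather than documented behavior.

```python
from typing import Iterator


def iter_balanced_json_objects_sketch(text: str) -> Iterator[str]:
    """Hypothetical sketch: yield each top-level {...} span with balanced braces."""
    depth = 0
    start = -1
    in_string = False
    escaped = False
    for i, ch in enumerate(text):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue  # braces inside string literals are ignored
        if ch == '"' and depth > 0:
            in_string = True
        elif ch == "{":
            if depth == 0:
                start = i  # remember where this top-level object begins
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                yield text[start:i + 1]
```

The yielded substrings only look like JSON objects; a caller still has to run them through json.loads and discard the ones that fail.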
- nemo_automodel.components.eval.tool_call_parser._parse_qwen_style(
- text: str,
- nemo_automodel.components.eval.tool_call_parser._parse_llama_style(
- text: str,
- nemo_automodel.components.eval.tool_call_parser._parse_mistral_style(
- text: str,
- nemo_automodel.components.eval.tool_call_parser._parse_harmony_style(
- text: str,
- nemo_automodel.components.eval.tool_call_parser._parse_generic_json(
- text: str,
Last-resort fallback: scan for any JSON object with a name field.
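For illustration, a fallback scan of this kind can also be written with the standard library's json.JSONDecoder.raw_decode, which is a different technique from the brace-balancing scan this module describes. The function name scan_named_objects is hypothetical, and unlike the documented parser this sketch keeps only objects that are fully valid JSON.

```python
import json
from typing import Any, Dict, List


def scan_named_objects(text: str) -> List[Dict[str, Any]]:
    """Hypothetical sketch: collect JSON objects that carry a string "name"."""
    decoder = json.JSONDecoder()
    found: List[Dict[str, Any]] = []
    pos = 0
    while True:
        pos = text.find("{", pos)
        if pos == -1:
            break
        try:
            obj, end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            pos += 1  # not valid JSON here; try the next brace
            continue
        if isinstance(obj, dict) and isinstance(obj.get("name"), str):
            found.append(obj)
        pos = end  # resume after the decoded object
    return found
```

Because the scan resumes after each decoded object, a call nested inside an outer object that has no name field is not reported.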
- nemo_automodel.components.eval.tool_call_parser.parse_tool_calls(
- text: str,
Extract every tool call from a generated model response.
Wrappers are tried in order of specificity; the first wrapper that yields any match wins. If no wrapper matches, a generic JSON-object scan is used. Returns an empty list when no plausible tool call is present.
- Parameters:
text – raw decoded text from model.generate().
- Returns:
Parsed tool calls in document order. Possibly empty.
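The "first wrapper that yields any match wins" rule can be sketched as a simple dispatcher. Everything here is an illustrative assumption: the regexes, the helper names, the ordering of the wrappers, and the use of plain dicts instead of ParsedToolCall. The Harmony wrapper and the generic JSON fallback are omitted for brevity.

```python
import json
import re
from typing import Any, Dict, List

# Hypothetical patterns; the module's real regexes are not shown in this page.
QWEN_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)
LLAMA_RE = re.compile(r"<\|python_tag\|>\s*(\{.*?\})\s*(?:<\|eom_id\|>|$)", re.DOTALL)


def _loads_dicts(chunks: List[str]) -> List[Dict[str, Any]]:
    out = []
    for chunk in chunks:
        try:
            obj = json.loads(chunk)
        except ValueError:
            continue  # permissive: skip malformed JSON instead of raising
        if isinstance(obj, dict):
            out.append(obj)
    return out


def parse_qwen(text: str) -> List[Dict[str, Any]]:
    return _loads_dicts(QWEN_RE.findall(text))


def parse_llama(text: str) -> List[Dict[str, Any]]:
    return _loads_dicts(LLAMA_RE.findall(text))


def parse_mistral(text: str) -> List[Dict[str, Any]]:
    marker = "[TOOL_CALLS]"
    idx = text.find(marker)
    if idx == -1:
        return []
    rest = text[idx + len(marker):].lstrip()
    try:
        arr, _ = json.JSONDecoder().raw_decode(rest)
    except ValueError:
        return []
    return [o for o in arr if isinstance(o, dict)] if isinstance(arr, list) else []


def parse_tool_calls_sketch(text: str) -> List[Dict[str, Any]]:
    # Try each wrapper in turn; the first one that yields anything wins.
    for parser in (parse_qwen, parse_llama, parse_mistral):
        calls = parser(text)
        if calls:
            return calls
    return []  # a real implementation would fall back to a generic scan here
```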
- nemo_automodel.components.eval.tool_call_parser._coerce_gt_args(
- args_value: Any,
Normalize a ground-truth arguments field to a dict.
- nemo_automodel.components.eval.tool_call_parser._score_one_pair(
- pred: Optional[nemo_automodel.components.eval.tool_call_parser.ParsedToolCall],
- gt: Dict[str, Any],
Score a single (pred, gt) tool-call pair.
pred may be None.
- nemo_automodel.components.eval.tool_call_parser.compute_sample_metrics(
- pred_calls: List[nemo_automodel.components.eval.tool_call_parser.ParsedToolCall],
- gt_calls: List[Dict[str, Any]],
Compute per-sample tool-call metrics across all GT positions.
Predicted calls are aligned positionally against the ground-truth list: pred_calls[i] is scored against gt_calls[i]. Missing predictions (i >= len(pred_calls)) contribute zeros across every metric for that position, so a model that emits only one of two parallel tool calls is correctly penalized on the missing call.
Extra predictions beyond len(gt_calls) are ignored. All values are in [0.0, 1.0] so callers can mean() across a dataset.
Returned keys:
- has_call: prediction exists at this position.
- name_correct: predicted call name equals GT name.
- args_json_valid: prediction had valid JSON arguments.
- args_field_recall: fraction of GT argument keys present in pred.
- args_field_precision: fraction of pred argument keys present in GT.
- args_exact_match: pred arguments dict equals GT arguments dict.
- Parameters:
pred_calls – output of parse_tool_calls.
gt_calls – ground-truth list of {"name": str, "arguments": dict|str}.
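The positional alignment and the six returned keys can be sketched as below. Several details are assumptions not stated in the source: per-position scores are averaged over GT positions, an empty key set scores 1.0 for recall or precision, an empty GT list yields all zeros, and GT arguments are already dicts. Call is a hypothetical stand-in for ParsedToolCall.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

KEYS = ("has_call", "name_correct", "args_json_valid",
        "args_field_recall", "args_field_precision", "args_exact_match")


@dataclass
class Call:
    """Hypothetical stand-in for ParsedToolCall."""
    name: Optional[str]
    arguments: Dict[str, Any] = field(default_factory=dict)
    arguments_valid_json: bool = True


def score_pair(pred: Optional[Call], gt: Dict[str, Any]) -> Dict[str, float]:
    if pred is None:
        return {k: 0.0 for k in KEYS}  # a missing prediction scores zero everywhere
    gt_args = gt.get("arguments") or {}  # assumption: already a dict
    gt_keys, pred_keys = set(gt_args), set(pred.arguments)
    shared = len(gt_keys & pred_keys)
    return {
        "has_call": 1.0,
        "name_correct": float(pred.name == gt.get("name")),
        "args_json_valid": float(pred.arguments_valid_json),
        # Assumption: an empty key set counts as perfect recall / precision.
        "args_field_recall": shared / len(gt_keys) if gt_keys else 1.0,
        "args_field_precision": shared / len(pred_keys) if pred_keys else 1.0,
        "args_exact_match": float(pred.arguments == gt_args),
    }


def sample_metrics(pred_calls: List[Call], gt_calls: List[Dict[str, Any]]) -> Dict[str, float]:
    if not gt_calls:
        return {k: 0.0 for k in KEYS}  # assumption: nothing to score against
    rows = [
        score_pair(pred_calls[i] if i < len(pred_calls) else None, gt)
        for i, gt in enumerate(gt_calls)  # extra predictions are never visited
    ]
    return {k: sum(r[k] for r in rows) / len(rows) for k in KEYS}
```

With two ground-truth calls and a single prediction that matches the first call's name but supplies only one of its two argument keys, this sketch gives has_call 0.5, args_field_recall 0.25, and args_field_precision 0.5.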