Function Calling with FunctionGemma

This tutorial walks through fine-tuning FunctionGemma, Google’s 270M function-calling model, with NeMo AutoModel on the xLAM function-calling dataset.

FunctionGemma Introduction

FunctionGemma is a lightweight, 270M-parameter variant built on the Gemma 3 architecture with a function-calling chat format. It is intended to be fine-tuned for task-specific function calling, and its compact size makes it practical for edge or resource-constrained deployments.

  • Gemma 3 architecture, updated tokenizer, and function-calling chat format.
  • Trained specifically for function calling: multiple tool definitions, parallel calls, tool responses, and natural-language summaries.
  • Small/edge friendly: ~270M params for fast, dense inference on-device.
  • Text-only, function-oriented model (not a general dialogue model), best used after task-specific finetuning.

Prerequisites

  • Install NeMo AutoModel and its extras: pip install nemo-automodel.
  • A FunctionGemma checkpoint, either stored locally or referenced by the model ID google/functiongemma-270m-it.
  • At least one GPU: the small model footprint allows fine-tuning on a single GPU; scale batch size and sequence length as needed.

xLAM Dataset

The xLAM function-calling dataset contains user queries, tool schemas, and tool call traces. It covers diverse tools and arguments so models learn to emit structured tool calls.

Example entry:

{
  "id": 123,
  "query": "Book me a table for two at 7pm in Seattle.",
  "tools": [
    {
      "name": "book_table",
      "description": "Book a restaurant table",
      "parameters": {
        "party_size": {"type": "int"},
        "time": {"type": "string"},
        "city": {"type": "string"}
      }
    }
  ],
  "answers": [
    {
      "name": "book_table",
      "arguments": "{\"party_size\":2,\"time\":\"19:00\",\"city\":\"Seattle\"}"
    }
  ]
}
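Two details of this layout matter when preparing data: each tool's parameters field is a flat name-to-type map rather than a full JSON Schema object, and each answer's arguments field is a JSON-encoded string. The sketch below shows one way such a row could be normalized into an OpenAI-style structure. The helper names (load_if_str, to_openai_tool, to_openai_tool_call) and the type-name mapping are hypothetical and chosen for illustration; they are not NeMo AutoModel's implementation.

```python
import json

# Hypothetical mapping from short type names to JSON Schema type names.
_TYPE_MAP = {"int": "integer", "float": "number", "str": "string", "bool": "boolean"}


def load_if_str(value):
    """Decode value when it is a JSON string; return it unchanged otherwise."""
    return json.loads(value) if isinstance(value, str) else value


def to_openai_tool(tool):
    """Wrap one xLAM-style tool definition in an OpenAI-style function schema."""
    properties = {}
    for name, spec in tool.get("parameters", {}).items():
        spec = dict(spec)  # copy so the input row is not mutated
        if "type" in spec:
            spec["type"] = _TYPE_MAP.get(spec["type"], spec["type"])
        properties[name] = spec
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": {"type": "object", "properties": properties},
        },
    }


def to_openai_tool_call(answer):
    """Wrap one xLAM-style answer as a tool call with decoded (dict) arguments."""
    return {
        "type": "function",
        "function": {
            "name": answer["name"],
            "arguments": load_if_str(answer["arguments"]),
        },
    }


# The example entry from above, written as a Python dict.
row = {
    "id": 123,
    "query": "Book me a table for two at 7pm in Seattle.",
    "tools": [
        {
            "name": "book_table",
            "description": "Book a restaurant table",
            "parameters": {
                "party_size": {"type": "int"},
                "time": {"type": "string"},
                "city": {"type": "string"},
            },
        }
    ],
    "answers": [
        {
            "name": "book_table",
            "arguments": "{\"party_size\":2,\"time\":\"19:00\",\"city\":\"Seattle\"}",
        }
    ],
}

tools = [to_openai_tool(t) for t in load_if_str(row["tools"])]
tool_calls = [to_openai_tool_call(a) for a in load_if_str(row["answers"])]
print(json.dumps(tool_calls[0], indent=2))
```

Decoding the arguments string into a dict up front makes malformed rows fail early, before they reach the tokenizer.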

The helper make_xlam_dataset converts each xLAM row into OpenAI-style tool schemas and tool calls, then renders them through the chat template with answer_only_loss_mask=True, so that loss is applied only to the assistant's tool-call turn and not to the user prompt:

def _format_example(
    example,
    tokenizer,
    eos_token_id,
    pad_token_id,
    seq_length=None,
    padding=None,
    truncation=None,
):
    # "tools" and "answers" may arrive as JSON strings; decode them if so,
    # then convert to OpenAI-style tool schemas and tool calls.
    tools = _convert_tools(_json_load_if_str(example["tools"]))
    tool_calls = _convert_tool_calls(_json_load_if_str(example["answers"]), example_id=example.get("id"))

    # One user turn followed by an assistant turn that carries only tool calls.
    formatted_text = [
        {"role": "user", "content": example["query"]},
        {"role": "assistant", "content": "", "tool_calls": tool_calls},
    ]

    # Render through the chat template; mask the loss to the answer only.
    return format_chat_template(
        tokenizer=tokenizer,
        formatted_text=formatted_text,
        tools=tools,
        eos_token_id=eos_token_id,
        pad_token_id=pad_token_id,
        seq_length=seq_length,
        padding=padding,
        truncation=truncation,
        answer_only_loss_mask=True,
    )
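To make the effect of answer-only loss masking concrete, here is a minimal sketch of the general technique, not of format_chat_template itself. It assumes the prompt and answer have already been tokenized separately, and it uses -100 as the ignore label, which is the default ignore_index of PyTorch's CrossEntropyLoss. The function name build_labels and the token IDs are hypothetical.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(prompt_ids, answer_ids, pad_token_id, seq_length=None):
    """Return (input_ids, labels) where only answer tokens contribute to loss.

    Labels are aligned with input_ids; any next-token shift is assumed to
    happen later, in the model or the loss function.
    """
    input_ids = list(prompt_ids) + list(answer_ids)
    # Prompt positions are ignored; answer positions keep their token IDs.
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(answer_ids)

    if seq_length is not None:
        # Truncate, then right-pad to a fixed length; padding is ignored too.
        input_ids = input_ids[:seq_length]
        labels = labels[:seq_length]
        pad = seq_length - len(input_ids)
        input_ids += [pad_token_id] * pad
        labels += [IGNORE_INDEX] * pad

    return input_ids, labels


# Made-up token IDs: three prompt tokens, then three answer tokens.
input_ids, labels = build_labels(
    prompt_ids=[5, 6, 7],
    answer_ids=[8, 9, 1],
    pad_token_id=0,
    seq_length=8,
)
print(input_ids)  # [5, 6, 7, 8, 9, 1, 0, 0]
print(labels)     # [-100, -100, -100, 8, 9, 1, -100, -100]
```

With this masking, gradient signal comes only from the tokens the model is expected to generate, so the model is not trained to reproduce user queries or tool definitions.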

Run Full-Parameter SFT

A ready-made config is provided at examples/llm_finetune/gemma/functiongemma_xlam.yaml. Launch training with it (8 GPUs shown; adjust --nproc-per-node as needed):

$ automodel --nproc-per-node=8 examples/llm_finetune/gemma/functiongemma_xlam.yaml

You should see a training loss curve similar to the following:

[Figure: FunctionGemma SFT loss]

Run PEFT (LoRA)

To apply LoRA (PEFT), uncomment the peft block in the config and tune rank/alpha/targets per the SFT/PEFT guide. Example override:

peft:
  _target_: nemo_automodel.components._peft.lora.PeftConfig
  target_modules: '*_proj'
  dim: 16
  alpha: 16
  use_triton: true
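As a rough guide to what these values mean, the sketch below works through standard LoRA arithmetic. It makes three assumptions that the config itself does not confirm: that target_modules is matched against module names as a shell-style wildcard (emulated here with Python's fnmatch), that the adapter scale follows the standard LoRA formulation alpha / dim, and that module names follow the usual Hugging Face convention for Gemma-style layers. The layer shape is illustrative only and is not taken from the FunctionGemma checkpoint.

```python
from fnmatch import fnmatch


def select_targets(module_names, pattern):
    """Return the module names matched by a shell-style wildcard pattern."""
    return [name for name in module_names if fnmatch(name, pattern)]


def lora_param_count(in_features, out_features, dim):
    """Trainable parameters added by one LoRA adapter of rank dim.

    Standard LoRA adds A (dim x in_features) and B (out_features x dim).
    """
    return dim * (in_features + out_features)


def lora_scale(alpha, dim):
    """Scale applied to the low-rank update in the standard formulation."""
    return alpha / dim


# Assumed module names, following the common Hugging Face naming convention.
module_names = [
    "model.embed_tokens",
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.self_attn.o_proj",
    "model.layers.0.mlp.gate_proj",
    "model.layers.0.mlp.up_proj",
    "model.layers.0.mlp.down_proj",
    "model.layers.0.input_layernorm",
]

targets = select_targets(module_names, "*_proj")
print(len(targets), "of", len(module_names), "modules matched")

# Illustrative shape for a single projection (not read from the checkpoint).
in_features, out_features = 640, 1024
added = lora_param_count(in_features, out_features, dim=16)
full = in_features * out_features
print(f"adapter params: {added}, frozen weight params: {full}")
print(f"scale: {lora_scale(alpha=16, dim=16)}")
```

Under these assumptions, dim: 16 with alpha: 16 gives a scale of 1.0, and raising dim increases adapter capacity at a cost that is linear in the rank.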

Then fine-tune with the same recipe. Adjust the number of GPUs as needed.

$ automodel examples/llm_finetune/gemma/functiongemma_xlam.yaml

[Figure: FunctionGemma PEFT loss]