Function Calling with FunctionGemma

This tutorial walks through fine-tuning FunctionGemma, Google’s 270M function-calling model, with NeMo AutoModel on the xLAM function-calling dataset.

FunctionGemma Introduction

FunctionGemma is a lightweight, 270M-parameter variant built on the Gemma 3 architecture with a function-calling chat format. It is intended to be fine-tuned for task-specific function calling, and its compact size makes it practical for edge or resource-constrained deployments.

- Built on the Gemma 3 architecture, with an updated tokenizer and a function-calling chat format.
- Trained specifically for function calling: multiple tool definitions, parallel calls, tool responses, and natural-language summaries.
- Small and edge-friendly: about 270M parameters for fast, dense on-device inference.
- Text-only and function-oriented (not a general dialogue model); works best after task-specific fine-tuning.

Prerequisites

- Install NeMo AutoModel and its extras: pip install nemo-automodel.
- A FunctionGemma checkpoint, either stored locally or referenced by its model ID google/functiongemma-270m-it.
- A single GPU is enough to fine-tune a model this small; scale the batch size and sequence length as needed.

xLAM Dataset

The xLAM function-calling dataset contains user queries, tool schemas, and tool call traces. It covers diverse tools and arguments so models learn to emit structured tool calls.

Dataset URL: https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k
Each sample provides:
- query: the user request.
- tools: tool definitions (lightweight schema).
- answers: tool calls with serialized arguments.

Example entry:

{
  "id": 123,
  "query": "Book me a table for two at 7pm in Seattle.",
  "tools": [
    {
      "name": "book_table",
      "description": "Book a restaurant table",
      "parameters": {
        "party_size": {"type": "int"},
        "time": {"type": "string"},
        "city": {"type": "string"}
      }
    }
  ],
  "answers": [
    {
      "name": "book_table",
      "arguments": "{\"party_size\":2,\"time\":\"19:00\",\"city\":\"Seattle\"}"
    }
  ]
}
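Note that arguments is a JSON string nested inside the row, so it needs its own decode step before the values can be used. The sketch below reads the entry above with the standard library and reshapes the tool into an OpenAI-style function schema. The helper names to_openai_tool and decode_call, the type-name mapping, and the exact output layout are illustrative assumptions, not the NeMo AutoModel implementation.

```python
import json

# Sample row copied from the example entry above.
row = {
    "id": 123,
    "query": "Book me a table for two at 7pm in Seattle.",
    "tools": [
        {
            "name": "book_table",
            "description": "Book a restaurant table",
            "parameters": {
                "party_size": {"type": "int"},
                "time": {"type": "string"},
                "city": {"type": "string"},
            },
        }
    ],
    "answers": [
        {
            "name": "book_table",
            "arguments": "{\"party_size\":2,\"time\":\"19:00\",\"city\":\"Seattle\"}",
        }
    ],
}

# Assumed mapping from Python-like type names to JSON Schema type names.
TYPE_MAP = {"int": "integer", "float": "number", "str": "string", "bool": "boolean"}


def to_openai_tool(tool):
    """Hypothetical helper: wrap one xLAM tool in an OpenAI-style function schema."""
    properties = {
        name: {**spec, "type": TYPE_MAP.get(spec.get("type"), spec.get("type"))}
        for name, spec in tool["parameters"].items()
    }
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": {"type": "object", "properties": properties},
        },
    }


def decode_call(answer):
    """Hypothetical helper: decode the JSON-string arguments of one answer."""
    args = answer["arguments"]
    if isinstance(args, str):
        args = json.loads(args)
    return {"name": answer["name"], "arguments": args}


tools = [to_openai_tool(t) for t in row["tools"]]
calls = [decode_call(a) for a in row["answers"]]
print(calls[0]["arguments"]["party_size"])
```

Decoding the arguments up front also makes it easy to sanity-check a row, for example by confirming that every call names a tool declared in tools.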

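The helper make_xlam_dataset converts each xLAM row into OpenAI-style tool schemas and tool calls, then renders them through the chat template with answer_only_loss_mask=True, so loss is applied only to the assistant's tool call rather than to the prompt. The per-example formatting is handled by _format_example: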

def _format_example(
    example,
    tokenizer,
    eos_token_id,
    pad_token_id,
    seq_length=None,
    padding=None,
    truncation=None,
):
    # Parse the tools/answers fields if they are JSON strings, then convert them
    # to OpenAI-style tool schemas and tool calls.
    tools = _convert_tools(_json_load_if_str(example["tools"]))
    tool_calls = _convert_tool_calls(_json_load_if_str(example["answers"]), example_id=example.get("id"))

    # One user turn followed by one assistant turn that carries only tool calls.
    formatted_text = [
        {"role": "user", "content": example["query"]},
        {"role": "assistant", "content": "", "tool_calls": tool_calls},
    ]

    # Render through the chat template; loss is restricted to the answer tokens.
    return format_chat_template(
        tokenizer=tokenizer,
        formatted_text=formatted_text,
        tools=tools,
        eos_token_id=eos_token_id,
        pad_token_id=pad_token_id,
        seq_length=seq_length,
        padding=padding,
        truncation=truncation,
        answer_only_loss_mask=True,
    )
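The answer_only_loss_mask=True argument is what keeps the prompt (user query and tool schemas) out of the loss. One common way to build such a mask is to copy the token ids into the labels and overwrite the prompt positions with -100, the default ignore_index of PyTorch's cross-entropy loss. The sketch below shows that idea on plain lists. The helper build_answer_only_labels and the token ids are made up for illustration, and format_chat_template may build its mask differently.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_answer_only_labels(input_ids, prompt_len):
    """Hypothetical helper: keep labels only for tokens after the prompt."""
    return [IGNORE_INDEX if i < prompt_len else tok for i, tok in enumerate(input_ids)]


# Made-up token ids: 4 prompt tokens (user query plus tool schemas),
# followed by 3 answer tokens (the tool call and an end-of-sequence token).
input_ids = [11, 12, 13, 14, 21, 22, 1]
labels = build_answer_only_labels(input_ids, prompt_len=4)
print(labels)  # [-100, -100, -100, -100, 21, 22, 1]
```

With labels built this way, only the tool-call tokens contribute gradient, which matches the goal of teaching the model to emit structured calls rather than to reproduce the prompt.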

Run Full-Parameter SFT

Use the ready-made config at examples/llm_finetune/gemma/functiongemma_xlam.yaml to start fine-tuning. With the config in place, launch training (8 GPUs shown; adjust --nproc-per-node as needed):

$ automodel --nproc-per-node=8 examples/llm_finetune/gemma/functiongemma_xlam.yaml

You should see a training loss curve similar to the one below:

Figure: FunctionGemma SFT loss

Run PEFT (LoRA)

To apply LoRA (PEFT), uncomment the peft block in the config and tune rank/alpha/targets per the SFT/PEFT guide. Example override:

peft:
  _target_: nemo_automodel.components._peft.lora.PeftConfig
  target_modules: '*_proj'
  dim: 16
  alpha: 16
  use_triton: true
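To make the override more concrete: target_modules: '*_proj' is a wildcard pattern, while dim and alpha set the LoRA rank and the numerator of its scaling factor. The sketch below assumes shell-style glob matching (emulated with Python's fnmatch) and the standard LoRA scaling of alpha / dim. The module names are the usual projection names of Gemma-style layers in Hugging Face Transformers, and the layer shape is a made-up example, so treat the output as illustrative rather than as a description of NeMo AutoModel's internals.

```python
from fnmatch import fnmatch

# Typical leaf module names in a Gemma-style decoder (assumed here).
module_names = [
    "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
    "gate_proj", "up_proj", "down_proj",     # MLP projections
    "input_layernorm", "embed_tokens",       # not matched by the pattern
]

pattern = "*_proj"
matched = [name for name in module_names if fnmatch(name, pattern)]

dim, alpha = 16, 16
scaling = alpha / dim  # standard LoRA scaling factor (assumed to apply here)


def lora_param_count(in_features, out_features, rank):
    """Parameters added by one LoRA adapter: A is (rank x in), B is (out x rank)."""
    return rank * (in_features + out_features)


# Made-up square projection, for illustration only.
full = 640 * 640
added = lora_param_count(640, 640, dim)

print(matched)
print(scaling, full, added)
```

Under these assumptions the pattern selects all attention and MLP projections, and with dim equal to alpha the low-rank update is applied at a scale of 1.0.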

Then fine-tune with the same recipe. To use multiple GPUs, add --nproc-per-node as in the full-parameter run.

$ automodel examples/llm_finetune/gemma/functiongemma_xlam.yaml

Figure: FunctionGemma PEFT loss