Function Calling with FunctionGemma
This tutorial walks through fine-tuning FunctionGemma, Google’s 270M function-calling model, with NeMo AutoModel on the xLAM function-calling dataset.
FunctionGemma Introduction
FunctionGemma is a lightweight, 270M-parameter variant built on the Gemma 3 architecture with a function-calling chat format. It is intended to be fine-tuned for task-specific function calling, and its compact size makes it practical for edge or resource-constrained deployments.
- Gemma 3 architecture, updated tokenizer, and function-calling chat format.
- Trained specifically for function calling: multiple tool definitions, parallel calls, tool responses, and natural-language summaries.
- Small/edge friendly: ~270M params for fast, dense inference on-device.
- Text-only, function-oriented model (not a general dialogue model), best used after task-specific finetuning.
Prerequisites
- Install NeMo AutoModel and its extras:
pip install nemo-automodel. - A FunctionGemma checkpoint available locally or using google/functiongemma-270m-it.
- Small model footprint: can be fine-tuned on a single GPU; scale batch/sequence as needed.
xLAM Dataset
The xLAM function-calling dataset contains user queries, tool schemas, and tool call traces. It covers diverse tools and arguments so models learn to emit structured tool calls.
- Dataset URL: https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k
- Each sample provides:
query: the user request.tools: tool definitions (lightweight schema).answers: tool calls with serialized arguments.
Example entry:
The helper make_xlam_dataset converts each xLAM row into OpenAI-style tool schemas and tool calls, then renders them through the chat template so loss is applied only on the tool-call arguments:
Run Full-Parameter SFT
Use the ready-made config at examples/llm_finetune/gemma/functiongemma_xlam.yaml to start fine-tuning:
With the config in place, launch training (8 GPUs shown; adjust --nproc-per-node as needed):
You should be able to see a training loss curve similar to the one shown below:

Run PEFT (LoRA)
To apply LoRA (PEFT), uncomment the peft block in the config and tune rank/alpha/targets per the SFT/PEFT guide. Example override:
Then fine-tune with the same recipe. Adjust the number of GPUs as needed.
