# Function (Tool) Calling with NVIDIA NIM for LLMs

You can connect NIM to external tools and services using function calling (also known as tool calling). When you provide a list of available functions, the model can respond with the name and arguments of the relevant function(s); you then execute the function and feed the result back to augment the prompt with relevant external information.

Function calling is controlled using the `tool_choice` and `tools` request parameters.

## Prerequisites

For LLM NIMs, you must launch the server with the following environment variables to enable tool calling.

> **Note:** For LLM-specific NIMs, do not set the tool calling environment variables. If an LLM-specific NIM container supports tool calling, it is enabled automatically.

| Environment variable | Description | Type | Optional? |
| --- | --- | --- | --- |
| `NIM_ENABLE_AUTO_TOOL_CHOICE` | Set to `1` to enable tool calling. | Boolean | No |
| `NIM_CHAT_TEMPLATE` | The absolute path to a `.jinja` file containing a chat template that overrides the default chat template found in the `tokenizer_config.json` provided with the model. Useful for instructing the LLM to format its output so that the tool-call parser can understand it. | String | Yes |
| `NIM_TOOL_CALL_PARSER` | The parser that post-processes the LLM response text into a tool call data structure. One of `"pythonic"`, `"mistral"`, `"llama3_json"`, `"granite-20b-fc"`, `"granite"`, `"hermes"`, `"jamba"`, or a custom value. | String | Required when `NIM_ENABLE_AUTO_TOOL_CHOICE` is set to `1`. |
| `NIM_TOOL_PARSER_PLUGIN` | The absolute path of a Python file containing a custom tool-call parser. | String | Required when `NIM_TOOL_CALL_PARSER` is set to a custom value. |
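As an illustration, a container launch with tool calling enabled might look like the following sketch. The image name, tag, and parser value are placeholders; substitute the values from your model's deployment instructions.

```bash
# Hypothetical launch sketch: the image name, tag, and parser value are placeholders.
docker run --rm --gpus all \
  -p 8000:8000 \
  -e NGC_API_KEY \
  -e NIM_ENABLE_AUTO_TOOL_CHOICE=1 \
  -e NIM_TOOL_CALL_PARSER=hermes \
  nvcr.io/nim/<org>/<model>:<tag>
```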

If the chat completion response contains an empty `tool_calls` field but the function call appears in the `content` field, the output of the LLM was not successfully post-processed into a tool call. In that case, update the chat template to instruct the LLM to format its output correctly, or use a different tool-call parser.
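As a rough client-side check for this failure mode, you can look for responses that have no `tool_calls` but whose `content` still parses as JSON. This is a sketch, assuming `chat_response` came from a request made with the OpenAI Python client with `tools` set:

```python
import json

message = chat_response.choices[0].message

if not message.tool_calls and message.content:
    # The parser may have failed; the raw text often still contains
    # the JSON the model emitted for the call.
    try:
        print("Possible unparsed tool call:", json.loads(message.content))
    except json.JSONDecodeError:
        pass  # ordinary assistant text, not a parsing failure
```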

Tool calling is supported and automatically enabled in the LLM-specific NIM containers for the following models:

- Llama 3.1 models
- Llama 3.2 models
- Llama 3.3 models
- Mistral models
- Llama Nemotron Nano models (supports [detailed thinking off](./reasoning-model.md))
- Llama Nemotron Super models (supports [detailed thinking off](./reasoning-model.md))
- Llama Nemotron Ultra models (supports [detailed thinking off](./reasoning-model.md))

## Inference request parameters

To use function calling, set the `tool_choice` and `tools` request parameters.

| Parameter | Description |
| --- | --- |
| `tool_choice` | How the model should choose tools. One of `"none"`, `"auto"`, or a named tool choice. Requires that `tools` is also set. |
| `tools` | The list of tool objects that define the functions the model can call. Requires that `tool_choice` is also set. |

> **Note:** `tool_choice` can only be set when `tools` is also set, and vice versa. These parameters work together to define and control the use of tools in the model's responses. For further information on these parameters and their usage, see the OpenAI API documentation.

### `tool_choice` options

  • "none": Disables the use of tools.

  • "auto": Enables the model to decide whether to use tools and which ones to use.

  • Named tool choice: Forces the model to use a specific tool. It must be in the following format:

    {
      "type": "function",
      "function": {
        "name": "name of the tool goes here"
      }
    }
    

## Examples

### Tool parser plugins for LLM NIMs

The following is an example tool parser plugin for an LLM NIM deployment. You can use vLLM's Hermes2ProToolParser as a reference implementation.

```python
import json
import re
from typing import Dict, List, Sequence, Union

from pydantic import Field
from vllm.entrypoints.openai.protocol import (
    ChatCompletionRequest,
    DeltaFunctionCall,
    DeltaMessage,
    DeltaToolCall,
    ExtractedToolCallInformation,
    FunctionCall,
    ToolCall,
)
from vllm.entrypoints.openai.tool_parsers.abstract_tool_parser import ToolParser, ToolParserManager
from vllm.logger import init_logger
from vllm.transformers_utils.tokenizer import AnyTokenizer

logger = init_logger(__name__)

@ToolParserManager.register_module(["example_tool123"])
class ExampleToolParser(ToolParser):
    def __init__(self, tokenizer: AnyTokenizer):
        super().__init__(tokenizer)

    # Adjust the request before generation, for example by setting
    # skip_special_tokens to False so tool-call tokens survive in the output.
    def adjust_request(
            self, request: ChatCompletionRequest) -> ChatCompletionRequest:
        return request

    # Implement the tool-call parsing for streaming responses.
    def extract_tool_calls_streaming(
        self,
        previous_text: str,
        current_text: str,
        delta_text: str,
        previous_token_ids: Sequence[int],
        current_token_ids: Sequence[int],
        delta_token_ids: Sequence[int],
        request: ChatCompletionRequest,
    ) -> Union[DeltaMessage, None]:
        return DeltaMessage(content=delta_text)

    # Implement the tool-call parsing for non-streaming responses.
    def extract_tool_calls(
        self,
        model_output: str,
        request: ChatCompletionRequest,
    ) -> ExtractedToolCallInformation:
        return ExtractedToolCallInformation(tools_called=False,
                                            tool_calls=[],
                                            content=model_output)
```

To use the preceding parser, set the following environment variables:

```bash
NIM_ENABLE_AUTO_TOOL_CHOICE=1
NIM_TOOL_PARSER_PLUGIN=<absolute path of the plugin file>
NIM_TOOL_CALL_PARSER=example_tool123
NIM_CHAT_TEMPLATE=<absolute path of the chat template .jinja file>
```

## Usage Workflows

These examples showcase various ways to use function calling with LLM NIMs and LLM-specific NIMs:

1. Basic Function Calling: Demonstrates how to use a single function with automatic tool choice.
2. Multiple Tools: Shows how to provide multiple tools, including one without parameters.
3. Named Tool Usage: Illustrates how to force the model to use a specific tool.

### 1. Basic Function Calling

This example shows how to use a single function with automatic tool choice.

```python
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")
MODEL_NAME = "meta/llama-3.1-70b-instruct"

# Define available function
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                },
                "format": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "The temperature unit to use. Infer this from the user's location."
                }
            },
            "required": ["location", "format"]
        }
    }
}

messages = [
    {"role": "user", "content": "What is the weather in San Francisco, CA?"}
]

chat_response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=messages,
    tools=[weather_tool],
    tool_choice="auto",
    stream=False
)

assistant_message = chat_response.choices[0].message
messages.append(assistant_message)

print(assistant_message)
# Example output:
# ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_abc123', function=Function(arguments='{"location": "San Francisco, CA", "format": "fahrenheit"}', name='get_current_weather'), type='function')])

# Simulate external function call
tool_call_result = 88
tool_call_id = assistant_message.tool_calls[0].id
tool_function_name = assistant_message.tool_calls[0].function.name
messages.append({"role": "tool", "content": str(tool_call_result), "tool_call_id": tool_call_id, "name": tool_function_name})

chat_response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=messages,
    tools=[weather_tool],
    tool_choice="auto",
    stream=False
)

assistant_message = chat_response.choices[0].message
print(assistant_message)
# Example output:
# ChatCompletionMessage(content='Based on the current temperature of 88°F (31°C) in San Francisco, CA, it is indeed quite hot right now. This temperature is generally considered warm to hot, especially if accompanied by high humidity, which is common in San Francisco during summer months.', role='assistant', function_call=None, tool_calls=None)
```

### 2. Multiple Tools

You can also define more than one tool in `tools`, including tools with no parameters, like `time_tool` below.

```python
weather_tool = {
    # ... (same as in the previous example)
}

time_tool = {
    "type": "function",
    "function": {
        "name": "get_current_time_nyc",
        "description": "Get the current time in NYC.",
        "parameters": {}
    }
}

messages = [
    {"role": "user", "content": "What's the current time in New York?"}
]

chat_response = client.chat.completions.create(
    model="meta/llama-3.1-70b-instruct",
    messages=messages,
    tools=[weather_tool, time_tool],
    tool_choice="auto",
    stream=False
)

assistant_message = chat_response.choices[0].message
print(assistant_message)
# Example output:
# ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[
#     ChatCompletionMessageToolCall(id='call_ghi789', function=Function(arguments='{}', name='get_current_time_nyc'), type='function')
# ])

# Process the tool calls and generate the final response as in the
# previous example; a dispatch sketch follows below.
```
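The elided processing step can be written as a small dispatch table that maps each returned function name to a local implementation. The implementations below are stand-ins for your real tool logic; `get_current_time_nyc` here just reads the system clock:

```python
import json
from datetime import datetime
from zoneinfo import ZoneInfo

# Stand-in implementations; replace with your real tool logic.
def get_current_weather(location: str, format: str) -> str:
    return f"88 degrees {format} in {location}"  # placeholder value

def get_current_time_nyc() -> str:
    return datetime.now(ZoneInfo("America/New_York")).isoformat()

TOOL_REGISTRY = {
    "get_current_weather": get_current_weather,
    "get_current_time_nyc": get_current_time_nyc,
}

# The assistant message with tool_calls must precede the tool results.
messages.append(assistant_message)
for tool_call in assistant_message.tool_calls or []:
    args = json.loads(tool_call.function.arguments or "{}")
    result = TOOL_REGISTRY[tool_call.function.name](**args)
    messages.append({
        "role": "tool",
        "content": str(result),
        "tool_call_id": tool_call.id,
        "name": tool_call.function.name,
    })

# Send the augmented conversation back for the final natural-language answer.
final_response = client.chat.completions.create(
    model="meta/llama-3.1-70b-instruct",
    messages=messages,
    tools=[weather_tool, time_tool],
    tool_choice="auto",
)
print(final_response.choices[0].message.content)
```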

### 3. Named Tool Usage

This example forces the model to use a specific tool.

```python
chat_response = client.chat.completions.create(
    model="meta/llama-3.1-70b-instruct",
    messages=[{"role": "user", "content": "What's the weather in New York City like?"}],
    tools=[weather_tool],
    tool_choice={
        "type": "function",
        "function": {
            "name": "get_current_weather"
        }
    },
    stream=False
)

assistant_message = chat_response.choices[0].message
print(assistant_message)
# Example output:
# ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_jkl012', function=Function(arguments='{"location": "New York, NY", "format": "fahrenheit"}', name='get_current_weather'), type='function')])

# Process tool call and generate final response as in the previous examples
```
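All of these examples use `stream=False`. With streaming enabled, tool calls arrive as incremental deltas keyed by index, and the client must reassemble them. A rough sketch, assuming the same `client` and `weather_tool` as above:

```python
stream = client.chat.completions.create(
    model="meta/llama-3.1-70b-instruct",
    messages=[{"role": "user", "content": "What is the weather in San Francisco, CA?"}],
    tools=[weather_tool],
    tool_choice="auto",
    stream=True,
)

# Reassemble tool calls: each fragment carries an index, and the
# argument string arrives in pieces that must be concatenated.
calls = {}
for chunk in stream:
    if not chunk.choices:
        continue
    for frag in chunk.choices[0].delta.tool_calls or []:
        entry = calls.setdefault(frag.index, {"id": None, "name": None, "arguments": ""})
        entry["id"] = frag.id or entry["id"]
        if frag.function:
            entry["name"] = frag.function.name or entry["name"]
            entry["arguments"] += frag.function.arguments or ""

print(calls)
```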