Function Calling

You can connect NIM to external tools and services by using function calling (also known as tool calling). You give the model a list of available functions. The model can then respond with a request to call one or more of them, along with the arguments to pass. Your application runs those functions and sends the results back to the model. The model then uses this external information to produce its final response.

Function calling is controlled using the tool_choice, tools, and parallel_tool_calls parameters. Only the following models support function calling, and only a subset of those models support parallel tool calling.
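The request/response cycle described above can be sketched as a loop. This is a minimal sketch and is not part of NIM or the OpenAI SDK. `run_tool_loop` and `registry` (a dictionary mapping each tool name to a local Python function) are hypothetical names. `client` is assumed to be an `openai.OpenAI` client pointed at your NIM endpoint, as in the examples below.

```python
from __future__ import annotations

import json
from typing import Any, Callable


def run_tool_loop(client: Any, model: str, messages: list, tools: list,
                  registry: dict[str, Callable[..., Any]],
                  max_rounds: int = 5) -> str | None:
    """Hypothetical helper: call the model until it answers without a tool call."""
    for _ in range(max_rounds):
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            tools=tools,
            tool_choice="auto",
            stream=False,
        )
        message = response.choices[0].message
        messages.append(message)
        if not message.tool_calls:
            # No tool requested, so this is the final answer.
            return message.content
        for tool_call in message.tool_calls:
            # Raises KeyError if the model names a tool with no local handler.
            func = registry[tool_call.function.name]
            # Arguments arrive as a JSON-encoded string ("{}" for tools without parameters).
            kwargs = json.loads(tool_call.function.arguments or "{}")
            messages.append({
                "role": "tool",
                "content": str(func(**kwargs)),
                "tool_call_id": tool_call.id,
                "name": tool_call.function.name,
            })
    raise RuntimeError("model was still requesting tools after max_rounds")
```

The `max_rounds` limit is a design choice that prevents a model that keeps requesting tools from looping forever.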

Supported Models

Model	Parallel Tool Calls Supported	Latest Supported NIM Version
Llama-3.1-8B-Instruct	No	1.2
Llama-3.1-70B-Instruct	No	1.2
Llama-3.1-405B-Instruct	No	1.2
Mistral NeMo 12B Instruct	Yes	1.1.2
Mistral 7B Instruct v0.3	Yes	1.1.2

Parameters

To use function calling, set the tool_choice, tools, and parallel_tool_calls parameters in your chat completion request.

Parameter	Description
`tool_choice`	Controls whether and how the model calls tools. Accepts one of four options: `"none"`, `"auto"`, `"required"`, or a named tool choice. Requires that `tools` is also set.
`tools`	The list of tool objects that define the functions the model can call. Requires that `tool_choice` is also set.
`parallel_tool_calls`	Boolean value (`True` or `False`) specifying whether the model can return multiple tool calls in a single response. Default is `False`. Requires a model that supports parallel tool calls (see Supported Models).
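You can check the rules in this table on the client before sending a request. This is a minimal sketch. `validate_tool_params` and `PARALLEL_SUPPORT` are hypothetical names. The dictionary keys are the display names from the Supported Models table, which differ from the model IDs your endpoint expects (such as `meta/llama-3.1-70b-instruct`). The server still performs its own validation.

```python
from __future__ import annotations

# Hypothetical lookup copied from the Supported Models table (display names).
PARALLEL_SUPPORT = {
    "Llama-3.1-8B-Instruct": False,
    "Llama-3.1-70B-Instruct": False,
    "Llama-3.1-405B-Instruct": False,
    "Mistral NeMo 12B Instruct": True,
    "Mistral 7B Instruct v0.3": True,
}

SIMPLE_CHOICES = {"none", "auto", "required"}


def validate_tool_params(model: str, tools: list | None,
                         tool_choice: str | dict | None,
                         parallel_tool_calls: bool = False) -> None:
    """Hypothetical helper: raise ValueError if the parameter rules are broken."""
    # tools and tool_choice must be set together, or not at all.
    if (tools is None) != (tool_choice is None):
        raise ValueError("tools and tool_choice must be set together")
    if isinstance(tool_choice, str):
        if tool_choice not in SIMPLE_CHOICES:
            raise ValueError(f"unknown tool_choice: {tool_choice!r}")
    elif isinstance(tool_choice, dict):
        # A named tool choice must reference a tool that is actually in `tools`.
        names = {t["function"]["name"] for t in tools}
        name = tool_choice.get("function", {}).get("name")
        if tool_choice.get("type") != "function" or name not in names:
            raise ValueError(f"named tool_choice must reference one of {sorted(names)}")
    if parallel_tool_calls and not PARALLEL_SUPPORT.get(model, False):
        raise ValueError(f"{model} does not support parallel tool calls")
```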

`tool_choice` options

"none": Disables the use of tools.
"auto": Enables the model to decide whether to use tools and which ones to use.
"required": Forces the model to use a tool, but the model chooses which one.

Named tool choice: Forces the model to use a specific tool. It must be in the following format:

{
  "type": "function",
  "function": {
    "name": "name of the tool goes here"
  }
}
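The name in a named tool choice must exactly match a function name in `tools`. One way to avoid typos is to derive the name from the tool definition instead of retyping it. The following is a small sketch. `named_tool_choice` is a hypothetical helper and is not part of the OpenAI SDK.

```python
def named_tool_choice(tool: dict) -> dict:
    """Hypothetical helper: build a named tool_choice that forces a call to `tool`."""
    return {"type": "function", "function": {"name": tool["function"]["name"]}}
```

For example, you could pass `tool_choice=named_tool_choice(weather_tool)` in a request.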

Note: tool_choice can only be set when tools is also set, and vice versa. These parameters work together to define and control the use of tools in the model’s responses. For further information on these parameters and their usage, see the OpenAI API documentation.

Example Workflows

The following examples show different ways to use function calling with NIM:

Basic Function Calling: Shows how to use a single function with automatic tool choice.
Multiple Tools: Shows how to provide multiple tools, including one without parameters.
Named Tool Usage: Shows how to force the model to use a specific tool.
Parallel Tool Calling: Shows how to use parallel tool calling with a model that supports it.

1. Basic Function Calling

This example shows how to use a single function with automatic tool choice.

from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")
MODEL_NAME = "meta/llama-3.1-70b-instruct"

# Define available function
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                },
                "format": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "The temperature unit to use. Infer this from the user's location."
                }
            },
            "required": ["location", "format"]
        }
    }
}

messages = [
    {"role": "user", "content": "Is it hot in Pittsburgh, PA right now?"}
]

chat_response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=messages,
    tools=[weather_tool],
    tool_choice="auto",
    stream=False
)

assistant_message = chat_response.choices[0].message
messages.append(assistant_message)

print(assistant_message)
# Example output:
# ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_abc123', function=Function(arguments='{"location": "Pittsburgh, PA", "format": "fahrenheit"}', name='get_current_weather'), type='function')])

# Simulate external function call
tool_call_result = 88
tool_call_id = assistant_message.tool_calls[0].id
tool_function_name = assistant_message.tool_calls[0].function.name
messages.append({"role": "tool", "content": str(tool_call_result), "tool_call_id": tool_call_id, "name": tool_function_name})

chat_response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=messages,
    tools=[weather_tool],
    tool_choice="auto",
    stream=False
)

assistant_message = chat_response.choices[0].message
print(assistant_message)
# Example output:
# ChatCompletionMessage(content='Based on the current temperature of 88°F (31°C) in Pittsburgh, PA, it is indeed quite hot right now. This temperature is generally considered warm to hot, especially if accompanied by high humidity, which is common in Pittsburgh during summer months.', role='assistant', function_call=None, tool_calls=None)
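The example above hard-codes the tool result. In practice, the model generates the arguments string, so it can be malformed or omit a required field. You may want to check it before running your function. This is a minimal sketch. `parse_tool_arguments` is a hypothetical helper. It checks only `required` and `enum` and is not a full JSON Schema validator.

```python
import json


def parse_tool_arguments(tool: dict, arguments: str) -> dict:
    """Hypothetical helper: parse a tool call's JSON arguments and check them
    against the `required` and `enum` entries of the tool's schema."""
    try:
        args = json.loads(arguments or "{}")
    except json.JSONDecodeError as exc:
        raise ValueError(f"arguments are not valid JSON: {exc}") from exc
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    schema = tool["function"].get("parameters") or {}
    properties = schema.get("properties", {})
    missing = [name for name in schema.get("required", []) if name not in args]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    for name, value in args.items():
        allowed = properties.get(name, {}).get("enum")
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name}={value!r} is not one of {allowed}")
    return args
```

In the example above, you could call `parse_tool_arguments(weather_tool, assistant_message.tool_calls[0].function.arguments)` before querying a real weather service.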

2. Multiple Tools

You can also pass more than one tool in tools, including tools that take no parameters, such as time_tool below.

# weather_tool is the same definition as in the previous example

time_tool = {
    "type": "function",
    "function": {
        "name": "get_current_time_nyc",
        "description": "Get the current time in NYC.",
        "parameters": {}
    }
}

messages = [
    {"role": "user", "content": "What's the current time in New York?"}
]

chat_response = client.chat.completions.create(
    model="meta/llama-3.1-70b-instruct",
    messages=messages,
    tools=[weather_tool, time_tool],
    tool_choice="auto",
    stream=False
)

assistant_message = chat_response.choices[0].message
print(assistant_message)
# Example output:
# ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[
#     ChatCompletionMessageToolCall(id='call_ghi789', function=Function(arguments='{}', name='get_current_time_nyc'), type='function')
# ])

# Process tool calls and generate final response as in the previous example
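When you define many tools, a small helper keeps the definitions consistent. This is a sketch. `make_tool` is a hypothetical helper. For tools without arguments, it emits `{"type": "object", "properties": {}}`, the standard JSON Schema form for an object with no properties, instead of the empty `{}` used by time_tool above. Whether your NIM version treats the two forms identically is an assumption you should verify.

```python
from __future__ import annotations


def make_tool(name: str, description: str, properties: dict | None = None,
              required: list[str] | None = None) -> dict:
    """Hypothetical helper: build a function tool definition in the format used above."""
    parameters = {"type": "object", "properties": properties or {}}
    # Include "required" only when non-empty; some JSON Schema drafts reject [].
    if required:
        parameters["required"] = list(required)
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,
        },
    }
```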

3. Named Tool Usage

This example forces the model to use a specific tool.

chat_response = client.chat.completions.create(
    model="meta/llama-3.1-70b-instruct",
    messages=[{"role": "user", "content": "What's the weather in New York City like?"}],
    tools=[weather_tool],
    tool_choice={
        "type": "function",
        "function": {
            "name": "get_current_weather"
        }
    },
    stream=False
)

assistant_message = chat_response.choices[0].message
print(assistant_message)
# Example output:
# ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_jkl012', function=Function(arguments='{"location": "New York, NY", "format": "fahrenheit"}', name='get_current_weather'), type='function')])

# Process tool call and generate final response as in the previous examples
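A named tool choice controls which function the model calls. However, the model still generates the arguments, so it is worth checking the response before you use it. This is a minimal sketch. `forced_call_arguments` is a hypothetical helper. It expects `message` to be the `ChatCompletionMessage` returned above.

```python
import json


def forced_call_arguments(message, tool_name: str) -> dict:
    """Hypothetical helper: return the parsed arguments of the single call to
    `tool_name`, or raise ValueError if the response does not contain exactly one."""
    calls = message.tool_calls or []
    matching = [c for c in calls if c.function.name == tool_name]
    if len(matching) != 1:
        found = [c.function.name for c in calls]
        raise ValueError(f"expected exactly one call to {tool_name}, got {found}")
    return json.loads(matching[0].function.arguments)
```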

4. Parallel Tool Calling

Some models can respond with multiple tool calls in a single message. This example demonstrates parallel tool calling with Mistral 7B Instruct v0.3, which supports it. It reuses weather_tool and time_tool from the previous examples.

messages = [
    {"role": "user", "content": "What's the weather and time in New York?"}
]

chat_response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    messages=messages,
    tools=[weather_tool, time_tool],
    tool_choice="auto",
    parallel_tool_calls=True,
    stream=False
)

assistant_message = chat_response.choices[0].message
messages.append(assistant_message)
print(assistant_message)
# Example output:
# ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[
#     ChatCompletionMessageToolCall(id='call_mno345', function=Function(arguments='{"location": "New York, NY", "format": "fahrenheit"}', name='get_current_weather'), type='function'),
#     ChatCompletionMessageToolCall(id='call_pqr678', function=Function(arguments='{}', name='get_current_time_nyc'), type='function')
# ])

# Execute each requested tool call (one after another here) and collect the results
tool_results = []
for tool_call in assistant_message.tool_calls:
    if tool_call.function.name == "get_current_weather":
        # Simulate weather API call
        weather_result = "75°F"
        tool_results.append({"role": "tool", "content": weather_result, "tool_call_id": tool_call.id, "name": tool_call.function.name})
    elif tool_call.function.name == "get_current_time_nyc":
        # Simulate time API call
        time_result = "2:30 PM EDT"
        tool_results.append({"role": "tool", "content": time_result, "tool_call_id": tool_call.id, "name": tool_call.function.name})

# Add tool results to messages
messages.extend(tool_results)

# Generate final response based on all tool call results
# Note that not all models support parallel tool calls
chat_response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    messages=messages,
    tools=[weather_tool, time_tool],
    tool_choice="auto",
    stream=False
)

final_response = chat_response.choices[0].message
print(final_response)
# Example output:
# ChatCompletionMessage(content="In New York, the current weather is 75°F (23.9°C), which is quite pleasant. It's not too hot or cold. The current time in New York is 2:30 PM EDT (Eastern Daylight Time). It's mid-afternoon there right now.", role='assistant', function_call=None, tool_calls=None)
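Setting parallel_tool_calls=True lets the model return several tool calls in one message, but your code decides how to execute them. The loop above runs them one after another. When the calls are independent, you can run them concurrently on the client instead. This is a sketch that uses a thread pool from the standard library. `run_tool_calls_concurrently` and `registry` (tool name mapped to a local Python function) are hypothetical names.

```python
from __future__ import annotations

import json
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def run_tool_calls_concurrently(tool_calls, registry: dict[str, Callable[..., Any]],
                                max_workers: int = 4) -> list[dict]:
    """Hypothetical helper: execute independent tool calls on a thread pool and
    return tool messages in the same order as `tool_calls`."""

    def execute(tool_call) -> dict:
        func = registry[tool_call.function.name]
        kwargs = json.loads(tool_call.function.arguments or "{}")
        return {
            "role": "tool",
            "content": str(func(**kwargs)),
            "tool_call_id": tool_call.id,
            "name": tool_call.function.name,
        }

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, even if calls finish out of order.
        return list(pool.map(execute, tool_calls))
```

With this helper, `messages.extend(run_tool_calls_concurrently(assistant_message.tool_calls, registry))` would replace the loop above. Threads suit tools that mostly wait on network I/O, such as weather or time APIs.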