Tool Calling and MCP Integration#

NIM LLM supports OpenAI-compatible tool calling through vLLM’s tool calling engine. Tool calling lets models invoke external functions by returning structured tool calls instead of text responses, enabling integration with Model Context Protocol (MCP) servers and any client library that uses the OpenAI tools format.

Enable Tool Calling#

Tool calling requires two vLLM engine arguments: --enable-auto-tool-choice and a --tool-call-parser that matches your model. Pass these as CLI arguments to nim-serve:

docker run --gpus all \
  -e NIM_MODEL_PATH=hf://meta-llama/Llama-3.1-8B-Instruct \
  -p 8000:8000 \
  ${NIM_LLM_MODEL_FREE_IMAGE}:2.0.2 \
  nim-serve --enable-auto-tool-choice --tool-call-parser llama3_json

In environments where CLI arguments are not available, such as Kubernetes, use NIM_PASSTHROUGH_ARGS:

export NIM_PASSTHROUGH_ARGS="--enable-auto-tool-choice --tool-call-parser llama3_json"

| Argument | Description |
| --- | --- |
| --enable-auto-tool-choice | Allows the model to choose between generating text or calling a tool. Required for tool calling. |
| --tool-call-parser <parser> | Parser for extracting tool calls from model output. Must match the model’s tool calling format (for example, llama3_json for Llama 3.1 and 3.3). |
| --tool-parser-plugin <module> | Python module path for a custom tool call parser plugin. |

Both --enable-auto-tool-choice and --tool-call-parser are required together. For the full list of built-in parsers and supported models, refer to the vLLM tool calling documentation. For more information on how CLI arguments and NIM_PASSTHROUGH_ARGS work, refer to Advanced Configuration.

After tool calling is enabled, use the /v1/chat/completions endpoint with the tools parameter to send tool definitions and receive tool calls. For request/response format, examples, and the tool result loop, refer to the vLLM tool calling documentation.
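As a sketch, a Chat Completions request body with a tools array might look like the following. The get_weather tool, its parameters, and the model name are illustrative assumptions, not part of NIM:

```python
# Sketch of a /v1/chat/completions request body with a tools array.
# The get_weather tool and its parameter schema are hypothetical examples.
import json

payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [
        {"role": "user", "content": "What is the weather in Paris?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string", "description": "City name"}
                    },
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",
}

# POST this body to http://localhost:8000/v1/chat/completions; when the model
# decides to call a tool, the response carries message.tool_calls instead of
# a text message.content.
print(json.dumps(payload, indent=2))
```

With tool_choice set to "auto" (the default when tools are present), the model decides per request whether to answer in text or emit a tool call.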

MCP Integration#

NIM LLM does not connect to MCP servers directly. To use MCP tools, your client application connects to MCP servers, converts tool definitions to the OpenAI tools format, and sends them in the tools array of a Chat Completions request. The model returns tool calls in the response, and your application executes them against the MCP server and returns results to the model.
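The conversion step described above can be sketched as a small helper. This is a minimal sketch assuming the MCP tool fields name, description, and inputSchema; the MCP client wiring and tool execution loop are omitted:

```python
# Sketch: convert one MCP tool definition (as returned by a server's
# tools/list response) into the OpenAI tools format that goes into the
# "tools" array of a Chat Completions request.

def mcp_tool_to_openai(tool: dict) -> dict:
    """Map an MCP tool entry to an OpenAI function-tool entry."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            # MCP's inputSchema is JSON Schema, which maps directly to
            # the OpenAI "parameters" field.
            "parameters": tool.get(
                "inputSchema", {"type": "object", "properties": {}}
            ),
        },
    }

# Hypothetical MCP tool definition for illustration:
mcp_tool = {
    "name": "read_file",
    "description": "Read a file from the workspace.",
    "inputSchema": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}

openai_tool = mcp_tool_to_openai(mcp_tool)
```

When the model returns a tool call naming read_file, the client executes it against the MCP server and appends the result as a tool-role message before the next request.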

Schema Compatibility#

MCP tool schemas converted to OpenAI format may include extra fields such as strict or additionalProperties that are not part of the core function definition schema. vLLM accepts these fields without error. For details on schema handling, refer to the vLLM tool calling documentation.
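For illustration, a converted tool definition carrying such extra fields might look like the following; per the behavior described above, it can be sent to vLLM as-is, with no stripping step. The search_docs tool is a hypothetical example:

```python
# Hypothetical converted tool definition that includes extra fields
# (strict, additionalProperties) beyond the core function schema.
# vLLM accepts these fields without error.
tool = {
    "type": "function",
    "function": {
        "name": "search_docs",  # hypothetical tool name
        "description": "Search documentation.",
        "strict": True,  # extra field, tolerated by vLLM
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
            "additionalProperties": False,  # extra field, tolerated
        },
    },
}
```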

Troubleshooting#

Tool calls not generated#

If the model returns text instead of a tool call, do the following:

  • Verify that --enable-auto-tool-choice is set.

  • Verify that --tool-call-parser is set to the correct parser for your model.

  • Check that the tools array is included in the request.

Error: “auto” tool choice requires configuration#

If you see the error "auto" tool choice requires --enable-auto-tool-choice and --tool-call-parser to be set, the container was launched without one or both of these arguments. Relaunch it with both provided.