Tool Calling and MCP Integration#

NIM LLM supports OpenAI-compatible tool calling through vLLM’s tool calling engine. Tool calling lets models invoke external functions by returning structured tool calls instead of text responses, enabling integration with Model Context Protocol (MCP) servers and any client library that uses the OpenAI tools format.

Enable Tool Calling#

Tool calling requires two vLLM engine arguments: --enable-auto-tool-choice and a --tool-call-parser that matches your model. Pass these as CLI arguments to nim-serve:

docker run --gpus all \
  -e NIM_MODEL_PATH=hf://meta-llama/Llama-3.1-8B-Instruct \
  -p 8000:8000 \
  ${NIM_LLM_MODEL_FREE_IMAGE}:2.0.2 \
  nim-serve --enable-auto-tool-choice --tool-call-parser llama3_json

In environments where CLI arguments are not available, such as Kubernetes, use NIM_PASSTHROUGH_ARGS:

export NIM_PASSTHROUGH_ARGS="--enable-auto-tool-choice --tool-call-parser llama3_json"

| Argument | Description |
|---|---|
| --enable-auto-tool-choice | Allows the model to choose between generating text or calling a tool. Required for tool calling. |
| --tool-call-parser <parser> | Parser for extracting tool calls from model output. Must match the model's tool calling format (for example, llama3_json for Llama 3.1 and 3.3). |
| --tool-parser-plugin <module> | Python module path for a custom tool call parser plugin. |

Both --enable-auto-tool-choice and --tool-call-parser are required together. For the full list of built-in parsers and supported models, refer to the vLLM tool calling documentation. For more information on how CLI arguments and NIM_PASSTHROUGH_ARGS work, refer to Advanced Configuration.

Once enabled, use the /v1/chat/completions endpoint with the tools parameter to send tool definitions and receive tool calls. For request/response format, examples, and the tool result loop, refer to the vLLM tool calling documentation.
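As an illustration, the shape of such a request can be sketched with plain dictionaries. The get_weather tool, its parameters, and the model name below are illustrative assumptions, not part of NIM LLM itself:

```python
import json

# Hypothetical tool definition in the OpenAI tools format; the name,
# description, and parameters are illustrative only.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def build_chat_request(messages, tools, model="your-model-name"):
    """Build a /v1/chat/completions request body with a tools array."""
    return {"model": model, "messages": messages, "tools": tools}

body = build_chat_request(
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[weather_tool],
)
print(json.dumps(body, indent=2))
```

The model either answers in text or returns a tool_calls entry in the response message; your client executes the named function and sends the result back in a follow-up request.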

MCP Integration#

NIM LLM does not connect to MCP servers directly. To use MCP tools, your client application connects to MCP servers, converts tool definitions to the OpenAI tools format, and sends them in the tools array of a Chat Completions request. The model returns tool calls in the response, and your application executes them against the MCP server and returns results to the model.
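A minimal sketch of the conversion step, assuming MCP tool definitions carry the standard name, description, and inputSchema fields (the list_files tool is hypothetical):

```python
def mcp_tool_to_openai(mcp_tool):
    """Convert an MCP tool definition to the OpenAI tools format."""
    return {
        "type": "function",
        "function": {
            "name": mcp_tool["name"],
            "description": mcp_tool.get("description", ""),
            # MCP's inputSchema is JSON Schema, which is what the
            # OpenAI "parameters" field expects.
            "parameters": mcp_tool.get("inputSchema", {"type": "object"}),
        },
    }

# Hypothetical MCP tool definition for illustration.
mcp_tool = {
    "name": "list_files",
    "description": "List files in a directory.",
    "inputSchema": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}

openai_tool = mcp_tool_to_openai(mcp_tool)
```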

Schema Compatibility#

MCP tool schemas converted to OpenAI format may include extra fields such as strict or additionalProperties that are not part of the core function definition schema. vLLM accepts these fields without error. For details on schema handling, refer to the vLLM tool calling documentation.
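Because vLLM tolerates these extra fields, stripping them is optional. If you need to target a stricter backend, a sketch like the following (the key list and example tool are illustrative) keeps only the core function definition keys:

```python
# Core keys of an OpenAI function definition; anything else is dropped.
CORE_FUNCTION_KEYS = {"name", "description", "parameters"}

def strip_extra_fields(tool):
    """Drop non-core keys (e.g. strict) from an OpenAI-format tool."""
    fn = {k: v for k, v in tool["function"].items() if k in CORE_FUNCTION_KEYS}
    return {"type": "function", "function": fn}

tool = {
    "type": "function",
    "function": {
        "name": "echo",
        "description": "Return the input text.",
        "parameters": {"type": "object", "additionalProperties": False},
        "strict": True,  # extra field some converters emit
    },
}

clean = strip_extra_fields(tool)
```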

LangChain and LangGraph Integration#

When building agentic applications with LangChain and LangGraph, use create_react_agent from langgraph.prebuilt to implement tool calling with NIM LLM. This agent correctly executes the full tool-calling loop: generating a tool call, executing the tool, feeding the result back to the model, and producing a final response.

Important

Do not use langchain.agents.create_agent with ProviderStrategy for tool calling. This pattern bypasses the tool execution loop, causing the model to describe intended tool calls instead of executing them.

Install the required packages:

pip install langchain-nvidia-ai-endpoints langgraph langchain-core

Create an agent with create_react_agent and invoke it:

from langchain_nvidia_ai_endpoints import ChatNVIDIA
from langgraph.prebuilt import create_react_agent

llm = ChatNVIDIA(base_url="http://localhost:8000/v1", model="your-model-name")
agent = create_react_agent(model=llm, tools=[your_tool])  # Replace your_tool with a defined tool
response = agent.invoke({"messages": [{"role": "user", "content": "Your query"}]})

For more information on defining tools and building agents, refer to the LangGraph documentation.

Troubleshooting#

Tool calls not generated#

If the model returns text instead of a tool call, do the following:

  • Verify that --enable-auto-tool-choice is set.

  • Verify that --tool-call-parser is set to the correct parser for your model.

  • Check that the tools array is included in the request.
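The first two items are server-side launch flags; the third can be sketched as a client-side sanity check on the request body before sending it (a hypothetical helper, not part of NIM LLM):

```python
def find_tool_request_problems(body):
    """Return likely client-side reasons a request won't produce tool calls."""
    problems = []
    tools = body.get("tools")
    if not tools:
        problems.append("request has no non-empty 'tools' array")
    else:
        for i, t in enumerate(tools):
            # Each entry must be {"type": "function", "function": {...}}.
            if t.get("type") != "function" or "function" not in t:
                problems.append(f"tools[{i}] is not a valid function tool")
    return problems

assert find_tool_request_problems({"model": "m", "messages": []}) == [
    "request has no non-empty 'tools' array"
]
```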

Error: “auto” tool choice requires configuration#

If the server returns "auto" tool choice requires --enable-auto-tool-choice and --tool-call-parser to be set, relaunch the container with both arguments, as shown in Enable Tool Calling.