Tool Use and Function Calling

In Short

Tool use (also called function calling) is the mechanism by which a language model requests that the application execute an external function and return the result. The model never calls functions directly. It outputs a structured request, the application runs the function, and the result feeds back into the conversation. This is the foundational primitive that makes AI agents possible.

01. What It Is

Tool use is a capability built into modern LLM APIs that lets a model indicate it wants to invoke an external function. The developer defines a set of available tools, each described by a name, a human-readable description, and a JSON schema specifying the expected parameters. The model reads these definitions, decides when a tool is relevant, and returns a structured call (the tool name and parameter values). The application executes the function and passes the result back to the model, which then continues its response.

This is sometimes called function calling (OpenAI's term) or tool use (Anthropic's term). The concepts are essentially the same, though tool use is sometimes used more broadly to cover built-in and server-side tools. Modern documentation often uses both interchangeably.

Tool use is not a programming construct embedded in the model. The model does not actually run code. It produces a structured output that looks like a function call, and the surrounding application is responsible for interpreting that output, running the real function, and feeding the result back into the conversation.

02. Why It Matters

On their own, language models are frozen at their training cutoff, with no access to live data, external systems, or user-specific information. Tool use is the primary mechanism for bridging that gap.

Without tool use, an LLM answering "What is the weather in Tokyo right now?" can only hallucinate or decline. With a weather tool, it calls the function, gets the current data, and answers accurately.

Tool use is also the foundation of every agentic system. Agents are, at their core, LLMs running a loop where tool calls are the actions and tool results are the observations. Everything from autonomous coding assistants to customer support bots to research agents is built on top of this mechanism.

By 2025, every major LLM provider offered native tool use, and every major agent framework (LangGraph, CrewAI, Pydantic AI, OpenAI Agents SDK) was built on it. It is a standard interface, not an advanced feature.

03. How It Works

Defining tools with JSON Schema

Every tool is defined by three things:

Name:
A unique identifier the model uses to refer to the tool (for example, get_current_weather).
Description:
A plain-language explanation of what the tool does, when to use it, and any important constraints. The model reads this description to decide whether to call the tool. Poor descriptions lead to wrong or missed calls.
Input schema:
A JSON Schema object describing the parameters the function expects: their names, types, whether they are required, and any constraints (enums, ranges, formats).

Example tool definition:

{
  "type": "function",
  "name": "get_current_weather",
  "description": "Get the current weather for a location. Call this when the user asks about current weather conditions.",
  "parameters": {
    "type": "object",
    "properties": {
      "location": {
        "type": "string",
        "description": "City name or coordinates, e.g. 'Tokyo' or '35.67,139.65'"
      },
      "units": {
        "type": "string",
        "enum": ["celsius", "fahrenheit"],
        "description": "Temperature units"
      }
    },
    "required": ["location"],
    "additionalProperties": false
  },
  "strict": true
}

The strict: true flag (supported by OpenAI) constrains the model's output to match the schema, making malformed calls far less likely.

The request/response loop

The full interaction follows these steps:

Step 1: Request with tools:
The application sends a message (or conversation history) to the LLM API along with the list of tool definitions. The model now knows what tools are available.

Step 2: Tool call response:
If the model decides a tool is needed, its response contains a tool_calls array rather than (or in addition to) a text reply. Each entry in the array includes the tool name and the model's chosen argument values, encoded as a JSON string.

Step 3: Application executes the function:
The application parses the tool call, validates the arguments against the schema if needed, and runs the actual function (an API call, database query, calculation, file read, etc.). This execution happens entirely outside the model.

Step 4: Return the result:
The application appends the function output to the conversation with a role of tool (or function, depending on the provider), referencing the tool call ID so the model knows which call the result corresponds to.

Step 5: Final response:
The model receives the updated conversation including the tool result and generates its final text response, now grounded in real data.

This loop can repeat. If one tool result prompts the need for another tool call, the model can issue a second call before producing its final answer.

Parallel tool calls

Most modern providers support parallel tool calling: the model issues multiple tool calls in a single response turn rather than waiting for each result before requesting the next. This significantly reduces latency for tasks that require multiple independent pieces of information. A model asked to compare the weather in three cities can request all three weather lookups simultaneously.

Developers can disable parallel calls (setting parallel_tool_calls: false) when sequential execution is required for correctness or when downstream systems cannot handle concurrent requests.

04. Key Terms / Components

Term	Meaning
Tool / function	An external capability the model can request
Tool definition	The JSON Schema describing a tool's name, purpose, and parameters
Tool call	The model's structured output requesting a tool be executed
Tool result	The output of the executed function, returned to the model
`tool_calls`	The field in a model response containing one or more tool invocations
Parallel tool calls	Multiple tool calls issued by the model in a single turn
Strict mode	Schema enforcement that guarantees the model's output matches the parameter definition exactly
Tool call ID	A unique identifier linking a tool call request to its result

05. Examples

Live data lookup:
A user asks "Is the stock market up today?" The model calls a get_market_summary tool with today's date as a parameter. The application hits a financial API, returns the data, and the model summarizes it.

Code execution:
A coding assistant calls a run_python tool with a code snippet. The application runs it in a sandbox, returns stdout and stderr, and the model interprets the output to decide whether the code is correct.

Multi-tool agent:
A research agent is given the task "Find the top three papers on context compression published in 2025 and summarize each." It calls search_arxiv three times in parallel to retrieve abstracts, then calls summarize three more times with each abstract, then writes its final output.

Database query:
A business intelligence bot defines a query_database tool that accepts a SQL string. The model constructs the query based on the user's question, the application executes it, and the model formats the results into a human-readable answer.

06. How Tool Use Underpins Agents and MCP

Tool use is the atomic unit of agent action. The agent loop (perceive, plan, act, observe) maps directly onto the tool use loop: the model observes its context, decides which tool to call (planning), the application executes it (action), and the result returns to the model as an observation.

Model Context Protocol (MCP) is a layer above tool use. MCP standardizes how tool definitions are discovered from external servers, transported over the network, and managed across providers. When an AI host connects to an MCP server, it fetches the server's tool schemas and passes them to the model using the model's native tool use format. Under the hood, every MCP tool invocation is a tool use request.

The distinction: tool use is the mechanism inside one model/application pair. MCP is the protocol that makes tool definitions portable across providers and applications. Both are necessary. MCP without tool use has no way to invoke actions. Tool use without MCP requires hand-coding tool definitions for every application.

07. Common Pitfalls

Vague descriptions:
The model decides whether to call a tool based almost entirely on the description. Unclear descriptions lead to missed calls ("model didn't know when to use it") or wrong calls ("model used it for everything").
Overloaded tools:
A single tool that does too many things forces the model to reason about which behavior applies in each context. Narrow, single-purpose tools are easier for the model to use correctly.
No schema validation:
Even with strict mode, defensive validation in the application layer prevents bad arguments from reaching real systems.
Missing error handling in results:
If a tool call fails (API error, invalid query, timeout), returning a clear error message in the tool result allows the model to recover gracefully. Returning nothing or crashing silently causes the model to proceed with missing information.
Trusting the model's argument values blindly:
The model generates argument values based on patterns, not guarantees. Validate types, ranges, and required fields before executing any tool that has side effects (writes, deletes, sends).
Too many tools in a single context:
Sending hundreds of tool definitions to a model dilutes its ability to select the right one. Prune tool lists to what is relevant for the current task.