03. How It Works
Defining tools with JSON Schema
Every tool is defined by three things:
- Name. A unique identifier the model uses to refer to the tool (for example,
get_current_weather).
- Description. A plain-language explanation of what the tool does, when to use it, and any important constraints. The model reads this description to decide whether to call the tool. Poor descriptions lead to wrong or missed calls.
- Input schema. A JSON Schema object describing the parameters the function expects: their names, types, whether they are required, and any constraints (enums, ranges, formats).
Example tool definition:
{
"type": "function",
"name": "get_current_weather",
"description": "Get the current weather for a location. Call this when the user asks about current weather conditions.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City name or coordinates, e.g. 'Tokyo' or '35.67,139.65'"
},
"units": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "Temperature units"
}
},
"required": ["location"],
"additionalProperties": false
},
"strict": true
}
The strict: true flag (supported by OpenAI and others) enforces that the model's output exactly matches the schema, eliminating malformed calls.
The request/response loop
The full interaction follows these steps:
Step 1: Request with tools:
The application sends a message (or conversation history) to the LLM API along with the list of tool definitions. The model now knows what tools are available.
Step 2: Tool call response:
If the model decides a tool is needed, its response contains a tool_calls array rather than (or in addition to) a text reply. Each entry in the array includes the tool name and the model's chosen argument values, encoded as a JSON string.
Step 3: Application executes the function:
The application parses the tool call, validates the arguments against the schema if needed, and runs the actual function (an API call, database query, calculation, file read, etc.). This execution happens entirely outside the model.
Step 4: Return the result:
The application appends the function output to the conversation with a role of tool (or function, depending on the provider), referencing the tool call ID so the model knows which call the result corresponds to.
Step 5: Final response:
The model receives the updated conversation including the tool result and generates its final text response, now grounded in real data.
This loop can repeat. If one tool result prompts the need for another tool call, the model can issue a second call before producing its final answer.
Parallel tool calls
Most modern providers support parallel tool calling: the model issues multiple tool calls in a single response turn rather than waiting for each result before requesting the next. This significantly reduces latency for tasks that require multiple independent pieces of information. A model asked to compare the weather in three cities can request all three weather lookups simultaneously.
Developers can disable parallel calls (setting parallel_tool_calls: false) when sequential execution is required for correctness or when downstream systems cannot handle concurrent requests.