Tool Calling Is Just JSON — How LLMs Actually "Use" Tools Without Running Anything

Every AI agent demo looks like magic. The model searches the web, runs code, queries a database, sends an email — all by itself. Naturally, most people assume the LLM is actually executing these tools.

It's not. It's printing JSON.

That's the dirty secret of tool calling. Arguably the most important capability in modern AI, the thing that makes agents, copilots, and most useful AI products work, is just the model outputting structured text in a specific format. Your code does the rest.

Once you understand this, everything about AI agents clicks.

The Misconception Everyone Has

When someone says "GPT-4 can search the web" or "Claude can run bash commands", it sounds like the model has superpowers, as if it's reaching out into the internet or executing code on a server somewhere.

Here's what's actually happening:

1. You tell the model what tools exist (as JSON schemas)
2. The user asks a question
3. The model decides a tool would help
4. Instead of a prose answer, it outputs a structured JSON object: the function name and arguments
5. Your code executes the function and sends the result back
6. The model uses the result to generate its final answer

The model never touches the tool. It never executes code. It never makes an HTTP request. It writes a JSON object that says "hey, call this function with these arguments" — and then your application does the actual work.
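
To make that hand-off concrete, here is a minimal sketch of the middle of the sequence: the model emits JSON, and your code looks up the function and runs it. Everything here is illustrative. The `get_weather` body, the `TOOLS` registry, and the hard-coded `model_output` string are assumptions standing in for a real tool and a real API response.

```python
import json

# Hypothetical local tool; a real one would call a weather API.
def get_weather(location: str, unit: str = "celsius") -> dict:
    return {"location": location, "temperature": 21, "unit": unit}

# Registry mapping tool names (as the model sees them) to Python callables.
TOOLS = {"get_weather": get_weather}

# Stand-in for what a model might emit: plain text that happens to be JSON.
model_output = '{"name": "get_weather", "arguments": {"location": "Tokyo, Japan"}}'

def dispatch(raw: str) -> str:
    """Parse the model's JSON and run the named function ourselves."""
    call = json.loads(raw)
    func = TOOLS[call["name"]]
    result = func(**call["arguments"])
    # The result goes back to the model as text, again serialized as JSON.
    return json.dumps(result)

tool_result = dispatch(model_output)
print(tool_result)
```

Notice that the only thing the "model" contributed is a string. The lookup, the call, and the serialization all happen in ordinary application code.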

That's it. The entire multi-billion-dollar agent ecosystem runs on this pattern.

How You Define Tools

Every tool is a JSON Schema definition. You describe the function name, what it does, and what parameters it accepts — with types and constraints.

Here's a real example:

{
  "name": "get_weather",
  "description": "Get the current weather for a location",
  "parameters": {
    "type": "object",
    "properties": {
      "location": {
        "type": "string",
        "description": "City and state, e.g. San Francisco, CA"
      },
      "unit": {
        "type": "string",
        "enum": ["celsius", "fahrenheit"]
      }
    },
    "required": ["location"]
  }
}

That's the contract. You're telling the model: "This tool exists. It takes a location string and an optional unit. Here's what it does." The model reads this schema alongside the user's message and decides whether to call it.

The description matters more than you think. The model uses it to decide when to use the tool, not just how. A vague description means the model picks the wrong tool at the wrong time. A precise description means it nails it.
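
Here is a rough sketch of how that schema rides along with the user's message. It assembles a request body in the Chat Completions style, where each schema is wrapped in a `{"type": "function", ...}` envelope, and it deliberately sends nothing over the network. The `build_request` helper and the `example-model` name are placeholders invented for this illustration.

```python
import json

get_weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "City and state, e.g. San Francisco, CA",
            },
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["location"],
    },
}

def build_request(user_message: str, tools: list[dict]) -> dict:
    """Assemble a Chat Completions-style request body (not sent anywhere here)."""
    return {
        "model": "example-model",  # placeholder, not a real model name
        "messages": [{"role": "user", "content": user_message}],
        # Every tool schema travels with the request, next to the messages.
        "tools": [{"type": "function", "function": t} for t in tools],
    }

request_body = build_request("What's the weather in Tokyo?", [get_weather_tool])
print(json.dumps(request_body, indent=2))
```

The description string sits right there in the payload, which is why its wording influences when the model reaches for the tool.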

The Loop That Powers Every Agent

The actual execution flow is a while-loop. Seriously: the architecture behind Claude Code, Cursor, ChatGPT plugins, and most serious AI agents is roughly this:

while model returns tool calls:
    for each tool call:
        execute the function
        collect the result
    send results back to the model
    get next response

return final text response

Seven lines of pseudocode. That's the skeleton of every coding agent that's making headlines right now. A speaker at a recent Anthropic architecture talk described it as "give it tools and then get out of the way."

The model keeps calling tools until it has enough information to answer. It might call one tool, get a result, realize it needs more data, call another tool, and chain them together — all within the same loop.
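
A runnable version of that loop might look like the sketch below. To keep it self-contained, `fake_model` is a scripted stand-in for a real LLM API call: it asks for one tool, then answers. The message shapes are simplified and do not match any provider exactly, and the `max_turns` cap is an added safety assumption rather than part of the pattern itself.

```python
import json

def get_weather(location: str) -> dict:
    # Hypothetical tool with canned data.
    return {"location": location, "temperature_c": 21}

TOOLS = {"get_weather": get_weather}

def fake_model(messages: list[dict]) -> dict:
    """Stand-in for an LLM API call: requests a tool once, then answers."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        return {"role": "assistant", "content": None,
                "tool_calls": [{"id": "call_1", "name": "get_weather",
                                "arguments": {"location": "Tokyo, Japan"}}]}
    data = json.loads(tool_results[-1]["content"])
    return {"role": "assistant",
            "content": f"It is {data['temperature_c']} C in {data['location']}.",
            "tool_calls": []}

def run_agent(user_message: str, max_turns: int = 10) -> str:
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):  # cap the loop so a confused model cannot spin forever
        reply = fake_model(messages)
        messages.append(reply)
        if not reply["tool_calls"]:
            return reply["content"]  # no tool calls left: this is the final answer
        for call in reply["tool_calls"]:
            result = TOOLS[call["name"]](**call["arguments"])
            # Each result is tagged with the id of the call it answers.
            messages.append({"role": "tool", "tool_call_id": call["id"],
                             "content": json.dumps(result)})
    raise RuntimeError("agent did not finish within max_turns")

answer = run_agent("What's the weather in Tokyo?")
print(answer)
```

Swapping `fake_model` for a real API client is the only change needed to turn this into a working agent skeleton. The loop itself stays the same.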

What the Model Actually Outputs

When the model decides to use a tool, it stops generating normal text and outputs something like this:

OpenAI format:

{
  "role": "assistant",
  "tool_calls": [{
    "id": "call_abc123",
    "type": "function",
    "function": {
      "name": "get_weather",
      "arguments": "{\"location\": \"Tokyo, Japan\"}"
    }
  }]
}

Anthropic format:

{
  "role": "assistant",
  "content": [{
    "type": "tool_use",
    "id": "toolu_01A09q90",
    "name": "get_weather",
    "input": {"location": "Tokyo, Japan"}
  }]
}

Spot the difference? OpenAI returns arguments as a JSON string you need to parse. Anthropic returns a parsed object directly. This is a classic migration bug when switching providers: your code does JSON.parse() on something that's already an object, or doesn't parse something that's still a string.
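
One way to sidestep that bug is to normalize both shapes at the boundary, so the rest of your code only ever sees a plain dict of arguments. The sketch below assumes the two formats exactly as shown above. `extract_call` is a hypothetical helper name, not part of either SDK.

```python
import json
from typing import Any

def extract_call(block: dict[str, Any]) -> tuple[str, str, dict[str, Any]]:
    """Return (id, name, arguments) from either tool-call shape shown above."""
    if block.get("type") == "function":
        # OpenAI-style: arguments arrive as a JSON-encoded string.
        fn = block["function"]
        return block["id"], fn["name"], json.loads(fn["arguments"])
    if block.get("type") == "tool_use":
        # Anthropic-style: input is already a parsed object.
        return block["id"], block["name"], block["input"]
    raise ValueError(f"unrecognized tool call shape: {block.get('type')!r}")

openai_call = {
    "id": "call_abc123",
    "type": "function",
    "function": {"name": "get_weather",
                 "arguments": '{"location": "Tokyo, Japan"}'},
}
anthropic_call = {
    "type": "tool_use",
    "id": "toolu_01A09q90",
    "name": "get_weather",
    "input": {"location": "Tokyo, Japan"},
}

print(extract_call(openai_call))
print(extract_call(anthropic_call))
```

Both calls come out with the same name and the same arguments dict. Only the IDs differ.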

Also notice that each tool call comes with a unique ID. This is how you match results back when the model makes multiple calls in parallel.

Parallel Tool Calls — The Speed Trick

Modern models don't just call one tool at a time. If the user asks "What's the weather in Tokyo and New York?" — the model outputs two tool calls in a single response. Your code runs both, returns both results, and the model synthesizes them.

This is what makes agents fast. Instead of:

call weather(Tokyo) → wait → call weather(New York) → wait → answer

You get:

call weather(Tokyo) + weather(New York) → wait once → answer

Claude Code does this constantly — reading multiple files, running grep and glob in parallel, checking git status and diff simultaneously. Same loop, just batched tool calls.
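
Worth spelling out: the model only emits the batch. Whether the calls actually run concurrently is up to your code. Below is a minimal sketch using a thread pool from the standard library. The simplified call shape, the `run_parallel` helper, and the sleeping `get_weather` stub are all assumptions made for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def get_weather(location: str) -> dict:
    time.sleep(0.1)  # simulate network latency
    return {"location": location, "temperature_c": 21}

TOOLS = {"get_weather": get_weather}

# Two tool calls, as they might arrive in a single model response.
tool_calls = [
    {"id": "call_a", "name": "get_weather", "arguments": {"location": "Tokyo"}},
    {"id": "call_b", "name": "get_weather", "arguments": {"location": "New York"}},
]

def run_one(call: dict) -> dict:
    result = TOOLS[call["name"]](**call["arguments"])
    # Carry the id along so each result can be matched to its call.
    return {"tool_call_id": call["id"], "result": result}

def run_parallel(calls: list[dict]) -> list[dict]:
    with ThreadPoolExecutor(max_workers=max(1, len(calls))) as pool:
        # pool.map returns results in input order; the id is still the
        # authoritative link between a result and the call it answers.
        return list(pool.map(run_one, calls))

results = run_parallel(tool_calls)
for r in results:
    print(r["tool_call_id"], r["result"])
```

Threads suit I/O-bound tools like HTTP requests. For CPU-heavy tools or async clients, a process pool or `asyncio` would be the more natural fit.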

The Three Ways Models Fail at Tool Calling

Tool calling isn't perfect. The model is still guessing which JSON to output. Common failure modes:

1. Wrong tool choice. The model picks "search_web" when it should've queried the database. Fix: better descriptions. If two tools sound similar, the model gets confused.

2. Wrong arguments. Missing required fields, wrong types, hallucinated enum values. Fix: strict JSON Schema with explicit constraints. Mark required fields. Use enums instead of free-text where possible.

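3. Right tool, wrong timing. The model calls a tool before it has enough context from the user. It asks for the weather without confirming which city the user means. Fix: mark the fields you can't guess as required, validate every call, and return the validation error to the model as the tool result. The model doesn't learn anything between requests, but with the error in its context it can ask a clarifying question instead of guessing.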
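
Catching the second and third failure modes comes down to checking the model's arguments against the schema before you run anything. The sketch below is a deliberately tiny validator that only understands `required`, `type`, and `enum`. In production you would more likely reach for a full JSON Schema library. `validate_args` and its error wording are made up for this example.

```python
from typing import Any

SCHEMA = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["location"],
}

# Only a small subset of JSON Schema types is handled here.
JSON_TYPES = {"string": str, "number": (int, float), "integer": int,
              "boolean": bool, "object": dict, "array": list}

def _matches(value: Any, type_name: Any) -> bool:
    # In Python, bool is a subclass of int, so reject it for numeric types.
    if type_name in ("number", "integer") and isinstance(value, bool):
        return False
    expected = JSON_TYPES.get(type_name)
    return expected is None or isinstance(value, expected)

def validate_args(args: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            errors.append(f"missing required field '{name}'")
    for name, value in args.items():
        spec = props.get(name)
        if spec is None:
            errors.append(f"unknown field '{name}'")
            continue
        if not _matches(value, spec.get("type")):
            errors.append(f"'{name}' should be of type {spec['type']}")
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"'{name}' must be one of {spec['enum']}")
    return errors

print(validate_args({"location": "Tokyo"}, SCHEMA))
print(validate_args({"unit": "kelvin"}, SCHEMA))
```

When the list is non-empty, send it back as the tool result instead of executing the function. That gives the model something concrete to correct.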

Function Calling vs MCP — The 2026 Split

You'll hear about MCP (Model Context Protocol) everywhere in 2026. Here's the difference:

Function calling = you define tools inline in every API request. The schemas travel with every message. Simple to set up, but scales poorly — 50 tools means 50 schemas bloating every request.

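MCP = tools live on separate servers. Your application (the MCP client) discovers them through a protocol and hands their schemas to the model. Add tools without changing your agent code. Swap providers without rewriting schemas by hand. Scales to hundreds of tools, though whatever you expose to the model still has to fit in its context.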

Function calling is the mechanism. MCP is the infrastructure. MCP still uses tool calling under the hood — it just standardizes how tools are discovered, connected, and shared across providers.
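
To see why "MCP still uses tool calling under the hood" holds, consider what a client does with a tool it has discovered. An MCP tool listing describes each tool with `name`, `description`, and `inputSchema` fields, and the client reshapes that into whatever its model provider expects. The sketch below shows that reshaping for the two formats used earlier. The converter function names are invented for illustration, and the `mcp_tool` dict is hand-written rather than fetched from a real server.

```python
# A tool as it might appear in an MCP tool listing (hand-written here).
mcp_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a location",
    "inputSchema": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}

def to_openai_tool(tool: dict) -> dict:
    """Reshape into the Chat Completions function-tool envelope."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": tool["inputSchema"],
        },
    }

def to_anthropic_tool(tool: dict) -> dict:
    """Reshape into the Anthropic tool definition, which uses input_schema."""
    return {
        "name": tool["name"],
        "description": tool.get("description", ""),
        "input_schema": tool["inputSchema"],
    }

openai_tool = to_openai_tool(mcp_tool)
anthropic_tool = to_anthropic_tool(mcp_tool)
print(openai_tool["function"]["parameters"] == anthropic_tool["input_schema"])
```

The JSON Schema itself passes through untouched. Only the wrapper around it changes, which is what makes the discovery layer provider-neutral.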

If you're building a quick prototype with 3-5 tools — function calling is fine. If you're building a production agent with dozens of integrations — MCP saves you from schema sprawl.

Why This Matters More Than the Model

Here's the counterintuitive thing: the quality of your tools matters more than the quality of your model.

A mediocre model with well-designed tools (clear descriptions, tight schemas, good error messages) will often outperform a frontier model with sloppy tools. The model can only be as good as the contracts you give it.

This is why harness engineering, the work of building the system around the model, is the real skill. The model is a reasoning engine that outputs JSON. Everything else is your problem.

The One-Liner

LLMs don't call tools. They print JSON that says "please call this tool." Your code does the work. Understanding that one sentence is the difference between using AI agents and building them.

Now go define some schemas and close the loop — that's where the magic actually happens.