Tool & Function Calling

OpenAI/Anthropic Tool-Use APIs, Schema Design, Parallel Calls, Error Handling, Retry Logic

1. Overview & Why This Matters

Tool calling — also called function calling — is the mechanism that transforms an LLM from a text generator into an agent capable of interacting with the real world. Without it, an agent can only reason over information it was trained on or was given in the prompt. With it, agents can query live databases, execute code, call REST APIs, read files, send messages, and take actions in external systems.

Tool calling is deceptively simple to get started with and surprisingly deep to get right in production. The difference between a demo-grade tool-calling implementation and a production-grade one lies in how you handle schema design, parallel calls, ambiguous inputs, tool errors, retries, and security. This topic covers all of it.

Core question: How do you define, invoke, handle, and secure LLM tool calls such that your agent reliably executes the right actions with the right parameters and recovers gracefully from failures?

2. How Tool Calling Works Under the Hood

2.1  The Tool Call Lifecycle

Understanding the exact request-response cycle prevents a class of production bugs that stem from misunderstanding what the model is actually doing when it ‘calls a tool’.

TOOL CALLING LIFECYCLE    
1. Developer defines tools as JSON schemas in the API request     (name, description, input_schema with required/optional params)    
2. LLM receives user message + tool definitions in context     Model decides: respond in text OR call one/more tools  
3. If tool call: model returns stop_reason=’tool_use’ with     tool_use block: { id, name, input: {…} }     (Model does NOT execute the tool — it only requests it)    
4. YOUR CODE executes the tool with the provided input     Capture result (or error)    
5. Return tool result back to model as tool_result message     with matching tool_use_id    
6. Model continues reasoning with the result in context     May call more tools or produce final text response

The critical insight: the LLM never executes tools itself. It outputs a structured request. Your application code is responsible for execution, validation, error handling, and returning results. This puts all reliability engineering in your hands.

2.2  Tool Schema Design — The Most Impactful Thing You Control

The quality of your tool schema directly determines the quality of tool selection and parameter extraction. Treat every schema field as prompt engineering — the model reads descriptions as instructions.

# WEAK schema — causes wrong invocations and missing params tools = [{‘name’: ‘get_data’, ‘description’: ‘Get data’,           ‘WEAK schema — causes wrong invocations and missing params
tools = [{‘name’: ‘get_data’, ‘description’: ‘Get data’,
‘input_schema’: {‘type’: ‘object’, ‘properties’: {
‘id’: {‘type’: ‘string’},
‘type’: {‘type’: ‘string’}
}}}]
STRONG schema — precise, bounded, with examples
tools = [{
‘name’: ‘get_cell_kpi_timeseries’,
‘description’: ”’Retrieve KPI timeseries for a 5G NR cell.
Use when the user asks about cell performance, throughput,
PRB utilisation, or signal quality over a time window.
Do NOT use for alarm queries — use get_alarms instead.”’,
‘input_schema’: {
‘type’: ‘object’,
‘properties’: {
‘cell_id’: {
‘type’: ‘string’,
‘description’: ‘Cell Global Identity in format MCC-MNC-TAC-CID (e.g. 422-51-1234-56789)’
},
‘kpi_names’: {
‘type’: ‘array’,
‘items’: {‘type’: ‘string’,
‘enum’: [‘DL_THROUGHPUT_MBPS’,’UL_THROUGHPUT_MBPS’,
‘PRB_UTIL_DL’,’SINR_AVG’,’RRC_CONNECTED_USERS’]},
‘description’: ‘KPIs to retrieve. Always specify explicitly.’
},
‘window_minutes’: {
‘type’: ‘integer’, ‘minimum’: 5, ‘maximum’: 1440,
‘description’: ‘Lookback window in minutes. Default 60 if not specified by user.’
}
},
‘required’: [‘cell_id’, ‘kpi_names’]
}
}]

Schema design rules that matter most:

  • Use enum for bounded parameter values: If a parameter has a fixed set of valid values (KPI names, severity levels, actions), use enum in the schema. This eliminates hallucinated parameter values — a major source of tool errors.
  • Write negative descriptions: Tell the model what the tool does NOT do. ‘Do not use for X — use Y instead’ prevents wrong tool selection on ambiguous queries.
  • Include example values in descriptions: For ID formats, timestamps, or domain-specific strings, show the exact format in the description. The model will follow the format more reliably than any format: directive.
  • Mark required vs optional clearly: Only include fields in required that are truly required. Optional fields with good defaults let the model succeed even when the user doesn’t specify everything.

2.3  Parallel Tool Calls

Modern LLMs (GPT-4o, Claude 3.5+, Gemini 1.5+) can request multiple tool calls simultaneously in a single response. This is critical for performance — instead of sequential round-trips, the model batches independent lookups.

Claude parallel tool call response (simplified)
response.content = [
{‘type’: ‘text’, ‘text’: ‘Let me check both cells simultaneously.’},
{‘type’: ‘tool_use’, ‘id’: ‘tu_01’, ‘name’: ‘get_cell_kpi_timeseries’,
‘input’: {‘cell_id’: ‘422-51-1234-56789’, ‘kpi_names’: [‘DL_THROUGHPUT_MBPS’]}},
{‘type’: ‘tool_use’, ‘id’: ‘tu_02’, ‘name’: ‘get_cell_kpi_timeseries’,
‘input’: {‘cell_id’: ‘422-51-1234-56790’, ‘kpi_names’: [‘DL_THROUGHPUT_MBPS’]}}
]
WRONG: naive handler that only processes first tool_use block
RIGHT: collect ALL tool_use blocks, execute in parallel, return ALL results
import asyncio
tool_calls = [b for b in response.content if b[‘type’] == ‘tool_use’]
results = await asyncio.gather(*[execute_tool(tc) for tc in tool_calls])
Return ALL results with matching IDs
tool_results = [{‘type’: ‘tool_result’, ‘tool_use_id’: tc[‘id’],
‘content’: str(result)}
for tc, result in zip(tool_calls, results)]

3. Error Handling & Retry Patterns

3.1  Tool Error Passback — Never Hide Failures

The single most important rule: always return tool errors back to the model as tool_result messages, not as exceptions that terminate the loop. The model can reason about errors and adapt — but only if it sees them.

async def execute_tool_safe(tool_call: dict) -> dict:     try:         result = await dispatch_tool(tool_call[‘name’], tool_call[‘input’])         return {             ‘type’: ‘tool_result’,             ‘tool_use_id’: tool_call[‘id’],             ‘content’: json.dumps(result)         }     except ToolTimeoutError as e: async def execute_tool_safe(tool_call: dict) -> dict:
try:
result = await dispatch_tool(tool_call[‘name’], tool_call[‘input’])
return {
‘type’: ‘tool_result’,
‘tool_use_id’: tool_call[‘id’],
‘content’: json.dumps(result)
}
except ToolTimeoutError as e:
return {
‘type’: ‘tool_result’,
‘tool_use_id’: tool_call[‘id’],
‘is_error’: True,
‘content’: f’Tool timed out after {e.timeout}s. Try a smaller time window.’
}
except ToolValidationError as e:
return {
‘type’: ‘tool_result’,
‘tool_use_id’: tool_call[‘id’],
‘is_error’: True,
‘content’: f’Invalid parameter: {e.param}. Valid values: {e.valid_values}’
}
except Exception as e:
return {
‘type’: ‘tool_result’,
‘tool_use_id’: tool_call[‘id’],
‘is_error’: True,
‘content’: f’Tool execution failed: {str(e)}’
}
excel.Quit()

3.2  Retry Logic & Exponential Backoff

Tools fail. APIs time out. Databases have transient errors. Your tool execution layer needs intelligent retry logic that distinguishes retryable failures from permanent ones.

Error TypeRetryable?Strategy
HTTP 429 Rate LimitYesExponential backoff with jitter. Respect Retry-After header.
HTTP 503 / 504 TimeoutYes (limited)Retry up to 3x with backoff. After 3 failures, return error to model.
HTTP 400 Bad RequestNoReturn error to model immediately — retrying won’t help. Model should reformulate.
HTTP 401 / 403 Auth ErrorNoReturn auth error to model. Do not retry — escalate to human or fail gracefully.
Tool Validation ErrorNoReturn specific validation message. Model can correct the parameter.
Empty / Null ResultContext-dependentNot an error — return ‘no data found for these parameters’. Model adapts.

3.3  Tool Security Considerations

Tool calling introduces a direct code execution risk surface. Any tool that takes external input and acts on a system must be hardened:

  • Validate all inputs at the tool layer: Never trust the model’s input as safe. Validate types, ranges, and formats in your tool implementation regardless of schema constraints.
  • Implement least-privilege tool access: A read-only tool should never have write credentials. Separate tool permission scopes explicitly.
  • Audit every tool call: Log tool name, input parameters, calling agent identity, and result to an immutable audit log. This is mandatory for enterprise and regulatory environments.
  • Prompt injection via tool results: A malicious external system could return content in a tool result designed to hijack the agent’s reasoning. Sanitise tool results before injecting them back into the model context.

4. Market Landscape: Tool Calling in 2025

Provider / ToolTool Calling Capabilities
Anthropic Claude 3.5+Parallel tool calls, tool_choice forcing (auto/any/specific), extended thinking with tools, JSON-validated inputs
OpenAI GPT-4oParallel calls, strict mode (schema enforcement), response_format JSON schema, function calling v2
Google Gemini 1.5/2.0Function calling, auto tool selection, code execution tool built-in, grounding with Google Search
Ollama + Qwen2.5/Llama3Tool calling via OpenAI-compatible API. Reliability varies by model — test your specific tools per model.
Instructor (Python lib)Forces structured output via tool calling. Validates against Pydantic models. Best for schema-strict outputs.
LangChain ToolsTool abstraction layer, BaseTool/StructuredTool classes, built-in retry and error handling wrappers
Emerging pattern — Tool Choice Forcing: Use tool_choice=’required’ (Anthropic) or function_call=’force’ (OpenAI) when you need the model to always use a tool rather than respond in text. Essential for structured data extraction tasks where you cannot accept a free-text response.

5. Failure Modes & Pitfalls

FailureCauseFix
Wrong tool selectedAmbiguous tool descriptions, overlapping tool namesWrite distinct descriptions with explicit NOT-use cases; add discriminating examples
Hallucinated parametersModel invents parameter values not in the inputUse enum constraints; include example values in description; validate in tool layer
Silent parallel call dropHandler only processes first tool_use blockAlways collect ALL tool_use blocks in response; execute and return all results
Tool result mismatchWrong tool_use_id returned with resultAlways match results to calls by ID; never assume order
Infinite tool loopModel keeps calling tools without producing final answerSet max_tool_calls limit; detect repeated (tool, same_input) pattern
Injection via tool resultExternal API returns adversarial content in resultSanitise tool results; wrap in structured schema before returning to model

6. Expert Tips & Quick Reference

Expert Practitioner Tips
1. Write tool descriptions like API docs for a junior developer — explicit, with examples, with explicit NOT-use cases. This is the highest-ROI prompt engineering you can do.
2. Use strict/enum parameter schemas in production. Hallucinated parameter values are the #1 source of tool execution errors in production agents.
3. Always handle parallel tool calls. A single-call handler silently drops tool calls when the model batches them — this is invisible and hard to debug.
4. Return all errors to the model as tool_result with is_error:true. Models recover gracefully from errors they can see; they hallucinate solutions to errors they can’t.
5. Audit every tool call with agent identity, inputs, outputs, and timestamp. In enterprise environments this is a compliance requirement, not a nice-to-have.
6. Test tool calling with adversarial inputs — what happens if the model passes an empty string, a negative number, or an SQL injection string to your tool? Your tool layer must handle it.

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top