Lecture 03 — Tool Use & Function Calling¶
Track B · Agentic AI & GenAI | ← Lecture 02 | Next →
Learning Objectives¶
- Define tools with precise schemas that minimize hallucination
- Implement the tool-use loop correctly (the core of every agent)
- Handle tool errors gracefully
- Use parallel tool calls to reduce latency
- Understand safety boundaries for dangerous tools
1. What Is Tool Use?¶
Tool use lets the LLM call external functions — search the web, run code, query a database, control a browser. The model doesn't execute code; it outputs a structured JSON call that your application executes.
User message
↓
[LLM] → stop_reason: "tool_use" → tool call JSON
↓
Your code executes the tool
↓
Tool result sent back to LLM as "tool" role message
↓
[LLM] → stop_reason: "end_turn" → final answer
2. Defining Tools¶
Tool schemas are the most important thing to get right. Vague descriptions cause wrong calls; wrong input types cause parse errors.
import anthropic
tools = [
{
"name": "get_gpu_specs",
"description": (
"Retrieve detailed specifications for an NVIDIA or AMD GPU by model name. "
"Returns memory, bandwidth, compute, and TDP. "
"Use this when the user asks about specific GPU hardware capabilities."
),
"input_schema": {
"type": "object",
"properties": {
"model": {
"type": "string",
"description": "GPU model name, e.g. 'H100 SXM5', 'A100 PCIe 80GB', 'RX 7900 XTX'"
},
"metric": {
"type": "string",
"enum": ["memory", "bandwidth", "compute", "tdp", "all"],
"description": "Which specification to retrieve. Use 'all' if unsure."
}
},
"required": ["model"]
}
},
{
"name": "run_cuda_profiler",
"description": (
"Run Nsight Systems profiler on a CUDA kernel file and return performance metrics. "
"Use this to identify bottlenecks: memory bound vs compute bound."
),
"input_schema": {
"type": "object",
"properties": {
"kernel_path": {
"type": "string",
"description": "Absolute path to the .cu file"
},
"num_iterations": {
"type": "integer",
"description": "Number of profiling iterations (default: 100)",
"default": 100
}
},
"required": ["kernel_path"]
}
}
]
Schema writing rules:
| Rule | Why |
|---|---|
| Description explains when to call, not just what it does | Model decides whether to call based on this |
Use enum for constrained string values |
Eliminates typos and invalid inputs |
Mark truly optional fields with default, don't put in required |
Prevents unnecessary calls |
| Keep parameter names short and clear | Model generates parameter names verbatim |
3. The Tool-Use Loop¶
import anthropic
import json
from typing import Any
client = anthropic.Anthropic()
# Mock tool implementations
def get_gpu_specs(model: str, metric: str = "all") -> dict:
specs_db = {
"H100 SXM5": {"memory": "80GB HBM3", "bandwidth": "3.35 TB/s",
"compute": "989 TFLOPS FP16", "tdp": "700W"},
"A100 PCIe 80GB": {"memory": "80GB HBM2e", "bandwidth": "1.935 TB/s",
"compute": "312 TFLOPS FP16", "tdp": "300W"},
}
data = specs_db.get(model, {"error": f"GPU '{model}' not found in database"})
if metric == "all" or "error" in data:
return data
return {metric: data.get(metric, "unknown")}
def run_cuda_profiler(kernel_path: str, num_iterations: int = 100) -> dict:
# In production, actually run nsys/ncu here
return {
"kernel": kernel_path,
"avg_duration_ms": 2.34,
"memory_throughput_pct": 78.5,
"compute_throughput_pct": 31.2,
"bottleneck": "memory_bound",
"recommendation": "Improve memory access coalescing"
}
TOOL_MAP = {
"get_gpu_specs": get_gpu_specs,
"run_cuda_profiler": run_cuda_profiler,
}
def run_agent(user_message: str) -> str:
"""Core agent loop with tool use."""
messages = [{"role": "user", "content": user_message}]
while True:
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=2048,
tools=tools,
messages=messages
)
# Append assistant response to message history
messages.append({"role": "assistant", "content": response.content})
if response.stop_reason == "end_turn":
# Extract final text response
for block in response.content:
if hasattr(block, "text"):
return block.text
elif response.stop_reason == "tool_use":
# Process all tool calls in this response
tool_results = []
for block in response.content:
if block.type == "tool_use":
tool_name = block.name
tool_input = block.input
# Execute the tool
if tool_name in TOOL_MAP:
result = TOOL_MAP[tool_name](**tool_input)
tool_results.append({
"type": "tool_result",
"tool_use_id": block.id,
"content": json.dumps(result)
})
else:
tool_results.append({
"type": "tool_result",
"tool_use_id": block.id,
"content": f"Error: tool '{tool_name}' not found",
"is_error": True
})
# Send tool results back to model
messages.append({"role": "user", "content": tool_results})
else:
break # Unexpected stop reason
return "Agent completed without response."
# Test it
answer = run_agent(
"Compare the H100 SXM5 and A100 PCIe 80GB memory bandwidth. "
"Which is better for memory-bound kernels?"
)
print(answer)
4. Parallel Tool Calls¶
When multiple independent tools are needed, Claude can call them simultaneously — reducing round-trips.
# Claude may return multiple tool_use blocks in one response
# Your loop must handle ALL of them before sending results back
for block in response.content:
if block.type == "tool_use":
# This may execute multiple times per response
result = TOOL_MAP[block.name](**block.input)
tool_results.append({
"type": "tool_result",
"tool_use_id": block.id,
"content": json.dumps(result)
})
# Send ALL results in one message — critical!
messages.append({"role": "user", "content": tool_results})
With Python asyncio for true parallel execution:
import asyncio
import anthropic
async def execute_tool_async(tool_name: str, tool_input: dict, tool_id: str) -> dict:
# Run tool in thread pool (for blocking I/O tools)
loop = asyncio.get_event_loop()
result = await loop.run_in_executor(None, lambda: TOOL_MAP[tool_name](**tool_input))
return {
"type": "tool_result",
"tool_use_id": tool_id,
"content": json.dumps(result)
}
async def process_tool_calls(tool_blocks: list) -> list:
tasks = [
execute_tool_async(b.name, b.input, b.id)
for b in tool_blocks if b.type == "tool_use"
]
return await asyncio.gather(*tasks)
Pro tip: For I/O-heavy tools (HTTP requests, database queries), parallel async execution can cut multi-tool latency by 3–5×.
5. Error Handling¶
Agents must handle tool failures gracefully — the LLM can recover if you give it good error messages.
def safe_tool_call(tool_name: str, tool_input: dict, tool_id: str) -> dict:
"""Execute a tool with error handling."""
try:
result = TOOL_MAP[tool_name](**tool_input)
return {
"type": "tool_result",
"tool_use_id": tool_id,
"content": json.dumps(result)
}
except KeyError as e:
return {
"type": "tool_result",
"tool_use_id": tool_id,
"content": f"Missing required parameter: {e}",
"is_error": True
}
except Exception as e:
return {
"type": "tool_result",
"tool_use_id": tool_id,
"content": f"Tool execution failed: {type(e).__name__}: {e}",
"is_error": True
}
When is_error: True is set, Claude will typically:
1. Acknowledge the error
2. Try a different approach (different parameters, different tool)
3. Ask the user for clarification if stuck
6. Tool Safety Patterns¶
Some tools are dangerous (delete files, send emails, execute shell commands). Add confirmation gates.
DANGEROUS_TOOLS = {"delete_file", "send_email", "execute_shell", "git_push"}
def run_agent_with_confirmation(user_message: str) -> str:
messages = [{"role": "user", "content": user_message}]
while True:
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=2048,
tools=all_tools,
messages=messages
)
messages.append({"role": "assistant", "content": response.content})
if response.stop_reason == "end_turn":
return next(b.text for b in response.content if hasattr(b, "text"))
elif response.stop_reason == "tool_use":
tool_results = []
for block in response.content:
if block.type != "tool_use":
continue
# Human-in-the-loop for dangerous operations
if block.name in DANGEROUS_TOOLS:
print(f"\n⚠️ Agent wants to call: {block.name}")
print(f" Input: {json.dumps(block.input, indent=2)}")
confirm = input(" Allow? (y/n): ").strip().lower()
if confirm != "y":
tool_results.append({
"type": "tool_result",
"tool_use_id": block.id,
"content": "User denied this action.",
"is_error": True
})
continue
result = safe_tool_call(block.name, block.input, block.id)
tool_results.append(result)
messages.append({"role": "user", "content": tool_results})
7. Tool Use with tool_choice¶
Force or restrict which tools the model uses:
# Force the model to use a specific tool
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=512,
tools=tools,
tool_choice={"type": "tool", "name": "get_gpu_specs"}, # must call this
messages=[{"role": "user", "content": "Tell me about the H100."}]
)
# Allow any tool (default)
tool_choice={"type": "auto"}
# Prevent any tool use
tool_choice={"type": "none"}
Key Takeaways¶
- The tool loop: call API → check
stop_reason→ execute tools → append results → repeat - Send all tool results in one message — never split them
- Use
is_error: Truefor failures; the model will try to recover - Parallel tool calls are automatic — your code must handle multiple blocks per response
- Gate dangerous operations with human-in-the-loop confirmation
tool_choiceforces or prevents tool use — useful for structured extraction workflows
Exercises¶
- Build a weather + calculator dual-tool agent. Test it with a query that requires both tools in one response.
- Add retry logic to
safe_tool_call— retry up to 3 times with exponential backoff on network errors. - Implement a
max_tool_callssafety limit that stops the agent after N tool calls to prevent infinite loops.
Previous: Lecture 02 | Next: Lecture 04 — Agent Architecture Patterns