There's a version of Claude you've probably used: you type something, it replies, the conversation ends. That's Claude as a chatbot.
Then there's a different version: you give it a goal, it figures out what tools it needs, calls them in the right order, handles errors when things go wrong, and returns a finished result. That's Claude as an agent.
Anthropic's managed agents infrastructure is the scaffolding that makes the second version possible — and it's significantly more practical than the DIY approach most developers were cobbling together eighteen months ago.
What "managed" means here
When developers first started building agents with the Claude API, the agentic loop was entirely on them. You'd call the API, get a response with a tool call, execute the tool, pass the result back, repeat. Simple in theory. In practice, you were writing a lot of error handling, retry logic, and state management that had nothing to do with the actual task.
Anthropic's managed agents offload most of that. The agent loop — the cycle of reasoning, tool calling, and responding — is handled by Anthropic's infrastructure. You define the tools, set the goal, and the system runs the loop. You're not managing the back-and-forth manually.
This is the "managed" part. The loop runs on Anthropic's side, not yours.
The building blocks
Tools
Every managed agent has access to a set of tools — functions it can call to interact with the world. These are defined by you, using the standard Claude tool-use schema.
A tool is a JSON description of a function: what it's called, what parameters it takes, and what it does. The agent decides when to call each tool and with what arguments. Your code executes the tool and returns the result. The agent uses that result to decide what to do next.
Tools can be anything: web search, database queries, file reads, API calls, sending emails, creating calendar events. If you can write a function for it, it can be a tool.
{
  "name": "search_crm",
  "description": "Search the CRM for client records by name or email",
  "input_schema": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "Name, email address, or company to search for"
      }
    },
    "required": ["query"]
  }
}
The description is not a formality. It's how the agent decides when to call a tool. Write it like you're briefing a capable human who hasn't seen your codebase.
System prompt
The system prompt defines what the agent is and what it's trying to accomplish. This is where you set role, constraints, tone, and scope. A good system prompt for an agent is more directive than one for a chatbot — you want to be explicit about what the agent should and shouldn't do, what counts as a completed task, and when it should stop rather than guess.
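As an illustration, a system prompt in that directive style might look like this. It's a hypothetical example; the role, tools, and escalation rules are invented for the sketch:

```
You are a support triage agent for an internal helpdesk.

Your job: for each incoming ticket, identify the product area, check the
knowledge base for an existing answer, and either draft a reply or escalate.

Constraints:
- Never draft a reply without checking the knowledge base first.
- If the ticket mentions a billing dispute, escalate immediately.

A task is complete when you have drafted a reply or filed an escalation.
If the ticket is ambiguous, ask one clarifying question and stop. Do not guess.
```

Note how it defines completion ("drafted a reply or filed an escalation") and the behaviour at ambiguity, not just the role.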
The agent loop
Once an agent has tools and a goal, the loop works like this:
- Agent receives the task and any initial context
- Agent reasons about what needs to happen
- If a tool is needed, the agent emits a tool call
- Your code executes the tool and returns the result
- Agent processes the result and continues reasoning
- Loop repeats until the agent produces a final response or hits a stopping condition
Managed agents handle the reasoning and orchestration steps (2, 3, 5, and 6) automatically. Your responsibility is implementing the tools the agent calls (step 4) and defining the stopping conditions.
Multi-agent setups
Single-agent loops handle a lot. But some tasks are better decomposed: a research task where one agent gathers information and another synthesises it, a content pipeline where agents handle different stages, a monitoring system where a coordinator dispatches subtasks to specialists.
Anthropic's managed infrastructure supports this through agent-to-agent handoffs. One agent can spawn a subagent, pass it a task and relevant context, and receive the result. The orchestrating agent doesn't need to manage the subagent's loop — that's handled the same way a single agent's loop is handled.
This model works well when:
- Tasks have genuinely independent components that can run in parallel
- Different stages require different tools or context
- You want a single coordinating agent deciding what to delegate and when
It adds complexity. Don't use multi-agent setups because they feel more sophisticated. Use them when a single agent genuinely can't hold all the context it needs, or when parallelism meaningfully changes the economics of the task.
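The handoff shape is simple to sketch. In the snippet below, `run_agent` is a stand-in for the single-agent loop described earlier (stubbed so the orchestration structure is visible without API calls); the two-stage research pipeline and its prompts are invented for illustration:

```python
def run_agent(system_prompt: str, task: str) -> str:
    # Stand-in for a full agent loop. A real implementation would run the
    # reasoning/tool-call cycle with this system prompt until completion.
    return f"[{system_prompt}] completed: {task}"

def research_pipeline(topic: str) -> str:
    # The coordinator decomposes the task and hands each stage to a
    # specialist agent, passing the first agent's output as the second's input.
    findings = run_agent("You are a researcher. Gather sources.", f"research {topic}")
    return run_agent("You are a writer. Synthesise findings.", findings)
```

The key property: the coordinator only sees each subagent's final result, not its intermediate tool calls, which is what keeps the coordinator's own context small.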
If you want a higher-level management layer on top — org charts, monthly budgets, approval gates — Paperclip is an open-source option built specifically for that problem.
What the SDK looks like
Anthropic's Python and TypeScript SDKs expose everything the agent loop needs: tool definitions, tool-use stop reasons, and structured tool results. A minimal Python setup, with the loop written out so you can see each step:
import anthropic

client = anthropic.Anthropic()

tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"}
            },
            "required": ["city"]
        }
    }
]

messages = [{"role": "user", "content": "What's the weather in Melbourne right now?"}]

# Agent loop
while True:
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1024,
        tools=tools,
        messages=messages
    )

    if response.stop_reason == "end_turn":
        print(response.content[-1].text)
        break

    # Process tool calls
    tool_results = []
    for block in response.content:
        if block.type == "tool_use":
            result = execute_tool(block.name, block.input)
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": result
            })

    messages.append({"role": "assistant", "content": response.content})
    messages.append({"role": "user", "content": tool_results})
The execute_tool function is yours — it routes tool calls to your actual implementation. Everything else is the managed loop.
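One common way to structure `execute_tool` is a dispatch table mapping tool names to handler functions. A sketch, with `get_weather` stubbed rather than calling a real weather service:

```python
import json

def get_weather(city: str) -> str:
    # Stub implementation. Replace with a real weather API call.
    return json.dumps({"city": city, "temp_c": 18, "conditions": "partly cloudy"})

# Map tool names (as defined in the tools list) to their implementations.
TOOL_HANDLERS = {
    "get_weather": get_weather,
}

def execute_tool(name: str, tool_input: dict) -> str:
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        # Return the error as a string so the agent can recover mid-task
        # instead of the loop crashing on an unrecognised tool name.
        return f"Error: unknown tool '{name}'"
    try:
        return handler(**tool_input)
    except Exception as exc:
        return f"Error executing {name}: {exc}"
```

Returning errors as tool results, rather than raising, matters: the agent can read the error and try a different approach, which is exactly the mid-task error handling agents are good at.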
Context management
Long-running agents accumulate context fast. Every tool result goes into the message history. After enough iterations, you're pushing large payloads on every API call, and the agent may start to lose focus on the original goal.
The standard patterns for managing this:
Summarisation. Periodically summarise the conversation history into a compact representation and replace the full history with the summary. The agent loses some detail but maintains working context.
Selective retention. Keep tool results only when they were used in a meaningful decision. Intermediate reasoning steps that didn't change the outcome can be dropped.
Explicit memory tools. Give the agent a tool to write notes to a persistent store. The agent manages its own memory explicitly rather than relying on the conversation window.
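The selective-retention pattern can be sketched in a few lines. This is a minimal illustration, assuming the message format from the loop above; it replaces tool-result payloads in older turns with a placeholder while leaving recent messages intact:

```python
def compact_history(messages: list, keep_recent: int = 4,
                    placeholder: str = "[tool result elided]") -> list:
    """Selective retention: blank out tool-result payloads in older turns,
    keeping the most recent `keep_recent` messages untouched."""
    compacted = []
    cutoff = len(messages) - keep_recent
    for i, msg in enumerate(messages):
        content = msg.get("content")
        if i >= cutoff or not isinstance(content, list):
            # Recent messages, and plain-string messages, pass through as-is.
            compacted.append(msg)
            continue
        new_content = []
        for block in content:
            if isinstance(block, dict) and block.get("type") == "tool_result":
                # Copy the block so the original history is not mutated.
                block = {**block, "content": placeholder}
            new_content.append(block)
        compacted.append({**msg, "content": new_content})
    return compacted
```

A real implementation would be smarter about which results were "used in a meaningful decision", but the mechanical shape is the same.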
For most practical tasks that complete in under 20 tool calls, context management isn't a concern. For longer-running agents — research tasks, multi-day workflows, processes that resume across sessions — it's something you need to design for upfront.
Prompt caching
Managed agents make heavy use of the same context repeatedly: the system prompt, tool definitions, and often a large block of background knowledge don't change between calls.
Anthropic's prompt caching lets you mark these stable sections with a cache_control parameter. On subsequent calls within the cache window, these sections are served from cache rather than re-processed. For agents that make 20 or 30 API calls per task, this meaningfully reduces both latency and cost.
system = [
    {
        "type": "text",
        "text": "You are a research assistant...",
        "cache_control": {"type": "ephemeral"}
    }
]
The cache window is five minutes by default. For long-running agents, you may need to refresh the cache before it expires. Build the refresh into your agent loop if the task can exceed that window.
When managed agents are the right tool
Managed agents are appropriate when:
The task requires multiple steps and tool calls. Single-shot tasks don't need an agent loop. If the answer fits in one response, use a standard API call.
The steps aren't fully predictable in advance. If you know exactly which tools will be called in exactly which order, you don't need an agent — you need a pipeline. Agents are useful when the path to the result depends on intermediate findings.
Errors need handling mid-task. An agent can recognise when a tool call fails, try a different approach, or ask a clarifying question. A rigid pipeline can't.
The task has meaningful decision points. Research tasks, customer support with backend access, code generation that needs to test its own output — these benefit from reasoning between steps.
When they're not
Real-time, latency-sensitive tasks. The agent loop adds latency. If response time is critical, a pre-defined call sequence is faster.
Simple extraction or generation. If the task is "summarise this document" or "translate this text," a single well-crafted prompt does it. Adding an agent loop is complexity without benefit.
Tasks where mistakes are expensive. Agents are capable but they're not infallible. If an agent mistake means an incorrect email goes to a client, or a payment gets processed for the wrong amount, you need human review in the loop — at which point you're building a workflow, not an autonomous agent.
When you haven't tested the tools. An agent is only as reliable as the tools it has access to. If your tool implementations are flaky, your agent will be flaky too. Build and test tools independently before connecting them to an agent.
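Testing a tool independently just means calling the implementation directly, outside any agent loop. A minimal sketch, using the stubbed `get_weather` shape from earlier (the validation rule is invented for the example):

```python
def get_weather(city: str) -> dict:
    # Stub standing in for your real tool implementation.
    if not city:
        raise ValueError("city is required")
    return {"city": city, "temp_c": 18}

def test_get_weather_returns_expected_shape():
    # The agent will parse this result, so assert on structure, not just truthiness.
    result = get_weather("Melbourne")
    assert result["city"] == "Melbourne"
    assert isinstance(result["temp_c"], (int, float))

def test_get_weather_rejects_empty_input():
    # Bad input should fail loudly here, not silently mid-agent-run.
    try:
        get_weather("")
        assert False, "expected ValueError"
    except ValueError:
        pass
```

If a tool passes tests like these in isolation, a flaky agent run points you at the prompt or the loop, not the plumbing.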
The practical starting point
If you're new to managed agents, start with a single-agent loop and two or three tools that handle a task you currently do manually. Don't start with multi-agent coordination — that's a second-order complexity that makes sense only once you've seen the single-agent pattern work.
Define the system prompt carefully. Describe what success looks like, not just what the role is. Include explicit stopping conditions. Tell the agent what to do when it hits ambiguity rather than letting it guess.
Log everything. The agent's reasoning, every tool call, every result. Not for debugging — for understanding. The first time you see an agent take an unexpected path through a task, you'll want to know why. The logs are how you find out.
Measure cost from day one. Agents can make a lot of API calls on a single task. Know what a typical task costs before you scale usage; the cost-awareness habits of regular prompting apply just as much to agentic workflows.
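A small accumulator makes this concrete. This sketch assumes you feed it the token counts the API returns with each response (in the Python SDK, `response.usage.input_tokens` and `response.usage.output_tokens`); the prices are placeholders, not real rates:

```python
class CostTracker:
    """Accumulates token usage across an agent's API calls.
    Prices are per million tokens; substitute your model's actual rates."""

    def __init__(self, input_price_per_mtok: float, output_price_per_mtok: float):
        self.input_price = input_price_per_mtok
        self.output_price = output_price_per_mtok
        self.input_tokens = 0
        self.output_tokens = 0
        self.calls = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        # In the agent loop, call this after every response:
        #   tracker.record(response.usage.input_tokens, response.usage.output_tokens)
        self.calls += 1
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens

    @property
    def cost(self) -> float:
        # Convert per-million-token prices to a dollar figure for this task.
        return (self.input_tokens * self.input_price
                + self.output_tokens * self.output_price) / 1_000_000
```

Logging `tracker.cost` per task, from the first prototype onward, is what tells you whether scaling the agent is economically sane.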
The broader picture
Anthropic's managed agent infrastructure isn't unique in the market — every major AI provider has something similar now. What matters is whether you're using it for the right problems.
Agents are useful when reasoning across multiple steps actually changes the outcome. They're overhead when you already know the steps. The managed infrastructure removes a lot of the implementation friction, but it doesn't change the underlying question: does this task actually benefit from autonomous reasoning, or does it just need a well-structured prompt?
Answer that first. Build second.