03. How It Works
The core patterns
Orchestrator-worker (lead plus subagents):
The most common shape, and the one behind Anthropic Research. A lead agent decides at runtime what needs doing, breaks the task down, delegates the pieces to workers that run in parallel, then synthesizes their results.
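A minimal sketch of the lead-plus-workers shape, using only the standard library. The functions plan, worker, and synthesize are hypothetical placeholders for model calls, and the three subtask angles are invented for illustration; this is not Anthropic's implementation.

```python
import asyncio


def plan(task: str) -> list[str]:
    """Hypothetical planner: a real lead would ask a model to decompose the task."""
    angles = ("background", "recent work", "open questions")
    return [f"{task}: {angle}" for angle in angles]


async def worker(subtask: str) -> str:
    """Hypothetical worker: stands in for a subagent with its own context."""
    await asyncio.sleep(0)  # placeholder for a model or tool call
    return f"findings for {subtask}"


def synthesize(task: str, results: list[str]) -> str:
    """Hypothetical synthesis step: merge worker outputs into one answer."""
    return f"Report on {task}\n" + "\n".join(f"- {r}" for r in results)


async def lead(task: str) -> str:
    subtasks = plan(task)  # the breakdown is decided at runtime
    # Workers run concurrently; gather returns results in subtask order.
    results = await asyncio.gather(*(worker(s) for s in subtasks))
    return synthesize(task, list(results))


report = asyncio.run(lead("agent memory"))
print(report)
```

Each worker sees only its own subtask string, which is the isolation that makes the parallelism cheap and also the source of the shared-context problem discussed later.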
Handoffs (one agent passes control):
Like a phone tree: a triage agent hands the conversation to a specialist, who takes over. In the OpenAI Agents SDK, a handoff is presented to the model as a tool, so passing work to a refund agent appears as a tool named transfer_to_refund_agent, and the receiving agent sees the earlier history by default.
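The mechanics can be mimicked without any SDK. The sketch below is a library-free illustration, not the OpenAI Agents SDK API: the Agent dataclass, handoff_tools, and hand_off are hypothetical names, and the tool-naming rule is an assumption modeled on the transfer_to_refund_agent example above.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical stand-in for an agent; not the SDK's own Agent class."""
    name: str
    handoffs: list["Agent"] = field(default_factory=list)

    def handoff_tools(self) -> dict[str, "Agent"]:
        # Assumed naming scheme: "transfer_to_" plus the snake_cased agent name.
        return {
            "transfer_to_" + target.name.lower().replace(" ", "_"): target
            for target in self.handoffs
        }


def hand_off(current: Agent, tool_name: str, history: list[str]) -> tuple[Agent, list[str]]:
    """Resolve the tool call to a target agent and pass the history along."""
    target = current.handoff_tools()[tool_name]
    # The receiver gets a copy of the earlier history, mirroring the default.
    return target, list(history)


refund = Agent("Refund agent")
triage = Agent("Triage agent", handoffs=[refund])
history = ["user: I was charged twice and want my money back."]

# Pretend the model chose the handoff tool on this turn.
active, seen = hand_off(triage, "transfer_to_refund_agent", history)
print(active.name, seen)
```

After the call, the refund agent owns the conversation and the triage agent is out of the loop, which is what separates a handoff from calling a specialist as a tool.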
Parallel fan-out:
Several subtasks run at the same time and their results are gathered at the end, either as different slices of one job, or as the same task sent to several agents whose answers are compared or voted on.
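A minimal sketch of the voting variant, where the same question goes to several agents and the most common answer wins. The ask function and its canned answers are hypothetical stand-ins for real model calls.

```python
import asyncio
from collections import Counter


async def ask(agent_id: int, question: str) -> str:
    """Hypothetical agent call; canned answers stand in for model output."""
    canned = {0: "Paris", 1: "Paris", 2: "Lyon"}
    await asyncio.sleep(0)  # placeholder for a model call
    return canned[agent_id]


async def fan_out_vote(question: str, n_agents: int = 3) -> str:
    # Fan out: every agent gets the same question and runs concurrently.
    answers = await asyncio.gather(*(ask(i, question) for i in range(n_agents)))
    # Gather: pick the answer that the most agents agree on.
    winner, _count = Counter(answers).most_common(1)[0]
    return winner


winner = asyncio.run(fan_out_vote("What is the capital of France?"))
print(winner)
```

For the other variant, different slices of one job, the gather step would concatenate or merge the results instead of voting.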
Sequential pipeline:
An assembly line where agent one's output feeds agent two, whose output feeds agent three. This fits work where each step builds on the last.
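The assembly line reduces to function composition. In this sketch the three stages (outline, draft, edit) are hypothetical agents that simply wrap their input so the data flow is visible.

```python
from functools import reduce
from typing import Callable

Stage = Callable[[str], str]


def outline(topic: str) -> str:
    """Hypothetical agent one."""
    return f"outline({topic})"


def draft(text: str) -> str:
    """Hypothetical agent two: consumes agent one's output."""
    return f"draft({text})"


def edit(text: str) -> str:
    """Hypothetical agent three: consumes agent two's output."""
    return f"edit({text})"


def run_pipeline(stages: list[Stage], initial: str) -> str:
    # Feed each stage the previous stage's output, in order.
    return reduce(lambda acc, stage: stage(acc), stages, initial)


result = run_pipeline([outline, draft, edit], "handoffs")
print(result)
```

Nothing runs in parallel here, so total latency is the sum of the stages, and an error in an early stage propagates to every later one.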
Hierarchical (a manager delegates and checks):
A manager agent, or sometimes several layers of managers, assigns work and validates it. CrewAI's hierarchical process auto-creates a manager to delegate and check outputs, and Microsoft's Agent Framework calls its manager-coordinated version the Magentic pattern.
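The delegate-and-check loop can be sketched in a few lines. This is a generic illustration under stated assumptions, not CrewAI's or Microsoft's implementation: manager, flaky_worker, the retry limit, and the "ESCALATED" marker are all hypothetical.

```python
from typing import Callable


def manager(
    tasks: list[str],
    worker: Callable[[str, int], str],
    validate: Callable[[str], bool],
    max_attempts: int = 2,
) -> dict[str, str]:
    """Assign each task, check the output, and retry on a failed check."""
    accepted: dict[str, str] = {}
    for task in tasks:
        for attempt in range(1, max_attempts + 1):
            output = worker(task, attempt)
            if validate(output):
                accepted[task] = output
                break
        else:
            # No attempt passed validation; flag it for a human or a higher manager.
            accepted[task] = "ESCALATED"
    return accepted


def flaky_worker(task: str, attempt: int) -> str:
    """Hypothetical worker that returns nothing on its first try at one task."""
    if task == "summarize" and attempt == 1:
        return ""
    return f"{task} done (attempt {attempt})"


results = manager(["research", "summarize"], flaky_worker, lambda out: bool(out.strip()))
print(results)
```

The validation step is what distinguishes this from plain orchestrator-worker: the manager does not just collect outputs, it can reject them.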
Framework specifics and versions live in ai-frameworks-and-tooling.
OpenAI groups these into two families. A manager pattern keeps one agent in control, calling specialists as tools. A decentralized pattern lets agents act as peers and hand tasks to one another.
When it helps vs when it hurts
Multi-agent orchestration earns its cost on a specific kind of work.
Anthropic says it excels at valuable tasks that can be split into parallel work or that exceed a single context window, and that may also lean on many complex tools (see tool-use-function-calling and mcp). It suits breadth-first queries that explore many directions at once.
The cost is the other half. A single agent uses about 4x the tokens of a normal chat, and a multi-agent system about 15x, so Anthropic says it only pays off when the task is valuable enough. Much of the gain comes from that extra spending: on BrowseComp, token usage alone explained 80% of the performance variance, and token usage together with the number of tool calls and model choice explained 95%.
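To make the multipliers concrete, here is a back-of-the-envelope calculation. The 4x and 15x figures come from the text above; the 2,000-token chat baseline is a hypothetical number chosen only for illustration.

```python
# Multipliers relative to a normal chat, as cited above.
SINGLE_AGENT_MULTIPLIER = 4
MULTI_AGENT_MULTIPLIER = 15


def estimated_tokens(chat_tokens: int) -> dict[str, int]:
    """Rough token estimates for the same task run three ways."""
    return {
        "chat": chat_tokens,
        "single_agent": chat_tokens * SINGLE_AGENT_MULTIPLIER,
        "multi_agent": chat_tokens * MULTI_AGENT_MULTIPLIER,
    }


estimate = estimated_tokens(2_000)  # hypothetical baseline of 2,000 tokens
multi_vs_single = MULTI_AGENT_MULTIPLIER / SINGLE_AGENT_MULTIPLIER

print(estimate)
print(multi_vs_single)
```

On these figures, going from one agent to a multi-agent system costs roughly 3.75 times as many tokens, which is the spend the task's value has to justify.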
Where it hurts is tightly coupled work. Anthropic notes that most coding tasks have fewer truly parallelizable pieces than research, and that domains requiring all agents to share the same context are not a good fit. One day before Anthropic's post, Walden Yan, a co-founder of Cognition (the maker of the Devin coding agent), published "Don't Build Multi-Agents" from the coding side. His argument is that context should be shared, including full agent traces, and that every action carries implicit decisions, so parallel agents that cannot see each other's work make conflicting choices. In his Flappy Bird example, one subagent builds a Super Mario style background while another builds a bird that does not match it.
His default is a single-threaded linear agent, which avoids the shared-context problem at the heart of context-engineering.
The two camps agree more than the headlines suggest. Multi-agent shines on broad, parallel research where many threads can be explored at once. Single-threaded shines on tightly coupled work like coding, where every step depends on shared context. The rule both arrive at is to validate a single agent first and add more only when it demonstrably falls short, which OpenAI frames as maximizing one agent before splitting for complex logic or too many tools.