A Coordinator That Executes Stops Coordinating

The defining mistake in multi-agent design is building an orchestrator that does work. It feels natural: the orchestrator is the smartest, most context-rich component in the system, so why not let it handle the hard parts itself and delegate only the scraps? Because the moment the coordinating agent starts executing, it stops being able to coordinate well. Its context fills with the particulars of whatever task it took on. Its attention narrows to the thing in front of it. The plan it was supposed to be holding, the state it was supposed to be tracking, the results it was supposed to be checking all degrade because the component responsible for them is busy being a worker. An orchestrator is a role defined by what it refuses to do. It holds the plan, parcels out the work, and reconciles what returns, and it keeps its hands off the work itself so that it can stay good at the only job that is actually hard to centralize: holding the whole picture.

The reason has nothing to do with tidiness. A multi-agent system has exactly one component with a view across all the subtasks, and that component is the orchestrator. Everything that requires global knowledge (deciding what runs next, judging whether a result is acceptable, choosing how to respond to a failure) has to happen there, because nowhere else has the information to do it. The instant the orchestrator dilutes that role by also executing, it weakens the one capability that cannot be delegated. Coordination is not a lightweight wrapper around the real work. It is the real work, and it deserves a component that does nothing else.

The boundary that makes everything else possible

The clean separation between coordination and execution shows up most concretely in what each component is even allowed to know. The orchestrator holds everything whose meaning depends on more than a single subtask. That means the plan, the live ledger of what has finished and what is still in flight, and the decision rules that only make sense against that ledger, such as how partial results get reconciled into one answer and how a failure gets answered. None of that can be pushed downward, because each piece is defined by spanning the whole run rather than any one step of it.
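Here is a minimal Python sketch of that run-wide state. It assumes a plan expressed as subtasks with dependencies. The names `Ledger`, `Status`, `ready`, and `mark` are hypothetical and not taken from any framework. Even the simplest scheduling question, which subtasks can start now, is answered from the whole ledger. No single subagent could answer it.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    IN_FLIGHT = "in_flight"
    DONE = "done"
    FAILED = "failed"


@dataclass
class Ledger:
    """Run-wide state held only by the orchestrator (hypothetical structure)."""

    plan: dict[str, list[str]]  # subtask id -> ids of the subtasks it depends on
    status: dict[str, Status] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Every planned subtask starts out pending unless a status was supplied.
        for task_id in self.plan:
            self.status.setdefault(task_id, Status.PENDING)

    def ready(self) -> list[str]:
        """Pending subtasks whose dependencies are all done, in plan order."""
        return [
            task_id
            for task_id, deps in self.plan.items()
            if self.status[task_id] is Status.PENDING
            and all(self.status[d] is Status.DONE for d in deps)
        ]

    def mark(self, task_id: str, new_status: Status) -> None:
        self.status[task_id] = new_status
```

Take a plan of `fetch`, then `parse`, then `summarize`. At first only `fetch` is ready, and while it is in flight nothing is. Once it is marked done, `parse` becomes ready. Every step of that calculation reads state that spans the run.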

A subagent gets the opposite arrangement. Its context is an allowlist. It starts empty and contains exactly what the orchestrator chose to place there for the task at hand, and nothing more. The plan, the accumulated history, a sibling’s output, none of it arrives unless someone deliberately decided it should, and the standing default is to pass nothing past what the task in front of it requires. This is not an incidental property of how subagents happen to be spawned. It is the mechanism that keeps the system predictable. When every piece of a subagent’s context got there by an explicit choice, you can reason about that subagent in isolation, and you can be sure that one subtask is not quietly poisoning another through shared state nobody is watching. The isolation is what makes the system inspectable. Let subagents reach into global state instead and you lose the ability to say what any one of them will do.
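The allowlist discipline can be sketched the same way. `build_subagent_context` and its arguments are illustrative names, not an existing API. The context starts as an empty dict and receives only the keys the orchestrator named, so the plan, the history, and sibling outputs are absent by construction. Wrapping the result in the standard library's `types.MappingProxyType` is one assumed way to keep a subagent from writing back into what it was given. It is read-only only at the top level, so mutable values inside it would still need care.

```python
from types import MappingProxyType
from typing import Any, Mapping


def build_subagent_context(
    orchestrator_state: Mapping[str, Any],
    allowlist: frozenset[str],
) -> Mapping[str, Any]:
    """Build a subagent's context from an explicit allowlist (hypothetical helper).

    The context starts empty and receives only the keys the orchestrator named.
    The plan, history, and sibling outputs never arrive unless they are listed.
    """
    missing = allowlist - set(orchestrator_state)
    if missing:
        # Fail loudly rather than hand the subagent a silently incomplete context.
        raise KeyError(f"allowlisted keys not in orchestrator state: {sorted(missing)}")
    context = {key: orchestrator_state[key] for key in allowlist}
    # Read-only at the top level: the subagent cannot add or replace keys.
    return MappingProxyType(context)
```

Because every key is there by an explicit choice, reading the allowlist tells you everything the subagent could have seen.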

There is a cost to concentrating all of this in one place, and it is worth naming rather than hiding. A system whose coordination, state, and decision-making all live in a single component has put a great deal of weight on that component. If it stalls, the system stalls. If it is overwhelmed, nothing else can route around it. That concentration of risk is the price of centralized coordination, and the various shapes a multi-agent system can take are largely different answers to how much of that price to pay and where. That tradeoff is its own subject. What matters here is that the orchestrator’s role is defined by holding the responsibilities that require global knowledge, and that doing so deliberately concentrates both capability and risk in one place.

An under-built orchestrator fails silently

It is tempting to treat coordination as plumbing, something that falls out of wiring a few agents together. It does not. The orchestrator’s responsibilities are behaviors, and a behavior that is not deliberately built is simply absent. Deciding what work exists and who should handle it, vetting and combining what comes back, choosing a response to failure: none of these appear on their own, and a system that leaves any of them implicit fails in a way that is unusually hard to diagnose, because the missing behavior never announces itself.

That is the trap. An orchestrator with a hollow function does not crash. It runs end to end and hands back a confident, wrong result, because the function was present in name and empty in substance. A routing step that never really decided anything still sends work somewhere. A synthesis step that never really checked anything still produces an answer. The system looks like it worked, and the defect only shows up later, downstream, where it is expensive to trace and far from its cause. Reliability at scale is mostly a matter of making each of these behaviors real rather than assuming the wiring supplies them.

Routing is a scoping problem before it is a matching problem

Most subagent failures get blamed on the subagent. It must be underpowered, or badly instructed, or the wrong tool for the job. Usually the fault is older than that. A subagent never picks what it works on. It runs whatever it is handed, framed exactly as it was handed, which means every property of the result was decided upstream before the subagent ever saw the task. When the work was never scoped tightly enough to succeed at, no model, however capable, recovers cleanly, because there was nothing in the assignment that said what success even looked like.

So the quality of an assignment is mostly the quality of scoping, and scoping happens before any routing does. A usable assignment pins down three things the subagent cannot supply for itself: what counts as finished, what it has to work with, and what the answer has to look like coming back. Get those right and routing is nearly mechanical, a matter of pointing a well-defined need at a handler that can meet it. Get them wrong and no cleverness in the matching step rescues the result, because the work was malformed before anyone tried to place it. This sits entirely with the orchestrator. A subagent cannot make up the difference, because it never sees the larger goal the assignment was cut from. It sees only the cut, and it inherits every flaw in how the cut was made.
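A minimal sketch of scoping before routing follows. It assumes an assignment is a record carrying the three things the subagent cannot supply for itself, plus a capability tag used for matching. `Assignment`, `check_scope`, `route`, and all field names are hypothetical. Treating an empty `inputs` dict as a scoping gap is a deliberately strict assumption. Once the scope check passes, routing reduces to a lookup. When it fails, the orchestrator refuses to dispatch rather than hoping a capable handler compensates.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Assignment:
    """A scoped unit of work (hypothetical shape; field names are illustrative)."""

    task: str
    capability: str                 # what kind of handler this needs
    done_when: str                  # what counts as finished
    inputs: dict[str, Any]          # everything the subagent has to work with
    output_fields: tuple[str, ...]  # what the answer must look like coming back


def check_scope(a: Assignment) -> list[str]:
    """Return scoping gaps; an empty list means the assignment is well-formed."""
    problems = []
    if not a.done_when.strip():
        problems.append("no completion criterion")
    if not a.inputs:
        problems.append("no inputs supplied")
    if not a.output_fields:
        problems.append("no output shape")
    return problems


def route(a: Assignment, handlers: dict[str, str]) -> str:
    """Once scoping is sound, routing is a lookup by capability."""
    gaps = check_scope(a)
    if gaps:
        raise ValueError(f"refusing to dispatch {a.task!r}: {', '.join(gaps)}")
    return handlers[a.capability]
```

A vague request such as "summarize the report", with no completion criterion and no output shape, never reaches a handler at all. That is where the flaw is cheapest to catch.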

Combining results needs the view no single agent has

Pulling results together looks like the easy part, which is exactly why weak implementations fail there. It is really three acts, and the middle one is the one that gets quietly dropped: collecting the outputs as subagents finish, checking that each is actually what it claimed to be, and only then weaving the survivors into an answer to the original goal. Skipping the check is tempting because doing so usually goes unpunished. Most of the time a subagent returns something plausible, and stitching plausible parts together yields a plausible whole.

The bill comes later and somewhere else. An output that was malformed, or well-formed but wrong, gets woven into the final answer, and the defect travels downstream until its origin is no longer visible. By the time anyone notices, the result is a synthesis of many subtasks with nothing pointing back to which one was bad. Checking at the moment of collection is what keeps a bad result contained to the step that made it, while it can still be retried on its own, instead of letting it dissolve into a final answer that is uniformly hard to trust. And the check has to live with the orchestrator for a reason that is structural, not conventional. No subagent is positioned to perform it. Each one sees its own output and is blind to the rest, so none can tell whether two results contradict each other or whether the assembled set still misses the goal. That judgment needs every result in view at once, and exactly one component ever has that.
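The two kinds of checking can be sketched separately. This assumes each result is a dict and that "what it claimed to be" can be approximated by a set of required fields. `collect_and_check` and `cross_check` are hypothetical names. The first function runs per result at collection time and holds back anything malformed for an isolated retry. The second runs only in the orchestrator, with every accepted result in view, and flags gaps in coverage and disagreements on a shared fact. It assumes the shared values are hashable. The weaving step itself is left abstract.

```python
from typing import Any

Result = dict[str, Any]


def collect_and_check(
    results: dict[str, Result],
    required_fields: dict[str, set[str]],
) -> tuple[dict[str, Result], list[str]]:
    """Per-result check at collection time.

    A result missing any required field is held back and queued for an
    isolated retry instead of being woven into the final answer.
    """
    accepted: dict[str, Result] = {}
    to_retry: list[str] = []
    for task_id, output in results.items():
        if required_fields[task_id].issubset(output):
            accepted[task_id] = output
        else:
            to_retry.append(task_id)
    return accepted, to_retry


def cross_check(accepted: dict[str, Result], expected: set[str], shared_key: str) -> list[str]:
    """Checks that need every result at once: goal coverage and mutual consistency."""
    issues = [f"missing result for {t}" for t in sorted(expected - set(accepted))]
    # Values must be hashable to be collected into a set.
    values = {out[shared_key] for out in accepted.values() if shared_key in out}
    if len(values) > 1:
        issues.append(f"results disagree on {shared_key!r}: {sorted(values)}")
    return issues
```

A subagent that returns a well-formed result with the wrong year passes its own per-result check. Only `cross_check`, looking at its siblings too, can see the contradiction.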

Only the coordinator can decide what a failure means

When a subagent fails, something has to choose what happens next, and that choice is the orchestrator’s. Three responses are on the table. Retry the work, on the bet that the failure was a transient hiccup and a second attempt, perhaps with sharper instructions, gets through. Reassign it to a different agent or tool, when the first is unavailable or simply not equal to the task. Or raise it to a person, when nothing automatic is the right answer. Choosing among these takes facts the failed subagent was never given: whether this has failed before, whether an alternative handler exists, how much this particular subtask matters to the overall goal, how much time or budget is left.
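A sketch of that choice as an orchestrator-side policy follows, with all names and thresholds hypothetical. The failed subagent contributes only a `FailureReport`. `decide` combines that report with the four facts listed above: prior retries, alternative handlers, importance to the goal, and remaining budget. Giving critical subtasks one extra retry is an assumed policy chosen for illustration. Many other rules would be equally reasonable.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    RETRY = "retry"
    REASSIGN = "reassign"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class FailureReport:
    """What a failed subagent hands back: the fact of failure and enough detail to act on."""

    task_id: str
    handler: str
    detail: str
    looks_transient: bool


def decide(
    report: FailureReport,
    retries_used: dict[str, int],        # retries each subtask has already consumed
    alternatives: dict[str, list[str]],  # handlers able to take each subtask
    critical: set[str],                  # subtasks the overall goal cannot do without
    budget_left: float,                  # remaining time or spend, in the run's own unit
    max_retries: int = 2,
) -> tuple[Action, Optional[str]]:
    """Pick a response using run-wide facts the failed subagent never saw."""
    if budget_left <= 0:
        return Action.ESCALATE, None
    # Hypothetical policy: critical subtasks earn one extra retry before moving on.
    cap = max_retries + 1 if report.task_id in critical else max_retries
    if report.looks_transient and retries_used.get(report.task_id, 0) < cap:
        return Action.RETRY, report.handler
    others = [h for h in alternatives.get(report.task_id, []) if h != report.handler]
    if others:
        return Action.REASSIGN, others[0]
    # Nothing automatic fits: hand it to a person.
    return Action.ESCALATE, None
```

Every input to `decide` except the report lives in the orchestrator. That is the practical reason the decision cannot sit in the subagent.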

That is why recovery is the coordinator’s call and never the worker’s. A subagent’s entire responsibility when it fails is to say so, clearly and with enough detail to act on. What happens next is not its decision to make, because it lacks the cross-task view that a good decision depends on. A subagent that starts handling its own failures is making choices the orchestrator cannot see and therefore cannot account for. It might retry on its own, quietly route around a problem, or swallow an error and return something degraded. The system turns unpredictable, not because anything crashed, but because decisions are being made in places that were never meant to make them. Keeping recovery centralized is what keeps the system’s response to failure something you can reason about and audit. That division is what makes failure handling a designed behavior instead of an emergent accident.

The orchestrator has to record its own reasoning

A multi-agent system fails in ways that are genuinely hard to reconstruct after the fact. The work was spread across several agents, each with its own context, running in some order. When the final result is wrong, the question is not only what went wrong but where, in a process that left little trace of itself. The defense is to build the orchestrator so that it records its reasoning as it goes. The cheap version logs what happened: which subtask ran, what it returned. The version that survives a real incident also logs why this assignment went to this handler, what the run’s state was as each step completed, and what the system was looking at when something broke.
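A minimal sketch of a decision log that keeps the "why" alongside the "what" is below. `DecisionLog` and its methods are hypothetical, and it uses only the standard `json` module. Each entry stores the event, the subtask, the orchestrator's stated reason, and a snapshot of the ledger at that moment. The snapshot is copied so later changes to the live state cannot rewrite history. `trace` is the backward walk from a bad answer to the decisions that produced it.

```python
import json
from typing import Any


class DecisionLog:
    """Append-only record of the orchestrator's choices and the state behind them (hypothetical)."""

    def __init__(self) -> None:
        self._entries: list[dict[str, Any]] = []

    def record(self, event: str, task_id: str, reason: str, state: dict[str, Any]) -> None:
        # Round-trip the snapshot through JSON: this deep-copies it, so later
        # mutation of the live ledger cannot alter the record, and it guarantees
        # every entry is serializable.
        snapshot = json.loads(json.dumps(state))
        self._entries.append(
            {"seq": len(self._entries), "event": event, "task": task_id,
             "reason": reason, "state": snapshot}
        )

    def trace(self, task_id: str) -> list[dict[str, Any]]:
        """Everything the orchestrator decided about one subtask, in order."""
        return [e for e in self._entries if e["task"] == task_id]

    def dump(self) -> str:
        """One JSON object per line, suitable for appending to a log file."""
        return "\n".join(json.dumps(e) for e in self._entries)
```

Typical calls are `log.record("route", "parse", "only handler with PDF support", ledger_state)` at dispatch and a matching `record("failure", ...)` when something breaks. Afterwards, `log.trace("parse")` shows both the reasoning and the state at each point.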

With that on record, a wrong answer becomes traceable. You can follow it back to the assignment that produced it, ask why that assignment was framed and routed the way it was, and see the state the run was in when it happened. Without it, debugging collapses into running the whole thing again and hoping the failure shows up a second time, which for anything nondeterministic is a poor wager. Observability here is not a production-monitoring nicety bolted on at the end. It is a property the orchestrator has to be built with, because the orchestrator is the one component placed to know why the system did what it did, and a coordinator that cannot explain its own choices has thrown away the main advantage of centralizing them in the first place.

Coordination is the whole job

The orchestrator is the component that makes a multi-agent system more than a pile of agents. It carries the plan, decides what work exists and who handles it, vets and reconciles what returns, and answers for failures when they come. What ties those together is that each one needs the whole picture, and the orchestrator is the only place that picture exists. The single discipline that keeps the system reliable is refusing to let that component drift into execution. The moment it starts doing the work, its context narrows, its global view degrades, and the responsibilities that only it can hold begin to slip. Keep coordination and execution in separate hands and the system stays something you can understand, extend, and recover when it breaks. Blur them and you have built a worker that is also supposed to be a manager, and it will do neither job well. The orchestrator earns its place by holding the whole and touching none of it.