Every Agent Runs the Same Four-Phase Loop, Whether You Designed It or Not

An agent is not a clever prompt or a powerful model. It is a loop. Strip away the framework, the tool catalog, and the system prompt, and what remains is a cycle that repeats: the model reads its situation, decides on one action, that action runs, and the result is folded back into what the model reads next time. Perceive, reason, act, observe, and then perceive again. This loop is running underneath every agentic system, regardless of whether its designers thought about it explicitly. The systems that behave well are the ones built by people who did.
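The cycle can be sketched in a few lines. What follows is a minimal illustration, not any framework's implementation: `decide` is a hypothetical stand-in for a model call, `TOOLS` is a made-up one-entry tool catalog, and the context is a plain list of strings.

```python
from typing import Callable

# Hypothetical tool catalog; a real agent would expose real external systems.
TOOLS: dict[str, Callable] = {"add": lambda a, b: a + b}

def decide(context: list[str]) -> dict:
    """Hypothetical stand-in for the model: read the context, pick one action."""
    # Perceive: pull out the observations recorded so far.
    observations = [c for c in context if c.startswith("observation: ")]
    # Reason: with nothing observed yet, act; otherwise commit to an answer.
    if not observations:
        return {"tool": "add", "args": (2, 3)}
    latest = observations[-1].split(": ", 1)[1]
    return {"tool": "finish", "args": (latest,)}

def run_loop(goal: str, max_turns: int = 5) -> str:
    context = [f"goal: {goal}"]
    for _ in range(max_turns):
        action = decide(context)                          # perceive + reason
        if action["tool"] == "finish":
            return action["args"][0]
        result = TOOLS[action["tool"]](*action["args"])   # act
        context.append(f"observation: {result}")          # observe
    return "stopped: turn limit reached"

print(run_loop("add 2 and 3"))
```

The only state that carries from one turn to the next is `context`, which is the point: whatever the last turn wrote there is what the next turn has to work with.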

Treating the loop as the unit of design changes how you reason about an agent. Most problems that get blamed on the model (the agent “got confused,” “went in circles,” “lost the thread”) are not failures of intelligence. They are failures of one of the four phases, and they are diagnosable as such once you know which phase you are looking at. The loop is the right altitude for designing an agent, debugging it when it misbehaves, and auditing it when you need to explain what it did. Each phase carries a distinct accountability, and together they form a feedback cycle rather than a pipeline. That cyclic shape, not any single phase, is what gives agents their power and most of their failure modes.

The phases are roles, not steps in a script

It is tempting to read “perceive, reason, act, observe” as four sequential steps, like a flowchart. That reading is close enough to be useful and wrong in a way that matters. They are better understood as four distinct responsibilities, each transforming a specific input into a specific output, chained so that every phase consumes what the previous one produced. The sequence repeats, but the important structure is what each phase is accountable for. When an agent behaves strangely, the diagnostic question is never “what went wrong” in the abstract. It is “which phase produced the bad output, and what did it have to work with.”

Perception is active reading

Perception is the phase where the model takes in its current state by reading everything in its working context. That includes the original instructions, the accumulated history of the session, anything loaded in at the start, and, most consequentially, every result returned by every action taken so far. From that material the model has to reconstruct where it is: what has been attempted, what is now known, and what remains unresolved.

Calling this “reading” undersells it. The work is closer to triage than to intake. Out of everything sitting in context, the model has to pick the few facts that actually constrain its next move and let the rest recede. This is why perception is the phase most often blamed on the model when the real fault is the designer’s. If the relevant fact is buried under thousands of tokens of verbose tool output, perception degrades, not because the model is incapable but because the signal it needed to find was drowned. An agent that “ignores” an instruction it was given twenty steps ago is frequently an agent whose perception phase is being asked to find a needle in a context that has filled with hay. The lever here is not a better model. It is controlling what perception has to read.
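One way to control what perception has to read is to compact tool output before it enters context. The sketch below is an assumption-laden illustration: the function name, the choice to keep the first few lines, and the "error"/"failed" keyword rule are all hypothetical, and a real system would pick its own notion of which lines matter.

```python
def compact_observation(raw: str, head: int = 2) -> str:
    """Keep the first `head` lines plus any line that looks like a failure."""
    lines = [ln.strip() for ln in raw.splitlines() if ln.strip()]
    first = lines[:head]
    # Hypothetical relevance rule: a line mentioning an error is worth keeping.
    flagged = [
        ln for ln in lines[head:]
        if "error" in ln.lower() or "failed" in ln.lower()
    ]
    kept = first + flagged
    omitted = len(lines) - len(kept)
    if omitted > 0:
        # Tell the reader that something was left out, and how much.
        kept.append(f"[{omitted} lines omitted]")
    return "\n".join(kept)

raw_output = "\n".join([
    "starting build",
    "compiling a",
    "compiling b",
    "ERROR: missing symbol foo",
    "compiling c",
    "done",
])
print(compact_observation(raw_output))
```

The omission marker matters as much as the trimming. It keeps the observation honest about being partial, so a later turn can decide to ask for the full output if it needs it.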

Reasoning is where the agent stops following a script

With that picture assembled, the model commits to a next move. This is the phase that separates an agent from a fixed program. In a fixed program, the third instruction runs after the second because someone wrote it in that order. An agent, in its reasoning phase, weighs what it just perceived against the goal it is pursuing and chooses the action that best advances it. The choice is made fresh each iteration, in light of the current state, not according to a predetermined order.

That adaptivity is the source of an agent’s capability and the reason it is harder to reason about than a deterministic system. Because the next action is chosen rather than scripted, the agent can respond to situations its designer never enumerated. It can notice that a prior result came back incomplete and decide to refine its approach before continuing. It can abandon a line of attack that is not working. But the same property means you cannot fully predict the path the agent will take, only constrain the space it chooses within. Designing the reasoning phase is mostly about shaping that space: making the goal legible, making the available actions and their consequences clear, and giving the model enough of the right context that its choice is well informed rather than a guess.
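Shaping the space of choices can be made concrete as an explicit action catalog that is both shown to the model and used to check what it asks for. This is a sketch under stated assumptions: `ActionSpec`, `CATALOG`, and the three action names are invented for illustration and do not correspond to any particular library.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ActionSpec:
    name: str
    description: str       # what the action does, as shown to the model
    required_args: tuple   # argument names a request must supply

# Hypothetical catalog; the names and arguments are illustrative only.
CATALOG = {
    spec.name: spec
    for spec in (
        ActionSpec("search", "Look up documents matching a query.", ("query",)),
        ActionSpec("read", "Return the text of one document.", ("doc_id",)),
        ActionSpec("finish", "Commit to a final answer.", ("answer",)),
    )
}

def render_catalog() -> str:
    """Describe the available actions in a form that can be placed in context."""
    return "\n".join(
        f"{s.name}({', '.join(s.required_args)}): {s.description}"
        for s in CATALOG.values()
    )

def check_action(request: dict) -> Optional[str]:
    """Return None if the request is inside the allowed space, else a reason."""
    spec = CATALOG.get(request.get("name"))
    if spec is None:
        return f"unknown action {request.get('name')!r}; allowed: {sorted(CATALOG)}"
    missing = [a for a in spec.required_args if a not in request.get("args", {})]
    if missing:
        return f"action {spec.name!r} is missing arguments: {missing}"
    return None

print(render_catalog())
print(check_action({"name": "read", "args": {}}))
```

The catalog does not dictate an order. It only bounds what can be chosen, and a rejected request comes back as a readable reason the next turn can respond to.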

Action and observation are one transaction with two halves

Action and observation are best understood together, because neither is meaningful without the other. In the action phase, the model stops producing prose and instead emits a structured request to do something: invoke an external system, execute code, read from a data store, or commit to a final answer. That request is the point at which the agent reaches past its own text and toward the world. On its own, though, a request changes nothing the agent can use. What makes it part of the loop is observation: the runtime carries out the request, captures whatever comes back, and injects that result into the context the model will read on its next pass.

The coupling is strict, and treating it as strict is a design discipline. An action worth taking returns something the next pass can read; an observation worth keeping is one a decision can actually turn on. Break either half and the coupling stops earning its place in the loop. An action that produces no usable observation is a dead end: the agent has reached into the world and learned nothing from doing so. An observation that the next perception phase cannot interpret is just as bad: it is noise that crowds the context without advancing the task. A large share of agent reliability work lives in this seam: shaping what an action returns so that it lands in context as a clear, interpretable signal rather than as raw, undifferentiated output. The agent’s competence on the next turn is bounded by the quality of the observation it receives on this one.
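A small sketch of that seam, assuming a hypothetical runtime function `execute` and a made-up in-memory store: every action, whether it succeeds or fails, comes back as an observation with the same shape, so the next pass never has to guess what it is looking at.

```python
from typing import Any, Callable

# Hypothetical in-memory data store standing in for an external system.
RECORDS = {"a1": {"owner": "kim", "status": "open"}}

def read_record(record_id: str) -> dict:
    """Illustrative tool: fetch one record or fail with a clear message."""
    if record_id not in RECORDS:
        raise LookupError(f"no record with id {record_id}")
    return RECORDS[record_id]

def execute(tool: Callable[..., Any], **args: Any) -> dict:
    """Run one action and return an observation the next pass can interpret."""
    try:
        value = tool(**args)
    except Exception as exc:
        # A failure is still an observation: say which action failed and why.
        return {
            "action": tool.__name__,
            "ok": False,
            "error": f"{type(exc).__name__}: {exc}",
        }
    return {"action": tool.__name__, "ok": True, "result": value}

print(execute(read_record, record_id="a1"))
print(execute(read_record, record_id="zz"))
```

The design choice worth noting is that the error path does not raise out of the loop. It is converted into context, which is what lets reasoning respond to it on the next turn.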

The loop closes on itself

Here is the distinction that does the most explanatory work. A pipeline is linear: input enters one end, passes through a fixed series of stages, and output emerges from the other. Each stage runs once. There is no path back. The agent loop is not shaped like that, even though its four phases can be listed in order. It is a feedback cycle, because the output of the last phase becomes the input to the first phase of the next iteration. Observation writes a result into context; the next perception reads that result; reasoning chooses an action in light of it; that action produces a new observation; and around it goes.

That closed circuit is the entire reason agents can do things pipelines cannot. Because each pass reads the consequences of the last, the agent can adjust mid-task. A failed action returns as a result the next reasoning phase can answer, by attempting the same thing again, switching tactics, or handing the problem up. Data that arrives in an unexpected shape is, likewise, just another result to fold in. A linear pipeline has no equivalent capacity. If stage three assumes an input that stage two failed to produce, the pipeline either errors out or carries the bad assumption straight through to its output. It has no later stage positioned to notice the problem and adjust, because no stage reads the consequences of an earlier stage’s work and feeds a revised decision back upstream. The feedback loop is what lets an agent absorb the unexpected instead of breaking on it.
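The contrast can be shown side by side. In this sketch the two data sources and both runner functions are hypothetical; the primary source is hard-coded to fail so the difference in behavior is visible.

```python
def fetch_primary() -> str:
    """Illustrative source that always fails."""
    raise TimeoutError("primary source timed out")

def fetch_backup() -> str:
    """Illustrative source that succeeds."""
    return "42 records"

def run_pipeline() -> str:
    # Linear: if this stage fails, no later stage is positioned to react.
    data = fetch_primary()
    return f"report: {data}"

def run_with_feedback(max_turns: int = 4) -> str:
    observations: list[str] = []
    for _ in range(max_turns):
        # Reason: choose a source in light of what has been observed so far.
        primary_failed = any(o.startswith("error: primary") for o in observations)
        source = fetch_backup if primary_failed else fetch_primary
        try:
            data = source()                         # act
        except TimeoutError as exc:
            observations.append(f"error: {exc}")    # observe the failure
            continue
        return f"report: {data}"
    return "escalate: no source succeeded"

print(run_with_feedback())
```

Here the feedback version switches tactics after one failed attempt. Retrying the same source or handing the problem up are the other two responses named above, and they would slot into the same decision point.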

This is also where the loop’s characteristic failures come from, and they are failures of the cycle, not of any single phase. The clearest one is the unproductive cycle: an agent that takes an action, observes a result, and on the next turn chooses an action that returns it to the same state, repeating without converging. Nothing is broken in any individual phase. Perception reads correctly, reasoning makes a locally defensible choice, the action executes, the observation comes back. The pathology is in the trajectory: the feedback is not moving the agent toward the goal. You cannot see a problem like this by inspecting one phase in isolation. You see it only by watching the loop’s state evolve across iterations, which is exactly why the loop, not the phase, is the right unit of observation when you are debugging.
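Watching state across iterations can be as simple as fingerprinting the task state after each turn and checking for repeats. The sketch assumes the runtime can produce such a fingerprint as a string, which is itself a design decision; `find_cycle` is a hypothetical helper, and a repeated state is a signal worth investigating rather than proof of a fault.

```python
from typing import Optional

def find_cycle(states: list[str]) -> Optional[tuple[int, int]]:
    """Return (first_seen, repeated_at) for the first repeated state, else None."""
    seen: dict[str, int] = {}
    for turn, state in enumerate(states):
        if state in seen:
            return (seen[state], turn)
        seen[state] = turn
    return None

# Hypothetical fingerprints of task state after each turn.
trajectory = ["draft-empty", "draft-v1", "draft-v2", "draft-v1"]
print(find_cycle(trajectory))
```

Nothing in this check inspects a single phase. It only compares turns to one another, which is the kind of evidence a trajectory problem leaves behind.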

Working memory fills as the loop runs, and that is a budget

There is a resource consideration built into the loop that follows directly from its cyclic shape. Each iteration appends to the context the next iteration must read. Actions accumulate as a record of what was requested. Observations accumulate as a record of what came back. The model’s intermediate reasoning, where it is exposed, accumulates too. The context that perception reads is therefore not fixed. It grows with every turn the loop completes.

This has a consequence that is easy to miss until an agent hits it mid-task. The same context that serves as the agent’s working memory is finite, and a loop that runs long enough, or that takes actions returning large results, will fill it. When that happens, the failure does not announce itself as “out of memory.” It shows up as degraded perception: the signal the agent needs is still technically present but is now competing with the entire accumulated history of the session for the model’s attention, or it has had to be dropped to make room. The loop’s own operation is what consumes the resource the loop depends on. Designing an agent that runs for many turns therefore means designing how working memory is managed across the cycle, deciding what each turn adds, what can be summarized, and what can be dropped, so that perception stays sharp as the loop accumulates. The mechanics of how to do that are their own subject. The point here is structural: the loop spends the very resource it runs on, so the budget has to be designed into the loop rather than discovered when it runs out.
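As one illustration of designing the budget into the loop, the sketch below keeps the goal, keeps as many of the most recent entries as fit, and leaves a marker where older ones were dropped. Everything here is an assumption made for the example: `fit_to_budget` is a hypothetical helper, character count is only a rough proxy for tokens, and dropping the oldest entries is just one of several possible policies (summarizing is another).

```python
def fit_to_budget(goal: str, history: list[str], budget: int) -> list[str]:
    """Keep the goal plus the most recent history entries that fit the budget."""
    kept: list[str] = []
    used = len(goal)  # the goal is always kept, so it is charged first
    for entry in reversed(history):
        if used + len(entry) > budget:
            break
        kept.append(entry)
        used += len(entry)
    kept.reverse()  # restore chronological order
    dropped = len(history) - len(kept)
    # The marker itself is not charged against the budget in this sketch.
    marker = [f"[earlier entries dropped: {dropped}]"] if dropped else []
    return [goal] + marker + kept

context = fit_to_budget("goal: x", ["aaaa", "bbbb", "cccc"], budget=16)
print(context)
```

Running this once per turn, before perception, is what turns the budget from something discovered at the limit into something the loop enforces on itself.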

Where the loop ends is a design decision

A loop that only repeats is a loop that never stops. Every iteration of perceive, reason, act, observe has to be followed by an implicit fifth question: continue, or halt? An agent terminates when its goal is reached, when it hits a limit set on it, or when it encounters a condition it cannot handle and must hand off. That decision is not incidental to the loop; it is part of every turn. An agent with no well-designed sense of when to stop is the same unproductive cycle described earlier, now with permission to run forever. The full treatment of termination (how to set limits, when to iterate, when to escalate) is a topic in its own right. What belongs here is only the recognition that the loop is not complete as a design until you have specified how it ends, because “perceive, reason, act, observe, repeat” describes a process with no exit unless you build one.
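The fifth question can be written down as a check that runs at the end of every turn. This is a sketch only: the `Verdict` names, the use of consecutive errors as the "cannot handle" signal, and the default threshold of three are all assumptions chosen for illustration.

```python
from enum import Enum

class Verdict(Enum):
    CONTINUE = "continue"
    DONE = "done"          # goal reached
    LIMIT = "limit"        # a limit set on the agent was hit
    ESCALATE = "escalate"  # a condition it cannot handle; hand off

def check_stop(
    goal_met: bool,
    turn: int,
    max_turns: int,
    consecutive_errors: int,
    max_errors: int = 3,
) -> Verdict:
    """Answer 'continue, or halt?' from the state at the end of one turn."""
    if goal_met:
        return Verdict.DONE
    if consecutive_errors >= max_errors:
        return Verdict.ESCALATE
    if turn >= max_turns:
        return Verdict.LIMIT
    return Verdict.CONTINUE

print(check_stop(goal_met=False, turn=2, max_turns=10, consecutive_errors=0))
```

Returning a reason rather than a bare boolean is deliberate. Whoever receives a halted agent needs to know which of the three exits it took.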

Designing at the altitude of the loop

The reason to internalize this loop is that it gives you the correct altitude for every activity that surrounds an agent. When you are designing one, the loop tells you what you are actually building: not a prompt that produces a good answer but a cycle that has to keep producing good decisions as its own state evolves underneath it. You are designing what each turn perceives, how its reasoning is constrained, what its actions can reach, what its observations return, and how the whole thing knows when it is done.

When you are debugging one, the loop tells you where to look. A bad outcome traces to a phase or to the trajectory: perception that could not find the signal, reasoning that chose poorly given what it had, an action that returned an unusable observation, or a loop that cycled without converging. “The agent failed” is not a diagnosis. “Perception degraded once context filled past a certain point” is, and it points at a fix. When you are auditing one, explaining after the fact what it did and why, the loop is the trace. The sequence of perceptions, decisions, actions, and observations is the record of the agent’s behavior, and it is the only honest account of why the system did what it did.
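If the loop is the trace, the trace needs a shape. A minimal sketch, assuming one record per turn with one field per phase: `TurnRecord`, its field names, and the sample values are hypothetical, and JSON Lines is simply a convenient format for appending and re-reading.

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class TurnRecord:
    turn: int
    perceived: str     # what the turn took to be the relevant state
    decision: str      # why this action was chosen
    action: str        # the structured request that was emitted
    observation: str   # what came back and was written to context

def to_jsonl(trace: list[TurnRecord]) -> str:
    """Serialize the trace as one JSON object per line."""
    return "\n".join(json.dumps(asdict(record)) for record in trace)

# Illustrative values only.
trace = [
    TurnRecord(0, "no results yet", "need source data", "search(query='q3')", "2 hits"),
    TurnRecord(1, "2 hits available", "read the first hit", "read(doc_id='d1')", "ok"),
]
print(to_jsonl(trace))
```

With a record like this, a diagnosis such as "the observation on turn 0 was unusable" becomes something that can be pointed at rather than inferred.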

The four phases are simple enough to state in a sentence and deep enough to organize an entire practice around. The model reads its context, decides on an action, the action runs, the result returns to context, and the loop starts over, adjusting with every pass until the goal is met. That cycle is not a description of how some agents work. It is the definition of what an agent is. Build it deliberately, and the rest of the discipline (tools, context, planning, oversight) is the work of making each phase of that loop reliable.