A Chatbot, a Workflow, and an Agent Are Three Different Architectures

Three words get used as if they were a single rising scale of sophistication: chatbot, workflow, agent. The implication is that a chatbot is a simple thing, a workflow is a more capable thing, and an agent is the advanced version of both. That framing is wrong, and the wrongness is expensive, because it leads teams to reach for the label that sounds most advanced rather than the architecture the task requires. These are not three points on one axis of capability. They are three distinct architectures, separated by two structural questions: can the system take action in the world at all, and who decides what it does next. Answer those two questions and the boundaries become sharp. Confuse them, and you will mis-scope systems, misjudge their risk, and argue endlessly about whether something “is really an agent” when the real disagreement is about which machine you built.

The distinctions matter for a practical reason. Each of these architectures has a different failure surface, a different cost profile, and a different set of things that can go wrong without anyone noticing. You cannot reason about a system’s risk until you know which of the three you are operating. So the goal here is not vocabulary for its own sake. It is to draw the lines precisely enough that the name you give a system tells you something true about how it will behave.

The chatbot is a closed loop with no exit to the world

Start with the architecture that looks the most capable and is, structurally, the most contained. A conversational system runs a single cycle: text comes in, text goes out, and the system waits to be addressed again. Everything it does happens inside that exchange of language. It holds no handle on a filesystem, no path to a service, no way to alter a stored record, and no recollection of anything beyond the context it was handed this turn. Strip away the interface and what remains is a generator of text fitted with a turn-taking wrapper.

None of this is a complaint about the architecture. A great deal of useful work, conversation, explanation, drafting, reasoning over what the model already holds, asks for nothing more than this, and bolting on machinery it does not need would be a mistake. The point is only to be precise about where the boundary sits. A respond-only loop has no path to act. As soon as a task asks the system to do something rather than say something, to fetch a fact it was never given, commit a change to a system of record, or set a downstream process in motion, the architecture has run out of room. Improving the model does not extend that room. A stronger model writes better text inside the same sealed loop. It does not cut the loop an exit.

This is why “the model got smarter” never, by itself, turns a chatbot into an agent. The limitation is not the model’s intelligence. It is the shape of the loop the model is running inside.

A capable model is still not an agent

The instinct to treat a powerful language model as an agent comes from watching it describe, in fluent and correct detail, exactly what should be done about a problem. If it can lay out the plan that precisely, surely it can carry it out. It cannot, and the reason is foundational rather than incidental.

Whatever a language model produces, it produces as text, and that is the whole of its native capability. A single inference pass does not reach a filesystem, run code, touch a datastore, or register anything that shifts in the world outside the call itself. Each pass also begins cold. Nothing carries forward between calls, nothing it generates leaves a mark on an external system, and nothing returns to it from whatever its text described. A model can compose a flawless account of an action down to the last step. It has no way to perform the action it just described.

This boundary is worth naming, because once you have a name for it you start seeing it everywhere. Call it the gap between what the model generates and what actually happens in the world. On one side is text: plans, decisions, intentions, all expressed as output tokens. On the other side is the world: files, services, databases, state that changes when something acts on it. A raw model lives entirely on the text side of that gap. It can reason about crossing it and produce a perfect account of what crossing it would involve, but it possesses no bridge.

Everything that distinguishes an agent from a conversational system is, at bottom, about building that bridge and deciding who controls the traffic across it.

Tool use is the bridge across the gap

Crossing from text to action requires a runtime layer wrapped around the model. The mechanism is tool use, and its logic is simple to state. The model is told, at inference time, what actions are available to it and how to request them. When the model determines that an action is needed, it does not emit prose. It emits a structured request: the name of the action and the arguments it wants supplied. The runtime catches that request before it goes anywhere, performs the actual operation it names, the file read, the service call, the lookup, takes hold of the result, and returns it into the model’s context as something the model can read on its following turn.

That cycle is the bridge. The model proposes an action in the only currency it has, tokens, and the runtime converts those tokens into a real effect in the world and converts the world’s response back into tokens the model can perceive. The gap between generation and action is closed not by making the model more capable but by surrounding it with machinery that executes what it requests and reports back what happened.

Notice what tool use does and does not establish. It gives the system the ability to act, which the closed conversational loop lacked entirely. That is necessary for agency. It is not sufficient. A system can have tools and still not be an agent, because the ability to act says nothing about who decides which actions happen and in what order. That second question is the one that actually separates a workflow from an agent.

The dividing line is who controls the sequence, and when

Once a system can act, the architecturally decisive question is where its control flow comes from. There are two possibilities, and the difference between them is the whole distinction between a workflow and an agent.

Take the deterministic workflow first. Here the answer to the second question is fixed in advance: the author owns the sequence. Every step, every branch, every loop, every test that sends the system down one path instead of another is laid down as code before a single request arrives. A language model can carry real weight inside that frame. It can read an input and label which branch applies, write the text a given step calls for, pull structured fields out of a document, condense a long passage to its core. What it cannot do is choose what follows. The route was drawn before runtime; the model supplies the contents that travel it.

Now flip the answer to the second question. In an agentic system the sequence is built at runtime by the model itself. It reads where things stand, picks a move, watches what that move returns, and picks again, each decision shaped by what the ones before it surfaced. No one wrote the order down, and no one could have, because it is assembled in the moment from the specific situation in front of the system. That is the property that names an agent: the model, not an author, deciding what comes next while the system runs. Present or absent, it is the single fact that settles which of the two architectures you are holding.

Everything people find compelling and everything people find frightening about agents follows from this one property. The capability to handle tasks whose steps could not be scripted in advance comes from runtime control of the sequence. So does the difficulty of predicting, testing, and auditing the system, because a sequence constructed live is a sequence no one wrote down and no one can fully anticipate.

The lines blur because involvement looks like control

If the distinctions are this clean, why do they get confused so reliably? Because the surface signals mislead. Two patterns in particular fool people into calling a workflow an agent.

The first is a model appearing at every step. When a system calls a language model repeatedly, threading the output of one call into the next, it has the texture of something thinking its way through a problem. But a fixed chain of model calls, where the author decided that step one feeds step two feeds step three, is a workflow no matter how many model calls it contains. The model is doing local work at each station on a route the author laid down. Frequent involvement of the model is not the same as the model controlling the flow. A system can call a model a dozen times and remain fully deterministic, because not one of those calls chose what came after it.

The second is intricacy passing for autonomy. A task thick with conditions and edge cases, drawing on more than one source of data, feels like it must have something deliberating at its center. But intricacy and autonomy are independent properties. A system can be elaborate and still be scripted end to end. Consider a step that examines each incoming case and dispatches it to whichever of a few dozen handlers fits: the discrimination can be genuinely subtle, and a model can be the thing performing it, yet the decision to route here, and the set of places routing can send things, were both fixed by a person. A model applying fine discernment within a structure it did not design is one thing. A model designing the structure is another. The first is a workflow that happens to contain a model. Only the second has become an agent.

These confusions are not pedantic to correct, because they cause real misjudgment of risk. A team that believes it has built an agent, when it has actually built a deterministic chain of model calls, will over-engineer its safeguards against runtime unpredictability it does not have. A team that believes it has built a controllable workflow, when it has actually delegated sequence decisions to the model, will under-build the safeguards it genuinely needs. Naming the architecture correctly is the first step in defending it correctly.

Autonomy is a spectrum, and most real systems are hybrids

Drawing the line between workflow and agent cleanly does not put every system tidily on one side of it. Who-decides-the-sequence is better read as a dial than a switch. Turn it all the way down and you have a pipeline whose every move is authored, the model lending nothing but content. Turn it all the way up and you have a system given an objective and trusted to work out each step on its own. Most production systems sit somewhere along the dial rather than at either stop, and the durable shape tends to be a fixed structure with one or two points where real runtime judgment is deliberately handed to the model and the remainder stays under authored control.

This shifts the question worth asking about any system. The binary, is this an agent or a workflow, usually has no clean answer and rarely repays the argument. The better question points at individual decisions: for each choice the system makes, was it resolved by the author ahead of time or left to the model in the moment? Every system you can name has a precise answer to that, and the answer maps exactly where the unpredictability lives, where risk gathers, and where your attention is owed. A system might be ninety percent authored with a single contained pocket of autonomy, and knowing precisely where that pocket is matters far more than the label you put on the whole.

The same lens clarifies where the chatbot sits on this map. It is not a low-capability point on the autonomy spectrum at all. It is off the spectrum entirely, because the spectrum measures who controls action, and the chatbot cannot take action in the first place. Adding tools to a conversational system is what moves it onto the spectrum, by giving it an exit to the world. Where it then lands depends entirely on whether an author or the model decides when those tools fire.

Each architecture commits you to a different operational reality

The reason to get these three architectures straight is that each carries a different operational reality, and the name is the fastest way to know which reality you are signing up for.

A conversational system is bounded and safe by construction, because it cannot act. Its failure mode is a bad answer, contained within the conversation, with no external consequence. A deterministic workflow can act, but only along paths an author defined, so its behavior is predictable across runs, its failures localize to nameable steps, and its cost and latency can be estimated before it ships. An agent can act and decides its own path, which buys it the ability to handle tasks nothing else can express, at the price of behavior that varies run to run, failures that can originate anywhere in a self-constructed chain and compound before they surface, and cost that is driven by planning the system does for itself rather than work you scheduled.

These are not gradations of the same risk. They are different risks in kind. Choosing among them is the most consequential architectural decision you make about a system, and it is a decision, not a default. The right instinct is to choose the least autonomous architecture that can actually express the task. Use the closed conversational loop when the task is genuinely conversation and never needs to touch the world. Use an authored workflow when the system must act but its steps can be specified in advance, which is more often than the prevailing enthusiasm assumes. Reserve real autonomy for the cases where the path to the goal genuinely cannot be enumerated ahead of time, and even then, contain it to the specific decisions that require it rather than handing the whole system over to it.

A chatbot, a workflow, and an agent are not a beginner, an intermediate, and an expert version of the same idea. They are three machines that do different things, fail in different ways, and cost different amounts to run and to trust. The engineers who build well are the ones who can say precisely which machine they are building and why, and who pick it because the task demanded it, not because of how the name sounds in a meeting.