When a Model’s Output Becomes a Contract Your System Can Trust
There is a difference between asking a model for JSON and guaranteeing that what comes back conforms to a schema. It looks like a formatting detail. It is actually one of the most consequential architectural decisions you make when you put a language model inside a larger system, because it determines where the risk of malformed output lives, what failure modes your pipeline has to absorb, and how much defensive code everything downstream is forced to carry.
The decision turns on a single distinction: best-effort structure versus guaranteed structure. Asking for JSON in the prompt is best-effort. The model has been trained to produce well-formed output and usually will, but nothing about the generation process enforces the shape you asked for. Constraining generation to a schema is a guarantee. The output cannot violate the structure, because the decoder is prevented from emitting any token that would break the schema. Treating these two as interchangeable, as if constrained generation were merely a more reliable version of a well-written prompt, is the error. They are different categories of control, and the gap between them is where production systems quietly break.
Two ways to get JSON, and only one of them is enforcement
The weak way is to instruct the model, in plain language, to respond with JSON. You describe the fields you want, perhaps show an example, and parse whatever returns. This solves exactly one problem: syntax. With a clear enough instruction, the model will usually emit something that parses. But parseable and correct are not the same property. A prompt-only request says nothing binding about which fields are present, what types they hold, or whether a key your application depends on actually exists. The model settles the structure for itself as it generates. Picture a record your billing logic keys on: it expects a field your code spells one way, and the model spells it another, or spells it correctly on most inputs and drops it on the one where it found nothing to populate it with. No layer sits between that choice and your parser. The text is valid. What reaches the next stage is a shape your code never agreed to.
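A minimal sketch of that gap, assuming a hypothetical model reply and illustrative field names (`customer_id`, `amount_due`) that are not taken from any real system. The reply parses without complaint, yet the key the downstream code depends on is not there.

```python
import json

# Hypothetical model reply: valid JSON, but the key is spelled "customerId"
# where the (assumed) billing code expects "customer_id".
raw_reply = '{"customerId": "C-1042", "amount_due": "129.50"}'

# Parsing succeeds, so the syntax bar is cleared.
record = json.loads(raw_reply)

# The contract the downstream code assumes (illustrative field names).
expected_keys = {"customer_id", "amount_due"}
missing = expected_keys - set(record)

print(missing)  # {'customer_id'}
```

Nothing in `json.loads` can flag this, because the parser only knows JSON syntax and has no view of the contract.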
The strong way is constrained decoding. The schema stops being advice in the prompt and becomes a grammar the decoder is compiled against: at each step, any token that would carry the output outside the schema is removed from the set the model can sample. A required key cannot go missing, a name cannot come back misspelled, a typed value cannot arrive as the wrong type, because no path through generation produces those results. State the guarantee precisely, because the precision is the whole point. Constrained decoding binds the full contract, every field name, every type, every required key, not merely the property that the bytes parse as JSON. Parseability is the floor a prompt can reach on a good day. Conformance to the declared shape is a ceiling only enforcement reaches.
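The masking step can be sketched with a toy decoder. Everything here is a stand-in chosen for illustration: the hand-written `GRAMMAR` table replaces a grammar compiled from a real schema, `toy_scores` replaces model logits, and the decoder picks the best allowed token greedily where a real one would sample from the masked distribution.

```python
import json

# Toy grammar for the schema {"status": "ok" | "error"}. Each state maps the
# tokens allowed next to the state they lead to. This table is a hand-written
# stand-in for a grammar compiled from a schema.
GRAMMAR = {
    "start": {'{"status": ': "value"},
    "value": {'"ok"': "close", '"error"': "close"},
    "close": {"}": "done"},
}

def toy_scores(state):
    """Stand-in for model logits. It deliberately favors invalid tokens."""
    return {
        '{"state": ': 5.0, '{"status": ': 1.0,
        '"fine"': 4.0, '"ok"': 2.0, '"error"': 1.5,
        "]": 3.0, "}": 1.0,
    }

def constrained_decode():
    state, out = "start", []
    while state != "done":
        allowed = GRAMMAR[state]
        scores = toy_scores(state)
        # Mask: only tokens the grammar allows in this state are eligible,
        # no matter how highly the model scores the others.
        token = max(allowed, key=lambda t: scores[t])
        out.append(token)
        state = allowed[token]
    return "".join(out)

# Without the mask, the top-scoring first token is the misspelled key.
start_scores = toy_scores("start")
unconstrained_first = max(start_scores, key=start_scores.get)

result = constrained_decode()
print(unconstrained_first)  # {"state": 
print(result)               # {"status": "ok"}
```

The model's preference for the wrong key never reaches the output, because that token is not among the candidates at that step.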
Hold the two apart this way. A prompt-level request can clear the syntax bar and then leaves the contract entirely to you to check after the fact; enforcement clears the syntax bar and the contract in the same motion. Every consequence that follows is an unfolding of that one difference.
The guarantee does not erode as the schema grows
The intuition to dislodge is that enforcement is just a sturdier prompt, the same tactic with a higher success rate. It is a different kind of claim altogether. A carefully written prompt raises the odds of the right structure. A grammar removes the odds from the question. Non-conforming output is not made unlikely; it is made unreachable.
This is why the strength of constrained generation does not degrade as your schema grows more demanding. Prompt-only reliability erodes as you add fields, nest objects, introduce typed arrays, or require strict enumerated values. Each additional constraint is one more thing the model has to track and respect from instructions alone, and the probability of getting all of it right on every call falls as the surface area grows. Constrained decoding has no such erosion, because compliance is enforced at the token level regardless of how elaborate the schema is. A schema with thirty nested fields and three enumerated types is exactly as guaranteed as a flat object with two strings. The cost of the schema’s complexity moves out of the reliability budget and into the design work of writing the schema well, which is a problem you can solve once rather than gamble on every request.
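The erosion can be made concrete with a back-of-the-envelope calculation. The figures are assumptions for illustration, not measurements: suppose each structural constraint is satisfied independently with probability 0.99 on a prompt-only call.

```python
# Assumption for illustration only: each structural constraint is satisfied
# independently, with the same probability p, on a prompt-only call.
def all_constraints_hold(p: float, n: int) -> float:
    """Probability that all n constraints hold on a single call."""
    return p ** n

for n in (2, 10, 30):
    print(n, round(all_constraints_hold(0.99, n), 3))
```

Under that assumption a two-field object comes back fully correct about 98% of the time, while a thirty-constraint schema does so only about 74% of the time. With enforcement the corresponding figure does not depend on the number of constraints.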
Where prompt-only JSON quietly fails
The failures of best-effort structure are not dramatic. They are small, intermittent, and easy to miss in testing, which is exactly what makes them dangerous in production. Three failure modes recur, and constrained decoding eliminates all three.
The first is the quietly dropped field. Hand the model an input that gives it nothing to put in a required slot, and it may resolve the tension by leaving the slot out. No exception fires, no warning prints. You get back a well-formed object that happens to be missing a key the rest of the system treats as guaranteed. The gap shows up much later and somewhere else, as a null dereference or a branch the code was never meant to enter.
The second is type drift. The key is present and correctly named, but the value comes back in the wrong type: a count rendered as text, a flag rendered as the word for it rather than an actual boolean, because nothing pinned the type down and the surrounding content nudged the model toward a different reading. The object parses cleanly. The mismatch travels until some operation assumes the declared type and breaks on the real one, usually well downstream of where it originated.
The third is drift in the shape itself from one call to the next. Nothing holds the structure fixed, so the same instruction over different inputs can return subtly different layouts. The data that came back flat on Monday comes back nested a level deeper on Tuesday, not because the request changed but because the content did. A consumer compiled against the layouts you saw in development meets one you did not.
None of this matters when the application is forgiving and a misshapen result costs nothing to discard. It matters enormously when a later stage branches on whether a field is there, or reads a type and acts on it. There, no one of the three failures is visible on its own; they collect, call after call, across the spread of real inputs, until the accumulation finally trips something a user can see. This is the class of defect that sails through a demo and surfaces only at volume.
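The three failure modes can be illustrated with a small structural check. The contract, field names, and sample payloads below are hypothetical; each payload is valid JSON-shaped data that a parser would accept without complaint.

```python
# Illustrative contract: field names and types are assumptions for this sketch.
CONTRACT = {"invoice_id": str, "item_count": int, "paid": bool}

def structural_problems(obj: dict) -> list:
    """Report every way obj departs from CONTRACT."""
    problems = []
    for key, expected in CONTRACT.items():
        if key not in obj:
            problems.append(f"missing field: {key}")
        elif type(obj[key]) is not expected:  # strict: a bool does not pass as int
            problems.append(f"wrong type for {key}: {type(obj[key]).__name__}")
    extra = set(obj) - set(CONTRACT)
    if extra:
        problems.append(f"unexpected fields: {sorted(extra)}")
    return problems

# 1. Quietly dropped field.
dropped = {"invoice_id": "INV-7", "item_count": 3}
# 2. Type drift: a count as text, a flag as a word.
drifted_type = {"invoice_id": "INV-7", "item_count": "3", "paid": "yes"}
# 3. Shape drift: the same data nested one level deeper.
drifted_shape = {"invoice": {"invoice_id": "INV-7", "item_count": 3, "paid": True}}

print(structural_problems(dropped))       # ['missing field: paid']
print(structural_problems(drifted_type))
print(structural_problems(drifted_shape))
```

A check like this is what every consumer has to carry when the structure is best-effort.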
The reliability hierarchy
There is a clear ordering among the mechanisms for getting structured output, and knowing where each sits tells you which one to reach for given how strict your requirements are.
Constrained decoding sits at the top, alone, because its enforcement is structural and structural enforcement leaves no room for a non-conforming result. When the capability is exposed through more than one interface, every interface runs the same enforcement underneath, so they earn the same place in the ordering. Picking among them is a question of which fits the task, never a question of which is safer.
One step down are format instructions written into the prompt: a request for JSON, or for some named shape, with nothing binding generation to it. Their hit rate is real but conditional: it rises as the asked-for shape gets less ambiguous and as the pattern gets easier for the model to infer, but the approach offers no floor at the moment of generation. Treat this tier as a capable fallback, not as a control you can lean a contract on.
At the bottom are worked examples standing on their own. Show the model a few instances of the shape you want and it will generalize from them and match it much of the time, but nothing flags the times it does not. Examples sharpen any of the approaches above them; relied on by themselves to hold a structure, they are the weakest instrument on the list.
The ordering is not an abstract quality ranking. It is something to decide with. The less a downstream stage can absorb a malformed result, the higher up the list you have to commit to.
Two interfaces, one guarantee
Constrained decoding usually reaches you through one of two interfaces, and seeing that a single enforcement mechanism sits behind both keeps you from inflating the choice between them.
One form returns a single structured object directly: you supply a schema, and you get back an object that conforms to it. There is no tool plumbing and no function-call indirection, just a schema in and a conforming object out. This is the natural default when all you want is one structured result from the model.
The other comes through the tool-calling, or function-calling, interface. That interface exists so a model can hand an external function a fully formed argument object, but the same machinery serves perfectly well when nothing on the other side ever runs. You declare a tool whose parameter list is the schema you want populated, the model issues the call, and you read its arguments as your result. Execution never happens. The act of calling is the payload. This form pays off when the task is itself a choice or an assembly: the model picking which of several outputs to produce, or building a result out of more than one tightly scoped schema, which is the shape the interface was designed around.
One caveat traps people here. The tool-calling interface is best-effort by default. Out of the box, argument shapes are honored by training and habit, reliably but not certainly, and the certainty only arrives once you opt into the strict variant that compiles the schema into a grammar. Flip that on and you get the identical token-level enforcement the direct-object form already had. Leave it off and you are trusting the model’s manners, not the decoder’s constraints, which is the exact thing you were trying to escape if you reached for tool calling because the shape had to be guaranteed. The general lesson: enforcement comes from constrained decoding being engaged, not from the interface you happened to engage it through.
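A sketch of what the opt-in can look like, with no network call involved. The declaration layout (a JSON Schema under `parameters` plus a `strict` flag) mirrors a convention some providers use, but the exact field names here, the tool name `record_invoice`, and the stubbed response are all assumptions; your provider's documentation is the authority on the real shape.

```python
import json

# Hypothetical tool declaration. Field names are assumptions for this sketch.
extract_invoice_tool = {
    "name": "record_invoice",
    "description": "Return the extracted invoice fields.",
    "strict": True,  # opt in to schema-constrained arguments
    "parameters": {
        "type": "object",
        "properties": {
            "invoice_id": {"type": "string"},
            "item_count": {"type": "integer"},
            "paid": {"type": "boolean"},
        },
        "required": ["invoice_id", "item_count", "paid"],
        "additionalProperties": False,
    },
}

def read_tool_result(tool_call: dict) -> dict:
    """The tool is never executed; its arguments are the result."""
    if tool_call["name"] != extract_invoice_tool["name"]:
        raise ValueError(f"unexpected tool: {tool_call['name']}")
    return json.loads(tool_call["arguments"])

# Stubbed tool call standing in for a real model response.
stub_call = {
    "name": "record_invoice",
    "arguments": '{"invoice_id": "INV-7", "item_count": 3, "paid": true}',
}

invoice = read_tool_result(stub_call)
print(invoice)
```

The only line that separates the best-effort form from the enforced one in this sketch is the `strict` flag, which is why it is easy to leave off without noticing.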
So the decision between the forms is purely a fit question. Let the task choose: when the work is selection or composition across several shapes, the tool-calling form was built for it; when you want a single object back with the least apparatus around it, the direct form is cleaner. The guarantee underneath is identical, so neither one is the safer pick.
When best-effort is the right call
Constrained decoding is the right default whenever the schema matters, but it is not the only defensible choice, and reaching for it reflexively can be its own mistake. Three situations genuinely call for the lighter, prompt-only approach.
The first is availability. When the runtime you are building on exposes no constrained-generation primitive at all, the strongest instrument within reach is a precise format instruction backed by validation as the output is parsed. A guarantee you have no way to invoke does nothing for you.
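One way to back a format instruction with validation at parse time is a validate-and-retry loop. This is a sketch under stated assumptions: `call_model` is a hypothetical callable that takes a prompt and returns text, the `REQUIRED` contract is illustrative, and the stub below stands in for a real model.

```python
import json

REQUIRED = {"title": str, "priority": int}  # illustrative contract

def validate(raw: str) -> dict:
    """Parse raw text and check it against REQUIRED, raising ValueError on failure."""
    obj = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object")
    for key, expected in REQUIRED.items():
        if type(obj.get(key)) is not expected:
            raise ValueError(f"field {key!r} missing or not {expected.__name__}")
    return obj

def call_with_validation(call_model, prompt: str, max_attempts: int = 3) -> dict:
    """Retry the call until the reply validates or attempts run out."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return validate(call_model(prompt))
        except ValueError as err:
            last_error = err
    raise RuntimeError(f"no valid output after {max_attempts} attempts") from last_error

# Stub model: drops a field, then drifts a type, then gets it right.
replies = iter([
    '{"title": "Fix login"}',
    '{"title": "Fix login", "priority": "high"}',
    '{"title": "Fix login", "priority": 2}',
])

def fake_model(prompt: str) -> str:
    return next(replies)

ticket = call_with_validation(
    fake_model, "Respond with JSON: title (string), priority (integer)."
)
print(ticket)  # {'title': 'Fix login', 'priority': 2}
```

The loop raises the odds of a usable result and bounds the damage of a bad one, but it remains best-effort: it can still exhaust its attempts.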
The second is work that has not settled. Early on, while the shape is still moving under you, a formal schema is overhead spent on a target you will redraw next week. Pinning a rigid contract onto a structure still in flux buys certainty you have no use for yet and charges you maintenance for the privilege.
The third is output that is genuinely shallow and cheap to get wrong. A flat object with a couple of plain fields, an obvious shape the instructions make hard to misread, and a recovery path that costs nothing when a result comes back malformed, is a case where a prompt-level request carries its own weight. Bolting on enforcement would buy schema-definition work and no protection you actually need.
The common thread is that these are the cases where structural variation is tolerable or recoverable. The moment a missing field or a wrong type causes a downstream failure that is expensive, silent, or hard to trace, that tolerance disappears and best-effort stops being a sufficient control. The skill is recognizing which case you are in, and catching the moment a system crosses from the forgiving case into the unforgiving one while its tooling still assumes the forgiving one.
The architectural payoff
The reason this distinction rises to the level of system design, rather than staying a local implementation detail, is that the guarantee changes where validation has to live and how much of it you need.
With best-effort output, every consumer of the model’s result has to assume the result might be malformed. Validation, type coercion, presence checks, and fallback handling spread outward from the model call into everything that touches its output. The model’s response is untrusted input, and untrusted input has to be guarded everywhere it flows. That defensive code is a tax paid on every integration point, and it is the kind of code that is easy to get subtly wrong because it handles cases that rarely fire.
With a structural guarantee, the model’s output stops being untrusted input and becomes a contract. The shape is known. Downstream code can be written against the schema as a fact rather than a hope, which removes an entire category of defensive handling and lets the structure of the data carry meaning the rest of the system can rely on. You still validate semantics, because a schema-conforming object can be structurally valid and still confidently wrong in substance, and no grammar will catch a correctly-typed field that holds the wrong value. But you no longer validate structure, because structure is no longer in question. That is a real and permanent reduction in the surface area where things can fail.
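The remaining semantic check might look like the following sketch. The `Refund` type and its business rules are hypothetical, chosen only to show an object that satisfies its schema in every field and is still wrong.

```python
from dataclasses import dataclass

@dataclass
class Refund:
    # Illustrative schema. Assume structure and types are guaranteed upstream.
    order_total: float
    refund_amount: float
    currency: str

def semantic_problems(r: Refund) -> list:
    """Checks no grammar can express: relationships between field values."""
    problems = []
    if r.refund_amount < 0:
        problems.append("refund_amount is negative")
    if r.refund_amount > r.order_total:
        problems.append("refund exceeds order total")
    return problems

# Every field is present and correctly typed, yet the value is wrong.
conforming_but_wrong = Refund(order_total=40.0, refund_amount=400.0, currency="USD")

print(semantic_problems(conforming_but_wrong))  # ['refund exceeds order total']
```

Note what the function does not contain: no presence checks, no type coercion, no fallback for a missing key. Those are the parts the structural guarantee removes.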
Choosing between best-effort and guaranteed structure is therefore not a choice about output formatting. It is a choice about where you are willing to absorb the risk of the model getting the shape wrong: distributed across every downstream consumer as defensive code, or eliminated at the source as a generation-time guarantee. Framed that way, the decision is the same one you make everywhere else in system design, which is whether to enforce an invariant once at a boundary or to re-check it everywhere it matters. When the schema is non-negotiable, you enforce it at the boundary, and constrained decoding is what makes that boundary real.
