Every Ambiguous Instruction Hands a Decision to the Model

The largest single lever over the quality of a model’s output is not the model, the sampling settings, or the length of the prompt. It is how completely the instruction defines what a good answer would be. An instruction that leaves a decision open does not leave it unmade. The model makes it, silently, on your behalf, and you find out which way it went only when you read the result. Specificity is the discipline of making those decisions yourself, in writing, before the model ever runs. Ambiguity is the practice of delegating them blindly and paying for the delegation later.

This is worth stating sharply because ambiguity rarely announces itself. A vague instruction does not fail loudly. It produces something plausible, something that survives a quick read, and the gap between what you wanted and what you got hides inside that plausibility. The cost is real, it compounds across every iteration, and it is almost entirely deferred to the moment you can least afford it.

A model resolves ambiguity instead of surfacing it

The defining property of an ambiguous instruction is that it admits more than one reasonable reading. The defining property of a model is that, faced with more than one reasonable reading, it typically does not stop to ask which you meant. It selects the interpretation it judges most likely and proceeds as if that interpretation were the only one. None of this is a defect; the model is operating as intended. The problem is that the selection happens out of view, and neither you nor the model registers that a fork existed at all.

This is what separates ambiguity from an outright error. An error produces something visibly wrong, which is the easy case, because you can see it and fix it. Ambiguity produces something quietly off-target that still looks correct. The output is coherent, well formed, and responsive to the instruction as the model read it. It simply answers a slightly different question than the one you were asking. You cannot catch that by checking whether the output is good in isolation. You can only catch it by comparing the output against an intent that lives in your head and was never written down, which is precisely the comparison that is hardest to perform reliably.

Issue the identical instruction on two separate runs and the outputs frequently diverge, sometimes sharply. That spread is not just the sampler at work. It is a readout of how much room the instruction left for the sampler to move in. A prompt that produces essentially the same answer on every run has closed its interpretation gaps. A prompt that wanders has left them open, and the width of the wander measures the size of the gap. Treating output variance as a property of the model rather than a property of the prompt is one of the most common and most expensive misdiagnoses in this work.
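One rough way to put a number on that spread is to collect the outputs of several runs of the same prompt and compare them pairwise. The sketch below is a minimal illustration, not a standard metric: the function name output_spread is hypothetical, and character-level similarity from Python's difflib is only a crude proxy, since it registers differences in wording and cannot tell whether two outputs answer the same question.

```python
from difflib import SequenceMatcher
from itertools import combinations


def output_spread(outputs: list[str]) -> float:
    """Return 1 minus the mean pairwise similarity of the outputs.

    0.0 means every run returned identical text; values near 1.0 mean
    the runs share almost no characters in common.
    """
    if len(outputs) < 2:
        raise ValueError("need at least two outputs to measure spread")
    ratios = [
        SequenceMatcher(None, a, b).ratio()
        for a, b in combinations(outputs, 2)
    ]
    return 1.0 - sum(ratios) / len(ratios)
```

Comparing this figure before and after tightening an instruction, with the sampling settings held fixed, gives a quick indication of whether the rewrite actually closed a gap.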

Specificity measures how much of the answer is already settled

It helps to be precise about what specificity actually is, because the word is often used loosely to mean “longer” or “more detailed,” and it means neither. Specificity measures how much of the answer the instruction has already decided. A maximally specific instruction admits essentially one correct answer. A maximally ambiguous one admits a wide field of answers, all defensible, most of them not what you had in mind. Every decision an instruction leaves open is a dimension along which the output is free to vary, and the model will exercise that freedom whether or not you intended to grant it.

Consider what an instruction as ordinary as “summarize this document” actually leaves undecided. Nothing in it fixes how long the result should run, whether it should arrive as prose or as a structured list, who the reader is, or which points are important enough to keep once the rest is dropped. Each of those is a real choice, and the model makes every one of them without telling you, substituting its judgment for yours at points you never saw. It reads as one request. It is really at least four separate decisions bundled under a single verb.
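To see how quickly those bundled decisions multiply, it can help to enumerate them. The sketch below is purely illustrative: the option lists are invented for the example, and the real choices open to a model are far less tidy. It only shows that the number of distinct readings is the product of the options along each open dimension, and that fixing one dimension divides it.

```python
from itertools import product

# Hypothetical option lists for the four decisions the instruction leaves open.
OPEN_DECISIONS = {
    "length": ["one sentence", "one paragraph", "one page"],
    "format": ["prose", "bulleted list"],
    "reader": ["expert", "newcomer", "executive"],
    "emphasis": ["conclusions", "methods"],
}


def readings(decisions: dict[str, list[str]]) -> list[dict[str, str]]:
    """Enumerate every combination of choices the instruction leaves open."""
    names = list(decisions)
    return [dict(zip(names, combo)) for combo in product(*decisions.values())]


all_readings = readings(OPEN_DECISIONS)

# Stating the format in the instruction collapses that dimension to one option.
format_fixed = readings({**OPEN_DECISIONS, "format": ["bulleted list"]})
```

With these made-up lists, all_readings has 3 × 2 × 3 × 2 = 36 entries and format_fixed has 18.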

The remedy is not to write more words. It is to close those questions. An instruction that names the length, the format, the reader, and the elements that must appear has resolved the same decisions in advance, and the output collapses toward predictability because the space of acceptable answers has collapsed with it. A stated format settles structure. A stated length settles scope. A named reader settles register and the depth of explanation. Because each constraint governs a different axis, they compound instead of overlapping, and the result is not a longer prompt so much as a smaller target. A smaller target is what consistent output requires.
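One way to make those decisions hard to skip is to write them down as fields before writing the instruction itself. The following is a minimal sketch under that assumption; SummarySpec, its field names, and the wording of the rendered instruction are all hypothetical, and any phrasing that states the same four decisions would serve.

```python
from dataclasses import dataclass, field


@dataclass
class SummarySpec:
    """Hypothetical container for the decisions a summary request leaves open."""
    max_words: int                # settles scope
    output_format: str            # settles structure, e.g. "a bulleted list"
    reader: str                   # settles register and depth
    must_include: list[str] = field(default_factory=list)

    def to_instruction(self) -> str:
        """Render the settled decisions as a single instruction string."""
        parts = [
            f"Summarize the document below for {self.reader}.",
            f"Write it as {self.output_format}.",
            f"Keep it under {self.max_words} words.",
        ]
        if self.must_include:
            parts.append("Cover these points: " + "; ".join(self.must_include) + ".")
        return " ".join(parts)


spec = SummarySpec(
    max_words=150,
    output_format="a bulleted list",
    reader="a new engineer on the team",
    must_include=["the decision made", "open risks"],
)
instruction = spec.to_instruction()
```

The rendered instruction is only a few sentences long. What changed is not its length but the fact that none of the four decisions can be left blank without the author noticing.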

This also clarifies the most useful diagnostic question to ask when an output disappoints. The instinct is to ask what the model did wrong. The better question is which decision the instruction forced the model to make for you. Most disappointing outputs are not the model failing to follow instructions. They are the model faithfully following an instruction that contained a gap, and filling that gap with a guess. Find the gap, name it, close it. That is usually a faster route to what you wanted than any amount of re-rolling the same vague prompt and hoping for a better draw.

Stating what to do beats stating what to avoid

There is a particular form of ambiguity that hides inside instructions that feel specific, and it deserves its own treatment: negative framing. Telling the model what not to do feels like a constraint, and in a narrow sense it is, but it constrains the wrong thing. “Don’t be too technical” removes one region of the output space and says nothing about where in the remaining space the answer should land. “Avoid long responses” rules out one end of a range and leaves the rest undetermined. Negation prunes. It does not point. The model still has to decide what the output should positively be, and you have given it no help with that decision, only a fence around one corner it should stay out of.

Positive instruction does the opposite. It names the operation you want performed. A verb like classify or rank carries a recognizable shape the model can execute directly, instead of leaving it to reverse-engineer an action from a list of prohibitions. The reliable pattern is to state the positive goal first, then layer constraints onto that goal, rather than to circle the goal with a ring of negations and hope the model infers the center. When you catch yourself writing a string of “don’t” clauses, it is usually a sign that you have not yet decided what you actually want, only what you fear, and the fix is to make the positive decision the negations were standing in for.
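The habit of catching a string of “don’t” clauses can be partly mechanized. The sketch below is a rough lint and nothing more: the function name negative_clauses and the short list of negation cues are assumptions made for illustration, the cue list is deliberately incomplete, and a flagged sentence is a prompt to reconsider, not proof of a problem.

```python
import re

# Illustrative, incomplete list of negation cues; accepts straight or curly apostrophes.
NEGATION = re.compile(r"\b(?:don['’]t|do not|avoid|never)\b", re.IGNORECASE)


def negative_clauses(instruction: str) -> list[str]:
    """Return the sentences of an instruction that contain a negation cue."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", instruction.strip())
    return [s for s in sentences if NEGATION.search(s)]


flagged = negative_clauses(
    "Classify each ticket as bug or feature. "
    "Don't be too technical. Avoid long responses."
)
```

Each flagged sentence marks a place where a positive decision may still be missing, such as a named reader in place of “don’t be too technical” or a stated length in place of “avoid long responses.”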

A related discipline is to hold each instruction to a single objective. An instruction that bundles several distinct objectives forces the model to divide its attention across them, and the quality of each one degrades in proportion to how many it is competing with. This is the seam where clarity gives way to a separate concern, the decomposition of a task into ordered steps, which is its own subject. The point that belongs here is narrower: even before you reach for decomposition, a single instruction carrying a single clear goal is easier to make specific than one carrying three, because there is only one thing whose acceptable outputs you have to pin down.

The savings are immediate and the bill is deferred

The reason ambiguity persists, despite being so easy to describe, is that its cost structure is perfectly designed to escape notice. Writing a vague instruction is fast and feels productive. The instruction is short, it reads sensibly, and it produces output immediately. All of the savings land at writing time, where they are visible and gratifying. All of the costs land later, at review time, where they are diffuse and easy to misattribute. The accounting runs exactly backwards: the work is cheap precisely where you are paying attention and expensive precisely where you are not.

The heaviest of those later costs is debugging, and it is worth seeing why it weighs so much. A disappointing output cannot be diagnosed by reading what came back, because what came back is internally coherent. You have to reconstruct the assumption the model must have made, settle the decision you never made explicit, and re-encode it in the instruction. Underneath that sits a quieter tax: an underspecified prompt returns a distribution of behaviors rather than a fixed one, so you cannot form stable expectations about it or trust it as a component of anything larger, and a result that read as acceptable today may not tomorrow. Each of these is cheap to absorb once. Across the dozens of iterations any real system accumulates, the deferred cost dwarfs whatever was saved by writing quickly at the start.

There is a compounding effect on top of the per-iteration cost. An ambiguous instruction does not just cost more to fix once. It resists being fixed, because each rewrite that fails to identify the actual gap produces a new output that is differently wrong, which sends you back to guessing. Precision at writing time is not merely cheaper in total. It converts an open-ended search into a closed one. You stop sampling from a distribution of plausible-looking results and start verifying against a target you defined. That shift, from searching to verifying, is the real return on specificity, and it is why the time spent closing gaps up front is almost always recovered several times over.
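Once the target is written down, part of the verification can be automated. The sketch below assumes a target defined by just two constraints, a word limit and a list of terms that must appear; the function check_output and both parameters are hypothetical, and checks like these cover form only, not whether the content is correct.

```python
def check_output(text: str, max_words: int, required_terms: list[str]) -> list[str]:
    """Return the ways the text misses the stated target (empty list if none)."""
    problems = []
    word_count = len(text.split())
    if word_count > max_words:
        problems.append(f"too long: {word_count} words, limit {max_words}")
    lowered = text.lower()
    for term in required_terms:
        # Case-insensitive substring match; crude, but enough for a sketch.
        if term.lower() not in lowered:
            problems.append(f"missing required term: {term}")
    return problems
```

An empty result means the output met every stated constraint. A non-empty one names the specific miss, which is what turns re-rolling and hoping into checking against something you defined.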

The dimensions that actually close gaps

Turning a vague instruction into a precise one is more tractable than it sounds, because the gaps tend to fall along a small number of recurring dimensions, and naming any of them closes a real category of variance.

Audience is the quietest lever and often the most powerful. Telling the model who the result is for fixes vocabulary, tone, depth, and the amount of background it is entitled to assume, every one of which it would otherwise invent from nothing. Format is the most mechanical and tends to pay back fastest: declaring prose, a list, a table, or structured fields collapses an entire axis of interpretation in a few words, and structure left unstated is one of the biggest sources of output you end up reshaping by hand. Length governs scope, and it does more than cap size; it forces a decision about what earns a place and what gets cut, rather than letting the model emit whatever volume feels proportionate. Perspective sets the point of view, a grammatical person or a named role the answer should come from, and it settles a surprising amount of stance and register at once.

These are independent levers. Any one of them narrows the output space on its own, and applying all of them to a single instruction is not over-engineering. It is closing the four most common sources of variance in one pass, which is the cheapest moment to close them.
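The four dimensions above can be treated as a checklist to run before an instruction goes out. The following is a minimal sketch of that idea; PromptDimensions and open_dimensions are hypothetical names, and treating an empty field as a decision handed to the model is a simplification made for the example.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class PromptDimensions:
    """The four recurring dimensions; None means the decision is still open."""
    audience: Optional[str] = None
    output_format: Optional[str] = None
    length: Optional[str] = None
    perspective: Optional[str] = None


def open_dimensions(dims: PromptDimensions) -> list[str]:
    """Names of the dimensions left unset, in declaration order."""
    return [f.name for f in fields(dims) if not getattr(dims, f.name)]


draft = PromptDimensions(audience="hospital billing staff", length="under 200 words")
still_open = open_dimensions(draft)
```

Here still_open names output_format and perspective, the two decisions this draft would leave for the model to make.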

Specificity has a ceiling, and the failure modes on the other side

The argument so far runs entirely in one direction, and an experienced reader will rightly ask where it stops. It stops in three places, and recognizing them is part of using specificity well rather than mechanically.

The first is exploration. When you do not yet know what you want, premature specificity is not rigor; it is a guess you have frozen into a requirement. If the goal is to see the range of what a model might produce, to discover framings you had not considered, or to let the output surprise you in a useful way, constraining it hard defeats the purpose. Specificity is the right tool when you have a target and want to hit it reliably. It is the wrong tool when you are still deciding what the target is, and forcing it early simply locks in your first guess as if it were a decision.

The second is the difference between specifying form and specifying substance. It is entirely possible to write an instruction that is tightly constrained along every dimension above and still produces a confidently wrong answer, because the constraints governed the shape of the output and not its content. A precisely formatted summary of a document the model misunderstood is still a misunderstanding, now harder to spot because it looks authoritative. Specificity about format, length, audience, and perspective closes interpretation gaps. It does not supply knowledge the model lacks or context the prompt withheld. Treating it as if it could is a way of polishing the wrong thing, and the polish makes the underlying error easier to miss, not easier to catch.

The third is the point where stacking constraints onto one instruction stops being precision and becomes a symptom. If an instruction needs five or six constraints to be even roughly reliable, the constraints are often compensating for a task that is doing too much in a single step. Past a certain density, adding another clause yields less than splitting the instruction would, and the constraints begin to interact in ways that are themselves a source of ambiguity, as when two of them quietly pull in opposite directions. The signal to watch for is the feeling of trying to make one prompt airtight by piling on qualifications. That feeling is usually telling you the unit of work is wrong, not that you need one more qualification. Where that line falls, and how to decompose past it, is the subject that picks up where this one ends.

Specificity is the cheapest reliability you can buy

The throughline is a single accounting fact. Every gap an instruction leaves open is a decision the model will make for you, invisibly, and the bill for that decision comes due at review time, when it is most expensive to pay and hardest to attribute. Closing the gap in advance moves the cost to writing time, where it is small, visible, and paid once. That trade, a little deliberate effort up front against a large diffuse cost later, is almost always worth making, and it is available on essentially every instruction you write.

The practical form of the discipline is a habit, not a rule. Before running an instruction, ask whether a careful reader could take it more than one way, and if so, which decisions you are about to hand off. Name what the output should include and where it should stop. State the goal positively and let the constraints follow it. Then run it, and when the result disappoints, resist the urge to blame the model and ask instead what you left for it to assume. Specificity will not make a model more capable than it is. What it will do is ensure that the model spends its capability answering your question rather than a plausible neighbor of it, which, more often than not, is the entire difference between output you can rely on and output you have to second-guess.