A Tool Schema Steers the Call Before It Ever Guards It

Once a model has chosen a tool, one artifact decides whether the resulting call is any good: the input schema. It is read at two different moments by two different readers. The model reads it while it writes the arguments, and your system reads it when it decides whether to accept them. Most teams design a schema entirely for the second reading, as a gate that rejects malformed input, and treat the first reading as an accident that happens to work out. That is backwards. The schema’s most valuable work happens at generation time, before there is anything to validate, because the same declarations that let your code reject a bad call are what stop the model from producing one in the first place. A schema written only to guard the boundary leaves the harder half of the job undone.

The reframe that makes this tractable is to stop thinking of the schema as a passive description of accepted input and start thinking of it as an active instruction to the thing generating that input. Every property you declare, every constraint you attach, every name you choose is in the model’s context at the moment it constructs the call, shaping the arguments as they are written. Precision there is the single largest lever on tool-call reliability, larger than any amount of validation logic downstream, because validation can only reject what the model already got wrong, while the schema can keep it from going wrong at all.

The model reads the schema while it writes the arguments

The mechanical fact underneath all of this is that the model builds each argument by reading the schema as it generates, in this call, right now. A parameter declared as nothing more than a string is a parameter the model fills with whatever value looks plausible given the surrounding conversation. Add a description, a format hint, a pattern, or a bounded range, and you narrow the space of things it can plausibly write. The declaration is not documentation the model consults if it feels uncertain. It is part of the prompt the model is answering, and it exerts its influence whether or not you intended it to.

This is also where a dangerous assumption tends to live. It is easy to believe that because you declared a field as an integer, or listed it in the required set, the platform will hand you a conforming call or refuse to make one. By default that is not what happens. The declared schema guides generation, but it does not bind it. The model can still emit a string where you asked for a number, or omit a field you marked required, because ordinary generation is probabilistic and the schema is a strong suggestion rather than a hard rail. Conformance becomes a guarantee only under constrained decoding, where generation is forced through the schema itself and the sampler is never offered a token that would break it. Absent that, accepting a tool call without validating it in your own code is a mistake, and it is the most common way a schema that looks airtight lets bad arguments through.

So the schema plays two separable roles, and keeping them distinct clarifies everything after. As a generation-time guide, it is always in effect, always shaping what the model writes, and the quality of that shaping depends entirely on how specific the declarations are. As an enforcement mechanism, it is in effect only when something actually checks it: either constrained decoding at the platform level, or a validation pass in your own code before the tool runs. Design the schema for both readings. Write it precisely enough that the model rarely produces a bad call, then validate it anyway, because “rarely” is not “never” until the grammar is enforcing it.
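A minimal sketch of the enforcement role, run in your own code before the tool executes. The tool (get_order), its fields, and the helper validate_args are hypothetical names invented for illustration, and the checks cover only required fields and basic types; a production system would more likely lean on a full JSON Schema validator.

```python
from typing import Any

# Hypothetical schema for an illustrative tool; every name here is an assumption.
GET_ORDER_SCHEMA = {
    "type": "object",
    "properties": {
        "order_id": {"type": "string"},
        "line_limit": {"type": "integer"},
    },
    "required": ["order_id"],
}

# Map JSON Schema type names to Python types for this sketch.
_PY_TYPES = {"string": str, "integer": int, "boolean": bool}


def validate_args(schema: dict[str, Any], args: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the call may run."""
    errors = []
    for name in schema.get("required", []):
        if name not in args:
            errors.append(f"missing required field: {name}")
    for name, value in args.items():
        spec = schema["properties"].get(name)
        if spec is None:
            continue  # undeclared fields are ignored in this sketch
        expected = _PY_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it for integer fields.
        wrong_type = not isinstance(value, expected)
        bool_as_int = expected is int and isinstance(value, bool)
        if wrong_type or bool_as_int:
            errors.append(f"{name}: expected {spec['type']}")
    return errors
```

A call such as validate_args(GET_ORDER_SCHEMA, {"line_limit": "5"}) reports both the omitted required field and the string that arrived where an integer was declared, which are exactly the two slips unconstrained generation can still make.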

Names are the first thing the model reads

A parameter’s name is not a mechanical handle the way it is in code, where the compiler cares about the binding and the string could be anything. To the model, the name is a semantic signal, often the first one it uses to decide what value belongs in the field. A field called id could reasonably hold a user identifier, an order number, a product key, or a session token, and the model will pick whichever the surrounding context makes most available, which is not necessarily the one you meant. Name it customer_id and the field can only mean one thing, so the range of plausible-but-wrong fills is gone before the model reaches it. That is the whole value of a specific, domain-meaningful name: it narrows a wide set of confident guesses down to the one you actually want.

The anti-patterns are worth naming because they recur in almost every schema that produces mysterious argument errors. Generic names like data, input, or value carry no information about purpose, so the model has nothing to work from but the conversation, and it guesses. Overloaded names, where the same token means one thing in one tool and something else in another, are worse than uninformative, because the model accumulates associations across a session and carries them into places they do not belong. Cryptic abbreviations force the model to reconstruct meaning from a string that was compressed for a human who already knew the domain, and it will sometimes reconstruct the wrong meaning. Each of these produces the same failure: a call that is structurally valid, passes a shape check, and carries content that is quietly incorrect. That failure is more expensive than a rejected call, because nothing flags it until the wrong value has already done its work downstream.
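One way to make the naming point concrete is a small lint pass over a schema's property names. This is only a sketch: the two schemas, the field names in them, and the helper flag_generic_names are hypothetical, and the list of generic names is simply the handful mentioned above.

```python
# Names that carry no information about purpose (taken from the text above).
GENERIC_NAMES = {"id", "data", "input", "value"}

# Hypothetical schema with uninformative names.
VAGUE_SCHEMA = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "data": {"type": "string"},
    },
}

# The same hypothetical tool with domain-meaningful names.
SPECIFIC_SCHEMA = {
    "type": "object",
    "properties": {
        "customer_id": {"type": "string"},
        "refund_reason": {"type": "string"},
    },
}


def flag_generic_names(schema: dict) -> list[str]:
    """Return property names that appear in GENERIC_NAMES, sorted."""
    properties = schema.get("properties", {})
    return sorted(name for name in properties if name.lower() in GENERIC_NAMES)
```

Here flag_generic_names(VAGUE_SCHEMA) returns ["data", "id"], while the specific schema comes back clean. A check like this catches only the generic-name anti-pattern; overloaded names and cryptic abbreviations still need a human read.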

A parameter description steers what fills the field

Descriptions on individual parameters are read at generation time in exactly the way the tool-level description is read when the model is choosing which tool to call. The difference is what they steer. The tool description decides which tool wins the request. The parameter description decides what goes into a field once that tool has been chosen. Treating a parameter description as human-facing documentation, the kind you write for whoever maintains the code, wastes its most important function.

The contrast is easiest to see at the extremes. A description that reads “the user’s identifier” is technically accurate and nearly useless, because it tells the model the field holds an identifier without saying which one, where it comes from, or what form it takes. A description that reads “the authenticated user’s identifier taken from the current session, a thirty-six character hyphenated hexadecimal string” names the source, fixes the format, and pins the length at once, leaving very little room to write the wrong thing. The gain is largest exactly where it matters most, on fields whose format has to be precise or the call fails outright. A description is aimed at the thing generating the value, not at whoever maintains the code, and the fields that most need one are those where a plausible guess is indistinguishable from a correct value right up until it breaks.
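The two descriptions can be set side by side as property declarations. This is a sketch under one loud assumption: the text says only "thirty-six character hyphenated hexadecimal string", and the 8-4-4-4-12 grouping in the pattern below is a guess at the intended layout, not something the description itself states. The names WEAK_USER_ID, STRONG_USER_ID, and matches_declared_form are hypothetical.

```python
import re

# Accurate but nearly useless: says nothing about source, format, or length.
WEAK_USER_ID = {
    "type": "string",
    "description": "The user's identifier.",
}

# Names the source, fixes the format, and pins the length.
STRONG_USER_ID = {
    "type": "string",
    "description": (
        "The authenticated user's identifier taken from the current session, "
        "a thirty-six character hyphenated hexadecimal string."
    ),
    # Assumption: lowercase hex in 8-4-4-4-12 groups (36 characters in total).
    "pattern": r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
}


def matches_declared_form(value: str) -> bool:
    """Check a candidate value against the pattern on STRONG_USER_ID."""
    return re.fullmatch(STRONG_USER_ID["pattern"], value) is not None
```

Pairing the description with a pattern gives the two readers the same rule in the form each can use: prose for the model, a regular expression for the validator.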

Every required field is a new way for the call to fail

The required set is where schema design does its most consequential work, and where the instinct to be thorough does the most damage. Marking a field required is not a statement about how important the field is. It is a promise that a value for it will be available in every context from which the tool can be called. When that promise holds, the field belongs in the required set. When it does not, you have built a tool that fails whenever it is invoked in a context that lacks the value, and the failure is often not a clean one. A model that cannot source a required value rarely declines the call; it supplies something shaped like the answer. That is strictly worse than a missing argument, because the invented value clears every structural check and reaches your tool untouched, where it produces a confidently wrong result instead of an honest error.

The discipline that follows is to keep the required set as lean as the tool’s actual dependencies. Required status is earned, not assigned: reserve it for the one or two values the tool cannot compute for itself, default around anything it can produce a sensible result without, and where a field is context that only sometimes exists, make the code that reads it responsible for its absence. A lean required set widens the range of contexts in which the model can call the tool successfully, which is the same thing as saying it reduces the number of situations that push the model toward hallucinating a value. Every field you move out of the required set is one less promise you have to keep on every call.

The corollary is that optional fields carry an obligation of their own. An optional parameter is a declaration that the tool behaves correctly when the field is absent, and that behavior has to actually exist. Either the schema declares a default, or the tool implementation handles the missing value deliberately. An optional field with no default and no handling is not flexible; it is a latent crash waiting for the first call that omits it.
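A lean required set and honest optional fields might look like the following sketch. The tool (search_tickets), its fields, and the helper apply_defaults are hypothetical. Note that in JSON Schema a default is only an annotation, so this sketch applies defaults itself rather than assuming a validator will.

```python
from typing import Any

# Hypothetical schema: one required field, everything else defaulted or handled.
SEARCH_TICKETS_SCHEMA = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "page_size": {"type": "integer", "default": 20},
        "include_closed": {"type": "boolean", "default": False},
        # Optional with no default: its absence is handled in the tool code.
        "assignee_email": {"type": "string"},
    },
    "required": ["query"],
}


def apply_defaults(schema: dict[str, Any], args: dict[str, Any]) -> dict[str, Any]:
    """Copy args and fill in any declared default the caller omitted."""
    filled = dict(args)
    for name, spec in schema["properties"].items():
        if name not in filled and "default" in spec:
            filled[name] = spec["default"]
    return filled


def search_tickets(args: dict[str, Any]) -> dict[str, Any]:
    """Resolve the effective search parameters (the search itself is omitted)."""
    resolved = apply_defaults(SEARCH_TICKETS_SCHEMA, args)
    return {
        "query": resolved["query"],
        "page_size": resolved["page_size"],
        "include_closed": resolved["include_closed"],
        # A missing assignee deliberately means "do not filter by assignee".
        "assignee_filter": resolved.get("assignee_email"),
    }
```

Calling search_tickets({"query": "refund"}) succeeds with a page size of 20, closed tickets excluded, and no assignee filter, so the model is never pushed to invent an email address just to get the call accepted.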

Enums turn a freeform field into a bounded decision

When a parameter accepts only a fixed set of values, declaring those values as an enumeration changes what the model is doing from generation to selection. Instead of writing a string and hoping it matches what your code expects, the model chooses from the list you supplied, and the field becomes far more predictable: strongly steered by the list itself, and guaranteed once constrained decoding or validation enforces it. The value of this is easiest to see on the fields that route. A priority field left as a plain string invites every variation the conversation happens to suggest, and code that dispatches on exact values breaks the moment one arrives that it did not anticipate. Constrain the same field to a small enumeration and every value that reaches the dispatcher is one you listed, because anything else is either never generated or rejected by a single membership check, so the downstream code that would have normalized the string is no longer needed and an entire class of argument errors stops existing. One change to the schema removes work from your code and removes a failure mode from the system at once.

Two kinds of field earn an enum. One is a routing value, where the point is that the string stays stable enough for a dispatcher to switch on it. The other is a label the model assigns when it sorts an item into one of a closed set of categories, where the fixed list is what keeps those labels consistent enough for downstream code to trust. Both depend on the valid set being small and stable, and that dependency is also the boundary. An enumeration works well for a handful of values that rarely change. Once the set runs to hundreds of members, or churns frequently, an enum becomes a liability: it bloats the schema the model has to read on every call, and it goes stale the moment the real set of valid values moves past the one you hardcoded. Large or volatile value sets belong behind a different validation strategy, checked against a live source rather than frozen into the schema.
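For a large or volatile set, the check can move out of the schema and into code that consults the source of truth. The sketch below assumes a hypothetical product catalog exposed through a loader function; the names SKU_FIELD, make_sku_checker, and load_valid_skus are invented for illustration, and a real loader would probably cache rather than reload on every call.

```python
from typing import Callable, Iterable

# The schema stays small: a described string, not a frozen list of members.
SKU_FIELD = {
    "type": "string",
    "description": "A product SKU that exists in the live catalog.",
}


def make_sku_checker(
    load_valid_skus: Callable[[], Iterable[str]],
) -> Callable[[str], bool]:
    """Build a validator that asks the live source on every check."""

    def is_valid(sku: str) -> bool:
        # Reloading each time keeps the answer in step with the catalog.
        return sku in set(load_valid_skus())

    return is_valid
```

Because the checker reads the catalog at validation time, a SKU added after the schema was published is accepted without touching the schema at all.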

Constraints move errors off your backend and out of the model’s reach

Beyond names, descriptions, and the required set, the schema vocabulary offers a set of constraints whose combined effect is to shrink what the model can generate before anything reaches your code. Length bounds keep a string field inside a sane range. A pattern constraint requires a value to match a regular expression, which is the sharpest tool available for identifiers, codes, and any string whose form is fixed. Format hints attach a semantic meaning to a string, signaling that it represents a timestamp or an address rather than arbitrary text, and that meaning is read by the model generating the value and by any validator configured to enforce formats. Numeric bounds hold a page size, a score, or a retry count inside the range the downstream logic can actually handle, catching a value that would otherwise fail three layers deeper. And a rule that forbids undeclared fields locks the call to exactly the shape you defined, refusing any extra property the model might invent, which is worth reaching for whenever you need tight and predictable input.

Each constraint does double duty. It narrows the model’s generation surface, so fewer malformed values are produced, and it gives your validation something concrete to enforce, so the malformed values that do slip through are caught at the boundary instead of in the backend. The reason to add them is not primarily defensive. It is that a constraint is another piece of the instruction the model reads while it writes the call, and a well-constrained field is one the model finds it harder to fill incorrectly.
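The constraints named above map onto standard JSON Schema keywords: minLength and maxLength, pattern, format, minimum and maximum, and additionalProperties. The sketch below declares them on a hypothetical list_invoices tool and enforces a subset by hand. The field names, the INV- code format, and the check helper are all assumptions, and the helper deliberately skips type and format checking to stay short.

```python
import re
from typing import Any

# Hypothetical tool schema; field names and the INV- format are assumptions.
LIST_INVOICES_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_code": {"type": "string", "pattern": r"^INV-[0-9]{6}$"},
        "note": {"type": "string", "minLength": 1, "maxLength": 200},
        "issued_after": {"type": "string", "format": "date-time"},
        "page_size": {"type": "integer", "minimum": 1, "maximum": 100},
    },
    "required": ["invoice_code"],
    "additionalProperties": False,
}


def check(schema: dict[str, Any], args: dict[str, Any]) -> list[str]:
    """Enforce required, additionalProperties, pattern, length, and range."""
    props = schema["properties"]
    errors = []
    if schema.get("additionalProperties") is False:
        errors += [f"{k}: undeclared field" for k in args if k not in props]
    errors += [f"{k}: missing" for k in schema.get("required", []) if k not in args]
    for name, value in args.items():
        spec = props.get(name)
        if spec is None:
            continue  # already reported as undeclared, if that rule is on
        if isinstance(value, str):
            if "pattern" in spec and not re.search(spec["pattern"], value):
                errors.append(f"{name}: does not match pattern")
            if len(value) < spec.get("minLength", 0):
                errors.append(f"{name}: too short")
            if len(value) > spec.get("maxLength", len(value)):
                errors.append(f"{name}: too long")
        if isinstance(value, int) and not isinstance(value, bool):
            low = spec.get("minimum", value)
            high = spec.get("maximum", value)
            if value < low or value > high:
                errors.append(f"{name}: out of range")
    return errors
```

Each error names the field and the rule it broke, so a rejected call can be handed back with something specific to correct instead of a generic failure.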

Constraints encode assumptions, and wrong assumptions turn brittle

The failure mode on the other side is over-constraint, and it is real enough to take seriously. Every constraint you add is an assumption about the domain frozen into the schema, and an assumption that is slightly wrong becomes a call that is reliably broken. A pattern that is very nearly right will reject the legitimate value that falls just outside it, and because the pattern also shapes generation, the model may contort its output trying to satisfy a rule that should not have applied, in the same way an over-broad required set pushes it toward fabrication. Bounds set too narrow reject valid extremes. An enumeration that omits a value the domain actually uses forces the model to choose a wrong label because the right one is not on offer. The tighter the schema, the more it presumes to know, and the more it costs when the presumption is off.
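A small illustration of a pattern that is very nearly right, using United States postal codes as an assumed example domain. The five-digit rule looks airtight until a legitimate ZIP+4 value arrives. The names TOO_TIGHT, RELAXED, and accepts are hypothetical.

```python
import re

# Assumes every postal code is exactly five digits.
TOO_TIGHT = r"^[0-9]{5}$"

# Also allows the optional four-digit ZIP+4 extension.
RELAXED = r"^[0-9]{5}(-[0-9]{4})?$"


def accepts(pattern: str, value: str) -> bool:
    """Return True when the whole value matches the pattern."""
    return re.fullmatch(pattern, value) is not None
```

With the tight pattern, accepts(TOO_TIGHT, "94103-1234") is False even though the value is valid, so the stricter schema turns a correct argument into a reliably rejected call. The relaxed pattern accepts both forms and still rejects a four-digit fragment.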

The balance is to constrain what genuinely must hold and to leave slack where the domain legitimately varies. Precision is the goal, not maximal restriction, and the two are not the same. There is also a class of rule the schema simply cannot carry: constraints that depend on the relationship between fields, where one parameter is required only because another was supplied, or a value is valid only in combination with a second. A shape language expresses those awkwardly or not at all, and forcing them into the schema produces a declaration that is both hard to read and still incomplete. That logic belongs in the tool implementation, enforced in code and surfaced through a useful error when it is violated, which is a different part of the contract than the one the schema holds.
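Cross-field rules of this kind might be enforced in the tool implementation along the following lines. The tool, its fields (delivery, recipient_email, start_date, end_date), and the ToolInputError class are hypothetical, and the sketch assumes per-field shape checks have already passed, so both dates are present and ISO-formatted.

```python
from datetime import date
from typing import Any


class ToolInputError(ValueError):
    """Raised with a message meant to be returned to the model."""


def check_report_args(args: dict[str, Any]) -> None:
    """Enforce rules that depend on more than one field at a time."""
    # One parameter is required only because another was supplied.
    if args.get("delivery") == "email" and not args.get("recipient_email"):
        raise ToolInputError(
            "recipient_email is required when delivery is 'email'; "
            "supply it or set delivery to 'download'."
        )
    # A value is valid only in combination with a second one.
    start = date.fromisoformat(args["start_date"])
    end = date.fromisoformat(args["end_date"])
    if start > end:
        raise ToolInputError(
            f"start_date {start} is after end_date {end}; correct one of them."
        )
```

The error messages say which rule failed and how to satisfy it, which gives the model something to act on when it retries the call.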

The contract is only as good as its weakest reader

A tool schema is a contract, and the thing that makes designing one unusual is that one of the parties to the contract is a probabilistic reader that fills in whatever the contract fails to specify. A contract written for a deterministic system can rely on the system to fault loudly on anything it did not expect. A contract read by a model cannot, because the model will always produce something, and if the schema left a field underspecified it will produce a plausible guess and hand it over as though it were certain. That is why the schema has to be written for the reader that generates the call, not only for the code that inspects it afterward. Specific names, descriptions that carry the source and the format, a required set pared to genuine dependencies, enumerations in place of freeform strings wherever the values are fixed, and constraints that bound what must be bounded and no more: each of these is a place where the schema stops being a guess and becomes an instruction.

Do that work and the payoff compounds, because a call that was well-formed at generation is a call you rarely have to reject, retry, or reason about after the fact. Skip it and you inherit the opposite economy, catching at the boundary, or failing to catch at all, everything a precise schema would have prevented. Validation and error handling still have their place, and they always will, because guaranteed conformance requires either enforcement in code or grammar-constrained decoding standing behind the declarations. But a malformed tool call that was never generated is one you never have to catch, and the schema is where you stop it from being generated in the first place.