The production AI code we wrote in 2023 was mostly prompt engineering with a thin Python wrapper. "Please return JSON in this shape." "Do not wrap the JSON in markdown." "Only output valid JSON and nothing else." Retry on parse failure. Sleep. Retry again.
The code we write in 2026 looks different. The prompt is shorter. The schema is longer. And the schema is now the contract.
This is not a minor change. It has shifted the shape of the craft enough that hiring the right engineer now looks different than it did eighteen months ago.
What changed
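Every major provider now offers a structured-output mode: OpenAI's strict JSON schema, Anthropic's tool-use JSON schemas, Google's response_schema, plus open models via grammar-constrained decoding (llguidance, xgrammar, outlines, and friends). In their strict forms these are not hints to the model. They are decoder-level constraints: tokens that would break the schema are masked out during sampling, so a completed response cannot be structurally invalid. Truncation at the token limit and refusals are the remaining ways to get back something that does not parse.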
The practical effect. Before: we wrote long prompts explaining what the output should look like, caught parse failures, retried with a differently worded prompt, logged the failures, and still saw a small percentage leak through. After: we define the schema, let the decoder constrain the token stream, and get valid structure back on every call.
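For concreteness, the "before" half of that comparison might look like the sketch below. It is an illustration, not our production wrapper: call_model is a hypothetical stand-in for whatever provider client is in use, the stub replies are invented, and the sleep between attempts is omitted.

```python
import json
from typing import Callable


def call_with_retries(call_model: Callable[[str], str], prompt: str,
                      max_attempts: int = 3) -> dict:
    """2023-style wrapper: parse the reply, and on failure re-ask more sternly."""
    last_error = None
    for _ in range(max_attempts):
        raw = call_model(prompt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc
            # Reword the prompt and try again.
            prompt += "\nOnly output valid JSON and nothing else."
    raise ValueError(f"no valid JSON after {max_attempts} attempts") from last_error


# Hypothetical stand-in model: chatty on the first call, clean on the second.
_replies = iter(['Sure! Here you go: {"status": "ok"}', '{"status": "ok"}'])
result = call_with_retries(lambda prompt: next(_replies), "Return JSON.")
```

With a decoder-level constraint in place, the loop and the prompt rewording have nothing left to do, and the wrapper shrinks to a single json.loads on the reply.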
Our per-call reliability on schema-conforming output went from about 96 per cent to effectively 100 per cent on every task we moved across. The retry logic came out of the code. The post-processing came out of the code. Observed latency fell because we stopped retrying.
That is before we talk about the engineering wins.
The schema is the new spec
When the schema controls the shape, the prompt is free to focus on what the fields mean. The engineering time that used to go into "how do I get it to stop wrapping things in markdown" now goes into designing the schema itself, and designing schemas for LLMs is genuinely a different skill from designing schemas for HTTP APIs.
Three patterns we rely on.
Enums as routers. Model decisions that change downstream behaviour become enum fields. action: "approve" | "reject" | "escalate" is cleaner than scanning a free-text rationale for one of those three words. The downstream code dispatches on the enum; the free-text rationale sits next to it as an explanation for humans in the UI, not as a control signal.
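A minimal sketch of the enum-as-router pattern, using only the standard library. The handler functions, the case_id parameter and the DECISION_SCHEMA name are hypothetical; the schema is ordinary JSON Schema of the kind a structured-output mode would accept.

```python
import json
from enum import Enum


class Action(str, Enum):
    APPROVE = "approve"
    REJECT = "reject"
    ESCALATE = "escalate"


# JSON Schema handed to the provider; the enum values come from the class above.
DECISION_SCHEMA = {
    "type": "object",
    "properties": {
        "action": {"type": "string", "enum": [a.value for a in Action]},
        "rationale": {"type": "string"},
    },
    "required": ["action", "rationale"],
    "additionalProperties": False,
}


# Hypothetical downstream handlers.
def approve(case_id: str) -> str:
    return f"{case_id}: approved"


def reject(case_id: str) -> str:
    return f"{case_id}: rejected"


def escalate(case_id: str) -> str:
    return f"{case_id}: sent to a human reviewer"


HANDLERS = {Action.APPROVE: approve, Action.REJECT: reject, Action.ESCALATE: escalate}


def dispatch(raw: str, case_id: str) -> str:
    """Route on the enum only; the rationale is never inspected for control flow."""
    payload = json.loads(raw)
    action = Action(payload["action"])  # raises ValueError on an unknown value
    return HANDLERS[action](case_id)
```

Nothing in dispatch reads the rationale string, which is the point: the wording of the explanation can change freely without touching the routing.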
Nested structures with optional fields. A tool call with an optional explanation, an optional confidence, an optional follow-up suggestion. The model fills what it has. Nothing gets stuffed into a single free-text string and then regex-parsed out later. Optional fields are a permission structure: the model tells us what it knows, and says nothing for what it does not.
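One way to model this on the receiving side is a dataclass whose optional fields default to None. The field names below are illustrative assumptions. Some strict modes require every property to be listed as required, so an "optional" field may arrive as an explicit null rather than be absent; the sketch treats both the same way.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolCall:
    name: str
    arguments: dict
    explanation: Optional[str] = None
    confidence: Optional[float] = None
    follow_up: Optional[str] = None


def parse_tool_call(payload: dict) -> ToolCall:
    """Build a ToolCall; a missing key and an explicit null both become None."""
    return ToolCall(
        name=payload["name"],
        arguments=payload["arguments"],
        explanation=payload.get("explanation"),
        confidence=payload.get("confidence"),
        follow_up=payload.get("follow_up"),
    )


# The model filled in a confidence but offered no explanation or follow-up.
call = parse_tool_call(
    {"name": "lookup_order", "arguments": {"order_id": 42}, "confidence": 0.8, "follow_up": None}
)
```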
Union types for branching. Different output shapes for different outcomes. A Response type with discriminated variants like { kind: "answer", content: ... } vs { kind: "clarifying_question", question: ... } vs { kind: "refusal", reason: ... }. This is where schema design is most like product design. You are declaring, ahead of time, the entire surface of what the system can say. If the variants are wrong, the product is wrong. If the variants are right, the code on both sides becomes exhaustively checkable.
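A sketch of the three variants above as a discriminated union, parsed on the kind field. The class names and the render function are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Answer:
    content: str


@dataclass
class ClarifyingQuestion:
    question: str


@dataclass
class Refusal:
    reason: str


Response = Union[Answer, ClarifyingQuestion, Refusal]


def parse_response(payload: dict) -> Response:
    """Pick the variant from the "kind" discriminator."""
    kind = payload.get("kind")
    if kind == "answer":
        return Answer(content=payload["content"])
    if kind == "clarifying_question":
        return ClarifyingQuestion(question=payload["question"])
    if kind == "refusal":
        return Refusal(reason=payload["reason"])
    raise ValueError(f"unknown response kind: {kind!r}")


def render(response: Response) -> str:
    """Caller side: one branch per variant, and nothing else can arrive."""
    if isinstance(response, Answer):
        return response.content
    if isinstance(response, ClarifyingQuestion):
        return "Question: " + response.question
    if isinstance(response, Refusal):
        return "Cannot help: " + response.reason
    raise TypeError(f"unhandled variant: {type(response).__name__}")
```

On Python 3.11 or later, ending render with typing.assert_never instead of the final raise lets a static type checker flag a variant that was added to the union but never handled.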
What the prompt is for now
Three things, roughly in this order.
The context. What the model is being asked to do, in plain language. Two or three sentences.
The semantics. What each enum value means. What triggers an escalation. When to leave an optional field null versus fill it with a best guess. This is where domain knowledge lives, and this is the part that does not compress.
A handful of examples. One good example, one edge case, one that should return null in the optional fields. Not twenty. Twenty examples is the old style; three well-chosen ones do more work, and they are easier to keep in sync with the schema when the schema evolves.
A modern prompt of ours is often 300 to 500 tokens of instructions. Our 2023 prompts ran 2,000 to 4,000. Claude and GPT both improved enough that the verbosity stopped buying us anything, and the schema took over the job the verbosity was doing.
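The three parts (context, semantics, a handful of examples) can be assembled mechanically. The sketch below assumes an expense-review task; every string, threshold and field name in it is invented for illustration.

```python
import json

CONTEXT = "You review expense claims and decide what happens to each one."

# What each enum value means: the part that carries the domain knowledge.
SEMANTICS = {
    "approve": "the claim is within policy and has a receipt",
    "reject": "the claim breaks policy or has no receipt",
    "escalate": "the claim is ambiguous or above the review threshold",
}

# One good example, one edge case, one with a null optional field.
EXAMPLES = [
    ("Taxi to airport, 38 EUR, receipt attached.",
     {"action": "approve", "explanation": "Within policy."}),
    ("Team dinner, 1,900 EUR, receipt attached.",
     {"action": "escalate", "explanation": "Above the review threshold."}),
    ("Coffee, 4 EUR, receipt attached.",
     {"action": "approve", "explanation": None}),
]


def build_prompt(context: str, semantics: dict, examples: list) -> str:
    """Join context, field semantics and examples into one short system prompt."""
    lines = [context, "", "Field semantics:"]
    for value, meaning in semantics.items():
        lines.append(f"- {value}: {meaning}")
    lines += ["", "Examples:"]
    for text, output in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Output: {json.dumps(output)}")
    return "\n".join(lines)


prompt = build_prompt(CONTEXT, SEMANTICS, EXAMPLES)
```

Keeping the examples as data next to the schema makes it easier to check them against the schema whenever it changes.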
The tests that replaced string matching
Testing structured output is cleaner than testing strings. Our test cases assert on fields. "Did the model classify this as approve? Did the confidence fall in the expected band? Did the optional explanation get filled when the amount was below the review threshold?"
Structural tests survive prompt changes. We can refine the instructions without the test suite failing because one word shifted in an output sentence. That alone halved our eval maintenance time.
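A sketch of what such a structural check can look like. The field names and the confidence band are hypothetical, and the two raw strings stand in for model outputs from before and after a prompt edit.

```python
import json


def check_decision(raw: str, *, action: str, confidence_band: tuple,
                   expect_explanation: bool) -> None:
    """Assert on fields only; the wording of free-text fields is never compared."""
    output = json.loads(raw)
    low, high = confidence_band
    assert output["action"] == action
    assert low <= output["confidence"] <= high
    assert (output.get("explanation") is not None) == expect_explanation


# Same decision, different wording in the explanation.
run_a = '{"action": "approve", "confidence": 0.91, "explanation": "Amount is under the limit."}'
run_b = '{"action": "approve", "confidence": 0.88, "explanation": "Well within policy, no review needed."}'

for raw in (run_a, run_b):
    check_decision(raw, action="approve", confidence_band=(0.8, 1.0), expect_explanation=True)
```

Both runs pass the same check even though the explanation text differs, which is what lets the instructions be refined without rewriting the suite.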
When not to use structured outputs
A pitfall we see. Teams force every model call into a rigid schema because "that is the best practice now". It is not. Two cases where free-form is better.
Genuinely generative work. Writing a first-draft essay. An apology email. A marketing paragraph. If the shape of the output is the content, a schema will flatten it. Let the model write. Parse afterwards if you need to.
Exploration. Research tasks where you do not yet know the shape of the answer. Tighten the schema once you know what good looks like. Prematurely schematising an exploration forces the model to fit a box you have not yet validated, and the box will be wrong, and you will iterate the schema for two weeks before you ship.
The rule is simple. If the downstream code needs a field, the schema should contain it. If the downstream code needs prose, the schema should have a prose field, but the rest of the output should still be structured around it.
What hiring looks like now
We test for schema intuition in interviews. Candidates who jump to "here is the prompt I would write" are shipping a 2023 solution to a 2026 problem. Candidates who sketch the schema first, ask "what does the caller need from this?", and draft a short system prompt against the schema, are working in the right shape.
This has shifted our preferred background. Strong API design skills carry over more than we expected. People who have built and evolved versioned HTTP APIs know how to evolve a schema without breaking clients, how to deprecate a field gracefully, how to version a response envelope. That is the daily craft of AI engineering now, not the prompt poetry it was three years ago.
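As a small illustration of that carry-over, here is a sketch of a tolerant reader for an evolving response envelope. The schema_version field, the rename from reason to rationale, and the added confidence field are all hypothetical.

```python
def read_decision(payload: dict) -> dict:
    """Normalise v1 and v2 payloads into one shape for the caller."""
    version = payload.get("schema_version", 1)  # v1 payloads carried no version
    if version not in (1, 2):
        raise ValueError(f"unsupported schema_version: {version!r}")
    rationale = payload.get("rationale")
    if rationale is None:
        # "reason" is the deprecated v1 name, still accepted during migration.
        rationale = payload.get("reason")
    return {
        "action": payload["action"],
        "rationale": rationale,
        "confidence": payload.get("confidence"),  # added in v2, optional
    }


v1 = {"action": "reject", "reason": "Missing receipt."}
v2 = {"schema_version": 2, "action": "reject",
      "rationale": "Missing receipt.", "confidence": 0.9}
```

Old and new payloads normalise to the same shape, so callers written against either version keep working while the deprecated field is phased out.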
The short version
Schemas used to be a thing we bolted on for parsing safety. They are now the spec. The prompt is the accent, the schema is the sentence, and the upstream engineering time has moved to match. The teams that kept writing five-page prompts into 2026 are the same teams whose code still has three-layer-deep try/except blocks around json.loads. There is a faster way to do this now, and it has been faster for more than a year.