
Agent Architecture
Between intent and execution
How the layer between a language model and the system that acts on its behalf determines whether agentic AI can be trusted with consequence.

Artificial-intelligence agents are increasingly being asked to take actions on behalf of human users: schedule a procedure, approve a payment, modify an infrastructure configuration, file a regulatory return. The systems built to host these agents face a question that the public debate about machine intelligence has largely overlooked: not whether the underlying language model is intelligent enough, but whether the layer that converts what the model says into what the system does can be trusted with the consequences.
That layer, which sits between a free-form linguistic suggestion and a structured, executable instruction, is the least discussed part of any agentic system. It is also, increasingly, the part where the assurance of the whole system is won or lost.
The most striking feature of this layer is how much of it is improvised. Most organisations deploying language-model agents have one. They must, since something has to convert "pay the supplier invoice attached" into a transfer instruction the back office can execute. Few have designed it deliberately. Fewer still can describe what it actually guarantees.
This article is about the design of that layer. It is not a survey of vendors or products; the interesting question is what the constraints of consequence force any builder to choose, regardless of brand. What follows is a map of the small space of coherent answers: the architectural positions any team building agentic AI eventually finds itself occupying, the trade-offs between them, and the discipline of formal assurance that separates the systems that can be defended from those that merely sound defensible.
The layer between intent and consequence
Every system that lets a language model act on a user's behalf has, at minimum, a translation layer. Free text arrives; structured action departs. That translation may be performed by a few lines of ad-hoc parsing, or by a carefully constructed pipeline with explicit invariants. What it cannot avoid being is load-bearing: every consequence the system produces passes through it.
Built without deliberation, the layer fails in three characteristic ways. The first is silent acceptance of incoherent instructions: a proposal whose pieces are individually plausible but whose combination cannot, in fact, be executed. The model has named a real counterparty in a real currency, but the counterparty has no account in that currency in this jurisdiction. The proposal looks valid, parses successfully, and only fails much later, at the moment the executing system is asked to do something it cannot.
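A minimal sketch of what catching this first failure early might look like, assuming a hypothetical `ACCOUNTS` table and `Proposal` type (neither comes from any particular system). The point is only that the combination of fields is checked before the proposal leaves the translation layer, not when the executing system fails.

```python
from dataclasses import dataclass

# Hypothetical reference data: the currencies in which each
# (counterparty, jurisdiction) pair actually holds an account.
ACCOUNTS = {
    ("ACME Ltd", "GB"): {"GBP", "EUR"},
    ("Nordwind GmbH", "DE"): {"EUR"},
}


@dataclass(frozen=True)
class Proposal:
    counterparty: str
    jurisdiction: str
    currency: str
    amount_minor: int  # amount in minor units, e.g. pence or cents


def coherence_errors(p: Proposal) -> list[str]:
    """Return the reasons the proposal cannot be executed (empty if coherent)."""
    errors = []
    currencies = ACCOUNTS.get((p.counterparty, p.jurisdiction))
    if currencies is None:
        errors.append("no account for counterparty in jurisdiction")
    elif p.currency not in currencies:
        errors.append("counterparty has no account in this currency")
    if p.amount_minor <= 0:
        errors.append("amount must be positive")
    return errors


# Each field is individually plausible; the combination is not executable.
print(coherence_errors(Proposal("Nordwind GmbH", "DE", "GBP", 12_500)))
```

The check is deliberately about relations between fields rather than the fields themselves; a per-field validator would accept this proposal.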
The second failure is plausibly-shaped hallucination. The model invents an entity that resembles a real one closely enough to slip past a permissive validator: a counterparty name with a missing space, a currency code from a defunct standard, a regulatory category that has been retired. The structure passes; the meaning does not.
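The gap between structure and meaning can be illustrated with a short sketch. The registries and identifiers below are hypothetical; `permissive_ok` stands in for a shape-only validator, and `resolve` for a stricter step that accepts only exact matches against known entities.

```python
import re

# Hypothetical registries of entities the system actually knows about.
COUNTERPARTIES = {"ACME Ltd": "cp-001", "Nordwind GmbH": "cp-002"}
ACTIVE_CURRENCIES = {"EUR", "GBP", "USD"}


def permissive_ok(name: str, currency: str) -> bool:
    """Shape-only check: a non-empty name and a three-capital-letter code."""
    return bool(name.strip()) and re.fullmatch(r"[A-Z]{3}", currency) is not None


def resolve(name: str, currency: str) -> tuple[str, str]:
    """Resolve to known identifiers or raise; no fuzzy matching is attempted."""
    if name not in COUNTERPARTIES:
        raise LookupError(f"unknown counterparty: {name!r}")
    if currency not in ACTIVE_CURRENCIES:
        raise LookupError(f"currency not active: {currency!r}")
    return COUNTERPARTIES[name], currency


# A name with a missing space and a retired currency code (the Deutsche Mark).
print(permissive_ok("ACMELtd", "DEM"))  # the shape check accepts it
try:
    resolve("ACMELtd", "DEM")
except LookupError as exc:
    print("rejected:", exc)
```

Whether near-misses should be rejected outright or routed to a human for confirmation is a design decision; the sketch takes the stricter option.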
The third, and the most pernicious in production, is drift between what the system has been told and what the system enforces. The natural-language instruction sent to the model says one thing; the validator codes another; over months, the two diverge. New rules are added to one and not the other. The result is a system whose claims about its own behaviour are no longer accurate.
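One way to narrow this kind of drift, sketched here under assumptions, is to keep each rule's wording and its enforcement in a single record and derive both the prompt text and the validator from the same list. The `Rule` type and the two example rules are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    description: str               # the wording sent to the model
    check: Callable[[dict], bool]  # the predicate the validator enforces


# Hypothetical rules; wording and enforcement sit side by side.
RULES = [
    Rule(
        "max_amount",
        "The amount must not exceed 10000.",
        lambda p: isinstance(p.get("amount"), int) and p["amount"] <= 10_000,
    ),
    Rule(
        "currency",
        "The currency must be EUR or GBP.",
        lambda p: p.get("currency") in {"EUR", "GBP"},
    ),
]


def render_prompt_rules(rules: list[Rule]) -> str:
    """Build the rule section of the prompt from the same list the validator uses."""
    return "\n".join(f"- {r.description}" for r in rules)


def violations(proposal: dict, rules: list[Rule]) -> list[str]:
    """Return the names of every rule the proposal breaks."""
    return [r.name for r in rules if not r.check(proposal)]


print(render_prompt_rules(RULES))
print(violations({"amount": 50_000, "currency": "EUR"}, RULES))
```

This does not make divergence impossible, since a description and its predicate can still disagree within one record, but it means a rule cannot be added to the prompt without also appearing in the validator, and the reverse.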
The model proposes. The system around it decides.
The disciplined response, increasingly visible across the small number of teams building agentic systems for high-stakes domains, is to treat the translation layer as a first-class component with explicit guarantees. What those guarantees are, where they live and how they are enforced is the substance of what follows.
Four positions in a design space
There is no single right architecture for the translation layer. There is, however, a small space of coherent ones: positions that take consistent answers to a handful of basic questions about where the system's vocabulary lives, when its constraints are checked and what part of the resulting machinery is amenable to proof.
Four such positions recur across serious deployments. They are not exhaustive, but they are the points at which the trade-offs become legibly different from one another.
1. Rules in the prompt
The control layer lives entirely in the natural-language instructions sent to the model. The system inspects the output afterwards and rejects what it can detect. This is cheap to set up and quick to iterate on, but structurally insufficient for any action whose consequences cannot be undone. It is the architecture of prototypes; it should rarely be the architecture of production.
2. Typed contracts at the boundary
The system declares, in a precise machine-readable form, what shape an acceptable proposal must take: which fields it must contain, which values they may carry, which combinations are required or forbidden. The model is instructed to produce that shape, and anything that does not conform is rejected at the boundary; the receiving code parses what remains into typed objects the rest of the application can reason about. This is the industry default for production systems that care about reliability but do not yet face an external party demanding proof.
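A minimal sketch of such a contract, using only the Python standard library. The `PaymentProposal` fields, the `Currency` values and the `ContractError` type are illustrative assumptions, not a description of any particular product.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Currency(Enum):
    EUR = "EUR"
    GBP = "GBP"


@dataclass(frozen=True)
class PaymentProposal:
    counterparty_id: str
    currency: Currency
    amount_minor: int  # amount in minor units


class ContractError(ValueError):
    """Raised when model output does not satisfy the declared contract."""


REQUIRED = {"counterparty_id", "currency", "amount_minor"}


def parse_proposal(raw: str) -> PaymentProposal:
    """Parse raw model output into a typed object, or raise ContractError."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ContractError(f"not valid JSON: {exc}") from exc
    # Exactly the declared fields: nothing missing, nothing invented.
    if not isinstance(data, dict) or set(data) != REQUIRED:
        raise ContractError("fields must be exactly: " + ", ".join(sorted(REQUIRED)))
    if not isinstance(data["counterparty_id"], str):
        raise ContractError("counterparty_id must be a string")
    if not isinstance(data["currency"], str):
        raise ContractError("currency must be a string")
    try:
        currency = Currency(data["currency"])
    except ValueError as exc:
        raise ContractError(f"unsupported currency: {data['currency']!r}") from exc
    amount = data["amount_minor"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(amount, bool) or not isinstance(amount, int) or amount <= 0:
        raise ContractError("amount_minor must be a positive integer")
    return PaymentProposal(data["counterparty_id"], currency, amount)


print(parse_proposal('{"counterparty_id": "cp-001", "currency": "EUR", "amount_minor": 12500}'))
```

Everything downstream of `parse_proposal` receives a `PaymentProposal` or nothing; the contract is checked after the model has finished, which is what distinguishes this position from the next.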
3. Shape enforced during generation
The constraint reaches into the generation process itself. At every step the model takes, the set of outputs that would still satisfy the specification is computed; outputs outside that set are made impossible to choose. The result is structural correctness by construction, not by check. The categories of failure that the typed-contract approach catches after the fact (malformed payloads, hallucinated fields, truncated structures) are eliminated before they can occur.
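The mechanism can be shown with a toy example. The sketch below assumes a tiny token vocabulary and a specification given as a finite list of valid sequences (`VALID_OUTPUTS`); the `score` function is a stand-in for a model's per-token preference. Production systems typically derive the allowed set from a grammar or schema rather than an enumeration, but the masking step has the same shape.

```python
# Toy specification: the complete set of token sequences the system accepts.
VALID_OUTPUTS = [
    ("pay", "EUR", "<end>"),
    ("pay", "GBP", "<end>"),
    ("refuse", "<end>"),
]


def allowed_next(prefix: tuple[str, ...]) -> set[str]:
    """Tokens that keep the output a prefix of at least one valid sequence."""
    n = len(prefix)
    return {seq[n] for seq in VALID_OUTPUTS if len(seq) > n and seq[:n] == prefix}


def constrained_decode(score) -> tuple[str, ...]:
    """Greedy decoding in which tokens outside the allowed set cannot be chosen.

    `score(prefix, token)` is a hypothetical stand-in for the model's logit.
    """
    out: tuple[str, ...] = ()
    while True:
        options = allowed_next(out)
        if not options:  # nothing can follow: the sequence is complete
            return out
        # Only allowed tokens are ever scored; sorting makes ties deterministic.
        out += (max(sorted(options), key=lambda t: score(out, t)),)


def prefers_retired_code(prefix: tuple[str, ...], token: str) -> float:
    """A stand-in model whose favourite token, "DEM", is not in the specification."""
    return {"pay": 2.0, "DEM": 9.0, "GBP": 1.0, "EUR": 0.5}.get(token, 0.0)


print(constrained_decode(prefers_retired_code))  # ('pay', 'GBP', '<end>')
```

The stand-in model's strongest preference is never emitted, because it is never among the candidates. Note that the guarantee is structural only: the output is always a valid sequence, not necessarily the one the user wanted.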
4. A proven intermediary
The control layer is a small, formally specified component sitting between the model and the executing system, with a mathematical proof that no action it approves can violate the system's safety rules. The model remains an untrusted oracle; the intermediary is the part of the system in which trust is concentrated and against which it can be defended. This architecture is the most demanding to build and the only one that gives proven, system-level guarantees in the face of adversarial input: a model that has been prompt-injected, that hallucinates with confidence, or that has been replaced by a hostile substitute.
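The shape of such an intermediary can be sketched in a few lines, with one important caveat: the code below checks its safety property exhaustively over a small bounded domain, which is testing and not proof. In a real deployment the same property would be stated as a theorem in a proof assistant or model checker. The limit, allowlist and `Action` type are hypothetical.

```python
from dataclasses import dataclass
from itertools import product

DAILY_LIMIT = 100  # kept small so the bounded check below is quick
ALLOWLIST = frozenset({"cp-001", "cp-002"})


@dataclass(frozen=True)
class Action:
    counterparty_id: str
    amount: int


def invariant_holds(spent_today: int) -> bool:
    """Safety rule: total spend for the day never leaves [0, DAILY_LIMIT]."""
    return 0 <= spent_today <= DAILY_LIMIT


def gate(action: Action, spent_today: int) -> bool:
    """Approve only actions whose execution preserves the invariant."""
    return (
        action.counterparty_id in ALLOWLIST
        and action.amount > 0
        and spent_today + action.amount <= DAILY_LIMIT
    )


def execute(action: Action, spent_today: int) -> int:
    """Return the new daily total; rejected actions leave the state unchanged."""
    return spent_today + action.amount if gate(action, spent_today) else spent_today


def check_bounded() -> bool:
    """Bounded exhaustive check (not a proof): from every safe state, no
    proposed action, however hostile, leads to an unsafe state."""
    ids = ["cp-001", "cp-999"]
    amounts = range(-5, DAILY_LIMIT + 6)
    states = range(0, DAILY_LIMIT + 1)
    return all(
        invariant_holds(execute(Action(i, a), s))
        for i, a, s in product(ids, amounts, states)
    )


print(check_bounded())  # True
```

Nothing in `gate` depends on how the action was produced. That is what allows the model to be treated as untrusted: the property is about the intermediary and the state, and holds for any proposal at all.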
Each position is a trade. Ease of construction against assurance of behaviour. Speed of iteration against defensibility under scrutiny. The right answer is not the one furthest along the spectrum; it is the one that matches what the system can afford to be wrong about.
What can be proved, and what cannot
The vocabulary of "formal verification" has migrated, in recent years, from research papers into vendor marketing. The migration has not been kind to its accuracy. Much of what is claimed as verified in language-model systems is not verified in any technical sense; some of what is verifiable is dismissed as too academic to bother with. Both errors are common, and they tend to produce the same outcome: a system whose actual assurances are obscured.
The honest picture is that formal verification is real, but real in specific layers of the stack. Some properties are tractable today, with mature tooling and bounded cost. That a data shape is well-formed; that a validator always terminates and produces a sound answer; that a specified component cannot violate a stated safety invariant: these are engineering territory now. The methods are decades old, the proof obligations are routine for teams that have invested in the discipline, and the cost of getting them done is known in advance.
Other properties are genuinely research problems. Whether a language model has understood what the user actually wanted; whether the system end-to-end did the right thing, in the sense the user would recognise: there is no general method for proving either. Empirical evaluation can give probabilistic answers; formal methods cannot give proofs. Claiming otherwise is not engineering; it is marketing dressed in mathematics.
The line between these two territories is, in practice, the single most important thing to get right when discussing the assurance of an AI system. It is the line that separates a defensible verification claim from a misleading one. The first thing a careful auditor will do is ask which side of the line a particular guarantee sits on, and what artefact supports it.
The fastest way to mislead is to claim verification of properties that are not verifiable. The fastest way to under-sell is to lump everything under "you cannot verify AI". Either erodes trust faster than admitting the limit would.
The inversion
The most consequential idea in the engineering of trustworthy AI is not technical but conceptual. It can be stated in a single sentence: you do not verify the language model; you verify the system around it.
This inverts the way conventional software verification is usually framed. In the older tradition, the object of proof is the program itself: the code is the thing whose properties one establishes. In the AI case, the equivalent of "the code" is the model, and the model is, by its nature, not amenable to that kind of reasoning. Its parameters are too many and its behaviour too statistical to admit a proof of correctness in any conventional sense.
The mature response is to relocate the proof obligation. The model is treated as an untrusted oracle, a producer of structured proposals whose semantic content carries no warranty. The system around the model (the validator, the resolver, the intermediary, the policy gate, the executing component) is what mathematics gets to reason about. The proof is not "the model does the right thing." The proof is "the system around the model cannot do the wrong thing, regardless of what the model does."
Get this inversion right and the architecture choices fall out of it. The four positions described above can each be read as different points on the same axis: how much of the trust the system places in the model itself, and how much is shifted onto components that admit proof. Get the inversion wrong and "verified AI" remains the marketing phrase it currently is: used loosely, defended weakly and respected by no one whose job depends on a system actually being right.
Why the unglamorous wins
The question that will define the next generation of agentic-AI infrastructure is not whose model is the largest, the fastest or the most fluent. It is whose translation layer can be defended: to a regulator, to an auditor, to a customer, to a court.
That question is coming on a timetable the AI industry has not yet fully internalised. Financial-conduct authorities are publishing draft expectations for agentic systems that act on client mandates. Medical-device regulators are circulating guidance for AI-assisted decision support in clinical workflows. Transport-safety regulators have long since established the vocabulary in which such systems are reviewed, and they will not lower their standards because the new entrant is wearing a different label. In each case, the demand is recognisably the same: prove that this cannot happen.
The architectures described here are the choices that determine whether that demand can be answered with a paper trail or only with assurances. A system built on rules in the prompt has assurances. A system built on a proven intermediary has a paper trail. The difference is not aesthetic; it is the difference between a system that can survive scrutiny and one that cannot.
There is a recurring pattern in the history of industries that learned to handle consequence at scale. Each one built an unglamorous middle layer first: the standards of double-entry bookkeeping, the discipline of clearing infrastructure, the signalling protocols that allowed trains to share track without colliding. None of these were the visible part of the industry. All of them were what allowed the visible part to operate without producing catastrophes. Agentic AI is at the beginning of the same story.
The layer that will outlast the model
A model's size, speed and fluency change with each release. What does not change with each release is the architecture of the layer beneath the model: the layer that decides what the model is allowed to say, what the system will act on and what can be proved to a third party about the result.
The systems whose control layers are designed deliberately, with their guarantees made explicit and their proof obligations discharged, will be the systems that can be trusted to do work that matters. The systems built without that discipline will look the same on the surface until they fail.
The deepest layers of any system change rarely. When they do, they redefine what the system can be trusted to do.
That layer is being built now. It will outlast the models that prompted it.


