Engineering · 2026-05-13

Policy before quality, in an LLM gateway

A short argument for the order in which an LLM gateway should resolve its routing decisions. Eligibility first, optimisation second. The ordering is not subtle, and the systems we have seen suffer most are the ones that took the other order by default.

Public discussion of LLM routing usually treats it as an optimisation problem: given a prompt, pick the cheapest model that meets a quality bar. That framing reaches its limits in any environment where the cheapest model that meets the bar is sometimes also a model the data should never have reached, as anyone who has shipped a gateway in such an environment has had to learn.

The argument we'd make, after building Hermes and watching customer integrations of it, is that routing has two phases and the order matters: eligibility first, optimisation second. Eligibility is the question: which providers may legally and contractually see this prompt at all? Optimisation is the question: of those that may, which is best? If quality runs first, the optimiser produces a recommendation that the policy layer then has to fight, and the system is strained every time the recommended model turns out to be ineligible. If policy runs first, the optimiser ranks only providers that were already cleared, and the system does the right thing by construction.
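A minimal sketch of the two phases, to make the ordering concrete. Everything here is hypothetical and chosen for illustration: the `Provider` fields, the provider names, the data-class labels, and the cost and quality numbers. It is not Hermes's API, only one way the shape could look.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Provider:
    # All fields are illustrative assumptions, not a real gateway schema.
    name: str
    allowed_classes: frozenset  # data classes this provider is cleared for
    cost_per_1k_tokens: float
    quality: float              # hypothetical score in [0, 1]


def eligible(providers: list, data_class: str) -> list:
    """Phase 1: keep only providers cleared to see this data class."""
    return [p for p in providers if data_class in p.allowed_classes]


def optimise(candidates: list, quality_bar: float) -> Optional[Provider]:
    """Phase 2: cheapest candidate that meets the quality bar, or None."""
    good_enough = [p for p in candidates if p.quality >= quality_bar]
    return min(good_enough, key=lambda p: p.cost_per_1k_tokens, default=None)


def route(providers: list, data_class: str, quality_bar: float) -> Optional[Provider]:
    # The optimiser only ever receives providers that passed phase 1.
    return optimise(eligible(providers, data_class), quality_bar)


PROVIDERS = [
    Provider("alpha", frozenset({"public"}), 0.2, 0.90),
    Provider("beta", frozenset({"public", "customer_pii"}), 0.8, 0.85),
    Provider("gamma", frozenset({"public", "customer_pii"}), 1.5, 0.95),
]
```

With these made-up numbers, a `"public"` prompt at a bar of 0.8 goes to `alpha`, while a `"customer_pii"` prompt at the same bar goes to `beta`: `alpha` is cheaper, but the optimiser never sees it. A data class that no provider is cleared for yields `None` rather than a fallback.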

The principle has a few corollaries that are worth stating directly.

Eligibility should be a property of the prompt's origin, not its content. A classifier that reads the prompt and decides whether it contains personal data is the wrong shape, in our experience. Classifiers are wrong often enough that any system depending on one for compliance defensibility ends up either alarming on every disagreement or trusting the classifier and losing the audit story. The cleaner contract is that the calling application — which knows what kind of data flow it represents — declares the prompt's classification when it issues the call. The gateway enforces; it does not infer.
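As a rough illustration of "the gateway enforces; it does not infer", the sketch below resolves a request's data class purely from what the caller declared. The `GatewayRequest` type, the `KNOWN_CLASSES` vocabulary, and the exception name are all assumptions made up for this example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Hypothetical vocabulary; a real deployment would define its own.
KNOWN_CLASSES = frozenset({"public", "internal", "customer_pii"})


class MissingClassification(ValueError):
    """Raised when the caller did not declare a usable data class."""


@dataclass(frozen=True)
class GatewayRequest:
    prompt: str
    data_class: Optional[str]  # declared by the calling application


def declared_class(request: GatewayRequest) -> str:
    """Return the caller's declared class; never inspect the prompt text."""
    if request.data_class is None:
        raise MissingClassification("caller must declare a data class")
    if request.data_class not in KNOWN_CLASSES:
        raise MissingClassification(f"unknown data class: {request.data_class!r}")
    return request.data_class
```

Note that `declared_class` never reads `request.prompt`. A missing or unrecognised declaration is rejected outright instead of being guessed at, which is the design choice the paragraph above argues for.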

The policy should be data, not code. A policy expressed as a declarative file, version-controlled, signed at deploy, and shared with the customer is an audit artifact in itself. A policy expressed as branching logic in the gateway's source code is not. When a compliance officer asks "how do we know prompts of class X never went to provider Y?", the answer should be a file the officer can read, the deploy that activated it, and the per-request log line that records which version of it ran. Pointing at gateway source code is acceptable to engineers and unsatisfying to everyone else.
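One way to picture the three pieces of that answer (a readable file, an identifiable version, a per-request log line) is the sketch below. The policy layout, field names, and provider names are invented for illustration. A SHA-256 content digest stands in for a version identifier here; it is not a signature, and a real deployment would use whatever signing its release process provides.

```python
import hashlib
import json

# Hypothetical policy file contents: data class -> providers cleared for it.
POLICY_TEXT = """
{
  "rules": {
    "public": ["alpha", "beta"],
    "customer_pii": ["beta"]
  }
}
"""


def load_policy(text: str) -> dict:
    """Parse the policy and derive a version id from its exact bytes."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return {"version": digest[:12], "rules": json.loads(text)["rules"]}


def allowed_providers(policy: dict, data_class: str) -> list:
    # An unknown class matches no rule, so nothing is eligible.
    return policy["rules"].get(data_class, [])


def audit_record(policy: dict, request_id: str, data_class: str, provider: str) -> str:
    """Build the per-request log line, refusing providers the policy excludes."""
    if provider not in allowed_providers(policy, data_class):
        raise PermissionError(f"{provider} is not cleared for {data_class}")
    return json.dumps(
        {
            "request_id": request_id,
            "data_class": data_class,
            "provider": provider,
            "policy_version": policy["version"],
        },
        sort_keys=True,
    )
```

Because the version is derived from the file's content, any edit to the policy changes the identifier that appears in every subsequent log line, so a log line can be matched back to the exact text that was in force.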

Quality is bounded by policy, and that boundary should be visible. When a customer's tag set rules out the model that would otherwise be optimal, the gateway should emit a structured signal saying so. Customers who care can subscribe to it as a metric — over the last week, X% of calls were routed to a non-optimal eligible model because the optimal one was ineligible — and use it as input into their data-flow review. Hiding the loss makes it tempting to relax the policy without realising one is doing so.
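A sketch of what such a signal could look like, under assumed names: `RoutingDecision`, `decide`, and `policy_bounded_rate` are hypothetical, as are the scores. The optimiser's scores are computed over all providers only to report what the unconstrained choice would have been; the selection itself is made strictly among eligible ones, and no prompt is sent to an ineligible provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingDecision:
    chosen: str              # provider actually used (always eligible)
    unconstrained_best: str  # what the optimiser would pick with no policy
    policy_bounded: bool     # True when policy ruled out the optimal choice


def decide(scores: dict, eligible: set) -> RoutingDecision:
    """scores maps provider name to an optimiser score (higher is better)."""
    candidates = {name: s for name, s in scores.items() if name in eligible}
    if not candidates:
        raise LookupError("no eligible provider for this request")
    chosen = max(candidates, key=candidates.get)
    best = max(scores, key=scores.get)  # used for reporting only
    return RoutingDecision(chosen, best, policy_bounded=(chosen != best))


def policy_bounded_rate(decisions: list) -> float:
    """Fraction of calls routed to a non-optimal model because of policy."""
    if not decisions:
        return 0.0
    return sum(d.policy_bounded for d in decisions) / len(decisions)
```

Aggregating `policy_bounded` over a window gives the "X% of calls" figure described above, and each individual `RoutingDecision` can be logged as the structured per-request signal.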

What we are deliberately not saying in this post: the specific tag vocabulary one should adopt, the structure of the policy file, or the failure modes of any specific provider's eligibility profile. Those are properties of a particular gateway, and the gateway's customers are the right audience for them. The principle (eligibility first, optimisation second) is portable, and the systems we have watched suffer most are the ones that took the other order by default and then bolted compliance on later.

A short note on what this isn't an argument against. We are not arguing that quality and cost don't matter. They do. The optimiser is doing real work, and a gateway that routes everything to the most-permissively-tagged provider regardless of capability is no use to anyone. The argument is about the order, not the priority. After the eligibility filter has run, the optimiser has every reason to be aggressive about cost and quality. It just doesn't get to consider providers that were never eligible to begin with.
