Engineering · 2026-06-24

Designing a prompt-classification schema for an LLM gateway

A schema that determines which providers may see a given prompt is the most load-bearing data structure in a compliance-aware gateway. Here are the principles we'd argue for, separately from the specific vocabulary any particular gateway adopts.

If a gateway routes prompts according to policy before quality, then the policy is a function of how prompts are classified, and the classification schema becomes the most load-bearing data structure in the system. The schema sits between the calling applications (which must understand and emit it correctly) and the policy reviewers (who must read it without the help of an engineer). Both audiences are demanding, and they pull in different directions. This post is about the principles we'd argue for in designing such a schema, separately from the specific vocabulary any particular gateway adopts.

Choose a small number of orthogonal axes, not a long list of flags. A schema with twenty independent boolean flags has, in principle, more than a million combinations. The eligibility table that maps combinations to provider lists cannot be human-verified at that size; it has to be generated. A schema with three or four orthogonal axes of small cardinality has perhaps a hundred combinations, which fits on a page and can be reviewed by a non-engineer. The expressive power lost relative to the bigger schema is mostly imaginary — the flag-rich design encodes distinctions that the policy file cannot meaningfully act on, and they end up collapsing in practice. Compress early; expand later only when a real policy distinction needs the new axis.
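The arithmetic behind that comparison is easy to check. The sketch below uses hypothetical axis names and values chosen only for illustration; they are not any real gateway's vocabulary.

```python
from itertools import product
from math import prod

# Flag-rich design: twenty independent booleans.
FLAG_COUNT = 20
flag_combinations = 2 ** FLAG_COUNT

# Axis-based design. Axis names and values are hypothetical.
AXES = {
    "sensitivity": ["none", "personal", "special_category"],
    "origin": ["public", "internal", "customer", "customer_of_customer"],
    "jurisdiction": ["eu", "uk", "ch", "other"],
    "handling": ["default", "no_retention"],
}

axis_combinations = prod(len(values) for values in AXES.values())

# Every row of the eligibility table can be enumerated for human review.
rows = list(product(*AXES.values()))

print(flag_combinations)  # 1048576
print(axis_combinations)  # 96
```

At 96 rows the whole table can be printed and read line by line, which is the property the flag-rich design gives up.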

Separate "what is it" from "what may we do with it". The most common conflation we have seen is between content sensitivity (the prompt contains personal data) and handling rules (the provider must not retain logs). They look similar — both are constraints on the prompt's path through the system — but they are reviewed by different people in a typical customer organisation. Content sensitivity is the data-protection officer's question. Handling rules are a contract-and-procurement question. Putting them on the same axis means each reviewer has to render a decision on the other's territory, and the schema becomes harder to maintain.

Encode the prompt's origin and its content separately. "This came from our customer's customer" and "this contains personal data" are different facts that often co-occur but sometimes don't. A public document containing personal data is a different routing target from a customer's confidential message containing none. Folding them together loses information the policy file may want to act on.
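One way to keep content, origin, and handling apart is to give each its own type, so that no single field can carry two facts at once. This is a minimal sketch; the enum names and members are assumptions made for the example, not a real schema.

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    """What the content is (the data-protection question)."""
    NONE = "none"
    PERSONAL = "personal"


class Origin(Enum):
    """Where the prompt came from."""
    PUBLIC = "public"
    CUSTOMER_CONFIDENTIAL = "customer_confidential"


class Handling(Enum):
    """What a provider may do with it (the contract question)."""
    DEFAULT = "default"
    NO_LOG_RETENTION = "no_log_retention"


@dataclass(frozen=True)
class Classification:
    sensitivity: Sensitivity
    origin: Origin
    handling: Handling


# The two cases from the text stay distinguishable on every axis.
public_doc_with_personal_data = Classification(
    Sensitivity.PERSONAL, Origin.PUBLIC, Handling.DEFAULT
)
confidential_msg_without_personal_data = Classification(
    Sensitivity.NONE, Origin.CUSTOMER_CONFIDENTIAL, Handling.NO_LOG_RETENTION
)
```

Because each axis is a separate field, a reviewer who owns one question can change its values without touching the others.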

Make jurisdiction coarser than you'd think. It is tempting to model jurisdiction as a per-country flag with full ISO 3166 granularity. The operational distinctions that actually matter for routing (does the EU regime apply, does the UK's post-Brexit adequacy status apply, does Switzerland sit inside or outside, does the destination provider's jurisdiction matter) collapse into a handful of buckets. Modelling at country granularity creates the appearance of precision without buying any. We have not yet seen a gateway whose policy file actually distinguished between, say, France and Germany; we have seen many that imagined they would.
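In code, coarse jurisdiction can be a small enum plus a lookup that the calling application applies at its own edge. The bucket names and the deliberately partial country table below are assumptions for illustration; a real table would be filled in and reviewed by whoever owns the policy.

```python
from enum import Enum


class JurisdictionBucket(Enum):
    EU_EEA = "eu_eea"
    UK = "uk"
    CH = "ch"
    OTHER = "other"


# Partial, illustrative mapping from ISO 3166-1 alpha-2 codes to buckets.
_COUNTRY_TO_BUCKET = {
    "FR": JurisdictionBucket.EU_EEA,
    "DE": JurisdictionBucket.EU_EEA,
    "GB": JurisdictionBucket.UK,
    "CH": JurisdictionBucket.CH,
    "US": JurisdictionBucket.OTHER,
}


def bucket_for(country_code: str) -> JurisdictionBucket:
    """Map a country code to its bucket; unmapped codes raise instead of guessing."""
    code = country_code.strip().upper()
    try:
        return _COUNTRY_TO_BUCKET[code]
    except KeyError:
        raise ValueError(f"no jurisdiction bucket defined for {code!r}") from None


# France and Germany land in the same bucket, so policy cannot tell them apart.
print(bucket_for("FR") is bucket_for("DE"))  # True
```

The policy file only ever sees the bucket, so country-level detail never reaches the eligibility table.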

Make the calling application emit the classification. It is tempting to put a classifier on the gateway's front door. Everyone wants the application teams not to have to think about classification. The cost of that move is high: every classifier disagreement is either a false alarm or a silent miss, and the audit story now depends on a probabilistic component. We argue instead that the calling application — which knows the data flow it represents — should emit the classification, and the gateway should reject an unclassified call rather than guess. The contract is cleaner. The audit story is cleaner. The cost is that application teams have to know what their data is, which is not really a cost.
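The "reject rather than guess" contract can be sketched as a check at the gateway's front door. The request shape, the axis names, and the exception type here are hypothetical.

```python
# Hypothetical axis names; a real gateway would take these from its schema.
REQUIRED_AXES = ("sensitivity", "origin", "jurisdiction", "handling")


class UnclassifiedPromptError(ValueError):
    """Raised when a call arrives without a complete classification."""


def require_classification(request: dict) -> dict:
    """Return the caller's tags, or refuse the call. Never infers a tag."""
    tags = request.get("classification")
    if not isinstance(tags, dict):
        raise UnclassifiedPromptError("request carries no classification")
    missing = [axis for axis in REQUIRED_AXES if axis not in tags]
    if missing:
        raise UnclassifiedPromptError("missing axes: " + ", ".join(missing))
    return tags
```

Nothing in this path inspects the prompt text, so the audit trail depends only on what the caller declared.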

Treat the schema as a versioned protocol, not configuration. Once applications are emitting classifications and the gateway is enforcing on them, the vocabulary is a contract between distributed components. Adding a new value to an existing axis can be a backward-compatible change, provided components that do not yet recognise the value fall back to a sensible default; adding a new axis is a coordinated migration. We would not let the schema drift the way internal configuration tends to drift. It needs to live in version control, in a stable canonical encoding, with explicit deprecation rules.
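A rough sketch of what a canonical, versioned encoding might look like. The version string, the axis values, and the rule that an unrecognised value falls back to the most restrictive known one are all assumptions made for this example.

```python
import json

SCHEMA_VERSION = "1.2"  # hypothetical

# Known values per axis. By assumption, the first entry is the most
# restrictive and serves as the fallback for unrecognised values.
AXES = {
    "sensitivity": ["special_category", "personal", "none"],
    "handling": ["no_retention", "default"],
}


def encode(tags: dict) -> str:
    """Canonical form: sorted keys, no whitespace, explicit version."""
    payload = {"schema_version": SCHEMA_VERSION, "tags": tags}
    return json.dumps(payload, sort_keys=True, separators=(",", ":"))


def decode(raw: str) -> dict:
    tags = json.loads(raw)["tags"]
    unknown_axes = set(tags) - set(AXES)
    if unknown_axes:
        # A new axis is a coordinated migration, so refuse it outright.
        raise ValueError(f"unknown axes: {sorted(unknown_axes)}")
    # A new value on a known axis degrades to the restrictive fallback.
    return {
        axis: value if value in AXES[axis] else AXES[axis][0]
        for axis, value in tags.items()
    }


print(encode({"sensitivity": "none", "handling": "default"}))
# {"schema_version":"1.2","tags":{"handling":"default","sensitivity":"none"}}
```

Sorting keys and fixing separators means two components that agree on the tags also agree on the bytes, which makes diffs and signatures stable.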

Resist axes whose values are not actionable. A classification that does not change what happens to the prompt is decoration. We have seen schemas accumulate flags that capture interesting properties but that no policy file ever reads: "prompt is auto-generated by template", "prompt is a follow-up to a previous query", and so on. Each unused flag is a maintenance tax and a source of drift between application teams' tagging discipline and the gateway's actual behaviour. If the policy file does not read it, the gateway should not require it.
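That rule can be enforced mechanically with a check that compares the axes the schema declares against the axes the policy rules actually read. The rule format, axis names, and provider names below are hypothetical.

```python
# Axes the schema asks applications to emit (hypothetical).
SCHEMA_AXES = {"sensitivity", "origin", "jurisdiction", "handling", "is_templated"}

# Policy rules in an assumed format: each rule matches on some axes.
POLICY_RULES = [
    {"when": {"sensitivity": "personal", "jurisdiction": "eu"}, "allow": ["provider_b"]},
    {"when": {"origin": "customer", "handling": "no_retention"}, "allow": ["provider_b"]},
]


def unused_axes(schema_axes, policy_rules):
    """Return the axes that no policy rule ever matches on."""
    read = set()
    for rule in policy_rules:
        read.update(rule["when"])
    return set(schema_axes) - read


print(sorted(unused_axes(SCHEMA_AXES, POLICY_RULES)))  # ['is_templated']
```

Run as part of schema review, a non-empty result is a prompt to either write the rule that uses the axis or drop the axis.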

A worked example of how these principles play out: a schema with four small-cardinality axes (content sensitivity, origin, jurisdiction, and handling rules) gives at most a few hundred combinations; four axes of four values each is 256. The policy file's eligibility table fits on a page. Application teams can be trained on the schema in a meeting. New axes are added on the order of once a year, with care. The gateway emits a structured event whenever a tag combination removes the model the customer would otherwise have used, so that the cost of the policy is visible. None of this is novel; what is novel, sometimes, is doing it deliberately rather than accumulating it.
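The last point, making the cost of policy visible, might look like the sketch below. The two-axis eligibility table, the provider names, and the event fields are placeholders invented for the example.

```python
import json
from typing import Optional, Tuple

# Hypothetical eligibility table keyed by (sensitivity, handling).
ELIGIBILITY = {
    ("none", "default"): ["provider_a", "provider_b"],
    ("personal", "default"): ["provider_b"],
    ("personal", "no_retention"): [],
}


def route(tags: dict, preferred: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (chosen provider, JSON event); the event is None if policy changed nothing."""
    eligible = ELIGIBILITY.get((tags["sensitivity"], tags["handling"]), [])
    if preferred in eligible:
        return preferred, None
    chosen = eligible[0] if eligible else None
    event = json.dumps(
        {
            "event": "policy_removed_preferred_model",
            "tags": tags,
            "preferred": preferred,
            "chosen": chosen,
        },
        sort_keys=True,
    )
    return chosen, event


chosen, event = route({"sensitivity": "personal", "handling": "default"}, "provider_a")
print(chosen)  # provider_b
```

Counting these events per tag combination shows which rows of the table are costing customers their preferred model.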

The schema we use in production is one application of these principles. We won't enumerate it here, since it lives in the product documentation where customers actually need it, but we will say that it has fewer values than any of us would have guessed before we started designing it, and that the smallness has been load-bearing in every customer review we have run it through.
