Engineering · 2026-09-16

A redactable hash chain, as a design pattern

An audit log that is append-only, tamper-evident, and redactable looks contradictory at first reading. The contradiction dissolves when you separate the manifest from the content. A short note on the pattern, independent of any specific application.

A class of compliance-flavoured systems wants four properties out of an audit log that are usually presented as if they are in tension.

Append-only. Once an entry is written, it cannot be removed.
Tamper-evident. Any modification of a prior entry is detectable.
Reproducible. Verifying integrity from genesis to head is possible without trusting the system itself.
Redactable. Personal data must be erasable on a verified GDPR Article 17 request, and confidential customer data must be removable on contract termination.

The first three are the textbook properties of a Merkle-style hash chain. The fourth contradicts them on a naïve reading: if the chain is computed over the content itself, any redaction breaks it. The contradiction dissolves when you separate the manifest from the content, and the rest of this post is a short tour of that pattern.

The manifest is the chain. Each entry has the shape:

entry_id        : monotonic int
created_at      : RFC3339 timestamp
record_kind     : domain-specific (evidence | attestation | redaction-marker | ...)
content_uri     : opaque URI for the content blob
content_hash    : sha256 of the content at the moment of recording
metadata        : domain-specific structured fields
prev_hash       : sha256 of the previous entry's canonical encoding
entry_hash      : sha256 of this entry's canonical encoding (all fields above; entry_hash itself is excluded)

The content lives in a separate object store, addressed by its content_uri. The entry references the content but does not include it. The chain is computed over the manifest entries only — including their content_hash, but not the content bytes themselves.
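A minimal sketch of how such a manifest might be built, in Python. The helper names (`canonical`, `append_entry`), the all-zero genesis value, and the `blob://` URIs are our illustrative assumptions, not part of the pattern. The sketch also assumes that `entry_hash` covers every other field of the entry, so the next entry's `prev_hash` is simply the previous `entry_hash`.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # assumed convention for the first entry's prev_hash


def canonical(fields: dict) -> bytes:
    # Assumed canonical form: JSON with sorted keys, no whitespace, UTF-8.
    return json.dumps(fields, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def append_entry(chain: list, record_kind: str, content: bytes,
                 content_uri: str, created_at: str, metadata: dict) -> dict:
    """Append a manifest entry. Only the hash of the content enters the chain."""
    entry = {
        "entry_id": len(chain),
        "created_at": created_at,
        "record_kind": record_kind,
        "content_uri": content_uri,
        "content_hash": hashlib.sha256(content).hexdigest(),
        "metadata": metadata,
        "prev_hash": chain[-1]["entry_hash"] if chain else GENESIS_PREV_HASH,
    }
    # entry_hash is computed before it is added, so it never covers itself.
    entry["entry_hash"] = hashlib.sha256(canonical(entry)).hexdigest()
    chain.append(entry)
    return entry


chain = []
append_entry(chain, "evidence", b"scan results", "blob://a1",
             "2026-09-16T09:00:00Z", {"purpose": "audit"})
append_entry(chain, "attestation", b"signed off", "blob://a2",
             "2026-09-16T09:05:00Z", {"purpose": "audit"})
print("head:", chain[-1]["entry_hash"])
```

The content bytes are hashed once and then handed to the object store; nothing in the manifest needs them again until verification.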

This separation is the load-bearing idea. Three things follow from it.

Redaction does not break the chain. When personal data must be erased, the underlying content blob is deleted or zeroed, and a redaction-marker entry is appended to the manifest. The original entry is not modified — it still says "at this point, content with hash X was recorded for purpose Y." It just no longer points at retrievable bytes. The redaction-marker entry says "the content of entry N was redacted on date D, by operator O, under authority A." Anyone walking the chain still computes a consistent head; they can also see, at the relevant index, that the content has been erased and on what grounds.
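To make the redaction step concrete, here is a small sketch under stated assumptions: the object store is modelled as a plain dict, and `seal`, `redact`, and the `redacts` / `operator` / `authority` metadata keys are hypothetical names chosen for illustration.

```python
import hashlib
import json


def seal(fields: dict) -> dict:
    """Add entry_hash, computed over all other fields in an assumed canonical JSON form."""
    body = {k: v for k, v in fields.items() if k != "entry_hash"}
    encoded = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return {**body, "entry_hash": hashlib.sha256(encoded).hexdigest()}


def redact(chain: list, blobs: dict, target_id: int, date: str,
           operator: str, authority: str) -> dict:
    target = chain[target_id]
    blobs.pop(target["content_uri"], None)  # erase the bytes; the entry stays untouched
    marker = seal({
        "entry_id": len(chain),
        "created_at": date,
        "record_kind": "redaction-marker",
        "content_uri": None,   # a marker has no content blob of its own
        "content_hash": None,
        "metadata": {"redacts": target_id, "operator": operator, "authority": authority},
        "prev_hash": chain[-1]["entry_hash"],
    })
    chain.append(marker)
    return marker


# One evidence entry whose content contains personal data.
blobs = {"blob://a1": b"name: Jane Doe"}
chain = [seal({
    "entry_id": 0, "created_at": "2026-09-16T09:00:00Z", "record_kind": "evidence",
    "content_uri": "blob://a1",
    "content_hash": hashlib.sha256(blobs["blob://a1"]).hexdigest(),
    "metadata": {"purpose": "onboarding"}, "prev_hash": "0" * 64,
})]
original = dict(chain[0])

redact(chain, blobs, 0, "2026-10-01T12:00:00Z", "operator-7", "GDPR Art. 17")
print(chain[0] == original, "blob://a1" in blobs)
```

The original entry is byte-for-byte what it was before the redaction, so every hash computed over it still holds; only the blob store and the tail of the chain have changed.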

Auditors see redaction as a first-class event. A redaction is visible. An auditor reading the chain doesn't have to wonder whether content existed and was deleted — the marker entry tells them, with the date, the reason, and the authorising party. That is the right answer for a system that is supposed to be transparent about its own operation. An audit log that hides redactions is worse than one that displays them: hiding is the violation auditors most want to catch.

Tampering is still detectable. If an attacker modifies a manifest entry, recomputing the chain no longer reproduces the recorded head. If an attacker modifies the content of a blob that hasn't been redacted, the recomputed content_hash no longer matches the manifest's recorded hash. If an attacker modifies a redacted blob, there is nothing to compare against, by design, but the redaction marker makes that case uninteresting, since the content is supposed to be unrecoverable. Detectable tampering therefore covers every manifest entry and every blob that is still supposed to exist, which is what one wants.
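The three cases can be sketched as a single verification pass. This is an illustration rather than a reference verifier: `digest`, `verify`, `make`, the dict-backed blob store, and the `redacts` metadata key are all assumed names, and `pinned_head` stands for a head hash the auditor obtained from somewhere they trust.

```python
import hashlib
import json


def digest(entry: dict) -> str:
    """sha256 over an assumed canonical JSON form of every field except entry_hash."""
    body = {k: v for k, v in entry.items() if k != "entry_hash"}
    encoded = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def verify(chain: list, blobs: dict, pinned_head: str) -> list:
    """Return a list of problems; an empty list means everything checked out."""
    problems = []
    redacted = {e["metadata"]["redacts"] for e in chain
                if e["record_kind"] == "redaction-marker"}
    prev = "0" * 64  # assumed genesis value
    for e in chain:
        if e["prev_hash"] != prev or digest(e) != e["entry_hash"]:
            problems.append(f"entry {e['entry_id']}: manifest hash mismatch")
        prev = e["entry_hash"]
        if e["record_kind"] == "redaction-marker" or e["entry_id"] in redacted:
            continue  # no content to compare against, by design
        blob = blobs.get(e["content_uri"])
        if blob is None or hashlib.sha256(blob).hexdigest() != e["content_hash"]:
            problems.append(f"entry {e['entry_id']}: content missing or altered")
    if prev != pinned_head:
        problems.append("head does not match pinned head")
    return problems


def make(chain: list, kind: str, uri, content, metadata: dict) -> None:
    # Demo helper: builds an entry and links it to the current head.
    e = {"entry_id": len(chain), "created_at": "2026-09-16T09:00:00Z",
         "record_kind": kind, "content_uri": uri,
         "content_hash": hashlib.sha256(content).hexdigest() if content is not None else None,
         "metadata": metadata,
         "prev_hash": chain[-1]["entry_hash"] if chain else "0" * 64}
    e["entry_hash"] = digest(e)
    chain.append(e)


blobs = {"blob://a1": b"first", "blob://a2": b"second"}
chain = []
make(chain, "evidence", "blob://a1", blobs["blob://a1"], {})
make(chain, "evidence", "blob://a2", blobs["blob://a2"], {})
del blobs["blob://a1"]                                    # entry 0 is redacted
make(chain, "redaction-marker", None, None, {"redacts": 0})
head = chain[-1]["entry_hash"]

print(verify(chain, blobs, head))        # clean chain, redaction included
blobs["blob://a2"] = b"second, edited"
print(verify(chain, blobs, head))        # content tampering on a live blob
```

A missing blob with no corresponding marker is reported the same way as an altered one, which is the distinction the marker exists to draw.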

A few details that, in our experience, take iteration to land.

Canonical encoding matters more than it sounds. The hash is computed over a canonical JSON encoding (or CBOR, or whatever serialisation fits): UTF-8, sorted keys, no whitespace, no trailing newlines, fixed timestamp format, fixed numeric encoding. A serialisation library upgrade that subtly changes any of those will make the chain diverge across deployments. The encoder needs to be treated as protocol-load-bearing: versioned, tested, and not allowed to change without an explicit migration path.
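As one possible shape for such an encoder, the sketch below pins each of those choices explicitly using only Python's standard library. The `ENCODER_VERSION` constant, the decision to reject floats outright, and the second-precision UTC timestamp format are our assumptions for illustration; a real deployment might instead adopt an existing canonicalisation standard.

```python
import json
from datetime import datetime, timedelta, timezone

ENCODER_VERSION = 1  # hypothetical: bump only alongside an explicit migration


def canonical_timestamp(dt: datetime) -> str:
    """Fixed timestamp format: UTC, second precision, trailing 'Z'."""
    if dt.tzinfo is None:
        raise ValueError("naive datetimes are ambiguous; attach a timezone")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def _check(value) -> None:
    # Fixed numeric encoding, enforced here by refusing floats entirely.
    if isinstance(value, float):
        raise TypeError("floats are not allowed in canonical entries")
    if isinstance(value, dict):
        for key, item in value.items():
            if not isinstance(key, str):
                raise TypeError("object keys must be strings")
            _check(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            _check(item)


def canonical_encode(entry: dict) -> bytes:
    """Sorted keys, no whitespace, no trailing newline, UTF-8 output."""
    _check(entry)
    text = json.dumps(entry, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False, allow_nan=False)
    return text.encode("utf-8")


# 11:00 at UTC+2 is normalised to 09:00Z before it enters the entry.
ts = canonical_timestamp(datetime(2026, 9, 16, 11, 0, tzinfo=timezone(timedelta(hours=2))))
encoded = canonical_encode({"created_at": ts, "entry_id": 7, "metadata": {"b": 1, "a": "é"}})
print(encoded)
```

Golden-file tests that compare `canonical_encode` output against stored bytes are a cheap way to catch a library upgrade that changes any of these choices.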

Time is part of the chain. Each entry's created_at is included in the canonical encoding. Reordering entries then also requires fabricating consistent timestamps, which an external pinning step (a daily signature of the chain head, exported to a customer-controlled append-only sink) can constrain to within a sensible window. For applications that need finer ordering guarantees, the external sink can be a transparency log with per-entry timestamping rather than per-day.
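A rough sketch of the pinning step follows. Two loud assumptions: HMAC stands in for whatever real signature scheme the deployment uses (it is a shared-key MAC, not a public-key signature), and timestamps are fixed-format UTC strings so that string comparison matches time order. `make_checkpoint`, `check_checkpoint`, and `toy_entry` are hypothetical names, and the toy entries carry only the fields this check reads.

```python
import hashlib
import hmac
import json


def _mac(payload: dict, key: bytes) -> str:
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, encoded, hashlib.sha256).hexdigest()


def make_checkpoint(chain: list, pinned_at: str, key: bytes) -> dict:
    """Record the current head; in practice this is exported to the external sink."""
    head = chain[-1]
    payload = {"pinned_at": pinned_at,
               "head_entry_id": head["entry_id"],
               "head_hash": head["entry_hash"]}
    return {**payload, "signature": _mac(payload, key)}


def check_checkpoint(chain: list, cp: dict, key: bytes) -> bool:
    payload = {k: cp[k] for k in ("pinned_at", "head_entry_id", "head_hash")}
    if not hmac.compare_digest(_mac(payload, key), cp["signature"]):
        return False  # checkpoint itself was altered
    if cp["head_entry_id"] >= len(chain):
        return False  # chain was truncated below the pinned head
    if chain[cp["head_entry_id"]]["entry_hash"] != cp["head_hash"]:
        return False  # history before the pin was rewritten
    # Everything at or before the pinned head must claim a time no later than the pin.
    return all(e["created_at"] <= cp["pinned_at"]
               for e in chain[: cp["head_entry_id"] + 1])


def toy_entry(i: int, created_at: str) -> dict:
    # Placeholder hash; a real entry_hash covers the full canonical encoding.
    h = hashlib.sha256(f"{i}|{created_at}".encode("utf-8")).hexdigest()
    return {"entry_id": i, "created_at": created_at, "entry_hash": h}


key = b"demo-key"
chain = [toy_entry(0, "2026-09-16T09:00:00Z"), toy_entry(1, "2026-09-16T17:30:00Z")]
cp = make_checkpoint(chain, "2026-09-17T00:00:00Z", key)
chain.append(toy_entry(2, "2026-09-17T08:00:00Z"))  # appended after the pin
print(check_checkpoint(chain, cp, key))
```

With one checkpoint per day, an entry's claimed timestamp can be off by at most the gap between the two checkpoints that bracket it, which is the window the paragraph above refers to.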

The redaction-marker entry should itself be authenticated. Anyone who can append entries can append a marker; the marker's value as evidence depends on it being clear who authorised the redaction. The application layer can require signatures from a redaction-quorum, multi-party approval, or whatever the local policy demands. The chain machinery itself doesn't validate the signature scheme — that is application logic — but it does carry the signature payload as part of the entry, so audit-time verification can.
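One way the application layer might check a redaction quorum at audit time is sketched below. Everything named here is hypothetical: the approver registry, the quorum of two, the `approvals` metadata key, and the use of HMAC as a stand-in for real per-approver signatures. Because the approvals sit inside the marker's metadata, they are covered by the entry hash like any other field.

```python
import hashlib
import hmac
import json

# Hypothetical approver registry. HMAC keys stand in for real public keys.
APPROVER_KEYS = {"dpo": b"key-dpo", "legal": b"key-legal", "security": b"key-sec"}
QUORUM = 2  # assumed local policy


def approval_message(marker_metadata: dict) -> bytes:
    """The bytes each approver signs: the redaction request without the approvals."""
    request = {k: v for k, v in marker_metadata.items() if k != "approvals"}
    return json.dumps(request, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(approver: str, marker_metadata: dict) -> dict:
    mac = hmac.new(APPROVER_KEYS[approver], approval_message(marker_metadata),
                   hashlib.sha256)
    return {"approver": approver, "signature": mac.hexdigest()}


def quorum_met(marker_metadata: dict) -> bool:
    """Count distinct, known approvers whose signature matches the request."""
    message = approval_message(marker_metadata)
    valid = set()
    for approval in marker_metadata.get("approvals", []):
        key = APPROVER_KEYS.get(approval["approver"])
        if key is None:
            continue  # unknown approver: ignored, not trusted
        expected = hmac.new(key, message, hashlib.sha256).hexdigest()
        if hmac.compare_digest(expected, approval["signature"]):
            valid.add(approval["approver"])
    return len(valid) >= QUORUM


metadata = {"redacts": 12, "authority": "GDPR Art. 17", "operator": "operator-7"}
metadata["approvals"] = [sign("dpo", metadata), sign("legal", metadata)]
print(quorum_met(metadata))
```

Counting distinct approvers rather than signatures means one party signing twice does not satisfy the policy, and changing the target of the redaction after approval invalidates every signature.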

Verification should be possible from outside the system. The integrity story is only as strong as the auditor's ability to check it without trusting the vendor. A small CLI that walks the chain, recomputes the hashes, verifies the signatures, and prints a clean yes/no with detail is the right shape. Open-sourcing the verifier reduces the surface for "yes, but the vendor's verifier is the one telling us it's all fine", which is exactly the surface an auditor is trying to eliminate.

What this pattern is not, on its own, sufficient for. It does not establish provenance — the question of where the content came from before it was recorded. It does not establish that the recorder was authorised to record. It does not authenticate the source of the content beyond what the metadata captures. Those are upstream concerns, addressed by signing schemes on the producer side, attestation hierarchies, and whatever the application's trust model demands. The chain is one component in a larger story about evidence integrity, not the whole story.

We use this pattern in our compliance products. The specific shape of the manifest, the metadata fields, the substrate, and the operational story are application details we won't enumerate here. The pattern itself, though, is portable enough that we'd recommend it to anyone facing the same four-properties-that-look-contradictory problem. Most of the time, the contradiction is not in the requirements; it is in conflating two layers that should have been separate from the start.
