HACKER Q&A
📣 brundolf

Append-Only SQL Schema?


I've had an idea kicking around in my head for how to structure a SQL database, and I'm sure I'm not the first person to think of this, but I'm curious why I've never heard of it being used:

The idea is that every write to your major entities would take the form of a "message" row, inserted into a table (you could have multiple message tables for different kinds of messages). You'd eagerly record events, append-only, and that would be all your persisted data. Then concrete things like "the current status of user accounts" would be in a materialized view over the raw message sequence (or if it turned out to be more performant, you could compute/cache this data in any number of other downstream places; services, redis, etc)

Obviously performance could be a challenge, especially as the message tables get large, which is the only problem I can really think of. But the advantages would be:

- Full history of all DB states (for auditing, recovery, etc)

- Eventual-consistency gets really simple, you just merge message histories

- Schema changes, etc become less of a headache. Your source of truth is always just "what has happened", and you can always add to or reinterpret that however you want. Data integrity should basically become a non-issue if you're never overwriting or rewriting history

I'm also wondering if you could periodically "squash" the messages as needed to preserve performance. I.e. if an account changes to one state and then back to the previous state, remove the previous messages and only keep what's needed to create the latest state. You could decide that eg. you only want the full history for the past 30 days and squash everything before that on an interval

I've definitely seen this used as a programming model, just not at the DB level. It's also roughly how git works

I think Datomic does sorta work this way, but it's unusual in a bunch of other ways too; seems like you could do this with a regular DB

Anybody tried this or heard of it being done?


  👤 avinassh Accepted Answer ✓
I think you would love this talk by Martin Kleppmann - Turning the database inside out with Apache Samza: https://www.youtube.com/watch?v=fU9hR3kiOK0

He basically explores similar idea, you have a log to persist and materialised views to query.


👤 necovek
Common approach for the last few decades was to add "event log" tables that get updated on triggers in every INSERT, UPDATE and DELETE.

Basically, for every table "things"

(id INT, name VARCHAR, ...)

you'd also have a "things_history"

(operation VARCHAR, id INT, name VARCHAR, ...)

This is basically a slight inverse of your model, but pretty much the same implementation: your "materialized views" are actually the target tables, and auto-maintained tables give you the source of truth for audit etc.


👤 sergiomattei
This reminds me of how Reddit works internally. It’s not exactly the same, but I recall they had a “thing” table that aggregated every kind of object into one.

Pretty cool setup. Once again, not the same, but interesting