How do teams remember why infrastructure decisions were made?

Question

On teams I've worked with, we often run into systems where nobody really knows why certain configs, services, or architectural choices exist. Docs are outdated, Slack history is messy, and the people who made the decision are often gone. When something breaks, we end up rediscovering the same context over and over. Is this just inevitable on long-lived systems, or do experienced teams have a better way of preserving this kind of context?

toomuchtodo · Accepted Answer

ADRs (architecture decision records). Store them as Markdown file(s) in the repo. See https://adr.github.io/ and https://github.com/adr/madr
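For anyone who hasn't used them, here is a rough sketch of what "Markdown files in the repo" can look like. The section names (Status, Context, Decision, Consequences) follow the commonly used Nygard-style layout rather than the MADR template linked above; the `docs/adr` directory, the four-digit numbering, and all helper names are assumptions for illustration, not a standard.

```python
import re
from pathlib import Path


def slugify(title: str) -> str:
    """Lowercase the title and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def next_adr_number(adr_dir: Path) -> int:
    """Return one more than the highest NNNN- prefix among existing ADR files."""
    numbers = [
        int(m.group(1))
        for p in adr_dir.glob("*.md")
        if (m := re.match(r"(\d{4})-", p.name))
    ]
    return max(numbers, default=0) + 1


def render_adr(number: int, title: str, context: str,
               decision: str, consequences: str) -> str:
    """Render one ADR as Markdown (hypothetical minimal layout)."""
    return (
        f"# {number}. {title}\n\n"
        "## Status\n\nAccepted\n\n"
        f"## Context\n\n{context}\n\n"
        f"## Decision\n\n{decision}\n\n"
        f"## Consequences\n\n{consequences}\n"
    )


def write_adr(adr_dir: Path, title: str, context: str,
              decision: str, consequences: str) -> Path:
    """Write the next numbered ADR into adr_dir and return its path."""
    adr_dir.mkdir(parents=True, exist_ok=True)
    number = next_adr_number(adr_dir)
    path = adr_dir / f"{number:04d}-{slugify(title)}.md"
    path.write_text(
        render_adr(number, title, context, decision, consequences),
        encoding="utf-8",
    )
    return path
```

Calling `write_adr(Path("docs/adr"), "Use Postgres for billing", ...)` in an empty repo would create `docs/adr/0001-use-postgres-for-billing.md`. Because the file is committed alongside the change it explains, the "why" shows up in the same history and code review as the "what".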

d--b · Answer

Ah, I wanted to make a product to solve this problem. I posted here to see if anyone thought it was worth solving, but nobody seemed to care.
The idea was to create dashboards that serve both as infrastructure diagrams for documentation and as live health monitoring.
Right now, most documentation solutions aren't used on a daily basis, so they go out of date: people don't think about the docs when making changes and fixes.
And monitoring solutions only show you charts of things you're supposed to already know. They're oriented toward technical details rather than business logic, if that makes sense. They'll tell you that process x is running on machine m and that it's running out of RAM, but nothing will tell you that process y, which depends on x's outputs, is going to fail as well.
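To make the x/y example concrete, here is a minimal sketch of the kind of lookup I mean. It assumes the dependencies are already written down somewhere as a simple mapping; the `depends_on` structure and the `impacted_by` function are hypothetical names, not part of any existing monitoring tool.

```python
from collections import defaultdict, deque


def impacted_by(failed: str, depends_on: dict[str, list[str]]) -> list[str]:
    """Return every process that directly or transitively depends on `failed`.

    `depends_on` maps each process to the processes whose outputs it consumes.
    """
    # Invert the mapping: for each process, which processes consume its outputs?
    dependents = defaultdict(list)
    for proc, deps in depends_on.items():
        for dep in deps:
            dependents[dep].append(proc)

    # Breadth-first walk outward from the failed process.
    seen = set()
    order = []
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for proc in dependents[current]:
            if proc not in seen and proc != failed:
                seen.add(proc)
                order.append(proc)
                queue.append(proc)
    return order


# Hypothetical topology: y consumes x's outputs, z consumes y's.
depends_on = {"y": ["x"], "z": ["y"], "w": []}
print(impacted_by("x", depends_on))  # ['y', 'z']
```

The point is that the same dependency data that draws the diagram could also answer "what else breaks if x goes down", which is what would keep the documentation in daily use.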

mmarian · Answer

Confluence pages, ADRs in GitHub. Not perfect though.