Ask HN: How do teams remember why infrastructure decisions were made?

8•curious_sre•1mo ago

On teams I’ve worked with, we often run into systems where nobody really knows why certain configs, services, or architectural choices exist. Docs are outdated, Slack history is messy, and the people who made the decision are often gone. When something breaks, we end up rediscovering the same context over and over. Is this just inevitable on long-lived systems, or do experienced teams have a better way of preserving this kind of context?

Comments

toomuchtodo•1mo ago

ADRs (architecture decision records). Store them as markdown files in the repo.

https://adr.github.io/

https://github.com/adr/madr
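For anyone who wants a starting point, here is a minimal sketch of a script that drops a numbered ADR markdown file into the repo. The `docs/adr` directory, the `write_adr` helper, and the section headings are my own assumptions for illustration; they are a simplified layout, not the official MADR template linked above.

```python
from datetime import date
from pathlib import Path
import re
import tempfile


def slugify(title: str) -> str:
    """Turn a title into a lowercase, hyphen-separated file name fragment."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def write_adr(adr_dir: Path, title: str, context: str,
              decision: str, consequences: str) -> Path:
    """Write one ADR as markdown and return its path.

    Hypothetical helper: the numbering scheme and headings are assumptions,
    not something prescribed by adr.github.io or MADR.
    """
    adr_dir.mkdir(parents=True, exist_ok=True)
    # Next number = count of existing "NNNN-*.md" files + 1.
    number = len(list(adr_dir.glob("[0-9][0-9][0-9][0-9]-*.md"))) + 1
    path = adr_dir / f"{number:04d}-{slugify(title)}.md"
    body = (
        f"# {title}\n\n"
        f"- Status: accepted\n"
        f"- Date: {date.today().isoformat()}\n\n"
        f"## Context\n\n{context}\n\n"
        f"## Decision\n\n{decision}\n\n"
        f"## Consequences\n\n{consequences}\n"
    )
    path.write_text(body, encoding="utf-8")
    return path


if __name__ == "__main__":
    # Demo in a throwaway directory; in a real repo you would pass docs/adr.
    with tempfile.TemporaryDirectory() as tmp:
        created = write_adr(
            Path(tmp) / "docs" / "adr",
            "Pin load balancer idle timeout",
            "Long-running uploads were being cut off.",
            "Raise the idle timeout and document the reason here.",
            "Connections stay open longer; revisit if upload sizes change.",
        )
        print(created.name)
```

Since the file lives next to the code, the decision gets reviewed in the same pull request as the change that prompted it.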

curious_sre•1mo ago

That makes sense. In your experience, do ADRs actually get revisited during incidents or onboarding, or do they mostly exist as reference docs that people forget about?

toomuchtodo•1mo ago

Rarely useful for incidents in my experience, mostly useful for onboarding those new to a codebase or part of a system. It's a mechanism for preserving institutional knowledge that would otherwise evaporate because it isn't represented in code; think of it as formalized documentation.

Another benefit of keeping this information in markdown files is that it can be used with LLMs and RAG, if a natural language interface to the knowledge is relevant to your enterprise.
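To make the retrieval idea concrete, here is a rough sketch of the "find the relevant ADRs for a question" step. It uses plain keyword overlap as a stand-in for real retrieval (no embeddings, no LLM call), and `load_adrs`, `rank_adrs`, and the file names are hypothetical names I made up for the example.

```python
from pathlib import Path
import re


def tokenize(text: str) -> set[str]:
    """Lowercase word set; deliberately naive."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def load_adrs(adr_dir: Path) -> dict[str, str]:
    """Read every markdown file in the ADR directory into {file name: text}."""
    return {p.name: p.read_text(encoding="utf-8")
            for p in sorted(adr_dir.glob("*.md"))}


def rank_adrs(docs: dict[str, str], question: str,
              top_k: int = 3) -> list[tuple[int, str]]:
    """Return up to top_k (overlap, file name) pairs, best match first.

    Scoring is the number of distinct question words found in the ADR.
    A real setup would likely swap this for embedding search.
    """
    question_words = tokenize(question)
    scored = []
    for name, text in docs.items():
        overlap = len(question_words & tokenize(text))
        if overlap:
            scored.append((overlap, name))
    # Highest overlap first; file name breaks ties deterministically.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:top_k]


if __name__ == "__main__":
    docs = {
        "0001-use-postgres.md": "We chose Postgres for billing because of transactions.",
        "0002-use-redis.md": "Redis is used for the session cache.",
    }
    print(rank_adrs(docs, "Why Postgres for billing?"))
```

The top-ranked files would then be pasted into the prompt as context; that part depends on whichever LLM tooling you use, so it is left out here.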

curious_sre•1mo ago

That’s helpful context, thanks. In practice, when incidents happen, where do people actually go to reconstruct the “why”? Is it mostly Slack archaeology and asking senior engineers, or is there something else that works better?

toomuchtodo•1mo ago

As you said: chat logs, whatever logging system is your source of truth, and any context that subject matter experts on the system(s) in question can provide.

Google's SRE resources on this topic are somewhat helpful. Consider reviewing them and evaluating how they fit your environment.

https://sre.google/workbook/postmortem-culture/

https://sre.google/sre-book/postmortem-culture/

gardenhedge•1mo ago

Good, but it needs to be driven by a head-of-architecture type of role that incentivizes the practice. Otherwise it's a patchwork of records and largely ignored.

d--b•1mo ago

Ah I wanted to make a product to solve this problem. I posted here to see if anyone thought it was worth solving, but nobody seemed to care.

The way I wanted to do this is to create dashboards that would serve at the same time as infrastructure diagrams for documentation and live health monitoring.

Right now, most documentation solutions aren't used on a daily basis, so they go out of date, because people don't think about them when making changes and fixes.

And monitoring solutions only show you charts of things you're supposed to already know. They're very technically-oriented, and not business-logically oriented, if that makes sense. Like they'll tell you that process x is running on machine m, and that it's running out of ram, but nothing will tell you that process y that depends on x's outputs is going to fail as well.
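A toy sketch of what I mean by the "y depends on x" part: if the dependency edges are written down anywhere, working out what else is going to break is a short graph walk. The `depends_on` mapping and the process names are made up for the example; nothing here comes from a real monitoring tool.

```python
from collections import defaultdict, deque


def downstream_of(failing: str, depends_on: dict[str, set[str]]) -> set[str]:
    """Return every process that directly or transitively depends on `failing`.

    `depends_on` maps a process to the set of processes whose output it needs
    (hypothetical input format chosen for this sketch).
    """
    # Invert the edges: for each process, who consumes its output?
    dependents = defaultdict(set)
    for process, upstreams in depends_on.items():
        for upstream in upstreams:
            dependents[upstream].add(process)

    affected: set[str] = set()
    queue = deque([failing])
    while queue:
        current = queue.popleft()
        for consumer in dependents[current]:
            # Skip already-visited nodes so cycles cannot loop forever.
            if consumer not in affected and consumer != failing:
                affected.add(consumer)
                queue.append(consumer)
    return affected


if __name__ == "__main__":
    depends_on = {
        "y": {"x"},        # y reads x's output
        "report": {"y"},   # report reads y's output
        "unrelated": set(),
    }
    print(sorted(downstream_of("x", depends_on)))
```

So an alert on x running out of RAM could be annotated with "y and report are downstream", which is the business-level view the charts don't give you.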

mmarian•1mo ago

After I read your comment, I remembered I posted about it too, and came to the same conclusion ^_^

mmarian•1mo ago

Confluence pages, ADRs in github. Not perfect though.

gus_massa•1mo ago

I send the team an email with a few notes. It's really an email to my future self, and the notes are there in case future me has forgotten the details.

curious_sre•1mo ago

Thanks all, this has been really helpful. One recurring theme seems to be that anything manual eventually gets ignored. Out of curiosity, has anyone seen any system that captures context automatically, without relying on people to write it down?
