Ask HN: How do you keep system context from rotting over time?

15•kennethops•1d ago
Former SRE here, looking for advice.

I know there are a lot of tools focused on root cause analysis after things break. Cool, but that’s not what’s wearing me down. What actually hurts is the constant context switching while trying to understand how a system fits together, what depends on what, and what changed recently.

As systems grow, this feels like it gets exponentially harder. Add logs and now you’ve created a million new events to reason about. Add another database and suddenly you’re dealing with subnet constraints or a DB choice that’s expensive as hell, and no one noticed until later. Everyone knows their slice, but the full picture lives nowhere, so bit rot just keeps creeping in.

This feels even worse now that AI agents are pushing large amounts of code and config changes quickly. Things move faster, but shared understanding falls behind even faster.

I’m honestly stuck on how people handle this well in practice. For folks dealing with real production systems, what’s actually helped? Diagrams, docs, tribal knowledge, tooling, something else? Where does it break down?

Comments

amadeuswoo•17h ago
One thing that’s evidently helped: using CLAUDE.md / agent instructions as de facto architecture docs. If the agent needs to understand system boundaries to work effectively, those docs actually get maintained
kennethops•14h ago
But how do you ensure the .md file is able to see all of the details of the infra?
amadeuswoo•13h ago
You don't, it's a map of intent, not infra state. What exists, why, what talks to what. Live state still needs IaC and observability. The .md captures the 'why' that terraform can't
htrp•16h ago
I don't think OP is looking for context from the AI model perspective but rather a process for maintaining a mental picture of the system architecture and managing complexity.

I'm not sure I've seen any good vendors but I remember seeing a reverse devops tool posted a few days ago that would reverse engineer your VMs into Ansible code. If that got extended to your entire environment, that would almost be an auto documenting process.

dexdal•15h ago
Context rots when it stays implicit. Make the system model an explicit artifact with fixed inputs and checkpoints, then update it on purpose. Otherwise you keep rebuilding the same picture from scratch.
kennethops•15h ago
I'm honestly looking for both. I haven't found a vendor that does this well for just humans, nor am I seeing something that can expose this context, read-only, to all of the AI coding agents.

I will check that tool out.

liveoneggs•15h ago
Monitoring tools (APM) will show dependencies (web calls, databases, etc) and should contain things like deployment markers and trend lines.

All of those endpoints should be documented in an environment variable or similar as well.
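
As a rough sketch of what "documented in an environment variable" can buy you: if every dependency endpoint lives in a variable with a predictable name, a few lines can list what a service talks to. The `_URL` suffix and the variable names below are assumptions for illustration, not a standard convention.

```python
import os
from urllib.parse import urlparse


def endpoints_from_env(environ=None, suffix="_URL"):
    """Return {variable name: host} for every variable whose name ends in `suffix`.

    The `_URL` suffix is an assumed naming convention, not a standard.
    """
    environ = os.environ if environ is None else environ
    found = {}
    for key, value in sorted(environ.items()):
        if key.endswith(suffix):
            host = urlparse(value).hostname
            if host:  # skip values that do not parse as URLs with a host
                found[key] = host
    return found


# Hypothetical environment for illustration only.
example = {
    "ORDERS_DB_URL": "postgres://db.internal:5432/orders",
    "PAYMENTS_URL": "https://payments.internal/api",
    "LOG_LEVEL": "debug",
}
print(endpoints_from_env(example))
```

This only shows declared dependencies, so it complements the APM view rather than replacing it.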

The breakdown is when you don't instrument the same tooling everywhere.

Documentation is generally out of date by the time you finish writing it so I don't really bother with much detail there.

kennethops•14h ago
This has been my experience as well. imo documentation feels like one of the few areas that AI can be good at today.
liveoneggs•4h ago
It's okay, but it often lies. At an SRE level you need a pretty zoomed-out view of the world until you are trying to zoom in on a problem component.

Always start at the head (what a customer sees -- actually load the website) and work down into each layer.

If something is breaking way downstream and customers don't see it then it doesn't actually matter right now.

nitwit005•15h ago
Every company I've worked with has started with an ER diagram for their primary database (and insisted on it, in fact), only to give up when it became too complex. You quickly hit the point where no one can understand it.

You then eventually have that same pattern happen with services, where people give up on mapping the full thing out as well.

What I've done for my current team is to list the "downstream" services, what we use them for, who to contact, etc. It only goes one level deep, but it's something that someone can read quickly during an incident.
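
A one-level-deep list like that can be kept as a small file next to the code. A minimal sketch, assuming made-up service names, purposes, and contacts (none of them come from a real system):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Downstream:
    """One downstream service the team calls directly (one level deep only)."""
    name: str
    used_for: str
    contact: str


# Hypothetical entries for illustration; replace with the team's real dependencies.
DOWNSTREAMS = [
    Downstream("billing-api", "charging cards at checkout", "#billing-oncall"),
    Downstream("search-index", "product search results", "#search-team"),
]


def incident_summary(services):
    """Render the list as short lines that can be skimmed during an incident."""
    return "\n".join(
        f"{s.name}: {s.used_for} (contact: {s.contact})" for s in services
    )


print(incident_summary(DOWNSTREAMS))
```

Keeping it to name, purpose, and contact is deliberate: anything deeper tends to go stale the same way the full diagram did.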

kennethops•14h ago
Sorry what is an ER diagram?
gnabgib•14h ago
First hits on DDG, anonymous Google, Bing

ERD/ Entity Relationship Diagram https://www.lucidchart.com/pages/er-diagrams

ERM / Entity-Relationship Model https://en.wikipedia.org/wiki/Entity%E2%80%93relationship_mo...

(same-same, ERD is the more common acronym)

kennethops•14h ago
That is what I figured it would be, but you never know anymore with the amount of acronyms thrown around nowadays.
canhdien_15•13h ago
If the system is so good, why constantly change the context?
BOOSTERHIDROGEN•10h ago
I think it is because of a continuous improvement mindset.
canhdien_15•10h ago
Continuous improvement is essential, but we must distinguish between progress and mere decoration. If an old car runs perfectly and a new one offers the same speed but with a different shell, why replace the entire vehicle? It’s a waste of time and resources. Why not focus on upgrading the 'shell' instead of reinventing the wheel?
kennethops•28m ago
but think about the shareholders!
dlcarrier•9h ago
Good hierarchical documentation

A laptop computer is extremely complex, but is actively developed and maintained by a small number of people, built on parts themselves developed by a small number of people, many of which are themselves built on parts themselves developed by a small number of people, and so on and so forth.

This works well in electronics design, because everything is documented and tested to comply with the documentation. You'd think this would slow things down, but developing a new generation of a laptop takes fewer man-hours and less calendar time than developing a new generation of any software of similar complexity running on it, despite the laptop skirting the limits of physics. Technical debt adds up really fast.

The top-level designers only have access to what the component manufacturers have published, and not to their internal designs, but that doesn't matter because the publications include correct and relevant data. When the component manufacturer comes out with something new, they use documentation from their supplier, to design the new product.

As long as each component's documentation is complete and accurate, it will meet all of the needs of anyone using that component. Diving deeper would only be necessary if something is incomplete or inaccurate.

linux4dummies•5h ago
I use Nix (NixOS) with AI agents. It's everything I ever dreamed of and a bit more. Makes all other distros and build systems look old and outdated :D
kennethops•3h ago
Woah what are you doing?
