Eventually the real topology lives only in people’s heads and you get the classic: “Don’t touch that! Karl built it, he left months ago, and the diagram, like the documentation, is probably a lie... but it still runs.”
I’m curious why this stays unsolved in practice, even with modern automation and vendor APIs. Most teams still end up maintaining heavy documents, or growing a forest of scripts that parse all sorts of sources, just to answer a seemingly simple question: what’s connected to what, right now? (Ideally with an up-to-date, human-readable, detailed report.)
I’d love to hear your “this never works” stories:
- What’s your actual source of truth today: a database, a forest of CSVs, IaC state, discovery tooling, or simply “the as-is running system”?
- What breaks first for you: identity (naming/IDs), relationship mapping, change process, or tooling limitations?
- What’s the worst “Karl system” (critical infra with missing or poor documentation) you’ve had to face?
- How do you keep architecture information consistent across IT and OT, especially with unavoidable legacy systems?
If a CLI that pulls from hosts and generates reports (Markdown/Mermaid, maybe draw.io) with a convenient diff view existed, what would be the killer feature(s) that keep it from becoming shelfware?
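To make the idea concrete, here’s a rough sketch of the report-generation core I have in mind; everything here (node names, the link-tuple shape) is invented for illustration, not an existing tool:

```python
# Hypothetical sketch: render discovered links as a Mermaid flowchart.
# The (source, target, label) tuples would come from host discovery;
# here they are hard-coded example data.

def to_mermaid(links):
    """Turn (source, target, label) tuples into Mermaid 'graph LR' text."""
    lines = ["graph LR"]
    # Sort so repeated runs produce identical output, which makes
    # plain-text diffs between two snapshots meaningful.
    for src, dst, label in sorted(links):
        lines.append(f'    {src} -->|"{label}"| {dst}')
    return "\n".join(lines)

links = [
    ("web01", "db01", "tcp/5432"),
    ("web01", "cache01", "tcp/6379"),
]
print(to_mermaid(links))
```

The point of the deterministic ordering is that the “diff view” falls out almost for free: snapshot the generated Markdown on each run and diff the two text files.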
I’m especially interested in failure cases and edge conditions that make this problem so persistent.