Show HN: Premortem, a coding-agent-powered airplane blackbox

https://github.com/tilework-tech/nori-premortem

2•theahura•1h ago

A few weeks ago, I was getting random OOMs on my Linux box and had no idea what was causing them. At one point, when I realized that some process was sucking up memory, I kicked off a Claude Code job to see if it could figure out what was happening in real time. And it did!

In real time, the coding agent ran through a suite of system commands, figured out which jobs were causing problems, and then even started to dig into the specific function calls (Python and Node processes can both be inspected at the function-call level by sideloaded processes) before the entire system finally crashed.
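As an illustration of that kind of sideloaded inspection, here is a minimal Python sketch, not premortem's actual code. It assumes `py-spy` is installed for Python targets (its `py-spy dump --pid` subcommand prints the current stack of every thread in a running process and usually needs root or ptrace permission). For Node targets it relies on the documented behavior that a Node.js process on Linux or macOS starts its inspector when it receives `SIGUSR1`; a debugger client still has to attach afterwards. The function name `inspect_command` is hypothetical.

```python
def inspect_command(pid: int, runtime: str) -> list[str]:
    """Return a command that inspects a running process without restarting it.

    Hypothetical helper. Assumes py-spy is installed for Python targets.
    """
    if runtime == "python":
        # py-spy reads the target's memory and prints each thread's current stack.
        return ["py-spy", "dump", "--pid", str(pid)]
    if runtime == "node":
        # SIGUSR1 makes a running Node.js process start its inspector
        # (default 127.0.0.1:9229); not supported on Windows.
        return ["kill", "-USR1", str(pid)]
    raise ValueError(f"unsupported runtime: {runtime!r}")
```

An agent could then run, for example, `subprocess.run(inspect_command(1234, "python"), capture_output=True, text=True)` and read the stacks from `stdout`.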

Besides the fact that this was extremely cool, I realized that with a few tweaks I could make it a legitimately useful tool. The basic idea: any time certain system vitals cross a threshold, spin up a coding agent and have it debug what is going on as aggressively as possible, with all logs streamed to a third-party server (in addition to being stored on disk). This basic abstraction would solve two huge problems:

- Most of the time it is very hard to figure out exactly why a machine went down. This tool would effectively act as an airplane black box: a last record of what was going on, focused specifically on debugging the failure as it happened. That means a massive speedup in figuring out system-breaking issues.

- Most of the time there are interventions someone could take to prevent the system from going down at all, if a human were around while the crash was happening. For example, if I see that I’m about to OOM from vitest, I can just kill a bunch of the processes that are spiking memory and keep the system from crashing.
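The threshold-then-intervene idea above can be sketched in Python as follows. This is a hypothetical illustration, not how premortem is implemented: the threshold values, the `Proc` record, and the prompt wording are all assumptions, and the vitals are passed in as a plain dict (in practice something like `psutil.virtual_memory().percent` and `psutil.disk_usage("/").percent` could supply them). Launching the agent itself is left out; `kill_candidates` only proposes processes to kill, largest resident memory first.

```python
from dataclasses import dataclass

# Hypothetical thresholds (percent used); premortem's real defaults and config may differ.
THRESHOLDS = {"memory_percent": 90.0, "cpu_percent": 95.0, "disk_percent": 95.0}


@dataclass
class Proc:
    pid: int
    name: str
    rss_mb: float  # resident memory in MiB


def breached(vitals: dict[str, float], thresholds: dict[str, float]) -> list[str]:
    """Return the names of vitals at or above their threshold, sorted."""
    return sorted(k for k, limit in thresholds.items() if vitals.get(k, 0.0) >= limit)


def build_prompt(breaches: list[str]) -> str:
    """Hypothetical prompt handed to the coding agent when something breaches."""
    return (
        f"These system vitals crossed their thresholds: {', '.join(breaches)}. "
        "Find the responsible processes and explain what they are doing."
    )


def kill_candidates(procs: list[Proc], free_mb_needed: float) -> list[Proc]:
    """Greedily pick the largest processes until their summed RSS covers free_mb_needed.

    Only proposes candidates; actually killing them is left to the agent or a human.
    """
    chosen: list[Proc] = []
    freed = 0.0
    for p in sorted(procs, key=lambda p: p.rss_mb, reverse=True):
        if freed >= free_mb_needed:
            break
        chosen.append(p)
        freed += p.rss_mb
    return chosen
```

With vitals of `{"memory_percent": 93.0, "cpu_percent": 40.0, "disk_percent": 96.0}`, `breached(vitals, THRESHOLDS)` returns `["disk_percent", "memory_percent"]`. The greedy largest-first choice is simple but blunt: summed RSS overstates what killing frees when processes share pages, and a real deployment would want an allowlist of processes that are safe to kill.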

We now have premortem running on all of our production machines.

Hope this is useful for other folks!

Comments

doormatt•41m ago

>When running multiple intensive processes in parallel, pushing machines to their limits to maximize throughput, traditional monitoring only provides alerts when thresholds breach.

>Premortem continuously watches system vitals (CPU, memory, disk, processes) and spawns Claude agents to diagnose problems when thresholds are breached.

Surely you see the irony here...

theahura•26m ago

Sure do! I'm not saying that it won't bring about the end even faster. But you do get some very valuable things out of the machine as it's taking its dying breaths

doormatt•22m ago

I don't think you do. Your solution will also only start "firing alerts" once a threshold has been breached.

theahura•19m ago

o whoops. updating the readme lol

theahura•15m ago

updated, thanks. This is what I get for having an AI write the README
