We wrote about something we keep thinking about: the gap between code that runs and code that's provably correct.
The post walks through a simple Java banking app with three bugs that compilation won't catch, tests are unlikely to catch, and an LLM reviewing the code would almost certainly miss. The most interesting one is structural: LLMs reason within a context window, but the most dangerous bugs are global properties that cross component boundaries. The preservation of money in a transfer isn't a property of withdraw() or deposit() in isolation.
Curious whether others have run into this class of problem in production.
jorgegalindo•1h ago
We wrote about something we keep thinking about: the gap between code that runs and code that's provably correct.
The post walks through a simple Java banking app with three bugs that compilation won't catch, tests are unlikely to catch, and an LLM reviewing the code would almost certainly miss. The most interesting one is structural: LLMs reason within a context window, but the most dangerous bugs are global properties that cross component boundaries. The preservation of money in a transfer isn't a property of withdraw() or deposit() in isolation.
Curious whether others have run into this class of problem in production.