Claude Can (Sometimes) Prove It

https://www.galois.com/articles/claude-can-sometimes-prove-it
23•lairv•2d ago

Comments

r0ze-at-hn•14m ago
> What I’ve found is that given a tool that can detect mistakes, the agent can often correct them

This is the most important line of the entire article.

While iterating on a Manifesto for AI Software Development (https://metamagic.substack.com/p/manifesto-for-ai-software-d...) over the last two years, the key attribute I found, more than any other, was empirical validation. AI (and humans) cannot accurately judge their own work, but when we give an AI (or a human) the ability to do empirical validation, its success rate skyrockets. This might be intuitive, but there are still papers testing whether it applies to AI too. While I could have the AI write unit tests, I've been embracing fuzzing because then the AI can't cheat with bonus tests. The idea of reaching back to school and using interactive theorem proving didn't even cross my mind, and now that it has been presented, it is a whole paradigm shift in how to push my AI use forward so it can work even more autonomously.

AI can iterate at speeds humans can't. When it has even basic empirical validation (building the code, running tests), the human can step out of the loop. Move that to fuzzing (such as with Go) and you get far better coverage and far more progress before a human has to intervene. So it isn't a surprise that interactive theorem proving is a perfect match for AI.

It is interesting how this same lesson plays out elsewhere. Earlier in the article:

> Why is ITP so hard? Some reasons are quite obvious: interfaces are confusing, libraries are sparse, documentation is poor, error messages are mysterious. But these aren’t interesting problems

Remember when LLVM (Clang) got really good C++ error messages and it was life-changing? High-quality error messages meant we could find and fix the error fast and iterate fast. These are actually the MOST interesting problems because they enable the user to learn faster. When users have high success with a product, they will use it again and again. High-quality error messages in all tools will enable Claude Code to work longer on problems without human intervention, make fewer mistakes, and work faster overall.

While error messages should always be good, a new question that really hammers this home is: "When AI encounters this error message, can it fix the problem?"

SCREAM CIPHER ("ǠĂȦẶAẦ ĂǍÄẴẶȦ")

https://sethmlarson.dev/scream-cipher
26•alexmolas•2d ago•18 comments

Less is safer: How Obsidian reduces the risk of supply chain attacks

https://obsidian.md/blog/less-is-safer/
356•saeedesmaili•12h ago•164 comments

If all the world were a monorepo

https://jtibs.substack.com/p/if-all-the-world-were-a-monorepo
150•sebg•4d ago•46 comments

Show HN: FocusStream – Focused, distraction-free YouTube for learners

https://focusstream.media
28•pariharAshwin•3h ago•22 comments

Compiling with Continuations

https://swatson555.github.io/posts/2025-09-16-compiling-with-continuations.html
49•swatson741•3d ago•3 comments

High-performance read-through cache for object storage

https://github.com/s2-streamstore/cachey
44•pranay01•6h ago•7 comments

PYREX vs. Pyrex: What's the Difference?

https://www.corning.com/worldwide/en/products/life-sciences/resources/stories/in-the-field/pyrex-...
55•lisper•4h ago•35 comments

Sangaku Puzzle I Can't Solve

https://samjshah.com/2025/08/05/sangaku-puzzle-i-cant-solve/
21•speckx•3d ago•3 comments

Show HN: WeUseElixir - Elixir project directory

https://weuseelixir.com/
165•taddgiles•14h ago•32 comments

Hidden risk in Notion 3.0 AI agents: Web search tool abuse for data exfiltration

https://www.codeintegrity.ai/blog/notion
129•abirag•13h ago•34 comments

Ants that seem to defy biology – They lay eggs that hatch into another species

https://www.smithsonianmag.com/smart-news/these-ant-queens-seem-to-defy-biology-they-lay-eggs-tha...
396•sampo•22h ago•131 comments

Feedmaker: URL + CSS selectors = RSS feed

https://feedmaker.fly.dev
134•mustaphah•13h ago•21 comments

The best YouTube downloaders, and how Google silenced the press

https://windowsread.me/p/best-youtube-downloaders
354•Leftium•22h ago•159 comments

Internet Archive's big battle with music publishers ends in settlement

https://arstechnica.com/tech-policy/2025/09/internet-archives-big-battle-with-music-publishers-en...
318•coloneltcb•4d ago•122 comments

Supporting Our AI Overlords: Redesigning Data Systems to Be Agent-First

https://arxiv.org/abs/2509.00997
22•derekhecksher•7h ago•5 comments

Show HN: Zedis – A Redis clone I'm writing in Zig

https://github.com/barddoo/zedis
111•barddoo•12h ago•79 comments

LLM-Deflate: Extracting LLMs into Datasets

https://www.scalarlm.com/blog/llm-deflate-extracting-llms-into-datasets/
8•gdiamos•3h ago•3 comments

If you are good at code review, you will be good at using AI agents

https://www.seangoedecke.com/ai-agents-and-code-review/
53•imasl42•5h ago•43 comments

I'm Not a Robot Game

https://neal.fun/not-a-robot/
95•meetpateltech•3d ago•52 comments

Three-Minute Take-Home Test May Identify Symptoms Linked to Alzheimer's Disease

https://www.smithsonianmag.com/smart-news/three-minute-take-home-test-may-identify-symptoms-linke...
94•pseudolus•15h ago•44 comments

Kernel: Introduce Multikernel Architecture Support

https://lwn.net/ml/all/20250918222607.186488-1-xiyou.wangcong@gmail.com/
161•ahlCVA•19h ago•46 comments

Micro-LEDs boost random number generation

https://discovery.kaust.edu.sa/en/article/25936/micro-leds-boost-random-number-generation/
51•giuliomagnifico•3d ago•16 comments

Your very own humane interface: Try Jef Raskin's ideas at home

https://arstechnica.com/gadgets/2025/09/your-very-own-humane-interface-try-jef-raskins-ideas-at-h...
97•zdw•17h ago•15 comments

Shipping 100 hardware units in under eight weeks

https://farhanhossain.substack.com/p/how-we-shipped-100-hardware-units
132•M_farhan_h•1d ago•76 comments

A 3D-Printed Business Card Embosser

https://www.core77.com/posts/138492/A-3D-Printed-Business-Card-Embosser
88•surprisetalk•2d ago•31 comments

An untidy history of AI across four books

https://hedgehogreview.com/issues/lessons-of-babel/articles/perplexity
107•ewf•16h ago•37 comments

R MCP Server

https://github.com/finite-sample/rmcp
93•neehao•3d ago•13 comments

Show the Physics

https://interactivetextbooks.tudelft.nl/showthephysics/Introduction/About.html
176•pillars•3d ago•7 comments

Trump to impose $100k fee for H-1B worker visas, White House says

https://www.reuters.com/business/media-telecom/trump-mulls-adding-new-100000-fee-h-1b-visas-bloom...
1132•mriguy•14h ago•1426 comments