Show HN: Proof Loop – I make my coding agents prove they finished the task

https://github.com/LeoStehlik/proof-loop
1•LeoStehlik•50m ago
I built this because my coding agent kept telling me it had completed the task, but when I verified, that was not the case.

I kept Proof Loop fairly light, intentionally. It’s basically a protocol helper script for AI agent tasks:

- set acceptance criteria before coding/implementation
- keep the builder and verifier roles separate
- test each criterion, with a result of PASS, FAIL or UNKNOWN
- attach evidence that the work is done
- keep the proof evidence in the repo, so that the next agent / run can inspect it and see what was already done
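To make the list above concrete, here is a minimal Python sketch of how acceptance criteria with a PASS / FAIL / UNKNOWN result and attached evidence could be modelled. This is not Proof Loop's actual code or data format; `Status`, `Criterion` and `verdict` are hypothetical names, and the rule that a PASS without evidence does not count is an assumption based on the description in this post.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    UNKNOWN = "UNKNOWN"


@dataclass
class Criterion:
    # Hypothetical shape: one acceptance criterion, set before implementation.
    description: str
    status: Status = Status.UNKNOWN
    evidence: list[str] = field(default_factory=list)  # e.g. paths to logs


def verdict(criteria: list[Criterion]) -> Status:
    """Overall result for a task (assumed rule, not taken from the project).

    Any FAIL fails the task. The task passes only when every criterion
    is PASS and has at least one piece of evidence attached. Anything
    else, including an empty list, stays UNKNOWN.
    """
    if not criteria:
        return Status.UNKNOWN
    if any(c.status is Status.FAIL for c in criteria):
        return Status.FAIL
    if all(c.status is Status.PASS and c.evidence for c in criteria):
        return Status.PASS
    return Status.UNKNOWN


criteria = [
    Criterion("CLI prints its version", Status.PASS, ["logs/version.txt"]),
    Criterion("README documents the new flag"),  # not verified yet
]
print(verdict(criteria).value)
```

With one criterion still unverified, the sketch reports UNKNOWN rather than trusting the builder's claim, which is the behaviour the separate verifier role is meant to enforce.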

You can try it from the command line: clone the repo, go to the proof-loop directory and run make demo.

The demo creates a task, checks the proof bundle, fails if evidence is missing, then passes once the acceptance criteria have evidence attached.
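The same flow can be sketched in a few lines of Python. This is an illustration only, not what make demo actually runs: the JSON layout, the `proof.json` file name, and the functions `check_bundle` and `run_demo` are all assumptions made up for this example.

```python
import json
import tempfile
from pathlib import Path


def check_bundle(path: Path) -> list[str]:
    """Return a list of problems; an empty list means the bundle passes."""
    bundle = json.loads(path.read_text())
    problems = []
    for item in bundle["criteria"]:
        if item["status"] != "PASS":
            problems.append(f"{item['id']}: status is {item['status']}")
        elif not item["evidence"]:
            problems.append(f"{item['id']}: PASS claimed without evidence")
    return problems


def run_demo() -> tuple[list[str], list[str]]:
    """Create a task, check it without evidence, then with evidence."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "proof.json"  # hypothetical bundle file
        bundle = {
            "task": "add a --version flag",
            "criteria": [{"id": "AC1", "status": "PASS", "evidence": []}],
        }
        path.write_text(json.dumps(bundle, indent=2))
        before = check_bundle(path)  # evidence missing, so this reports a problem

        # The builder attaches evidence and the bundle is checked again.
        bundle["criteria"][0]["evidence"].append("logs/version-output.txt")
        path.write_text(json.dumps(bundle, indent=2))
        after = check_bundle(path)
    return before, after


before, after = run_demo()
print("first check:", before)
print("second check:", after)
```

Keeping the bundle as a plain file is what lets it live in the repo next to the code, so a later agent or run can read it back instead of relying on a chat transcript.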

There is also an OpenClaw skill version now, so the easiest use is

openclaw skills install proof-loop

In the GitHub repo, there is also a harness-agnostic version, along with examples.

I would especially like criticism or any other feedback from people who run Codex, Claude Code or OpenCode on long-running, multi-step tasks.

Note that this is a utility I use myself. It is free of charge, MIT-licensed and open source, with no intention of commercializing it.

Comments

crionuke•38m ago
For me, unit or even integration tests are a more reliable sign of whether the agent is done or not. And you?
LeoStehlik•12m ago
Same, but that comes afterwards. I wanted proof first so that I don’t waste time running tests on code that is still far from finished. Especially early this year, I’d get an agent confirming “I did this”, and later I’d discover it had struggled to use its tools and simply said it was done. Only when I receive the evidence of “I’ve done it” (iterating if anything is missing) do I trigger the round of unit tests. I know this may sound like too much careful handholding, but I got burned so many times that it pays off.
