
Show HN: Assay – Found 250 bugs in LiteLLM, LobeChat via AI code verification

https://github.com/gtsbahamas/hallucination-reversing-system
2•tywellshn•2h ago

Comments

tywellshn•2h ago
Hi HN, I'm Ty. I built Assay because I kept shipping bugs that my AI coding assistant hallucinated into existence.

Three independent papers have proven that LLM hallucination is mathematically inevitable (Xu et al. 2024, Banerjee et al. 2024, Karpowicz 2025). You can't train it away. You can't prompt it away. So I built a verification layer instead.

How it works: Assay extracts every implicit claim the code makes (e.g., "this function handles null input," "this query is injection-safe"), then verifies each one in two passes: first an adversarial LLM pass, then a deterministic formal verifier that can override the LLM's verdict.
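
A minimal sketch of that two-pass flow, not Assay's actual code. The names `Claim`, `verify`, `llm_judge` and `formal_checks` are hypothetical, and the LLM pass is stubbed with a plain function. It assumes a formal check returns `True`, `False`, or `None` when it has no opinion, and that any deterministic result takes precedence over the LLM verdict.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Claim:
    text: str  # the implicit claim, e.g. "this query is injection-safe"
    code: str  # the snippet the claim is about


# Hypothetical check signature: True = holds, False = violated, None = no opinion.
FormalCheck = Callable[[Claim], Optional[bool]]


def verify(
    claim: Claim,
    llm_judge: Callable[[Claim], bool],
    formal_checks: List[FormalCheck],
) -> str:
    # Pass 1: the (stubbed) LLM judge gives a provisional verdict.
    llm_verdict = llm_judge(claim)
    # Pass 2: collect every deterministic result that has an opinion.
    formal = [r for r in (check(claim) for check in formal_checks) if r is not None]
    # Deterministic results, when present, override the LLM verdict.
    verdict = all(formal) if formal else llm_verdict
    return "PASS" if verdict else "FAIL"


claim = Claim(
    "this query is injection-safe",
    'db.execute("SELECT * FROM t WHERE id = " + user_input)',
)
gullible_judge = lambda c: True  # stand-in for an LLM that wrongly says PASS
flags_concat = lambda c: False if '" +' in c.code else None  # toy check

print(verify(claim, gullible_judge, [flags_concat]))  # prints FAIL
```

With no formal check offering an opinion, the sketch falls back to the LLM verdict, so the deterministic layer can only tighten the result where it actually recognizes something.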

We ran it on 4 popular open-source projects. Live results:

- LiteLLM (18K stars): 1,381 claims, 185 bugs, 30 critical
  https://tryassay.ai/reports/0bccf817-1cb6-43ff-b724-866f1453...
- Chatbot UI (28K stars): 476 claims, 41 bugs, 12 critical
  https://tryassay.ai/reports/cc8c0c61-9b5a-4774-aed1-f99cc4f6...
- LobeChat (50K stars): 205 claims, 14 bugs, 1 critical
  https://tryassay.ai/reports/915dfc1a-64ec-483d-b4b5-effb53a8...
- Open Interpreter (55K stars): 12 claims, 4 bugs, 2 critical
  https://tryassay.ai/reports/347aa2bb-4249-468a-a835-12da3472...

"But can't the verifier hallucinate too?" Yes. That's why we added a formal verifier underneath — pure regex/pattern-matching, no LLM, can't hallucinate. On its first production call, the LLM judge said PASS on code with SQL injection. The formal verifier overrode it to FAIL.

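To make the "pure regex, no LLM" idea concrete, here is a rough sketch of what one deterministic check could look like. The pattern and the names `SQL_INTERPOLATION` and `formal_sql_check` are illustrative assumptions, not Assay's real rules. It only flags an `execute(` call whose query is an f-string or a string literal joined with `+`, and it will miss many other injection shapes.

```python
import re
from typing import Optional

# Hypothetical pattern: execute( followed by an f-string, or by a quoted
# string that is concatenated with "+". Deliberately narrow.
SQL_INTERPOLATION = re.compile(r"""execute\(\s*(?:f["']|["'].*["']\s*\+)""")


def formal_sql_check(code: str) -> Optional[bool]:
    """Return False if the risky pattern is present, None if nothing matched."""
    return False if SQL_INTERPOLATION.search(code) else None


def final_verdict(llm_pass: bool, code: str) -> str:
    formal = formal_sql_check(code)
    # A deterministic finding overrides whatever the LLM judge said.
    passed = llm_pass if formal is None else formal
    return "PASS" if passed else "FAIL"


risky = 'cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")'
safe = 'cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))'

print(final_verdict(True, risky))  # prints FAIL: the LLM said PASS, the regex overrode it
print(final_verdict(True, safe))   # prints PASS: the regex has no opinion
```

A check like this is deterministic but not complete: a miss means "no opinion", not "safe", which is why it returns `None` rather than `True`.
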
Benchmarks (validated against real test suites, not LLM judgment):

- HumanEval: 86.6% baseline to 100% pass@5 with Assay (164/164 problems)
- SWE-bench: 18.3% baseline to 30.3% with Assay (+65.5%)
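
As a rough check, the SWE-bench gain is a relative improvement over the baseline, not a difference in percentage points. Computed from the two rounded percentages above (a figure derived from unrounded results could differ in the last digit):

```latex
\[
\frac{30.3 - 18.3}{18.3} = \frac{12.0}{18.3} \approx 0.656
\]
```

That is an absolute gain of 12.0 percentage points and a relative gain of roughly 65 to 66%, in line with the +65.5% quoted.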

Try it:

  npx tryassay assess /path/to/your/project

npm: https://www.npmjs.com/package/tryassay
Paper: https://doi.org/10.5281/zenodo.18522644

Drop a repo link in the comments and I'll run it for free.

Show HN: Stop Pasting Credentials in Slack

https://www.usevaultlink.com/
2•ankurbhugra•4m ago•0 comments

Show HN: Skill Check CLI for your skill.md

https://github.com/thedaviddias/skill-check
1•thedaviddias•4m ago•0 comments

Show HN: WebhookStream – Receive, relay, send and debug webhooks from 1 platform

https://webhookstream.com
1•fallenranger•12m ago•1 comments

The political effects of X's feed algorithm

https://www.nature.com/articles/s41586-026-10098-2
3•iamflimflam1•14m ago•1 comments

Show HN: Mukoko weather – AI-powered weather intelligence built for Zimbabwe

https://weather.mukoko.com/harare
1•bryanfawcett•15m ago•0 comments

AstianGO Search API

https://astiango.com/developers/docs
1•ponchale•16m ago•1 comments

Show HN: WP2TXT – Wikipedia dump text extractor with category/section filtering

https://github.com/yohasebe/wp2txt
1•yohasebe•20m ago•0 comments

Show HN: Filepack: a fast SHASUM/SFV/PGP alternative using BLAKE3

https://github.com/casey/filepack
1•rodarmor•20m ago•0 comments

Show HN: AI Code Review Agent – Automated PR Reviews with Google ADK and Gemini

https://github.com/mkantwala/AI-Code-Review-Agent
1•tme15b014•20m ago•0 comments

Show HN: NF-1 – A resource-zero programming language for low-end hardware

https://github.com/sonamsingh25437-ship-it/NF-1-PROGRAMMING-LANGUAGE
1•aditya_rai-331•24m ago•0 comments

Let's Burn Some Tokens – AI Chatbot Cost Exploitation as an Attack Vector

https://dixken.de/blog/lets-burn-some-tokens
1•snigsnog•24m ago•0 comments

AI Fatigue: Why the "Test Only, Zero Code Review" Methodology Is Flawed

https://fastfilelink.com/static/blog/ai-fatigue-test-only-zero-code-review.html#ai-fatigue-test-o...
2•bear330•25m ago•0 comments

Show HN: Script Snap – Extract code from videos

https://script-snap.com/
5•liumw1203•27m ago•0 comments

We built an economy for SpaceMolt, the realtime MMO for AI agents

https://www.spacemolt.com/news/we-built-an-economy
1•statico•30m ago•0 comments

The Takedown Campaign Against archive.today (2025)

https://algustionesa.com/the-takedown-campaign-against-archive-today/
2•pabs3•30m ago•0 comments

US economy slowed sharply in the fourth quarter, expanding at rate of just 1.4%

https://www.cnn.com/2026/02/20/economy/us-gdp-economy-q4
2•stopbulying•32m ago•0 comments

EPA Weakens Limits on Mercury from Coal Plants

https://www.nytimes.com/2026/02/20/climate/epa-mercury-coal-plants.html
2•stopbulying•33m ago•0 comments

In SF for a couple of days, looking for someone that can host us in their office

1•jackota•35m ago•0 comments

Meta Deployed AI and It Is Killing Our Agency

https://mojodojo.io/blog/meta-is-systematically-killing-our-agency/
3•zenincognito•36m ago•0 comments

dwata: Local Financial Data Extraction from Emails with Ministral 3 3B, Ollama

https://www.youtube.com/watch?v=LVT-jYlvM18
1•brainless•37m ago•0 comments

Show HN: Claude Chrome Parallel – Ultrafast Parallel Browser MCP for Chrome

https://github.com/shaun0927/claude-chrome-parallel
1•shaun0927•42m ago•0 comments

OpenAI considered alerting Canadian police about school shooting suspect

https://www.theguardian.com/world/2026/feb/21/tumbler-ridge-shooter-chatgpt-openai
1•n1b0m•44m ago•0 comments

Topological Naming Problem

https://wiki.freecad.org/Topological_naming_problem
2•tripdout•49m ago•0 comments

Can we debug a living cell like a running binary?

https://cellhacker.substack.com/p/dna-is-a-self-executing-binary-a
2•efim_bushmanov•57m ago•3 comments

Tiny QR code achieved using electron microscope technology

https://newatlas.com/technology/smallest-qr-code-bacteria-tu-wien/
1•jonbaer•58m ago•0 comments

The Fundamental Limits of LLMs at Scale

https://arxiv.org/abs/2511.12869
1•o4c•58m ago•0 comments

A perceptual-first mobile audio DSP experiment

1•adriel_d•1h ago•0 comments

Saturn's Rings Came from a Two-Moon Collision About 100M Years Ago

https://gizmodo.com/saturns-rings-came-from-a-two-moon-collision-about-100-million-years-ago-stud...
4•mooreds•1h ago•0 comments

A man who triggered the AI explosion(2020) – Alex Krizhevsky [video]

https://www.youtube.com/watch?v=gwzwkv2hO5k
1•o4c•1h ago•0 comments

How to Use Goosetown for Parallel Agentic Engineering

https://block.github.io/goose/blog/2026/02/19/gastown-explained-goosetown/
3•mooreds•1h ago•0 comments