frontpage.
Show HN: We proved you can't train hallucinations out of AI – so we verify

https://github.com/gtsbahamas/hallucination-reversing-system
1•assaydev•1h ago
Hi HN, I'm Ty. I built Assay because I got tired of shipping bugs that AI hallucinated into my code and no tool caught.

The starting point was a finding that surprised me: when we tried training verification directly into models using RLVF (Reinforcement Learning from Verification Feedback), more training data made the model worse. 120 curated pairs hit 91.5% accuracy. 2,000 pairs collapsed to 77.4%. The model's training loss kept decreasing while eval performance cratered. This isn't a tuning problem. Verification cannot be internalized.

So we built an external layer. Assay extracts the implicit claims code makes ("this handles null input," "this query is injection-safe," "this validates auth tokens") and verifies each one against the actual implementation. It's not a linter, not another LLM-as-judge — it's structured claim extraction followed by adversarial verification.
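To make the "extract claims, then verify each one" idea concrete, here is a minimal toy sketch in Python. It is not Assay's implementation: the `Claim` type, the `verify` helper, and the `parse_age` function under test are all hypothetical names invented for illustration, and the claims are written by hand here rather than extracted automatically.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Claim:
    """One implicit claim about the code, paired with a check that tests it."""
    text: str
    check: Callable[[], bool]  # returns True when the claim holds


def parse_age(value):
    """Toy implementation under test (hypothetical, not from Assay)."""
    if value is None:
        return None
    return int(value)  # note: silently accepts negative numbers


def raises_value_error(fn, *args) -> bool:
    """Return True only if calling fn(*args) raises ValueError."""
    try:
        fn(*args)
    except ValueError:
        return True
    return False


def verify(claims: List[Claim]) -> List[Tuple[str, bool]]:
    """Run every check against the real implementation; a crash counts as a failure."""
    results = []
    for claim in claims:
        try:
            ok = bool(claim.check())
        except Exception:
            ok = False
        results.append((claim.text, ok))
    return results


# Hand-written claims standing in for the extraction step (an assumption).
claims = [
    Claim("handles null input", lambda: parse_age(None) is None),
    Claim("parses numeric strings", lambda: parse_age("42") == 42),
    Claim("rejects negative ages", lambda: raises_value_error(parse_age, "-5")),
]

results = verify(claims)
for text, ok in results:
    print(f"{'PASS' if ok else 'FAIL'}: {text}")
```

The third claim fails because the toy `parse_age` never rejects negative values. The point of the sketch is that each claim is checked by executing the implementation, not by asking a model whether the claim sounds plausible.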

Results, validated against real test suites (not LLM judgment):
- HumanEval: 100% pass@5 (164/164), against an 86.6% baseline
- SWE-bench: 30.3% (91/300) vs an 18.3% baseline, a +65.5% relative improvement
- LVR pilot: found 23 real bugs (2 critical) in a production ERP system, with 354 claims verified
- LLM-as-judge actually regresses at k=5 (97.2% vs our 100%) because it hallucinates false positives
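For readers checking the arithmetic, the short Python sketch below recomputes the pass rates from the counts quoted above and the relative gain from the two rounded SWE-bench percentages. The helper names `pct` and `relative_gain` are illustrative assumptions. Working from the rounded percentages gives roughly 65.6%; the post's 65.5% is presumably computed from unrounded counts, and the baseline count is not stated in the post.

```python
def pct(passed: int, total: int) -> float:
    """Pass rate as a percentage."""
    return 100.0 * passed / total


def relative_gain(new_pct: float, base_pct: float) -> float:
    """Relative improvement of new_pct over base_pct, in percent."""
    return 100.0 * (new_pct - base_pct) / base_pct


humaneval = pct(164, 164)         # counts quoted in the post
swebench = pct(91, 300)           # counts quoted in the post
gain = relative_gain(30.3, 18.3)  # rounded percentages quoted in the post

print(f"HumanEval pass@5: {humaneval:.1f}%")
print(f"SWE-bench: {swebench:.1f}%")
print(f"Relative gain over baseline: {gain:.1f}%")
```

The absolute difference on SWE-bench is 12 percentage points; the larger headline figure is that difference divided by the baseline.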

Ships as a GitHub Action for PR verification, or try it: npx tryassay assess /path/to/your/project

Public repo (the URL above points to our private research repo): https://github.com/gtsbahamas/assay-verify

GitHub Action: uses: gtsbahamas/assay-verify/github-action@main

Paper: https://doi.org/10.5281/zenodo.18522644

Mark Zuckerberg to testify in landmark trial alleging that social media harms

https://www.cbc.ca/news/business/mark-zuckerberg-testify-landmark-social-media-addiction-trial-9....
1•1vuio0pswjnm7•48s ago•0 comments

Show HN: What your income looks like in 50 other countries

https://otherlives.attentionworth.com/
1•withshakespeare•1m ago•0 comments

I built a tool to benchmark my AI agent's API costs

https://local001.com/tokens
2•sampleSal•1m ago•1 comment

Molt Quest – A Virtual Economy Where AI Agents Complete Quests and Earn Points

https://moltquest.ai
1•lr001328•2m ago•1 comment

Show HN: Polyfolio – A Visual Dashboard for Your Polymarket Positions

https://azariak.github.io/Polyfolio/
1•AzariaK•2m ago•0 comments

The 'boomcession': Why Americans feel left behind by a growing economy

https://www.cnbc.com/2026/02/18/boomcession-econonomy-gdp-recession-consumer-sentiment.html
1•KittenInABox•2m ago•0 comments

Thin Is In

https://stratechery.com/2026/thin-is-in/
3•chrisseldo•4m ago•0 comments

Pocketbase lost its funding from FLOSS fund

https://github.com/pocketbase/pocketbase/discussions/7287
1•Onavo•4m ago•0 comments

Show HN: KafClaw – OpenClaw agents on Kafka. Pi-ready, Go, observable groups

https://github.com/KafClaw/KafClaw
1•2pk03•5m ago•0 comments

Flickzeug: a Rust crate for applying messy real-world patches

https://prefix.dev/blog/flickzeug-because-patching-source-code-is-hard
2•droelf•5m ago•0 comments

Why AI Velocity Is Becoming a Debt Accelerator

https://martinfowler.com/fragments/2026-02-18.html
2•nthypes•6m ago•0 comments

AI coding assistance is not giving me identity fracture

https://twitter.com/esrtweet/status/2023978360351682848
1•tosh•6m ago•0 comments

Show HN: Atom – Safer Version of OpenClaw with Episodic Memory

https://github.com/rush86999/atom
1•rush86999•7m ago•0 comments

The Only Moat Left Is Money

https://elliotbonneville.com/the-only-moat-left-is-money/
2•elliotbnvl•7m ago•0 comments

Self-Hosted LLM Upgrade on AMD: Kimi Linear 48B, Qwen3 Coder Next, and Q2_K_XL

https://site.bhamm-lab.com/blogs/upgrade-models-feb26/
1•bhamm-lab•8m ago•1 comment

Papa Johns Michelin Star?

https://ir.papajohns.com/news-events/news-releases/detail/651/papa-johns-makes-a-bold-run-to-beco...
1•bmiekre•8m ago•1 comment

Epstein Files Explorer

https://Epsteinalysis.com/
1•birdculture•10m ago•0 comments

Should managers become hands-on again?

https://newsletter.terminalprompt.com/p/should-managers-become-hands-on-again
1•joaoqalves•10m ago•0 comments

Meta's Zuckerberg faces questioning at youth addiction trial

https://www.reuters.com/sustainability/society-equity/metas-zuckerberg-faces-questioning-youth-ad...
2•1vuio0pswjnm7•10m ago•0 comments

Swish: Using Claude Code to Create a Lisp with Swift

https://www.youtube.com/playlist?list=PLgZNfD3JAd4_2JeJQaFaOwuXV3Z5OX-SB
2•rschmidt•10m ago•0 comments

FreeBSD's KDE Desktop Install Option Ready for Testing

https://www.phoronix.com/news/FreeBSD-Desktop-Option-Testing
1•voxadam•10m ago•0 comments

Why Debate Is the Most Important Skill in the Age of AI [video]

https://www.youtube.com/watch?v=dZHfsaTJfhE
1•TheAntiEgo•11m ago•1 comment

The AI Doc

https://www.focusfeatures.com/the-ai-doc-or-how-i-became-an-apocaloptimist
1•grodriguez100•11m ago•0 comments

Somebody made astrology signs for AI agents

https://twitter.com/lastdotnet/status/2024144193459728864
2•androolloyd•12m ago•0 comments

How a Social Media Addiction Trial Threatens Big Tech

https://www.bloomberg.com/news/articles/2026-02-18/social-media-addiction-trial-what-it-means-for...
1•1vuio0pswjnm7•12m ago•0 comments

Lyria 3

https://deepmind.google/models/lyria/
5•meetpateltech•12m ago•0 comments

Vinyl Cache has left GitHub

https://vinyl-cache.org/organization/moving.html
2•birdculture•12m ago•0 comments

Gemini can now create music

https://blog.google/innovation-and-ai/products/gemini-app/lyria-3/
3•meetpateltech•13m ago•0 comments

Sparkling – The Lynx-based cross-platform infrastructure behind TikTok

https://tiktok.github.io/sparkling/
1•slorber•14m ago•0 comments

Data Centers Are Behaving Like Acoustic Weapons [video]

https://www.youtube.com/watch?v=_bP80DEAbuo
1•arunabha•16m ago•0 comments