The starting point was a finding that surprised me: when we tried training verification directly into models using RLVF (Reinforcement Learning from Verification Feedback), more training data made the model worse. 120 curated pairs hit 91.5% accuracy. 2,000 pairs collapsed to 77.4%. The model's training loss kept decreasing while eval performance cratered. This isn't a tuning problem. Verification cannot be internalized.
So we built an external layer. Assay extracts the implicit claims code makes ("this handles null input," "this query is injection-safe," "this validates auth tokens") and verifies each one against the actual implementation. It's not a linter, not another LLM-as-judge — it's structured claim extraction followed by adversarial verification.
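To make "structured claim extraction followed by adversarial verification" concrete, here is a minimal sketch of that shape in TypeScript. Every name and type below (Claim, Verdict, extractClaims, verifyClaim, assess) is an illustrative assumption, not Assay's actual API.

```typescript
// Sketch only: shapes and names are assumptions, not Assay's real interface.
interface Claim {
  statement: string;   // e.g. "this handles null input"
  location: string;    // file or function the claim refers to
}

interface Verdict {
  claim: Claim;
  verified: boolean;
  evidence: string;    // supporting trace or counterexample
}

// Placeholder extractor: real extraction would pull implicit guarantees
// out of code, comments, and guard clauses.
async function extractClaims(source: string): Promise<Claim[]> {
  return [];
}

// Placeholder verifier: real verification would adversarially test the
// claim against the implementation instead of trusting it.
async function verifyClaim(claim: Claim, source: string): Promise<Verdict> {
  return { claim, verified: false, evidence: "not implemented" };
}

export async function assess(source: string): Promise<Verdict[]> {
  const claims = await extractClaims(source);
  return Promise.all(claims.map((c) => verifyClaim(c, source)));
}
```

The point of the two-step split is that extraction and verification stay decoupled: the verifier never takes the extracted claim at face value.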
Results validated against real test suites (not LLM judgment):

- HumanEval: 100% pass@5 (164/164) vs. an 86.6% baseline
- SWE-bench: 30.3% (91/300) vs. 18.3% baseline, a +65.5% relative improvement
- LVR pilot: 354 claims verified and 23 real bugs found (2 critical) in a production ERP system
- LLM-as-judge actually regresses at k=5 (97.2% vs. our 100%) because it hallucinates false positives
Ships as a GitHub Action for PR verification, or try it locally: `npx tryassay assess /path/to/your/project`
Public repo (the link above goes to our private research repo; use this one): https://github.com/gtsbahamas/assay-verify
GitHub Action: `uses: gtsbahamas/assay-verify/github-action@main`
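For context, a minimal workflow that wires the action into a PR check might look like the sketch below. Only the `uses:` line comes from this post; the trigger, job name, and checkout step are assumptions, and action inputs are omitted because they aren't documented here.

```yaml
# Hypothetical minimal workflow; only the final `uses:` line is from the post.
name: Assay PR verification
on: pull_request
jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: gtsbahamas/assay-verify/github-action@main
```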