
The Verification Paradox: AI makes coding faster, but organizations slower

https://zenodo.org/records/18737908
2•frannyPS•2h ago

Comments

frannyPS•2h ago
Hi HN, I'm the author. I wrote this working paper because I was incredibly frustrated by a disconnect we are all seeing in the industry: developers report feeling 20% more productive with AI, while measured performance actually declines. Organizations see individual output rise while delivery velocity shows no improvement.

I realized our current theory lacks the vocabulary to diagnose why this is happening. The paper proposes a two-axis model (Specification vs. Verification) yielding four categories of software behavior: S_v, S_u, E_v, and E_u.
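
To make the four categories concrete, here is a minimal sketch of the two-axis classification. It is an illustration, not code from the paper. It assumes that S stands for "specified" and E for "unspecified" (the thread only spells out E_v as unspecified-but-verified) and that the v/u subscripts mean verified/unverified. The names `Quadrant` and `classify` are hypothetical.

```python
from enum import Enum


class Quadrant(Enum):
    # Labels follow the thread's notation; the S/E and v/u readings are assumptions.
    S_V = "S_v"  # specified and verified
    S_U = "S_u"  # specified but unverified
    E_V = "E_v"  # unspecified but verified (the "E_v trap")
    E_U = "E_u"  # unspecified and unverified


def classify(specified: bool, verified: bool) -> Quadrant:
    """Map one behavior's position on the two axes to a quadrant."""
    if specified:
        return Quadrant.S_V if verified else Quadrant.S_U
    return Quadrant.E_V if verified else Quadrant.E_U


# A behavior that has tests but no deliberate specification behind it:
print(classify(specified=False, verified=True).value)  # E_v
```

The point of keeping the two flags separate is that "has a test" and "someone decided this is what we want" are independent questions.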

The most dangerous trap right now is what I call E_v (unspecified-but-verified). AI generates implementation and tests at machine speed. This creates a "verification paradox": organizations accumulate E_v behaviors (tested without a specification), which creates the appearance of quality (green CI, high coverage). However, there is no basis for judging whether those tested behaviors are correct, because no deliberate human decision (a specification) was ever made. AI accelerates Axis 1 (does it do what the spec says?) while leaving Axis 2 (does the spec capture what is valuable?) untouched.
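
As a rough illustration of how green CI can hide the E_v share, the sketch below compares the share of behaviors that are verified with the share that are both specified and verified. Everything in it is an assumption made for illustration: the `Behavior` record, the `coverage_report` helper, and the sample behaviors are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Behavior:
    name: str
    specified: bool  # a human deliberately decided this behavior is wanted
    verified: bool   # a passing test exercises this behavior


def coverage_report(behaviors: list[Behavior]) -> dict[str, float]:
    """Compare apparent quality (verified) with spec-backed quality (S_v)."""
    total = len(behaviors)
    if total == 0:
        return {"verified": 0.0, "s_v": 0.0, "e_v": 0.0}
    verified = sum(1 for b in behaviors if b.verified)
    s_v = sum(1 for b in behaviors if b.verified and b.specified)
    return {
        "verified": verified / total,     # what a coverage dashboard shows
        "s_v": s_v / total,               # verified against a real decision
        "e_v": (verified - s_v) / total,  # verified with nothing to judge against
    }


# Made-up sample: every behavior has a test, only one has a specification.
sample = [
    Behavior("rejects expired token", specified=True, verified=True),
    Behavior("retries three times on timeout", specified=False, verified=True),
    Behavior("rounds totals down", specified=False, verified=True),
    Behavior("silently drops empty rows", specified=False, verified=True),
]
print(coverage_report(sample))
# {'verified': 1.0, 's_v': 0.25, 'e_v': 0.75}
```

In this made-up sample the dashboard reads 100% verified while only a quarter of the behaviors were ever deliberately specified.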

I originally submitted this to IEEE Software. It was desk-rejected for being a "conceptual model" rather than a localized case study with metrics. But I wrote this for practitioners, to give us a structural vocabulary for the AI-era mess we are currently in.

I'd love to hear if this E_v trap and the verification paradox resonate with what you're seeing in your own AI-assisted workflows.

frannyPS•2h ago
As a follow-up: if the four-quadrant model (S_v, S_u, E_v, E_u) feels a bit too abstract, I wrote a companion piece applying this exact framework to a recent, massive Amazon outage.

It shows in practical terms how the accumulation of unverified behaviors (or of verification without deliberate specification) plays out at hyperscale, and how the "verification paradox" turns localized failures into a cascading system collapse.

You can read the case study here: https://doi.org/10.5281/zenodo.18980467

I think looking at real-world post-mortems through this lens makes the danger of the AI-driven "E_v trap" much more tangible.

Gitzy is now on TestFlight: A modern, native iOS Git client

https://testflight.apple.com/join/SB16NCfr
1•marc0janssen•36s ago•1 comments

Another DOGE staffer explaining how he flagged grants at NEH for "DEI"

https://bsky.app/profile/404media.co/post/3mgupw4v3ak2j
2•doener•1m ago•0 comments

Elfina – A multi-architecture ELF loader supporting x86 and x86-64 binaries

https://github.com/iss4cf0ng/Elfina
1•iss4cf0ng•3m ago•0 comments

The future of AI is on-prem

https://www.palantir.com/sovereignaios/
2•taubek•3m ago•0 comments

Show HN: Run Hugging Face models with a single command

https://www.llmpm.co/
2•dataversity•4m ago•0 comments

Claude now creates interactive charts, diagrams and visualizations

https://claude.com/blog/claude-builds-visuals
2•adocomplete•4m ago•0 comments

Analysis of 203M Trades on Kalshi

https://read.technically.dev/p/whats-a-prediction-market
3•sschnei8•6m ago•1 comments

Jeriko – an AI agent that runs directly inside your OS

https://www.jeriko.ai/
1•Khaleel7337•6m ago•2 comments

Software Proprioception – Unsung

https://unsung.aresluna.org/software-proprioception/
1•tambourine_man•6m ago•0 comments

Ask HN: Gemini Pro Plan Quota Reductions

1•earlyriser•7m ago•0 comments

Goldman banker: Clients 'glad' for 'distraction' of Iran war

https://www.telegraph.co.uk/business/2026/03/11/goldman-banker-clients-glad-for-distraction-of-ir...
2•abdelhousni•7m ago•1 comments

Punctum books is an independent open-access publisher

https://punctumbooks.com/
1•robtherobber•8m ago•0 comments

Shopify.com Is Down

https://www.shopify.com/
3•hankmander•9m ago•0 comments

Pirates of Silicon Valley

https://archive.org/details/piratesofsiliconvalley_201908
2•baal80spam•9m ago•0 comments

The Sound of AI Music

https://hackerfactor.com/blog/index.php?/archives/1090-The-Sound-of-AI-Music.html
1•speckx•11m ago•0 comments

Silicon Valley's New Obsession: Watching Bots Do Their Grunt Work

https://www.wsj.com/tech/ai/ai-bots-claude-openclaw-285ac816
2•stefap2•11m ago•0 comments

25 Years of ADSL Speed

https://brainbaking.com/post/2026/03/25-years-of-adsl-speed/
2•Brajeshwar•11m ago•0 comments

Duolingo Is Talking to ByteDance: Cracking the Pangle SDK's Encryption

https://www.buchodi.com/your-duolingo-is-talking-to-bytedance-cracking-the-pangle-sdks-encryption/
1•ibobev•12m ago•0 comments

What CI looks like at a 100-person team (PostHog)

https://www.mendral.com/blog/ci-at-scale
2•shad42•13m ago•0 comments

In Criminal Cases, Moss Is Often Underfoot and Overlooked

https://www.nytimes.com/2026/03/12/science/moss-forensics-crime.html
1•ynac•13m ago•1 comments

Show HN: CloudCLI – Web/Mobile UI for Claude Code, Codex and Gemini (8.2k stars)

https://github.com/siteboon/claudecodeui
1•simosmik•13m ago•0 comments

Log Reducer – Cut 50-90% of tokens when your AI debugs logs (MCP tool and CLI)

https://github.com/launch-it-labs/log-reducer
1•imaniman•13m ago•0 comments

Dolphin PR: Add policy on LLM contributions

https://github.com/dolphin-emu/dolphin/pull/14445
2•flykespice•14m ago•0 comments

Show HN: We built an open source tool to see how AI cites our business

https://github.com/AINYC/canonry
1•arberx•14m ago•0 comments

Show HN: Reel Rogue Update – The Invisible Feeling

https://alt-qq.com/
1•qq-niklas•15m ago•0 comments

Show HN: I made clawfeeds, feeds for agents

https://clawfeeds.com
1•petervandijck•16m ago•1 comments

New model aims to keep remote robotaxi operators alert and ready

https://techxplore.com/news/2026-03-aims-remote-robotaxi-ready.html
1•Brajeshwar•16m ago•0 comments

Dreaming of a Ten-Year Computer

https://alexwlchan.net/2026/ten-year-computer/
1•wrxd•17m ago•0 comments

Show HN: I calculated sun/shade exposure for every seat at World Cup stadiums

https://seatsun.com/
1•dkaragas•17m ago•0 comments

Teens Are Falling Out of Love with Tech

https://www.nytimes.com/2026/03/11/opinion/teens-tech-skeptics.html
5•cdrnsf•17m ago•1 comments