Verification, the Key to AI (2001)

http://incompleteideas.net/IncIdeas/KeytoAI.html
34•anjneymidha•14h ago

Comments

a3w•10h ago
Nice. LLMs can barely prove anything beyond citing some sources or doing pure math that already circulates. AFAICT, no novel ideas have been proven so far, i.e. the "these systems never invented anything" paradox has held for three years now.

Symbolic AI seems to prove everything it states, but never novel ideas, either.

Let's see if we get neurosymbolic AI that can do something both could not do on their own — I doubt it, AI might just be a doom cult after all.

tasuki•8h ago
You can use an external proving mechanism and feed the results to the LLM.

A sufficiently rich type system (think Idris rather than C) or a sufficiently powerful test suite (eg property-based tests) should do the trick.
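
To make the property-based idea concrete, here is a minimal sketch in Python that uses only the standard library. The names `find_counterexample` and `buggy_sort` are hypothetical, and the choice of "sorting" as the task is an assumption for illustration; a real setup would more likely use a dedicated property-testing library. The checker never looks at how the candidate works, only at whether its output satisfies two properties.

```python
import random
from collections import Counter
from typing import Callable, List, Optional


def find_counterexample(
    candidate: Callable[[List[int]], List[int]],
    trials: int = 200,
    seed: int = 0,
) -> Optional[List[int]]:
    """Return an input on which `candidate` violates the sort properties, or None.

    Hypothetical helper: it checks two properties on random inputs,
    (1) the output is in non-decreasing order and
    (2) the output is a permutation of the input.
    """
    rng = random.Random(seed)  # fixed seed so failures are reproducible
    for _ in range(trials):
        xs = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        out = candidate(list(xs))  # pass a copy so the candidate cannot mutate xs
        ordered = all(a <= b for a, b in zip(out, out[1:]))
        same_items = Counter(out) == Counter(xs)
        if not (ordered and same_items):
            return xs
    return None


def buggy_sort(xs: List[int]) -> List[int]:
    """Deliberately wrong candidate: it silently drops duplicates."""
    return sorted(set(xs))
```

A counterexample returned by `find_counterexample(buggy_sort)` could be fed back to the model as the verification signal; `find_counterexample(sorted)` finds nothing to report.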

jrvarela56•9h ago
This applies to coding agents. If the agent can't run the code, it's unlikely that it can produce working code. Add to running: linting, running tests, compiling, code review and any other tool/process humans use to check if software is 'good' or working.

If the agent can apply these processes to the output, then we're on our way to getting a good chunk of our work done for us. Even from the product pov, if the agent is allowed to experiment by making deployments and check user-facing metrics, it eventually could build a software product - but we should still solve the coding part as it seems easier to objectively verify quickly.
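
One way to picture the checks described above is a small gate that runs each verification command and collects the failures for the agent to read. This is only a sketch under assumptions: `run_checks` and `example_checks` are hypothetical names, and the commands are stand-ins that invoke the current Python interpreter rather than a real linter, compiler, or test runner.

```python
import subprocess
import sys
from typing import List, Sequence, Tuple

# A check is a label plus the argv of the command to run (hypothetical shape).
Check = Tuple[str, Sequence[str]]


def run_checks(checks: Sequence[Check], timeout: float = 60.0) -> List[Tuple[str, str]]:
    """Run every check and return (name, output) for each one that failed."""
    failures: List[Tuple[str, str]] = []
    for name, argv in checks:
        try:
            proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            failures.append((name, "timed out"))
            continue
        if proc.returncode != 0:
            # Keep the tool's own output: it is the feedback the agent needs.
            failures.append((name, (proc.stdout + proc.stderr).strip()))
    return failures


# Stand-in commands; in practice these would be the project's lint/test/build steps.
example_checks: List[Check] = [
    ("passes", [sys.executable, "-c", "print('ok')"]),
    ("fails", [sys.executable, "-c", "import sys; print('boom'); sys.exit(1)"]),
]
```

An empty result from `run_checks` would mean the output cleared every gate; a non-empty one gives the agent concrete failure text to act on in its next attempt.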

jgalt212•7h ago
You're right, but actually running the code can be destructive (even when run as intended). You really need to be careful about dev environments. Even the destructive operations will cost you time (and money) in resetting the dev environment.

jrvarela56•4h ago
Agreed and I think this highlights the importance of interactivity/snappiness as well as idempotency. This is needed for a human to play around with also.

If the agent has a fast+safe feedback loop to experiment then it can go through more cycles, faster, and improve its output.
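
As a rough illustration of a safe loop, the sketch below runs each experiment against a throwaway copy of a workspace directory, so a destructive step never touches the original and no reset is needed. `run_in_scratch_copy` is a hypothetical helper, and it assumes the only state that matters lives in files under one directory; databases, network calls, and deployments would need their own isolation.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def run_in_scratch_copy(workspace: Path, action: Callable[[Path], T]) -> T:
    """Copy `workspace` to a temporary directory, run `action` there, then discard the copy.

    Hypothetical helper: repeating the call always starts from the same
    original state, which is what makes the experiment repeatable.
    """
    with tempfile.TemporaryDirectory() as tmp:
        scratch = Path(tmp) / "work"
        shutil.copytree(workspace, scratch)
        return action(scratch)
```

Copying is the simplest option but not the fastest; for large workspaces a snapshotting filesystem or containers would be the more usual choice.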

jbellis•7h ago
Wow, 2001. Legitimately prescient.

And verification ("evaluation" we call it now) really is the key, although most people working on "AI apps" haven't figured it out yet.

Follow Hamel to catch up on the state of the art: https://x.com/HamelHusain

ALICE detects the conversion of lead into gold at the LHC

https://www.home.cern/news/news/physics/alice-detects-conversion-lead-gold-lhc
311•miiiiiike•4h ago•182 comments

In the Network of the Conclave: How we "guessed" the Pope using network science

https://www.unibocconi.it/en/news/network-conclave
52•taubek•1h ago•26 comments

Launch HN: Nao Labs (YC X25) – Cursor for Data

59•ClaireGz•2h ago•26 comments

Past, present, and future of Sorbet type syntax

https://blog.jez.io/history-of-sorbet-syntax/
67•PaulHoule•3h ago•26 comments

Sofie: open-source web based system for automating live TV news production

https://nrkno.github.io/sofie-core/
195•rjmunro•5h ago•27 comments

21 GB/s CSV Parsing Using SIMD on AMD 9950X

https://nietras.com/2025/05/09/sep-0-10-0/
181•zigzag312•5h ago•76 comments

New Tool: lsds – List All Linux Block Devices and Settings in One Place

https://tanelpoder.com/posts/lsds-list-linux-block-devices-and-their-config/
16•mfiguiere•1h ago•0 comments

Inventing the Adventure Game

http://www.warrenrobinett.com/inventing_adventure/
7•CaesarA•41m ago•0 comments

Itter.sh – Micro-Blogging via Terminal

https://www.itter.sh/
119•rrr_oh_man•5h ago•36 comments

Show HN: A backend agnostic Ruby framework for building reactive desktop apps

https://codeberg.org/skinnyjames/hokusai
41•zero-st4rs•3h ago•17 comments

Rollstack (YC W23) Is Hiring TypeScript Engineers (Remote US/CA)

https://www.ycombinator.com/companies/rollstack-2/jobs/QPqpb1n-software-engineer-typescript-us-canada
1•yjallouli•2h ago

Show HN: Oliphaunt – A native Mastodon client for macOS

https://testflight.apple.com/join/Epq1P3Cw
39•anosidium•2h ago•13 comments

Show HN: BlenderQ – A TUI for managing multiple Blender renders

https://github.com/KyleTryon/BlenderQ
32•TechSquidTV•3h ago•4 comments

LegoGPT: Generating Physically Stable and Buildable Lego

https://avalovelace1.github.io/LegoGPT/
511•nkko•14h ago•132 comments

Show HN: Hyvector – A fast and modern SVG editor

https://www.hyvector.com
203•jansan•8h ago•44 comments

Cell Mates: Extracting Useful Information from Tables for LLMs

https://www.gojiberries.io/cell-mates-extracting-useful-information-from-tables-for-llms/
13•goji_berries•2d ago•1 comments

Show HN: Aberdeen – An elegant approach to reactive UIs

https://aberdeenjs.org/
141•vanviegen•6h ago•77 comments

CryptPad: An Alternative to the Google Suite

https://cryptpad.org/
102•ColinWright•7h ago•30 comments

Show HN: Hydra (YC W22) – Serverless Analytics on Postgres

https://www.hydra.so/
28•coatue•3h ago•13 comments

The Anarchitecture Group

https://www.spatialagency.net/database/the.anarchitecture.group
19•jruohonen•2h ago•2 comments

Data manipulations alleged in study that paved way for Microsoft's quantum chip

https://www.science.org/content/article/data-manipulations-alleged-study-paved-way-microsoft-s-quantum-chip
157•EvgeniyZh•7h ago•112 comments

The birth of AI poker? Letters from the 1984 WSOP

https://www.poker.org/latest-news/the-birth-of-ai-poker-letters-from-the-1984-wsop-a4v2W4N4X3EP/
32•indigodaddy•4d ago•5 comments

NSF faces shake-up as officials abolish its 37 divisions

https://www.science.org/content/article/exclusive-nsf-faces-radical-shake-officials-abolish-its-37-divisions
322•magicalist•7h ago•418 comments

Implementing a Struct of Arrays

https://brevzin.github.io/c++/2025/05/02/soa/
96•mpweiher•8h ago•33 comments

Former Supreme Court justice David Souter has died

https://www.npr.org/2025/05/09/g-s1-65326/justice-david-souter-dies
69•danso•4h ago•28 comments

A Taxonomy for Rendering Engines

https://c0de517e.com/021_taxonomy.htm
35•ibobev•3d ago•12 comments

The CL1: the first code deployable biological computer

https://corticallabs.com/cl1.html
27•sprawl_•2d ago•11 comments

Show HN: Agents.erl (AI Agents in Erlang)

https://github.com/arthurcolle/agents.erl
22•arthurcolle•2d ago•10 comments

Hollow Core Fiber (HCF)

https://www.holightoptic.com/what-is-hollow-core-fiber-hcf%ef%bc%9f/
43•giuliomagnifico•5h ago•20 comments

The Linux Kernel's PGP Web of Trust

https://blog.kleine-koenig.org/ukl/the-linux-kernels-pgp-web-of-trust.html
66•JNRowe•8h ago•11 comments