Verification, the Key to AI (2001)

http://incompleteideas.net/IncIdeas/KeytoAI.html

34•anjneymidha•14h ago

Comments

a3w•10h ago

Nice. LLMs can barely prove anything beyond citing some sources or redoing pure math that already circulates. AFAICT, no novel ideas have been proven so far, i.e. the "these systems never invented anything" paradox has held for three years now.

Symbolic AI seems to prove everything it states, but never novel ideas, either.

Let's see if we get neurosymbolic AI that can do something neither could do on its own. I doubt it; AI might just be a doom cult after all.

tasuki•8h ago

You can use an external proving mechanism and feed the results to the LLM.

A sufficiently rich type system (think Idris rather than C) or a sufficiently powerful test suite (e.g. property-based tests) should do the trick.
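To make the property-based idea concrete, here is a minimal sketch using only the Python standard library. It assumes the LLM was asked for a sorting function; `good_sort` and `buggy_sort` are hypothetical stand-ins for model output, and `find_counterexample` is an illustrative helper, not part of any real library. A dedicated tool such as Hypothesis would generate and shrink inputs far better than this.

```python
import random
from collections import Counter


def is_valid_sort(inp, out):
    """Property: the output is ordered and is a permutation of the input."""
    ordered = all(a <= b for a, b in zip(out, out[1:]))
    return ordered and Counter(inp) == Counter(out)


def find_counterexample(candidate, trials=200, seed=0):
    """Return an input on which the candidate fails, or None if none is found."""
    rng = random.Random(seed)  # fixed seed keeps the check reproducible
    for _ in range(trials):
        data = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        try:
            if not is_valid_sort(data, candidate(list(data))):
                return data
        except Exception:
            # A crash (or a non-list result) counts as a failure too.
            return data
    return None


def good_sort(xs):
    # Hypothetical correct candidate.
    return sorted(xs)


def buggy_sort(xs):
    # Hypothetical faulty candidate: silently drops duplicates.
    return sorted(set(xs))


if __name__ == "__main__":
    print(find_counterexample(good_sort))   # None when no failure is found
    print(find_counterexample(buggy_sort))  # some list containing a duplicate
```

The counterexample, rather than a bare pass/fail, is what you would feed back to the LLM for the next attempt.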

jrvarela56•9h ago

This applies to coding agents. If the agent can't run the code, it's unlikely to produce working code. Beyond running it, add linting, running tests, compiling, code review and any other tool/process humans use to check whether software is 'good' or working.

If the agent can apply these processes to its output, then we're on our way to getting a good chunk of our work done for us. Even from the product pov, if the agent is allowed to experiment by making deployments and checking user-facing metrics, it could eventually build a software product - but we should still solve the coding part first, as it seems easier to verify objectively and quickly.
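As a rough illustration of "apply these processes to the output", here is a sketch of a verification gate covering two of the checks listed: a compile step and a test run. The `verify` function and the file names `candidate.py` and `test_candidate.py` are made up for this example; a real agent harness would add linting, review and proper sandboxing.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def verify(source: str, test_source: str, timeout: float = 10.0):
    """Return (ok, report) for a piece of generated code and its tests."""
    # Step 1: cheap static check. Does the code even parse?
    try:
        compile(source, "candidate.py", "exec")
    except SyntaxError as exc:
        return False, f"syntax error: {exc}"

    # Step 2: run the tests in a separate process, in a throwaway directory.
    with tempfile.TemporaryDirectory() as workdir:
        Path(workdir, "candidate.py").write_text(source)
        Path(workdir, "test_candidate.py").write_text(test_source)
        try:
            proc = subprocess.run(
                [sys.executable, "test_candidate.py"],
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False, "timed out"
    return proc.returncode == 0, proc.stdout + proc.stderr


if __name__ == "__main__":
    tests = "from candidate import add\nassert add(2, 3) == 5\n"
    print(verify("def add(a, b):\n    return a + b\n", tests))
    print(verify("def add(a, b):\n    return a - b\n", tests))
```

The report string is the part that matters for an agent: it is the signal the next generation attempt can condition on.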

jgalt212•7h ago

You're right, but actually running the code can be destructive (even when run as intended). You really need to be careful about dev environments. Even the destructive operations will cost you time (and money) in resetting the dev environment.

jrvarela56•4h ago

Agreed, and I think this highlights the importance of interactivity/snappiness as well as idempotency. A human needs these to play around, too.

If the agent has a fast+safe feedback loop to experiment in, then it can go through more cycles, faster, and improve its output.
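One cheap way to get that kind of loop, sketched under the assumption that the dev environment is just a directory tree: run every experiment against a fresh copy of a known-good baseline, so destructive steps never require a reset. `run_in_scratch_copy` is a hypothetical helper, and it only protects the filesystem, not databases or external services.

```python
import shutil
import tempfile
from pathlib import Path


def run_in_scratch_copy(baseline: Path, action):
    """Run action(scratch_dir) on a throwaway copy of baseline.

    The baseline is never modified, so repeated runs start from the same
    state and the copy is deleted afterwards.
    """
    with tempfile.TemporaryDirectory() as tmp:
        scratch = Path(tmp) / "work"
        shutil.copytree(baseline, scratch)
        return action(scratch)


def destructive_step(workdir: Path):
    # Hypothetical experiment: deletes every file, then reports what is left.
    for path in workdir.iterdir():
        if path.is_file():
            path.unlink()
    return sorted(p.name for p in workdir.iterdir())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        baseline = Path(base)
        (baseline / "data.txt").write_text("important")
        print(run_in_scratch_copy(baseline, destructive_step))
        print((baseline / "data.txt").exists())
```

Copying is slow for large trees; snapshots or containers would serve the same purpose at scale.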

jbellis•7h ago

Wow, 2001. Legitimately prescient.

And verification ("evaluation" we call it now) really is the key, although most people working on "AI apps" haven't figured it out yet.

Follow Hamel to catch up on the state of the art: https://x.com/HamelHusain
