Show HN: Auto-Architecture: Karpathy's Loop, pointed at a CPU

https://github.com/FeSens/auto-arch-tournament/blob/main/docs/auto-arch-tournament-blog-post.md

37•fesens•11h ago

Comments

sho_hn•1h ago

Salient point on the value of the verifier. Matches my experience over the last two quarters.

Nice detail on the encountered failures. Very similar experiences with my own loops against test suites.

Great post. A snapshot in time.

pteetor•1h ago

In case you are unfamiliar with Karpathy's Loop[1], it is a genetic algorithm[2] where the genetic "mutations" are clever-but-random ideas generated by an LLM agent, aimed at improving a system.

  (1) Let the LLM randomly perturb the system.
  (2) Measure the system's performance.
  (3a) If the perturbation improved performance, keep the change.
  (3b) Otherwise, don't.
  (4) Repeat

[1] https://github.com/karpathy/autoresearch

[2] https://en.wikipedia.org/wiki/Genetic_algorithm
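A minimal Python sketch of the keep-if-better loop above. The names `improvement_loop`, `propose`, `passes_verifier`, and `score` are hypothetical: `propose` stands in for the LLM agent, `passes_verifier` for the correctness gate (test suite, formal checks) that the comments stress, and `score` for the performance measurement. This is not code from either linked repository. Because it keeps a single candidate and accepts only strict improvements, it is closer to greedy hill climbing than to a population-based genetic algorithm.

```python
import random
from typing import Callable, TypeVar

T = TypeVar("T")


def improvement_loop(
    initial: T,
    propose: Callable[[T], T],             # hypothetical stand-in for the LLM agent's edit
    passes_verifier: Callable[[T], bool],  # hypothetical correctness gate (tests, checks)
    score: Callable[[T], float],           # hypothetical performance metric, higher is better
    iterations: int,
) -> tuple[T, float]:
    """Keep-if-better loop: perturb, verify, measure, keep or discard, repeat."""
    best = initial
    best_score = score(best)
    for _ in range(iterations):
        candidate = propose(best)            # (1) perturb the current best system
        if not passes_verifier(candidate):   # broken candidates are never measured
            continue
        candidate_score = score(candidate)   # (2) measure performance
        if candidate_score > best_score:     # (3a) strict improvement: keep it
            best, best_score = candidate, candidate_score
        # (3b) otherwise the candidate is discarded and the previous best stays
    return best, best_score                  # (4) stop after a fixed budget


if __name__ == "__main__":
    # Toy usage: the "system" is an integer, the proposer nudges it at random,
    # the verifier forbids negative values, and the score peaks at 42.
    rng = random.Random(0)
    result, result_score = improvement_loop(
        initial=0,
        propose=lambda x: x + rng.choice([-3, -1, 1, 3]),
        passes_verifier=lambda x: x >= 0,
        score=lambda x: -abs(x - 42),
        iterations=500,
    )
    print(result, result_score)
```

Checking the verifier before scoring means a candidate that breaks correctness can never "win" on the metric, which is the failure mode the thread keeps coming back to.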

fc417fc802•1h ago

Extremely interesting, but I don't understand why it was written by an LLM. Either the frontier models are far better than I realized, or writing this document required a lot of manual work regardless, in which case why not keep it in your own voice?

> The agent did not know that would also halve the LUT count. It found out by doing it and watching the synthesizer.

So I guess this is an example of an LLM anthropomorphizing and making wild conjectures about the internal workings of a different LLM.

thin_carapace•49m ago

> "If you can write the rules down, an agent will satisfy them faster than your team will."

a fantastic opportunity to become the next next big thing and write a verifier verifier.

at the hypothesized inflexion point where AI instantly performs exactly as commanded, what happens to heavily regulated industries like medicine? do we get huge leaps and bounds everywhere EXCEPT where it matters, or is regulation going to be handed over to a verifier verifier?

outside1234•40m ago

Has anyone actually written a verifier for a business / project?

sho_hn•36m ago

I'd say "a verifier" here is a loose term. A great test suite is a verifier. I've done reverse-engineering projects that involved generating trace logs from the object under test, having a reimplementation emit the same logs, and running strict comparisons between the two.

OP's post is basically pointing out what many others have surely discovered independently: your agent-based dev operation is only as good as the test rituals and guardrails you give the agents.
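A minimal sketch of the strict trace comparison described above, assuming each trace is a sequence of text lines (one event per line) emitted by the original system and by the reimplementation. `first_divergence` is a hypothetical helper, and the trace-line format in the usage example is invented for illustration.

```python
from itertools import zip_longest
from typing import Iterable, Optional


def first_divergence(
    reference: Iterable[str], candidate: Iterable[str]
) -> Optional[tuple[int, Optional[str], Optional[str]]]:
    """Return (line_index, expected, actual) for the first mismatch, or None if identical.

    The comparison is exact: no whitespace normalization, no tolerance.
    If one trace is shorter, the missing side is reported as None.
    """
    for i, (expected, actual) in enumerate(zip_longest(reference, candidate)):
        if expected != actual:
            return i, expected, actual
    return None


if __name__ == "__main__":
    # Hypothetical CPU-style trace lines: program counter plus one register write.
    reference = ["pc=0x00 x1=0x05", "pc=0x04 x2=0x0a", "pc=0x08 x3=0x0f"]
    candidate = ["pc=0x00 x1=0x05", "pc=0x04 x2=0x0b", "pc=0x08 x3=0x10"]
    print(first_divergence(reference, candidate))  # (1, 'pc=0x04 x2=0x0a', 'pc=0x04 x2=0x0b')
```

Reporting only the first mismatch is a deliberate choice: once two traces diverge, later lines usually differ as a consequence, so the first divergent line is the one worth showing to an agent.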

dataviz1000•7m ago

Can you explain your question a little more? The recursive agents will find the minimum needed to satisfy the deterministic termination condition, including cheating. In other words, the result will be literally correct yet wrong. I would go so far as to call it malicious compliance.

I have a recursive agent that finds trading strategies after recreating academic research and probing the model, drawing on its training on everything. It works really well, but I have to force it to write out every line and write a proof that no data from after the current wall-clock time entered the system. Even then, something as trivial as a timezone conversion that ignores daylight saving time lets it peek one hour into the future. These kinds of bugs are almost impossible to find. Now there needs to be another agent whose only purpose is to go through every line of code and explain why the timezone handling on that line is correct.
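A minimal sketch of a look-ahead guard for a backtest like the one described above, assuming every data point carries a timezone-aware timestamp of when it became available. `Bar`, `visible_bars`, and `assert_no_lookahead` are hypothetical names. The fixed EST/EDT offsets are there only to make the daylight-saving error explicit; real code would use `zoneinfo.ZoneInfo("America/New_York")`. Note that the guard cannot catch this bug by itself: when the wall-clock time is converted with the wrong offset, the guard trusts the wrong decision time and passes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Fixed US Eastern offsets, used only to make the DST error explicit.
EST = timezone(timedelta(hours=-5))  # standard time (winter)
EDT = timezone(timedelta(hours=-4))  # daylight saving time (summer)


@dataclass(frozen=True)
class Bar:
    timestamp: datetime  # timezone-aware time at which this data became available
    price: float


def visible_bars(bars: list[Bar], decision_time: datetime) -> list[Bar]:
    """Return only bars that were already available at decision_time (no look-ahead)."""
    if decision_time.tzinfo is None or any(b.timestamp.tzinfo is None for b in bars):
        raise ValueError("naive datetimes are ambiguous; use timezone-aware timestamps")
    return [b for b in bars if b.timestamp <= decision_time]


def assert_no_lookahead(used: list[Bar], decision_time: datetime) -> None:
    """Fail loudly if any bar used for a decision is stamped after the decision time."""
    future = [b for b in used if b.timestamp > decision_time]
    if future:
        raise AssertionError(f"{len(future)} bar(s) from the future, first: {future[0]}")


if __name__ == "__main__":
    bars = [
        Bar(datetime(2024, 7, 1, 19, 0, tzinfo=timezone.utc), 100.0),   # 15:00 New York (EDT)
        Bar(datetime(2024, 7, 1, 19, 30, tzinfo=timezone.utc), 101.0),  # 15:30 New York (EDT)
    ]
    wall_clock = datetime(2024, 7, 1, 15, 0)  # naive New York wall-clock time in July

    correct = wall_clock.replace(tzinfo=EDT)  # 19:00 UTC
    buggy = wall_clock.replace(tzinfo=EST)    # 20:00 UTC, one hour in the future

    print(len(visible_bars(bars, correct)))  # 1
    print(len(visible_bars(bars, buggy)))    # 2: the 15:30 bar leaks in
    assert_no_lookahead(visible_bars(bars, buggy), buggy)  # passes: the guard trusts the bad clock
```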
