This was my first time sitting in on one of these paper-club sessions and reading an LLM paper. I thought it was interesting that the AlphaEvolve agent in the paper essentially writes code, runs it, scores the output, and then writes better code, over and over. It's not discovering strategies through play. It's discovering algorithms through code mutation. The LLM proposes changes to how regret is accumulated or how policies are derived, a fitness function scores the result, and the best variants survive to the next generation.
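The skeleton of that loop is tiny once you strip away the LLM. Here's a minimal sketch where a random perturbation stands in for the LLM's proposal step and a toy quadratic stands in for the real regret/win-rate scoring; `evaluate`, `mutate`, and the parameter names are all hypothetical placeholders, not anything from the paper:

```python
import random

def evaluate(params):
    # Toy fitness stand-in: a real run would execute the candidate
    # algorithm and score its regret or win rate. Peak at (0.9, 5).
    return -(params["discount"] - 0.9) ** 2 - (params["warm_start"] - 5) ** 2

def mutate(parent):
    # Stand-in for the LLM proposal step: perturb one knob at random.
    child = dict(parent)
    key = random.choice(list(child))
    child[key] += random.uniform(-0.5, 0.5)
    return child

def evolve(seed_params, generations=50, pop=8):
    # Greedy elitism: the best variant survives to seed the next round.
    best, best_fit = seed_params, evaluate(seed_params)
    for _ in range(generations):
        for candidate in (mutate(best) for _ in range(pop)):
            fit = evaluate(candidate)
            if fit > best_fit:
                best, best_fit = candidate, fit
    return best, best_fit

random.seed(0)
best, fit = evolve({"discount": 0.5, "warm_start": 2.0})
```

The real system replaces `mutate` with an LLM editing actual algorithm code, which is what lets it reach mechanisms outside the parameterized knobs, but the select-the-survivors dynamic is the same.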
The two algorithms it found (VAD-CFR and SHOR-PSRO) use mechanisms the authors describe as "non-intuitive," things like volatility-sensitive discounting and hard warm-start schedules that a human designer probably wouldn't have tried. That's the interesting part: the LLM isn't constrained by the same design intuitions we are.
To make it concrete for myself, I built a small version of this loop for a Pokemon game agent. The setup is simple: define a fitness function (turns survived, maps visited, stuck events), parameterize the strategy space (door cooldown, stuck threshold, skip distance), and let an LLM propose variants that get evaluated in parallel. Ten agents race through the game; the best parameters survive. I used tapes.dev to collect session telemetry and feed observational memory back into the fitness scoring.
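The parallel-evaluation step looks roughly like this. Everything here is a stand-in: `simulate` fakes a game session with a deterministic RNG instead of driving the emulator and pulling tapes.dev telemetry, and `propose_variants` jitters the knobs instead of calling an LLM; only the parameter names (door cooldown, stuck threshold, skip distance) and the fitness shape (turns survived + maps visited, penalized by stuck events) come from my actual setup:

```python
import random
from concurrent.futures import ThreadPoolExecutor

def simulate(params):
    # Fake session: seeded per-parameter-set so scores are reproducible.
    rng = random.Random(str(sorted(params.items())))
    # Shorter door cooldowns mean fewer stuck events in this toy model.
    stuck = max(0, rng.randint(5, 20) - (10 - params["door_cooldown"]))
    turns = rng.randint(100, 300)
    maps = rng.randint(1, 8)
    return turns + 20 * maps - 10 * stuck

def propose_variants(base, n=10):
    # Stand-in for the LLM proposal step: jitter each knob around base.
    rng = random.Random(42)
    return [{
        "door_cooldown": max(1, base["door_cooldown"] + rng.randint(-2, 2)),
        "stuck_threshold": max(1, base["stuck_threshold"] + rng.randint(-1, 1)),
        "skip_distance": max(0, base["skip_distance"] + rng.randint(-1, 1)),
    } for _ in range(n)]

base = {"door_cooldown": 8, "stuck_threshold": 3, "skip_distance": 2}
variants = propose_variants(base)

# Ten agents race in parallel; the best parameters survive.
with ThreadPoolExecutor(max_workers=10) as pool:
    scores = list(pool.map(simulate, variants))
best_score, best_params = max(zip(scores, variants), key=lambda t: t[0])
```

In the real loop the winner is fed back to the LLM as the new base for the next generation, with the telemetry summary attached so the proposals are informed rather than blind.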
The first run already surfaced something useful: shorter door cooldowns (4 vs 8) reduce stuck events from 16 to 9. Not a breakthrough, but the point is the system found it without me guessing. That's the same dynamic as the paper, just at toy scale.
What I took away from the paper club: the bottleneck in algorithm design isn't computation, it's the search process itself. If you can express your problem as "parameterized code + fitness function," an LLM evolution loop can explore the space faster than manual iteration. The paper shows it works for game theory. The code lives in my startup's repo; I want to explore applying this technique to improving future general-purpose and coding agent sessions.
Just pointing out Pokemon aren't the only thing evolving in that repo.