Continuous agents and what happens after Ralph Wiggum?

4•waynenilsen•3w ago

Is anyone else doing the full software lifecycle for toy projects with hands completely off the wheel? I have had Claude running in a Ralph-like loop for over 15 hours unsupervised, creating over 118 commits.

The technique works like this:

while true:
  if tickets exist -> burn down the backlog by one ticket, exit
  if not -> figure out what feature would make sense to add next, create PRD and ERD, break down into tickets, exit
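For anyone who wants the loop in something closer to runnable form, here is a minimal Python sketch. It is not the actual setup: the prompt names, the assumption that each open ticket is a Markdown file in one directory, and the `run_agent` callback (which stands in for however you launch one agent session) are all hypothetical.

```python
from pathlib import Path
from typing import Callable, Optional

# Hypothetical prompt names; the real prompts live in the linked repository.
BURN_DOWN = "burn_down_one_ticket"
PLAN_NEXT = "plan_next_feature"


def choose_prompt(ticket_dir: Path) -> str:
    """Pick the prompt for one iteration based on whether open tickets exist."""
    # Assumption: each open ticket is a Markdown file directly in ticket_dir.
    has_tickets = ticket_dir.is_dir() and any(ticket_dir.glob("*.md"))
    return BURN_DOWN if has_tickets else PLAN_NEXT


def run_loop(
    ticket_dir: Path,
    run_agent: Callable[[str], None],
    max_iterations: Optional[int] = None,
) -> int:
    """Run one agent session per iteration and return the number of sessions run.

    Each session is expected to do exactly one unit of work (close one ticket,
    or plan one feature and write its tickets) and then exit.
    With max_iterations=None this loops forever, like `while true`.
    """
    done = 0
    while max_iterations is None or done < max_iterations:
        run_agent(choose_prompt(ticket_dir))
        done += 1
    return done
```

Passing the agent launcher in as a callback keeps the decision logic testable without actually starting an agent; in a real setup it would start a fresh agent session with the chosen prompt and wait for it to exit.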

It did get stuck once due to TTY issues from running Playwright in a non-TTY environment, but otherwise I have not had to step in manually.

I have it running in a droplet using systemd continuously.

The toy code the agent is creating is a multi-tenant todo kata. Here is the set of prompts:

https://github.com/waynenilsen/ralph-kata-2/tree/main/prompts

Anyone could make their own version of the same; these are just the set of prompts that work for me.

In 15 hours it created a full multi-tenant auth system from scratch, plus todos with assignees, due dates, email reminders, tags, and full-text search. I created the first PRD by hand with something like "create a PRD for a multi-tenant todo system".

For anyone looking to do something similar, the e2e tests have played a critical role in closing the agent's loop with reality.

The age of programming with prompts is clearly arriving.

Comments

codingdave•3w ago

> it created a full multi-tenant auth system from scratch

OK. And did that scratch auth system pass any level of security testing? If it did, great, that is worth talking about. But what I've seen generated by AI isn't anywhere near secure.

waynenilsen•3w ago

i have seen the same. however, it can often find its own bugs easily when prompted to do so; in this case, perhaps with a ticket

the ticket burndown is a very nice feature because whenever you want to add a ticket it'll just pick it up and do its best

ironbound•3w ago

You'd think LLMs could reverse engineer software, but this is not the case today
