What stood out to me was that this architecture largely rhymes with the coding workflows I and others already do with coding agents. It's basically automating the connective tissue between the workflows I was already doing in Claude Code, and then brute-forcing a result. In the dark factory model, a spec goes in, code gets generated, built in Docker, validated against scenarios the agent never saw, scored, and failures feed back until it converges.
I've tried it with mostly standard CRUD/REST API apps and it works. I haven't tried anything with HTML/JS yet. You can try the sample specs in the repo.
Some raw notes from the experience:
1. I don't want to maintain the code these factories generate. It works. The phenotype is (largely) correct, but the genotype is pretty wild and messy. I did not use OctopusGarden to build OctopusGarden (you can tell because it uses strict linting and tests). I know the point of these systems is zero human in the loop, but I think there's a real opportunity to get factories to generate code that humans actually want to maintain. I'm going to work on getting OctopusGarden there.
2. Compliance might be a nightmare. In my day job I think a lot about ISO 27001 and SOC 2 compliance. The idea of deploying dark-factory-generated projects into my environments and checking compliance boxes sounds painful. That might just be the current state of OctopusGarden and the code it generates, but I think we can get to a point where generated code is completely linted, statically checked, and tested inside the factory. That's not OctopusGarden today, but maybe it will be there next week? I can see this moving fast.
3. These dark factory apps will be hard to debug. There was a Claude outage today and I couldn't run my smoke tests or generate new apps. I don't want to maintain services that can't be debugged and fixed by a human in a pinch. We're already partially there with AI-assisted code, but this factory-generated code is even more convoluted. Requiring AI to create a new app version is probably worth it...but it's still yet another thing between you and quickly patching an urgent bug.
4. Security needs a better story. These things need real security hardening. Maybe that's just better spec files and scenarios, maybe it's something more. I'm going to drink a strong cola and think about this one.
5. The unit of responsibility keeps growing. Last year we said code must come in PR-sized bites — that's how we manage risk. Now we're talking about deploying meshes of services created and deployed with no humans in the loop (except at creation). AI-generated services could really push the scale of what people are willing to accept responsibility for. Most SRE teams manage 1-5 services at big companies. Will that number increase per team? How much GDP is one person willing to manage via agents? Just a shower thought.
6. I was surprised this works. I'm surprised at how easy it was to make. I'm surprised more of these aren't out there already. I only did a couple of GitHub searches and didn't find many. I'm bad at searching. Sorry if I didn't find your project.
deltaops•6h ago
foundatron•6h ago