It's not that there's nothing useful, maybe even important, in there; it's just that so far it's all the easy parts: playing around inside a computer.
I've noticed a trend over the years: certain types of projects get lots of hype and excitement, and much progress seems to be made, but when you dig deep enough you find out it's all just the fun, easy sort of progress.
The fun progress, which not at all coincidentally tends to also be the easy progress, is the type that happens solely inside a computer.
What do I mean by that? I mean programs that only operate at the level of artificial computer abstractions.
The hard part is always dealing with "the real world": hardware that returns "impossible" results to your nicely abstract API functions, things that stop working in places they really shouldn't be able to, or even, and this is the really tricky bit, dealing with humans.
Databases are a good example of this kind of thing. It's easy to start off a database by writing all the clever (and fun) bits like btrees and hash maps and chained hashes that spill to disk to optimize certain types of tables, and so on. But I'd wager that at least half of the code in a "real" database like SQLite or PostgreSQL is devoted to dealing with strange hardware errors, leaky API abstractions across multiple platforms, or the various ways a human can send nonsensical input into the system and really screw things up.
I'd also bet that this type of code is a lot less fun to write and took much longer than the rest (which, incidentally, is why I always get annoyed when programming language demos show only happy-path code, but that's another rant and this comment is already excessive).
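To make the happy-path gripe concrete, here's a toy sketch (mine, not from any real database) of "save some bytes" in demo form versus real-world form:

```go
package main

import (
	"fmt"
	"os"
)

// saveDemo is the version you see in language demos: happy path only,
// every error silently ignored.
func saveDemo(path string, data []byte) {
	f, _ := os.Create(path)
	f.Write(data)
	f.Close()
}

// saveForReal is the version real systems need: every call can fail,
// and a torn write after a crash is worse than no write at all, so
// write to a temp file, fsync, then atomically rename into place.
func saveForReal(path string, data []byte) error {
	tmp := path + ".tmp"
	f, err := os.Create(tmp)
	if err != nil {
		return fmt.Errorf("create %s: %w", tmp, err)
	}
	if _, err := f.Write(data); err != nil {
		f.Close()
		os.Remove(tmp)
		return fmt.Errorf("write: %w", err)
	}
	if err := f.Sync(); err != nil { // force data out of the OS cache
		f.Close()
		os.Remove(tmp)
		return fmt.Errorf("fsync: %w", err)
	}
	if err := f.Close(); err != nil {
		os.Remove(tmp)
		return fmt.Errorf("close: %w", err)
	}
	return os.Rename(tmp, path) // atomic on POSIX filesystems
}

func main() {
	saveDemo("demo.txt", []byte("hi"))
	fmt.Println(saveForReal("real.txt", []byte("hi")))
}
```

The second one is several times the code before you even get to cross-platform quirks, which is the point.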
Anyways, this AI thing is definitely a gold rush, and it's important to remember that a lot of gold did in fact get dug up. But, as everyone constantly repeats, the more consistent way to benefit is to sell the shovels, and this is very definitely an ad for a shovel.
Agents and wrappers that pull you deeper into an LLM spending frenzy are the new "todo app".
I think we are at the beginning of the second such journey. Lots of people will get hurt while we learn how to scale it up. It's why I've gone with dangerous sounding theming and lots of caution with Gas Town.
I think it only takes 2 years this time, though.
Gas Town is clearly the same thing multiplied by ten thousand. The number of overlapping and ad hoc concepts in this design is overwhelming. Steve is ahead of his time, but we aren't going to end up using this stuff. Instead, a few of the core insights will get incorporated into other agents in a simpler but no less effective way.
And anyway the big problem is accountability. The reason everyone makes a face when Steve preaches agent orchestration is that he must be in an unusual social situation. Gas Town sounds fun if you are accountable to nobody: not for code quality, design coherence or inferencing costs. The rest of us are accountable for at least the first two and even in corporate scenarios where there is a blank check for tokens, that can't last. So the bottleneck is going to be how fast humans can review code and agree to take responsibility for it. Meaning, if it's crap code with embarrassing bugs then that goes on your EOY perf review. Lots of parallel agents can't solve that fundamental bottleneck.
Yeah, this describes my feeling on beads too. I actually really like the idea - a lightweight task/issue tracker integrated with a coding agent does seem more useful than a pile of markdown todos/plans/etc. But it just doesn't work that well. It's really buggy, and the bugs seem to confuse the agent, since it was given instructions to do things a certain way that don't work consistently.
And also auditable, trackable, reportable, etc..
I was sort of kidding with "JIRA for Agents"; obviously, using the API and existing tools, you can make agents use it.
We use GitHub at my current job and similarly have Claude Code update issues and PRs when it does work.
There's a lot of strange things going on in that project.
Try to add some common sense, and you'll get shouted down.
Which is fine; I'll just make my own version without the slop.
Or did you find one that's good?
But yeah, I'm only running one code agent at a time, so that's not a problem I have. I should probably start with just a todo list as plain text.
It unlocks a (still) hidden multi-agent orchestration function in Claude Code. The person making it unminified the code and figured out how to unlock it.
I find it quite well done - I started an orchestrator project a few days ago and scrapped it, because it seems this will be fully integrated soon.
> Course, I’ve never looked at Beads either, and it’s 225k lines of Go code that tens of thousands of people are using every day. I just created it in October. If that makes you uncomfortable, get out now.
Despite its quirks, I think beads is going to go down as one of the first pieces of software that got some adoption where the end user is an agent.
What do you like about Linear? Is it suitable for hobby projects?
Linear is great, it's what JIRA should've been. Basically task management for people who don't want to deal with task management. It's also full featured, fast (they were famously one of the earlier apps to use a local-first sync-engine style architecture), and keyboard-centric.
Definitely suitable for hobby projects, but can also scale to large teams and massive codebases.
Show HN: I replaced Beads with a faster, simpler Markdown-based task tracker - https://news.ycombinator.com/item?id=46487580 - Jan 2026 (2 comments) (<-- I've put this one in the SCP - see https://news.ycombinator.com/item?id=26998308 for explanation)
Solving Agent Context Loss: A Beads and Claude Code Workflow for Large Features - https://news.ycombinator.com/item?id=46471286 - Jan 2026 (1 comment)
Beads – A memory upgrade for your coding agent - https://news.ycombinator.com/item?id=46075616 - Nov 2025 (68 comments)
Beads: A coding agent memory system - https://news.ycombinator.com/item?id=45566864 - Oct 2025 (1 comment)
It's 2025, accountability is a thing of the past. The future belongs to the unaccountable and their AI swarm.
Facebook burned something like $70bn on "metaverse" with seemingly zero results. There's a lot more capital (and biosphere) to burn on AI agents.
The simplicity of just plugging a few lines of code into a framework or a workflow engine means the barrier to entry is really, really low, which guarantees that we will have thousands of business processes running through those duct-taped agents in almost every kind of industry you can imagine.
Mountains of code nobody understands, even more Byzantine post-training to shoehorn more complex tool usage into the models.
Compliance issues galore. Security incidents by the ton.
The future is going to be very, very interesting pretty soon. Why would you leave your front-row seat right now?
I would instead invest some good time and money in buying and learning to play a modern replica of a Greek kithara.
How would you go about doing that?
I'm looking for "the Emacs" of whatever this is, and I haven't read a blog post which isolates the design yet.
I don't know the details, but I was wondering why people aren't "just" writing chat venues and comms protocols for the chats? So the fundamental unit is a chat that humans and agents can be a member of.
You can also have DMs etc to avoid chattiness.
But fundamentally, if you start with this kind of madness you don't have a strict hierarchy, and it might also be fun to see how it goes.
I briefly started building this but just spun out and am stuck using PAL MCP for now and some dumb scripts. Not super content with any of it yet.
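The minimal shape I had in mind was something like this (toy Go sketch, hypothetical names; a DM is just a two-member chat):

```go
package main

import "fmt"

// A member is anything that can post: a human or an agent.
type Member struct {
	Name    string
	IsAgent bool
}

type Message struct {
	From Member
	Text string
}

// A chat is the fundamental unit: a venue with members and a transcript.
// No strict hierarchy; coordination emerges from who is in which chat.
type Chat struct {
	Topic    string
	Members  []Member
	Messages []Message
}

func (c *Chat) Post(from Member, text string) {
	c.Messages = append(c.Messages, Message{From: from, Text: text})
	// A real implementation would fan the message out to agent members,
	// e.g. by enqueueing an inference call per agent.
}

func main() {
	human := Member{Name: "alice"}
	agent := Member{Name: "claude-1", IsAgent: true}
	venue := Chat{Topic: "refactor-auth", Members: []Member{human, agent}}
	venue.Post(human, "plan the refactor, then wait for my go-ahead")
	fmt.Printf("%d message(s) in #%s\n", len(venue.Messages), venue.Topic)
}
```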
(This explains why some of the comments have timestamps that appear older than the post itself. I got tired of trying to make them line up, sorry!)
IMHO, it's less disorienting to have the post dated after the comments than it is to see a comment you thought you wrote a couple days ago but is dated today. So you're welcome to stop trying to line up timestamps.
Status quo sucks also, it just sucks less. Haven't yet figured out an actually good solution. Sorry!
The most I imagine most folks saying is "Didn't I see this post on the front page days ago?". For many other discussion fora, it's not uncommon for posts to be at the top of the pile for many days... so a days-old post date should be nothing unusual.
Re artificial uplifting a.k.a. re-upping, see https://news.ycombinator.com/item?id=26998308 and https://news.ycombinator.com/pool
WARNING DANGER CAUTION GET THE F** OUT YOU WILL DIE
I have never met Steve, but this warning alone is :chefskiss:
Gas Town is from the creator of beads.
Outside of that it's trial and error, but I've learned you don't need to kick off a new chat instance very much, if at all. I also like Beads because if I have to "run" or go offline I can tell it to pause and log where it left off / where it's at.
For some projects I tell Claude not to close tickets without my direct approval, because sometimes it closes them without testing; my baseline across all projects is that it compiles and runs without major errors.
Think of it as an extended, bipolar-optimism-fueled glimpse into the future. Steve's MO is laid out in the Medium post - but basically, it's okay to lose things, rewrite whole subsystems, whatever; this is the future. It's really fun and interesting to watch the speed of development.
I've made a few multi-agent coding setups in the last year, and I think Gas Town has the team side about right: big boss (mayor), operations boss (deacon), relatively linear keeper of truth (witness), single point for merges (refiner), lots of coders with their code held lightly.
I love the idea of formulas - a lot of what makes Gas Town work, and informs how well it ultimately will work, is the formulas. They're close conceptually to skills.
I don't love the Mad Max branding, but meh, whatever, it's fun, and a perk of the brave new world where you can make stuff like this for a few hundred bucks a month sent to Anthropic - software can have personality again, yay.
Conceptually I think there is a product-team element to this still missing - deploy engineers, product managers, visual testing. Everything is sort of out there, janky in parts, but workable to glue together right now, and will only improve. That said, the Mad Max town analogy is going to get overstretched at some point; we already have pretty good names for all the parts that are needed, and as coordination improves, we're going to want to add more stuff into the coordination. So, I'd like to see a version of this with normal names and expanded.
Upshot - worth a look. If beads is any indication, give it a month or two or four to settle down, unless you like living on the bleeding edge.
I pointed it at a Postgres time series project I was working on, and it deployed a much better UI and (with some nudging) fixed docker errors on a remote server, which involved logging in to the server to check logs. It probably opened and fixed 50 or so beads in total.
I'd reach for it first to do something complicated ("convoy" or epic) over Claude Code even as is -- like, e.g., "copy this data ingestion we do for site x, and implement it for sites y, z, a, b, c, d. Start with a formal architecture that respects our current one and remains extensible for all these sites" is something I think it would do a fair job at.
As to cost - I did not run out of my claude pro max subscription poking around with it. It infers ... a lot ... though. I pulled together a PR that would let you point some or all of the agent types at local or other endpoints, but it's a little early for the codebase, I think. I'd definitely reach for some cheaper and/or faster inference for some of the use cases.
The article was pretty OK. Kubernetes has its own share of obnoxious terminology that often comes up as "we name it different so that it doesn't sound like AWS". At some point you just accept the terminology in relation to the tool you use and move on.
Assuming this isn't a parody project, maybe this just isn't for me, and that's fine. I'm struggling to understand a production use case where I'd be comfortable letting this thing loose.
Who is the intended audience for this design?
I promptly gave Claude the text to the articles and had him rewrite using idiomatic distributed systems naming.
Fun times!
Town = Central orchestrator / control plane
Rig = Project or workspace namespace
Polecat = Ephemeral worker job
Refinery = Merge queue manager
Witness = Worker health monitor
Crew = Persistent worker pool
Beads = Persistent work items / tasks
Hooks = Work queues / task slots
GUPP = Work processing guarantee
Molecules/Wisps = Structured, persistent workflows
Convoys = Grouped feature work units
https://chatgpt.com/share/695c6216-e7a4-800d-b83d-fc1a22fd8a...
Update: I was hoping it'd at least be smart enough to automatically test that the project still builds, but it did not. It also didn't commit the changes.
> are you the mayor?
Yes. I violated the Mayor protocol - I should have dispatched this work to the gmailthreading crew worktree instead of implementing it directly myself.
The CLAUDE.md is clear: "Mayor Does NOT Edit Code" and "Coordinate, don't implement."
Maybe Yegge should have built it around Codex instead - Codex is a lot better at adhering to instructions.
Pros: The overall system architecture is similar to my own latest attempt at solving this problem. I like the tmux-based console-monitoring approach (rather than going full SDK + custom UI); it makes it easier to inspect what is going on. The overlap between my ideas and Steve's is around 75%.
Cons: Arguing with "The Mayor" about some other detached process's poor workmanship seems like a major disconnect and architectural gap. A game of telephone is unlikely to be better than simply using Claude. I was also hoping Gas Town would amplify my intent to complete the task of "add feature X" without early-stopping, but so far it's more work than both 1. vibing with Claude directly and 2. creating a highly-detailed spec with checkboxes and piping in "do the next task" until it's done.
Definitely looking forward to seeing how the tools in this space evolve. Eventually someone is bound to get it right!
P.s. the choice of nomenclature throughout the article is a bit odd, making it hard to follow. Movie characters, dogs and raccoons, huh? How about striving for descriptive SWE clarity?
That's what got us CQRS, "command query responsibility segregation", which is a technically correct term but absolutely fucking meaningless to anyone who doesn't know what it means already.
It should have been called "read here, write there", but noooooooOOOOOooooo, we need descriptive SWE clarity, so only people with CS degrees who know all the acronyms already can understand wtf is being said.
Most likely, tens of other bugs are being introduced at each step, etc etc, right?
> Gas Town is also expensive as hell. You won’t like Gas Town if you ever have to think, even for a moment, about where money comes from. I had to get my second Claude Code account, finally; they don’t let you siphon unlimited dollars from a single account, so you need multiple emails and siphons, it’s all very silly. My calculations show that now that Gas Town has finally achieved liftoff, I will need a third Claude Code account by the end of next week. It is a cash guzzler.
Since I am quite capable of shitting up my own code for free, and I've got zero interest in this stupid AI nonsense anyway, I'm vanishingly unlikely to actually use this. But, still: I like to keep half an eye on what is going on, even if I hate it. And I am more than somewhat intrigued about what the numbers actually look like.
We're trying to orchestrate a horde of agents. The workers (polecats?) are the main problem solvers. Now you need a top-level agent (mayor) to break down the problem and delegate work, and then a merger to resolve conflicts in the resulting code (refinery). Sometimes agents get stuck and need encouragement.
The molecules stuff confused me, but I think they're just "policy docs," checklists to do common tasks.
But this is baby stuff. Only one level of hierarchy? Show me a design for your VP agent and I'll be impressed for real.
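Stripped of the theming, the one-level version fits in a toy sketch like this (my names, and goroutines standing in for what would really be Claude instances):

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	// Mayor: break the problem down and delegate.
	tasks := []string{"parse flags", "add config file", "wire up logging"}
	results := make(chan string)

	var wg sync.WaitGroup
	for _, t := range tasks {
		wg.Add(1)
		go func(task string) { // polecat: ephemeral worker
			defer wg.Done()
			results <- "patch for: " + task
		}(t)
	}
	go func() { wg.Wait(); close(results) }()

	// Refinery: single point where results get merged.
	for r := range results {
		fmt.Println("merging", r)
	}
}
```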
Has to be close to the record for shortest time from first commit to HN front page.
...no, I haven't lost the plot. I'm seeing another fad of the intoxicated parting with their money bending a useful tool into a golden hammer of a caricature. I dread seeing the eventual wreckage and self-realization from the inevitable hangover.
I've never understood this argument. Do you ever work with other humans? They are very much not deterministic, yet they can often produce useful code that helps you achieve more than you could by yourself.
I'll add a personal anecdote - 2 years ago, I wrote a SwiftUI app by myself (mind you, I'm mostly an infrastructure/backend guy with some expertise in front end, where I get the general stuff but never really made anything big out of it other than stuff on LAMPP back in the 2000s), and it took me a few weeks to get it to do what I wanted, with the bare minimum of features. As I was playtesting my app, I kept writing a wishlist of features for myself, and later, when I put it on the App Store, people around the world would email me asking for other features. But life, work, etc. would get in the way, and I would have no time to actually do them, as some of the features would take me days/weeks.
Fast forward to 2 weeks ago: at this point I'm very familiar with Claude Code, how to steer multiple agents at a time, quickly review their outputs, stitch things together in my head, and ask for the right things. I've completed almost all of the features, rewritten the app, and it's already been submitted to the App Store. The code isn't perfect, but it's also not that bad. Honestly, it's probably better than what I would've written myself. It's an app that can be memory-intensive in some parts, and it's been doing well in my testing. On top of that, since I've been steering 2-3 agents actively myself, I have the entire codebase in my mind. I also have an overwhelming amount of notes on what I would do better, etc.
My point is, if you have enough expertise and experience, you'll be able to "stitch things together" more cleanly than others with no expertise. This also means user acquisition, marketing, and data will be more valuable than the product itself, since it'll be easier to develop competing products. Finding users for your product will be the hard part. Which kinda sucks, if I'm honest, but it is what it is.
I've had the same experience as you. I've applied it to old projects which I have some frame of reference for and it's like a 200x speed boost. Just absolutely insane - that sort of speed can overcome a lot of other shortcomings.
I'm a full stack dev, and solo, so I write data schema, backends and frontends at the same time, usually flipping between them to test parts of new features. As far as AI use, I'm really just at the level of using a single Claude agent in an IDE - and only occasionally, because it writes a lot of nonsense. So maybe I'm missing out on the benefits of multiple agents. But where I currently see value in it is in writing (a) boilerplate and (b) sugar - where it has full access to a large and stable codebase. Where I think it fails is in writing overarching logical structures, especially early on in a project. It isn't good at writing elegant code with a clear view of how data, back and front should work together. When I've tried to start projects from scratch with Claude, it feels like I'm fighting against its micro-view of each piece of code, where it's unable to gain a macro-view of how to orchestrate the whole system.
So like, maybe a bottomless wallet and a dozen agents would help with that, but there isn't so much room for errors or bugs in my work code as there is in my fun/play/casual game code. As a result I'm not really seeing that much value in it for paid work.
If your end goal is to produce some usable product, then the implementation details matter less. Does it work? Yes? OK, then maybe don't wrestle with the agent over specific libraries or coding patterns.
I don’t see how we get there, though, at least in the short term. We’re still living in the heavily-corporate-subsidized AI world with usage-based pricing shenanigans abound. Even if frontier models providers find a path to profitability (which is a big “if”), there’s no way the price is gonna go anywhere but up. It’s moviepass on steroids.
Consumer hardware capable of running open models that compete with frontier models is still a long ways away.
Plus, and maybe it’s just my personal cynicism showing, but when did tech ever reduce pricing while maintaining quality on a provided service in the long run? In an industry laser focused on profit, I just don’t see how something so many believe to be a revolutionary force in the market will be given away for less than it is today.
Billions are being invested with the expectation that it will fetch much more revenue than it’s generating today.
We're also seeing significant price reductions every year for LLMs. Not for frontier models, but you can get the equivalent of last year's model for cheaper. Hard to tell from the outside, but I don't think it's all subsidized?
I think maybe people over-updated on Bitcoin mining. Most tech is not inherently expensive.
If training of new models ceased, and hardware was just dedicated to inference, what would that do to prices and speed? It's not clear to me how much inference is actually being subsidized over the actual cost to run the hardware to do it. If there's good data on that I'd love to learn more though.
That's an old world that we experienced in the 2000s, and maybe the early 2010s, when we cared about the quality of a provided service in the long run. For anything web-app-general-stuff related, that's long gone, as everyone (read: mostly everyone) has a very short attention span, and what matters is whether the thing I desire can be done right now. In the long run? Who cares. I keep seeing this in everyday life, at work, in discussions with my previous clients, etc.
Once again, I wish it weren't true, but nothing suggests it isn't.
Or, if it does _now_, how long will it be before it works well using downloadable models that'll run on, say, a new car's worth of Mac Studios with a bunch of RAM in them, allowing a small fleet of 70B and 120B models (or larger) to run locally? Perhaps even specialised models for each of the roles this uses?
There's little evidence this is true. Even OpenAI who is spending more than anyone is only losing money because of the free version of ChatGPT. Anthropic says they will be profitable next year.
> Plus, and maybe it’s just my personal cynicism showing, but when did tech ever reduce pricing while maintaining quality on a provided service in the long run? In an industry laser focused on profit, I just don’t see how something so many believe to be a revolutionary force in the market will be given away for less than it is today.
Really?
I mean I guess I'm showing my age but the idea I can get a VM for a couple of dollars a month and expect it to be reliable make me love the world I live in. But I guess when I started working there was no cloud and to get root on a server meant investing thousands of dollars.
According to Ed Zitron, Anthropic spent more than its total revenue in the first 9 months of 2025 on AWS alone: $2.66 billion on AWS compute against an estimated $2.55 billion in revenue. That's just AWS, not payroll, not other software or hardware spend. He's regularly reporting concrete numbers that look horrible for the industry, while hyperscalers and foundation model companies continue to make general statements and refuse to get specific or release real revenue figures. If you only listen to what the CEOs are saying, then sure, it sounds great.
Anthropic also said that AI would be writing 95% of code in 3 months or something, however many months ago that was.
Yes, but it's unclear how much of that is training costs vs operational costs. They are very different things.
But how many of those providers are too subsidizing their offering through investment capital? I don't know offhand of anyone in this space that is running at or close to breakeven.
It feels very much like the early days of streaming when you could watch everything with a single Netflix account. Those days are long gone and never coming back.
Since we have version control, you can restart anywhere if you think it's a good place to fork from. I like greenfield development, but I suspect that there are going to be a lot more forks from now on, much like the game modding scene.
Companies with money-making businesses are gonna find themselves in an interesting spot when the "vibe juniors" are the vast majority of the people they can find to hire. New ways will be needed to reduce the risk.
...go to jail?
I have enjoyed Steve's rants since "Execution in the Kingdom of Nouns" and the Google "Platform rant", but he may need someone to talk to him about bamboo and what a terrible life choice it is. Unless you can keep it the hell away from you and your neighbours it is bad, very bad. I'm talking about clumping varieties, the runners are a whole other level.
There is a repo and I am not sure; the only way to resolve it probably is to spend some of that money he’s talking about.
He as a dev should know that adding a layer of names on top of already named entities is not a good practice. But he just had fun and this came up. Which is fantastic. But I don't want to have to translate names in my head all the time.
Just not useful. Beads also... really sorry to say this, but it is a task runner with labels, and it has 0 awareness of the actual tasks.
I don't know, maybe I am wrong, but this just doesn't seem like a thing that will work. Which is why I think it will be popular: nobody will be able to make it work, but they will not want to look dumb, so they will say it is awesome and amazing. Like another AI thingy I could name but will not, that everyone is using.
But I love Yegge and hope he does well. Amp, for the little bit that I used it, is a really solid agent and delivered much better results than many others.
But when high-level languages were getting started, we had to read and debug the transformed lower-level output they made (hello Cfront). At a certain point, most of us stopped debugging the layer below, and now most LLVM IR and assembly flows by without anyone reading it.
I use https://exe.dev to orchestrate several agents, and I am seeing the same benefits as Steve (with a better UI). My code smell triggers with lots of diffs that flow by, but just as often this feeling of, "oh, that's a nice feature, it's much better than I could have made" is also triggered. If you work with colleagues who occasionally delight and surprise you with excellent work, it's the same thing.
Maybe if you are not used to the feeling of being surprised and mostly delighted by your (human) colleagues, orchestrated agentic coding is hard to get your head around.
In the past, a large codebase indicated that maybe you might take the project seriously, as some human effort was expended in its creation. There were still some outliers, like Urbit and its 144 KLOC of Hoon code, perverse loobeans and all.
Now, if I get so much as a whiff of AI scent off a project, I lose all interest. It indicates that the author did not invest a modicum of their own time in the project, so why should I waste my own time on it?
(I use LLM-based coding tools in some of my projects, but I have the self-respect to review the generated code before publishing it.)
Of course, as a developer you still have to take responsibility for your code, minimally including a disclaimer, and not dumping this code into someone else’s code base. For example, at work, when submitting MRs I do generally read the code and keep MRs concise.
I’ve found that there is a certain kind of coder that hears of someone not reading the code and this sounds like some kind of moral violation to them. It’s not. It’s some weird new kind of coding where I’m more creating a detailed description of the functionality I want and incrementally refining it and iterating on it by describing in text how I want it to change. For example I use it to write GUI programs for Ubuntu using GTK and python. I’m not familiar with python-gtk library syntax or GTK GUI methods so there’s not really much of a point in reading the code - I ask the machine to write that precisely because I’m unfamiliar with it. When I need to verify things I have to come up with ways for the machine to test the code on its own.
Point is I think it’s honestly one new legitimate way of using these tools, with a lot of caveats around how such generated code can be responsibly used. If someone vibe coded something and didn’t read it and I’m worried it contains something dangerous, I can ask Claude to analyze it and then run it in a docker container. I treat the code the same way the author does - as a slightly unknown pile of functions which seem to perform a function but may need further verification.
I’m not sure what this means for the software world. On the face of it it seems like it’s probably some kind of problem, but I think at the same time we will find durable use cases for this new mode of interacting with code. Much the same as when compilers abstracted away the assembly code.
This is not exactly that, but it is one step up. Having agents output code that then gets compiled/interpreted/whatever, based upon contextual instruction, feels very, very familiar to engineers who have ever worked close to the metal.
"Old fashioned", in this aspect, would be putting guardrails in place so that you knew that what the agent/compiler was creating was what you wanted. Many years ago, that was binaries or bytecode packaged with lots of symbols for debugging. Today, that's more automated testing.
I started "fully vibecoding" 6 months ago, on a side-project, just to see if it was possible.
It was painful. The models kept breaking existing functionality, overcomplicating things, and generally just making spaghetti ("You're absolutely right! There are 4 helpers across 3 files that have overlapping logic").
A combination of adjusting my process (read: context management) and the models getting better, has led me to prefer "fully vibecoding" for all new side-projects.
Note: I still read the code that gets merged for my "real" work, but it's no longer difficult for me to imagine a future where that's not the case.
https://github.com/shepherdjerred/scout-for-lol/blob/main/es...
2 years sounds more likely than 2 months since the established norms and practices need to mature a lot more than this to be worthy of the serious consideration of the considerably serious.
On my personal project I do sometimes chat with ChatGPT and it works as a rubber duck. I explain, put my thoughts into words and typically I already solve my problem when I'm thinking it through while expressing it in words. But I must also admit that ChatGPT is very good at producing prose and I often use it for recommending names of abstractions/concepts, modules, functions, enums etc. So there's some value there.
But when it comes to code, I want to understand everything that goes into my project. So at the end of the day I'm always going to be the "bottleneck", whether I think through the problem myself and write the code or I review and try to understand the AI-generated code slop.
It seems to me that the AI slop generation workflow is a great fit for the industry, though: more quantity rather than quality, and continuous churn. Make it cheaper to replace code so that the replacement can be replaced a week later with another vibe-coded slop. Quality might drop, bugs might proliferate, but who cares?
And to be fair, code itself has no value, it's ephemeral, data and its transformations are what matter. Maybe at some point we can just throw out the code and just use the chatbots to transform the data directly!
LLMs are far from being as trustworthy as compilers.
Now I've got tools and functionality that I would have paid for before as separate apps that are running "for free" locally.
I can't help but think this is the way forward, and we'll just have to deal with the landmine as/when it comes, or hope that the tooling gets drastically better so that the landmine isn't as powerful as we fear.
Looking at the screenshot of "Tracked Issues", it seems many of the "tasks" are likely overlapping in terms of code locality.
Based on my own experience, I've found the current crop of models to work well at a slightly higher-level of complexity than the tasks listed there, and they often benefit from having a shared context vs. when I've tried to parallelize down to that level of work (individual schema changes/helper creation/etc.).
Maybe I'm still just unclear on the inner workings, but it's my understanding each of those tasks is passed to Claude Code and developed separately?
In either case, I think this project is a glimpse into the future of software development (albeit with a grungy desert punk tinted lens).
For context, I've been "full vibe-coding"[0] for the past 6 months, and though it started painfully, the models are now good enough that not reading the code isn't much of an issue anymore.
Once in a blue moon it will not completely fail and might even spit out something impressive at first glance, so some people will latch on to that.
What I am finding most beneficial almost immediately is that I have a dedicated Telegram channel I can post all sorts of unstructured data into; it's automatically routed via LLMs and stored in the right channel, and then other agents work on that data to provide me insights. I have a calorie counter, workout capture, reminders, and daily diary prompts all up and running as of right now, and honestly it's better than anything I could have bought "off the shelf".
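The routing core of a setup like this is conceptually tiny; a sketch of the shape (my names, with a keyword stub standing in for what is presumably an LLM classification call in the real thing):

```go
package main

import (
	"fmt"
	"strings"
)

// classify decides which channel an unstructured message belongs to.
// Stubbed with keywords so the sketch runs; the real version would send
// the text plus the category list to an LLM and parse a one-word answer.
func classify(text string) string {
	t := strings.ToLower(text)
	switch {
	case strings.Contains(t, "kcal"), strings.Contains(t, "ate"):
		return "calories"
	case strings.Contains(t, "ran"), strings.Contains(t, "lifted"):
		return "workout"
	case strings.Contains(t, "remind"):
		return "reminder"
	default:
		return "diary"
	}
}

func main() {
	inbox := []string{
		"ate 2 eggs and toast, ~350 kcal",
		"remind me to renew the domain on Friday",
		"ran 5k, felt great",
	}
	for _, msg := range inbox {
		// Downstream, per-channel agents would pick these up for insights.
		fmt.Printf("%-9s <- %q\n", classify(msg), msg)
	}
}
```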
Last night I needed a C# console app to convert PDFs to a sprite sheet. I spent 30 seconds writing the prompt and another 30 seconds later the app was running and successfully converting PDFs on the first try. I then spent about another 2 mins adding a progress bar, tweaking the output format and moving the main logic into a new library.
> First, you should locate yourself on the chart. What stage are you in your AI-assisted coding journey?
> Stage 1: Zero or Near-Zero AI: maybe code completions, sometimes ask Chat questions
> Stage 2: Coding agent in IDE, permissions turned on. A narrow coding agent in a sidebar asks your permission to run tools.
> Stage 3: Agent in IDE, YOLO mode: Trust goes up. You turn off permissions, agent gets wider.
> Stage 4: In IDE, wide agent: Your agent gradually grows to fill the screen. Code is just for diffs.
> Stage 5: CLI, single agent. YOLO. Diffs scroll by. You may or may not look at them.
> Stage 6: CLI, multi-agent, YOLO. You regularly use 3 to 5 parallel instances. You are very fast.
> Stage 7: 10+ agents, hand-managed. You are starting to push the limits of hand-management.
> Stage 8: Building your own orchestrator. You are on the frontier, automating your workflow.
> *If you’re not at least Stage 7, or maybe Stage 6 and very brave, then you will not be able to use Gas Town. You aren’t ready yet.*
He is so in love with his own voice.
Try to find actual screenshots of this shit, or what it really does, in the 200,000-word diarrhea (funnily, he agrees it's diarrhea [1]).
---
He also references his previous slop called beads. To quote, "Course, I’ve never looked at Beads either, and it’s 225k lines of Go code that tens of thousands of people are using every day".
It's slop to a level that people create extensive scripts to try and purge it from the system since it infects everything you do: https://gist.github.com/banteg/1a539b88b3c8945cd71e4b958f319...
Do not listen to the newly converted or accept anything from them. Steve Yegge used to be a good engineer with a great understanding of the world. Now it's all gupps and polecats.
[1] Quote from the article: "it’s a bunch of bullshit I pulled out of my arse over the past 3 weeks, and I named it after badgers and stuff."
"Psychedelics are the latest employee health benefit" (tech company) https://www.ft.com/content/e17e5187-8aa7-4564-9e63-eec294226...
"A new psychedelic era dawns in America" (specifically about use in california) https://www.ft.com/content/5b64945f-da21-46d9-853f-c949a95b9...
"How Silicon Valley rediscovered LSD" https://www.ft.com/content/0a5a4404-7c8e-11e7-ab01-a13271d1e...
I could go on, but the knowledge that psychedelic drugs are prominent in the tech community is not a new fact.
https://news.ycombinator.com/item?id=44530767
(posted here a few months back)
This is instantly recognizable as the work of someone who's been up for a couple days on Adderall.
Of course, there may be other explanations, including other drugs. But if I was one to bet...
It's far from a homogenous crowd. Yegge stands out with extreme opinions even from people who adopted the new tools daily.
I'm excited the author shared and so exuberantly; that said I did quick-scroll a bunch of it. It is its own kind of mind-altering substance, but we have access to mind-bending things.
If you look at my AgentDank repo [1], one could see a tool for finding weed, or you could see connecting world intelligence with SQL fluency and pairing it with curated structured data to merge the probabilistic with the deterministic computing forms. Which I quickly applied to the OSX Screentime database [2].
Vibe coding turned a corner in November and I'm creating software in ways I would have never imagined. Along with the multimodal capabilities, things are getting weirder than ever.
Mr Yegge now needs to add a whole slew of characters to Gas Town to maintain multi-modal inputs and outputs and artifacts.
Just two days ago, I had LLMs positioning virtual cameras to render 3D models they created using the Swift language after looking at a picture of what to make, and then "looking" at the results to see the next code changes. Crazy. [3]
ETA: It was only 14 months earlier that I was amazed that a multi-modal model could identify a trend in a chart [4].
[1] https://github.com/AgentDank/dank-mcp
[2] https://github.com/AgentDank/screentime-mcp
[3] https://github.com/ConAcademy/WeaselToonCadova/
[4] https://github.com/NimbleMarkets/ollamatea/blob/main/cmd/ot-...
I recognize 100% that a tool to manage AI agents with long-term context tracking is going to be a big thing. Many folks have written versions of this already. But mashing together the complexity of k8s with a hodgepodge of LotR and Mad Max references is not it.
It's like the complexity of J2EE combined with AI-fueled solipsism and a microdosing mushroom regime gone off the rails. What even are all the layers of abstraction here? And to build what? What actual apps or systems has this thing built? AFAICT it has built Gas Town, and nothing else. Not surprising that it has eaten its own tail.
The amount of jargon, AI art, pop culture references, and excessive complexity going on here is truly amazing, and I would assume it's satire if I didn't know Yegge's style and previous writings. It's like someone looked at the number of overlapping and confusing tools Anthropic has released around Claude Code and said, "hold my beer, hand me 3 Red Bulls and a shot of espresso, I can top that!".
I do think a friend of mine nailed it though with this quote: "This whole "I'm using agents to write so much software" building-in-public trend, but without actually showing what they built, reminds me of the people selling courses on stock trading or drop shipping."
The number of get-rich-quick schemes around any new tech is boundless. As Yegge himself points out towards the end of the post, you'd be surprised what you can pull off with a ridiculous blog post, a big-tech reputation, and excessive-LOC dev tools in a hype-driven market. How could it be wrong if it aligns so closely with so many CEOs' dreams?
225k lines for a cli issue tracker? What the fuck?
With vibe coding you just give the code some constraints and then the system will try to work within those constraints. But what if those constraints are wrong? What if you’re asking the wrong question? Then you’ll end up with overcomplicated slop.
It’s a shame that vibe coded slop seems to be a new standard, when in fact you can use AI tools to produce much higher quality code if you actually care to engage in thoughtful conversations with the AIs and take a growth mindset.
We intend to sing the love of danger, the habit of energy and fearlessness.
Courage, audacity, and revolt will be essential elements of our poetry.
Up to now literature has exalted a pensive immobility, ecstasy, and sleep. We intend to exalt aggressive action, a feverish insomnia, the racer’s stride, the mortal leap, the punch and the slap.
We affirm that the world’s magnificence has been enriched by a new beauty: the beauty of speed. A racing car whose hood is adorned with great pipes, like serpents of explosive breath—a roaring car that seems to ride on grapeshot is more beautiful than the Victory of Samothrace. … https://www.arthistoryproject.com/artists/filippo-tommaso-ma...
Our civilization is doomed if this is the future. Zero quality, zero resiliency, zero coherent vision, zero cohesive intent. Just chaotic slop everywhere, the ultimate Ouroboros.
I think Gas Town looks interesting directionally and as a PoC. Like it or not, that's the world we'll end up in. Some products will do it well and some will be horrible monsters. (Like I'm already dreading Oracle Gas Town and Azure Gas Town).
I think the Amp coding agent trends in the direction of Gas Town already. Powerful but expensive, uses a mix of models and capabilities to do something that's greater than the sum of the parts.
mccoyb•5d ago
> Gas Town helps with all that yak shaving, and lets you focus on what your Claude Codes are working on.
Then:
> Working effectively in Gas Town involves committing to vibe coding. Work becomes fluid, an uncountable that you sling around freely, like slopping shiny fish into wooden barrels at the docks. Most work gets done; some work gets lost. Fish fall out of the barrel. Some escape back to sea, or get stepped on. More fish will come. The focus is throughput: creation and correction at the speed of thought.
I see -- so where exactly is my focus supposed to sit?
As someone who sits comfortably in the "Stage 8" category that this article defines, my concern has never been throughput, it has always been about retaining a high-degree of quality while organizing work so that, when context switching occurs, it transitions me to near-orthogonal tasks which are easy to remember so I can give high-quality feedback before switching again.
For instance, I know Project A -- these are the concerns of Project A. I know Project B -- these are the concerns of Project B. I have the insight to design these projects so they compose, so I don't have to keep track of a hundred parallel issues in a mono Project C.
On each of those projects, run a single agent -- with review gates for 2-3 independent agents (fresh context, different models! Codex and Gemini). Use a loop, let the agents go back and forth.
This works and actually gets shit done. I'm not convinced that 20 Claudes or massively parallel worktrees or whatever improves on quality, because, indeed, I always have to intervene at some point. The blocker for me is not throughput, it's me -- a human being -- my focus, and the random points of intervention which ... by definition ... occur stochastically (because agents).
Finally:
> Opus 4.5 can handle any reasonably sized task, so your job is to make tasks for it. That’s it.
This is laughably not true, for anyone who has used Opus 4.5 for non-trivial tasks. Claude Code constantly gives up early, corrupts itself with self-bias, the list goes on and on. It's getting better, but it's not that good.
iamwil•5d ago
Can you talk more about the structure of your workflow and how you evolved it to be that?
mccoyb•5d ago
"What if Opus wrote the code, and GPT 5~ reviewed it?" I started evaluating this question, and started to get higher quality results and better control of complexity.
I could also trust this process to a greater degree than my previous process of trying to drive Opus, look at the code myself, try and drive Opus again, etc. Codex was catching bugs I would not catch with the same amount of time, including bugs in hard math, etc -- so I started having a great degree of trust in its reasoning capabilities.
I've codified this workflow into a plugin which I've started developing recently: https://github.com/evil-mind-evil-sword/idle
It's a Claude Code plugin -- it combines the "don't let Claude stop until condition" (Stop hook) with a few CLI tools to induce (what the article calls) review gates: Claude will work indefinitely until the reviewer is satisfied.
In this case, the reviewer is a fresh Opus subagent which can invoke and discuss with Codex and Gemini.
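A rough sketch of the settings wiring, going from the Claude Code hooks schema as I remember it from the docs (treat the exact shape as an assumption and check the current docs):

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          { "type": "command", "command": "./review-gate.sh" }
        ]
      }
    ]
  }
}
```

Here review-gate.sh (hypothetical name) runs the reviewer and, while it's unsatisfied, blocks the stop, e.g. by exiting with code 2 and printing the reviewer's objections to stderr so they get fed back to Claude as the reason to keep going.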
One perspective I have which relates to this article is that the thing one wants to optimize for is minimizing the error per unit of work. If you have a dynamic programming style orchestration pattern for agents, you want the thing that solves the small unit of work (a task) to have as low error as possible, or else I suspect the error compounds quickly with these stochastic systems.
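To put a rough number on that compounding: if each task lands correctly with independent probability p, the chance all n tasks land is p^n, which decays fast (illustrative numbers only):

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	// Chance an n-task pipeline completes with no errors, given
	// per-task success probability p. The rates are made up.
	const n = 20
	for _, p := range []float64{0.99, 0.95, 0.90} {
		fmt.Printf("p=%.2f, n=%d -> whole chain ok: %4.1f%%\n",
			p, n, 100*math.Pow(p, n))
	}
}
```

With p=0.95 and twenty tasks, the whole chain comes out clean only about a third of the time, which is why driving down per-task error matters more than adding parallel agents.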
I'm trying this stuff for fairly advanced work (in a PhD), so I'm dogfooding ideas (like the ones presented in this article) in complex settings. I think there is still a lot of room to learn here.
mlady•4d ago
It's cool to see others thinking the same thing!
anthonypasq•1d ago
This is the equivalent of some crazy inventor in the 19th century strapping a steam engine onto a unicycle and telling you that some day you'll be able to go 100mph on a bike. He was right in the end, but no one is actually going to build something usable with current technology.
Opus 4.5 isn't there. But will there be a model in 3-5 years that's smart enough, fast enough, and cheap enough for a refined vision of this to be possible? I'm going to bet on yes to that question.
jbl0ndie•1d ago
https://www.wired.com/story/london-bitcoin-pub/
adw•1d ago
The POS software's on GitHub: https://github.com/sde1000/quicktill
mccoyb•1d ago
> something like gas town is clearly not attempting to be a production grade tool.
Compare to the first two sentences:
> Gas Town is a new take on the IDE for 2026. Gas Town helps you with the tedium of running lots of Claude Code instances. Stuff gets lost, it’s hard to track who’s doing what, etc. Gas Town helps with all that yak shaving, and lets you focus on what your Claude Codes are working on.
Compared to your read, my read is confused: is it or is it not intending to be a useful tool? (We can debate "production" quality; here I'm just thinking of something I'd actually use meaningfully -- like Claude Code.)
I think the author wants us to take this post seriously, so I'm taking it seriously, and my critique in the original post was a serious reaction.
alexjurkiewicz•1d ago
This tool is dangerous, largely untested, and yet may be of interest if you are already doing similar things in production.