It's interesting that AI is finally forcing businesses to think about coding maintenance costs though.
When I started working on https://saasufy.com/ as a dev tool many years ago, I was frustrated that no big company cared about software maintenance costs and I really couldn't imagine a world where maintenance costs would be a problem (which is what my platform was addressing). So this is one positive thing from my perspective, I guess. But how much longer before people put 2-and-2 together and realize that architectural complexity is the leading cause? That's the real moment I'm still waiting for.
Will what's left of the socio-economic system be sufficiently capitalist that I will be able to capitalize on that? That's my next problem.
It's a very different experience with a messy codebase. In this case, the agent spends most of its time trying to gather the relevant context and it's like a game of whac-a-mole. The agent burns through tokens and can take a long time to resolve the issue with a lot of human intervention required. I would say it takes possibly just as long or longer than a human engineer would. Also, psychologically, the temptation for the engineer to trust the AI is massive because they don't want to load themselves up with all that ugly, complex context. They are more likely to let the agent create more hacks on top.
On a relatively well-structured codebase with loose coupling and high cohesion, the experience is usually very positive, mind-blowing even, because it feels like the agent is reading your mind and fast-forwarding you. You don't need to correct it as much. And when you do, it's usually minor things.
The first case represents a net loss of value because tech debt is being added, compounding the complexity each time a problem is 'solved'. The second case, on the other hand, is a significant speedup. For me, I would say it's at least a 5x speedup. I love using AI in this way. I'm in control and not at the mercy of the agent.
Is anyone using a local router to deal with that? Something that's like "don't even bother with Sonnet for this task, just go with Opus". I wonder if Haiku could even do that math and recommend the model you should be in?
But all that might be somewhat obsolete. The latest update for Claude Code looks like it uses workflows with various models, so they might already be optimizing that.
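A minimal sketch of the "math" such a local router might do, assuming you retry a task on the same model until it succeeds, so the expected spend is cost per attempt divided by the chance of success. The tier names, costs and success probabilities below are made-up placeholders, not real pricing or benchmark numbers, and the difficulty label is assumed to come from somewhere else (a cheap classifier, a heuristic, or a human).

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str                 # hypothetical tier name, not a real model ID
    cost_per_attempt: float   # placeholder cost units per attempt
    success_prob: dict        # placeholder P(success) per difficulty tier


# All numbers are illustrative assumptions only.
MODELS = [
    Model("small", 1.0, {"easy": 0.90, "medium": 0.10, "hard": 0.01}),
    Model("mid", 5.0, {"easy": 0.95, "medium": 0.80, "hard": 0.10}),
    Model("large", 25.0, {"easy": 0.99, "medium": 0.95, "hard": 0.80}),
]


def expected_cost(model, difficulty):
    """Expected spend when retrying until success (geometric distribution)."""
    p = model.success_prob[difficulty]
    return float("inf") if p <= 0 else model.cost_per_attempt / p


def route(difficulty, models=MODELS):
    """Pick the model with the lowest expected cost for this difficulty."""
    return min(models, key=lambda m: expected_cost(m, difficulty))


for tier in ("easy", "medium", "hard"):
    print(tier, "->", route(tier).name)
```

With these placeholder numbers the cheap model wins on easy tasks, but on hard ones its low success rate makes it the most expensive choice, which is the "don't even bother" case.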
The Red Queen's Haiku
Run faster, she said:
each cheaper token consumed
to hold the same place
Mr. Meeseeks' Law: "An agent that cannot finish a task spawns another agent to help. No task reveals its difficulty until it is attempted; as such, the cost of any unattended task can exceed its value."
https://support.claude.com/en/articles/9797531-what-is-the-e...
If you mean cheaper than training, sure
Sounds plausible but I doubt it outmatches ICE warehouse concentration camp spending
Which is now the future of this country unless we force a course correction. By 2029 you'll drive down highways and it will just be one datacenter and ICE prison warehouse after another.
I do not understand why you need as many GPUs powered up as there are people in the country, or even a 1:10 ratio. It's all going to sit idle until they find something practical to do with "AI" other than entertainment, because it's not profitable. How are they going to monetize it? They cannot.
Google has a lot of systems to make a very large monorepo manageable so builds and code search don't take forever. The build system is Blaze (on which Bazel is based), which has a Pythonic syntax and was once Python, but that hasn't been the case (AFAIK) for over a decade. This means you build a massive digraph of build artifacts. By "large" I mean somewhere between 100M and 1B vertices (guessing). Loading that became a significant problem for a build, so there's heavy caching around that. There's also heavy caching around build artifacts (i.e. Forge).
So, part of the issue with every developer using Claude is that you have a ton of inefficiency because everybody has a significant context. And what is context really? It's not too dissimilar to the build graph and/or code search you already have.
So the infra I would be working on would be some kind of "global context" or "context cache". Now a lot of context changes when you do a local change but a lot doesn't. As an ordinary engineer, you aren't generally modifying /base. You're modifying leaf nodes or branches for very few leaf nodes.
The reasons I see to do this are:
1. Cost-savings by deduplication;
2. Speed if context is partially-cached;
3. You avoid issues of sending out your code to third parties. In the case of Google or Amazon, if they use Claude at all, they would probably only be using their own clouds so they avoid this. But Uber doesn't have that luxury;
4. You avoid any issues of people using your prompts and responses for training and leaking any potentially sensitive information that way;
5. You can use off-peak resources for a lot of this work;
6. You can control resources within your own pervasive resource management (in the case of Google); and
7. You can more easily integrate into internal tooling.
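To make the "context cache" idea concrete, here is a minimal sketch, assuming context is stored per node of an acyclic dependency graph and keyed by a hash of the node's own content plus the keys of its dependencies (the same content-addressing trick build caches rely on). `ContextCache`, `warm` and the `summarize` callback are hypothetical names for illustration; nothing here reflects how Google, Uber or Anthropic actually do it.

```python
import hashlib


class ContextCache:
    """Content-addressed cache: a node's key changes only if it or a dependency changes."""

    def __init__(self, summarize):
        self._summarize = summarize  # hypothetical expensive step (indexer or LLM call)
        self._store = {}             # key -> cached context
        self.misses = 0

    def key(self, node, sources, deps, _memo=None):
        # Assumes the dependency graph is acyclic.
        memo = {} if _memo is None else _memo
        if node not in memo:
            h = hashlib.sha256(node.encode())
            h.update(b"\0")
            h.update(sources[node].encode())
            for dep in sorted(deps.get(node, ())):
                h.update(self.key(dep, sources, deps, memo).encode())
            memo[node] = h.hexdigest()
        return memo[node]

    def get(self, node, sources, deps):
        k = self.key(node, sources, deps)
        if k not in self._store:
            self.misses += 1
            self._store[k] = self._summarize(node, sources[node])
        return self._store[k]


def warm(cache, sources, deps):
    """Request context for every node; return how many had to be recomputed."""
    before = cache.misses
    for node in sources:
        cache.get(node, sources, deps)
    return cache.misses - before


# Toy graph: "base" is shared, "app" is a leaf that ordinary engineers edit.
sources = {"base": "core utils", "lib": "shared lib", "app": "my feature", "tool": "cli"}
deps = {"lib": ["base"], "app": ["lib"], "tool": ["base"]}

cache = ContextCache(lambda name, text: f"summary of {name} ({len(text)} chars)")
print(warm(cache, sources, deps))  # cold start: prints 4
sources["app"] = "my feature, edited"
print(warm(cache, sources, deps))  # only the edited leaf is recomputed: prints 1
```

Editing "base" instead would invalidate everything that depends on it, which is why the savings come from most people touching leaves rather than /base.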
I also think that expanding compute power is the biggest risk to Anthropic (and OpenAI). There's a vast difference between a model you need a cluster of NVIDIA's finest to run vs one you can run on a MacBook Pro. We aren't there yet on a MacBook Pro but it'll only be a few years before we are.
Those are generally the core reasons most SaaSes exist. Additionally, (a) is the biggest issue because there is no open-weights model that can match GPT 5.5/Opus 4.8.
I don't understand why the CEO doesn't optimize and automate himself out of the job, like the software engineers are told to do.
There is a lot of frustration and even anger over CEOs pushing AI onto employees and some schadenfreude when it goes wrong. But there is some element of "fail fast" happening here.
I am glad wealthy corporations are footing the bill by stretching this technology to its limit. The fact of the matter is, we don't know how effective the best-of-the-best models are at scale.
There is a feeling that once we figure out how to leverage these agents, we'll see explosive growth. It's just going to cost a lot of money figuring it out.
It seems that for now, handing over 100% of code writing to LLMs is going to be too expensive. Cost per token for equivalent code is too high.
And the first data point is in your favor, kind of. I mean, Uber engineers were sufficiently incentivized to use the tokens they were given. It isn't easy to determine what the exact motivation was. What might result from this latest round of CEO backtracking is either relief (don't have to pretend to use AI anymore) or frustration (upset at a useful tool being taken away).
There are two possible stories here. One, they forced everyone to use AI and didn't get enough benefit to justify the cost. Two, they gave the opportunity to their employees to use unlimited AI and those employees jumped at the chance with a vigor that management didn't expect.
All we really know is that value per token must have been low enough to cause this change.
I made this argument earlier and I'll make it again: I think a major contributing factor to AI budgets exploding is the token leaderboards, the culture of "tokenmaxxing" and the constant narrative that if you're not burning X tokens a month, you're not a good engineer.
So I personally can easily believe that. Especially since a lot of people will just try to see whether the model can make that huge improvement or refactoring they've been hoping to do a reality, or run tons of experiments to validate ideas.
ChrisArchitect•1h ago
Uber’s COO says it’s getting harder to justify money spent on tokenmaxxing
https://news.ycombinator.com/item?id=48268871
Uber torches 2026 AI budget on Claude Code in four months
https://news.ycombinator.com/item?id=47976415
Corporate America Is Starting to Ration AI as Cost Skyrockets
https://news.ycombinator.com/item?id=48335388