I guess that fell on deaf ears.
How to comply with a demand to show more information by showing less information.
It's just a whole new world where words suddenly mean something completely different. You can no longer understand programs just by reading the labels they use for things; you first need to look up whether their idea of "verbose" matches the meaning you've built up.
If they tried to create a better product I'd expect them to just add the awesome option, not hide something that saves thousands of tokens and context if the model goes the wrong way.
The latest "meta" in AI programming appears to be agent teams (or swarms or clusters or whatever) that are designed to run for long periods of time autonomously.
Through that lens, these changes make more sense. They're not designing UX for a human sitting there watching the agent work. They're designing for horizontally scaling agents that work in uninterrupted stretches where the only thing that matters is the final output, not the steps it took to get there.
That said, I agree with you in the sense that the "going off the rails" problem is very much not solved even on the latest models. It's not clear to me how we can trust a team of AI agents working autonomously to actually build the right thing.
As soon as you start to work with a codebase that you care about and need to seriously maintain, you'll see what a mess these agents make.
Not to say there's no value in AI-written code in these codebases, because there is plenty. But this whole thing where 6 agents run overnight and, "tada", in the morning you have production-ready code is...not real.
What fills the holes is best practices; what ruins the result is wrong assumptions.
I don't see how full autonomy can work either, without checkpoints along the way.
As tedious as it is a lot of the time (and I wish there were an in-between "allow this session", not just "allow once" or "allow all"), it's invaluable for catching when the model has tried to fix the problem in entirely the wrong project.
Working on a monolithic codebase with several hundred library projects, it's essential that it doesn't start digging in the wrong place.
It's better than it used to be, but the failure mode can be extreme: I've come back to find it had spent 20+ minutes going around in circles, frustrating itself because of a wrong meaning it ascribed to an instruction.
I still think it’d be nice to allow an output mode for you folks who are married to the previous approach since it clearly means a lot to you.
Curious what plans you're using? Running 5 agents 24/7 would eat through several $200 subscriptions pretty fast.
https://news.ycombinator.com/item?id=9224
Or this pulls the exchange under the famous HN post itself:
https://news.ycombinator.com/item?id=8863