This is my main issue with all these agentic frameworks: they always conveniently forget that there is nothing "individual" about the thing they label "an agent" and draw a box around.
Such "on demand" agents, spawned directly from previos LLM output, are never in any way substantially different from dynamic context compression/filtering.
I think the only sensible framework is to think in terms of tools, with clear interfaces, and a single "agent" (single linear interaction chain) using those tools towards a goal. Such tools could be LLM-based or not. Forcing a distinction between a "function tool" and an "agent that does something" doesn't make sense.
Here the task is fulfilled with the full context so far, and then compressed. Might work better IMO.
The latter is really helpful for getting a coding assistant to settle on a high-quality solution. You want critic subagents to give fresh and unbiased feedback, and not be influenced by arbitrary decisions made so far. This is a good thing, but inheriting context destroys it.
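A rough sketch of what that separation can look like in practice (call_llm is a placeholder for whatever completion API you use, not a real library): the critic gets only the task description and the proposed diff, never the main agent's conversation history.

    # Placeholder for whatever completion API you use; not a real library call.
    def call_llm(system: str, user: str) -> str:
        raise NotImplementedError

    def critique(task: str, proposed_diff: str) -> str:
        # Deliberately NOT passing the main agent's message history: the critic
        # sees only the task and the artifact, so its feedback isn't anchored
        # to arbitrary decisions made earlier in the conversation.
        system = "You are a code reviewer. Judge the diff strictly on its own merits."
        user = f"Task:\n{task}\n\nProposed diff:\n{proposed_diff}"
        return call_llm(system, user)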
From the perspective of the agent, whether the tools are deterministic functions, or agents themselves, is irrelevant.
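As a minimal sketch of that view (call_llm is again just a placeholder, and the tools are invented for illustration): a deterministic function and an LLM-backed step sit behind the same interface, and the single orchestrating agent dispatches to either without caring which kind it is.

    from typing import Callable, Dict

    # Placeholder for whatever completion API you use; not a real library call.
    def call_llm(system: str, user: str) -> str:
        raise NotImplementedError

    # A "tool" is just a clear interface: text in, text out.
    Tool = Callable[[str], str]

    def word_count(text: str) -> str:   # plain deterministic function
        return str(len(text.split()))

    def summarize(text: str) -> str:    # LLM-backed, but same interface
        return call_llm("Summarize the input in two sentences.", text)

    TOOLS: Dict[str, Tool] = {"word_count": word_count, "summarize": summarize}

    def run_tool(name: str, arg: str) -> str:
        # The single agent dispatches here; it neither knows nor cares whether
        # the tool is a deterministic function or another model behind an interface.
        return TOOLS[name](arg)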
I bring this up because this article discusses context management mostly in terms of context windows having a maximum size. I think that context management is far more than that. I'm still new to this building agents thing, but my experience suggests that context problems start cropping up well before the context window fills up.
I'm using an old Android phone (Pixel 2 from 2017), a $5/month unlimited SMS plan from Tello, and https://github.com/capcom6/android-sms-gateway. For bonus points (I wanted to roll my own security, route messages from different numbers to prod and ppe instances of my backend, and dedup messages) I built a little service in Go that acts as an intermediary between my backend and android-sms-gateway. I deploy this service to my Android device using ADB, android-sms-gateway talks to it, and it talks to my backend. I also rooted the Android device so I could disable battery management for all apps (don't do this if you want to walk around with the phone, of course). It works pretty well!
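The core routing/dedup logic is tiny; here's a sketch of the idea (in Python purely for illustration, with invented numbers and URLs; the real service is in Go):

    import hashlib

    # Invented numbers and URLs, purely for illustration.
    ROUTES = {
        "+15550000001": "https://prod.example.com/inbound",   # prod instance
        "+15550000002": "https://ppe.example.com/inbound",    # ppe instance
    }

    seen: set[str] = set()

    def handle_sms(to_number: str, from_number: str, body: str) -> str | None:
        # Dedup: the gateway may deliver the same message more than once.
        key = hashlib.sha256(f"{from_number}|{to_number}|{body}".encode()).hexdigest()
        if key in seen:
            return None
        seen.add(key)
        # Route by which of my numbers the message arrived on; the caller
        # then forwards the body to the returned backend URL.
        return ROUTES.get(to_number)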
I plan to open-source this Eventually™, but first I need to decouple my personal deployment infra from the bits useful to everyone else.
Has this changed since June? Because I’ve been experimenting over the last month with Claude Code subagents that work in parallel, and with agents that write code (doing both simultaneously is inadvisable for obvious reasons, at least without workspace separation).
And oh great, another Peter Thiel company boosted to the top of HN, really?
> "Cognition AI, Inc. (also known as Cognition Labs), doing business as Cognition, is an artificial intelligence (AI) company headquartered in San Francisco in the US State of California. The company developed Devin AI, an AI software developer...Originally, the company was focused on cryptocurrency, before moving to AI as it became a trend in Silicon Valley following the release of ChatGPT... With regards to fundraising, the company was backed by Peter Thiel's Founders Fund which provided $21 million of funding to it in early 2024, valuing the company at $350 million.[2] In April 2024, Founders Fund led a $175 million investment into Cognition valuing the company at $2 billion making it a Unicorn."
The bubble's gonna pop, and you'll have so much egg on your face. This stuff is just compilers with extra compute and who got rich off compilers? VC people...
The agentic revolution is very different from the chatbot/model revolution because agents aren't a model problem, they're a tools/systems/process problem. Honestly the models we have now are very close to good enough for autonomous engineering, but people aren't giving them the right tools, the right processes, we aren't orchestrating them correctly, most people have no idea how to benchmark them to tune them, etc. It's a new discipline and it's very much in its infancy.
The principles are pretty easy to grasp the first time you build an agent.
The real problem is getting reliability. If you have reliability and clearly defined inputs and outputs, you can easily go parallel.
This seems like a bad fifth-grade homework assignment.
I would emphasize, though, that getting clearly defined input _that remains stable_ is hard. Often something is discovered during implementation of a task that informs changes in other task definitions. A parallel system has to deal with this or the results of the parallel tasks diverge.
Separating agents has a clear advantage. For example, suppose you have a coding agent with a set of rules for safely editing code. Then you also have a code search task, which requires a completely different set of rules. If you try to combine 50 rules for code editing with 50 rules for code searching, the AI can easily get confused.
It’s much more effective to delegate the search task to a search agent and the coding task to a code agent. Think of it this way: when you need to switch how you approach a problem, it helps to switch to a different “agent”, a different mindset with rules tailored for that specific task.
Do I need to think differently about this problem? If yes, you need a different agent!
So yes, conceptually, using separate agents for separate tasks is the better approach.
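One way to picture it (the rule strings and call_llm are placeholders, not anyone's actual prompts or API): each "agent" is just a smaller, task-specific rule set behind its own system prompt, and a tiny router picks the mindset.

    # Placeholder for whatever completion API you use; not a real library call.
    def call_llm(system: str, user: str) -> str:
        raise NotImplementedError

    AGENTS = {
        # Each "agent" is a different mindset: a focused rule set, not 100 mixed rules.
        "code_edit": "You edit code. Rules: keep diffs minimal; never touch tests; ...",
        "code_search": "You search code. Rules: report file paths and line ranges; ...",
    }

    def route(task_kind: str, request: str) -> str:
        # "Do I need to think differently about this problem?" -> pick a different agent.
        return call_llm(AGENTS[task_kind], request)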
I was playing around with this task: give a prompt to a low-end model, get the response, and then get the higher-end model to evaluate the quality of the response.
And one thing I've noticed is that while the higher-end model sometimes detects when the low-end model is misinterpreting the prompt (e.g. it blatantly didn't understand some aspect of it and just hallucinated), it still often allows itself to be controlled by the low-end model's framing... e.g. if the low-end model takes a negative attitude to an ambiguous text, the high-end model will propose moderating the negativity... but what it doesn't realise is that, given the prompt without the low-end model's response, it might not have adopted that negative attitude at all.
So one idea I had... a tool which enables the LLM to get its own "first impression" of a text... so it can give itself the prompt, and see how it would react to it without the framing of the other model's response, and then use that as additional input into its evaluation...
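A sketch of that "first impression" tool (call_llm is a placeholder and the prompts are made up): the judge first reads the prompt cold, then evaluates the low-end response with that unframed reading as extra input.

    # Placeholder for whatever completion API you use; not a real library call.
    def call_llm(system: str, user: str) -> str:
        raise NotImplementedError

    def evaluate(prompt: str, low_end_response: str) -> str:
        # Step 1: form a first impression of the prompt alone, with no framing
        # from the low-end model's response.
        first_impression = call_llm("Give your own brief reading of this prompt.", prompt)

        # Step 2: evaluate the response, with the unframed impression as extra
        # input, so the judge isn't anchored to the low-end model's interpretation.
        user = (
            f"Prompt:\n{prompt}\n\n"
            f"Your own first impression of the prompt:\n{first_impression}\n\n"
            f"Response to evaluate:\n{low_end_response}"
        )
        return call_llm("Rate how well the response answers the prompt.", user)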
So this is an important point this post doesn't seem to understand: sometimes less is more, and leaving stuff out of the context is more useful than putting it in.
> It turns out subagent 1 actually mistook your subtask and started building a background that looks like Super Mario Bros. Subagent 2 built you a bird, but it doesn’t look like a game asset and it moves nothing like the one in Flappy Bird. Now the final agent is left with the undesirable task of combining these two miscommunications
It seems to me there is another way to handle this... allow the final agent to go back to the subagent and say "hey, you did the wrong thing, this is what you did wrong, please try again"... maybe with a few iterations it will get it right... at some point you need to limit the iterations to stop an endless loop, and then either the final agent does what it can with a flawed response, or it escalates to a human for manual intervention (even the human intervention can be a long-running tool...)
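Sketched as a loop (run_subagent, review, and escalate_to_human are hypothetical stand-ins, not any framework's API): critique the subagent's output, feed the critique back as a retry hint, and hand off to a human after a fixed number of rounds.

    # Hypothetical stand-ins, not a real agent framework API.
    def run_subagent(task: str, feedback: str | None = None) -> str:
        raise NotImplementedError   # spawn the subagent with the task (+ any feedback)

    def review(task: str, result: str) -> tuple[bool, str]:
        raise NotImplementedError   # returns (ok, critique)

    def escalate_to_human(task: str, last_result: str) -> str:
        raise NotImplementedError   # human intervention as a long-running tool

    def delegate(task: str, max_rounds: int = 3) -> str:
        feedback = None
        result = ""
        for _ in range(max_rounds):
            result = run_subagent(task, feedback)
            ok, critique = review(task, result)
            if ok:
                return result
            # "Hey, you did the wrong thing, this is what you did wrong, try again."
            feedback = critique
        # Cap the iterations to avoid an endless loop, then hand off.
        return escalate_to_human(task, result)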
They're really handing out .ai domains to anyone these days.
This is quite a whopper. For one thing, the web started off reactive. It did take a while for a lot of people to figure out how to bring that to client-side rendering in a reasonably decent way (though, I'm sorry, IMO that doesn't actually include React). Second, "modularity" has been a thing for quite some time before the web existed. (If you want to get down to it, separating and organizing your processes in information systems predates computers.)
behnamoh•1h ago
I'm not surprised—most AI "engineers" are not really good software engineers; they're often "vibe engineers" who don't read academic papers on the subject and keep re-inventing the wheel.
If someone asked me why I think there's an AI bubble, I'd point exactly to this situation.
downrightmike•1h ago
But then next year, and the year after, the technical debt will have grown to the point where they just need to throw out the code and start fresh.
Then the head count must go up. Typical short-term gains for long-term losses/bankruptcy.
antonvs•1h ago
There’s no good evidence to support that claim. Just one study which looked at people with minimal AI experience. Essentially, the study found that effective use of AI has a learning curve.
madrox•1h ago
I'm not sure that means the people who do this aren't good engineers, though. If someone rediscovers something in practice rather than through learning theory, does that make them bad at something, or simply inexperienced? I think it's one of the strengths of the profession that there isn't a singular path to reach the height of the field.
jll29•1h ago
In the early 2000s, we used Open Agent Architecture (OAA) [1], which had a beautiful (declarative) Prolog-like notation for writing goals, and the framework would pick & combine the right agents (all written in different languages, but implementing the OAA interface through proxy libraries) to achieve the specified goals.
This was all on boxes within the same LAN, but conceptually, this could have been generalized.
[1] https://medium.com/dish/75-years-of-innovation-open-agent-ar...
WalterSear•1h ago
With respect, if there's an AI bubble, I can't see it for all the sour grapes, every time it's brought up, anywhere.
WalterSear•25m ago
It's an entirely new way of thinking, and nobody is telling you the rules of the game. Everything that didn't work last month works this month, and everything you learned two months ago you need to throw away. Coding assistants are inscrutable, overwhelming, and bristling with sharp edges. It's easier than ever to paint yourself into a corner.
Back when it took weeks to put out a feature, you were insulated from the consequences of bad architecture, coding, and communication skills: by the time things got bad enough to be noticed, the work had been done months ago and everyone on the team had touched the code. Now you can see the consequences of poor planning, poor skills, and poor articulation run to their logical conclusion in an afternoon.
I'm sure there are more reasons.