Of course, "agents" is now a buzzword that means nothing, so there is that.
It had a lot of moving parts, of which agents were the top 30% that other systems would interact with. Storing, retrieving, and ranking the information was the more important 70%, which isn't as glamorous and which no one makes courses about.
I still have no idea why everyone is talking about whatever the hottest decoder-only model is; encoder-only models are a lot more useful for most tasks that don't directly interface with a human.
*https://www.slideserve.com/verdi/seng-697-agent-based-softwa...
I have been working on LLMs since 2017, both training some of the biggest and then creating products around them, and I consider that I have no experience with agents.
GPT-3, while impressive at the time, was too bad to even let it do that; it would break after 1 or 2 steps, so letting it do anything by itself would have been a waste of time where the human in the loop would always have to re-do everything. Its planning ability was too poor and hallucinations way too frequent for it to be useful in those scenarios.
The move by Cloudflare will totally ruin the AI scraper and the AI agent hype.
They’ll just get the agent to operate a browser with vision and it’s over. CAPTCHAs were already obsolete like 2-3 years ago.
What Claude Code has taught me is that steering an agent via a test suite is an extremely powerful reinforcement mechanism (the feedback loop leads to success, most of the time) -- and I'm hopeful that new thinking will extend this into the other "soft skills" that an agent needs to become an increasingly effective collaborator.
- creating the right context for parallel and recursive tasks;
- removing some steps (e.g., editing its previous response) to show only the corrected output (see the sketch after this list);
- showing it its own output as my comment, when I want a response;
Etc.
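For illustration, a minimal sketch of that kind of history curation, assuming an OpenAI-style chat API; the `superseded` flag and the helper are placeholders of mine, not a real library feature:

```python
from openai import OpenAI

client = OpenAI()

# Example conversation state I'm curating by hand; the middle turn was an edit step I want hidden.
history = [
    {"role": "user", "content": "Draft the parser."},
    {"role": "assistant", "content": "First (buggy) draft...", "superseded": True},
    {"role": "assistant", "content": "Corrected draft of the parser."},
]
last_output = history[-1]["content"]

def curated_history(messages, corrected_output):
    """Drop superseded turns and feed the model its own corrected output back as my comment."""
    trimmed = [m for m in messages if not m.get("superseded")]  # hide the messy edit steps
    trimmed.append({
        "role": "user",
        "content": f"Here is your previous output:\n\n{corrected_output}\n\nPlease respond to it.",
    })
    return trimmed

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=curated_history(history, last_output),
)
```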
Obvious caveat: while the agent can multiply the amount of work I can do, there's a multiplicative reduction in quality, which means I need to account for that (I have to add time spent doing curation).
More seriously, yes, it makes sense that LLMs are not going to be able to take humans entirely out of the loop. Think about what it would mean if that were the case: if people, on the basis of a few simple prompts, could let the agents loose and create sophisticated systems without any further input, then there would be nothing to differentiate those systems, and thus they would lose their meaning and value.
If prompting is indeed the new level of abstraction we are working at, then what value is added by asking Claude: make me a note-taking app? A million other people could also issue this same low-effort prompt; thus what is the value added here by the prompter?
10-15 years ago the challenge in ML/PR was "feature engineering", the careful crafting of rules that would define features in the data which would draw the attention of the ML algorithm.
Then deep learning came along and it solved the issue of feature engineering; just throw massive amounts of data at the problem and the ML algorithms can discern the features automatically, without having to craft them by hand.
Now we've gone as far as we can with massive data, and the problem seems to be that it's difficult to bring out the relevant details when there's so much data. Hence "context engineering": a manual, heuristic-heavy process guided by trial and error and intuition. More an art than a science. Pretty much the same thing that "feature engineering" was.
For me, Claude Code completely ignores the instruction to read and follow AGENTS.md, and I have to remind it every time.
The joys of non-deterministic blackboxes.
https://gist.github.com/artpar/60a3c1edfe752450e21547898e801...
(especially the AGENT.knowledge file is quite helpful)
I'd also be interested in your process for creating these files, such as examples of prompts, tools, and references for your research.
can you elaborate a bit? how do you proceed? what does your process look like?
The way I use AI today is by keeping a pretty tight leash on it, a la Claude Code and Cursor. Not because the models aren't good enough, but because I like to weigh in frequently to provide taste and direction. Giving the AI more agency isn't necessarily desirable, because I want to provide that taste.
Maybe that'll change as I do more and new ergonomics reveal themselves, but right now I don't really want AI that's too agentic. Otherwise, I kind of lose connection to it.
My experience is that, for many workflows, well-done “prompt engineering” is more than enough to make AI models behave more like we’d like without constantly needing us to weigh in.
If we use a real world analogy, think of someone like an architect designing your house. I'm still going to be heavily involved in the design of my house, regardless of how skilled and tasteful the architect is. It's fundamentally an expression of myself - delegating that basically destroys the point of the exercise. I feel the same for a lot of the stuff I'm building with AI now.
From your comments, I’d venture a guess that you see your AI-assisted work as a creative endeavor — an expression of your creativity.
I certainly wouldn’t get my hopes up for AI to make innovative jokes, poems and the like. Yet for things that can converge on specific guidelines for matters of taste and preferences, like coding, I’ve been increasingly impressed by how well AI models adapt to our human wishes, even when expressed in ever longer prompts.
In the end the agentic coding bit was garbage, but I appreciated Claude’s help on writing the boilerplate to interface with Stockfish.
I do agree - the models have good taste and often do things that delight me, but there's always room for me to inject my taste. For example, I don't want the AI to choose what state management solution I use for my Flutter app because I have strong opinions about that.
I like Bloc the most!
What's good prompting for one model can be bad for another.
No.
--- start quote ---
prompt engineering is nothing but an attempt to reverse-engineer a non-deterministic black box for which any of the parameters below are unknown:
- training set
- weights
- constraints on the model
- layers between you and the model that transform both your input and the model's output that can change at any time
- availability of compute for your specific query
- and definitely some more details I haven't thought of
https://dmitriid.com/prompting-llms-is-not-engineering
--- end quote ---
For example, a single prompt could tell an LLM to make sure a code change doesn't introduce mutability when the same functionality can be achieved with immutable expressions. Another one could tell it to avoid useless log statements (with my specific description of what that means).
When I want to evaluate a code change, I run all these prompts separately against it, collecting their structured (with MCP) output. Of course, I incorporate this in my code-agent to provide automated review iterations.
If something escapes where I feel the need to "manually" provide context, I add a new prompt (or figure out how to extend whichever one failed).
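The commenter uses MCP for the structured output; as a rough sketch under different assumptions (plain JSON instead of MCP, the Anthropic SDK, and invented prompt files), the review loop could look something like this:

```python
import json
from pathlib import Path
from anthropic import Anthropic

client = Anthropic()

# One file per narrow review rule, e.g. "no new mutability", "no useless log statements".
REVIEW_PROMPTS = sorted(Path("review_prompts").glob("*.md"))

def review(diff: str) -> list[dict]:
    findings = []
    for prompt_file in REVIEW_PROMPTS:
        msg = client.messages.create(
            model="claude-sonnet-4-20250514",  # placeholder model id
            max_tokens=1024,
            system=prompt_file.read_text(),
            messages=[{
                "role": "user",
                "content": 'Review this diff. Reply with JSON like {"violations": []}.\n\n' + diff,
            }],
        )
        # Assumes the model follows the JSON instruction; a real version would validate the output.
        findings.append({"rule": prompt_file.stem, "result": json.loads(msg.content[0].text)})
    return findings
```

Each finding can then feed back into the coding agent as an automated review iteration, as described above.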
The old adage still applies: there is no free lunch. It makes sense that LLMs are not going to be able to take humans entirely out of the loop.
Although sometimes the difficult part is knowing what to make, and LLMs are great for people who actually know what they want but don't know how to do it.
Spamming is not only obnoxious, but a very weak example. Spamming is so error-tolerant that if 30% of the output is totally wrong, the sender won't notice. Response rates are usually very low. This is a singularly undemanding problem.
You don't even need "AI" for this. Just score LinkedIn profiles based on keywords, and if the score is high enough, send a spam. Draft a few form letters, and send the one most appropriate for the keywords. Probably would have about the same reply rate.
We see these patterns so much that we packaged them up for Airflow (one of the most popular workflow tools)!
I suspect a reason so many people are excited about agents is they are used to "chat assistants" as the primary purpose of LLMs, which is also the ideal use case for agents. The solution space in chat assistants is not defined in advance, and more complex interactions do get value from agents. For example, "find my next free Friday night and send a text to Bob asking if he's free to hang out" could theoretically be programmatically solved, but then you'd need to solve for every possible interaction with the assistant; there are a nearly unlimited number of ways of interfacing with an assistant, so agents are a great solution.
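As a sketch of why that open-ended case fits agents, here is a minimal tool-calling loop for exactly that request; the SDK choice (OpenAI), model name, and the calendar/SMS functions are stand-ins of mine, not anything from the comment:

```python
import json
from openai import OpenAI

client = OpenAI()

# Hypothetical stubs; a real assistant would call a calendar and an SMS provider.
def find_next_free_friday_evening() -> str:
    return "Friday 2025-07-18, 19:00-23:00"

def send_text(to: str, body: str) -> str:
    return f"sent to {to}: {body}"

TOOLS = [
    {"type": "function", "function": {
        "name": "find_next_free_friday_evening",
        "description": "Return the next Friday evening with no calendar events.",
        "parameters": {"type": "object", "properties": {}}}},
    {"type": "function", "function": {
        "name": "send_text",
        "description": "Send an SMS to a contact.",
        "parameters": {"type": "object",
                       "properties": {"to": {"type": "string"}, "body": {"type": "string"}},
                       "required": ["to", "body"]}}},
]

messages = [{"role": "user", "content":
             "Find my next free Friday night and text Bob asking if he's free to hang out."}]

while True:
    resp = client.chat.completions.create(model="gpt-4o", messages=messages, tools=TOOLS)
    msg = resp.choices[0].message
    if not msg.tool_calls:   # the model decided it's done; no hand-written control flow needed
        print(msg.content)
        break
    messages.append(msg)     # keep the assistant's tool-call turn in the history
    for call in msg.tool_calls:
        fn = {"find_next_free_friday_evening": find_next_free_friday_evening,
              "send_text": send_text}[call.function.name]
        result = fn(**json.loads(call.function.arguments))
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
```

The point is that the model, not a hand-written workflow, decides which tools to call and in what order, which is why this shape scales to the near-unlimited ways of phrasing assistant requests.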
By the time you've got a nice, well-established context with the right info... just give it to the user.
I like the idea of hallucination-free systems where the LLM merely classifies things at most.
Question -> classifier -> check with user action to take -> act using no AI
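A minimal sketch of that pipeline, where the model only picks a label and everything after the user's confirmation is plain code; the intents, handlers, and model name here are invented for illustration:

```python
from openai import OpenAI

client = OpenAI()

INTENTS = ["reset_password", "cancel_subscription", "other"]

def reset_password(): print("running deterministic password-reset flow")
def cancel_subscription(): print("running deterministic cancellation flow")
def escalate_to_human(): print("routing to a human")

def classify(question: str) -> str:
    # The LLM's only job: map free text onto one known label.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content":
                   f"Classify this request as exactly one of {INTENTS}. Reply with the label only.\n\n{question}"}],
    )
    label = resp.choices[0].message.content.strip()
    return label if label in INTENTS else "other"

def handle(question: str) -> None:
    intent = classify(question)
    # Check with the user before acting; the action itself uses no AI.
    if input(f"Proposed action: {intent}. Proceed? [y/N] ").strip().lower() == "y":
        {"reset_password": reset_password,
         "cancel_subscription": cancel_subscription,
         "other": escalate_to_human}[intent]()
```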
I think there's some truth to using the right orchestration for the job, but I think that there's a lot more jobs that could benefit from agentic orchestration than the article would have you believe.
Hard disagree with most of the narrative. Don't start with models, start with Claude Code. For any use case. Go from there depending on costs.
> When NOT to use agents
> Enterprise Automation
Archive this blog.
The real lesson is: don't let any company other than the providers dictate what an agent is vs isn't.
Computer-use agents are here, they are coming for the desktops of non-technical users, and they will provide legitimate RPA capability and beyond; anyone productizing agents will build on top of provider SDKs.
I used to build the way most of his examples do: just functions calling LLMs. I found it almost necessary due to poor tool selection, etc. But I think the leading-edge LLMs like Gemini 2.5 Pro and Claude 4 are smart enough and good enough at instruction following and tool selection that it's not necessarily better to create workflows.
I do have a checklist tool and a delegate command, and may break tasks down into separate agents, though. But the advantage of creating instructions and assigning tool commands, especially if you have an environment with a UI where it is easy to assign tool commands to agents and otherwise define them, is that it is more flexible and a level of abstraction above something like a workflow. Even with visual workflows it's still programming, which is more brittle and more difficult to dial in.
This was not the case 6-12 months ago and doesn't apply if you insist on using inferior language models (which most of them are). It's really only a handful that are really good at instruction following and tool use. But I think it's worth it to use those and go with agents for most use cases.
The next thing that will happen over the following year or two is going to be a massive trend of browser and computer use agents being deployed. That is again another level of abstraction. They might even incorporate really good memory systems and surely will have demonstration or observation modes that can extract procedures from humans using UIs. They will also learn (record) procedural details for optimization during exploration from verbal or written instructions.
If you skip the modeling part and rely on something that you don't control being good enough, that's faith not engineering.
The goal _should_ be to avoid doing traditional software engineering or creating a system that requires typical engineering to maintain.
Agents with leading edge LLMs allow smart users to have flexible systems that they can evolve by modifying instructions and tools. This requires less technical skill than visual programming.
If you are only taking advantage of the LLM to handle a few wrinkles or a little bit of natural language mapping then you aren't really taking advantage of what they can do.
Of course you can build systems with rigid workflows and a sprinkling of LLM integration, but for most use cases that's probably not the right default mindset for mid-2025.
Like I said, I was originally following that approach a little ways back. But things change. Your viewpoint is about a year out of date.
You're YOLOing it, and okay that may be fine but may also be a colossal mistake, especially if you remove or never had a human in the loop.
The process is encoded in natural language and tool options.
I'm not YOLOing anything.
The callout on enterprise automation is interesting b/c it's one of the $T-sized opportunities that matter most here, and while I think the article is right in the small, I now think quite differently in the large. Basically, we're crossing the point where one agent written in natural language can easily be worth ~100 Python scripts and be much shorter at the same time.
For context, I work with operational enterprise/gov/tech-co teams like tier 1+2 security incident response, where most 'alerts' don't get seriously investigated because under-resourced and under-automated teams have to just define them away. Basically, ever since GPT-4, it's been pretty insane figuring this stuff out with our partners here. As soon as you get good at prompt templates / plans with Claude Code and the like that make them spin productively for 10min+, this gets very obvious.
Before agents:
Python workflows and their equivalent. They do not handle variety and evolution because they're hard-coded. Likewise, they only go so far on a task because they're brain-dead. Teams can only crank out and maintain so many.
After agents:
You can easily sketch out one investigation template in natural language that literally goes 10X wider and 10X deeper than the equivalent Python code, including Python AI workflows. You are now handling much more of the problem.
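For a flavour of what one such template might look like, here's a hypothetical sketch as a prompt string handed to an agent; the alert fields and steps are invented, not the commenter's real playbook:

```python
# Hypothetical investigation template given to an agent (e.g. Claude Code) for each alert.
# The agent decides which log queries and tool calls to make at every step.
INVESTIGATION_TEMPLATE = """
You are investigating a tier 1/2 security alert.

Alert: {alert_json}

1. Pull the last 24h of auth and network logs for the affected host and user.
2. Check whether the source IP, domain, or file hash appears in our threat-intel feeds.
3. Look for lateral movement: other hosts reached with the same credentials.
4. Classify the alert as benign, suspicious, or confirmed incident, citing evidence for each step.
5. Write a summary an on-call analyst can act on in under two minutes.
"""
```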
It would be helpful to know which models were used in each scenario; otherwise this can largely be ignored.
See also https://ai.intellectronica.net/the-case-for-ai-workflows
It tends to work better when you give the LLMs some specific narrow subtask to do rather than expecting them to be in the driver's seat.