I'd expect 100% of "agentic AI" to be hype. It's a meaningless term because almost any long-running software with an execution engine can qualify. What really differentiates "real agentic" from "slapped IFTTT and an LLM together"?
Sounds great but given the exact same context they don't design the same path each time.
The word "agentic" doesn't bring anything new. We've always been doing this.
If you have used GPT o3, you have used agentic models. If you have used Claude Code or Cursor, you have used agentic models. If you have used Claude + MCP, you have used agentic models.
This isn't theoretical - it's already being applied. In fact, LangChain just hosted an entire conference to showcase how Fortune 100 companies are using LangGraph, and I can assure you that if they could have used a simpler architecture, they would have.
If you wait around until everyone considers it acceptable in the mainstream, all the good opportunities to create disruptive tech will be gone. So please - keep publishing this Luddite trash and throwing shade from the sidelines. I will keep doing agent research and building with it. We will see in 10 years who was wrong, I guess.
https://www.qodo.ai/blog/building-agentic-flows-with-langgra...
For example, there was a paper recently (unfortunately I can't remember which it was) that talked about how, if you support pausing generation and allowing the human to edit the generated text before resuming, you can get much better results than if you force the human to respond to the AI as a separate message. That's something a framework has a harder time accommodating than a library does.
I'm building an agentic framework that is more on the library side of things, and it necessarily leaves control of the iteration in the hands of the developer. The developer literally writes a while loop to run the agent, offloading things like RAG and tool calling to different pieces of the library. But every part of the iteration is controlled by the developer's agent source code.
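To make that concrete, here's a minimal sketch of the library style I mean, with hypothetical names throughout (retrieve, call_llm, run_agent are placeholders, not a real API): the developer writes the loop, the library only supplies the pieces, and the pause-and-edit idea from the paper above drops in naturally because the draft is just a variable a human can overwrite between turns.

    def retrieve(query: str) -> list[str]:
        """Stand-in for the library's RAG helper."""
        return [f"(retrieved context for: {query})"]

    def call_llm(prompt: str) -> str:
        """Stand-in for the library's model-call helper."""
        return f"Model draft responding to: {prompt[:80]}..."

    def run_agent(task: str, max_steps: int = 3) -> str:
        draft = ""
        step = 0
        while step < max_steps:              # the developer owns the iteration
            context = retrieve(task)         # RAG is offloaded to the library
            prompt = f"Task: {task}\nContext: {context}\nCurrent draft: {draft}"
            draft = call_llm(prompt)         # so is the model call

            # Pause-and-edit: a human can revise the partial output in place
            # before the next turn, instead of replying as a separate message.
            edited = input(f"\n-- step {step} draft --\n{draft}\nEdit (or Enter to keep): ")
            if edited.strip():
                draft = edited

            if input("Another iteration? [y/N]: ").lower() != "y":
                break                        # developer-defined stop rule
            step += 1
        return draft

The point isn't the specific helpers; it's that the stop rule, the prompt assembly, and the human hand-off all live in the developer's own source code rather than inside a framework's opaque run loop.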
However, the reason I'm using a computer is that I want repeatable behaviour.
If there's a bug, I want it to fail every single time so I can fix it (yeah I know, buffer overruns and it's not always repeatable), but at least I can squash the bug for good if I can actually find where I've stuffed up.
Side note: this is happening less now that Rust protects me from myself, which pleases me; at least the failures are more repeatable if I can get the inputs right.
Whereas now the "prompt engineers" try to fluff up the prompt with more CAPS and "CRITICAL"-style wording to get it over the line today, knowing full well it'll just break in some other weird way tomorrow.
Hopefully a few more cases of LLMs going rogue will educate them, since they refuse to learn from Air Canada or Cursor Support.
I have written about 20 AI books in the last 30 years and I am now finding myself to be a mild AI skeptic. My current AI use case is as a research assistant, and that is about all I use Gemini and ChatGPT for any longer. (Except for occasional code completion.) And think about all the energy we are using for LLM inference!
> My current AI use case is as a research assistant
> I am now finding myself to be a mild AI skeptic
Seems contradictory.
Augmented coding, as Kent Beck puts it, is filled with errors, but more and more people are starting to find it to be a 2x+ improvement for most cases.
People are spending too much time arguing that the extreme hype is extremely hyped and about what can't be done, and aren't looking at the massive progress in terms of what can be done.
Also no one I know uses any of the models in the article at this point. They called out a 50% improvement in models spaced 6 months apart... that's also where some of the hype comes from.
Reminds me of this classic: AI == Actually Indian
https://www.businesstoday.in/technology/news/story/700-india...
Edit: this is false and has been debunked. The real story is in the child comment.
This has an impact on those careers, so it’s worth getting right.
More like GIR from Invader Zim.
A dog walks into a butcher shop with a purse strapped around his neck. He walks up to the meat case and calmly sits there until it's his turn to be helped. A man, who was already in the butcher shop, finished his purchase and noticed the dog. The butcher leaned over the counter and asked the dog what it wanted today. The dog put its paw on the glass case in front of the ground beef, and the butcher said, "How many pounds?"
The dog barked twice, so the butcher made a package of two pounds ground beef.
He then said, "Anything else?"
The dog pointed to the pork chops, and the butcher said, "How many?"
The dog barked four times, and the butcher made up a package of four pork chops.
The dog then walked around behind the counter, so the butcher could get at the purse. The butcher took out the appropriate amount of money and tied two packages of meat around the dog's neck. The man, who had been watching all of this, decided to follow the dog. It walked for several blocks and then walked up to a house and began to scratch at the door to be let in. As the owner opened the door, the man said to the owner, "That's a really smart dog you have there."
The owner said, "He's not that smart. This is the second time this week he forgot his key."
For code, this would allow agents to build a web app, unit/e2e test it, take visual screenshots of the app and iterate on the design, etc. And do 50+ iterations of this at once. So you get 50 versions of the app in a few minutes with no input, with maybe another agent that ranks them and gives you the top 5 to play around with. Same for new features after you've built the initial version.
Right now they are so slow and have such limited context windows that this isn't really feasible. But it would just require a few orders of magnitude of improvement in context window size (at least) and speed (ideally, to make the cost more palatable).
I feel you can 'brute force' quality to a certain extent (even assuming no improvement in model quality) if you can keep a huge context window going (to avoid it going round in circles) and have multiple variations in parallel.
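A rough sketch of that fan-out-and-rank loop, purely illustrative (build_app, evaluate_app, and the thread pool stand in for real agent runs and a ranking agent, which don't exist as named here): run N independent builds in parallel, score them, keep the top few.

    from concurrent.futures import ThreadPoolExecutor
    import random

    def build_app(spec: str, variant: int) -> dict:
        """Placeholder for one full agent run: build, unit/e2e test, screenshot, iterate."""
        return {"variant": variant, "code": f"<app {variant} built for: {spec}>"}

    def evaluate_app(app: dict) -> float:
        """Placeholder for a ranking agent scoring test results, visuals, etc."""
        return random.random()

    def generate_candidates(spec: str, n: int = 50, top_k: int = 5) -> list[dict]:
        # Fan out n independent agent runs in parallel...
        with ThreadPoolExecutor(max_workers=8) as pool:
            apps = list(pool.map(lambda i: build_app(spec, i), range(n)))
        # ...then rank them and hand back only the best few for a human to review.
        return sorted(apps, key=evaluate_app, reverse=True)[:top_k]

    for app in generate_candidates("todo app with dark mode"):
        print(app["variant"], app["code"])

The parallel variants are also what keeps the huge context window useful: each run explores its own path instead of one long run going round in circles.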
Companies hoping to automate decision-making had better keep in mind that article 22 of the GDPR [1] requires them, specifically in the case of automated decision-making, to "implement suitable measures to safeguard the data subject’s rights and freedoms and legitimate interests, at least the right to obtain human intervention on the part of the controller, to express his or her point of view and to contest the decision."
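As a purely illustrative sketch of what such a safeguard might look like in code (the class, functions, and threshold here are made up, not a prescribed implementation), the key point is that every automated decision carries a path to human intervention and override:

    from dataclasses import dataclass

    @dataclass
    class Decision:
        subject_id: str
        outcome: str                 # e.g. "approved" / "rejected"
        automated: bool
        reviewed_by_human: bool = False

    def automated_decision(subject_id: str, score: float) -> Decision:
        """Placeholder for a model-driven decision."""
        return Decision(subject_id, "approved" if score >= 0.5 else "rejected", automated=True)

    def contest(decision: Decision, reviewer_verdict: str) -> Decision:
        """Article 22-style safeguard: the data subject can obtain human
        intervention, express their view, and have a human override the outcome."""
        return Decision(decision.subject_id, reviewer_verdict,
                        automated=False, reviewed_by_human=True)

    d = automated_decision("subject-42", score=0.31)
    if d.outcome == "rejected":      # the subject exercises the right to contest
        d = contest(d, reviewer_verdict="approved")
    print(d)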
Once the industry has to acknowledge a plateau, there’ll be a pivot to “human in the loop” products. But instead of discussing their inability to do full automation, they’ll play up the ethics of preserving jobs for humans
I'll step out of the metaphor now before it gets stretched too far, but the point is transformers and LLMs were a real advancement. From reading papers, it seems like things are still advancing at a steady pace. The things that advance at a steady pace are just not the things that grab the headlines. Frequently the things that do grab the headlines do not contribute to the overall advancement.
To use another metaphor: the crashing waves reach you because the tide is coming in. You notice the waves first, but the waves are not the tide. A wave receding does not mean the tide is going out.