I was under the impression that some kind of caching mechanism existed to mitigate this
They don't go into implementation details but Gemini docs say you get a 75% discount if there's a context-cache hit: https://cloud.google.com/vertex-ai/generative-ai/docs/contex...
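Back-of-the-envelope, that discount means a long, mostly-static context gets much cheaper on repeat calls. The per-token rate below is a placeholder, not actual Gemini pricing:

    RATE_PER_TOKEN = 1.0   # placeholder rate, not real pricing
    CACHE_DISCOUNT = 0.75  # discount on input tokens served from the context cache

    def input_cost(cached_tokens, fresh_tokens):
        return (cached_tokens * RATE_PER_TOKEN * (1 - CACHE_DISCOUNT)
                + fresh_tokens * RATE_PER_TOKEN)

    # A fully cached 100k-token context costs a quarter of an uncached one.
    print(input_cost(100_000, 0))  # 25000.0
    print(input_cost(0, 100_000))  # 100000.0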
LLMs degrade with long input regardless of caching.
And I have an AI workflow that generates much better posts than this.
I read and generate hundreds of posts every month. I have to read books on writing to keep myself sane and not sound like an AI.
Or it's possible that he's one of those people who _really_ adopted LLMs into _all_ of their workflow, I guess, and he thinks the output is good enough as is because it captured his general points?
LLMs have certainly damaged trust in general internet reading now, that's for sure.
Judging by the other comments this is clearly low-effort AI slop.
> LLMs have certainly damaged trust in general internet reading now, that's for sure.
I hate that this is what we have to deal with now.
Perhaps more interesting is whether their argument is valid and whether their math is correct.
"We can't allow this post to create FUD about the current hype on AI agents and we need the scam to continue as long as possible".
I'm at the stage where I'm fine with AI-generated content. Sure, the verbosity sucks, but there's an interesting idea here. Just make it clear that you've used AI, and show your prompts.
This is not remotely true. Think of any business process around your company. 99.9% availability would mean only 1 minute and 26 seconds per day allowed for instability/errors/downtime. Surely your human collaborators aren't hitting this SLA. A single coffee break immediately breaks it (per collaborator!).
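For reference, the downtime arithmetic behind those numbers, as a quick sketch:

    SECONDS_PER_DAY = 24 * 60 * 60

    for sla in (0.999, 0.9999):
        budget = SECONDS_PER_DAY * (1 - sla)  # allowed downtime per day
        print(f"{sla:.2%} -> {budget:.1f} s/day ({budget / 60:.2f} min)")

    # 99.90% -> 86.4 s/day (1.44 min)
    # 99.99% -> 8.6 s/day (0.14 min)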
Business Process Automation via AI doesn't need to be perfect. It simply needs to be sufficiently better than the status quo to pay for itself.
Reliability means that 99.9% of the time, when I hand something off to someone else, it's what they want.
Availability means I'm at my desk and not at the coffee machine.
Humans very much are 99.9% accurate, and my deliverable even comes with a list of the things I'm not confident about.
This is an extraordinary claim, which would require extraordinary evidence to prove. Meanwhile, anyone who spends a few hours with colleagues in a predominantly typing/data entry/data manipulation service (accounting, invoicing, presales, etc.) KNOWS the rate of minor errors is humongous.
99.99% is just absurd.
The biggest variable with all of this, though, is that agents don't have to one-shot everything the way a human does, because no one is going to pay a human to do the work five times over just to make sure the results come out the same each time. At some point it will be trivial for agents to be constantly checking the work and looking for errors in the process, 24/7.
I'd imagine future agents will include training to design these checks into any output, validating against the checks before proceeding further. They may even include some minor risk assessment beforehand, such as "this aspect is crucial and needs to be 99% correct before proceeding further".
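A minimal sketch of that idea, assuming a hypothetical run_agent call: re-run the same task several times, keep the majority answer, and escalate when agreement is low.

    from collections import Counter

    def run_agent(task):
        # Hypothetical stand-in for a real agent invocation.
        return "stub answer"

    def checked_call(task, attempts=5, threshold=0.8):
        results = [run_agent(task) for _ in range(attempts)]
        answer, count = Counter(results).most_common(1)[0]
        agreement = count / attempts
        if agreement < threshold:
            raise RuntimeError(f"low agreement ({agreement:.0%}); escalate to a human")
        return answer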
On a personal note, I'm happy to hear that. I've been apprehensive and haven't tried it, purely out of fear of the cost.
My thinking: In a financial system collapse (a la The Big Short), the assets under analysis are themselves the things of value. Whereas betting on AI to collapse a technology business is at least one step removed from actual valuation, even assuming:
1. AI Agents do deliver just enough, and stay around long enough, for big corporations to lay off large numbers of employees
2. After doing so, AI quickly becomes prohibitively expensive for the business
3. The combination of the above factors tanks business productivity
In the event of a perfect black swan, the trouble is that it's not actually clear that this combination of factors would result in concrete valuation drops. The business just "doesn't ship as much" or "ships more slowly". This is bad, but it's only really bad if you have competitors that can genuinely capitalise on that stall.
An example immediately on hand: for non-AI reasons, the latest rumors are that Apple's next round of MacBook Pros will be delayed. This sucks, but it isn't particularly damaging to the company's stock price, because there isn't really a competitor in the market that can capitalise on that delay in a meaningful way.
Similarly, I couldn't really tell you what the most recent non-AI software features shipped by Netflix or Facebook or X actually were. How would I know if they're struggling internally and have stopped shipping features because AI is too expensive and all their devs were laid off?
I guess if you're looking for a severe black swan to bet against AI Agents in general, you'd need to find a company that is so entrenched, and so completely committed to and dependent on AI, that it could not financially survive a shock like that, AND that is in a space where competitors would immediately seize the advantage.
Don't get me wrong though, even if there's no opportunity to actually bet against that situation, it will still suck for literally everyone if it eventuates.
I don't think this one is worth shorting because there's no specific event to trigger the mindshare to start moving and validating your position. You'd have to wait for very big public failures before the herd start to move.
Claude Code is impressive but it still produces quite a bit of garbage in my experience, and coding agents are likely to be the best agents around for the foreseeable future.
This phrase is usually followed by some, you know... math?
Agents have captivated the minds of groups of people in every large engineering org. I have no idea what their goal is, other than that they work on “GenAI”. For over a year now they have been working on agents, with the promise that the next framework MSFT or Alphabet publishes will solve their woes. They don't actually know what they are solving for, except that everything involves agents.
I have yet to see agents solve anything, but for some reason there's this idea that an agent you can send anything and everything to will solve all of the company's problems. LLMs have a ton of interesting applications, but agents have yet to strike me as interesting, and I don't understand why so many large companies have focused their time on them. They are not going to crack the code ahead of a commercial tool or an open-source project. In the time spent toying around with agents, a lot of interesting applications could have been built, some of which may technically be agents, but without so much focus and effort on trying to solve for all use cases.
Edit: after rereading my post, I wanted to clarify that I do think there is a place for tool-call chains and the like, but so many folks I have talked to firsthand are trying to create something that works for anything and everything.
That said, I have been using LLMs for a while now with great benefit. I did not notice anything missing, and I am not sure what agents bring to the table. Do you know?
I updated a Svelte component at work, and while I could test it in the browser and see that it worked fine, the existing unit test suddenly started failing. I spent about an hour trying to figure out why the results logged in the test didn't match the results in the browser.
I got frustrated, gave in, and asked Claude Code, an AI agent. The tool-call loop goes something like this: it reads my code, looks up the documentation, and proposes a change to the test, which I approve; then it re-runs the test, feeds the output back into the AI, re-checks the documentation, and proposes another change.
It's all quite impressive, or it would be if it hadn't at one point randomly said "we fixed it! The first element is now active". Except it wasn't: Claude thought the first element was element [1], when of course the first element of an array is [0]. The test hadn't even actually passed.
An hour and a few thousand Claude tokens that my company paid for, and we got nothing back, lol.
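For what it's worth, a loop like that can be pinned to ground truth so the agent can't declare victory on its own. A rough sketch, where propose_patch, apply_patch, and approve are hypothetical hooks around the LLM:

    import subprocess

    def tests_pass(cmd):
        # Trust the test runner's exit code, not the model's claim of success.
        return subprocess.run(cmd, capture_output=True).returncode == 0

    def fix_loop(cmd, propose_patch, apply_patch, approve, max_iters=10):
        for _ in range(max_iters):
            if tests_pass(cmd):
                return True
            patch = propose_patch()   # hypothetical LLM call
            if approve(patch):        # the human stays in the loop
                apply_patch(patch)
        return False                  # give up and escalate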
I do think it's a step up when done correctly; I'm thinking of tools like Cursor. Most of my concern comes from the number of folks I have seen trying to create a system that solves everything. I know people in my org were working on agents without even having a problem they were solving for. They are effectively trying to recreate ChatGPT, which to me is a fool's errand.
I think it is a mix of FOMO and the 'upside' potential of being able to minimize (ideally, remove) the expensive "human component". Note: I am merely trying to portray a specific world model.
<< In the time spent toying around with agents, a lot of interesting applications could have been built, some of which may technically be agents, but without so much focus and effort on trying to solve for all use cases.
Preaching to the choir, man. We just got a custom AI tool (which manages to have all of my industry-specific restrictions, rendering it kinda pointless; low context, making it annoying; and it's slower than normal, because everything now has to go through several layers of approval, including 'bias').
At the same time, the committee bickers over a minute change to a process that has effectively no impact on anything of value.
Bonkers.
For me, the only problem I have is that I find typing slow and laborious. I've always said that if I could find a way to type less, I would take it. That's why I've been using tab completion, refactoring tools, etc., for years now. So I'm kind of excited about being able to get my thoughts into the computer more quickly.
But having it think for me? That's not a problem I have. Reading and assimilating information? Again, not a problem I have. Too much of this is about trying to apply a solution where there is no problem.
The goal is to fire you (the human), decrease costs, and increase profits.
The fundamental difference is that we need HITL (human-in-the-loop) to reduce errors, instead of HOTL (human-on-the-loop), which leads to the errors you mentioned.
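Roughly, that's the difference between gating risky actions on approval before they run and reviewing the damage afterwards. A toy sketch, where all four arguments are hypothetical hooks:

    def run_with_hitl(plan, execute, is_risky, ask_human):
        # HITL: a human approves risky steps *before* they run,
        # rather than auditing the results after the fact (HOTL).
        for step in plan:
            if is_risky(step) and not ask_human(step):
                continue  # vetoed by the reviewer
            execute(step)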
> AI tools aren't perfect yet. They sometimes make mistakes, and they can't always understand what you are trying to do. But they're getting better all the time, In the future, they will be more powerful and helpful. They'll be able to understand your code even better, and they'll be able to generate even more creative ideas.
From another post on the same site. [0]
Yup, slop.
[0]: https://utkarshkanwat.com/writing/review-of-coding-tools/
It seems the author never used prompt/workflow optimization techniques.
LLM-AutoDiff: Auto-Differentiate Any LLM Workflow https://arxiv.org/pdf/2501.16673
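Not the paper's actual algorithm, but a toy version of the same idea (score the prompt on examples, critique the failures, rewrite, keep the best); here llm is any callable from string to string:

    def optimize_prompt(prompt, examples, llm, rounds=5):
        def score(p):
            return sum(llm(p + "\n" + ex["input"]).strip() == ex["output"]
                       for ex in examples)

        best, best_score = prompt, score(prompt)
        for _ in range(rounds):
            critique = llm(f"List the failure modes of this prompt:\n{best}")
            candidate = llm(f"Rewrite the prompt to fix the critique.\n"
                            f"Prompt:\n{best}\nCritique:\n{critique}")
            candidate_score = score(candidate)
            if candidate_score > best_score:
                best, best_score = candidate, candidate_score
        return best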
Also, if you look at any human process, you will realize that none of them has a 100% reliability rate. Yet even without that, we can manufacture, for example, a plane: something that takes millions of steps, each without a 100% success rate.
I actually think the article makes some good points, but especially when you are making good points it is unnecessary to stretch credibility with exaggerating your arguments.
My point was that something extremely complex, like a plane, works because the system tries hard to keep errors from compounding.
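The arithmetic behind that point: per-step reliability compounds multiplicatively, and a separate inspection stage changes the outcome dramatically. The 95% catch rate below is an assumption for illustration.

    p_step = 0.99
    for n in (10, 20, 100):
        print(n, round(p_step ** n, 3))  # 10 0.904, 20 0.818, 100 0.366

    # Add an inspection stage that catches 95% of step errors and reworks them:
    p_catch = 0.95
    p_effective = p_step + (1 - p_step) * p_catch  # 0.9995 per step
    print(round(p_effective ** 100, 3))            # 0.951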
So until these techniques are baked into the model by OpenAI, you have to come up with these ideas yourself.
If 50% of the training data is not factually accurate, it needs to be weeded out.
Some industries require a first principles approach, and there are optimal process flows that lead to accurate and predictable results. These need research and implementation by man and machine.
It's hard to make *one* good product (see startup failure rates). You couldn't make 12 (seemingly as a solo dev?), and you're surprised?
We've been working on Definite[0] for two years with a small team, and it only started getting really good in the past six months.
0 - data stack + AI agent: https://www.definite.app/
Just because we'd love to have fully intelligent, automatic agents doesn't mean the tech is here. I don't use it for anything that generates content (text, images, code). It's just slop and will bite you in the ass in the long run anyhow.
Sounds like good business to me.
All you are really saying with this comment is that you have an incredibly narrow set of interests and absolutely no intellectual curiosity.