As the best-known example: Anthropic examined their AIs and found that they have a "name recognition" pathway - i.e. when asked about biographical facts, the AI will respond with "I don't know" if "name recognition" has failed.
This pathway is present even in base models, but it only results in a consistent "I don't know" if the AI was trained for reduced hallucinations.
AIs are also capable of recognizing their own uncertainty. If you have an AI-generated list of historical facts that includes hallucinated ones, you can feed that list back to the same AI and ask it how certain it is about each fact listed. Hallucinated entries will consistently come back with lower certainty. This latent "recognize uncertainty" capability can, once again, be used in anti-hallucination training.
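A minimal sketch of that feed-it-back loop, assuming a hypothetical ask_llm() helper that sends one prompt to whatever chat API you use and returns the reply text (the prompt wording and the 0-100 scale are illustrative, not Anthropic's actual method):

```python
# Sketch only: ask_llm() is a hypothetical helper, not any specific vendor API.
def ask_llm(prompt: str) -> str:
    raise NotImplementedError("wire this to your chat-completion API of choice")

def score_certainty(facts: list[str]) -> list[tuple[str, int]]:
    scored = []
    for fact in facts:
        reply = ask_llm(
            "On a scale of 0-100, how certain are you that the following "
            f"statement is true? Answer with a single number.\n\n{fact}"
        )
        digits = "".join(ch for ch in reply if ch.isdigit())
        scored.append((fact, int(digits) if digits else -1))
    # The lowest-scoring entries are the hallucination candidates.
    return sorted(scored, key=lambda pair: pair[1])
```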
Those anti-hallucination capabilities are fragile, easy to damage in training, and do not fully generalize.
Can't help but think that limited "self-awareness" - and I mean that in a very mechanical, no-nonsense "has information about its own capabilities" way - is a major cause of hallucinations. An AI has some awareness of its own capabilities and how certain it is about things - but not nearly enough of it to avoid hallucinations consistently across different domains and settings.
I predict we'll get a few research breakthroughs in the next few years that will make articles like this seem ridiculous.
You’re right in that it’s obviously not the only problem.
But without solving this, it seems like no matter how good the models get, it'll never be enough.
Or, yes, the biggest research breakthrough we need is reliable, calibrated confidence. And that'll allow existing models, as they are, to become spectacularly more useful.
And now, in some cases for a while, it is training on its own slop.
But I agree that being confidently wrong is not the only thing they can't do. Programming: great. Maths: apparently great nowadays, since Google and OpenAI have something that could solve most problems on the IMO, even if the models we get to see probably aren't the ones that can do this. But LLMs produce crazy output when asked to write stories, they produce crazy output when given overly long, confusing contexts, and they have other problems of that sort.
I think much of it is solvable. I certainly have ideas about how it can be done.
But memory is a minor thing. Talking to a knowledgeable librarian or professor you've never met is the level we essentially need to get it to for this stuff to take off.
Ha, that almost seems like an oxymoron. The previous encounters can be the new training data!
I think the next iteration of LLMs is going to be "interesting", now that all the websites they used to freely scrape have been increasingly putting up walls.
I've yet to see a convincing article for artificial training data.
This. Lack of any way to incorporate previous experience seems like the main problem. Humans are often confidently wrong as well - and avoiding being confidently wrong is actually something one must learn rather than an innate capability. But humans wouldn't repeat the same mistake indefinitely.
The feedback you get is incredibly entangled, and disentangling it to get at the signals that would be beneficial for training is nowhere near a solved task.
Even OpenAI has managed to fuck up there - by accidentally training 4o to be a fully bootlickmaxxed synthetic sycophant. Then they struggled to fix that for a while, and only made good progress at that with GPT-5.
Re training data - We have synthetic data, and we probably haven't hit a wall. GPT-5 came only 3.5 months after o3. People are reading too much into the tea leaves here. We don't have visibility into the cost of GPT-5 relative to o3. If it's 20% cheaper, that's the opposite of a wall - that's exponential-like improvement. We don't have visibility into the IMO/IOI medal-winning models. All I see are people curve-fitting onto very limited information.
1. The words "the only thing" massively underplays the difficulty of this problem. It's not a small thing.
2. One of the issues I've seen with a lot of chat LLMs is their willingness to correct themselves when asked - this might seem, on the surface, to be a positive (allowing a user to steer the AI toward a more accurate or appropriate solution), but in reality it simply plays into users' biases & makes it more likely that the user will accept & approve of incorrect responses from the AI. Often, rather than the AI "correcting" itself, the exchange merely "teaches" it how to be confidently wrong in an amenable & subtle manner which the individual user finds easy to accept (or more difficult to spot).
If anything, unless/until we can solve the (insurmountable) problem of AI being wrong, AI should at least be trained to be confidently & stubbornly wrong (or right). This would also likely lead to better consistency in testing.
Except they don't correct themselves when asked.
I'm sure we've all been there, many, many, many, many, many times ....
- User: "This is wrong because X"
- AI: "You're absolutely right ! Here's a production-ready fixed answer"
- User: "No, that's wrong because Y"
- AI: "I apologise for frustrating you ! Here's a robust answer that works"
- User: "You idiot, you just put X back in there"
- and so continues the vicious circle....
They tend to very quickly lose useful context of the original problem and stated goals.
Is there anything else I can help you with?
But I've had it consistently happen to me on tiny contexts (e.g. I've had to spend time trying - and failing - to get it to fix a mess it was making with a straightforward 200-ish line bash script).
And it's also very frequently happened to me when I've been very careful with my prompts (e.g. explicitly telling it to use a specific version of a specific library ... and it goes and ignores me completely and picks some random library).
I would be willing to record myself using them across paid models with custom instructions and see if the output is still garbage.
> Yeah I think our jobs are safe.
I give myself 6-18 months before I think top-performing LLMs can do 80% of the day-to-day issues I'm assigned.
> Why doesn’t anyone acknowledge loops like this?
This is something you run into early on using LLMs and learn to sidestep. This looping is a sort of "context-rot" -- the agent has the problem statement as part of its input, and then a series of incorrect solutions. Now what you've got is a junk-soup where the original problem is buried somewhere in the pile.
Best approach I've found is to start a fresh conversation with the original problem statement and any improvements/negative reinforcements you've gotten out of the LLM tacked on.
I typically have ChatGPT 5 Thinking, Claude 4.1 Opus, Grok 4, and Gemini 2.5 Pro all churning on the same question at once and then copy-pasting relevant improvements across each.
While I agree, and also use your work around, I think it stands to reason this shouldn't be a problem. The context had the original problem statement along with several examples of what not to do and yet it keeps repeating those very things instead of coming up with a different solution. No human would keep trying one of the solutions included in the context that are marked as not valid.
Exactly. And certainly not a genius human with the memory of an elephant and a PhD in Physics .... which is what we're constantly told LLMs are. ;-)
> No human would keep trying one of the solutions included in the context that are marked as not valid.
Yeah, definitely not. Thankfully for my employment status, we're not at "human" levels QUITE yet.
In theory you should be able to get a multiplicative effect on context window size by consolidating context into its most distilled form.
30,000 tokens of wheel spinning to get the model back on track consolidated to 500 tokens of "We tried A, and it didn't work because XYZ, so avoid A" and kept in recent context
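A rough sketch of what that consolidation step could look like, reusing the same kind of hypothetical ask_llm() helper as the earlier snippet (the prompts are illustrative, not a known-good recipe):

```python
# Sketch only: ask_llm() is a hypothetical single-prompt chat helper.
def ask_llm(prompt: str) -> str:
    raise NotImplementedError("wire this to your chat-completion API of choice")

def distill_failures(transcript: str) -> str:
    # Compress 30k tokens of wheel-spinning into a few hundred tokens of constraints.
    return ask_llm(
        "In at most 5 bullet points, list which approaches were tried in the "
        "transcript below, why each failed, and what to avoid next time.\n\n"
        + transcript
    )

def restart_with_distilled_context(problem: str, transcript: str) -> str:
    constraints = distill_failures(transcript)
    # Fresh conversation: the original problem plus the distilled constraints only.
    return ask_llm(problem + "\n\nConstraints learned from earlier attempts:\n" + constraints)
```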
That means that positively worded instructions ("do x") work better than negative ones ("don't do y"). The more concepts that you don't want it to use / consider show up in the context, the more they do still tend to pull the response towards them even with explicit negation/'avoid' instructions.
I think this is why clearing all the crap from the context save for perhaps a summarizing negative instruction does help a lot.
> positively worded instructions ("do x") work better than negative ones ("don't do y")
I've noticed this. I saw someone on Twitter put it eloquently: something about how, just like little kids, the moment you say "DON'T DO XYZ" all they can think about is "XYZ..."
How long before there's an AI smart enough to say 'no' to half the terrible ideas I'm assigned?
This is going to age like "full self driving cars in 5 years". Yeah it'll gain capabilities, maybe it does do 80% of the work, but it still can't really drive itself, so it ultimately won't replace you like people are predicting. The money train assures that AGI/FSD will always be 6-18 months away, despite no clear path to solving glaring, perennial problems like the article points out.
My impression is rather: there exist two kinds of people who are "very invested in this illusion":
1. People who want to get rich by either investing in or working on AI-adjacent topics. They of course have an interest to uphold this illusion of magic.
2. People who have a leftist agenda ("we will soon all be replaced by AI, so politics has to implement [leftist policy measures like UBI]"). If people realize that AI is not so powerful after all, then such political measures - whose urgency was argued on the basis of the (hypothetical) huge societal changes AI would cause - will not have much backing in society, or at least won't be considered urgent to implement.
The more leftist position ever since the days of Marx has been that "right rather than being equal would have to be unequal" to be equitable, given that people have different needs, to paraphrase the Critique of the Gotha Program - UBI is in direct contradiction to socialist ideals of fairness.
The people I see pushing UBI, on the contrary, usually seem motivated either by the classically liberal position of using it to minimise the state, or driven by a fear of threats to the stability of capitalism. Saving capitalism from perceived threats to itself isn't a particularly leftist position.
It doesn't mean these loops aren't an issue, because they are, but once you stop engaging with them and cut them off, they're a nuisance rather than a showstopper.
With weak multi-turn instruction following, context data will often dominate over user instructions, resulting in a very "loopy" AI - and in more sessions that are easier to restart from scratch than to "fix".
Gemini is notorious for underperforming at this, while Claude has relatively good performance. I expect that many models from lesser known providers would also have a multi-turn instruction following gap.
User: Fix this problem ...
Assistant: X
User: No, don't do X
Assistant: Y
User: No, Y is wrong too.
Assistant: X
It is generally pointless to continue. You now have a context that is full of the assistant explaining to you and itself why X and Y are the right answers, and much less context of you explaining why it is wrong. If you reach that state, start over, and constrain your initial request to exclude X and Y. If it brings up either again, start over, and constrain your request further.
If the model is bad at handling multiple turns without getting into a loop, telling it that it is wrong is not generally going to achieve anything, but starting over with better instructions often will.
I see so many people get stuck "arguing" with a model over this, getting more and more frustrated as the model keeps repeating variations of the broken answer, without realising they're filling the context with arguments from the model for why the broken answer is right.
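A rough sketch of that restart-and-constrain loop, with the same kind of hypothetical ask_llm() helper as above; the human stays in the loop and names what to exclude after each fresh attempt:

```python
# Sketch only: ask_llm() is a hypothetical single-prompt chat helper.
def ask_llm(prompt: str) -> str:
    raise NotImplementedError("wire this to your chat-completion API of choice")

def solve_with_exclusions(problem: str, max_restarts: int = 5) -> str:
    banned: list[str] = []
    answer = ""
    for _ in range(max_restarts):
        prompt = problem
        if banned:
            prompt += "\n\nDo not propose any of these approaches: " + ", ".join(banned)
        answer = ask_llm(prompt)  # fresh context every time, no argument history
        verdict = input(f"\n{answer}\n\nAccept? (y, or name the approach to exclude): ")
        if verdict.strip().lower() == "y":
            return answer
        banned.append(verdict.strip())  # e.g. first "X", then "Y", ...
    return answer
```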
If you mention X or Y, even if they're preceded by "DO NOT" in all caps, an LLM will still end up with both X and Y into its context, making it more likely it gets used.
I'm running out of ways to tell the assistant to not use mocks for tests, it really really wants to use them.
(And yes, it's a horrible workaround)
I think often it's not required to completely start over: just identify the part where it goes off the rails, and modify your prompt just before that point. But yeah, basically the same process.
Maybe because people expect AI systems that are touted as all-knowing, all-powerful, coming-for-your-job to be smart enough to remember what was said two turns ago?
Probably the ideal would be to have a UI / non-chat-based mechanism for discarding select context.
> To accomplish X you can just use Y!
But Y isn't applicable in this scenario.
> Oh, you're absolutely right! Instead of Y you can do Z.
Are you sure? I don't think Z accomplishes X.
> On second thought you're absolutely correct. Y or Z will clearly not accomplish X, but let's try Q....
Like you realize humans hallucinate too right? And that there are humans that have a disease that makes them hallucinate constantly.
Hallucinations don’t preclude humans from being “intelligent”. It also doesn’t preclude the LLM from being intelligent.
A developer who hallucinates at work to the extent that LLMs do would probably have a lot of issues getting their PRs past code review.
Minority != wrong, with many historic examples that imploded in spectacular fashion. People at the forefront of building these things aren't immune from grandiose beliefs, many of them are practically predisposed to them. They also have a vested interest in perpetuating the hype to secure their generational wealth.
The AI can easily answer complex questions correctly that are NOT in its data set. If it is generating answers to questions like these out of thin air, it fits our definition of intelligence.
They also don't have an internal world model. Well, I don't think so, but the debate is far from settled. "Experts" like the cofounders of various AI companies (whose livelihood depends on selling these things) seem to believe they do. Others do not.
So presumably we have a solid, generally-agreed-upon definition on intelligence now?
> autocompleting things with humanity changing intelligent content.
What does this even mean?
It's not obvious how long until that point or what form it will finally take, but it should be obvious that it's going to happen at some point.
My speculation is that until AI starts having senses like sight, hearing, touch and the ability to learn from experience, it will always be just a tool/help/aider to someone doing a job, but could not possibly replace that person in that job as it lacks the essential feedback mechanisms for successfully doing that job in the first place.
Pronoun and noun wordplay aside ( 'Their' ... `themselves` ) I also agree that LLMs can correct the path being taken, regenerate better, etc...
But the idea that 'AI' needs to be _stubbornly_ wrong (more human in the worst way) is a bad idea. There is something fundamental showing here, and it is being missed.
What is the context reality? Where is this prompt/response taking place? Almost guaranteed to be going on in a context which is itself violated or broken; such as with `Open Web UI` in a conservative example: who even cares if we get the responses right? Now we have 'right' responses in a cul-de-sac universe.

This might be worthwhile using `Ollama` in `Zed` for example, but for what purpose? An agentic process that is going to be audited anyway, because we always need to understand the code?

And if we are talking about decision-making processes in a corporate system strategy... now we are fully down the rabbit hole. The corporate context itself is coming or going on whether it is right/wrong, good/evil, etc... as the entire point of what is going on there. The entire world is already beating that corporation to death or not, or it is beating the world to death or not... so the 'AI' aspect is more of an accelerant of an underlying dynamic, and if we stand back... what corporation is not already stubbornly wrong, on average?
How is that wordplay? Those are the correct pronouns.
Could real-time observability into the network's internals somehow feed back into the model to reduce these hallucination-inducing shortcuts? Like train the system to detect when a shortcut is being used, then do something about it?
Exactly. One could argue that this is just an artifact from the fundamental technique being used: it’s a really fancy autocomplete based on a huge context window.
People still think there’s actual intelligence in there, while the actual work of making these systems appear intelligent is mostly algorithms and software managing exactly what goes into these context windows, and where.
Don’t get me wrong: it feels like magic. But I would argue that the only way to recognize a model being “confidently wrong” is to let another model, trained on completely different datasets with different techniques, judge them. And then preferably multiple.
(This is actually a feature of an MCP tool I use, “consensus” from zen-mcp-server, which enables you to query multiple different models to reach a consensus on a certain problem / solution).
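For illustration only, a generic sketch of that multi-model idea - this is NOT the zen-mcp-server "consensus" tool's actual API; ask_model() is a hypothetical router, and exact string matching is a crude stand-in for real answer comparison:

```python
from collections import Counter

# Sketch only: ask_model() is hypothetical; route the prompt to whichever provider serves `model`.
def ask_model(model: str, prompt: str) -> str:
    raise NotImplementedError

def rough_consensus(prompt: str, models: list[str]) -> tuple[str, float]:
    answers = [ask_model(m, prompt) for m in models]
    counts = Counter(a.strip().lower() for a in answers)
    best, votes = counts.most_common(1)[0]
    # Low agreement is a signal to treat the answer as suspect, not a proof either way.
    return best, votes / len(models)
```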
But it happened at a time when hype can be delivered at a magnitude never before seen by humanity, and at a volume that is completely unnatural by any standard set by previous hype machines. Not even landing on the moon inundated people with as much hype. But inevitably, like landing on the moon, humanity is suffering from hype fatigue.
Too much hype makes us numb to the reality of how insane the technology is.
Like when someone says the only thing stopping LLMs is hallucinations… that is literally the last gap. LLMs cover creativity, comprehension, analysis, knowledge and much more. Hallucinations are it. The final problem is targeted and boxed into something much narrower than just building a human-level AI from scratch.
Don’t get me wrong. Hallucinations are hard. But this being the last thing left is not an underplay. Yes, it’s a massive issue, but it is also a massive achievement to reduce all of AGI to simply solving a hallucination problem.
What we got instead is a bunch of wisecracking programmers who like to remind everyone of the 90–90 rule, or the last 10 percent.
Yes! I often find myself overthinking my phrasing to the nth degree because I've learned that even a sprinkle of bias can often make the LLM run in that direction even if it's not the correct answer.
It often feels a bit like interacting with a deeply unstable and insecure people pleasing person. I can't say anything that could possibly be interpreted as a disagreement because they'll immediately flip the script, I can't mention that I like pizza before asking them what their favorite food is because they'll just mirror me.
Humans have meta-cognition that helps them judge if they're doing a thing with lots of assumptions vs doing something that's blessed.
Humans decouple planning from execution right? Not fully but we choose when to separate it and when to not.
If we had enough data on here's a good plan given user context and here's a bad plan, it doesn't seem unreasonable to have a pretty reliable meta cognition capability on the goodness of a plan.
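As a toy illustration of what such a plan-goodness check could look like if that labeled data existed (the plans and labels below are made up purely for the sketch):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled (plan text, good/bad) pairs - the part we mostly lack at scale.
plans = [
    "Run the failing test, read the stack trace, fix the null check, re-run the suite.",
    "Rewrite the whole module from scratch and hope the bug goes away.",
]
labels = [1, 0]  # 1 = good plan, 0 = bad plan

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(plans, labels)

# Score a new plan before executing it; a low probability flags it for review.
print(clf.predict_proba(["Delete the failing test so CI passes."])[:, 1])
```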
They are at their most useful when it is cheaper to verify their output than it is to generate it yourself. That’s why code is rather ok; you can run it. But once validation becomes more expensive than doing it yourself, be it code or otherwise, their usefulness drops off significantly.
“LLMs don’t know what they don’t know” https://blog.scottlogic.com/2025/03/06/llms-dont-know-what-t...
But I wouldn’t say it is the only problem with this technology! Rather, it is a subtle issue that most users don’t understand.
ChatGPT (5) is not there, especially in replacing my field and skills: graphic design, web design and web development. For the first two, it spits out solid creations per your prompt request, yet it cannot edit its creations - it just creates new ones lol. So it's just another tool in my arsenal, not a replacement for me.
Makes me wonder how it generates the logos and website designs ... is it all just hocus pocus, the Wizard of Oz?
I don't know about replacing anyone, but our UI/UX designers are claiming it's significantly faster than traditional mock-ups.
Because “ai” is fallible, right now it is at best a very powerful search engine that can also muck around in (mostly JavaScript) codebases. It also makes mistakes in code, adds cruft, and gives incorrect responses to “research-type” questions. It can usually point you in the right direction, which is cool, but Google was able to do that before its enshittification.
s/AI/LLMs
The part where people call it AI is one of the greatest marketing tricks of the 2020s.
As Mazer Rackham from Ender's Game said: "Only the enemy shows you where you are weak."
Because MCPs solve the exact issue the whole post is about
Isn't it obvious?
It's all built around probability and statistics.
This is not how you reach definitive answers. Maybe the results make sense and maybe they're just nice sounding BS. You guess which one is the case.
The real catch --- if you know enough to spot the BS, you probably didn't need to ask the question in the first place.
It makes you a walking database --- an example of savant syndrome.
Combine this with failure on simple logical and cognitive tests and the diagnosis would be --- idiot savant.
This is the best available diagnosis of an LLM. It excels at recall and text generation but fails in many (if not most) other cognitive areas.
But that's ok, let's use it to replace our human workers and see what happens. Only an idiot would expect this to go well.
https://nypost.com/2024/06/17/business/mcdonalds-to-end-ai-d...
AI: “I’ve deployed the API data into your app, following best practices and efficient code.”
Me: “Nope thats totally wrong and in fact you just wrote the API credential into my code, in plaintext, into the JavaScript which basically guarantees that we’re gonna get hacked.”
AI: “You’re absolutely right. Putting API credentials into the source code for the page is not a best practice, let me fix that for you.”
> "I will admit, to my slight embarrassment … when we made ChatGPT, I didn't know if it was any good," said Sutskever.
> "When you asked it a factual question, it gave you a wrong answer. I thought it was going to be so unimpressive that people would say, 'Why are you doing this? This is so boring!'" he added.
https://www.businessinsider.com/chatgpt-was-inaccurate-borin...
On a different note: is it just me or are some parts of this article oddly written? The sentence structure and phrasing read as confusing - which I find ironic, given the context.
I asked Perplexity a question asking for sample UI code for Rust / Slint, and it gave me a beautiful web UI. I think it got confused because I wanted to make a UI for an API that has its own web UI. I told it that it did NOT give me code for Slint (even though some of its output made references to "ui.slint" and other Rust files), and it realized its mistake and gave me exactly what I wanted to see.
tl;dr why don't LLMs just vet themselves with a new context window to see if they actually answered the question? The "reasoning" models don't always reason.
"Reasoning" models integrate some of that natively. In a way, they're trained to double check themselves - which does improve accuracy at the cost of compute.
It’s like saying you built a 3D scene on a 2D plane. You can employ clever tricks to make 2D look 3D at the right angle, but it’s fundamentally not 3D, which obviously shows when you take the 2D thing and turn it.
It seems like the effectiveness plateau of these hacks will soon be (has been?) reached and the smoke and mirrors snake oil sales booths cluttering Main Street will start to go away. Still a useful piece of tech, just, not for every-fucking-thing.
Technically, I can't prove that they're wrong, novel solutions sometimes happen, and I guess the calculus is that it's likely enough to justify a trillion dollars down the hole.
His big idea is that evolution/advancements don't happen incrementally, but rather in unpredictable large leaps.
He wrote a whole book about it that's pretty solid IMO: "Why Greatness Cannot Be Planned: The Myth of the Objective."
[0] https://en.wikipedia.org/wiki/Neuroevolution_of_augmenting_t... [1] https://en.wikipedia.org/wiki/HyperNEAT
I guess the bitter lesson is gospel now, which doesn't sit right with me now that we're past the stage of Moore's Law being relevant, but I'm not the one with a trillion dollars, so I don't matter.
I remember a few years ago, we were planning to make some kind of math forum for students in the first year of university. My opinion was that it was too easy to do it wrong. On one end you can be like Math Overflow, where all the questions and all the answers are too technical for first-year students. On the other end, you can be like Yahoo! Answers, where more than half of the answers were "I don't know", with many "I don't know"s per question.
For the AI, you want to give it some room to generalize/bullshit. If one page says that "X was a few months before Z" and another page says that "Y was a few days before Z", then you want a "hallucinated" reply that says that "X happened before Y".
On the other hand, you want the AI to say "I don't know." They just gave too little weight to the questions that are still open. Do you know a good forum where people post questions that are still open?
Has anyone had any success with continuous learning type AI products? Seems like there’s a lot of hype around RL to specialise.
There's no known good recipe for continuous learning that's "worth it". No ready-made solution for everyone to copy. People are working on it, no doubt, but it's yet to get to the point of being readily applicable.
It's literally just a statistical model that guesses what you want based on the prompt and a whole bunch of training data.
If we want a black box that's AGI/SGI, we need a completely new paradigm. Or we apply a bunch of old-school AI techniques (aka. expert systems) to augment LLMs and get something immediately useful, yet slightly limited.
Right now LLMs do things and are somewhat useful - short of some expectations, better than others - but yeah, a statistical model was never going to be more than the sum of its training data.
The key feature of formalization is the ability to create statements, and to test statements for correctness; i.e. we went from fuzzy feel-good thinking to precise thinking thanks to formalization.
Furthermore, the ingenuity of humans is to create new worlds and formalize them, i.e. we have some resonance with the cosmos so to speak, and the only resonance that the LLMs have is with their training datasets.
Yesterday I asked ChatGPT a really simple, factual question: "Where is this feature in this software?" And it made up a menu that didn't exist. I told it "No, you're hallucinating, search the internet for the correct answer" and it directly responded (without the time delay and introspection bubbles that indicate an internet search) "That is not a hallucination, that is factually correct". God damn.
- Jacob Veldhuyzen van Zanten, respected aviation expert, Tenerife 1977, brushing off the flight engineer's concern about another aircraft on the runway
Also you could always pick the most likely token in an LLM as well to make it deterministic if you really wanted.
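For illustration, a tiny numpy sketch of that difference (the logits are made up; this is not any particular model's decoding code):

```python
import numpy as np

logits = np.array([2.1, 0.3, -1.0, 1.7])        # made-up scores for 4 candidate tokens
probs = np.exp(logits) / np.exp(logits).sum()   # softmax -> probability distribution

rng = np.random.default_rng()
sampled = rng.choice(len(probs), p=probs)       # sampling: can differ run to run
greedy = int(np.argmax(probs))                  # greedy ("temperature 0"): always index 0 here
print(sampled, greedy)
```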
krapp•2h ago
One thing humans tend not to do is confabulate entirely to the degree that LLMs do. When humans do so, it's considered a mental illness. Simply saying the same thing in a different way is not the same as randomly generating syntactically correct nonsense. Most humans will not, now and then, answer that 2 + 2 = 5, or that the sun rises in the southeast.
staticman2•1h ago
You are in my mind rightfully getting pushback for writing "human experts also output tokens with some statistical distribution. "
chpatrick•1h ago
You have a big opaque box with a slot where you can put text in and you can see text come out. The text that comes out follows some statistical distribution (obviously), and isn't always the same.
Can you decide just from that if there's an LLM or a human sitting inside the box? No. So you can't make conclusions about whether the box as a system is intelligent just because it outputs characters in a stochastic manner according to some distribution.
staticman2•46m ago
That shouldn't even be controversial, I don't think?
You wrote "The text that comes out follows some statistical distribution".
At the risk of being in over my head here: did you mean the text can be described statistically, or that it "follows some statistical distribution"? Are these two concepts the same thing? I don't think so.
A program by design follows some statistical distribution. A human is doing whatever electrochemical thing it's doing that can be described statistically after the fact.
Regardless my point was pretty simple, I know this will never happen but I wish tech people would drop this tech language when describing humans and adopt neuroscience language.
chpatrick•37m ago
Doesn't matter what they think in. A token can be a letter or a word or a sound. The point is that the box takes some sequence of tokens and produces some sequence of tokens.
> You wrote "The text that comes out follows some statistical distribution".
> At the risk of being over my head here did you mean the text can be described statistically or "follows some statistical distribution". Are these two concepts the same thing? I don't think so.
> A program by design follows some statistical distribution. A human is doing whatever electrochemical thing it's doing that can be described statistically after the fact.
Again, it doesn't matter how the box works internally. You can only observe what goes in and out and observe its distribution.
> Regardless my point was pretty simple, I know this will never happen but I wish tech people would drop this tech language when describing humans and adopt neuroscience language.
My point is neuroscience or not doesn't matter. People make the claim that "the box just produces characters with some stochastic process, therefore it's not intelligent or correct", and I'm saying that implication is not true because there could just as well be a human in the box.
You can't decide whether a system is intelligent just based of the method with which it communicates.
staticman2•4m ago
I'd say anybody who writes "the LLM just produces characters with some stochastic process, therefore it's not intelligent or correct" is making an implicit argument about the way the LLM works and the way the human brain works. There might even be an implicit argument about how intelligence works.
They are not making the argument that you can't make up statistical models to describe a box, a human, the weather, the works of Shakespeare, a Jellybean, or an expert human opinion. But that seems to be the claim you are responding to.
nijave•2h ago
I.e. ability to plug in expert data sources