It... isn't? Hallucinations are surely a limitation of LLMs, but I haven't heard people worrying about them as some kind of existential question for a long time. You accept that it's a non-deterministic system. You build appropriate safeguards or deterministic checks around it. And you accept that it's not perfect and there will be occasional mistakes. No large enough organization can claim determinism for any sufficiently large system.
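To make "deterministic checks around it" concrete, here's a minimal sketch. Everything in it is hypothetical: `generate` stands in for whatever model call you actually make, and the date validator is just one example of a rule you can check without trusting the model.

```python
from datetime import datetime
from typing import Callable, Optional


def checked_generate(
    generate: Callable[[str], str],
    validate: Callable[[str], bool],
    prompt: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Call a non-deterministic generator until a deterministic check passes.

    Returns None when every attempt fails, so the caller can escalate
    (for example to a human) instead of trusting an unchecked answer.
    """
    for _ in range(max_attempts):
        candidate = generate(prompt)
        if validate(candidate):
            return candidate
    return None


def is_date(text: str) -> bool:
    """Example check: the text must parse as a real year-month-day date."""
    try:
        datetime.strptime(text.strip(), "%Y-%m-%d")
        return True
    except ValueError:
        return False
```

The point of returning `None` rather than the last candidate is that a failed check stays visible to the caller. It doesn't make the model deterministic, it just bounds what can get through.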
He claims that his company has "solved" hallucination by creating a verifiable fact-finding system, which is like saying that a person has solved plane crashes by creating a plane that never leaves the ground.
When an LLM says something incorrect, it is often because that LLM has reached the limits of its abilities, but it doesn't "know" (for lack of a better term) what being wrong feels like, so it will try its best to fit the information it has into a compelling story. The reason scaling leads to fewer hallucinations is that the model can hold more abstractions and more facts about the world, and it can work through the complex, vague machinery of reason with more scaffolding and more of a buffer (via its weights) to reason with nuance. This is why LLMs are useful: not because they can be fed into a fact-retrieval system, but because they can produce new information via the association of things they know.
The point is, we want LLMs to actually produce new information and work things out via their thinking, not be limited to citing facts that already exist and avoiding the limits of their abilities. In that sense hallucination is really just exposing the limits of scale, which would necessitate scaling models further.
Scaling is the only way we have gotten to this interesting, emergent property of LLMs. Further, the best way (that we've found so far) to make small models which don't hallucinate is to train a big model first, then distill it, or use it as a teacher for a smaller model. Either way, pursuing scale is the most defensible strategy, and a more robust solution to hallucination.
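For anyone unfamiliar with what "distill it" means mechanically, here's a rough sketch of the classic soft-target distillation loss: the student is trained to match the teacher's temperature-softened output distribution. This is an assumption about which flavor of distillation is meant, not a claim about what any particular lab does, and the function names and the default temperature are made up for illustration.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    teacher_logits: List[float],
    student_logits: List[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as they diverge, which is why the quality of the big teacher model matters so much here.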
No, it can hold more floating point numbers.
I'm not an expert in the field, but I've yet to see a solid rebuttal to this paper;
This is the big problem with "agentic" AI. If you let the AI system do anything important, it's going to screw up reasonably often, and occasionally screw up in an expensive way. The usual solution to this is to make the errors an externality: dump them on the consumer-grade end user or an employee. As Google Search puts it, at the end of each result, "AI can make mistakes, so double-check responses".
External checking, which Cringley is pushing, has potential for search-type systems. It's not likely to help when there's no one source text that can be used as an authority for checking. It's not likely to help with systems that actually do something.
How's end to end neural net driving working out?
> No, it can hold more floating point numbers.
Fallacy of composition. Just because an LLM is made up of floating point numbers doesn't mean its capabilities are limited to those of bare floating point numbers, in the same way that the limited faculties of an individual neuron don't preclude the human brain from having emergent properties born from the interplay of its synapses.
JSR_FDED•2d ago
But what about things that only scale can achieve? Like the superhuman security vulnerability assessment capabilities that Fable showed? That would be a reason to continue to spend, wouldn’t it?
B1FF_PSUVM•2d ago
I have a bad feeling about this, and it's about us, not AIs ...
(I fear that we're #$&@%!## most of the time, and just oblivious about it)
thewebguyd•17m ago
I don't think "just throw more compute at it forever" is the only way to go. But if that turns out to be true, the labs aren't going to share that knowledge, because it would put the dump trucks of cash arriving at their feet at risk. If they came out and said "You know, we don't really need much more compute, we found a better way to make a smarter model," the cash would slow down.