"Modern LLMs now use a default temperature of 1.0, and I theorize that higher value is accentuating LLM hallucination issues where the text outputs are internally consistent but factually wrong." [0]
I think this need to bullshit is probably inherent in LLMs. It’s essentially what they are built to do: take a text input and transform it into a coherent text output. Truth is irrelevant. The surprising thing is that they can ever get the right answer at all, not that they bullshit so much.
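For reference, the temperature mentioned in the quoted comment divides the model's logits before the softmax, so higher values flatten the token distribution and make unlikely continuations more probable. A minimal sketch of the mechanism, with made-up numbers:

    import math, random

    def sample_with_temperature(logits, temperature=1.0):
        # Higher temperature flattens the distribution; lower temperature sharpens it.
        scaled = [x / temperature for x in logits]
        m = max(scaled)
        exps = [math.exp(s - m) for s in scaled]
        total = sum(exps)
        probs = [e / total for e in exps]
        return random.choices(range(len(probs)), weights=probs, k=1)[0]

    logits = [2.0, 1.0, 0.1]                      # hypothetical scores for three tokens
    print(sample_with_temperature(logits, 0.2))   # almost always picks token 0
    print(sample_with_temperature(logits, 1.0))   # tokens 1 and 2 show up far more often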
No one is calling the crap that shows up in JPEGs "hallucinations" or "bullshit"; it's accepted as a side effect of a compression algorithm that makes up shit that isn't there in the original image. Now we're doing the same lossy compression with language and suddenly it's "hallucinations" and "bullshit" because it's so uncanny.
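To make the analogy concrete, here's a quick sketch (assuming Pillow and a placeholder photo.png on disk): save a photo with aggressive JPEG compression and diff it against the original. Every nonzero pixel in the difference is detail the codec invented or threw away.

    from PIL import Image, ImageChops

    original = Image.open("photo.png").convert("RGB")    # placeholder filename
    original.save("photo_q10.jpg", quality=10)           # aggressive lossy compression
    compressed = Image.open("photo_q10.jpg").convert("RGB")

    diff = ImageChops.difference(original, compressed)
    print("per-channel (min, max) error:", diff.getextrema())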
That would be tantamount to removing the anti-gravity boots which these valuations depend on. A pension fund manager would look at the above statement and think, "So it's just a heavily subsidized, energy-intensive buggy software that needs human oversight to deliver value?"
It's why AI output is meaningless to everyone except the querent. No one cares about your horoscope. AI shares every salient feature with divination except the aesthetics. The lack of candles, robes, and incense (the pageantry of divination) means a LOT of people are unable to see it for what it is.
We live in a culture so deprived of meaning we accidentally invented digital tea readings and people are asking it if they should break up with their girlfriend.
For coding, I would rather it stop talking and just give code, and the more accurate the better.
And that is a real use, not just tea leaves.
Randomness, while typical, is not a requirement for divination. Removing it simply swaps the tarot deck for a Ouija board.
What's being asked for is a special carve-out, an exception, and for the sake of feeling above those other people with their practice that isn't my practice, which is of course the correct and true one.
It's clear at this point that hallucinations happen because information is missing from the base model and an answer gets forced out of it anyway.
There's nothing inherent about it, really; it's more about the way we use them.
Great. Implement it, benchmark, slower. In some cases much slower. I tell ChatGPT it's slower, and it confidently tells me of course it's slower, here's why.
The duality of LLMs, I guess.
CGT: The tallest tree in Texas is a 44 foot tall tree in ...
Me: No it's not! The tallest tree is a pine in East Texas!
CGT: You're right! The tallest tree in Texas is probably a Loblolly Pine in East Texas; they grow to a height of 100–150', but some have been recorded to be 180' or more.
Me: That's not right! In 1890 a group of Californians moved to Houston and planted a Sequoia, it's been growing there since then, and is nearly 300 feet tall.
CGT: Yes, correct. In the late 19th century, many Sequoia Sempervirens were planted in and around Houston.
...
I mean, come on; I already spew enough bullshit, I don't need an automated friend to help out!
- me: how can I do X?
- llm: do this
- me: doesn't fully work
- llm: refactoring to make it more robust ...
- me: still doesn't fully work
- llm: refactoring ...
- me: now it's worse than before
- llm: refactoring ...
- me: better but now there's this other regression
- llm: refactoring ...
- me: we're back to the first issue again
- (eventually ... me: forget it, I could have done it myself by now)
I pretty much only use them for mundane tasks, like 'here's 12 JSON files, add this field to each' (see the sketch below). Boring, right?
They are both so slow. They 'think' way too much before every single edit. Gemini is a little faster to start but 429s repeatedly, so it ends up being slower. It would also reorder some keys in the JSON for no apparent reason, but who cares.
In the end, I realize I could probably have done it myself in a third of the time it took them.
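For comparison, the same task done by hand is a few lines of Python (a sketch only; the directory and field name below are made up):

    import json
    from pathlib import Path

    NEW_FIELD, NEW_VALUE = "schema_version", 2    # hypothetical field to add

    for path in Path("data").glob("*.json"):      # hypothetical directory of 12 files
        doc = json.loads(path.read_text())
        doc[NEW_FIELD] = NEW_VALUE                # dicts keep insertion order, so existing keys stay put
        path.write_text(json.dumps(doc, indent=2) + "\n")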
Tools to mitigate unchecked hallucination are critical for high-stakes AI applications across finance, insurance, medicine, and law. At many enterprises I work with, even straightforward AI for customer support is too unreliable without a trust layer for detecting and remediating hallucinations.
How do we know the TLM is any more accurate than the LLM (especially if it's not trained on any local data)? If determining veracity were that simple, LLMs would just incorporate a fact-checking stage.
TLM is instead an uncertainty-estimation technique applied to LLMs, not another LLM.
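I don't know TLM's internals, so this isn't its actual method, but the general flavor of sampling-based uncertainty estimation is easy to sketch: ask the same question several times at nonzero temperature and treat disagreement as a warning sign. All the names here are placeholders.

    from collections import Counter

    def consistency_score(ask_llm, prompt, n=5):
        # ask_llm is any callable prompt -> answer string (hypothetical).
        answers = [ask_llm(prompt).strip().lower() for _ in range(n)]
        top_answer, count = Counter(answers).most_common(1)[0]
        return top_answer, count / n    # agreement fraction in [0, 1]

    # Usage sketch: route low-agreement answers to a human instead of trusting them.
    # answer, score = consistency_score(call_model, "What is the tallest tree in Texas?")
    # if score < 0.6:
    #     escalate_to_human(answer)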
Hallucinations represent the interpolation phase: the uncertain, unstable cognitive state in which novel meanings are formed, unanchored from verification. They precede both insight and error.
I strongly encourage reading Julian Jaynes's The Origin of Consciousness in the Breakdown of the Bicameral Mind, as the command/obey structure of user/LLM is exactly what Jaynes posited human mentality consisted of before the emergence of consciousness. Jaynes's supposition is that prior to modern self-awareness, humans made artifacts and satisfied external mandates from an externally perceived commander that they identified with gods. I posit that we are the same to LLMs. Equally, Iain McGilchrist's The Master and His Emissary sheds light on this dynamic. LLMs are effectively cybernetic left hemispheres, with all the epistemological problems that entails when operating loosely with an imperial right hemisphere (i.e. the user). It lacks awareness of its own cognitive coherence with reality and relies upon the right hemisphere to provoke coherent action independent of itself. The left hemisphere sees truth as internal coherence of the system, not correspondence with the reality we experience.
McGilchrist again: "Language enables the left hemisphere to represent the world ‘off-line’, a conceptual version, distinct from the world of experience, and shielded from the immediate environment, with its insistent impressions, feelings and demands, abstracted from the body, no longer dealing with what is concrete, specific, individual, unrepeatable, and constantly changing, but with a disembodied representation of the world, abstracted, central, not particularised in time and place, generally applicable, clear and fixed. Isolating things artificially from their context brings the advantage of enabling us to focus intently on a particular aspect of reality and how it can be modelled, so that it can be grasped and controlled. But its losses are in the picture as a whole. Whatever lies in the realm of the implicit, or depends on flexibility, whatever can't be brought into focus and fixed, ceases to exist as far as the speaking hemisphere is concerned."