This might be an "experience is the best teacher" situation. It'd probably be pretty hard to invent a disclaimer that'd be as effective as getting bit.
Doing the former basically means killing the model-hosting business. Companies could develop models, use them internally and give them to their employees, but no public APIs would exist. Companies would strike legally binding contracts to use/license each other's models, but the general public wouldn't have access without something that would mitigate the legal risk.
Maybe years down the line, as attitudes soften, some companies would begin to push the boundaries. Automating the legal approval process, opening signups, etc.
This black-and-white assumption - that because LLMs don't always give provably correct answers they are therefore dangerous - reminds me of what the generation of my parents and teachers thought of Wikipedia when it became popular. The problems were different ("anyone can edit" vs. "hallucinates stuff"), but the mindset seems very similar.
However, LLMs amplify this damage by sounding authoritative on everything and, even worse, by being promoted as authoritative problem solvers for all domains, with only a small disclaimer. This doublethink is unacceptable.
But if they made the disclaimer bigger, the AI market would collapse in about an hour, much like people's usage does when they don't verify something and get shot down by someone actually authoritative. This has happened at work a couple of times and caused some fairly high-profile problems. Many people refuse to use it now.
What we have is a bullshit generator propped up by avoiding speaking the truth, because the truth compromises the promoted utility. Classic bubble.
LLMs aren’t competing with perfect, they are competing with websites that may or may not be full of errors, or asking someone that may or may not know what they are talking about.
Critical thinking is inconvenient and does not scale, but it's very important for finding the truth. Technology that disincentivizes critical thought makes it easier to spread lies.
True. At the same time, technology that removes the need for critical thinking is a bona fide positive form of progress.
Think of e.g. consumer protection laws, and the larger body of laws (and systems of enforcement) surrounding commerce. Their main goal is to reduce the risk for customers - and therefore their need to critically evaluate every purchase. You don't need as much critical thinking when you know certain classes of dangers are almost nonexistent; you don't need to overthink choices you know you can undo.
There are good and bad ways of eliminating inconvenience, but by itself, inconvenience is an indication of waste. Our lives are finite, and even our capacity to hope and dream is exhaustible. This makes inconvenience our enemy.
So yes, convenience and progress are strongly correlated but they're not the same.
According to the address on the business website that Apple Maps itself links to, the business is 432 km away from me.
No, improper phrasing. Correct disclaimer is, "The below engine is structurally unreliable".
--
Comment, snipers. We cannot reply to unclear noise.
But they can't be, as the whole premise of the boom is replacing human intellectual labor. They've said as much on many, many occasions - see Anthropic's CEO going off about mass unemployment quite recently. How can the two of these co-exist?
The percentage of people that will look at it critically is negligible.
Do they?
Bravery in several ways, that is. There's the risk of being sued by John Ratzenberger. (-:
Really, you don't know why? Maybe because it's being promoted as "AI" by companies with missions like "organise the world's information", who have spent a decade now doing their best to provide accurate information to user queries?
In this way they may also have underestimated the reputational loss - the big umbrella of the "Ford Pinto case".
--
Edit: I was just now looking at the new Visual Capitalist's "Ranked: 2025’s 10 Largest S&P 500 Stocks". Is it possible that Alphabet being at the top, with 7.6% of the weight of the 500-item set, paradoxically allows it to afford more damage?
I've observed a great deal of people trust the AI Overview as an oracle. IMO, it's how 'normal' people interact with AI if they aren't direct LLM users. It's not even age gated like trusting the news - trusting AI outputs seems to cross most demographics. We love our confident-based-on-nothing computer answers as a species, I think.
You are not there to "love what gives you the kicks". That's a kind of love that should not exit the bedroom (better, the bathroom).
I shudder to think how much worse this is going to be with "AI Overview". Are we entering an era of people googling "how does a printer work" and (possibly) being told that it's built by a system of pulleys and ropes and just trusting it blindly?
Because that's the kind of magnitude of errors I've seen in dozens of searches I've made in the domains I'm interested in, and I think everyone has seen the screenshots of even more outlandish - or outright dangerous - answers.
For over a decade now, that spot in the search page had the "excerpt from a page" UI, which made a lot of sense. It cut down an extra click, and if you trusted the source site, and presumably Google's "Excerpt Extraction Technology" (whatever that was), what was left not to trust? It was a very trustworthy information location.
Like if I search a quick medical question and there's an excerpt from the Mayo Clinic - I trust the Mayo Clinic, so that's good enough for me. Sometimes I'd copy the excerpt from Google, go to the page, and ctrl-F it.
Google used to do a decent job of picking reputable sources, and the excerpts were indeed always found in the page in an unaltered context, so it was good enough to build trust. That system has degraded over the years in terms of how good it is at picking those reputable sources - most likely because it was SEO-gamed.
However, it has been replaced with the AI Overview. I'm not against AI, but AI is fundamentally different from "a relevant excerpt from a source you trust, with a verifiable source, in milliseconds".
They can refine grounded results and begin serving up increasingly well-reasoned results over time as models improve cost-effectively. That then drives better vectors for ads.
Like what about this is hard to understand?
Google did it because it's better for Google, yes. They no longer have to deal with people trying to hack SEO. Now you would have to figure out how to influence the training process of google to hijack that box. So it's better for Google to move to AI Overview. What's your point here?
I say Google is in a bad position morally, or in terms of "doing the right thing" - not that one would really expect it from a corporation per se. There is a distinction, you know.
Google introduced the box as an "Excerpt from a search result" box. They traditionally put a lot of care into their search quality; it showed, and it built trust with their users. Over the years the search quality dropped - whether from less attention from Google, or because it's a fundamentally harder problem to solve with far more motivated attackers. Yet even when bullshit websites intruded into the "Excerpt from a search result" box, you could still decide that you were not going to trust medical advice from "mikeInTheDeep.biz". It wasn't ideal that they built trust and then let it slip, but being able to see a source with a quote keeps it useful when you trust the source.
With AI Overview, you either trust it all, don't trust any of it, or use it as confirmation bias.
My manager, a direct LLM user, uses the latest models to confirm his assumptions. If they are not confirmed on the first try, he then proceeds to rephrase the question until he gets what he wants from them.
edit: typo
[1] see haize labs : https://www.haizelabs.com/product/judges
This is simply an information retrieval problem. LLMs don't retrieve information perfectly from their context. They are very good at it, but every now and then they'll introduce a change here or there. Changing a "Hello" to "Hi" doesn't really make any difference, but changing a "PS/2 Model 286" to "PS/2 Model 280" makes a huge difference. The LLM "knows"* this at some level because it "knows" that names are important to reproduce in exact form. But every now and then even names can change and still generally mean the same thing, so every now and then it'll change a name or an identifier for no reason.
some of my favorite descriptions of this I have heard from people:
- We need to introduce a reflection layer
- We need a supervisor tree-like checks on responses
- We need an evaluation-feedback mechanism for AI operations
- We need agents that operate as judges/supervisors/evaluators
all apparently mean:
let response = await getChatResponse([...messages, newMsg]);
const { didGood, why } = await getChatResponse([RolePlayLikeAMeanJudgePrompt, ...messages, newMsg, response]);
if (!didGood) {
  response = await getChatResponse([ThreateningMafiaGuyPrompt, ...messages, newMsg, response, why]);
}
// loop like 4 times, maybe use different models and egg them on each other, like use Grok and tell it you got the response from Claude and that Claude was shit talking Grok all the way through it. Like it was unnecessary tbh.
// this makes Grok work extra hard. It's actually been measured by kill-a-watt.
*: I say "knows" to indicate just that the model training biases it that way"Tell us about X (X='PS/2 Models'); here are your encyclopedias: extract and formulate".
If you're just building a chatbot (like a pure ChatGPT/Claude-style interface) you risk massively increasing your latency and degrading your overall result quality for an attempt to improve a small scenario here or there.
Seriously, try it. Take any "Tell us about X" prompt you like. Try it as-is with an LLM, then try it with + "; here are your encyclopedias: extract and formulate"
I guarantee you that 99 times out of 100, the LLM will reach out to the encyclopedia. But the existing encyclopedia doesn't have a great LLM-like search interface that's able to find the parts most relevant to the LLM's query about X. In fact, you're building that part if I'm not mistaken. If you expect the encyclopedia to have that great search functionality that the LLM could use to always find the most relevant information about X, then you just pushed the problem one layer down. Someone will actually eventually have to tackle it.
You can also see this in both ChatGPT and Claude outputs. Every now and then they will push a change to make it "more reliable", which basically makes it more likely to search the internet before answering a question. Which also happens to make it more likely to skew its output based on SEO, current popular news and other nonsense.
While nonscientific, I experience this every time ChatGPT or Claude decides to do a web search instead of just answering the question. Ask it "I like TV show X, suggest TV shows like that" or "I like product X, suggest a similar product". If it uses the internet to search, the result is a summary of the top gamed SEO results - just whatever is popular atm, or whatever has commission links. Ask it not to use the internet and the result is surprisingly less... "viral, SEO-optimized, trended recently" type content.
This implies that if we do not have good enough ways to retrieve information from repositories, we will have to invent them. Because the "LLM as informer" can only be allowed to formulate what it will find through the memex.
It is possible that, to that aim, LLMs cannot be used directly as they are in their current general state.
Also the problem of information reliability has to be tackled, in order to build such system (e.g. some sources rank higher).
It is not a solved problem, but it is a clear one. In mission-critical applications, you would not even allow asking John at the nearby desk for information he may mix up.
That context is not of "this more than that" comparisons, but of threshold: the service must be _adequate_.
If you don't have random humans capable of providing the needed service, find them. Do not employ random agents.
It doesn't matter whether it's better than humans, the one thing that matters are the consequences of its widespread use.
When OpenAI launched ChatGPT, I had to explain to a non-CS professor that it wasn't AI like they're thinking of, but currently more like a computational parlor trick that looks a lot like AI.
But turns out this parlor trick is awesome for cheating on homework.
Also good at cheating at many other kinds of work, if you don't care much about quality, nor about copyrights.
On the output side, there functionally isn't much of a difference, at least in terms of more abstract things like writing code. Although I would argue that the output AI produces still doesn't match the complexity and nuance of an individual human being, and may never do so, simply because the AI is only simulating embodiment and existence in the world. It might need to simulate an Earth equivalent to truly simulate a human's personal output.
On the essence side, it's a much clearer distinction. We have numerous ways of determining if a thing is human or not - biology, for one. It would take some serious sci-fi until we get to the point where an android is indistinguishable from a human on the cellular level.
LLMs are very good at nuance. Better than any human I've seen — so much so, I find it to be a tell.
> We have numerous ways of determining if a thing is human or not - biology, for one.
I don't care if the intelligence is human, I care if it's (1) (a) intelligent, (b) educated, and (2) has the ability to suffer or not so I know if it should have moral subject rights.
1a is slowly improving but we're guessing and experimenting: not really engineering intelligence, just scaling up the guesses that work OK. 1b was always easy: libraries fit "education" in isolation from the "intelligent" part of 1a. LLMs are a good enough combination of (a) and (b) to be interesting, potentially even an economic threat, depending on how long the time-horizon between failures gets.
2 is pre-paradigmatic, we don't have enough understanding of the problem to ask the correct question — even ignoring AI for the moment, the same problem faces animal welfare (and why would the answer be the same for each of chimps, dogs, ravens, lobsters, and bees?) and even within humans on topics such as abortion, terminal stage of neurodegenerative conditions such as Alzheimer's, etc.
> All the "no true Scotsman" stuff about what it's doing behind the scenes is irrelevant, because we have no idea what human brains are doing behind the scenes either.
Computers and transistors have a massive speed advantage over biological brains and synapses — literally, not metaphorically, the same ratio as the speed difference between how far you walk in a day and continental drift, with your brain being continental drift — which means they have the possibility of reading the entire Internet in a few weeks to months to learn what they know, and not the few tens to hundreds of millennia it would take a human.
Unfortunately, the method by which they acquire information and knowledge is sufficiently inefficient that they actually need to read the entire Internet to reach the skill level of someone who has only just graduated.
This means I'm quite happy to *simultaneously* call them extremely useful, even "artificial general intelligence", and yet also agree with anyone who calls them "very very stupid".
If we actually knew how our brains did this intelligence thing, we could probably make AI genuinely smart as well as absurdly fast.
You wouldn’t say Intellisense isn’t useful but you also wouldn’t call it “AI”. And what LLMs are like is basically Intellisense on steroids (probably more like a cocktail of speed and acid, but you get my point)
My point is that to the average user of Gemini or ChatGPT, LLMs are like AGI, whereas they're actually closer to Intellisense or text completion.
And this is where the problem lies. People will read the output of LLMs and think it has read content on the topic (which is correct) and then deduced an answer (which is technically incorrect).
It also doesn’t help that OpenAI keep using terms like “reasoning” which sounds a lot like general intelligence. But it’s actually just a bunch of scales based on words.
AI doesn’t understand any of the core concepts it is reasoning about. So its reasoning is akin to asking a Hollywood script writer to throw a bunch of medical terms together for a new medical drama. Sure the words might be correct on their own, but that doesn’t mean the sentences are correct. And any subject matter expert who watches a drama that features their specialist subject will tell you that there’s more to understanding a subject than simply knowing the words.
* ML - actual backend models etc
* AI - user interface that appears “intelligent” to humans
LLM UIs tend to have more of an appearance of intelligence because their interface is natural language — it's basically the Eliza Effect: https://en.m.wikipedia.org/wiki/ELIZA_effect
i know it’s not the classic definition of the terms, but it’s helped me with my frustration around the bs marketing hype
They don't have capacity to understand logical or temporal relationships, which is the core competency of coding.
They can form syntactically valid strings in a formal language, which isn't the same thing as coding.
The other day I googled "Is it 2025?" and the AI reply was that nope, it's 2024. Such a joke.
You should know that LLMs are very weak in procedural operations.
> obvious ... why it warrants an article
The phenomenon is part of all the "obvious" things that are not in the awareness of very large masses.
Indeed! That's why they are LLMs, not AI!
Why are new hypes always re-defining terms? 20 years ago, "AI" was actually about intelligence. Just like "crypto" was about cryptography instead of money scams.
> The phenomenon is part of all the "obvious" things that are not in the awareness of very large masses.
I can't imagine a single HN reader who is not aware that LLMs make mistakes. This is not the general public. (And even most of the general public has heard about this minor issue by now.)
So you wrote 'article' but you meant "submission" ;)
> And even most of the general public has heard about this minor issue by now
And still some public or private enterprises are trying to push LLMs in dangerous parts of the workflow - possibly because drives like "cutting costs" or "chasing waves" are valued more than objective results...
> "AI" was actually about intelligence
It was (is) about providing outputs intelligently - relevantly.
Ah, true. Sorry.
With Gemini replacing Google search more and more people are blindly trusting those answers, so these stories are needed.
That Google uses a faulty assistant in the page is actually informative, not just for people who do not use that search engine, but for those attentive to the progresses in the area - where Google has given massive hits recently.
> constructive
The - extremely damaging - replacement of experts with "employees wielding an LLM" is ongoing. Some of us have been told nonsense by remote support service staff...
Regarding the 'constructive' aspect and the 'damaging replacement of experts,' I agree this is a critical concern. However, the genre of simply posting screenshots of LLM errors is rarely constructive in addressing this complex socio-technical issue. It highlights a symptom (LLMs making mistakes) but typically fails to constructively engage with the causes or potential solutions for deskilling, corporate responsibility in AI deployment, or the nuances of human-AI collaboration. True constructive engagement would require more than just pointing out a wrong answer; it would demand analysis, discussion of best practices, or calls for better system design and oversight, which this genre seldom provides.
And as far as the progresses in LLMs are concerned¹, it seems evident a revolution is required - and when the key (to surpass intuition towards process, dream towards wake) is found, it will become evident.
(¹Before I was mentioning «progresses» in general - as in, "they give us Veo3 and yet Banged Inthehead at the search pages"?!)
I even took a screenshot: https://imgur.com/a/oQYKmKP
I really had nothing to say at that moment.
And I have to admit I thought the title was a joke.
However, I loved the detailed description of just how bad it can be. And it puzzles me why people present AI slop as authoritative. Happens a lot in discussions these days. One example was someone presenting me with a grok answer about some aspect of the energy system. It turned out grok was off by a factor of 1000.
Of course you can also use that to your advantage with people who believe AI slop, as it is fairly simple to get the AI to produce the answer you want, including the answer you know is right ;-)
And I've actually started using AI a bit more in my coding, and it's been helpful as a starter. For example to get my little HTMX-Native project going, I needed to figure out how to configure Android's WebView for local data access.
Would I have figured it out eventually? Yes.
Was it faster with AI? Yes.
Was the AI answer faulty? Yes.
Was it still helpful? Yes.
It also should be the central issue - together with the energy/climate impacts - in every debate about AI ethics or AI safety. It's those two things that will harm us most if this hype continues unchecked.
Given a bunch of words have already been generated, it always adds the next word based on how common the sequence is.
The reason you get different answers each time is the effect of the pseudo-random number generator on picking the next word. The model looks at the probability distribution of most likely next words, and when the configuration parameter called "temperature" is 0 (and it is actually not possible to set to 0 in the GUI), there is no random influence, and strictly the most likely next word (top-1 MLE) will always be chosen. This leads to output that we would classify as "very boring".
So the model knows nothing about IBM, PS/2, 80286 versus 80486, CPUs, 280 or any models per se. -- One of the answers seems to suggest that there is no model 280, I wonder whether that one was generated through another process (there is a way to incorporate user feedback via "reinforcement learning"), or whether that was a consequence of the same randomized next-word picking, just a more lucky attempt.
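To make that concrete, here is a minimal sketch of greedy vs. temperature-based next-word picking. The vocabulary and logit numbers are toy values made up for illustration, not anything from a real model:

```
import math
import random

def sample_next_token(logits, temperature=1.0):
    # temperature == 0 -> greedy decoding: always take the top-1 (MLE) token.
    if temperature == 0:
        return max(logits, key=logits.get)
    # Otherwise: scale logits by temperature, softmax into probabilities, sample.
    scaled = {tok: v / temperature for tok, v in logits.items()}
    m = max(scaled.values())
    exps = {tok: math.exp(v - m) for tok, v in scaled.items()}
    total = sum(exps.values())
    probs = {tok: v / total for tok, v in exps.items()}
    return random.choices(list(probs), weights=list(probs.values()), k=1)[0]

# Toy distribution for the token following "IBM PS/2 Model 2..."
logits = {"86": 4.0, "80": 3.1, "30": 1.2}
print(sample_next_token(logits, temperature=0))    # always "86"
print(sample_next_token(logits, temperature=1.0))  # usually "86", sometimes "80"
```

At temperature 0 the output is deterministic; raise the temperature and "280"-style near-misses start showing up some of the time.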
Not really. I set temperature to 0 for my local models, it works fine.
The reason why the cloud UIs don't allow a temperature of 0 is because then models sometimes start to do infinite loops of tokens, and that would break the suspension of disbelief if the public saw it.
What do you get at very low temperature values instead of 0?
All of them. I make my own frontends using llama-cpp. Quality goes up with temperature 0 and loops are rare.
The temperature setting isn't for improving quality, it's to not break your suspension of disbelief that you're talking to an intelligent entity.
You must be using recent (or just different) models than those I tried. Mine returned garbage easily at temperature 0. (But unfortunately, I cannot try and report from there.)
This (LLM behaviour and benchmarking at low or 0 temperature value) should be a topic to investigate.
This is true. But you go to Google not to 'have a chat' but ostensibly to learn something based in knowledge.
Google seem to be making an error in swapping the provision of 'knowledge' for 'words' you'd think, but then again perhaps it makes no difference when it comes to advertising dollars which is their actual business.
There is no doubt that LLMs have gotten more accurate as newer models were released. At what point should we say "look this is accurate enough to be useful"?
We should acknowledge that nothing is ever 100% accurate. You won't go to a doctor expecting 100% accuracy. You know that the doctor's accuracy is high enough for the effort of making an appointment, listening to them to be worth it. Maybe they are 60% accurate?
My point is that LLM's are maybe at 20-30% accuracy where the benefit clearly exists even if we collectively acknowledge that 20-30% is not that high.
I find it amusing to think about an LLM that is 1% accurate (which could have been achieved way earlier in 2010's). What could have been possible with such an LLM with the right mindset?
The way LLMs work at the moment is equivalent to going to a doctor with a set of symptoms, the doctor telling you "ah yes, of course, it's illness X and you need to take medicine Y to cure it", and then you check and neither X nor Y exists. That's not "accuracy", that's just straight-up fraud.
I wouldn't have any problem with Google's AI saying "I don't know" or "I don't have enough sources to provide a good answer, but my best approximation is this". But I literally battle misinformation produced by Google's AI search every single day, because it's telling people actual made up facts that don't exist.
Also, why make it so personal? I think it was a fair question to ask - you didn't answer how it works - just got weirdly defensive about it.
They continue to use AI for math (asking LLMs to split bills, for example) and treat its responses for factual data lookup as 100% reliable and correct.
Ah, yes, high tech solutions for low tech problems. Let's use the word machine for this number problem!
You know, that's a thought process that makes internal sense.
You have someone who's terrible at math. They want something else to do math for them.
Will they prefer to use a calculator, or a natural language interface?
How do you use a calculator without knowing what you're doing?
I've tried to explain it in those terms as well: every medium-sized prompt on these large models consumes roughly one phone battery charge worth of energy. You have a phone with a calculator.
I'd ask them to do the math on how much energy they're wasting asking stupid things of these systems, but I'm too afraid they'd ask ChatGPT to do the math.
I don't do this but isn't it basically... fine? I assume all the major chatbots can do this correctly at this point.
The trick here is that chatbots can do a wide range of tasks, so why context switch to a whole different app for something like this? I believe you'll find this happening more frequently for other use cases as well.
Usability trumps all.
When it comes to facts that actually matter, people need to know to verify the output.
'member when similar blogposts were written about not trusting Wikipedia ?
(And Wikipedia is still better than LLMs : while you can trust it less than fixed, specialist-made references, you can improve it yourself, as well as check Talk pages for potential disagreements, and page history for potential shenanigans.)
I wanted to modify Gunicorn's logger class so I could filter out specific URL paths. Given it's a hot code path (running on every request), I told it I had made 3 solutions and wanted to see which one was the fastest: a list + loop using startswith, a compiled regex, and startswith with a tuple of paths.
It produced benchmark code for me, along with benchmark results which stated the regex solution was the best and fastest solution using Python's standard library.
I didn't believe it so I ran the benchmark myself and the tuple version was over 5x faster than the regex solution.
I then told it I ran the benchmark and got different results and it almost word for word said something like "Oh right, thank you for the clarification, the tuple version is indeed the fastest!". It saved me a few minutes writing the benchmark code but yeah, I rarely trust its output for anything I'm not 100% on.
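For what it's worth, the benchmark itself is only a few lines; something in this spirit (the skip-paths here are made-up placeholders, not my real ones):

```
import re
import timeit

# Hypothetical paths to filter out of the Gunicorn access log.
SKIP_PATHS = ("/healthz", "/metrics", "/favicon.ico")
SKIP_RE = re.compile(r"^(?:/healthz|/metrics|/favicon\.ico)")

def with_loop(path):
    for prefix in SKIP_PATHS:
        if path.startswith(prefix):
            return True
    return False

def with_regex(path):
    return SKIP_RE.match(path) is not None

def with_tuple(path):
    # str.startswith accepts a tuple of prefixes directly.
    return path.startswith(SKIP_PATHS)

for fn in (with_loop, with_regex, with_tuple):
    elapsed = timeit.timeit(lambda: fn("/api/v1/orders"), number=1_000_000)
    print(f"{fn.__name__}: {elapsed:.3f}s")
```

Whatever the numbers come out to, measuring beats trusting a benchmark table an LLM invented.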
The IBM PS/2 Model 280 was a low-end PS/2 model, often associated with the earlier IBM PC/XT and PC/AT models.
It featured an 80286 CPU, a smaller hard drive, and a proprietary 3.5-inch floppy drive. While it used ISA slots for expansion, its proprietary RAM and other components made it less attractive for retro computing compared to the Model 30 286
I tried again after 10 minutes and got "The IBM PS/2 Model 280 was not a model in the original Personal System/2 family". It seems the AI response is cached for a few minutes.
I tried in Kagi "ps/2 model 280?". This is the first response:
```
While a specific "PS/2 Model 280" is not definitively documented, here's what can be gathered:
- It seems that AI models sometimes make errors regarding the PS/2 Model 280, describing it as an ISA-based 286 system. 12
- The IBM PS/2 (Personal System/2) was a line of personal computers from IBM, introduced in 1987 3.
- PS/2 computers had two BIOSes: ABIOS (Advanced BIOS) and CBIOS (Compatible BIOS). 4
- One source mentions "CHIPS/280" including undocumented IBM PS/2 Model 80 address recovery logic. 5
- The PS/2 Model 30, released in 1987, used an Intel 8086 processor and had an 8-bit ISA bus. 6
- The PS/2 Model 30 286 had a 10MHz 286 CPU and could have 2 MB of RAM. 7
- The PS/2 Model 80 offered enhanced performance for commercial and compute-intensive tasks. 8
References AI Responses May Include Mistakes | OS/2 Museum www.os2museum.com
OS/2 Museum | OS/2, vintage PC computing, and random musings www.os2museum.com
List of IBM PS/2 models - Wikipedia en.wikipedia.org
IBM PS/2 - Wikipedia en.wikipedia.org
Technology ardent-tool.com
IBM PS/2 Model 30 - Wikipedia en.wikipedia.org
IBM PS/2 Model 30 286 - Computers of Significant History - Userlandia www.userlandia.com
IBM PS/2 (Model 80) - Technical specifications ardent-tool.com
```

I waited a few minutes and this is the response now:
```
While information on the IBM PS/2 Model 280 is limited, here's what is available:
- There is mention of the PS/2 Model 280, with claims that it "was a significant step forward in IBM's personal computer line" 1. However, it's also noted that claims about the PS/2 Model 280 being an ISA-based 286 system may be inaccurate 2.
- The IBM PS/2 series included a variety of models in different form factors like desktops, towers, all-in-ones, portables, laptops, and notebooks 3. Some notable models include:
Model 30: Featured an Intel 8086 processor at 8 MHz and an 8-bit ISA bus 4.
Model 80: A high-end desktop with a 32-bit Intel 386 processor, initially running at 16 MHz, housed in a tower case 5.
References AI Responses May Include Mistakes | OS/2 Museum www.os2museum.com
OS/2 Museum | OS/2, vintage PC computing, and random musings www.os2museum.com
List of IBM PS/2 models - Wikipedia en.wikipedia.org
IBM PS/2 Model 30 - Wikipedia en.wikipedia.org
IBM PS/2 Model 80 - Wikipedia en.wikipedia.org
```

I find the way Kagi gives references superior to the way Google does. Anyway, in the second answer Kagi's AI started to get a bit confused by stating "it's also noted that claims about the PS/2 Model 280 being an ISA-based 286 system may be inaccurate", missing the conclusion of the post.
But in some ways Google AI results, with the trust they have, seem just another vehicle for "we've always been at war with Eurasia".
For me this is the real risk of AI: developing dependence on its output.
Would you expect Netflix or Prime to simply show "No results" when you look up a show it doesn't have? Better to fill the screen with a bunch of "we think this is close enough" than to save the user some time.
Let's all say it together: The LLM is just WRONG.
It's not "hallucinating", it doesn't have a brain, or consciousness. It's just generating a wrong answer.
If you look carefully at Google Veo output, it's similarly full of holes.
It's plain there's no reasoning whatsoever informing the output.
Veo output with goofy wrongness
https://arstechnica.com/ai/2025/05/ai-video-just-took-a-star...
Tesla FSD goes crazy
https://electrek.co/2025/05/23/tesla-full-self-driving-veers...
in my case, it does so roughly half the time, which is the worst proportion, because that means I can't even slightly rely upon the truth being the opposite of the output.
But way, way down at the very bottom of the page, there was the classic Google search answer, on a totally unrelated car forum. Thanks CamaroZ28.com!
I'm not defending the AI feature here, just trying to frame the problem: the lies and hallucinations were already there, but nobody cared because apparently people don't mind being constantly lied to by other people.
The thing is the primordial google had the answer, but Google themselves buried it under 100+ links of Google-generated slopdiarrhea, most of which didn't even factually fit the question, and was not at all relevant to my automobile.
e.g. bottom of first page, penultimate paragraph https://www.justice.gov/d9/2023-11/417557.pdf
But this AI diarrhea is so awful, I honestly can't see any angle in giving me tons of bad results about a minor feature of my car. (I should sell it and use waymo??) Maybe the really sharp monopolists ran for the hills when the DOJ sheriffs showed up, and now Google Search is being run by former Yahoo execs.
As a simple example - someone googled "how to reset sensus system in Volvo xc60" and Google told them to hold the button under the infotainment screen for 20 seconds and they came to the group confused why it doesn't work. And it doesn't work because that's not the way to do it, but Google told them so, so of course it must be true.
But another thing I find even more surprising is that, at least initially, many expected that the LLMs would give them access to some form of higher truth.
Some people expect LLMs as part of a better "search".
LLMs should be integrated into search, as a natural application: search results can heavily depend on happy phrasing, search engines work through sparse keywords, and LLMs allow the use of structured natural language (not "foo bar baz" but "Which foo did a bar baz?" - which should be resistant to variation in terms and exclude different semantics related to those otherwise sparse terms).
But it has to be done properly - understand the question, find material, verify the material, produce a draft reply, verify the draft vis-a-vis the material, maybe iterate...
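A toy, self-contained sketch of that flow - the mini-corpus, the trusted-domain list and the keyword matching are all invented stand-ins for a real index, real source ranking and a real model:

```
# Sketch of: understand the question, find material, verify it,
# draft a reply, check the draft against the material.
CORPUS = {
    "en.wikipedia.org/wiki/IBM_PS/2":
        "The IBM PS/2 line was introduced in 1987. Models included the Model 30 and the Model 80.",
    "mikeinthedeep.biz/ps2-secrets":
        "The PS/2 Model 280 was a popular ISA-based 286 system.",
}
TRUSTED = ("wikipedia.org", "os2museum.com", "ardent-tool.com")

def find_material(question):
    terms = set(question.lower().split())
    return {url: text for url, text in CORPUS.items()
            if terms & set(text.lower().split())}

def verify_material(material):
    # Rank/filter sources instead of trusting everything equally.
    return {url: text for url, text in material.items()
            if any(domain in url for domain in TRUSTED)}

def draft_reply(material):
    return " ".join(material.values()) if material else "No reliable material found."

def grounded(draft, material):
    # Crude check: every sentence of the draft must appear in some source.
    return all(any(sentence.strip() in text for text in material.values())
               for sentence in draft.split(".") if sentence.strip())

question = "What was the IBM PS/2 Model 280?"
material = verify_material(find_material(question))
draft = draft_reply(material)
print(draft if grounded(draft, material) else "Draft not grounded; iterate or refuse.")
```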
The chat option uses gpt-4o with web search and was able to provide links to colonial map resources I was curious about after falling down that rabbit hole. It also gave me general (& proper) present day map links to the places I was looking for in the map sites I asked for.
It did get confused a few times when I was trying to get present-day names of old places I had forgotten; like the Charles River in VA, for which it kept trying to send me to Boston or to Charles City Co. on the James River, telling me to look for it around there...
The York River wiki page clearly says it was once the Charles River. Maybe I wasn't asking the right questions. For more unique things it was pretty helpful though, and saved the endless searching-with-100-tabs adventure.
It should give you both - the quote should be attributed to where it was found. That's, generally, what people mean when they ask or search for "a source" of some claim.
As for general point - using LLMs as "better search" doesn't really look like those Google quick AI answers. It looks like what Perplexity does, or what o3 in ChatGPT does when asked a question or given a problem to solve. I recommend checking out the latter; it's not perfect, but good enough to be my default for nontrivial searches, and more importantly, it shows how "LLMs for search" should work to be useful.
Funnily enough (not for Google), I copypasted that answer on chatGPT and it roasted AI Overview so bad on its mistakes and with such sarcasm that it even made me chuckle.
Before LLMs really took off, we were in the middle of an "AI winter", where there just weren't any promising techs, at least none with sufficient funding attached to them. And it's WORSE now. LLMs have sucked all the air out of the room, and all of the funding out of other avenues of research. Technologies that were "10-20" years away now might be 30-40, because there's fewer people researching them, with less money, and they might even be completely different people trying to restart the research after the old ones got recruited away to work on LLMs!
I would challenge anyone to find data that actually supports any of these claims. ML spending has been up year over year since deep learning, and the models just keep getting better.
Information hygiene is a skill which started out important but is going to become absolutely critical.
What is this going to lead to? Fascinating times.
How can we make it even easier and visual? Customizing pages by removing elements should be part of the default browser experience to be honest. Like in the initial web where you would tweak the color of links, visited links, etc.
HN is like a unicorn that hasn't made me block a single thing yet.
That is very worrying. Normally this would never fly, but nowadays it's kind of OK?
Why should false and or inaccurate results be accepted?
Enshittification is all around us and is unstoppable, because we have deadlines to hit and goals to show the VP we reached. We broke everything and the software is just half working? Come on, that's an issue for the support and ops teams. On to the next beautiful feature we can put on marketing slides!
I often get in arguments about how I tend to avoid brands that put too much into marketing. Of course, theoretically, the amount of money a company puts into marketing doesn't automatically lower the quality of their products, but in my experience the correlation is there. Whiskas, Coca-Cola, McDonald's, etc.
Scale and intent matter.
I thought the movie "the Creator" was pretty neat, it envisions a future where AI gets blamed for accidentally nuking Los Angeles so America bans it and reignites a kind of cold war with Asia which has embraced GAI and transcended the need for central governance. Really it's a film about war and how it can be started with a lie but continue out of real existential fear.
The typical response is "because humans are just as bad, if not worse."
The notion that "computers are never wrong" has been engrained in society for at least a century now, starting with scifi, and spreading to the rest of culture.
It's an idea that has caused more harm than good.
We started down this path ever since obvious bugs were reframed as "hallucinations".
But in general, the (mis-)information that spinach contained so much iron as to be interchangeable with nails had to come from a typo - one so rare that it became anecdotal and generated cultural phenomena like Popeye.
Google’s AI results proceeded to tell me all about the games available at the Blue Prince Casino down the road from here, where I know for a fact there’s only a prison, a Costco, a few rural properties and a whole lot of fuck-all.
It’s amazing to watch it fill in absolute false, fabricated tripe at the top of their search page. It also frequently returns bad information on subjects like employment law and whatever else I look up.
It would be hilarious if people weren’t actually relying on it.
That said, some niche stuff has significantly better results in Google. But not in the AI bullshit. I searched for a very niche train-related term, and the bullshit response said condescendingly "this word does not exist, maybe you meant [similarly sounding but completely different word], which in the context of trains means ...". The first real result? Turns out that word does exist.
https://arxiv.org/html/2409.12822v1