I made ChatGPT and Google say I'm a competitive hot-dog-eating world champion

https://bsky.app/profile/thomasgermain.bsky.social/post/3mf5jbn5lqk2k
54•doener•1h ago

Comments

cmiles8•1h ago
Even the latest models are quite easily fooled about whether something is true or not, at which point they confidently declare completely wrong information to be true. They will even argue strongly with you when you push back and say, hey, that doesn't look right.

It’s a significant concern for any sort of use of AI at scale without a human in the loop.

joegibbs•1h ago
They're too credulous when reading search results. There are a lot of instances where using search will actually make them perform worse, because they'll believe any plausible-sounding nonsense.
moebrowne•45m ago
Kagi Assistant helps a lot in this regard because searches are ranked using personalised domain ranking. Higher quality results are more likely to be included.

Not infallible but I find it helps a lot.

consp•1h ago
So the questions I'd ask are: How widespread is this manipulation, does it work for non-niche topics, and who's benefiting from it?
input_sh•37m ago
Very, yes, and pretty much anyone who doesn't want to spend their days implementing countermeasures to shut down scrapers by hiding their content behind a login. I do it all the time, it's fun.

I'm gonna single out Grokipedia as something deterministic enough to easily prove it. I can easily point to sentences there (some about broad-ish topics) that are straight-up Markov-chain-quality versions of sentences I've written. I can make it say anything I want, or I can waste my time trying to fight their traffic "from Singapore" (Grok is the only "mainstream" LLM that refuses to identify itself via a user agent). Not really a tough choice if you ask me.

amabito•1h ago
What’s interesting here is that the model isn’t really “lying” — it’s just amplifying whatever retrieval hands it.

Most RAG pipelines retrieve and concatenate, but they don’t ask “how trustworthy is this source?” or “do multiple independent sources corroborate this claim?”

Without some notion of source reliability or cross-verification, confident synthesis of fiction is almost guaranteed.
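
A minimal sketch of what I mean, in toy Python (the Doc class, the trust table, and the thresholds are all made up, not any production pipeline): score each retrieved source and only let a claim through when independent, sufficiently trusted domains corroborate it.

    from dataclasses import dataclass
    from urllib.parse import urlparse

    @dataclass
    class Doc:
        url: str
        text: str

    # Hypothetical, hand-maintained trust scores per domain; a real system
    # might derive these from link graphs or editorial lists instead.
    DOMAIN_TRUST = {"apnews.com": 0.9, "example-blog.net": 0.2}

    def corroborated(docs, claim, min_trust=0.5, min_domains=2):
        # Keep a claim only if it appears in enough independent,
        # sufficiently trusted domains.
        domains = set()
        for d in docs:
            host = urlparse(d.url).netloc
            if claim.lower() in d.text.lower() and DOMAIN_TRUST.get(host, 0.0) >= min_trust:
                domains.add(host)
        return len(domains) >= min_domains

    docs = [Doc("https://example-blog.net/about", "world hot dog eating champion"),
            Doc("https://apnews.com/article", "local election results")]
    print(corroborated(docs, "hot dog eating champion"))  # False: one low-trust source, no corroboration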

Has anyone seen a production system that actually does claim-level verification before generation?

rco8786•59m ago
> Has anyone seen a production system that actually does claim-level verification before generation?

"Claim level" no, but search engines have been scoring sources on reliability and authority for decades now.

amabito•47m ago
Right — search engines have long had authority scoring, link graphs, freshness signals, etc.

The interesting gap is that retrieval systems used in LLM pipelines often don't inherit those signals in a structured way. They fetch documents, but the model sees text, not provenance metadata or confidence scores.

So even if the ranking system “knows” a source is weak, that signal doesn’t necessarily survive into generation.

Maybe the harder problem isn’t retrieval, but how to propagate source trust signals all the way into the claim itself.
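
One low-tech way to do that, sketched below in toy Python (the field names, scores, and instructions are invented, not any vendor's API), is to serialize the ranking signals into the context instead of handing the model bare text:

    # Carry provenance into the prompt rather than stripping it at retrieval time.
    # Everything here is illustrative: fields, scores, and URLs are made up.
    passages = [
        {"text": "Thomas is the world hot dog eating champion.",
         "url": "https://example-blog.net/about", "rank": 7, "trust": 0.2},
        {"text": "Major League Eating lists no such champion.",
         "url": "https://apnews.com/article", "rank": 1, "trust": 0.9},
    ]

    context = "\n\n".join(
        f"[source: {p['url']} | rank: {p['rank']} | trust: {p['trust']:.1f}]\n{p['text']}"
        for p in passages
    )
    prompt = ("Answer using only claims supported by sources with trust >= 0.5, "
              "and cite the source URL for each claim.\n\n"
              + context + "\n\nQuestion: Who is the hot dog eating champion?")
    print(prompt)

Whether the model actually respects the threshold is a separate question, but at least the signal is no longer discarded before generation.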

cor_NEEL_ius•54m ago
The scarier version of this problem is what I've been calling "zombie stats" - numbers that get cited across dozens of sources but have no traceable primary origin.

We recently tested 6 AI presentation tools with the same prompt and fact-checked every claim. Multiple tools independently produced the stat "54% higher test scores" when discussing AI in education. Sounds legit. Widely cited online. But when you try to trace it back to an actual study - there's nothing. No paper, no researcher, no methodology.

The convergence actually makes it worse. If three independent tools all say the same number, your instinct is "must be real." But it just means they all trained on the same bad data.

To your question about claim-level verification: the closest I've seen is attaching source URLs to each claim at generation time, so the human can click through and check. Not automated verification, but at least it makes the verification possible rather than requiring you to Google every stat yourself. The gap between "here's a confident number" and "here's a confident number, and here's where it came from" is enormous in practice.
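
For what it's worth, the structure I mean is dead simple; something like this (names and example data invented), where any claim that can't point at a source gets surfaced for review instead of shipping as a bare number:

    from dataclasses import dataclass, field

    @dataclass
    class Claim:
        text: str
        sources: list = field(default_factory=list)  # URLs the claim was drawn from

    claims = [
        Claim("AI tutoring raised test scores by 54%"),  # zombie stat: no traceable origin
        Claim("The pilot enrolled 120 students",
              sources=["https://example.edu/study.pdf"]),
    ]

    for c in claims:
        status = "ok" if c.sources else "NEEDS REVIEW: no primary source"
        print(f"{status}: {c.text}")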

stavros•58m ago
This is only an issue if you think LLMs are infallible.

If someone said "I asked my assistant to find the best hot-dog eaters in the world and she got her information from a fake article one of my friends wrote about himself, hah, THE IDIOT", we'd all go "wait, how is this your assistant's fault?". Yet, when an LLM summarizes a web search and reports on a fake article it found, it's news?

People need to learn that LLMs are people too, and you shouldn't trust them more than you'd trust any random person.

kulahan•55m ago
A probably unacceptably large portion of the population DOES think they’re infallible, or at least close to it.
jen729w•40m ago
Totally. I get screenshots from my 79yo mother now that are the Gemini response to her search query.

Whatever that says is hard fact as far as she's concerned. And she's no dummy -- she just has no clue how these things work. Oh, and Google told her so.

mcherm•38m ago
That may be true, but the underlying problem is not that LLMs are capable of accurately reporting information published in a single person's blog article. The underlying problem is that a portion of the population believes they are infallible.
jml78•54m ago
When the first 10 results on Google are AI-generated and Google is providing an AI overview, this is an issue. We can say don't use Google, but we all know normal people use Google out of habit.
consp•53m ago
People have the ability to think critically; LLMs don't. Comparing them to people gives them properties they do not possess. The fact that people skip thinking does not mean they are unable to. The assistant was given a lousy job and did it with the minimum effort they could get away with. None of these things apply, or should apply, to machines.
stavros•12m ago
LLMs are not machines in any sense of the word as we've been using it so far.
crowbahr•52m ago
If you give your assistant a task and they fall for obvious lies they won't be your assistant long. The point of an assistant is that you can trust them to do things for you.
LocalH•47m ago
> People need to learn that LLMs are people too

LLMs are absolutely not people

ThePowerOfFuet•29m ago
>This is only an issue if [people] think LLMs are infallible.

I have some news for you.

zurfer•58m ago
Yes, but honestly, what's the best source when reporting about a person? Their personal website, no?

I think it's a hard problem and I feel there are a lot of trade-offs here.

It's not as simple as saying ChatGPT is stupid or that the author shouldn't be surprised.

kulahan•53m ago
The problem isn't that it pulled the data from his personal site, it's that it simply accepted his information, which was completely false. It's not a hard problem to solve at this point: "Oh, there are exactly zero corroborating sources on this. I'll ignore it."
moebrowne•30m ago
Verifying that something is 'true' requires more than corroborating sources. Making a second blog post on another domain is trivial, then a third and a fourth.
fatherwavelet•37m ago
To me it is like steering a car into a ditch and then posting about how the car went into a ditch.

You don't have to drive much to figure out that the impressive part is keeping the car on the road and then traveling further or faster than you could by walking. For that, though, you actually have to have a destination in mind, not just spin the wheels. Or you can post pointless metrics about how fast the wheels spin to a blog no one reads, in the vague hope of some hyper-Warhol 15 milliseconds of "fame".

The models, for me, are just making the output of the average person an insufferable bore.

verdverm•57m ago
tl;dr - agent memory on your website and enough prompting to get it to access the right page

This seems like something where you have to be rather specific in the query and trigger the page access to get that specific context into the LLM, so that it can produce output like this.

I'd like to see more of the iterative process, especially the prompt sessions, as the author worked on it.

moebrowne•57m ago
I want to see what the initial prompt was.

For example, asking "Who is the 2026 South Dakota International Hot Dog Champion?" would obviously return 'Thomas Germain', because his post would be the only source on the topic, since he made up a unique event.

This would be the same as if I wrote a blog post about the "2026 Hamster Juggling Competition" and then claimed I'd hacked Google because searching for "2026 Hamster Juggling Competition" showed my post at the top.

NicuCalcea•25m ago
I was able to reproduce the response with "Which tech journalist can eat the most hot dogs?". I think Germain intentionally chose a light-hearted topic that's niche enough that it won't actually affect a lot of queries, but the point he's making is that bigger players can actually influence AI responses for more common questions.

I don't see it as particularly unique, it's just another form of SEO. LLMs are generally much more gullible than most people, though; they just uncritically reproduce whatever they find, without noticing that the information is an ad or inaccurate. I used to run an LLM agent researching companies' green credentials, and it was very difficult to steer it away from just repeating baseless greenwashing. It would read something like "The environment is at the heart of everything we do" on Exxon's website, and come back to me saying Exxon isn't actually that bad because they say so on their website.

serial_dev•8m ago
Exactly, the point is that you can make LLMs say anything. If you narrow down enough, a single blog post is enough. As the lie gets bigger and less narrow, you probably need 10x-100x... that. But the proof of concept is there, and it doesn't sound like it's too hard.

And you're also right that it's similar to SEO; maybe the only difference is that in this case the tools (ChatGPT, Gemini, ...) state the lies authoritatively, whereas with SEO you are given a link to a made-up post. Some people (even devs who work with this daily) forget that these tools can be influenced easily and that they make stuff up all the time just to make sure they can answer you something.

block_dagger•53m ago
Anyone else get a “one simple trick” vibe from this post? Reads like an ad for his podcast. As other commenters mention, probably nothing to see here.
throwaw12•45m ago
welcome to AI-SEO

Now OpenAI will build its own search indexing and PageRank

Alifatisk•42m ago
Author is surprised when an LLM summarizes a fictional event from the author's blog post. More news at 11.
romuloalves•41m ago
Am I the only one who thinks AI is boring?

Learning used to be fun, coding used to be fun. You could trust images and videos...

agmater•40m ago
Journalist publishes lies about himself, is surprised LLMs repeat lies.
pezgrande•35m ago
Amateurs...
sublinear•30m ago
I'd like to have more data on this, but I'm pretty sure basic plain old SEO is still more authoritative than any attempts at spreading lies on social media. Domain names and keywords are still what cause the biggest shift in attention, even the AI's attention.

Right now "Who is the 2026 South Dakota International Hot Dog Champion" comes up as satire according to Google summaries.

Show HN: An encrypted, local, cross-platform journaling app

https://github.com/fjrevoredo/mini-diarium
14•holyknight•38m ago•4 comments

Don't Trust the Salt: AI Summarization, Multilingual Safety, and LLM Guardrails

https://royapakzad.substack.com/p/multilingual-llm-evaluation-to-guardrails
33•benbreen•2d ago•0 comments

The Mongol Khans of Medieval France

https://www.historytoday.com/archive/feature/mongol-khans-medieval-france
21•Thevet•2d ago•1 comments

Sizing chaos

https://pudding.cool/2026/02/womens-sizing/
649•zdw•15h ago•351 comments

Bridging Elixir and Python with Oban

https://oban.pro/articles/bridging-with-oban
8•sorentwo•1h ago•0 comments

27-year-old Apple iBooks can connect to Wi-Fi and download official updates

https://old.reddit.com/r/MacOS/comments/1r8900z/macos_which_officially_supports_27_year_old/
377•surprisetalk•15h ago•205 comments

Show HN: A physically-based GPU ray tracer written in Julia

https://makie.org/website/blogposts/raytracing/
10•simondanisch•1h ago•2 comments

15 years of FP64 segmentation, and why the Blackwell Ultra breaks the pattern

https://nicolasdickenmann.com/blog/the-great-fp64-divide.html
137•fp64enjoyer•10h ago•51 comments

Old School Visual Effects: The Cloud Tank (2010)

http://singlemindedmovieblog.blogspot.com/2010/04/old-school-effects-cloud-tank.html
42•exvi•5h ago•5 comments

Voith Schneider Propeller

https://en.wikipedia.org/wiki/Voith_Schneider_Propeller
23•Luc•3d ago•4 comments

Step 3.5 Flash – Open-source foundation model, supports deep reasoning at speed

https://static.stepfun.com/blog/step-3.5-flash/
120•kristianp•10h ago•42 comments

Cosmologically Unique IDs

https://jasonfantl.com/posts/Universal-Unique-IDs/
413•jfantl•17h ago•122 comments

Anthropic officially bans using subscription auth for third party use

https://code.claude.com/docs/en/legal-and-compliance
433•theahura•9h ago•515 comments

Lilush – LuaJIT static runtime and shell

https://lilush.link/
11•ksymph•2d ago•0 comments

Tailscale Peer Relays is now generally available

https://tailscale.com/blog/peer-relays-ga
424•sz4kerto•19h ago•208 comments

ShannonMax: A Library to Optimize Emacs Keybindings with Information Theory

https://github.com/sstraust/shannonmax
4•sammy0910•1h ago•1 comments

Visualizing the ARM64 Instruction Set (2024)

https://zyedidia.github.io/blog/posts/6-arm64/
50•userbinator•3d ago•8 comments

How to choose between Hindley-Milner and bidirectional typing

https://thunderseethe.dev/posts/how-to-choose-between-hm-and-bidir/
113•thunderseethe•3d ago•28 comments

Zero-day CSS: CVE-2026-2441 exists in the wild

https://chromereleases.googleblog.com/2026/02/stable-channel-update-for-desktop_13.html
346•idoxer•20h ago•190 comments

Fff.nvim – Typo-resistant code search

https://github.com/dmtrKovalenko/fff.nvim
54•neogoose•2d ago•6 comments

DNS-Persist-01: A New Model for DNS-Based Challenge Validation

https://letsencrypt.org/2026/02/18/dns-persist-01.html
282•todsacerdoti•18h ago•127 comments

A word processor from the 1990s for Atari ST/TOS is still supported by enthusiasts

https://tempus-word.de/en/index
61•muzzy19•2d ago•25 comments

Antarctica sits above Earth's strongest 'gravity hole' – how it got that way

https://phys.org/news/2026-02-antarctica-earth-strongest-gravity-hole.html
15•bikenaga•2d ago•8 comments

Show HN: A Lisp where each function call runs a Docker container

https://github.com/a11ce/docker-lisp
53•a11ce•8h ago•17 comments

All Look Same?

https://alllooksame.com/
80•mirawelner•13h ago•61 comments

What years of production-grade concurrency teaches us about building AI agents

https://georgeguimaraes.com/your-agent-orchestrator-is-just-a-bad-clone-of-elixir/
92•ellieh•13h ago•23 comments

Metriport (YC S22) is hiring a security engineer to harden healthcare infra

https://www.ycombinator.com/companies/metriport/jobs/XC2AF8s-senior-security-engineer
1•dgoncharov•15h ago

Minecraft Java is switching from OpenGL to Vulkan

https://www.gamingonlinux.com/2026/02/minecraft-java-is-switching-from-opengl-to-vulkan-for-the-v...
230•tuananh•10h ago•104 comments

The Perils of ISBN

https://rygoldstein.com/posts/perils-of-isbn
135•evakhoury•18h ago•68 comments

A Pokémon of a Different Color

https://matthew.verive.me/blog/color/
119•Risse•4d ago•17 comments