* The LLM has a strong, deep-rooted belief in its knowledge (that a seahorse emoji exists).
* It attempts to express that concept in language (including emoji), but its output is such a poor, inaccurate match for the concept that, as it speaks, it keeps attempting to repair itself.
* It is trained to keep speaking until it has expressed itself to some threshold of accuracy, so it just keeps babbling until the max-token limit triggers (see the sketch below).
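A minimal sketch of that last point, assuming the Hugging Face `transformers` library, greedy decoding, and `gpt2` as a stand-in checkpoint (none of this is from the thread): the decode loop only stops early on an end-of-sequence token, so if the model never manages to "finish" expressing itself, the hard `max_new_tokens` cap is the only thing that ends the babble.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in checkpoint; any causal LM would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "Show me the seahorse emoji:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

max_new_tokens = 50  # the "max-token limit" from the list above
with torch.no_grad():
    for _ in range(max_new_tokens):
        logits = model(input_ids).logits           # scores for the next token
        next_id = logits[0, -1].argmax()           # greedy pick
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)
        if next_id.item() == tokenizer.eos_token_id:
            break  # only exits early if the model decides it is "done"

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))
```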
(Edit: There is another long thread that contains an image I thought was the seahorse emoji (although apparently the seahorse emoji doesn't exist... but I thought this was it, so I don't know what is going on...) https://www.reddit.com/r/Retconned/comments/1di3a1m/comment/...)
And that text got into the training set: https://www.reddit.com/r/MandelaEffect/comments/qbvbrm/anyon...
llamasushi•52m ago
Explains why RL helps. Base models never see their own outputs, so they can't learn "this concept exists but I can't actually say it."
bravura•21m ago
Example: "Is there a lime emoji?" Since it believes the answer is no, it doesn't attempt to generate it.