Imustaskforhelp•1h ago
I found this fascinating. I had assumed these "how many e's are in the word strawberry" problems were fixed, but as this video shows, perhaps that exact question simply ended up in the training data, so even a slight variation, like asking about "seventeen", makes the model fumble. I had thought this was a solved issue, but it isn't, which was fascinating enough that I had to test it myself, and for the most part I got the same result: the AI still hallucinates.
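For reference, the ground truth is trivial to check outside the model; a minimal Python sketch, using just the word/letter pairs mentioned in this thread:

    # Ground-truth letter counts for the examples in this thread.
    for word, letter in [("strawberry", "r"), ("seventeen", "e")]:
        print(f"{word!r} has {word.count(letter)} occurrence(s) of {letter!r}")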
CamperBob2•12m ago
The Qwen 3.6 27B 8-bit quant has no problem with it. I'd guess that most thinking models won't fail this kind of test anymore, while some base or instruct models that are not post-trained for reasoning will still fail it.
I also can't reproduce it in ChatGPT 5.3 Instant with auto-thinking disabled. Solved problem, as far as I'm concerned. Maybe this particular case was a bug in the voice model, or just some BS the YouTuber made up for clicks. (Notice that we never actually see the answer in text form.) Mission accomplished, I guess.
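A quick way to spot-check this across models is to script the question against an OpenAI-compatible chat endpoint. A rough sketch, where the base URL and model name are placeholders rather than the specific models mentioned above:

    # Hypothetical spot-check; base_url and model are placeholders.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

    question = "How many e's are in the word 'seventeen'? Reply with just the number."
    resp = client.chat.completions.create(
        model="my-local-model",  # placeholder model name
        messages=[{"role": "user", "content": question}],
    )
    print(resp.choices[0].message.content.strip(), "(ground truth: 3)")

Text-mode checks like this also sidestep the voice-model ambiguity noted above, since the answer arrives in writing.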
wvbdmp•1m ago
There is an old German joke from the comedy “Die Feuerzangenbowle”, in which someone gives his name as “Pfeiffer, with three f’s”. It’s funny because it is robotically hypercorrect: everybody knows the only necessary clarification is “Pfeiffer” vs. “Pfeifer”, double f vs. single f. So in a way, “there are two r’s in strawberry” is a very human “mistake”, because in any normal situation the asker is clearly interested in the “berry” part. This weird sycophancy, however, is entirely preposterous, and hopefully just an artifact of some deliberate “the customer is always right” policy corporate tacked on, rather than a fundamental limitation of the technology.