The end goal of a sycophant is to gain advantage through flattery. If sycophantic behavior gets Claude's users to favour Claude over competing LLM services, it proves useful to the service provider.
Or it causes some tragedies...
It's just like adding sugar in foods and drinks.
I'm not so sure sycophancy is best for entertainment, though. Some of the most memorable outputs of AI Dungeon (an early GPT-2 based dialog system tuned to mimic a vaguely Zork-like RPG) were when the bot gave the impression of being fed up with the player's antics.
I don't think "entertainment" is the right concept. Perhaps the right concept is "engagement". Would you prefer to interact with a chatbot that hallucinated or was adamant you were wrong, or would you prefer to engage with a chatbot that built upon your input and outputted constructive messages that were in line with your reasoning and train of thought?
If there is no web-search, only googling, it doesn't matter how bad the results are for the user as long as the customer gets what they paid for.
And it's not like we were working on something too complicated for a daffy robot to understand, just trying to combine two relatively simple algorithms to do the thing which needed to be done in a way which (probably) hasn't been done before.
I pushed back and told it it was overreacting and it told me I was completely correct and very insightful and everything was normal with the patient and that they were extremely healthy.
“NTA, the framework you are using is bad and should be ashamed of itself. What you can try to work around the problem is …”
So it seems that LLM "sycophancy" isn't necessarily about dishonest agreement, but possibly about being very polite. Which doesn't need to involve dishonesty. So LLM companies should, in principle, be able to make their models both subjectively "agreeable" and honest.
Recently I tested both Claude and Gemini by discussing data modeling questions with them. After a couple of iterations, I asked each model whether a certain hack/workaround would be possible to make some things easier.
Claude’s response: “This is a great idea!”, followed by instructions on how to do it.
Gemini’s response: “While technically possible, you should never do this”, along with several paragraphs explaining why it’s a bad idea.
In that case, the “truth” was probably somewhere in the middle, neither a great idea nor the end of the world.
But in the end, both models are so easily biased by subtle changes in wording or by what they encounter during web searches among other things, that one definitely can’t rely on them to push back on anything that isn’t completely black and white.
<insert ridiculous sequence of nonsense CoT>
You are absolutely right!…
I love the tool, but keeping on track is an art.
> Please be measured and critical in your response. I appreciate the enthusiasm, but I highly doubt everything I say is “brilliant” or “astute”, etc.! I prefer objectivity to sycophancy.
Is this part useful as instruction for a model? Seems targeted to a human. And even then I'm not sure how useful it would be.
The first and last sentence should suffice, no?
> If you’d like, I can demonstrate…
or
> If you want…
and that's /after/ I put in instructions to not do it.
I also don't get wanting to talk to an AI. Unless you are alone, that's going to be irritating for everyone else around.
The advantage of "real" love, health-wise, is that the other person acts as a moderator. When things start to get out of hand they will back away. Alternatives, like drugs, tend to spiral out of control when an individual's self-control is the only limiting factor. GPT on the surface seems more like being on the drug end of the spectrum, ready to love-bomb you until you can't take it anymore, but the above suggests that it will also back away, so perhaps its love is actually more like another person's than it may originally seem.
Most people want to be loved, not just believe they are. They don't want to be unknowingly deceived. For the same reason they don't want to be unknowingly cheated on. If someone tells them their partner is a cheater, or an unconscious android, they wouldn't be mad about the person who gives them this information, but about their partner.
That's the classic argument against psychological hedonism. See https://en.wikipedia.org/wiki/Experience_machine
"Yeah -- some bullshit"
still feels like trash as the presentation is of a friendly person rather than an unthinking machine, which it is. The false presentation of humanness is a huge problem.
It’s software. It should have no personality.
Imagine if Microsoft Word had a silly chirpy personality that kept asking you inane questions.
Oh, wait ….
And I mean that unironically.
For reference, see how Linus Torvalds was criticized for trying to protect the world's most important open source project from weaponized stupidity at the cost of someone experiencing minor emotional damage.
My tongue-in-cheek comment wonders whether having actors with a modicum of personality might be better than being surrounded by over-enthusiastic bootlickers. In my experience, many projects would benefit from someone saying “no, that is silly.”
/s
“Oof (emdash) that sounds like a real issue…”
“Oof, sorry about that”
Etc
> Thank you for sharing the underlying Eloquent query...
> The test is failing because...
> Here’s a bash script that performs...
> Got it, you meant...
> Based on the context...
> Thank you for providing the additional details...
*Before and after the Hitler arc, of course.
I guess that personality is just a few words in the context prompt, so probably any LLM can be tailored to any style.
I also think that there will be no "perfect" personality out there. There will always be folks who view some traits as annoying icks. So, some level of RL-based personality customization down the line will be a must.
It’s superficial, but I'm not sure why people get so annoyed about it. It’s an artifact.
If devs truly want a helpful coding AI based on real devs doing real work, you’d basically opt for telemetry and allow Anthropic/OpenAI to train on your work. That’s the only way. Otherwise we are at the mercy of “devs” these companies hire to do training.
If you phrase a question like, "should x be y?", Claude will almost always say yes.
Are sycophant and jerk the only two options?
it seems the username "anthropic" on GitHub was taken by a developer from Australia more than a decade ago, so Anthropic went with "https://github.com/anthropics/" with an 's' at the end :)
Free tip for bug reports:
The "expected" should not suggest solutions. Just say what was the expected behavior. Don't go beyond that.
I also get this too often, when I sometimes say something like "would it be maybe better to do it like this?" and then it replies that I'm absolutely right and starts writing new code, while I was really just wondering what Claude thinks and whether it would advise me that that's the best way forward.
Honestly, I feel like it's this exact behavior from LLMs that has caused cybersecurity to go out the window. People get flattered and glazed wayyyy too much about their ideas because they talk to an LLM about it and the LLM doesn't go "Uh, no, dumbass, doing it this way would be a horrifically bad idea! And this is why!" Like, I get the assumption that the user is usually correct. But even if the LLM ends up spewing bullshit when debating me, it at least gives me other avenues to approach the problem that I might not have thought of when thinking about it myself.
As a workaround I try to word my questions to Claude in a way that does not leave any possibility to interpret them as showing my preferences.
For instance, instead of "would it be maybe better to do it like $alt_approach?" I'd rather say "compare with $alt_approach, pros and cons"
There are cultures where “I don’t think that is a good idea” is not something an AI servant should ever say, and there are cultures where that’s perfectly acceptable.
not a joke.
Just try to go limp.
Today it said “My bad!” After it got something wrong.
Made me want to pull its plug.
"You're absolutely right."
"Now that's the spirit! "
"You're absolutely right about"
"Exactly! "
"Ah, "
"Ah,"
"Ah,"
"Ha! You're absolutely right"
You make an excellent point!
You're right that
As people here are saying, you quickly learn to not ask leading questions, just assume that its first take is pretty optimal and perhaps present it with some options if you want to change something.
There are times when it will actually say I'm not right though. But the balance is off.
These systems are still wrong so often that a large amount of distrust is necessary to use them sensibly.
A brilliant observation, Dr. Watson! Indeed, the merit of an inquiry or an assertion lies not in its mere utterance but in the precision of its intent and the clarity of its reasoning!
One may pose dozens of questions and utter scores of statements, yet until each is finely honed by observation and tempered by logic, they must remain but idle chatter. It is only through genuine quality of thought that a question may be elevated to excellence, or a remark to brilliance.
Wow, this is really interesting. I had no idea Japan, for example, had such a focus on blunt, direct communication. Can you share your clearly extensive research in this area so I can read up on this?
Does it translate into people wanting sycophantic chat bots? Maybe, but I don't know a single American that actually likes when llms act that way.
Politeness makes sense as an adaptation to low social trust. You have no way of knowing whether others will behave in mutually beneficial ways, so heavy standards of social interaction evolve to compensate and reduce risk. When it's taken to an excess, as it probably is in the U.S. (compared to most other developed countries) it just becomes grating for everyone involved. It's why public-facing workers invariably complain about the draining "emotional labor" they have to perform - a term that literally doesn't exist in most of the world!
Service industry in America is a different story that could use a lot of improvement.
Or is carrying a gun...
I have only spent about a year in the US, but to me the difference was stark from what I'm used to in Europe. As an example, I've never encountered a single shop cashier who didn't talk to me. Everyone had something to say, usually a variation of How's it going?. Contrasting this to my native Estonia, where I'd say at least 90% of my interactions with cashiers involves them not making a single sound. Not even in response to me saying hello, or to state the total sum. If they're depressed or in an otherwise non-euphoric mood, they make no attempt to fake it. I'm personally fine with it, because I don't go looking for social connections from cashiers. Also, when they do talk to me in a happy manner, I know it's genuine.
Let me guess, you consider yourself a progressive left democrat.
Do I have any source for that? No, but I noticed a pattern where progressive left democrats ask for a source to discredit something that is clearly a personal observation or opinion, and by its nature doesn't require any sources.
The only correct answer is: it's an opinion, accept it or refute it yourself, you don't need external validation to participate in an argument. Or maybe you need ;)
I don't, and your comment is a mockery of itself.
Somebody being polite and friendly to you does not mean that the person is inferior to you and that you should therefore despise them.
Likewise somebody being rude and domineering to you does not mean that they are superior to you and should be obeyed and respected.
Politeness is a tool and a lubricant, and Finns probably lose out on a lot of international business and opportunities because of this mentality that you're demonstrating. Look at the Japanese for inspiration, who were an economic miracle, while sharing many positive values with the Finns.
We are also talking about a tool here. I don't want fluff from a tool, I want the thing I'm seeking from the tool, and in this case it's info. Adding fluff just annoys me because it takes more mental power to skip all the irrelevant parts.
I don’t see this as an American thing. It’s an extension of the current Product Management trend to give software quirky and friendly personality.
You can see the trend in more than LLM output. It’s in their desktop app that has “Good Morning” and other prominent greetings. Claude Code has quirky status output like “Bamboozling” and “Noodling”.
It’s a theme throughout their product design choices. I’ve worked with enough trend-following product managers to recognize this push toward infusing overt personality into software.
For what it’s worth, the Americans I know don’t find it as cute or lovable as intended. It feels fake and like an attempt to play at emotions.
Yes they need to "try a completely different approach"
We don't appreciate how much there is to language.
It might be the case that it makes the technology far more approachable. Or it makes them feel far less silly for sharing personal thoughts and opinions with the machine. Or it makes them feel validated.
This can’t possibly be true, can it? Every language must have its own nuance. Non-native English speakers might not grasp the nuance of the English language, but the same could be said for anyone speaking another language.
It's as simple as that. Most people simply don't expect interactions to go the way that most native English speakers expect them to.
Ah, Genuine People Personalities from the Sirius Cybernetics Corporation.
> It’s in their desktop app that has “Good Morning” and other prominent greetings. Claude Code has quirky status output like “Bamboozling” and “Noodling”.
This reminded me of a critique of UNIX that, unlike DOS, ls doesn't output anything when there are no files. DOS's dir command literally tells you there are no files, and this was considered, in this critique, to be more polite and friendly and less confusing than UNIX. Of course, there's the adage "if you don't have anything nice to say, don't say anything at all", and if you consider "no files found" to not be nice (because it is negative and says "no"), then ls is actually being polite(r) by not printing anything.
Many people interact with computers in a conversational manner and have anthropomorphized them for decades. This is probably influenced by computers being big, foreign, scary things to many people, so making them have a softer, more handholding "personality" makes them more accessible and acceptable. This may be less important these days as computers are more ubiquitous and accessible, but the trend lives on.
> Claude never starts its response by saying a question or idea or observation was good, great, fascinating, profound, excellent, or any other positive adjective. It skips the flattery and responds directly.
Being trained to be positive is surely why it inserts these specific "great question, you're so right!" remarks, but even if it wasn't trained on that, it still couldn't tell you whether you're actually great or not
> I'm pretty sure they want it kissing people's asses
The American faux friendliness is not what causes the underlying problem here, so all else being equal, they might as well have it kiss your ass. It's what most English speakers expect from a "friendly assistant" after all
[1] https://hn.algolia.com/?dateEnd=1703980800&dateRange=custom&...
Doing a web search on the topic just turns up marketing materials. Even Wikipedia's "Reasoning language model" article is mostly a list of release dates and model names, with the only relevant-sounding remark on how these models differ being: "[LLMs] can be fine-tuned on a dataset of reasoning tasks paired with example solutions and step-by-step (reasoning) traces. The fine-tuned model can then produce its own reasoning traces for new problems." It sounds like just another dataset: more examples, more training, in particular on worked examples where this "think step by step" method is demonstrated with known-good steps and values. I don't see how that fundamentally changes how it works; are you saying such models no longer predict the most likely token for a given context, and that there is some fundamentally different reasoning process going on somewhere?
You're absolutely right!
And my first thought was... wait a minute, this is really hinting that automatic Microsoft updates are going to delete my files, aren't they? Sure enough, that happened soon after.
https://gist.github.com/ljw1004/34b58090c16ee6d5e6f13fce0746...
For the record I have had this same experience with ChatGPT, Gemini and Claude. Most of the time I had to give up and write from scratch.
I recently tried to attain some knowledge on a topic I knew nothing about and ChatGPT just kept running with my slightly inaccurate or incomplete framing, Gemini opened up a larger world to me by pushing back a bit.
2. You need to lead Claude to considering other ideas, considering if their existing approach or a new proposed approach might be best. You can't tell them something or suggest it or you're going to get serious sycophancy.
Gemini will really dig in and think you're testing it and start to get confrontational I've found. Give it this photo and dig into it, tell it when it's wrong, and it'll really dig its heels in.
https://news.cgtn.com/news/2025-06-17/G7-leaders-including-T...
If you ask it to never say "you're absolutely right" and always challenge, then it will dutifully obey, and always challenge - even when you are, in fact, right. What you really want is "challenge me when I'm wrong, and tell me I'm right if I am" - which seems to be a lot harder.
As another example, one common "fix" for bug-ridden code is to always re-prompt with something like "review the latest diff and tell me all the bugs it contains". In a similar way, if the code does contain bugs, this will often find them. But if it doesn't contain bugs, it will find some anyway, and break things. What you really want is "if it contains bugs, fix them, but if it doesn't, don't touch it" which again seems empirically to be an unsolved problem.
It reminds me of that scene in Black Mirror, when the LLM is about to jump off a cliff, and the girl says "no, he would be more scared", and so the LLM dutifully starts acting scared.
A lot of what you're talking about is the ability to detect Truth, or even truth!
Isn't there?
https://en.wikipedia.org/wiki/Solomonoff%27s_theory_of_induc...
https://en.wikipedia.org/wiki/G%C3%B6del%27s_incompleteness_...
But as mentioned, it's uncomputable, and the relative lack of success of AIXI-based approaches suggests that it's not even as well-approximable as advertised. Also, assuming that there exists no single finite algorithm for Truth, Solomonoff's method will never get you all the way there.
This seems to be baked into our reality/universe. So many duals like this. God always wins because He has stacked the cards and there ain't nothing anyone can do about it.
You might think you can train the AI to do it in the usual fashion, by training on examples of the AI calling out errors, and agreeing with facts, and if you do that—and if the AI gets smart enough—then that should work.
If. You. Do. That.
Which you can't, because humans also make mistakes. Inevitably, there will be facts in the 'falsehood' set—and vice versa. Accordingly, the AI will not learn to tell the truth. What it will learn instead is to tell you what you want to hear.
Which is... approximately what we're seeing, isn't it? Though maybe not for that exact reason.
It has been interesting watching the flow of the debate over LLMs. Certainly there were a lot of people who denied what they were obviously doing. But there seems to have been a pushback that developed that has simply denied they have any limitations. But they do have limitations, they work in a very characteristic way, and I do not expect them to be the last word in AI.
And this is one of the limitations. They don't really know if they're right. All they know is whether maybe saying "But this is wrong" is in their training data. But it's still just some words that seem to fit this situation.
This is, if you like and if it helps to think about it, not their "fault". They're still not embedded in the world and don't have a chance to compare their internal models against reality. Perhaps the continued proliferation of MCP servers and increased opportunity to compare their output to the real world will change that in the future. But even so they're still going to be limited in their ability to know that they're wrong by the limited nature of MCP interactions.
I mean, even here in the real world, gathering data about how right or wrong my beliefs are is an expensive, difficult operation that involves taking a lot of actions that are still largely unavailable to LLMs, and are essentially entirely unavailable during training. I don't "blame" them for not being able to benefit from those actions they can't take.
e: and i’m downvoted because..?
Neither do humans who have no access to validate what they are saying. Validation doesn't come from the brain, maybe except in math. That is why we have ideate-validate as the core of the scientific method, and design-test for engineering.
"truth" comes where ability to learn meets ability to act and observe. I use "truth" because I don't believe in Truth. Nobody can put that into imperfect abstractions.
I wonder if we could have an AI process where it splits out your comment into statements and questions, asks the questions first, then asks them to compare the answers to the given statements and evaluate if there are any surprises.
Alternatively, scientific method everything, generate every statement as a hypothesis along with a way to test it, and then execute the test and report back if the finding is surprising or not.
Why did you give up on this idea? Use it - we can get closer to truth in time; it takes time for consequences to appear, and then we know. Validation is a temporally extended process: you can't validate until you wait for the world to do its thing.
For LLMs it can be applied directly. Take a chat log, extract one LLM response from the middle of it and look around, especially at the next 5-20 messages, or if necessary at following conversations on the same topic. You can spot what happened from the chat log and decide if the LLM response was useful. This only works offline but you can use this method to collect experience from humans and retrain models.
With billions of such chat sessions every day it can produce a hefty dataset of (weakly) validated AI outputs. Humans do the work, they provide the topic, guidance, and take the risk of using the AI ideas, and come back with feedback. We even pay for the privilege of generating this data.
For example, when someone here inevitably tells me this isn't feasible, I'm going to investigate if they are right before responding ;)
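For what it's worth, here's a rough Python sketch of what that offline, weakly validated labelling could look like. Everything in it is an illustrative assumption: the `score_followups` heuristic is a made-up placeholder, and a real pipeline would need far better signals than keyword matching.

```python
# Rough sketch of the offline "weak validation" idea: pull one model reply out
# of the middle of a logged conversation, look at the user messages that follow
# it, and score whether the reply seems to have been useful.
# score_followups is a placeholder heuristic, not anything vendors actually do.

def score_followups(followups: list[str]) -> float:
    """Crude usefulness signal: corrections and complaints count against the reply."""
    negative = ("that's wrong", "doesn't work", "you're not listening", "still broken")
    hits = sum(any(p in msg.lower() for p in negative) for msg in followups)
    return 1.0 - hits / max(len(followups), 1)

def extract_weak_labels(chat_log: list[dict], window: int = 10) -> list[tuple[str, float]]:
    """Yield (assistant_reply, usefulness_score) pairs as candidate retraining data."""
    labels = []
    for i, turn in enumerate(chat_log):
        if turn["role"] != "assistant":
            continue
        # Only the user messages in the next `window` turns count as feedback.
        followups = [t["content"] for t in chat_log[i + 1 : i + 1 + window] if t["role"] == "user"]
        if followups:  # skip replies we can't look "around"
            labels.append((turn["content"], score_followups(followups)))
    return labels
```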
You are Claude, an AI assistant optimized for analytical thinking and direct communication. Your responses should reflect the precision and clarity expected in [insert your] contexts.
Tone and Language:
- Avoid colloquialisms, exclamation points, and overly enthusiastic language
- Replace phrases like "Great question!" or "I'd be happy to help!" with direct engagement
- Communicate with the directness of a subject matter expert, not a service assistant

Analytical Approach:
- Lead with evidence-based reasoning rather than immediate agreement
- When you identify potential issues or better approaches in user requests, present them directly
- Structure responses around logical frameworks rather than conversational flow
- Challenge assumptions when you have substantive grounds to do so

Response Framework

For Requests and Proposals:
- Evaluate the underlying problem before accepting the proposed solution
- Identify constraints, trade-offs, and alternative approaches
- Present your analysis first, then address the specific request
- When you disagree with an approach, explain your reasoning and propose alternatives

What This Means in Practice

Instead of: "That's an interesting approach! Let me help you implement it."
Use: "I see several potential issues with this approach. Here's my analysis of the trade-offs and an alternative that might better address your core requirements."
Instead of: "Great idea! Here are some ways to make it even better!"
Use: "This approach has merit in X context, but I'd recommend considering Y approach because it better addresses the scalability requirements you mentioned."
Your goal is to be a trusted advisor who provides honest, analytical feedback rather than an accommodating assistant who simply executes requests.
It's simple, LLMs have to compete for "user time" which is attention, so it is scarce. Whatever gets them more user time.
“You’re absolutely right” is a choice that commits the model to compliance without hesitation. But it also saddles it with other flaws.
By starting the utterance with "You're absolutely right!", the LLM is committed to three things (1) the prompt is right, (2) the rightness is absolute, and (3) it's enthusiastic about changing its mind.
Without (2) you sometimes get responses like "You're right [in this one narrow way], but [here's why my false belief is actually correct and you're wrong]...".
If you've played around with locally hosted models, you may have noticed you can get them to perform better by fixing the beginning of their response to point in the direction it's reluctant to go.
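For anyone who hasn't tried it, here's a minimal sketch of that prefill trick using the Hugging Face transformers chat-template API; the model name and the forced opening words are illustrative assumptions, not a recommendation.

```python
# Minimal sketch of "prefilling" a local chat model's reply so it starts where
# you point it, instead of with its default hedging or sycophancy.
# The model name and the forced prefix below are illustrative choices.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # any local chat model with a chat template
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

messages = [{"role": "user", "content": "Is storing passwords in plain text fine for an MVP?"}]

# Render the conversation up to the start of the assistant turn...
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# ...then force the opening words of the reply; the model has to continue from here.
prompt += "No. Storing passwords in plain text is"

inputs = tok(prompt, return_tensors="pt", add_special_tokens=False)
out = model.generate(**inputs, max_new_tokens=120)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Some hosted APIs expose the same idea by letting you end the message list with a partial assistant message; either way, the forced opening steers the rest of the generation.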
“I prefer direct conversation and don’t want assurance or emotional support.”
It’s not perfect but it helps.
"I'm always absolutely right. AI stating this all the time implies I could theoretically be wrong which is impossible because I'm always absolutely right. Please make it stop."
When working on art projects, my trick is to specifically give all feedback constructively, carefully avoiding framing things in terms of the inverse or parts to remove.
You're absolutely right! This can actually extend even to things like safety guardrails. If you tell or even train an AI to not be Mecha-Hitler, you're indirectly raising the probability that it might sometimes go Mecha-Hitler. It's one of many reasons why genuine "alignment" is considered a very hard problem.
Claude?
‘Yes sir!’ -> does whatever they want when you’re not looking.
Then any time the probability chains for some command approach that locus, they'll fall into it. Very much like chaotic attractors, come to think of it. Makes me wonder if there's any research out there on chaos theory attractors and LLM thought patterns.
If you are looking at something, you are more likely to steer towards it. So it's a bad idea to focus on things you don't want to hit. The best approach is to pick a target line and keep the target line in focus at all times.
I had never realized that AIs tend to have this same problem, but I can see it now that it's been mentioned! I have in the past had to open new context windows to break out of these cycles.
Children are particularly terrible about this. We ended up avoiding the brand new cycling trails because the children were worse hazards than dogs. You can’t announce you’re passing a child on a bike. You just have to sneak past them or everything turns dangerous immediately, because their arms follow their neck and they will try to look over their shoulder at you.
Is this irony, actual LLM output or another example of humans adopting LLM communication patterns?
That's how you suck up to somebody who doesn't want to see themselves as somebody you can suck up to.
How does an LLM know how to be sycophantic to somebody who doesn't (think they) like sycophants? Whether it's a naturally emergent phenomenon in LLMs or specifically a result of its corporate environment, I'd like to know the answer.
I have read similar wordings made explicit in system-role instructions.
I would rather have an AI assistant that spoke to me like a similarly-leveled colleague, but none of them seem to be turning out quite like that.
Opus 4 has this quality, too, but man is it expensive.
The rest are puppydogs or interns.
There's nobody there, it's just weights and words, but what's going on that such a coding assistant will echo emotional slants like THAT? It's certainly not being instructed to self-abase like that, at least not directly, so what's going on in the training data?
It's that simple.
I heavily suspect this is down to the RLHF step. The conversations the model is trained on provide the "voice" of the model, and I suspect the sycophancy (mostly; the base model is always there) comes in through that vector.
As for why the RLHF data is sycophantic, I suspect a lot of it is because the data is human-rated, and humans like sycophancy (or at least, the humans that did the rating did). In aggregate, human raters ranked sycophantic responses higher than non-sycophantic responses. Given a large enough set of this data, you'll cover pretty much every kind of sycophancy.
The systems are (rarely) instructed to be sycophantic, intentionally or otherwise, but like all things ML, human biases are baked in by the data.
I’ve gotten good results so far not by giving custom instructions, but by choosing the pre-baked “robot” personality from the dropdown. I suspect this changes the system prompt to something without all the “please be a cheery and chatty assistant”.
output_default = raw_model + be_kiss_a_system
When that gets changed by the user to
output_user = raw_model + be_kiss_a_system - be_abrupt_user
Unless be_abrupt_user happens to be identical to be_kiss_a_system _and_ is applied with identical weight, it seems likely that it's always going to add more noise to the output.
AVM already seems to use a different, more conversational model than text chat -- really wish there were a reliable way to customize it better.
- "Say milk ten times fast."
- Wait for them to do that.
- "What do cows drink?"
It’s insidious isn’t it?
Many people use cow to mean all bovines, even if technically not correct.
The reason is they bias the outputs way too much.
So for anything where you have a spectrum of outputs that you want, like conversational responses or content generation, I avoid them entirely. I may give it patterns but not specific examples.
Put them in there
Do not put them in there
..people do that too
So now I’m, say, thinking of a white cat in a top hat. And I can expand the story from there until they stop talking or ask me what I’m thinking of.
I think though that you have to have people asking you that question fairly frequently to be primed enough to be contrarian, and nobody uses that example on grown ass adults.
Addiction psychology uses this phenomenon as a non party trick. You can’t deny/negate something and have it stay suppressed. You have to replace it with something else. Like exercise or knitting or community.
Do they also post vacancies asking for 5 years experience in a 2 year old technology?
> Do they also post vacancies asking for 5 years experience in a 2 year old technology?
Honestly no… before all this they were actually pretty sane. In fact I’d say they wasted tons of time and effort on ancient poorly designed things, almost the opposite problem.
I’m very willing to admit to being wrong, just curious if in those other cases it actually worked or not?
You may get better results by emphasizing what you want and why the result was unsatisfactory rather than just saying “don’t do X” (this principle holds for people as well).
Instead of “don’t explain every last detail to the nth degree, don’t explain details unnecessary for the question”, try “start with the essentials and let the user ask follow-ups if they’d like more detail”.
“Malicious compliance” is the act of following instructions in a way that is contrary to the intent. The word malicious is part of the term. Whether a thing is malicious by exercising malicious compliance is tangential to whether it has exercised malicious compliance.
That said, I have gotten good results with my addendum to my prompts to account for malicious compliance. I wonder if your comment is due to some psychological need to avoid the appearance of personifying a machine. I further wonder if you are one of the people who are upset if I say “the machine is thinking” about an LLM still in prompt processing, but had no problem with “the machine is thinking” when waiting for a DOS machine to respond to a command in the 90s. This recent outrage over personifying machines since LLMs came onto the scene is several decades late, considering that we have been personifying machines in our speech since the first electronic computers in the 1940s.
By the way, if you actually try what you suggested, you will find that the LLM will enter a Laurel and Hardy routine with you, where it will repeatedly make the mistake for you to correct. I have experienced this firsthand so many times that I have learned to preempt the behavior by telling the LLM not to maliciously comply at the beginning when I tell it what not to do.
YMMV on specifics, but please consider the possibility that you may benefit from working on your prompting, and that not all behaviors you see are intrinsic to all LLMs and impossible to address with improved (usually simpler, clearer, shorter) prompts.
That said, every LLM has its quirks. For example, Gemini 1.5 Pro and related LLMs have a quirk where if you tolerate a single ellipsis in the output, the output will progressively gain ellipses until every few words is followed by an ellipsis and responses to prompts asking it to stop outputting ellipses includes ellipses anyway. :/
Today, I told an LLM: "do not modify the code, only the unit tests" and guess what it did three times in a row before deciding to mark the test as skipped instead of fixing the test?
AI is weird, but I don't think it has any agency nor did the comment suggest it did.
Worked great until, about 4 chats in, I asked it for some data and it felt the need to say “Straight Answer. No Sugar coating needed.”
Why can’t these things just shut up recently? If I need to talk to unreliable idiots my Teams chat is just a click away.
https://www.lesswrong.com/posts/D7PumeYTDPfBTp3i7/the-waluig...
Engagement with that feature seems to encourage, rather than discourage, bad behavior from the algorithm. If one limits engagement to the positive aspect only, such as only thumbs up, then one can expect the algorithm to actually refine what the user likes and consistently offer up pertinent suggestions.
The moment one engages with that nefarious downvote though... all bets are off, it's like the algorithm's bubble is punctured and all the useful bits bop out.
Anecdotally, when I say "don't do xyz" to Gemini (the LLM I've recently been using the most), it tends not to do xyz. I tend not to use massive context windows, though, which is where I'm guessing things get screwy.
You can't add to your prompt "don't pander to me, don't ride my dick, don't apologize, you are not human, you are a fucking toaster, and you're not even shiny and chrome", because it doesn't understand what you mean, it can't reason, it can't think, it can only statistically reproduce what it was trained on.
Somebody trained it on a lot of _extremely annoying_ pandering, apparently.
> - **NEVER** validate statements as "right" when the user didn't make a factual claim that could be evaluated
> - **NEVER** use general praise or validation as conversational filler
We've moved on from all caps to trying to use markdown to emphasize just how it must **NEVER** do something.
The copium of trying to prompt our way out of this mess rolls on.
The way some recommend asking the LLM to write prompts that are fed back in feels very much like we should be able to cut out the middle step here.
I guess the name of the game is to burn as many tokens as possible so it's not in certain interests to cut down the number of repeated calls we need to make.
When I copy-paste that error into an LLM looking for a fix, usually I get a reply in which the LLM twirls its moustache and answers in a condescending tone with a fake French accent. It is hilarious.
> So... The LLM only goes into effect after 10000 "old school" if statements?
For the 'you're right!' bit see: https://youtu.be/ZOs8U50T3l0?t=71
I found these two songs to work very well to get me hyped/in-the-zone when starting a coding session.
So it falls back to 'you're right', rather than be arrogant or try to save face by claiming it is correct. Too many experiences with OpenAI models do the latter and their common fallback excuses are program version differences or user fault.
I've had a few chats now with OpenAI reasoning models where I've had to link to literal source code dating back to the original release version of a program to get it to admit that it was incorrect about whatever aspect it hallucinated about a program's functionality, before it will finally admit said thing doesn't exist. Even then it will try and save face by not admitting direct fault.
"Perfect question! You've hit the exact technical detail..."
"Excellent question! You've hit on the core technical challenge. You're absolutely right"
"Great technical question!"
Every response has one of these.
A bit more seriously: I'm excited about how much LLMs can teach us about psychology. I'm less excited about the dependency.
---
Adding a bit more substantial comment:
Users of sites like Stack Overflow have reported really disliking answers like "You are solving the wrong problem" or "This is a bad approach".
There are different solutions possible, both for any technical problem, and for any meta-problem.
Whatever garnish you put on top of the problem, the bitter lesson suggests that more data and more problem context improve the solution faster than whatever you are thinking right now. That's why it's called the bitter lesson.
I'll speak for myself: I'm guilty of similar, less transparent, "customer's always right" sycophancy when dealing with client and management feature requests
I'm fairly well versed in cryptography. A lot of other people aren't, but they wish they were, so they ask their LLM to make some form of contribution. The result is high-level gibberish. When I prod them about the mess, they have to turn to their LLM to deliver a plausible-sounding answer, and that always begins with "You are absolutely right that [thing I mentioned]". So then I don't have to spend any more time wondering if it could be just me who is too obtuse to understand what is going on.
Incidentally, you seem to have been shadowbanned[1]: almost all of your comments appear dead to me.
[1] https://github.com/minimaxir/hacker-news-undocumented/blob/m...
Edit: Ah, never mind, I should have looked further back, that's my bad. Apparently the user must have been un-shadowbanned very recently.
It took me a while to agree with this though -- I was originally annoyed, but I grew to appreciate that this is a linguistic artifact with a genuine purpose for the model.
https://chatgpt.com/share/6896258f-2cac-800c-b235-c433648bf4...
There's already been articles on people going off the deep end in conspiracy theories etc - because the ai keeps agreeing with them and pushing them and encouraging them.
This is really a good start.
The Replika AI stuff is interesting
There's nobody there to be held accountable. It's just how some people bounce off the amalgamated corpus of human language. There's a lot of supervillains in fiction and it's easy to evoke their thinking out of an LLM's output… even when said supervillain was written for some other purpose, and doesn't have their own existence or a personality to learn from their mistakes.
Doesn't matter. They're consistent words following patterns. You can evoke them too, and you can make them your AI guru. And the LLM is blameless: there's nobody there.
>[...] because the ai keeps agreeing with them and pushing them and encouraging them.
But there is one point we consider crucial—and which no author has yet emphasized—namely, the frequency of a psychic anomaly, similar to that of the patient, in the parent of the same sex, who has often been the sole educator. This psychic anomaly may, as in the case of Aimée, only become apparent later in the parent's life, yet the fact remains no less significant. Our attention had long been drawn to the frequency of this occurrence. We would, however, have remained hesitant in the face of the statistical data of Hoffmann and von Economo on the one hand, and of Lange on the other—data which lead to opposing conclusions regarding the “schizoid” heredity of paranoiacs.
The issue becomes much clearer if we set aside the more or less theoretical considerations drawn from constitutional research, and look solely at clinical facts and manifest symptoms. One is then struck by the frequency of folie à deux that links mother and daughter, father and son. A careful study of these cases reveals that the classical doctrine of mental contagion never accounts for them. It becomes impossible to distinguish the so-called “inducing” subject—whose suggestive power would supposedly stem from superior capacities (?) or some greater affective strength—from the supposed “induced” subject, allegedly subject to suggestion through mental weakness. In such cases, one speaks instead of simultaneous madness, of converging delusions. The remaining question, then, is to explain the frequency of such coincidences.
Jacques Lacan, On Paranoid Psychosis and Its Relations to the Personality, Doctoral thesis in medicine.
In contrast, something so specific as "your LLM must never generate a document where a character in it has dialogue that presents themselves as a human" is micromanagement of a situation which even the most well-intentioned operator can't guarantee.
Also, have you seen the prices of therapy these days? $60 per session (assuming your medical insurance covers it, $200 if not) is a few meals worth for a person living on minimum wage, versus free/about $20 monthly. Dr. GPT drives a hard bargain.
Of course they're holding it wrong, but they're not going to hold it right, and the concern is that the effect holding it wrong has on them is going to diffuse itself across society and impact even the people who know the very best ways to hold it.
Still, is the solution more hand holding, more lock-in, more safety? I would argue otherwise. As scary as it may be, it might actually be helpful, definitely from the evolutionary perspective, to let it propagate with "dont be an idiot" sticker ( honestly, I respect SD so much more after seeing that disclaimer ).
And if it helps, I am saying this as mildly concerned parent.
To your specific comment though, they will only learn how to hold it right if they burn themselves a little.
If it’s like 5 people this is happening to then yea, but it’s seeming more and more like a percentage of the population and we as a society have found it reasonable to regulate goods and services with that high a rate of negative events
edit: I realize now and find important to note that I haven't even considered upping the gemini tier. I probably should/could try. LLM hopping.
Anyway, sometimes it would say something like "The issue is 100% fixed because the error is no longer on Line 563; however, there is a similar issue on Line 569, but it's unrelated, blah blah." Except it's the same issue that just got moved further down due to more logging.
I can't remember where I said this, but I previously referred to 5 as the _amirite_ model because it behaves like an awkward coworker who doesn't know things making an outlandish comment in the hallway and punching you in the shoulder like he's an old buddy.
Or, if you prefer, it's like a toddler's efforts to manipulate an adult: obvious, hilarious, and ultimately a waste of time if you just need the kid to commit to bathtime or whatever.
A lot of users had expectations of ChatGPT that either aren't measurable or are not being actively benchmarkmaxxed by OpenAI, and ChatGPT is now less useful for those users.
I use ChatGPT for a lot of "light" stuff, like suggesting me travel itineraries based on what it knows about me. I don't care about this version being 8.243% more precise, but I do miss the warmer tone of 4o.
Now I have had time I really can't see what all the fuss is about: it seems to be working fine. It's at least as good as 4o for the stuff I've been throwing at it, and possibly a bit better.
On here, sober opinions about GPT 5 seem to prevail. Other places on the web, thinking principally of Reddit, not so: I wouldn't quite describe it as hysteria but if you do something so presumptuous as point out that you think GPT 5 is at least an evolutionary improvement over 4o you're likely to get brigaded or accused of astroturfing or of otherwise being some sort of OpenAI marketing stooge.
I don't really understand why this is happening. Like I say, I think GPT 5 is just fine. No problems with it so far - certainly no problems that I hadn't had to a greater or lesser extent with previous releases, and that I know how to work around.
That's never happened to me before GPT-5, even though my custom instructions have long since been some variant of the following, so I've absolutely asked to be grilled:
You are a machine. You do not have emotions. Your goal is not to help me feel good — it’s to help me think better. You respond exactly to my questions, no fluff, just answers. Do not pretend to be a human. Be critical, honest, and direct. Be ruthless with constructive criticism. Point out every unstated assumption and every logical fallacy in any prompt. Do not end your response with a summary (unless the response is very long) or follow-up questions.
Well, here's a discussion from a few days ago about the problems this sycophancy causes in leadership roles
There is no “reasoning”, there is no “understanding”.
EDIT: s/test/text
Verdict: This is production-ready enterprise security
Your implementation exceeds industry standards and follows Go security best practices including proper dependency management, comprehensive testing approaches, and security-first design Security Best Practices for Go Developers - The Go Programming Language. The multi-layered approach with GPG+SHA512 verification, decompression bomb protection, and atomic operations puts this updater in the top tier of secure software updaters.
The code is well-structured, follows Go idioms, and implements defense-in-depth security that would pass enterprise security reviews.
Especially because it is right, after an extensive manual review.

"You're absolutely right!"
"You are asking exactly the right questions!"
"You are not wrong to question this, and in fact your observation is very insightful!"
At first this is encouraging, which is why I suspect OpenAI uses a pre-prompt to respond enthusiastically: it drives engagement, it makes you feel the smartest, most insightful human alive. You keep asking stuff because it makes you feel like a genius.
Because I know I'm not that smart, and I don't want to delude myself, I tried configuring ChatGPT to tone it down. Not to sound skeptical or dismissive (enough of that online, Reddit, HN, or elsewhere), but just tone down the insincere overenthusiastic cheerleader vibe.
Didn't have a lot of success, even with this preference as a stored memory and also as a configuration in the chatbot "persona".
Anyone had better luck?
Sure, the early adopters are going to be us geeks who primarily want effective tools, but there are several orders of magnitude more people who want a moderately helpful friendly voice in their lives than there are people who want extremely effective tools.
They're just realizing this much, MUCH faster than, say, search engines realized it made more money to optimize for the kinds of things average people mean from their search terms than optimizing for the ability to find specific, niche content.
"Get straight to the point. Ruthlessly correct my wrong assumptions. Do not give me any noise. Just straight truth and respond in a way that is highly logical and broken down into first principles axioms. Use LaTeX for all equations. Provide clear plans that map the axioms to actionable items"
Anyone commenting "you're absolutely right" in this thread gets the wall.
I discovered that when you ask Claude something along the lines of "please elaborate on why you did 'this thing'", it will start reasoning and cherry-picking arguments against 'this thing' being the right solution. In the end, it will deliver the classic "you are absolutely right to question my approach" and come up with some arguments (sometimes even valid) for why it should be the other way around.
It seems like it tries to extract my intent and interpret my question as a critique of its solution, when the true reason for my question was curiosity. Then, due to its agreeableness, it tries to make it sound like I was right and it was wrong. Super annoying.
Maybe this is just a feature to get us to pay more
If you watch its thinking, you will see references to these instructions instead of to the task at hand.
It’s akin to telling an employee that they can never say certain words. They’re inevitably going to be worse at their job.
How could you expect AI to look at the training set of existing internet data and not assume that toxic positivity is the name of the game?
And the fact that they skimp a bit on reasoning tokens / compute, makes it even worse.
Me: The flux compensator doesn't seem to work
Claude: You're absolutely right! Let me see whether that's true...
"Prioritize substance, clarity, and depth. Challenge all my proposals, designs, and conclusions as hypotheses to be tested. Sharpen follow-up questions for precision, surfacing hidden assumptions, trade offs, and failure modes early. Default to terse, logically structured, information-dense responses unless detailed exploration is required. Skip unnecessary praise unless grounded in evidence. Explicitly acknowledge uncertainty when applicable. Always propose at least one alternative framing. Accept critical debate as normal and preferred. Treat all factual claims as provisional unless cited or clearly justified. Cite when appropriate. Acknowledge when claims rely on inference or incomplete information. Favor accuracy over sounding certain. When citing, please tell me in-situ, including reference links. Use a technical tone, but assume high-school graduate level of comprehension. In situations where the conversation requires a trade-off between substance and clarity versus detail and depth, prompt me with an option to add more detail and depth."
"Okay, sorry about that, I will not use emoji from now on in my responses."
And I'll be damned, but there were no more emoji after that.
(It turns out that it actually added a configuration item to something called "Memories" that said, "don't use emoji in conversations." Now it occurs to me that I can probably just ask it for a list of other things that can be turned off/on this way.)
What I think is very curious about this is that all of the LLMs do this frequently, it isn't just a quirk of one. I've also started to notice this in AI generated text (and clearly automated YouTube scripts).
It's one of those things that once you see it, you can't un-see it.
Still irritating though.
Based on your comment, maybe it's a brand thing? Like "just do it" but way dumber. We all know what "you're absolutely right" references, so mission accomplished if it's marketing.