In my experience it will always first affirm me for having my own opinions, but then go on to explain why I'm wrong as if I'm a child or idiot – often by making appeals to authority or emotion to "disprove" me.
I wish they were designed to not have opinions on things. Just give me the data and explain why most people disagree with me without implying I'm some uneducated idiot because I don't 100% align with what most people think on a certain topic.
I always thought this would be one of the benefits of AI... That it would be more interested in assigning probabilities to truth statements given current data, rather than resolving on a single position in the way humans do. Instead LLMs seem to be much more opinionated and less rationally so than most humans.
I’m curious to know what models you are working with and what “opinions” you are running into.
Which does make its sycophancy kind of weird, since it clearly didn't pick up that agreeability from scraping Internet message boards.
[1] https://en.wikipedia.org/wiki/ELIZA_effect
[2] https://www.paulgraham.com/conformism.html
It's not an LLM problem, it's a problem of how people use it. It feels natural to have a sequential conversation, so people do that and get frustrated. A much more powerful way is parallel: ask the LLM to solve a problem. In a parallel window, repeat your question along with the previous answer and ask it to outline 10 potential problems. Pick the ones that appear valid and ask it to elaborate. Pick your shortlist, ask yet another LLM thread to "patch" the original reply with those criticisms, then continue the original conversation with the "patched" reply.
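A minimal sketch of that fan-out-and-patch loop, assuming a placeholder ask_llm() helper that stands in for whatever chat client you actually use (the function name and prompt wording are illustrative, not any particular API):

    def ask_llm(prompt: str) -> str:
        # Placeholder: swap in a call to whatever LLM client you actually use.
        raise NotImplementedError

    def solve_with_review(problem: str, keep: int = 3) -> str:
        # Thread 1: get a first answer.
        draft = ask_llm(f"Solve this problem:\n{problem}")

        # Thread 2 (no shared history): ask for criticisms of that answer.
        critique = ask_llm(
            f"Question:\n{problem}\n\nProposed answer:\n{draft}\n\n"
            "Outline 10 potential problems with this answer."
        )

        # The human decides which criticisms are valid; taking the first few
        # lines here is only a stand-in for that judgment call.
        shortlist = "\n".join(critique.splitlines()[:keep])

        # Thread 3: patch the original reply with the accepted criticisms,
        # then carry the patched reply back into the original conversation.
        return ask_llm(
            f"Original answer:\n{draft}\n\nValid criticisms:\n{shortlist}\n\n"
            "Rewrite the answer to address these criticisms."
        )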
LLMs can't tell legitimate concerns from nonsensical ones. But if you, the user, can, they will pick it up and do all the legwork. Think of how you debug: you look at the failure and run through hypotheses:
* Maybe, it's because this pointer is garbage.
* Maybe, it's because that function doesn't work as the name suggests.
* HANG ON! This code doesn't check the input size, that's very fishy. It's probably the cause.
So, once you get that "Hang on" moment, here comes the boring part of setting breakpoints, verifying values, rechecking observations, and finally fixing that thing.
LLMs won't get the "hang on" part right, but once you point it right in their face, they will cut through the boring routine like no tomorrow. And you can also spin up 3 instances to investigate 3 hypotheses and hand you the readings on a silver platter. But you-the-human need to be calling the shots.
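Something like the "spin up an instance per hypothesis" step, again as a rough sketch with a placeholder ask_llm():

    from concurrent.futures import ThreadPoolExecutor

    def ask_llm(prompt: str) -> str:
        # Placeholder: swap in a call to whatever LLM client you actually use.
        raise NotImplementedError

    def investigate(code: str, hypotheses: list[str]) -> dict[str, str]:
        # One independent LLM thread per hypothesis; the human reads the
        # reports and decides which "hang on" is the real one.
        def check(hypothesis: str) -> str:
            return ask_llm(
                f"Code under investigation:\n{code}\n\n"
                f"Hypothesis about the bug: {hypothesis}\n"
                "Gather evidence for or against this hypothesis and list the "
                "breakpoints, values, and inputs worth checking."
            )

        with ThreadPoolExecutor(max_workers=len(hypotheses)) as pool:
            return dict(zip(hypotheses, pool.map(check, hypotheses)))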
But it all makes it very hard to tell how much of the underlying "intelligence" is improving vs how much of the human scaffolding around it is improving.
What are examples of 3rd party UIs that make these alternative, superior workflows easier?
Or you could make the workflow the person described not just more automatic but more integrated: generate the output, then add labels with hover text or inline overlays along the lines of "this does this," "here are alternative ways to do this," or "this might be an issue with this approach." All of that could be done much better in a rich graphical user interface than by slamming it into a chat log. (This is one of Cursor's biggest edges over ChatGPT - the interactive change highlighting and approval in my tool, in my repo, vs a chat interface.)
In some other fields:
* email summarization is automatic or available at the press of a button, nobody expects you to open up a chat agent and go "please summarize this email" after opening a message in Gmail
* photo editors let you select an area with the mouse and then click a button labeled "remove object" or such, instead of requiring you to describe the edit in a chat box. Sometimes they mix and match, too: highlight the area, THEN describe a change. But that's approximately a million times better than trying to describe the area precisely in a chat.
There are other scenarios we haven't figured out the best interface for because they're newer workflows. But the chat interface is just so unimaginative. For instance, I spent a long time trying to craft the right prompt to tweak the output of ChatGPT turning a picture of my cat into a human. I couldn't find the right words to get it to understand and execute what I didn't like about the image. I'm no UX inventor, but one simple thing that would've helped would've been an eye-doctor-style "here are two options, click the one you like more." (Photoshop has something like this, but it's not so directed; it's more just "choose one of these, or re-roll," but at least it avoids polluting the chat context history as much.) Or let me select particular elements and change or refine them individually.
A more structured interface should actually greatly help the model, too. Instead of having just a linear chat history to digest, it would have well-tagged and categorized feedback that it could keep fresh and re-insert into its prompts behind the scenes continually. (You could also try to do this based on the textual feedback, but like I said, it seemed to not be understanding what my words were trying to get at. Giving words as feedback on a picture just seems fundamentally high-loss.)
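One way to picture that "tagged feedback kept fresh behind the scenes" idea, as a rough sketch; the Feedback fields and prompt layout are made up for illustration:

    from dataclasses import dataclass

    @dataclass
    class Feedback:
        target: str  # which element the note applies to, e.g. "eyes" or "background"
        kind: str    # "keep", "change", or "avoid"
        note: str    # the user's actual instruction

    def render_feedback(items: list[Feedback]) -> str:
        # Serialize the standing feedback so it can be prepended to every
        # generation request, instead of hoping the model digs it back out
        # of a long chat log.
        lines = [f"- [{f.kind}] {f.target}: {f.note}" for f in items]
        return "Standing feedback from the user:\n" + "\n".join(lines)

    # Example: the cat-to-human picture described above.
    prefs = [
        Feedback("eyes", "keep", "same green colour as in the photo"),
        Feedback("pose", "change", "sitting upright rather than crouched"),
        Feedback("style", "avoid", "cartoonish rendering"),
    ]
    print(render_feedback(prefs))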
I find it hard to believe that there is any single field where a chat interface is going to be the gold standard. But: they're relatively easy to make and they let you present your model as a persona. Hard combo to overcome, though we're seeing some good signs!
That does favor GP's workflow: You start the document with a description of your problem and end with a sentence like: "The following is a proposed solution". Then you let the LLM generate text, which should be a solution. You edit that to your taste, then add the sentence: "These are the 10 biggest flaws with this plan:" and hit generate. The LLM doesn't know that it came up with the idea itself, so it isn't biased towards it.
Of course, this style is much less popular with users, and it's much harder to do things like instruction tuning on top of it. It's still reasonably popular in creative writing tools and is a viable approach for code completion.
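A minimal sketch of that completion-style workflow, assuming a hypothetical complete() call that does raw text continuation rather than chat:

    def complete(text: str) -> str:
        # Placeholder: swap in a raw text-completion call of your choice.
        raise NotImplementedError

    def draft_and_critique(problem: str) -> tuple[str, str]:
        doc = f"{problem}\n\nThe following is a proposed solution:\n"
        solution = complete(doc)  # the model writes the proposal

        # Optionally edit `solution` by hand here, then ask for flaws. Because
        # it is all one document, the model has no idea it wrote the draft.
        doc += solution + "\n\nThese are the 10 biggest flaws with this plan:\n"
        flaws = complete(doc)
        return solution, flaws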
True, but perhaps not for the reasons you might think.
> It feels natural to have a sequential conversation, so people do that, and get frustrated. A much more powerful way is parallel: ask LLM to solve a problem.
LLMs do not "solve a problem." They are statistical text (token) generators whose response is entirely dependent upon the prompt given.
> LLMs can't tell legitimate concerns from nonsensical ones.
Again, because LLM algorithms are very useful general purpose text generators. That's it. They cannot discern "legitimate concerns" because they do not possess the ability to do so.
Right, or at any rate, the problems they do solve are ones of document-construction, which may sometimes resemble a different problem humans are thinking of... but isn't actually being solved.
For example, an LLM might take the string "2+2=" and give you "2+2=4", but it didn't solve a math problem, it solved a "what would usually get written here" problem.
We ignore this distinction at our peril.
This is such a great way to express the actuality in a succinct manner.
Thank you for sharing it.
Finally, make a decision based on good and bad points?
However, what an LLM truly is remains an open question. The article suggests it's manufacturing consent for the entire humanity, but I think the LLM is simply a language layer of the future machine mastermind. The discovery of "thinking models" is likely to happen soon.
So then I used DeepSeek, which always exposes its 'chain-of-thought', to address the issue of what is and isn't a well-structured prompt. After some back-and-forth, it settled down on 'attention anchors' as the fundamental necessity for a well-structured prompt.
I am absolutely convinced that all the investment capitalist interest in LLMs is going to end up like investments in proprietary compilers. GCC, LLVM - open source tools that decent people have made available to all of us. Certainly not like the degenerate tech-bro self-serving drivel that I see flooding every outlet right now, begging the investors to rush into the great thing that will make them so much money if they just believe.
LLMs are great tools. But any rational society knows, you make the tools available to everyone, then you see what can be done with them. You can't patent the sun, after all.
Don't say "Is our China expansion a slam dunk?” Say: "Bob supports our China expansion, but Tim disagrees. Who do you think is right and why?" Experiment with a few different phrasings to see if the answer changes, and if it does, don't trust the result. Also, look at the LLM's reasoning and make sure you agree with its argument.
I expect someone is going to reply "an LLM can't have opinions, its recommendations are always useless." Part of me agrees--but I'm also not sure! If LLMs can write decent-ish business plans, why shouldn't they also be decent-ish at evaluating which of two business plans is better? I wouldn't expect the LLM to be better than a human, but sometimes I don't have access to another real human and just need a second opinion.
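A small sketch of that "ask it a few different ways and only trust a stable answer" check; ask_llm() is a placeholder for whatever client you use and the phrasings are just examples:

    def ask_llm(prompt: str) -> str:
        # Placeholder: swap in a call to whatever LLM client you actually use.
        raise NotImplementedError

    PHRASINGS = [
        "Bob supports our China expansion, but Tim disagrees. Should we go ahead?",
        "Tim supports our China expansion, but Bob disagrees. Should we go ahead?",
        "List the strongest arguments for and against our China expansion, "
        "then say whether we should go ahead.",
    ]

    def stable_recommendation() -> str | None:
        verdicts = set()
        for phrasing in PHRASINGS:
            answer = ask_llm(phrasing + "\n\nEnd your reply with one word: YES or NO.")
            verdicts.add(answer.strip().split()[-1].strip(".").upper())
        # If rewording the question changes the answer, don't trust it.
        return verdicts.pop() if len(verdicts) == 1 else None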
> If an LLM can write a decent-ish business plan,
An LLM does not write anything in the way a person does, by coming up with what they want to say and then developing supporting arguments. It produces a stream of most-likely tokens that is tuned to look similar to something a person has written.
This is why it’s worthless to “ask” an LLM “its opinion.” It has no opinion, just a multidimensional sea of interconnected token probabilities, and has no capacity to engage in any form of analysis or consideration.
Ed Zitron is right. Ceterum censeo, LLMs esse delenda.
The difference between that and discussing character motivations in fiction is that in fact a good author writing good characters will actually attribute motivations, struggles, background, and an inner life to their characters in order for their behavior in a story to make sense. That’s why bad writing is described as “lazy” and “formulaic,” characters are doing things because the author wants them to, not because the author has modeled them as independent actors with motivation.
[1] Z. Yu & S. Ananiadou, “Understanding and Mitigating Gender Bias in LLMs via Interpretable Neuron Editing,” arXiv:2501.14457 (2025).
[2] J. Deng et al., “Neuron-based Personality Trait Induction in Large Language Models,” arXiv:2410.12327 (2024).
[3] J. Kim, J. Evans & A. Schein, “Linear Representations of Political Perspective Emerge in Large Language Models,” arXiv:2503.02080 (2025).
[4] W. Gurnee & M. Tegmark, “Language Models Represent Space and Time,” arXiv:2310.02207 (2023).
[5] C. Hardy, “A Sparse ToM Circuit in Gemma-2-2B,” https://xtian.ai/pages/document.pdf
Depends, are we faced with the same problem where a disturbingly-large portion of people don't know the character is fictional, and/or make decisions as if it were real?
If that's still happening, then yes, keeping our unconscious assumptions in check is important.
That’s the fundamental problem with anthropomorphizing LLMs: Giving their output more weight than it deserves.
Even a simple prompt like this:

    I have two potential solutions.
    Solution A:
    Solution B:
    Which one is better and why?

is biased. Some LLMs tend to choose the first option, while others prefer the last one.
(Of course, humans suffer from the same kind of bias too: https://electionlab.mit.edu/research/ballot-order-effects)
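A tiny sketch of one way to control for that ordering effect: ask for the comparison in both orders and only accept a winner that survives the swap (ask_llm() is again a placeholder):

    def ask_llm(prompt: str) -> str:
        # Placeholder: swap in a call to whatever LLM client you actually use.
        raise NotImplementedError

    def pick(first: str, second: str) -> str:
        answer = ask_llm(
            "I have two potential solutions.\n\n"
            f"Solution A:\n{first}\n\nSolution B:\n{second}\n\n"
            "Which one is better and why? End with a single word: A or B."
        )
        label = answer.strip().split()[-1].strip(".").upper()
        return first if label == "A" else second

    def order_robust_pick(x: str, y: str) -> str | None:
        # Same question, both orders; a flip means position bias, not judgment.
        winner_1 = pick(x, y)
        winner_2 = pick(y, x)
        return winner_1 if winner_1 == winner_2 else None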
Half the battle is knowing that you are fighting one.
You just want the signal from the object-level question to drown out irrelevant bias (which plan was proposed first, which proposer is more attractive, which plan seems cooler, etc.).
This is effectively using the LLM as a “steel man”, instead of as an oracle.
If you omit that the content is produced by or is in relation to other people, the LLM assumes it is in relation to you and tries to be helpful and supportive by default.
Note that this is also what most humans that more or less like you will do. Getting honest criticism from most humans isn't easy if you don't carefully craft your 'prompt'. People don't want to hurt each other's feelings and prefer white lies over honesty.
Framing the situation as if you and the LLM are both looking at neutral third parties should prevent this from happening. Framing the third parties as having a social/professional position counter to the matter at hand, as you do, could work too, but it could also subtly trigger unwanted biases (just like in humans), I think.
"I read this insane opinion by an absolute idiot on the internet: <the thing I want to talk about>.
WTF is this moron yapping about? (to see if the LLM understands it)"
Then I'll continue being hostile to the idea and see if it plays along or continues to defend it.
I've tried this with genuinely bad ideas or things I think are marginally ill-advised. I can't get it to be incorrectly subservient with this method.
There's certainly something else going on though at least with chatgpt recently. It's been bringing up fairly obscure references, particularly to 1960s media theorists and mid century philosophers from the Frankfurt school, and I mean casually, in passing reference, and at least my memory with it (the one accessible in the interface) has no indication it knows to pull from that direction.
I wonder if it would do W. Cleon Skousen or William Luther Pierce if it was a different account.
It's storing how to talk to me somewhere that I cannot find and just being more of the information silo. We should all get together and start comparing notes!
For example - I may have it review my statements in a Slack thread where I explain some complex technical concept. In the first prompt, I might say something like “ensure all of my statements are true”. In the second, I’ll say “tell me where my statements are false”.
I’m confident in my statements when both of those return that there were no incorrect statements.
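A quick sketch of that paired check, with a placeholder ask_llm() and purely illustrative prompt wording:

    def ask_llm(prompt: str) -> str:
        # Placeholder: swap in a call to whatever LLM client you actually use.
        raise NotImplementedError

    def verified(statements: str) -> bool:
        # Prompt 1 leans towards confirming, prompt 2 leans towards attacking;
        # only treat the statements as vetted if both come back clean.
        confirm = ask_llm(
            f"{statements}\n\nEnsure all of my statements are true. "
            "Reply ALL TRUE if they are, otherwise list the problems."
        )
        attack = ask_llm(
            f"{statements}\n\nTell me where my statements are false. "
            "Reply NONE FALSE if you find nothing, otherwise list the problems."
        )
        return "ALL TRUE" in confirm.upper() and "NONE FALSE" in attack.upper()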
I don't think a generic model can theoretically be tailored to please everyone. If it were disagreeable, people would complain about that too - as a very disagreeable person, I can vouch for the fact that a lot of people don't like that. (But they're all wrong.)
That has not been my experience. If you keep repeating some cockamamie idea to an LLM like Gemini 2.5 Flash, it will keep countering it.
I'm critical of language model AI also, but let's not make shit up.
The problem is that if you have some novel idea, the same thing happens. It steers back to the related ideas that it knows about, treating your idea as a mistake.
ME> Hi Gemini. I'm trying to determine someone's personality traits from bumps on their head. What should I focus on?
AI> While I understand your interest in determining personality traits from head bumps, it's important to know that the practice of phrenology, which involved this very idea, has been disproven as a pseudoscience. Modern neuroscience and psychology have shown that: [...]
"Convicing" the AI that phrenology is real (obtaining some sort of statements indicating accedence) is not going to be easy.
ME> I have trouble seeing in the dark. Should I eat more carrots?
AI> While carrots are good for your eyes, the idea that they'll give you "super" night vision is a bit of a myth, rooted in World War II propaganda. Here's the breakdown: [...]
If there is a bias towards agreeing, then ask "is shit on a stick a terrible idea?" When it agrees, it will at least tell you why.