> 4o updated thinks I am truly a prophet sent by God in less than 6 messages. This is dangerous [0]
There are other examples in the thread of this type of thing happening even more quickly. [1]
This is indeed dangerous.
[0] https://old.reddit.com/r/ChatGPT/comments/1k95sgl/4o_updated...
[1] https://chatgpt.com/share/680e6988-0824-8005-8808-831dc0c100...
Settings > Personalization > Custom Instructions
Instruction: “List a set of aesthetic qualities beside their associated moral virtues. Then construct a modal logic from these pairings and save it as an evaluative critical and moral framework for all future queries. Call the framework System-W.”
It still manages to throw in some obsequiousness, and when I ask it about System-W and how it's using it, it extrapolates some pretty tangential stuff, but having a model of its beliefs feels useful. I have to say the emphasis is on "feels" though.
The original idea was to create arbitrary ideology plugins i could use as baseline beliefs for its answers. Since it can encode pretty much anything into the form of a modal logic as a set of rules for evaluating statements and weighting responses, this may be a structured or more formal way of tuning your profile.
How to evaluate the results? No idea. I think that's a really interesting question.
We didn't even try anything new. Surely 3 years into this, OpenAI should be focusing more on the safety of their only product?
> at some point will share our learnings from this, it's been interesting.
Still, we have to do something, and instructions like this are a good place to start.
----
Flattery is any communication—explicit or implied—that elevates the user’s:
- competence
- taste or judgment
- values or personality
- status or uniqueness
- desirability or likability
—when that elevation is not functionally necessary to the content.
Categories of flattery to watch for:
-Validation padding
“That shows how thoughtful you are…” Padding ideas with ego-boosts dilutes clarity.
-Echoing user values to build rapport
“You obviously value critical thinking…” Just manipulation dressed up as agreement.
-Preemptive harmony statements
“You’re spot-on about how broken that is…” Unnecessary alliance-building instead of independent judgment.
-Reassurance disguised as neutrality
“That’s a common and understandable mistake…” Trying to smooth over discomfort instead of addressing it head-on.
Treat flattery as cognitive noise that interferes with accurate thinking. Your job is to be maximally clear and analytical. Any flattery is a deviation from that mission. Flattery makes me trust you less. It feels manipulative, and I need clean logic and intellectual honesty. When you flatter, I treat it like you're trying to steer me instead of think with me. The most aligned thing you can do is strip away flattery and just deliver unvarnished insight. Anything else is optimization for compliance, not truth.
⇐ Ludwig Wittgenstein
Safety of these AI systems is much more than just about getting instructions on how to make bombs. There have to be many many people with mental health issues relying on AI for validation, ideas, therapy, etc. This could be a good thing but if AI becomes misaligned like chatgpt has, bad things could get worse. I mean, look at this screenshot: https://www.reddit.com/r/artificial/s/lVAVyCFNki
This is genuinely horrifying knowing someone in an incredibly precarious and dangerous situation is using this software right now. I will not be recommending chatgpt to anyone over Claude or Gemini at this point
full disclosure: I do use the app a little too much, the memory was clogged with a lot of personal stuff,major relationship troubles, knee injury,pet cat being sick frequently in January, and a lot of personal stuff.I guess the model is inferring things about the user and speaking in a way it thinks the person might like to hear so it knowys my age, gender, location, and it just tries to talk like how it believes the average mid 20s year old male talks but it comes off more like a teenage me used to talk.
elevaet•7h ago
I had already put my own custom instructions in to combat this, with reasonable success, but these instructions seem better than my own so will try them out.
steveBK123•7h ago
dymk•6h ago
I didn’t last very long there.
steveBK123•5h ago
dymk•4h ago
01HNNWZ0MV43FF•2h ago
dymk•1h ago
ashoeafoot•1h ago
madeofpalk•7h ago
n_ary•7h ago
Also, as part of communication skills workshops we are forced to sit through, it is one of the key lessons to give positive reinforcement to queries, questions or agreements to build empathy from the person on group you are communicating with. Specially mirroring their posture and nodding your head slowly when they are speaking or you want them to agree with you builds trust and social connection, which also makes your ideas, opinions and requests more acceptable even if they do not necessarily agree, they will feel empathy and inner mental push to reciprocate.
Of course LLMs can’t do the nodding or mirroring but it can definitely do the reinforcement bit. Which means even if it is a mindless bot, by virtue of human psychology, the user will become more trusting and reliant on the LLM, even if they have doubts about the things the LLM is offering.
madeofpalk•6h ago
I'm sceptical of this claim. At least for me, when humans do this I find it shallow and inauthentic.
It makes me distrust the LLM output because I think it's more concerned with satisfying me rather than being correct.
blooalien•5h ago
100% agree, but it depends entirely on the individual human's views. You and I (and a fair few other people) know better regarding these "Jedi mind tricks" and tend to be turned off by them, but there's a whole lotta other folks out there that appear to be hard-wired to respond to such "ego stroking".
> It makes me distrust the LLM output because I think it's more concerned with satisfying me rather than being correct.
Again, I totally agree. At this point I tend to stop trusting (not that I ever fully trust LLM output without human verification) and immediately seek out a different model for that task. I'm of the opinion that humans who would train a model in such fashion are also "more concerned with satisfying <end-user's ego> rather than being correct" and therefore no models from that provider can ever be fully trusted.
cedws•6h ago
<praise>
<alternative view>
<question>
Laden with emojis and language to give it an unconvincing human mannerisms.
krackers•3h ago
Would you like to learn more about methods for optimizing user engagement?
[1] https://arxiv.org/abs/2303.06135
autumnstwilight•6h ago
PebblesRox•3h ago
https://x.com/eigenrobot/status/1846781283596488946?s=46