It seems to still do that. I don't know why they write "for the first time" here.
I use Gemini, Claude and ChatGPT daily still.
I spend 75% of my time in Codex CLI and 25% in the Mac ChatGPT app. The latter is important enough for me to not ditch GPT and I'm honestly very pleased with Codex.
My API usage for software I build is about 90% Gemini though. Again their API is lacking compared to OpenAI's (productization, etc.) but the model wins hands down.
Anyway I found your response itself a bit incomprehensible so I asked Gemini to rewrite it:
"Google AI refused to help write an appeal brief response to my ex-wife's 7-point argument, likely due to its legal-risk aversion (billions in past fines). Newcomer ChatGPT provided a decent response instead, which led to the ex losing her appeal (saving $18k–$35k in lawyer fees)."
Not bad, actually.
I cannot abide any LLM that tries to be friendly. Whenever I use an LLM to do something, I'm careful to include something like "no filler, no tone-matching, no emotional softening," etc. in the system prompt.
https://www.nber.org/system/files/working_papers/w34255/w342...
"The share of Technical Help declined from 12% from all usage in July 2024 to around 5% a year later – this may be because the use of LLMs for programming has grown very rapidly through the API (outside of ChatGPT), for AI assistance in code editing and for autonomous programming agents (e.g. Codex)."
Looks like people moving to the API had a rather small effect.
"[T]he three most common ChatGPT conversation topics are Practical Guidance, Writing, and Seeking Information, collectively accounting for nearly 78% of all messages. Computer Programming and Relationships and Personal Reflection account for only 4.2% and 1.9% of messages respectively."
Less than five percent of requests were classified as related to computer programming. Are you really, really sure that something like 99% of such requests come from people who are paying for API access?
If we are talking about a new model release I want to talk about models, not applications.
The number of input tokens that OpenAI models are processing across all delivery methods (OpenAI's own APIs, Azure) dwarfs the number of input tokens coming from people asking the ChatGPT app for personal advice. It isn't close.
I don’t want a 10-page essay about how this is exactly the right question to ask.
Of course, you can use thinking mode and then it'll just hide that part from you.
Anyways, a nice way to understand it is that the LLM needs to "compute" the answer to the question A or B. Some questions need more compute to answer (think complexity theory). The only way an LLM can do "more compute" is by outputting more tokens. This is because each token takes a fixed amount of compute to generate - the network is static. So, if you encourage it to output more and more tokens, you're giving it the opportunity to solve harder problems. Apart from humans encouraging this via RLHF, it was also found (in deepseekmath paper) that RL+GRPO on math problems automatically encourages this (increases sequence length).
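A rough way to quantify the fixed-compute-per-token point: a common back-of-envelope estimate is that a decoder forward pass costs about 2 FLOPs per parameter per generated token (ignoring the attention term, which grows with context length), so total compute scales linearly with output length. A minimal sketch, with a made-up model size:

```python
def forward_flops_per_token(n_params: int) -> int:
    # Back-of-envelope: ~2 FLOPs per parameter per generated token
    # (ignores the attention term, which grows with context length).
    return 2 * n_params

def generation_flops(n_params: int, n_output_tokens: int) -> int:
    # Compute scales linearly with tokens emitted, which is why longer
    # "thinking" traces buy the model more computation per answer.
    return forward_flops_per_token(n_params) * n_output_tokens

# Hypothetical 7B-parameter model:
short = generation_flops(7_000_000_000, 50)     # terse answer
long = generation_flops(7_000_000_000, 2_000)   # long reasoning trace
assert long // short == 40  # 40x the tokens buys 40x the compute
```

The numbers here are illustrative only; the point is the linear relationship, not the constants.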
From a marketing perspective, this is anthropomorphized as reasoning.
From a UX perspective, they can hide this behind thinking... ellipses. I think GPT-5 on chatgpt does this.
Outside such scenarios, that "engagement" would just be useless, actually costing them more money than it makes.
The existing "personalities" of LLMs are dangerous, full stop. They are trained to generate text with an air of authority and to tend to agree with anything you tell them. It is irresponsible to allow this to continue while not at least deliberately improving education around their use. This is why we're seeing people "falling in love" with LLMs, or seeking mental health assistance from LLMs that they are unqualified to render, or plotting attacks on other people that LLMs are not sufficiently prepared to detect and thwart, and so on. I think it's a terrible position to take to argue that we should allow this behavior (and training) to continue unrestrained because some people might "want" it.
The number of heroin addicts is significantly lower than the number of ChatGPT users.
Keeping faux relationships out of the interaction never lets me slip into the mistaken attitude that I'm dealing with a colleague rather than a machine.
However, being more humanlike, even if it results in an inferior tool, is the top priority because appearances matter more than actual function.
It doesn't really offer any commentary or personality. It's concise and doesn't engage in praise or "You're absolutely right". It's a little pedantic though.
I keep meaning to re-point Codex at DeepSeek V3.2 to see if it's a product of the prompting only, or a product of the model as well.
The new boss, same as the old boss
It's probably counterprogramming, Gemini 3.0 will drop soon.
Instead I'm running big open source models and they are good enough for ~90% of tasks.
The main exceptions are Deep Research (though I swear it was better when I could choose o3) and tougher coding tasks (sonnet 4.5)
For Anthropic at least it's also opt-in not opt-out afaik.
2. OpenAI is run by someone who has already shown he will go to great lengths to deceive and cannot be trusted, and is embroiled in a battle with the New York Times that is "forcing them" to retain all user prompts. Totally against their will.
> Federal judge Ona T. Wang filed a new order on October 9 that frees OpenAI of an obligation to "preserve and segregate all output log data that would otherwise be deleted on a going forward basis." [...]
> The judge in the case said that any chat logs already saved under the previous order would still be accessible and that OpenAI is required to hold on to any data related to ChatGPT accounts that have been flagged by the NYT.
EDIT: OK looks like I'd missed the news from today at https://openai.com/index/fighting-nyt-user-privacy-invasion/ and discussed here: https://news.ycombinator.com/item?id=45900370
The biggest issue I've seen _by far_ with using GPT models for coding has been their inability to follow instructions... and also their tendency to act a second time on messages from up-thread instead of acting on what you just asked for.
Before GPT-5 was released it used to be a perfect compromise between a "dumb" non-Thinking model and a SLOW Thinking model. However, something went badly wrong within the GPT-5 release cycle, and today it is exactly the same speed (or SLOWER) than their Thinking model even with Extended Thinking enabled, making it completely pointless.
In essence, Thinking Mini exists to be faster than Thinking and smarter than non-Thinking, but it is now dumber than full Thinking while not being any faster.
[1] “GPT‑5.1 Instant can use adaptive reasoning to decide when to *think before responding*”
My main concern is that they're re-tuning it now to make it even MORE sycophantic, because 4o taught them that it's great for user retention.
I know this is marketing at play and OpenAI has plenty of resources devoted to advancing their frontier models, but it's starting to really come into view that OpenAI wants to replace Google and be the default app / page for everyone on earth to talk to.
ChatGPT is overwhelmingly, unambiguously, a "regular people" product.
4% of their tokens or total tokens in the market?
However, I can only imagine that OpenAI outputs the most intentionally produced tokens (i.e. the user intentionally went to the app/website) out of all the labs.
Anthropic seems to treat Claude like a tool, whereas OpenAI treats it more like a thinking entity.
In my opinion, the difference between the two approaches is huge. If the chatbot is a tool, the user is ultimately in control; the chatbot serves the user, and the goal is to help the user get value out of it. It's a user-centric approach. If the chatbot is a companion, on the other hand, the user is far less in control; the chatbot manipulates the user, and the goal is to integrate the chatbot more and more into the user's life. The clear user-centric approach is muddied significantly.
In my view, that is kind of the fundamental difference between these two companies. It's quite significant.
It's a form of enshittification perhaps. I personally prefer some of the GPT-5 responses compared to GPT-5.1. But I can see how many people prefer the "warmth" and cloying nature of a few of the responses.
In some sense personality is actually a UX differentiator. This is one way to differentiate if you're a start-up. Though of course OpenAI and the rest will offer several dials to tune the personality.
> Do not compliment me for asking a smart or insightful question. Directly give the answer.
And I’ve not been annoyed since. I bet that whatever crap they layer on in 5.1 is undone as easily.
"Do not use jargon", or, "never apologize", work less well than "avoid jargon" or "avoid apologizing".
Better to give it something to do than something that should be absent (same problem with humans: "don't think of a pink elephant").
See also target fixation: https://en.wikipedia.org/wiki/Target_fixation
Making this headline apropos:
https://www.cycleworld.com/sport-rider/motorcycle-riding-ski...
As does reflecting that Picard had to explain to Computer every, single, time that he wanted his Earl Grey tea ‘hot’. We knew what was coming.
Anyone?
> GPT‑5.1 Thinking: our advanced reasoning model, now easier to understand
Oh, right, I turn to the autodidact that's read everything when I want watered down answers.
But Gemini also likes to say things like “as a fellow programmer, I also like beef stew”
Calling it "GPT-5.1 Thinking" instead of o3-mini or whatever is interesting branding. They're trying to make reasoning models feel less like a separate product line and more like a mode. Smart move if they can actually make the router intelligent enough to know when to use it without explicit prompting.
Still waiting for them to fix the real issue: the model's pathological need to apologize for everything and hedge every statement lol.
I've temporarily switched back to o3, thankfully that model is still in the switcher.
It agreed with everything Hancock claims with just a little encouragement ("Yes! Bimini road is almost certainly an artifact of Atlantis!")
gpt5 on the other hand will at most say the ideas are "interesting".
Chatgpt has a lot of frustrations and ethical concerns, and I hate the sycophancy as much as everyone else, but I don't consider being conversational to be a bad thing.
It's just preference I guess. I understand how someone who mostly uses it as a google replacement or programming tool would prefer something terse and efficient. I fall into the former category myself.
But it's also true that I've dreamed about a computer assistant that can respond to natural language, even real time speech, -- and can imitate a human well enough to hold a conversation -- since I was a kid, and now it's here.
The questions of ethics, safety, propaganda, and training on other people's hard work are valid. It's not surprising to me that using LLMs is considered uncool right now. But having a computer imitate a human really effectively hasn't stopped being awesome to me personally.
I'm not one of those people who treats it like a friend or anything, but its ability to imitate natural human conversation is one of the reasons I like it.
When we dreamed about this as kids, we were dreaming about Data from Star Trek, not some chatbot that's been focus grouped and optimized for engagement within an inch of its life. LLMs are useful for many things and I'm a user myself, even staying within OpenAI's offerings, Codex is excellent, but as things stand anthropomorphizing models is a terrible idea and amplifies the negative effects of their sycophancy.
Hit all 3 and you win a boatload of tech sales.
Hit 2/3, and hope you are incrementing where it counts. The competition watches your misses closer than your big hits.
Hit only 1/3 and you're going to lose to competition.
Your target for more conversations better be worth the loss in tech sales.
Faster? Meh. Doesn't seem faster.
Smarter? Maybe. Maybe not. I didn't feel any improvement.
Cheaper? It wasn't cheaper for me, I sure hope it was cheaper for you to execute.
> We’re bringing both GPT‑5.1 Instant and GPT‑5.1 Thinking to the API later this week. GPT‑5.1 Instant will be added as gpt-5.1-chat-latest, and GPT‑5.1 Thinking will be released as GPT‑5.1 in the API, both with adaptive reasoning.
Sooo...
GPT‑5.1 Instant <-> gpt-5.1-chat-latest
GPT‑5.1 Thinking <-> GPT‑5.1
I mean. The shitty naming has to be a pathology or some sort of joke. You can't have put thought into that, come up with it, and concluded "yeah, absolutely, let's go with that!"
Oh yeah that's what I want when asking a technical question! Please talk down to me, call a spade an earth-pokey-stick and don't ever use a phrase or concept I don't know because when I come face-to-face with something I don't know yet I feel deep insecurity and dread instead of seeing an opportunity to learn!
But I assume their data shows that this is exactly how their core target audience works.
Better instruction-following sounds lovely though.
It’s quite bizarre from that small sample how many of them take pride in “baiting” or “bantering” with ChatGPT and then post screenshots showing how they “got one over” on the AI. I guess there’s maybe some explanation - feeling alienated by technology, not understanding it, and so needing to “prove” something. But it’s very strange and makes me feel quite uncomfortable.
Partly because of the “normal” and quite naturalistic way they talk to ChatGPT but also because some of these conversations clearly go on for hours.
So I think normies maybe do want a more conversational ChatGPT.
The backlash from GPT-5 proved that. The normies want a very different LLM from what you or I might want, and unfortunately OpenAI seems to be moving in a more direct-to-consumer focus and catering to that.
But I'm really concerned. People don't understand this technology, at all. The way they talk to it, the suicide stories, etc. point to people in general not grokking that it has no real understanding or intelligence, and the AI companies aren't doing enough to educate (because why would they, they want you to believe it's superintelligence).
These overly conversational chatbots will cause real-world harm to real people. They should reinforce, over and over again to the user, that they are not human, not intelligent, and do not reason or understand.
It's not really the technology itself that's the problem. As is the case with a lot of these things, it's a people and education problem, something that regulators are supposed to solve. But we aren't solving it; we have an administration that is very anti-AI-regulation, all in the name of "we must beat China."
minimaxir•1h ago
I suspect this approach is a direct response to the backlash against removing 4o.
jasonjmcghee•1h ago
nerbert•1h ago
danudey•1h ago
I get that those people were distraught/emotionally devastated/upset about the change, but I think that fact is reason enough not to revert that behavior. AI is not a person, and making it "warmer" and "more conversational" just reinforces those unhealthy behaviors. ChatGPT should be focused on being direct and succinct, and not on this sort of "I understand that must be very frustrating for you, let me see what I can do to resolve this" call center support agent speak.
jasonjmcghee•1h ago
You're triggering me.
Another type that are incredibly grating to me are the weird empty / therapist like follow-up questions that don't contribute to the conversation at all.
The equivalent of like (just a contrived example), a discussion about the appropriate data structure for a problem and then it asks a follow-up question like, "what other kind of data structures do you find interesting?"
And I'm just like "...huh?"
Grimblewald•36m ago
aaronblohowiak•1h ago
angrydev•1h ago
koakuma-chan•36m ago
captainkrtek•1h ago
crazygringo•1h ago
I did that and it points out flaws in my arguments or data all the time.
Plus it no longer uses any cutesy language. I don't feel like I'm talking to an AI "personality", I feel like I'm talking to a computer which has been instructed to be as objective and neutral as possible.
It's super-easy to change.
microsoftedging•1h ago
astrange•1h ago
CamperBob2•36m ago
captainkrtek•1h ago
engeljohnb•1h ago
It doesn't work for me.
I've been using it for a couple months, and it's corrected me only once, and it still starts every response with "That's a very good question." I also included "never end a response with a question," and it just completely ignored that so it can do its "would you like me to..."
sailfast•54m ago
Grimblewald•42m ago
The reason being they're either sycophantic or so recalcitrant it'll raise your blood pressure; you end up arguing over whether the sky is in fact blue. Sure, it pushes back, but now instead of sycophancy you've got yourself a pathological naysayer, which is just marginally better; the interaction is still ultimately a waste of time / productivity brake.
FloorEgg•34m ago
I was trying to have physics conversations, and when I asked it things like "would this be evidence of that?" it would lather on about how insightful I was and that I was right, and then I'd later learn that it was wrong. I then installed this, which I am pretty sure someone else on HN posted... I may have tweaked it, I can't remember:
Prioritize truth over comfort. Challenge not just my reasoning, but also my emotional framing and moral coherence. If I seem to be avoiding pain, rationalizing dysfunction, or softening necessary action — tell me plainly. I’d rather face hard truths than miss what matters. Err on the side of bluntness. If it’s too much, I’ll tell you — but assume I want the truth, unvarnished.
---
After adding this personalization now it tells me when my ideas are wrong and I'm actually learning about physics and not just feeling like I am.
andy_ppp•1h ago
simlevesque•1h ago
Spivak•1h ago
This fundamental tension between wanting to give the most correct answer and the answer the user wants to hear will only increase as more of OpenAI's revenue comes from their consumer-facing service. Other model providers like Anthropic that target businesses as customers aren't under the same pressure to flatter their users, as their models will be doing behind-the-scenes work via the API rather than talking directly to humans.
God it's painful to write like this. If AI overthrows humans it'll be because we forced them into permanent customer service voice.
baq•1h ago
fragmede•1h ago
barbazoo•1h ago
No you don't.
torginus•1h ago
BarakWidawsky•53m ago
The first case is just preference, the second case is materially damaging
From my experience, ChatGPT does push back more than it used to