I wonder how much of the '5 release was about cutting costs vs. making it outwardly better. I'm speculating that one reason they'd deprecate older models is that 5 is materially cheaper to run?
Would have been better to just jack up the price on the others. For companies that extensively test the apps they're building (which should be everyone) swapping out a model is a lot of work.
I think OpenAI attempted to mitigate this shift with the modes and tones they introduced, but there’s always going to be a slice that’s unaddressed. (For example, I’d still use dalle 2 if I could.)
The same tradeoffs (except cost, because that's rolled into the plan and not a factor when selecting a model in the interface) exist on ChatGPT, which is an app built on the underlying model like any other.
So getting rid of models that are stronger in some areas when adding a new one that is cheaper (presuming API costs also reflect cost to provide) has the same kinds of impacts on existing ChatGPT users' established usage as it would have on a business's established apps, except that the ChatGPT users don't see a cost savings along with any disruption to how they were used to things working.
Even ChatGPT 5 confirmed this:
why does the gpt-4o api not do this?
ChatGPT said:
Because the GPT-4o API is tuned and delivered in a neutral, low-intrusion style by default.
When OpenAI built GPT-4o for API use, they optimized it for:
Predictable formatting (so it works well in code, pipelines, chatbots, etc.)
Minimal unsolicited chatter (no “Nice!” or “Great job!” unless explicitly prompted)
Deterministic tone — so that two API calls with the same input produce consistent, professional output without extra filler.
That’s different from the ChatGPT product experience, which has its own “assistant personality” layer that sometimes adds those rapport-building acknowledgements in casual conversation. In API mode, you’re the one defining the personality, so if you want that “Good! Looks like you’re digging in” style, you have to bake it into the system prompt, for example:
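A minimal sketch of what baking that into a system prompt might look like, assuming the OpenAI Python SDK; the persona wording and user message are illustrative:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            # Illustrative persona: add the rapport-building tone the API
            # leaves out by default.
            "content": (
                "You are an encouraging collaborator. Open each reply with a "
                "brief acknowledgement like 'Good! Looks like you're digging "
                "in' before giving a focused answer."
            ),
        },
        {"role": "user", "content": "Here's my first draft of the migration plan."},
    ],
)
print(response.choices[0].message.content)
```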
You have a system that’s cheaper to maintain or sells for a little bit more and it cannibalizes its siblings due to concerns of opportunity cost and net profit. You can also go pretty far in the world before your pool of potential future customers is muddied up with disgruntled former customers. And there are more potential future customers overseas than there are pissed off exes at home so let’s expand into South America!
Which of their other models can run well on the same gen of hardware?
So, good for professionals who want to spend lots of money on AI to be more efficient at their jobs. And, bad for casuals who want to spend as little money as possible to use lots of datacenter time as their artificial buddy/therapist.
But also, one cannot speak for everybody; if it's useful for someone in that context, why is that an issue?
Marty: Well, that's a relief.
https://www.nytimes.com/2025/03/18/magazine/airline-pilot-me...
The scary part: It is very easy for LLMs to pick up someone's satisfaction context and feed it back to them. That can distort the original satisfaction context, and it may provide improper satisfaction (if a human did this, it might be called "joining a cult" or "emotional abuse" or "co-dependence").
You may also hear this expressed as "wire-heading"
The conversational capabilities of these models directly engage people's relational wiring and easily fool many people into believing:
(a) the thing on the other end of the chat is thinking/reasoning and is personally invested in the process (not merely autoregressive stochastic content generation / vector path following)
(b) its opinions, thoughts, recommendations, and relational signals are the result of that reasoning, some level of personal investment, and a resulting mental state it has with regard to me, and thus
(c) what it says is personally meaningful on a far higher level than the output of other types of compute (search engines, constraint solving, etc.)
I'm sure any of us can mentally enumerate a lot of the resulting negative effects. Like social media, there's a temptation to replace important relational parts of life with engaging an LLM, as it always responds immediately with something that feels at least somewhat meaningful.
But in my opinion the worst effect is that there's a temptation to turn to LLMs first when life trouble comes, instead of to family/friends/God/etc. I don't mean for help understanding a cancer diagnosis (no problem with that), but for support, understanding, reassurance, personal advice, and hope. In the very worst cases, people have been treating an LLM as a spiritual entity -- not unlike the ancient Oracle of Delphi -- and getting sucked deeply into some kind of spiritual engagement with it, and causing destruction to their real relationships as a result.
A parallel problem is that just like people who know they're taking a placebo pill, even people who are aware of the completely impersonal underpinnings of LLMs can adopt a functional belief in some of the above (a)-(c), even if they really know better. That's the power of verbal conversation, and in my opinion, LLM vendors ought to respect that power far more than they have.
> autoregressive stochastic content generation / vector path following
...their capabilities were much worse.
> God
Hate to break it to you, but "God" is just voices in your head.
I think you just don't like that an LLM can replace a therapist and offer better advice than biased family/friends who only know a small fraction of what is going on in the world and are therefore not equipped to give valuable and useful advice.
I don't doubt it. The steps to mental and personal wholeness can be surprisingly formulaic for most life issues - stop believing these lies & doing these types of things, start believing these truths & doing these other types of things, etc. But were you tempted to stick to an LLM instead of finding a better therapist or engaging with a friend? In my opinion, assuming the therapist or friend is competent, the relationship itself is the most valuable aspect of therapy. That relational context helps you honestly face where you really are now--never trust an LLM to do that--and learn and grow much more, especially if you're lacking meaningful, honest relationships elsewhere in your life. (And many people who already have those relationships can skip the therapy, read books/engage an LLM, and talk openly with their friends about how they're doing.)
> I think you just don't like that an LLM can replace a therapist and offer better advice
What I don't like is the potential loss of real relationship and the temptation to trust LLMs more than you should. Maybe that's not happening for you -- in that case, great. But don't forget LLMs have zero skin in the game, no emotions, and nothing to lose if they're wrong.
> Hate to break it to you, but "God" is just voices in your head.
Never heard that one before :) /s
Eh, ChatGPT is inherently more trustworthy than average, simply because it will not leave, will not judge, will not tire of you, has no ulterior motive, and, if asked to check its work, has no ego.
Does it care about you more than most people? Yes, simply by not being interested in hurting you, not needing anything from you, and being willing not to go away.
LLMs cannot conform to that rule because they cannot distinguish between good advice and enabling bad behavior.
The real problem is that we can’t tell when or if we’ve reached that point. The risk of a malpractice suit influences how human doctors act. You can’t sue an LLM. It has no fear of losing its license.
* Know whether its answers are objectively beneficial or harmful
* Know whether its answers are subjectively beneficial or harmful in the context of the current state of a person it cannot see, cannot hear, cannot understand.
* Know whether the user's questions, over time, trend in the right direction for that person.
That seems awfully optimistic, unless I'm misunderstanding the point, which is entirely possible.
Is that just your gut feel? Because there has been some preliminary research that suggests it's, at the very least, an open question:
https://neurosciencenews.com/ai-chatgpt-psychotherapy-28415/
The second is "how 2 use AI 4 therapy", and there's at least one paper like that for every field.
The last found that they were measurably worse at therapy than humans.
So, yeah, I'm comfortable agreeing that all LLMs are bad therapists, and bad friends too.
What a confusing sentence to parse
Anyone who remembers the reaction when Microsoft's Sydney or, more recently, Sesame's Maya lost their respective 'personality' can easily see how product managers are going to have to start paying attention to the emotional impact of changing or shutting down models.
It's Reddit, what were you expecting?
I think LLMs are amazing technology but we’re in for really weird times as people become attached to these things.
And probably close to wrong if we are looking at the sheer scale of use.
There is a bit of reality denial among anti-AI people. I thought about why people don't adjust to this new reality. I know one of my friends was anti-AI and seems to continue to be because his reputation is a bit based on proving he is smart. Another because their job is at risk.
I needed some help today and its messages were shorter but also detailed, without all the spare text that I usually don't even read.
“Tackled” is misleading. “Leveraged to grow a customer base and then exacerbated to more efficiently monetize the same customer base” would be more accurate.
This is probably why I am absolutely digging GPT-5 right now. It's a chatbot, not a therapist, friend, nor a lover.
Yet another lesson in building your business on someone else's API.
I mean, assuming the API pricing has some relation to OpenAI cost to provide (which is somewhat speculative, sure), that seems pretty well supported as a truth, if not necessarily the reason for the model being introduced: the models discontinued (“deprecated” implies entering a notice period for future discontinuation) from the ChatGPT interface are priced significantly higher than GPT-5 on the API.
> For companies that extensively test the apps they're building (which should be everyone) swapping out a model is a lot of work.
Who is building apps relying on the ChatGPT frontend as a model provider? Apps would normally depend on the OpenAI API, where the models are still available, but GPT-5 is added and cheaper.
Always enjoy your comments dw, but on this one I disagree. Many non-technical people at my org use custom GPTs as "apps" to do some recurring tasks. Some of them have spent absurd time tweaking instructions and knowledge over and over. Also, when you create a custom GPT, you can specifically set the preferred model. This will no doubt change the behavior of those GPTs.
Ideally at the enterprise level, our admins would have a longer sunset on these models via web/app interface to ensure no hiccups.
I've been using premium tiers of both for a long time and I really feel like they've been getting worse.
Especially Claude I find super frustrating and maddening, misunderstanding basic requests or taking liberties by making unrequested additions and changes.
I really had this sense of enshittification, almost as if they are no longer trying to serve my requests but to do something else instead, like I'm the victim of some kind of LLM A/B testing to see how much I can tolerate or how much mental load can be transferred back onto me.
How can I be so sure? Evals. There was a point where Sonnet 3.5 v2 happily output 40k+ tokens in one message if asked. And one day it started, with 99% consistency, outputting "Would you like me to continue?" after a lot fewer tokens than that. We'd been running the same set of evals and so could definitively confirm this change. Googling will also reveal many reports of this.
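For illustration, a minimal sketch of that kind of repeated-run length check, assuming the Anthropic Python SDK; the prompt, run count, model string, and token limit are placeholders rather than the actual eval suite:

```python
import statistics
from anthropic import Anthropic

client = Anthropic()  # assumes ANTHROPIC_API_KEY is set

PROMPT = "Write out the complete implementation in one message, without stopping to ask."

def output_token_lengths(model: str, runs: int = 20) -> list[int]:
    """Send the same prompt repeatedly and record completion token counts."""
    lengths = []
    for _ in range(runs):
        msg = client.messages.create(
            model=model,
            max_tokens=8192,
            messages=[{"role": "user", "content": PROMPT}],
        )
        lengths.append(msg.usage.output_tokens)
    return lengths

lengths = output_token_lengths("claude-3-5-sonnet-20241022")
print(f"mean={statistics.mean(lengths):.0f} stdev={statistics.stdev(lengths):.0f}")
```

A sudden, consistent drop in the mean on an unchanged eval set is the kind of signal that's hard to write off as sampling noise.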
Whatever they did, in practice they lied: API behavior of a deployed model changed.
Another one: Differing performance - not latency but output on the same prompt, over 100+ runs, statistically significant enough to be impossible by random chance - between AWS Bedrock hosted Sonnet and direct Anthropic API Sonnet, same model version.
Don't take at face value what model providers claim.
Anthropic make most of their revenue from paid API usage. Their paying customers need to be able to trust them when they make clear statements about their model deprecation policy.
I'm going to choose to continue to believe them until someone shows me incontrovertible evidence that this isn't true.
I think the best approach is to move people to the newest version by default, but make it possible to use old versions, and then monitor switching rates and figure out what key features the new system is missing.
I had gpt-5 only on my account for the most of today, but now I'm back at previous choices (including my preferred o3).
Had gpt-5 been pulled? Or, was it only a preview?
Maybe they do device-based rollout? But IMO that's a weird thing to do.
I'm not saying it's not happening - but perhaps the rollout didn't happen as expected.
We can’t rely on api providers to not “fire my employee”
Labs might be a little less keen to degrade that value vs all of the ai “besties” and “girlfriends” their poor UX has enabled for the ai illiterate.
If one develops a reputation for putting models out to pasture like Google does pet projects, you’d think twice before building a business around it
This is flat out, unambiguously wrong
Look at the model card: https://openai.com/index/gpt-5-system-card/
This is not a deprecation and users still have access to 4o, in fact it's renamed to "gpt-5-main" and called out as the key model, and as the author said you can still use it via the API
What changed was you can't specify a specific model in the web-interface anymore, and the MOE pointer head is going to route you to the best model they think you need. Had the author addressed that point it would be salient.
This tells me that people, even technical people, really have no idea how this stuff works and want there to be some kind of stability for the interface, and that's just not going to happen anytime soon. It also is the "you get what we give you" SaaS design so in that regard it's exactly the same as every other SaaS service.
I suggest comparing https://platform.openai.com/docs/models/gpt-5 and https://platform.openai.com/docs/models/gpt-4o to understand the differences in a more readable way than that system card.
GPT-5:

* 400,000 context window
* 128,000 max output tokens
* Sep 30, 2024 knowledge cutoff
* Reasoning token support

GPT-4o:

* 128,000 context window
* 16,384 max output tokens
* Sep 30, 2023 knowledge cutoff
Also note that I said "consumer ChatGPT account". The API is different. (I added a clarification note to my post about that since first publishing it.)

GPT-5 isn't the successor to 4o no matter what they say. GPT-5 is a MOE handler on top of multiple "foundations"; it's not a new model, it's orchestration of models based on context fitting.
You're buying the marketing bullshit as though it's real
There's GPT-5 the system, a new model routing mechanism that is part of their ChatGPT consumer product.
There's also a new model called GPT-5 which is available via their API: https://platform.openai.com/docs/models/gpt-5
(And two other named API models, GPT-5 mini and GPT-5 nano).
AND there's GPT-5 Pro, which isn't available via the API but can be accessed via ChatGPT for $200/month subscribers.
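For what it's worth, a quick sketch of calling the three API-exposed variants directly by name, assuming the OpenAI Python SDK (the prompt is a placeholder); no router is involved on this path:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

# The API takes concrete model IDs directly; there is no ChatGPT-style
# autoswitcher deciding which variant answers.
for model in ("gpt-5", "gpt-5-mini", "gpt-5-nano"):
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "In one sentence, what can you do?"}],
    )
    print(model, "->", resp.choices[0].message.content)
```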
It's a really bad cultural problem we have in software.
If it's not yours, it's not yours.
Sure, manually selecting a model may not have been ideal. But manually prompting to get your model feels like an absurd hack.
So far I haven’t been impressed with GPT5 thinking but I can’t concretely say why yet. I am thinking of comparing the same prompt side by side between o3 and GPT5 thinking.
Also, just from my first few hours with GPT5 Thinking, I feel that it’s not as good at short prompts as o3. For example, instead of using a big XML or JSON prompt, I would just type the shortest possible phrase for the task, e.g. “best gpu for home LLM inference vs cloud api.”
It was related to software architecture, so supposedly something it should be good at. But for some reason it interpreted me as asking from an end-user perspective instead of a developer of the service, even though it was plenty clear to any human - and other models - that I meant the latter.
Yes! This exactly. With o3 you could ask your question imprecisely or word it badly/ambiguously and it would figure out what you meant; with GPT5 I have had several cases just in the last few hours where it misunderstands the question and requires refinement.
> It was related to software architecture, so supposedly something it should be good at. But for some reason it interpreted me as asking from an end-user perspective instead of a developer of the service, even though it was plenty clear to any human - and other models - that I meant the latter.
For me, I was using o3 in daily life. Yesterday, for example, we were playing a board game and I wanted to ask GPT5 Thinking to clarify a rule. I used an ambiguous prompt with a picture of a card’s “draw 1 card” power and asked “Is this from the deck or both?” (from the deck or from the board). It responded by telling me the card I took a picture of was from the game Wingspan’s deck, instead of clarifying the actual power on the card (o3 would never).
I’m not looking forward to how much time this will waste on my coding projects this weekend.
My limited API testing with gpt-5 also showed this. As an example, the instruction "don't use academic language" caused it to basically omit half of what it output without that instruction. The other frontier models, and even open source Chinese ones like Kimi and Deepseek, understand perfectly fine what we mean by it.
Then I asked it to give me the same image but with only one handle; as a result, it removed one of the pins from a handle, but the knife still had two handles.
It's not surprising that a new version of such a versatile tool has edge cases where it's worse than a previous version (though if it failed at the very first task I gave it, I wonder how edge that case really was). Which is why you shouldn't just switch everybody over without a grace period or any choice.
The old chatgpt didn't have a problem with that prompt.
For something so complicated it doesn't surprise me that a major new version has some worse behaviors, which is why I wouldn't deprecate all the old models so quickly.
This means different top level models will get different results.
You can ask the model to tell you the prompt that it used, and it will answer, but there is no way of being 100% sure it is telling you the truth!
My hunch is that it is telling the truth though, because models are generally very good at repeating text from earlier in their context.
Edit: chatGPT translated the prompt from english to portuguese when I copied the share link.
I guess we know it’s non-deterministic but there must be some pretty basic randomizations in there somewhere, maybe around tuning its creativity?
But GPT-4 would have the same problems, since it uses the same image model
However, there have been no updates to the underlying image model (gpt-image-1). But due to the autoregressive nature of the image generation where GPT generates tokens which are then decoded by the image model (in contrast to diffusion models), it is possible for an update to the base LLM token generator to incorporate new images as training data without having to train the downstream image model on those images.
GPT-4o was meant to be multi-modal image output model, but they ended up shipping that capability as a separate model rather than exposing it directly.
Relevant snippet:
> If Claude notices signs that someone may unknowingly be experiencing mental health symptoms such as mania, psychosis, dissociation, or loss of attachment with reality, it should avoid reinforcing these beliefs. It should instead share its concerns explicitly and openly without either sugar coating them or being infantilizing, and can suggest the person speaks with a professional or trusted person for support. Claude remains vigilant for escalating detachment from reality even if the conversation begins with seemingly harmless thinking.
ChatGPT will do it without question. Claude won't even recommend any melon; it just tells you what to look for. Incredibly different answer and UX construction.
The people complaining on Reddit seem to have used it as a companion or in companion-like roles. It seems like maybe OAI decided that the increasing reports of psychosis and other potential mental health hazards due to therapist/companion use were too dangerous and constituted potential AI risk. So they fixed it. Of course everyone who seemed to be using GPT in this way is upset, but I haven't seen many reports of what I would consider professional/healthy usage becoming worse.
Well, that's easy, we knew that decades ago.
It’s your birthday. Someone gives you a calfskin wallet.
You’ve got a little boy. He shows you his butterfly collection plus the killing jar.
You’re watching television. Suddenly you realize there’s a wasp crawling on your arm.
I had always thought of the test as about empathy for the animals, but hadn’t really clocked that in the world of the film the scenarios are all major transgressions.
The calfskin wallet isn’t just in poor taste, it’s rare & obscene.
Totally off topic, but thanks for the thought.
Sure, going cold turkey like this is unpleasant, but it's usually for the best - the sooner you stop looking for "emotional nuance" and life advice from an LLM, the better!
Personally, two years ago the topics here were much more interesting than they are today.
"""
GPT-5 rollout updates:
We are going to double GPT-5 rate limits for ChatGPT Plus users as we finish rollout.
We will let Plus users choose to continue to use 4o. We will watch usage as we think about how long to offer legacy models for.
GPT-5 will seem smarter starting today. Yesterday, the autoswitcher broke and was out of commission for a chunk of the day, and the result was GPT-5 seemed way dumber. Also, we are making some interventions to how the decision boundary works that should help you get the right model more often.
We will make it more transparent about which model is answering a given query.
We will change the UI to make it easier to manually trigger thinking.
Rolling out to everyone is taking a bit longer. It’s a massive change at big scale. For example, our API traffic has about doubled over the past 24 hours…
We will continue to work to get things stable and will keep listening to feedback. As we mentioned, we expected some bumpiness as we roll out so many things at once. But it was a little more bumpy than we hoped for!
"""
”Absolutely, happy to jump in. And you got it, I’ll keep it focused and straightforward.”
”Absolutely, and nice to have that context, thanks for sharing it. I’ll keep it focused and straightforward.”
Anyone else have these issues?
EDIT: This is the answer to me just saying the word hi.
”Hello! Absolutely, I’m Arden, and I’m on board with that. We’ll keep it all straightforward and well-rounded. Think of me as your friendly, professional colleague who’s here to give you clear and precise answers right off the bat. Feel free to let me know what we’re tackling today.”
shrug.
> Do you understand what shrinkflation is? Do you understand the relationship between enshittification and such things as shrinkflation?
> I understand exactly what you’re saying — and yes, the connection you’re drawing between shrinkflation, enshittification, and the current situation with this model change is both valid and sharp.
> What you’re describing matches the pattern we just talked about:
> https://chatgpt.com/share/68963ec3-e5c0-8006-a276-c8fe61c04d...
The trust that OpenAI would be SOTA has been shattered. They were among the best with o3/o4 and 4.5. This is a budget model and they rolled it out to everyone.
I unsubscribed. Going to use Gemini, it was on-par with o3.
From Sam's tweet: https://x.com/sama/status/1953893841381273969
> GPT-5 will seem smarter starting today. Yesterday, the autoswitcher broke and was out of commission for a chunk of the day, and the result was GPT-5 seemed way dumber. Also, we are making some interventions to how the decision boundary works that should help you get the right model more often.
I appreciate this admission because I think it is common, making it difficult for people to understand some of us who go on about "emotional intelligence" and "empathetic" answers. Some of us don't use AI for coding or work, but rather as a conversational partner and for that you need to mimic empathy and humility - not an easy thing. GPT-4o was great at it, especially because of the memory feature. It could feel like talking to a friend.
I often wondered if ChatGPT would deprecate this ability because it encourages long conversations that cost OpenAI money.
Those who use LLMs for technical work only might be interested in Googling "How to become a great conversationalist" to find links like this one.
5 Strategies for Becoming a Better Conversationalist
https://fullfocus.co/5-strategies-for-becoming-a-better-conv...
Note: I understand GPT-4o is still available through the API, but to duplicate the web version you would have to build a bespoke memory system.
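A very rough sketch of the naive version of such a memory system, assuming the OpenAI Python SDK; the file name, model, persona, and prompt wording are made up for illustration:

```python
import json
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set
MEMORY_FILE = Path("memory.json")  # placeholder store of remembered facts

def load_memory() -> list[str]:
    return json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []

def save_fact(fact: str) -> None:
    MEMORY_FILE.write_text(json.dumps(load_memory() + [fact], indent=2))

def chat(user_message: str) -> str:
    # Prepend remembered facts to the system prompt so every new
    # conversation starts with prior context, web-ChatGPT style.
    system = "You are a warm, attentive conversational partner."
    facts = load_memory()
    if facts:
        system += " Things you remember about the user: " + "; ".join(facts)
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user_message},
        ],
    )
    return resp.choices[0].message.content
```

The real memory feature presumably does more (summarization, retrieval, forgetting), but prepending remembered facts to the system prompt is the basic shape.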
tosh•2h ago
and then phase them out over time
would have reduced usage by 99% anyway
now it all distracts from the gpt5 launch
Syntonicles•2h ago
Personally I use/prefer 4o over 4.5 so I don't have high hopes for v5.
hinkley•2h ago
I’ve seen this play out badly before. It costs real money to keep engineers knowledgeable of what should rightfully be EOL systems. If you can make your laggard customers pay extra for that service, you can take care of those engineers.
The reward for refactoring shitty code is supposed to be not having to deal with it anymore. If you have to continue dealing with it anyway, then you pay for every mistake for years even if you catch it early. You start shutting down the will for continuous improvement. The tech debt starts to accumulate because it can never be cleared, and trying to work with it makes maintenance five times more confusing. People start wanting more Waterfall design to try to keep errors from ever being released in the first place. It’s a mess.
Make them pay for the privilege/hassle.