* Meeting notes that were read accurately from a handwritten note (impressive!), but the summary hallucinated information that was completely made up.
* Running complex PyTorch benchmarks while getting the simple parts completely wrong. We're talking about variants of y = f(wx + b), which is what was being compared. All the graphs and visualizations look very convincing, but the details of what's actually being tested are completely bonkers.
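For reference, the kind of comparison being described is roughly this. A minimal PyTorch sketch; the shapes, activations, and timing loop are illustrative, not the commenter's actual benchmark:

```python
# Minimal sketch of comparing y = f(Wx + b) variants; sizes and the set of
# activations are made up for illustration.
import time
import torch
import torch.nn as nn

x = torch.randn(4096, 1024)
linear = nn.Linear(1024, 1024)   # computes Wx + b
variants = {
    "identity": nn.Identity(),   # y = Wx + b
    "relu": nn.ReLU(),           # y = relu(Wx + b)
    "tanh": nn.Tanh(),           # y = tanh(Wx + b)
}

for name, f in variants.items():
    start = time.perf_counter()
    for _ in range(100):
        y = f(linear(x))
    print(f"{name}: {time.perf_counter() - start:.3f}s")
```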
Is there a petition to bring o3 back? Please? At least it was obvious when it failed.
Womp womp. Frustrating.
Then ‘roll back’ to the real version - but only for paid users.
Imagine how much worse it’d have gone if they called it GPT-4o lite and gave that to free users only and kept 4o for paid only.
Maybe it will make more people subscribe?
But it will make people cancel their subs too - I miss o3
The same issue exists with a bunch of other types of image output from ChatGPT - graphs, schematics, organizational charts, etc. It's been getting better at generating images which look like the type of image you requested, but the accuracy of the contents hasn't kept up.
Reporters should. Or else they're not doing their jobs.
ChatGPT's image generation was not introduced as part of the GPT-5 model release (except SVG generation).
The article leads with "The latest ChatGPT [...] can’t even label a map".
Yes, ChatGPT's image gen has uncanny valley issues, but OpenAI's GPT-5 product release post says nothing about image generation; it only mentions image analysis [1].
As far as I can tell, GPT-Image-1 [2], which was released around March, is what powers the image generation used by ChatGPT. They introduced it as "4o Image Generation" [3], which suggests to me that GPT-Image-1 is a version of the old GPT-4o.
The GPT-5 System card also only mentions image analysis, not generation. [4]
In the OpenAI live stream they said as much. CNN could have checked and made it clear the features are from the much earlier release, but instead they led with a misleading headline.
It's very true that OpenAI doesn't make it obvious how the image generation works though.
[1] https://openai.com/index/introducing-gpt-5/
[2] https://platform.openai.com/docs/models/gpt-image-1
[3] https://openai.com/index/introducing-4o-image-generation/
[4] https://cdn.openai.com/pdf/8124a3ce-ab78-4f06-96eb-49ea29ffb...
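For what it's worth, gpt-image-1 is exposed as its own model in the public API, separate from whichever chat model you're talking to. A minimal sketch (the prompt and file handling are illustrative):

```python
# gpt-image-1 is called via the Images API as its own model, independent of
# the chat model (GPT-4o, GPT-5, ...) driving the conversation.
import base64
from openai import OpenAI

client = OpenAI()
result = client.images.generate(
    model="gpt-image-1",
    prompt="A map of the United States with each state labeled",
    size="1024x1024",
)
with open("map.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
```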
As an aside, ChatGPT has always been "overconfident" in the capabilities of its associated image model. It'll frequently offer to generate images which exceed its ability to execute, or which would need to be based on information which it doesn't know. Perhaps OpenAI developers need to place more emphasis on knowing when to refuse unrealistic image generation requests?
Another helpful intervention point is after gpt-image-1 has produced an image: the model can do a self-review to detect problems post-generation, but it's still not very thorough. That said, OpenAI deliberately keeps its teams small and focused, and everything changes fast; they'll probably ship gpt-image-2 or something else soon anyway.
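A sketch of what that kind of post-generation self-review could look like, assuming the public API (gpt-image-1 for generation, a vision-capable chat model as the reviewer). The prompts, models, and retry logic are invented to illustrate the idea, not OpenAI's actual pipeline:

```python
# Hypothetical generate-then-review loop: produce an image, ask a vision
# model whether it matches the request, retry with the reviewer's feedback.
import base64
from openai import OpenAI

client = OpenAI()
prompt = "A map of the US west coast with each state labeled"

for attempt in range(3):
    img = client.images.generate(model="gpt-image-1", prompt=prompt)
    b64 = img.data[0].b64_json

    review = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable reviewer model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"Does this image correctly satisfy the request '{prompt}'? "
                         "Reply PASS or FAIL, then list any wrong or missing labels."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    ).choices[0].message.content

    if review.strip().upper().startswith("PASS"):
        break
    # Fold the reviewer's complaints back into the prompt and try again.
    prompt = f"{prompt}. Fix these problems from a previous attempt: {review}"
```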
In a way, reliable prediction is the main job OpenAI has to solve, and always has been. Some researchers say the standard way these models are trained produces "entangled representations", which makes them unreliable. They also suffer from the "Reversal Curse" (a model trained on "A is B" often fails to infer "B is A"). Maybe if they fix these issues, it becomes real AGI and ASI all in one go?
I think their principal mistake was in conflating the introduction of GPT-5 with the model-selection heuristics they started using at the same time. Whatever empirical hacks they came up with to determine how much thinking should be applied to a given prompt are not working well. Then there's the immediate-but-not-really deprecation of the other models. It should have been very clear that the image-based tests that the CNN reporter referred to were not running on GPT-5 at all. But it wasn't, and that's a big marketing communications failure on OpenAI's part.
One of several, for anyone who sat through their presentation.
I've done a couple of experiments now, and with effort I can get an LLM to produce code that's not horrible and mostly functional. (I've been trying to implement algorithms from CS papers that don't link to code.) I've observed that once you discover the magic words the LLM wants and give sufficient background in the conversation history, it can do OK.
But, for me anyway, the process of uncovering the magic words is slower than just writing the code myself. Although that could be because I'm targeting toy examples that aren't large codebases and aren't what shows up in the typical internet coding demo.
The limitations of what many believed to be a path to AGI/ASI are becoming clearly apparent.
It's difficult to say how much room for improvement there is, or to give a definite answer about the usefulness and economic impact of these models, but what we're seeing now is not exponential improvement.
This is not going to rewrite and improve itself, cure cancer, unify physics, or produce any other kind of scientific or technological breakthrough.
For coders it is merely a dispensable QoL improvement.
Careful. I too am pessimistic about the generative AI hype, but you seem even more so, to the point where it's making you biased and possibly uninformed.
Today’s news from BBC, 6 hours ago. “AI designs antibiotics for gonorrhoea and MRSA superbugs”
https://www.bbc.com/news/articles/cgr94xxye2lo
> Now, the MIT team have gone one step further by using *generative AI* to design antibiotics in the first place for the sexually transmitted infection gonorrhoea and for potentially-deadly MRSA (methicillin-resistant Staphylococcus aureus).
…
> "We're excited because we show that generative AI can be used to design completely new antibiotics," Prof James Collins, from MIT, tells the BBC.
Generative AI is a lot of things. LLMs in particular (a subset of generative AI) are somewhat useful, but nowhere near as useful as what Sam claims. And I guess LLMs specifically, if we focus on ChatGPT, will not be solving cancer lol.
So we agree that Sam is selling snake oil. :)
Just wanted to point out that a lot of the fundamental “tech” is being used for genuinely useful things!
> One of those algorithms, known as chemically reasonable mutations (CReM), works by starting with a particular molecule containing F1 and then generating new molecules by adding, replacing, or deleting atoms and chemical groups. The second algorithm, F-VAE (fragment-based variational autoencoder), takes a chemical fragment and builds it into a complete molecule. It does so by learning patterns of how fragments are commonly modified, based on its pretraining on more than 1 million molecules from the ChEMBL database.
(The technical article about the MIT work is here: https://www.cell.com/cell/abstract/S0092-8674(25)00855-4)
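To make the quoted description concrete, here's a toy sketch of the "generate candidates by swapping fragments, then filter" idea using RDKit. The scaffold, fragment list, and property filter are invented for illustration; this is not the actual CReM or F-VAE code from the paper:

```python
# Toy fragment-swap candidate generation plus a crude property filter;
# purely illustrative, not the MIT group's implementation.
from rdkit import Chem
from rdkit.Chem import Descriptors

scaffold = "c1ccc(cc1){}"                    # benzene with one open substitution site
fragments = ["F", "Cl", "O", "N", "C(=O)O"]  # made-up fragment vocabulary

candidates = []
for frag in fragments:
    mol = Chem.MolFromSmiles(scaffold.format(frag))
    if mol is None:
        continue                             # skip chemically invalid combinations
    if Descriptors.MolWt(mol) < 500:         # crude drug-likeness cutoff
        candidates.append(Chem.MolToSmiles(mol))

print(candidates)
```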
Both the MIT programs and GPT-5 use "generative AI", but with entirely different training sets and perhaps very different training methods, architectures, etc. Indeed, the AI systems used in the MIT work were described in conference papers in 2018 and 2020 (citations in the Cell paper), so they predate the current generation of GPT models by quite a bit. In sum, the fact that the MIT model (reportedly) works well for developing antibiotics does not in any way imply that GPT-5 is a "scientific breakthrough", much less that LLMs will lead to AI that is able to "rewrite and improve itself, cure cancer, unify physics, or produce any other kind of scientific or technological breakthrough" (quoting the OP).
This is the natural progression of a mass-market business where cost savings are valued and quality is not. If you as a customer want a higher-quality product, you are pushed to the edges of the market: boutique, bespoke, upscale experiences that can only be offered at high quality because their scale is small and manageable on every metric, and whose survival against the Walmarts of their industry depends on being the higher-quality offering.
This piece's bias hurts its credibility. "It can’t even label a map" doesn't tell a story about the things these tools are useful for. And you know that something hundreds of millions of people are using has got to be pretty useful, or people wouldn't spend their money on it.
This would be a lot easier if they just published the weights for their models, even with a delay of a couple of years like Grok was supposed to. By keeping everything secret and using some of the worst naming conventions anyone has come up with, they leave everyone confused and frustrated. Combine that with the usual rug-pull feeling of hosted software and the anxiety people have about SOTA AI, and you have a perfect storm of upset users.
> “When bubbles happen, smart people get overexcited about a kernel of truth... Are we in a phase where investors as a whole are overexcited about AI? My opinion is yes.”
How is Sam Altman admitting an AI bubble not front page news everywhere?
but no con can endure forever
The wall is very obvious now though.
no one in the industry could have believed that
I am not in the industry but I've been following closely and I am usually skeptical, but while I erred on the side of "this is just a tool" I also wondered "what if?" more than once.
https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...
So they're getting exponentially better at doing some easy fraction of programming work. But this would be like self-driving cars getting exponentially better at driving on very safe, easy roads, with absolutely no measurement of anything like chaotic streets, rural back roads, or edge cases like a semi swerving or weird reflections.