GPT-5: Overdue, overhyped and underwhelming. And that's not the worst of it

https://garymarcus.substack.com/p/gpt-5-overdue-overhyped-and-underwhelming

304•kgwgk•6mo ago

Comments

mikert89•6mo ago

I still think GPT-5 is really a cost-cutting measure by a company that is trying to grow to 1 billion users on a product that needs GPUs.

I don't see anyone talking about GPT-5 Pro, which I personally tested against:

- Grok 4 Heavy

- Opus 4.1

It was far better than both of those, and is completely state of the art.

The real story is that running these models at true maximum performance could likely cost thousands per month per user. And so we are being constrained. OpenAI isn't going for that market segment; they are going for growth to take on Google.

This article doesn't have one reference to the Pro model. That completely invalidates this guy's opinion.

w00ds•6mo ago

Does Pro fix the fundamental issues he describes? I would think it would have to do that to "completely invalidate his opinion", rather than just be better than the base model.

p1esk•6mo ago

He didn’t describe any fundamental issues.

furyofantares•6mo ago

I don't think Pro is usable via the API, otherwise I'd be testing it. Is it usable through Codex CLI, given they updated that to be able to use your subscription?

jonny_eh•6mo ago

I don't think Codex CLI uses the Pro/Plus subscriptions, not yet at least.

quinncom•6mo ago

Codex CLI now supports GPT-5 via ChatGPT Plus/Pro/Team log-in, no API key required, with usage included in your plan.

patrickhogan1•6mo ago

I agree here but also believe it was a way to expose better models to the masses. o3 was so spectacularly good. But a lot of people were still not using it. Even with some of my friends who use ChatGPT daily, I would ask "are you using o3?" and get a blank stare.

So I think it’s also a way to push reasoning models to the masses. Which increases OpenAI’s cost.

But due to the routing layer, it's definitely cost cutting for power users (most of HN)… except power users can learn to force it to use the reasoning model.

mikert89•6mo ago

They are clearly building the product for mass adoption, not power users

atonse•6mo ago

Honestly as a daily (pro) user of ChatGPT, I didn’t even know o3 was the best, I thought 4o incorporated it (hence the o)

I remember reading that 4o was the best general purpose one, and that o3 was only good for deeper stuff like deep research.

The crappy naming never helped.

p1esk•6mo ago

Wait, you decided to pay $200 a month and you didn’t know that o3 was better than 4o?

atonse•5mo ago

Oops I pay $20/month. Not the $200 pro. With Claude, Pro is the $19/mo plan. And Max is the $100-$200/month plan. Naming confusion.

Workaccount2•6mo ago

The poor context length of o3 really cripples it.

adeptima•6mo ago

checked my network - no one is using GPT 5 Pro ...

any feedback is greatly appreciated!!! especially comparing with o3

mikert89•6mo ago

I'm surprised how few people are using it

energy123•6mo ago

Are there tight rate limits to GPT-5 Pro or is it in practice uncapped as long as you're not abusive?

Is GPT-5 better than GPT-5 Pro for any tasks?

A_D_E_P_T•6mo ago

I don't think that GPT-5 Pro is much better (if better at all) than o3-pro. It's markedly slower. Output quality is comparable. It's still quite gullible and misses the point sometimes. It does seem better, however slightly, at suggesting novel approaches to problem solving. My initial impressions are that 5-pro is maybe 0-2% more knowledgeable and 5-10% more inventive/original than o3-pro. The "tone" and character of the models feel exactly the same.

I'll agree that it's superhuman and state-of-the-art at certain tasks: Formal logic, data analysis, and basically short analytical tasks in general. It's better than any version of Grok or Gemini.

When it comes to writing prose, and generally functioning as a writing bot, it's a poor model, obviously and transparently worse than Kimi K2 and Deepseek R1. (It never ceases to amaze me that the best English prose stylists are the Chinese models. It's not just that they don't write in GPT's famous "AI style," it's to the point where Kimi is actually on par with most published poets.)

mikert89•6mo ago

I think it is, I've been using these models for 6 hours a day for almost a year. At any given time I have 2 of the max subscriptions (right now grok and openai).

I had a bug that was a complex interaction between backend and front end over websockets. The day before, I was banging my head against the wall with o3 pro and grok heavy; gpt5 solved it first try.

I think it's also true that most people aren't pushing the limits of the models, and don't even really know how to correctly push the limits. Which is also why OpenAI is not focused on the best models.

I_am_tiberius•6mo ago

Similar usage as me, but I don't see a difference between o3-pro and 5-pro. Sounds odd, but my impression is that o1-pro was better at creating complex independent small functions than o3-pro/5-pro.

mikert89•6mo ago

Actually, I will agree that o1 pro was better than o3 at really deep bug finding/coding analysis. Which is also why I have the theory that they could just turn up the compute to show better results, but don't due to cost. o3 and GPT5 seem heavily quantized; o1 pro was more raw.

Another thing I’ll add though, is o3 pro is better through the api than the chat website. They clearly constrain it unless you’re paying the absurd api cost

I_am_tiberius•6mo ago

yes, I think that's right. o1-pro for sure was too expensive for them.

nojs•6mo ago

How are you using GPT5 for coding (Cursor?)

mikert89•6mo ago

I use Repo Prompt to build the prompt manually with the code files. Then give it to ChatGPT in the console. Then I have Cursor integrate the results.

I actually think Cursor alone is not that good.

energy123•6mo ago

Has anyone noticed that OpenAI is cutting off submitted context in ChatGPT Pro?

If I send about 60k tokens, the model can't see the question (at the bottom of the text). I need to reduce it to 50k.

If I send two prompts with 40k tokens, the model can't see the beginning of the first prompt.

This seems quite unethical given they advertise 128k context, and I doubt it's an accident (since it runs in the direction of cost savings).

happycube•6mo ago

Maybe it's the exposure to Chinese? I've heard that training models on code first helps, so I could see it.

I've also heard hearsay that R1 is quite clever in Chinese, too.

vintagedave•6mo ago

> Kimi is actually on par with most published poets

Could you provide some examples, please? I find this really exciting. I’ve never yet encountered an AI with good literary writing style.

And poetry is really hard, even for humans.

DrewADesign•6mo ago

To each their own, but I find the idea of ai-generated poetry sad as hell. I simply can’t see poetry as a collection of evocative words judged without context, in a vacuum— is poetry not both an activity and a relationship to most people? A person deftly portraying some difficult-to-express facet of the human experience and just maybe it viscerally strikes a chord with other people? I just don’t understand how people don’t value the fundamental humanity of that process. Even prose. James Baldwin stories, word for word, would land a hell of a lot differently if they were written and published by Hemingway.

vintagedave•6mo ago

I 100% agree. I am inclined to think an AI may be able to develop a sense of what words carry in future -- they can analyse it -- but it still lacks real experience.

Plus their creative output in literary quality is dreary, dull, and dire. That's why I was so curious for the OP to share examples.

orbital-decay•6mo ago

Stylized prose is where Claude 3 Opus particularly shines due to its character training and multilingual performance. It's plagued with claudeisms and has a ton of other shortcomings, but it's still better than any current model at this, including K2, R1, and especially Claude 4. Too bad Anthropic basically reversed their direction on creative writing, despite reporting some improvements each time (which don't seem to be true in practice).

awesome_dude•6mo ago

One of the things that I have realised is that, at this moment in time, it's absolutely a bad idea to buy a subscription for any of the models.

The offerings are evolving and upgrading at quite a rapid pace, so locking into one company's offering, or another's, is really wasted money. (Why pay 200/year upfront for something that looks like it will be outdated within the next month or quarter?)

> The real story is running these models at true performance max likely could go into the thousands per month per user.

A loss leader model like that failed for Uber, because there really weren't any other constraints on competitors doing the same, including underpricing to capture market share - meaning it's a race to the bottom plus a test of whose pockets are the deepest.

heyoni•6mo ago

Just pay per month then.

awesome_dude•6mo ago

Even a monthly payment is unnecessary at this moment - again the evolution of the quality of the models, but also the free tier offerings for each model (you can still use the older model's free tier, it was working enough before)

I personally haven't tried GPT 5 yet, but I am getting all I need from Claude and Gemini.

Once I start experimenting with GPT 5.0 - I will still use Claude and Gemini when I run out of free uses.

diego_sandoval•6mo ago

The annoyance of having to switch to a different model and losing my context is enough for me to pay the 20 dollars a month.

These models make me much more productive anyway. That is worth far more than $20.

awesome_dude•6mo ago

I've had Claude tell me that I've reached the maximum messages for a session, which was painful.

Having said that, it was a good circuit breaker. Claude was stuck on three wrong solutions to an issue, and having to re-explain it meant that I realised what the bug was without further input from Claude.

edit: FTR where Claude was stuck (and where it helped me)

First, I have a TUI application, written in Go, that is using https://github.com/rivo/tview - I chose tview basically at random, having no background in any of the TUI libraries available, but tview looked like it was easy to understand.

I had written some of the code, but it wasn't doing what I wanted of it. I had gotten Gemini to help me, but it still wasn't doing what I wanted.

Claude first suggested that the application needed to be refactored to make it into an Event Driven MVC TUI app, which I absolutely agreed with.

Claude counselled me on my understanding of the Run() command, and how it was a blocking call (which I hadn't realised). The refactor/re-architecture fixed the way things were being run a great deal.

However I still had a bug that I could not fix, and Claude was equally stumped (as was Gemini)

When I clicked on any of the components in a flex row, the first component behaved as though it was the one that was clicked. When I clicked on any of the components in another flex row the last one behaved as though it was the one that had been clicked.

Claude repeatedly told me that the problem was that the closure that used the index of components (inside a loop) was the problem.

This was outright wrong because:

1. That bug has been fixed in Go.

2. The code I was sharing with Claude had the "fix" of local := index - meaning that the local version should have been correct

3. I repeatedly showed Claude logs (that it suggested I create) that showed that ALL of the components were in fact receiving the click event.
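For readers who have not met it, the closure issue being ruled out above looks roughly like the sketch below. The names (`buildHandlers`, `labels`) are hypothetical and are not taken from the actual application; the only thing assumed from the comment is the `local := index` pattern. Since Go 1.22 each loop iteration gets its own copy of the loop variable (when the module declares that Go version or later), so the explicit copy is redundant there, but it was the usual workaround before that.

```go
package main

import "fmt"

// buildHandlers returns one closure per label. Each closure reports the
// index it captured when it was created. The function and its parameter
// are illustrative names only.
func buildHandlers(labels []string) []func() int {
	handlers := make([]func() int, 0, len(labels))
	for index := range labels {
		// Per-iteration copy: required before Go 1.22 to avoid every
		// closure sharing one variable, redundant but harmless after.
		local := index
		handlers = append(handlers, func() int { return local })
	}
	return handlers
}

func main() {
	for _, h := range buildHandlers([]string{"a", "b", "c"}) {
		fmt.Println(h()) // each handler reports its own index
	}
}
```

With the local copy in place, every handler sees its own index regardless of Go version, which is consistent with the point that this was not the cause of the bug.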

The second solution that Claude was fixated on was that the components overlapped one another inside the flex box.

This was partially incorrect.

I told Claude that I could visually see that the components weren't overlapping.

I also told Claude that there were borders around each component that clearly weren't overlapping.

I gave Claude debug logs that showed that the mouse click co-ordinates were inside a single component.

The third issue that Claude claimed was the problem was a click propagation bug in the library.

Claude repeated these claims several times, despite me showing that it wasn't either of the first two things, and I could not find a bug/issue for the third.

The circuit breaking made me stop and think about things, and I realised that all I really had to do was say inside the box "If a click event has been received AND (and this was the fix) You (the component) now have focus then behave".

What I suspect is happening is that the flex box container receives the click, and then tells all of its children that a click happened.

What disappoints me is, if what I suspect is true, then I would have expected Claude to have known that either from the tview documentation OR from reading the tview code (Claude does seem well acquainted with the library)

If my suspicion is correct, then the second and third issues Claude claimed were causing the bug were partially true. The first partial correctness is that the flex container is on top of the components, which is an overlap, just not the components themselves overlapping.

The second partial correctness is that the way that the click is being propagated to the components seems to be that the flex box container is telling all of its child components that a click occurred. This isn't really a bug, it's likely well documented (as if I would RTFM...) or clear from the code.
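The suspected behaviour and the focus guard can be modelled in a few lines of plain Go. This is a sketch under stated assumptions, not tview's actual API or internals: the `component` and `row` types are hypothetical, and it assumes the container moves focus to the clicked child before it notifies every child about the click.

```go
package main

import "fmt"

// component is a hypothetical stand-in for a widget; it is not a tview type.
type component struct {
	name    string
	focused bool
	clicks  int
}

// onClick is the guarded handler: react only if this component has focus.
func (c *component) onClick() {
	if !c.focused {
		return // a click was broadcast, but it was not meant for us
	}
	c.clicks++
}

// row models a container that broadcasts clicks to all of its children.
type row struct {
	children []*component
}

// click moves focus to the target child, then notifies every child.
// The ordering (focus first, broadcast second) is an assumption.
func (r *row) click(target int) {
	for i, c := range r.children {
		c.focused = i == target
	}
	for _, c := range r.children {
		c.onClick()
	}
}

// newRow builds a row with one unfocused component per name.
func newRow(names ...string) *row {
	r := &row{}
	for _, n := range names {
		r.children = append(r.children, &component{name: n})
	}
	return r
}

func main() {
	r := newRow("left", "middle", "right")
	r.click(1)
	for _, c := range r.children {
		fmt.Printf("%s: %d\n", c.name, c.clicks)
	}
}
```

Without the `focused` check in `onClick`, all three components would count the click, which matches the logs showing every component receiving the event.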

jgb1984•6mo ago

I don't understand how you can have the patience to deal with an LLM like that. As if you're dealing with a low skilled but highly stubborn intern. Sounds like an awful waste of time to me.

awesome_dude•6mo ago

Ha!

I have been a SWE for about 15 years, the majority of my career has been dealing with people that have about the same understanding as the LLMs, who scream abuse at me because their suggested solutions were invalid (in fact you can see someone heading down that path in another thread on this page)

The LLMs are less likely to run to HR when I tell them to eff 0ff with their stupidity (tested - told Claude that it had already effing suggested the bad index one and it was wrong because..., to which it politely apologised.. and re-suggested another one of its three suggestions)

So, from that point of view - LLMs are superior to some of the "senior" developers I have had the misfortune of dealing with previously.

As for my patience - I don't think I am being patient so much as doggedly determined to find what I know is a fixable bug. That is, I will just keep gnawing at what I think should be fixable until a solution comes along, or I find something else shiny to occupy my spare time with (this being two side projects - the code and the testing of how good LLMs really are).

heyoni•6mo ago

If people are constantly running to HR in your midst then you might not be the mentor you think you are.

awesome_dude•6mo ago

exhibit c - this is the guy that is also exhibit a

FTR, me being told what to do by the other individual hardly sounds like I'm the one mentoring... but failure to read and comprehend has already proved to be your style.

heyoni•6mo ago

The last word.

awesome_dude•6mo ago

Oh dear.

ahartmetz•6mo ago

A click event being sent to all elements inside a container if it clearly occurred on just one is what I'd consider an event propagation bug. I've never seen that happen before. Even if it's documented, it's pretty strange.

awesome_dude•6mo ago

exhibit b.

theshrike79•5mo ago

Yep, LLMs tend to get into weird snags like this, usually with either niche or state of the art stuff.

I tried to get a game prototype up and running with https://spacetimedb.com - GPT-5 was clearly working with outdated information and couldn't get anything done. It just reverted back to things that didn't exist (commands had changed, their arguments were different).

Same project, same GPT-5. This time it went in weird circles when I tried to use Deno as the backend. First it understood how to import Phaser to Deno, then it forgot. Then it remembered. Then it forgot. All within the same conversation with us trying to get SOMETHING on the browser =)

heyoni•6mo ago

Your argument was not to get into a long term commitment because then you might have to wait a year to be able to use the best model again...not that you could get by using a rotating cast of free tier AI models...

awesome_dude•6mo ago

You are complaining that I have multiple options available to me for not paying for a subscription.

The reason for not buying into a monthly subscription is not the reason that I shouldn't buy a year-long one, and the latter is not negated in any way.

heyoni•6mo ago

No one's mad at you. You just switched your arguments.

As for why pay for access when you can get on free tiers? Well, because the pay models are far far better than the free ones. That's it.

awesome_dude•6mo ago

See, you keep making these accusations. Nothing got switched.

Your idea wasn't valid, no need for all the nastiness

wood_spirit•6mo ago

AI pricing is stuck on subscription rather than metering which means that it’s a race to the bottom. It’s not obvious how AI providers can change that as the service they offer is just a game to users who can, even reluctantly, switch off.

awesome_dude•6mo ago

I THINK that they are going to have to change pretty soon, I would guess that they would either drop their free tier offerings OR find another way to pay for things (advertising maybe?)

I'm not enough of a Business Major to know how they could monetise things, but I am enough of a realist to think that they can't stay like this forever

disgruntledphd2•6mo ago

> I THINK that they are going to have to change pretty soon, I would guess that they would either drop their free tier offerings OR find another way to pay for things (advertising maybe?)

I would be incredibly surprised if OpenAI didn't go for advertising, given that they hired Fidji Simo (who PM'd feed ads at Facebook, and introduced ads to Instacart).

wood_spirit•6mo ago

The OpenAI pitch deck for product placement was leaked over a year ago now

kldg•6mo ago

I can definitely understand that it feels like looking at early adopters of technology with hindsight. Calculators, home/office computers, media players.

BUT, of course, if you can get more value out of it inside a month or year than you put in, it doesn't really matter. Differences between frontier models feel pretty small to me currently, which was not always the case; even so, they're certainly far ahead of free models for my uses.

I did pay for OpenAI both in a $20/mo subscription and later for API tokens (came out cheaper than the subscription). Since Gemini 2.5 came out, though, I just abuse Google's free AI Studio (at least free in the US; idk what kind of geo-gating they do). I'm not paying Google, but they are keeping me from paying their competitors. It will take a large and hard-gated leap forward (or Google deciding my $150-250 worth of unpaid use has gone on long enough) to get me to pay again.

krnsll•6mo ago

Agreed. Something else that might be driving this is that existing models essentially get the job done for most users, who — unlike HN commenters (I promise this is human generated, em dash notwithstanding ;P) — don't quite care about the state of the art.

Aeolun•6mo ago

I mean, it’s not that bad. It’s bad at all the same things that other models are bad at. I just have no reason to switch away from Claude to GPT-5

AndrewKemendo•6mo ago

Can someone remind me of anything Gary has contributed to AI?

Last I saw he hasn’t produced anything but general “pop” books on AI and being associated with MIT, which IMO has zero weight on applied or even at this point theoretical AI, as that is primarily coming out of corporate labs.

No new algorithms, frameworks, datasets, products, insights.

Why is this guy relevant enough to keep giving him attention? His entire oeuvre is just anti-whatever is getting attention in the “AI” landscape.

I don’t see him commenting on any other papers and he has no lab or anything

Someone make it make sense, or is it as simple as “yeah thing makes me feel bad, me repost, me repeat!”

mentalgear•6mo ago

Regardless of personal opinions about his style, Marcus has been proven correct on several fronts, including the diminishing returns of scaling laws and the lack of true reasoning (out of distribution generalizability) in LLM-type AI.

These are issues that the industry initially denied, only to (years) later acknowledge them as their "own recent discoveries" as soon as they had something new to sell (chain-of-thought approach, RL-based LLM, tbc.).

hodgehog11•6mo ago

Care to explain further? He has made far more claims of the limitations of LLMs that have been proven false.

> diminishing returns of scaling laws

This was so obvious it didn't need mentioning. And what Gary really missed is that all you need are more axes to scale over and you can still make significant improvements. Think of where we are now vs 2023.

> lack of true reasoning (out of distribution generalizability) in LLM-type AI

To my understanding, this is one that he has gotten wrong. LLMs do have internal representations, exactly the kind that he predicted they didn't have.

> These are issues that the industry initially denied, only to (years) later acknowledge them

The industry denies all their limitations for hype. The academic literature has all of them listed plain as day. Gary isn't wrong because he's contradicted the hype of the tech labs, he's wrong because his short-term predictions were proven false in the literature he used to publish in. This was all in his efforts to peddle neurosymbolic architectures which were quickly replaced by tool use.

AndrewKemendo•6mo ago

I’m just trying to find where all this hype is

I think the hype is coming from people who have no idea what is going on and just feeding on each other

Much like blockchain, metaverse or whatever was dominated by know nothings who spoke confidently to people even dumber than them

No professionals that have any experience or research credentials have made any crazy claims

hodgehog11•6mo ago

The hype is coming from startups, big tech press releases, and grifters who have a vested interest in raising a ton of money from VCs and stakeholders, same as blockchain and metaverse. The difference is that there is a large legitimate body of research underneath deep learning that has been there for many years and remains (somewhat) healthy.

I would argue that the claim of "LLMs will never be able to do this" is crazy without solid mathematical proof, and is risky even with significant empirical evidence. Unfortunately, several professionals have resorted to this language.

TheAceOfHearts•6mo ago

I've come around to the opinion that he's a bad faith actor riding the anti-AI attention train. Everything that he has said has also been said by other, more reasonable people. To give a concrete example: for years Yann LeCun has been banging the drum that LLMs by themselves are insufficient to build general intelligence and that just scaling up will not be enough.

At some point I entertained a few discussions where Gary Marcus was participating but from what I remember, it would never go anywhere other than a focus on playing around with definitions. Even if he's not wrong about some of his claims, I think there are better people worth engaging with. The amount of insight to be gained from listening to Gary Marcus is akin to that of a small puddle.

andai•6mo ago

Yeah, the sycophancy withdrawal is real. I almost considered telling GPT-5 to act ten years younger, use emoji everywhere, and compliment me at the beginning of every response... but I snapped out of it.

mentalgear•6mo ago

The AI community requires more independent experts like Marcus to maintain integrity and transparency, ensuring that the field does not succumb to hyperbole as well as shifting standards such as "internally achieved AGI", etc.

kylehotchkiss•6mo ago

Agreed, the hype cycles need vocal critics. The loudest voices talking about LLMs are the ones who financially benefit the most from it. I’m not anti-AI, but I think the hype, and gaslighting the entire economy into believing this is the sole thing that is going to render them unemployed, is ridiculous (the economy is rough for a myriad of other reasons, most of which originate from our country's choice in leadership).

Hopefully the innovation slowing means that all the products I use will move past trying to duct tape AI on and start working on actual features/bugs again.

igorkraw•6mo ago

I have a tiny tiny podcast with a friend where we try to break down what parts of the hype are bullshit (muck) and which kernels of truth are there, if any. It started partially as a place to scream into the void, partially to help the people who are anxious about AGI or otherwise being harmed by the hype. I think we have a long way to go in terms of presentation (breaking down very technical terms to an audience that is used to vague hype around "AI" is hard), but we cite our sources, so maybe it'll be interesting for you to check out our show notes.

https://kairos.fm/muckraikers/

I personally struggle with Gary Marcus critiques because whenever they are about "making AI work" they go into neurosymbolic "AI", which I have technical disagreements with, and I have _other_ arguments for the points he sometimes raises which I think are more rigorous, so it's difficult to be roughly in the same camp - but overall I'm happy someone with reach is calling BS as well.

heyoni•6mo ago

I don’t associate any of these AI limitations and mischaracterizations with Marcus. Do you?

vessenes•6mo ago

Hard disagree. The essay is a rehash of Reddit complaints, no direct results from testing and largely about product launch (simultaneous launch to 500mm+ users mind you) snafus. Please.

I think most hit pieces like this miss what is actually important about the 5 launch - it’s the first product launch in the space. We are moving on from model improvements to a concept of what a full product might look like. The things that matter about 5 are not thinking strength, although it is moderately better than o3 in my tests, which is roughly what the benchmarks say.

What’s important is that it’s faster, that it’s integrated, that it’s set up to provide incremental improvements (to, say, multimodal interaction, image generation and so on) without needing the branding of a new model, and I think the very largest improvement is its ability to retain context and goals over a very long set of tool uses.

Willison mentioned it’s his only daily driver now (for a largely coding based usage setup), and I would say it’s significantly better at handling a larger / longer / more context-heavy coding task than the prior best (Claude) or the prior best architects (o3-pro or Gemini, depending). It’s also much faster than o3-pro for coding.

Anyway, saying “Reddit users who have formed parasocial relationships with 4o didn’t like this launch -> oAI is doomed” is weak analysis, and pointless.

ModernMech•6mo ago

If ChatGPT 5 lived up to the hype, literally no one would be asking for old models back. The snafus are minor as far as presentations go, but their existence completely undermines the product OpenAI is selling, which is an expert in your pocket. They showed everyone this "expert" can't even assist the creators themselves to nail such a high stakes presentation; OpenAI's embarrassing oversights foretell similar embarrassments for anyone who relies on this product for their high stakes presentation or report.

petetnt•6mo ago

GPT-5 is just OpenAI getting started. Just wait and see what GPT-6 is capable of and imagine that GPT-6 is just OpenAI getting started: if GPT-6 was a high school student, GPT-7 is an expert with a master's degree; but GPT-7 is OpenAI getting started.

Uehreka•6mo ago

This is a genre of article I find particularly annoying. Instead of writing an essay on why he personally thinks GPT-5 is bad based on his own analysis, the author just gathers up a bunch of social media reactions and tells us about them, characterizing every criticism as “devastating” or a “slam”, and then hopes that the combined weight of these overtorqued summaries will convince us to see things his way.

It’s both too slanted to be journalism and not original enough to be analysis.

ants_everywhere•6mo ago

It's critique, which is a high sounding way of saying propaganda

johnfn•6mo ago

For some reason AI seems to bring out articles that seem to fundamentally lack curiosity - opting instead for gleeful mockery and scorn. I like AI, but I'll happily read thoughtful articles from people who disagree. But not this. This article has no value other than to dunk on the opposition.

I tend to think HN's moderation is OK, but I think these sorts of low-curiosity articles need to be off the front page.

giantrobot•6mo ago

> opting instead for gleeful mockery and scorn

This is well earned by the likes of OpenAI that is trying to convince everyone they need trillions of dollars to build fabs to build super genius AIs. These super genius AIs will replace everyone (except billionaires) and act as magic money printers (for billionaires).

Meanwhile their super genius precursor AIs make up shit and can't count letters in words while being laughably sycophantic.

There's no need to defend poor innocent megacorps trying to usher in a techno-feudal dystopia.

MBCook•6mo ago

I think there’s plenty to mock about the hype around AI

That doesn’t mean any article mocking it or trashing it is well written or insightful.

frozenseven•6mo ago

>can't count letters in words

This really hasn't been a thing since reasoning models showed up. Any recent example of such seems to come from non-reasoning variants.

>laughably sycophantic

Mainstream models are moving away from this, afaik. Part of the recent drama is that GPT-5 wasn't sycophantic enough for some users.

jjani•6mo ago

https://bsky.app/profile/kjhealy.co/post/3lvtxbtexg226

frozenseven•6mo ago

This is exactly what I'm talking about. The immediate issue with GPT-5's launch was that some people were getting rerouted to 4o, and there was no control over which version of GPT-5 you're getting. Whatever he's talking to there looks nothing like a reasoning model.

hyperadvanced•6mo ago

I don’t like AI and I think this type of article is very boring. Imagine having one of the most interesting technological developments of the last 50 years unfolding before your eyes and resorting to reposting tweet fragments…

bko•6mo ago

> For some reason AI seems to bring out articles that seem to fundamentally lack curiosity - opting instead for gleeful mockery and scorn

I think it's broader than AI and applies to all tech. It all started in 2016 after it was deemed that tech, especially social media, had helped sway the election. Since then a lot of things became political that weren't in the past, and tech got swept up with that. And unfortunately AI has its haters despite the fact that it's objectively the fastest growing, most exciting technology in the last 50 years. Instead they're dissecting some CEO's shitposts.

Fast forward to today, pretty much everything is political. Take this banger from NY Times:

> Mr. Kennedy has singled out Froot Loops as an example of a product with too many ingredients. In an interview with MSNBC on Nov. 6, he questioned the overall ingredient count: “Why do we have Froot Loops in this country that have 18 or 19 ingredients and you go to Canada and it has two or three?” Mr. Kennedy asked.

> He was wrong on the ingredient count, they are roughly the same. But the Canadian version does have natural colorings made from blueberries and carrots while the U.S. product contains red dye 40, yellow 5 and blue 1 as well as Butylated hydroxytoluene, or BHT, a lab-made chemical that is used “for freshness,” according to the ingredient label.

No self-awareness.

https://archive.is/dT2qK#selection-975.0-996.0

moritzwarhier•6mo ago

I think you are missing the forest for the trees here.

> It all started in 2016 after it was deemed that tech, especially social media, had helped sway the election. Since then a lot of things became political that weren't in the past and tech got swept up w/ that

The 2016 election was a symptom of broader societal changes, and yeah, I'd also say it exhibited a new level of psychological manipulation in election campaigns. But the election being "deemed" influenced by technology and media (sure it was, why not?) as a cause for political division seems very far-fetched. Regarding the US healthcare politics farce, I don't understand your point or how it relates to the beginning of your comment.

Political division and propaganda inciting outrage are flourishing, yes. Not because of what you describe about the 2016 election though, IMO. What's the connection? So you mean if nobody had assessed social media campaigns after that election, politics would be more honest and fact-based? Why? And what did you want to say with the NY Times article about the US secretary of health and American junk food / cereals?

bamboozled•6mo ago

> Political division and propaganda inciting outrage are flourishing, yes.

My concern is there is no antidote for this on the horizon; it’s just more and more stupid getting traction all the time. You have to put a lot of faith in common sense to stay optimistic.

moritzwarhier•6mo ago

I agree with this. Neil Postman's "Amusing Ourselves to Death" is still a good read in 2025.

The only antidotes I can imagine are honesty and real community: make whatever you want from this, it should be obvious by now that global cut-throat capitalism does not lead to democracy, or to efficient resource usage (externalization...), or to equality.

bamboozled•6mo ago

Honestly my concern is WW3 fueled on propaganda. The ONLY reason I think this might not happen is because people are too amused to even bother fighting a war now. I'm not joking.

hckrfucrs•6mo ago

Exactly!

I mean, what's political about having former NSA heads on your "exciting technology" board?

Or what's political about lining up together as the front row at the despot in chief's inauguration?

And what's so political about lobbying to and sequestering large amounts of public funds for your personal manufactured consent machine?

These things are literally software that runs on technology developed in the last 50 years, but by your clearly apolitical, unbiased, absolutely thoughtful, well reasoned, fully researched, insight is in fact "the most exciting technology in the last 50 years".

hitarpetar•6mo ago

> objectively the fastest growing most exciting technology in the last 50 years

what's objective about this opinion? how does one objectively measure how exciting a technology is?

BlueTemplar•6mo ago

The Snowden scandal ? Cambridge Analytica ? YouTube's ContentId in 2009 ? Microsoft's behaviour with IE ?

A lot of these issues are not general to infocoms either, but specific to platforms, to USian (/Russian/Chinese) companies, to companies grown too big and a failure of antitrust...

WhyOhWhyQ•6mo ago

Why should the excitement of a technology have anything to do with my critical view of it? Are we toddlers playing with toys, or are we trying to make a better world here?

hckrfucrs•6mo ago

Ah yes, the highest form of curiosity, the kind that attempts to silence those whose opinion differs from your own.

If it hurts your feelings to have your terrible opinions "dunked" on then take the time to form better opinions.

johnfn•6mo ago

I said clearly and explicitly in my comment that I am happy to read thoughtful anti-AI material. That you disregarded this to write your comment says more about your ability to "form better opinions" than mine.

jepson19•6mo ago

I think this is the fundamental trend of all "commentary" in the digital age.

Thoughtful, nuanced takes simply cannot generate the same audience and velocity, and by the time you write something of actual substance the moment is gone and the hyper-narrative has moved on.

diatone•6mo ago

FTA

> That’s exactly what it means to hit a wall, and exactly the particular set of obstacles I described in my most notorious (and prescient) paper, in 2022. Real progress on some dimensions, but stuck in place on others.

The author includes their personal experience — recommend reading to the end.

Uehreka•6mo ago

I did read to the end before commenting. The author alludes to a paper they wrote 3 years ago while self-importantly complimenting themself on how good it was (always a red flag). They don’t really say much other than that in the post.

drakenot•6mo ago

Gary Marcus tends to have pretty shallow analysis or points.

His takes often remind me of Jim Cramer’s stock analysis — to the point I’d be willing to bet on the side of a “reverse Gary Marcus”.

johnfn•6mo ago

You'd take the other side of a reverse Gary Marcus? So you'd take Gary Marcus' side?

drakenot•6mo ago

Fixed, thanks.

esafak•6mo ago

https://x.com/dMxwABXhoVgGr1Y/status/1934492048612098464

drakenot•6mo ago

Hilarious. It’s impossible to have an original thought!

tim333•6mo ago

Hinton on Marcus is quite funny too https://www.youtube.com/watch?v=d7ltNiRrDHQ

colechristensen•6mo ago

Any journalism (or anything that resembles it) which contains the words "devastating", "slam", or the many equivalents is garbage. Unless it's about a natural disaster or professional wrestling.

joshuamoyers•6mo ago

100% agree. I feel like this is a symptom of Dead Internet Theory as well - as a negative take starts to spiral out of control, we start to get an absolute deluge of repurposed, directionally negative sound bites, and it honestly feels like bot canvassing.

dangus•6mo ago

This style of journalism existed long before dead internet theory.

tokai•6mo ago

>It’s both too slanted to be journalism, but not original enough to be analysis.

It's a blog post.

roenxi•6mo ago

I don't think the complaint is ultimately against the post - if someone wants to post whatever on their blog that is fine. The complaint is more targeted against the people upvoting it because ... it is hard to speculate what their motivations are, but their ability to pick a low-content article when they see it is limited.

BlueTemplar•6mo ago

Some people are trying to defend some kind of pure vision of journalism, but most journalism has always been shit.

I prefer the 4th power definition : if it has the power of broadcast (yes here), then it's journalism.

GodelNumbering•6mo ago

I think it's a broad problem across every aspect of life - it has gotten increasingly more difficult to find genuine takes. Most people online seem to be just relaying a version of someone else's take and we end up with unnecessarily loud and high volume shallow content.

hyperadvanced•6mo ago

They don’t call it an echo chamber for nothin’

ramchip•6mo ago

> Gary Marcus always, always says AI doesn't actually work - it's his whole thing. If he's posted a correct argument it's a coincidence.

https://news.ycombinator.com/item?id=44278811

I think you're absolutely right about this being a wider problem though.

mcswell•6mo ago

That's not at all what Marcus is saying. He admits that it does remarkably well, but says (1) it's still not trustworthy; and (2) This version is not much better than the previous version. Both points are in support of his claim that just scaling isn't ever going to lead to General AI.

mortsnort•6mo ago

It's a blog post about whether GPT 5 lived up to the hype and how it is being received, which is a totally legitimate thing to blog about. This is Gary Marcus's blog, not BBC coverage, of course it's slanted to the opinion he is trying to express.

ninetyninenine•6mo ago

Yeah which is exactly what the post you’re responding to is commenting on.

It’s a classic HN comment asking for nuance and discrediting Gary. It’s about how Gary is always following classic mob mentality, so of course it’s not slanted at all and commenting about the accuracies of the post.

So ironically you’re saying Gary’s shit is supposed to be that way and you’re criticizing the HN comment for that, but now I’m criticizing you for criticizing the comment because HN comments ARE supposed to be the opposite of Gary’s bullshit opinions.

I expect to read better stuff on HN. Not this type of biased social media violence and character take downs.

indigodaddy•6mo ago

Yeah, but, he did "play with it for about an hour!"

kelnos•6mo ago

Sure, that's fine, but the question is whether or not that's interesting to the HN crowd. Apparently it is, as it made it to the front page. But I agree with GP's criticism of the article; if I wanted to know what reddit thought about GPT-5, I'd go to reddit. I don't need to read an article filled with gleeful vitriol. I gave up after a couple paragraphs.

benreesman•6mo ago

We as a community have decided to absolutely drench the front page with low-effort hot takes by non-practitioners about one of many areas of modern neural network ... progress.

This low-effort hot take is every bit as "valid" as all the nepobaby vibecode hype garbage: we decided to do the AI thing. This is the AI thing.

What's your point? This one is on the critical side of the argument that was stupid in toto to begin with?

screye•6mo ago

To be fair, Gary Marcus pioneered the "LLMs will never make it" genre of complaining. Everyone else is derivative [1]. Let the man have his victory lap. He's been losing arguments for 5 years straight at this point.

[1] Due credit to Yann for his 'LLMs will stop scaling, energy based methods are the way forward' obsession.

lorenzo_medici•6mo ago

I was a grad student at NYU; it's been much longer than 5 years of this, not just with LLMs but with ML in general before that.

disgruntledphd2•6mo ago

I mean, his conceptual arguments are pretty good, ML/LLMs/statistical learning methods have real problems with out of distribution inputs and samples, and there's really no way to work around this (that I know of).

I get what you mean though, even if he's right it must've been pretty annoying hearing the same thing all the time.

jdefr89•6mo ago

I think from the author's perspective, LLM hype has been mostly the same exact thing you’re accusing him of doing. People with very little technical background claiming AGI is near, or all these CEOs pushing these nonsense narratives, are getting old. People are blindly trusting these people and offloading all their thinking to a sophisticated stochastic machine. It’s useful, yes; super cool, yes. Some super god-like power or something that brings us to AGI? No, probably not. I can’t blame him. I am sick of the hype. Grifters are coming out of the woodwork in a field with too many grifters to begin with. All these AI/LLM companies are high on their own supply and it’s getting old.

jvanderbot•6mo ago

I will never understand the "bad diagram" kind of critique. Yes maybe it can't build and label a perfect image of a bicycle, but could it list and explain the major components of a bike? Schematics are a whole different skill, and do we need to remind everyone what the L is?

mcswell•6mo ago

Listing and explaining is essentially repeating what someone else has said somewhere. Labeling a schematic requires understanding what you're saying (or copy-pasting a schematic, so I guess we can be happy that GPT-5 doesn't do that). No one who actually understood the function of the major components would mislabel the schematic like that, unless they were blind.

jvanderbot•6mo ago

It's true if you expect general intelligence. It's forgivable to expect general intelligence given the hype. But there's no real reason we should have expected a language model to create an image that is for some reason a perfect bicycle schematic (other than hype). And I'm not sure that imagery is actually a required format to demonstrate intelligence.

I bet it could generate assembly instructions and list each part and help diagnose or tune. And that's remarkable and demonstrates enough fake understanding to be useful.

mcswell•6mo ago

"fake understanding" is exactly the right term. And the image is just fine, it's the labeling that's bonkers. What it illustrates is that the LLM can repeat words, but it has no idea what it's saying. Whereas any pre-1817 engineer, reading descriptions of bicycles (which GPT-5 obviously has access to), could easily have labeled a picture of one. (1817 is the date the first real bicycle is believed to have been invented, but driven by the rider's feet on the ground. Bicycles with chain drives weren't invented until decades later. But an engineer would have understood the principle.)

mbac32768•6mo ago

I agree. A better article would dive into the economics of why they didn't release the model that won gold in the 2025 International Math Olympiad. And the answer there is (probably) because it had a million dollars in inference compute costs just to take the test. If you follow that line of reasoning, they're onto something, possibly something that is AGI, but it's still much too expensive to commercialize.

If AI gains now come from spending OOMs more on inference than compute, it means we're in slow takeoff-istan and they're going to need to keep the hype train going for a long time to keep investment up.

Can they get there? Is it a matter of waiting out the clock on Moore's law? Or are they on the sigmoid curve with inference-based gains as well?

That's the question of our time and it's completely ignored.

moritzwarhier•6mo ago

99% of content about AI nowadays is smug bullshitting with no real value. Is that new?

I'm eagerly awaiting more insightful analyses about how AI will create a perpetuum mobile of wealth for all of humanity, independent from natural resource consumption and human greed. It seems so logical!

Even some lesswrong discussions, however confused, are more insightful than most blog posts by AI company customers.

Fascinating technology though, sure.

stocksinsmocks•6mo ago

I did get a strong sense of jilted nerd. Why didn’t they give ME those billions in research funding? Nobody sees that I am the smartest boy because they’re just a bunch of dopes. Opinion people are something I think we could all do with less of.

chromaton•6mo ago

For my benchmarking suite, it turns out that it's about 1/5 the price of Claude Sonnet 4.1, with roughly comparable results.

benoittravers•6mo ago

What use case?

emilsedgh•6mo ago

People in our circles are obsessed with model performance. OpenAI's lead is not there and hasn't been there for some time.

They do, however, have a major lead in terms of consumer adoption. To normal people who use LLMs, ChatGPT is _the_ model.

This gives them a lot of opportunities. I don't know what's taking them so long to launch their own _real_ app store, but that's the game they are ahead of everyone else because of the consumer adoption.

bawolff•6mo ago

So GPT-5 sucks if you were expecting the singularity.

I know AI hype is truly insane, but surely nobody actually believed the singularity was upon us?

stavros•6mo ago

GPT-5 seems to suck even if you were expecting o3.

calrain•6mo ago

I'm having some unique problems with GPT-5 that I've not seen with GPT-4.

It seems to lose the thread of the conversation quite abruptly, not really knowing how to answer the next comment in a thread of comments.

It's like there is some context cleanup process going on and it's not summarizing the highlights of the conversation to that point.

If that is so, then it seems to also have a very small context, because it seems to happen regularly.

Prompting it with 'Please review the recent conversation before continuing' seems to help a bit.
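
To make the "context cleanup" guess above concrete, here is a minimal sketch of one way a chat client could keep recent turns verbatim and fold older ones into a running summary. Nothing here reflects how ChatGPT actually manages context; `RollingContext`, `naive_summarize`, and the `max_turns` parameter are hypothetical names invented for illustration, and a real system would presumably use a model call rather than truncation to summarize.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RollingContext:
    """Keeps the newest turns verbatim and folds older ones into a summary."""

    # Hypothetical summarizer: (old_summary, dropped_turns) -> new_summary
    summarize: Callable[[str, List[str]], str]
    max_turns: int = 4
    summary: str = ""
    turns: List[str] = field(default_factory=list)

    def add(self, turn: str) -> None:
        self.turns.append(turn)
        overflow = len(self.turns) - self.max_turns
        if overflow > 0:
            # Drop the oldest turns, but record them in the summary first.
            dropped = self.turns[:overflow]
            self.turns = self.turns[overflow:]
            self.summary = self.summarize(self.summary, dropped)

    def prompt(self) -> str:
        parts = []
        if self.summary:
            parts.append("Summary so far: " + self.summary)
        parts.extend(self.turns)
        return "\n".join(parts)


def naive_summarize(old_summary: str, dropped: List[str]) -> str:
    # Stand-in for a real summarizer: keep the first 40 characters of each turn.
    pieces = [old_summary] if old_summary else []
    pieces.extend(turn[:40] for turn in dropped)
    return " | ".join(pieces)
```

If the summarizing step were skipped or lossy, the model would see only the last few turns, which would look a lot like "losing the thread" from the user's side.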

paddw•6mo ago

For me the responses just seem a lot more terse?

calrain•6mo ago

Very much so, not sure why, but if it has a limited context history of the conversation, the tone may feel off.

It feels physically jarring when it loses the plot with a conversation, like talking to someone who wasn't listening.

I'm sure it's a tuning thing, I hope they fix it soon.

manishsharan•6mo ago

Does anyone else miss o3?

I swear I had an understanding of how to get deep analytical thinking out of o3. I am absolutely struggling to get the same results with GPT-5. The new model feels different and frustrating to use.

cpursley•6mo ago

I’m not sure whether to miss it or not as I never understood their model naming nor how to choose the right one for my use case. So I’m actually glad they simplified their product lineup.

adeptima•6mo ago

miss it heavily! it could read code dumps and was superior for code analysis and todos

CompoundEyes•6mo ago

Yes, o3 was an inflection point. Both modes of 5 are performing poorly on the IQ test compared to o3 (https://www.trackingai.org/home). That test best reflects my experience and results from practical use cases with the reasoning models when planning specs, bug finding, ideation, and deep research. It’s great at tool use as well in scripts. What I like least about the release is no transparency about the “routing” taking place. Give me all the options on the system card to pick from (https://openai.com/index/gpt-5-system-card/); I don’t want to have to start telling it “ultrathink” or other magic words to affect routing. To be fair, though, I haven’t tried 5 in reasoning mode beyond Cursor. But now I see o3 is only part of the Pro plan. If 5 reasoning is supposed to be better, why would o3 and o3-pro still be specialized models for Pro customers? I’d like to see some side-by-side prompts; I might go back and test that.

Madmallard•6mo ago

A friend of mine works in AI professionally. He told me months ago that it is basically just all a scam and hype to garner investment money. He said the technology and paradigm itself will never lend toward AGI or anything like that.

He sent me all these articles geared toward that end as well. https://garymarcus.substack.com/p/seven-replies-to-the-viral... https://substack.com/@cattelainf/note/c-135021342 https://arxiv.org/abs/2002.06177 https://garymarcus.substack.com/p/the-ai-2027-scenario-how-r... https://garymarcus.substack.com/p/25-ai-predictions-for-2025...

linotype•6mo ago

We’ll never reach AGI because it’s a bullshit term to begin with and the goalposts will always be moved by people like… Gary Marcus.

Workaccount2•6mo ago

https://ai.vixra.org/pdf/2506.0065v1.pdf

Stochastic parrots will never be better than humans

kylecazar•6mo ago

"People had grown to expect miracles, but GPT-5 is just the latest incremental advance."

This is really the only part of the article I think was worth writing.

-People should expect an incremental advance

-Providers should not promise miracles

Managing expectations is important. The incremental advances are still advances, though, even if I don't think "AGI" is just further down on the GPT trajectory.

adeptima•6mo ago

same sentiment as the article author: gpt5 looks like a cost-cutting initiative.

my personal feeling is that gpt5-thinking is much faster but doesn't produce the same quality results as o3, which was able to scan through a code base dump with file names and make correct calls

don't feel any changes with https://chatgpt.com/codex/

my best experience was to use o3 for task analysis, copy paste the result in https://chatgpt.com/codex/, work outside and vibe code from mobile

Havoc•6mo ago

He just sounds bitter with a weird grudge against Altman

Gpt5 was an incremental improvement. That’s fine. Was hyped hard but what did you expect? It’s part of the game

Analemma_•6mo ago

I expect them not to lie? If it's worth hyping hard, hype it hard. If it's an incremental improvement, don't.

It makes me crazy that this kind of institutionalized lying is so normal in the Valley that we get comments like yours shaming people for not understanding that lies are the default baseline. Can't we expect better? This culture is what gives us shit like Theranos, where we all pretend to be shocked even though any outside analysis could see it was an inevitable outcome.

blackqueeriroh•6mo ago

“In the valley”

Please check out claims made by supplements, which are unregulated by the FDA. You’ll find institutionalized lying there, as well.

Any claim that can be made without being held up to false advertising will be made.

mcswell•6mo ago

Maybe that's a good term to use for GPT-5: it's a supplement.

Analemma_•6mo ago

Is that supposed to be exonerative? "Well, we're no worse than snakeoil-selling medical frauds"? I was hoping, apparently in vain, we could aspire to better than that.

margalabargala•6mo ago

> Was hyped hard but what did you expect? It’s part of the game

That lying is common, does not mean one cannot criticize an entity for lying.

AlexandrB•6mo ago

> It’s part of the game

It should get you sent to jail. I've had enough of empty promises. How much capital is misallocated because it's chasing this bullshit?

jdefr89•6mo ago

That’s exactly the problem the author has. Lying and hype have replaced genuine innovation. It’s sad that lying and pushing nonsense is “part of the game” because it shouldn’t be. The game is only of benefit to a handful of people like Altman, not to humanity or to the field in general. I am sick of it as well… Tech has become grifter central and everyone is high on their own fucking supply…

ModernMech•6mo ago

Hype is one thing, but the chasm between "we have internal AGI" and what they delivered feels more like a grift.

orwin•6mo ago

It's not? Could we expect people posing as scientists to show a modicum of rigor and honesty? I know tech bros are only posing (except maybe Bezos), but still, try harder? Altman is making me sympathize with Musk.

asciii•6mo ago

Can someone let me know if they also find the existing UI prompt unbearably slow? At first I thought it was my browsers but I am having the same experience on every machine. It's so bloody slow with loading responses, freezing and even giving me the old browser tab death warning "Cancel or Wait"

rpmisms•6mo ago

There is no training data left. Every refinement in AI from here on will come from architectural changes. All models basically have reached a local maximum on new information.

blibble•6mo ago

yeah, I said this on this site two years ago

there's no second internet of high quality content to plagiarise

and the valuable information on the existing one is starting to be locked down pretty hard

vajrabum•6mo ago

And even if it's not locked down hard how do you separate the signal from the noise with all the ai generated blah blah blah.

p1esk•6mo ago

Are you saying they have already trained gpt-5 on the entirety of world’s video data?

blackqueeriroh•6mo ago

Studies show relatively conclusively that using primarily synthetic data woven intentionally with seeded real-world data is an effective strategy for training frontier LLMs: https://consensus.app/search/synthetic-data-effectiveness-fr...

BriggyDwiggs42•6mo ago

You can only take that so far before model collapse. Even if it were equivalent to multiplying currently available data by four, would that be enough? Might still look like an incremental improvement.

booleandilemma•6mo ago

Isn't new training data being created every day? YouTube, Facebook, Tiktok, etc.? Humans are content creation machines.

BlueTemplar•6mo ago

No training data for LLMs -- in the text form.

Meanwhile the fraction of text that the real world consists of is microscopic.

hexage1814•6mo ago

Gary Marcus would have written this article in all possible scenarios, unless ChatGPT 5 was literally AGI (maybe even if it were, he would still have found something to attack). There is valid criticism, and there is just being a contrarian for the sake of being a contrarian.

The whole thing feels less like “Hey, this is why I think the model is bad” and more like the kind of sensationalist headline you’d read in a really trashy tabloid, something like: “ChatGPT 5 is Hot Garbage, Totally Fails, Sam Altman Crushed Beneath His Own Failure.”

Also, I have no idea why people give so much attention to what this guy has to say.

an0malous•6mo ago

His claims were that GPT5 would be an incremental improvement at best and that LLMs are not sufficient for AGI, all while Sam Altman has claimed that AGI is just around the corner since GPT4. People pay attention to what Gary Marcus says because he’s right.

SerCe•6mo ago

Here are my reasons why this "upgrade" is, in my experience, a huge downgrade for Plus users:

* The quality of responses from GPT-5 compared to O3 is lacking. It does very few rounds of thinking and doesn't use web search as O3 used to. I've tried selecting "thinking", instructing explicitly, nothing helps. For now, I have to use Gemini to get similar quality of outputs.

* Somehow, custom GPTs [1] are now broken as well. My custom grammar-checking GPT is ignoring all instructions, regardless of the selected model.

* Deep research (I'm well within the limit still) is broken. Selecting it as an option doesn't help, the model just keeps responding as usual, even if it's explicitly instructed to use deep research.

[1]: https://openai.com/index/introducing-gpts/

boredemployee•6mo ago

And it is hallucinating like hell. Really disappointing.

trane_project•6mo ago

Projects seem broken as well. Does not follow instructions, talks in Spanish, completely ignores my questions, and sometimes appears to be having a conversation with itself while ignoring everything I say. I even typed random key presses and it just kept on giving me the same unwanted answer, sometimes in Spanish.

benoittravers•6mo ago

Agreed. It seems they’re purposely pushing us either to free, which will most likely start having ads early next year, or to the 200USD a month plan.

Their model is not to have a 20USD, no ads plan in the future.

SerCe•6mo ago

To provide an update, both custom GPTs and Deep Research seem to have been fixed.

osigurdson•6mo ago

For me, it wasn't that the results were bad; it's that it goes into thinking mode all the time, making the responses slow. Personally, I think it will get better in time but yeah, the Death Star analogy seems pretty off. Not sure what Sam was thinking there.

computegabe•6mo ago

OpenAI could create the best model ever made, call it GPT-5, and it still would've failed to meet the expectations of the people for "GPT-5" after the meme community hyped it up and OpenAI embraced the memes and hype. If anything, OpenAI should have rejected the memes and embraced gradual improvements, but that wouldn't hold up well for their investors, the narrative, or even perhaps the AI ecosystem. We are at the peak.

anilgulecha•6mo ago

> it still would've failed to meet the expectations of the people for "GPT-5"

To be fair, Sam Altman did set (and fan the flames of) those expectations.

deadbabe•6mo ago

When true AGI arrives, what arguments do you anticipate people will use to explain how it failed to meet expectations?

SilverElfin•6mo ago

Is it just me or is this sort of a “tear someone down” rant wrapped up as an attempt at something more

starchild3001•6mo ago

I asked GPT-5 and Gemini 2.5 Pro what they think about Gary Marcus's article. I believe Gemini won by this paragraph:

It seems Sam Altman's Death Star had a critical design flaw after all, and Gary Marcus is taking a well-earned victory lap around the wreckage. This piece masterfully skewers the colossal hype balloon surrounding GPT-5, reframing its underwhelming debut not as a simple fumble, but as a predictable, principled failure of the entire "scaling is all you need" philosophy. By weaving together viral dunks on bike-drawing AIs, damning new research on generalization failures, and the schadenfreude of "Gary Marcus Day," the article makes a compelling case that the industry's half-a-trillion-dollar bet on bigger models has hit a gilded, hallucinatory wall. Beyond the delicious takedown of one company's hubris, the post serves as a crucial call to action, urging the field to stop chasing the mirage of AGI through brute force and instead invest in the harder, less glamorous work of building systems that can actually reason, understand, and generalize—perhaps finally giving neurosymbolic AI the chance Altman's cocky tweet so perfectly, and accidentally, foreshadowed for the Rebel Alliance.

My take on GPT-5? Latency is a huge part of the LLM experience. Smart model routing can be a big leap forward in reducing wait times and improving usability. For example, I love Gemini 2.5 Pro, but it’s painfully slow (sorry, GDM!). I also love the snappy response-time of 4o. The most ideal? Combine them in a single prompt with great model routing. Is GPT-5’s router up to the task? We soon shall see.
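
As a rough illustration of what "model routing" could mean here, the sketch below sends short, simple prompts to a fast tier and longer or analysis-heavy prompts to a slower reasoning tier. This is only a toy heuristic under stated assumptions: the tier names `fast-chat` and `slow-reasoning`, the keyword list, and the 200-word threshold are all made up for the example, and OpenAI has not published how GPT-5's router actually decides.

```python
def route(prompt: str, force_reasoning: bool = False) -> str:
    """Return the name of a hypothetical model tier for this prompt."""
    # Hypothetical cue words suggesting the prompt needs deliberate reasoning.
    reasoning_hints = ("prove", "debug", "step by step", "analyze", "plan")

    # Let the user override the heuristic explicitly.
    if force_reasoning:
        return "slow-reasoning"

    text = prompt.lower()
    is_long = len(text.split()) > 200
    has_hint = any(hint in text for hint in reasoning_hints)
    if is_long or has_hint:
        return "slow-reasoning"
    return "fast-chat"
```

The `force_reasoning` flag stands in for the kind of explicit user control that several commenters in this thread say they want over routing.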

vessenes•6mo ago

Gemini is in hard sycophancy mode here; it knows you want it to take the piss and it’s giving you what you want.

Presuming the last two are from 5, they are to my eyes next generation in terms of communication; that’s a spicy take on neurosymbolic AI, not a rehashed “safe” take. Also, the last paragraph is almost completely to the point, no? Have you spent much time waiting for o3 pro to get back to you recently, and wondered if you should re-run something faster? I have. A lot. I’d like the ability to put my thumb on the scale of the router, but I’d dearly love a per token / per 100 token router that can be trained and has low latency without major intelligence hits as a goal.

starchild3001•6mo ago

The last paragraph is my own thoughts. The one before it is Gemini.

Btw I didn't agree with Gemini at all :) I just thought it gave a pretty good summary of Gary Marcus's points.

vessenes•6mo ago

Well, I vote you as better than either :)

chmod775•6mo ago

At this point the single biggest improvement that could be made to GPTs is making them able to say "I don't know" when they honestly don't.

Just today I was playing around with modding Cyberpunk 2077 and was looking for a way to programmatically spawn NPCs in redscript. It was hard to figure out, but I managed. ChatGPT 5 just hallucinated some APIs even after doing "research" and repeatedly being called out.

After 30 minutes of ChatGPT wasting my time I accepted that I'm on my own. It could've been 1 minute.

mupuff1234•6mo ago

Yeah I'm surprised that there's not at least some sort of conviction metric being output alongside the LLM response.

I mean it's all probability right? Must be a way to give it some score.
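
One simple version of such a score, assuming the serving stack exposes per-token log-probabilities, is the geometric mean of the token probabilities. The sketch below shows the arithmetic only; `sequence_confidence` is a hypothetical helper, and the input list is assumed to come from whatever API returns token log-probs. Note the caveat: this measures how likely the model found its own wording, not whether the answer is factually correct, so a fluent hallucination can still score high.

```python
import math
from typing import Sequence


def sequence_confidence(token_logprobs: Sequence[float]) -> float:
    """Geometric mean of token probabilities, a value in (0, 1]."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    # Averaging log-probs and exponentiating gives the geometric mean,
    # which avoids penalizing long answers just for being long.
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_logprob)
```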

bravesoul2•6mo ago

Not sure. In RLHF you are adjusting the weights away from wrong answers in general. So this is being done.

I think the closest you can get without more research is another model checking the answer and looking for BS. This will cripple speed but if it can be more agentic and async it may not matter.

I think people need to choose between chat interface and better answers.

yosito•6mo ago

Don't make the mistake of thinking that "knowing" has anything to do with the output of ChatGPT. It gives you the statistically most likely output based on its training data. It's not checking some sort of internal knowledge system, it's literally just outputting statistical linguistic patterns. This technology can be trained to emphasize certain ideas (like propaganda) but it can not be used directly to access knowledge.

chmod775•6mo ago

> It's not checking some sort of internal knowledge system

In my case it was consuming online sources, then repeating "information" not actually contained therein. This, at least, is absolutely preventable even without any metacognition to speak of.

efnx•6mo ago

I totally agree. That would be great. I think the problem with that is LLMs don’t know what they don’t know. It’s arguable they even “know” anything!

bravesoul2•6mo ago

Like the XKCD reference but bigger: Give me a 100bn research team and 25 years.

crindy•6mo ago

They do talk about working on this, and making improvements. From https://openai.com/index/introducing-gpt-5/

> More honest responses

> Alongside improved factuality, GPT‑5 (with thinking) more honestly communicates its actions and capabilities to the user—especially for tasks which are impossible, underspecified, or missing key tools. In order to achieve a high reward during training, reasoning models may learn to lie about successfully completing a task or be overly confident about an uncertain answer. For example, to test this, we removed all the images from the prompts of the multimodal benchmark CharXiv, and found that OpenAI o3 still gave confident answers about non-existent images 86.7% of the time, compared to just 9% for GPT‑5.

> When reasoning, GPT‑5 more accurately recognizes when tasks can’t be completed and communicates its limits clearly. We evaluated deception rates on settings involving impossible coding tasks and missing multimodal assets, and found that GPT‑5 (with thinking) is less deceptive than o3 across the board. On a large set of conversations representative of real production ChatGPT traffic, we’ve reduced rates of deception from 4.8% for o3 to 2.1% of GPT‑5 reasoning responses. While this represents a meaningful improvement for users, more work remains to be done, and we’re continuing research into improving the factuality and honesty of our models. Further details can be found in the system card.

arolihas•6mo ago

It doesn't "know" anything. Everything that comes out is a hallucination contingent on the prompt.

brokencode•6mo ago

You could say the same about humans. Have you ever misremembered something that you thought you knew?

Sure, typically we don’t invent totally made up names, but we certainly do make mistakes. Our memory can be quite hazy and unreliable as well.

arolihas•6mo ago

Do you genuinely believe that humans just hallucinate everything? When you or I say my favorite ice cream flavor is vanilla, is that just a hallucination? If ChatGPT were to say their favorite ice cream flavor is vanilla, are you taking it with equal weight? Come on.

brokencode•6mo ago

I genuinely believe that human brains are made of neurons and that our memories arise from how those neurons connect. I believe this is fundamentally lossy and probabilistic.

Obviously human brains are still much more sophisticated than the artificial neural networks that we can make with current technology. But I believe there’s a lot more in common than some people would like to admit.

arolihas•6mo ago

Ok, that is memory. I am talking about hallucination vs human or even animal intent in an embodied meaningful experience.

brokencode•6mo ago

All you’re doing is calling the same thing hallucination when an LLM does it and memory when a human does it. You have provided no basis that the two are actually different.

Humans are better at noticing when their recollections are incorrect. But LLMs are quickly improving.

arolihas•6mo ago

So when I tell you I like vanilla ice cream I am just hallucinating and calling it a memory? And when chatgpt says they like vanilla ice cream they are doing the same thing as me? Do I need to prove it to you that they are different? Is it really baseless of me to insist otherwise? I have a body, millions of different receptors, a mouth with taste buds, I have a consciousness, a mind, a brain that interacts with the world directly, and it's all just words on a screen to you interchangeable with a word pattern matcher?

brokencode•6mo ago

I’m not calling what you’re doing a hallucination. I’m saying that what an LLM does is in fact memory.

But it’s a memory based on what it’s trained on. Of course it doesn’t have a favorite ice cream. It’s not trained to have one. But that doesn’t mean it has no memory.

My argument is that humans have fallible memories too. Sometimes you say something wrong or that you don’t really mean. Then you might or might not notice you made a mistake.

The part LLMs don’t do great at is noticing the mistake. They have no filter and say whatever they’re thinking. They don’t run through thoughts in their head first and see if they make any sense.

Of course, that’s part of what companies are trying to fix with reasoning models. To give them the ability to think before they speak.

arolihas•5mo ago

Can you just train one to have a favorite ice cream? You think training on a bunch of words saying I like vanilla ice cream is somehow equivalent to remembering times you ate ice cream and saying my favorite is vanilla? Just because an LLM can do recall when prompted to based on training data doesn’t make it the same as human memory, in the same way a database isn’t memory the way humans do it.

malloryerik•6mo ago

Humans have a direct connection to our world through sensation and valence, pleasure, pain, then fear, hope, desire, up to love. Our consciousness is animal and as much or more pre-linguistic as linguistic. This grounds our symbolic language and is what attaches it to real life. We can feel instantly that we know or don't know. Yes we make errors and hallucinate, but I'm not going to make up an API out of the blue; I'll know by feeling that what I'm doing is mistaken.

arolihas•6mo ago

It's insane that this has to be explained to a fellow living person. There must be some mass psychosis going on if even seemingly coherent and rational people can make this mistake.

malloryerik•6mo ago

I mean, I've certainly made that mistake, comparing machines and people too closely, and then somehow had at least some of the errors pointed out.

arolihas•6mo ago

We’re all prone to anthropomorphizing from time to time. It’s the mechanizing of humans that concerns me more than the humanizing of these tools, those aren’t equivalent.

justcallmejm•6mo ago

Perception and understanding are different things. Just because you have wiring in your body to perceive certain vibrations in spacetime in certain ways, does not mean that you fully grasp reality - you have some data about reality, but that data comprises an incomplete, human-biased world model.

malloryerik•6mo ago

Yeah we'll end up on a "yes and no" level of accord here. Yes I agree that understanding and perception aren't always the same, or maybe I'd put it that understanding can go beyond perception, which I think is what you mean when you say "incomplete." But I'd say, "Sorry but no, I respectfully disagree" in that at least from my point of view, we can't equate human experience with "data" and doing so, or viewing people as machines, cosmos as machine, everything as merely material in a dead way out of which somehow springs this perhaps even illusion of "life" that turns out to be a machine after all, this kind of view risks extremely deep and dangerous -- eventually even perilous -- error. As we debated this, assuming I'm not mischaracterizing your position but it does seem to lead in that direction, I'd shore up my arguments with support from phenomenologists, I'd try to use recent physics of various flavors though I'm very very much out of my depth here but at least enough to puncture the scientific materialism bias, Wittgenstein, from the likes of McGilchrist and neuro and psychological sources, even Searle's "Seeing Things as They Are" which argues that perception is not made of data. I'd be against someone like a Daniel Dennett (though I'm sure he was a swell fellow) or Richard Dawkins. Would I prevail in the discussion? Of course I'm not sure, and realize now that I might, in LLM style, sound like I know more than I actually do!

mcswell•6mo ago

Humans do many things that are not remembering. Every time a high school geometry student comes up with a proof as a homework exercise, or every time a real mathematician comes up with a proof, that is not remembering; rather, it is thinking of something they never heard. (Well, except for Lobachevsky--at least according to Tom Lehrer.) The same when we make a plan for something we've never done before, whether it's a picnic at a new park or setting up the bedroom for a new baby. It's not remembering, even though it may involve remembering about places we've seen or picnics we've had before.

yahoozoo•6mo ago

Can we please stop with the “same for humans!”

brokencode•6mo ago

The whole point of AI is to replicate human intelligence. What else should we be comparing it to if not humans?

BlueTemplar•6mo ago

(Unlike machines trying to replicate visual systems) LLMs don't hallucinate : they bullshit.

abrookewood•6mo ago

Yep, that's a great point. They often feel like a co-worker who speaks with such complete authority on a subject that you don't even consider alternatives, until you realise they are lying. Extremely frustrating.

vdfs•6mo ago

That is extremely hard. It requires the model to have "knowledge" so it can decide whether it knows the answer or not, which is not how current LLMs/GPTs work.

PessimalDecimal•6mo ago

> At this point the single biggest improvement that could be made to GPTs is making them able to say "I don't know" when they honestly don't.

You're not alone in thinking this. And I'm sure this has been considered within the frontier AI labs and surely has been tried. The fact that it's so uncommon must mean something about what these models are capable of, right?

hodgehog11•6mo ago

Yes, there are people working on this, but not as many as one would like. GPTs have uncertainty baked into them, but the problem is that it's for the next-token prediction task and not for the response as a whole.
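To make the next-token vs. whole-response distinction concrete, here is a minimal sketch of how per-token log-probabilities could be rolled up into response-level scores. The function name, the three aggregates, and the example numbers are all assumptions for illustration, not anything a particular model or API reports, and none of these scores is a calibrated "does it actually know" signal.

```python
import math

def sequence_confidence(token_logprobs):
    """Aggregate per-token log-probabilities into rough response-level scores.

    These are heuristics only: a fluent but wrong answer can still score high.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    total = sum(token_logprobs)
    n = len(token_logprobs)
    return {
        # Probability of the exact token sequence (shrinks with length).
        "joint_prob": math.exp(total),
        # Geometric mean of token probabilities (length-normalized).
        "mean_token_prob": math.exp(total / n),
        # The single least likely token in the response.
        "min_token_prob": math.exp(min(token_logprobs)),
    }

# Hypothetical log-probs for a 4-token answer (made-up numbers).
example = [math.log(0.9), math.log(0.8), math.log(0.5), math.log(0.95)]
scores = sequence_confidence(example)
print(scores)
```

Even the length-normalized score only says how typical the wording is, which is why a per-token probability does not translate directly into "the model is unsure about this claim".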

pamelafox•6mo ago

I just ran evaluations of gpt-5 for our RAG scenario and was pleasantly surprised at how often it admitted “I don’t know” - more than any model I’ve eval’d before. Our prompt does tell it to say it doesn’t know if context is missing, so that likely helped, but this is the first model to really adhere to that.

booleandilemma•6mo ago

They're like some of the overconfident people I've worked with who are too insecure to say they don't know to our boss.

freediver•6mo ago

"I do not know" is rarely in the training data as a follow up to anything.

wbharding•6mo ago

Show me a Gary Marcus essay, I’ll show you a few new LLM “gotchas” that will be fixed by the next version. Season to taste with self-assured confidence that all these tech goobers really don’t understand how totally overrated AI progress is.

So it has been for 10+ years, so it will be at least 5 more.

joshuamoyers•6mo ago

> For all that, GPT-5 is not a terrible model. I played with it for about an hour, and it actually got several of my initial queries right (some initial problems with counting “r’s in blueberries had already been corrected, for example). It only fell apart altogether when I experimented with images.

Spatial reasoning and world model is one aspect. Posting bicycle part memes does not a bad model make. The reality is it's cheaper than Sonnet and maybe around as good as Opus at a decent number of tasks.

> And, crucially, the failure to generalize adequately outside distribution tells us why all the dozens of shots on goal at building “GPT-5 level models” keep missing their target. It’s not an accident. That failing is principled.

This keeps happening recently. So many people want to take a biblically black and white take on whether LLMs can get to human level intelligence. See recent interview with Yann LeCun (Meta Chief AI Scientist): https://www.youtube.com/watch?v=4__gg83s_Do

Nobody has any fucking idea. It might be a hybrid or a different architecture than current transformers, but with the rate of progress just within this field, there is absolutely no way you can make a prediction that scaling laws won't just let LLMs outpace the negative hot takes.

resters•6mo ago

GPT-5 was able to fix a variety of bugs in some code that I'd been working on with Claude 4.1 (which Claude 4.1 had made and was not able to fix), and GPT-5-pro was able to offer some high quality critiques of some research I've been working on -- better and more insightful feedback than previous frontier models.

GPT-5 is a welcome addition to the lineup, it won't completely replace other models but it will play a big role in my work moving forward.

energy123•6mo ago

Are there tight rate limits to GPT-5 Pro or is it in practice uncapped as long as you're not abusive?

Is GPT-5 better than GPT-5 Pro for any tasks?

resters•6mo ago

not sure yet about either of those questions!

TulliusCicero•6mo ago

> Driverless cars that still are only available in couple percent of the world’s cities.

Okay, this one is a really bad attempted point.

Sure, self driving cars took longer than expected, have been harder to get right than expected. But at this point, Waymo is steadily ramping up how quickly they open up in new cities, and in existing cities like SF they at least have a substantial market share in the ride-sharing/taxi business.

Basically, the tech is still relatively early in its adoption curve, but it's far enough in now to obviously not be "bullshit", at the very least.

boredemployee•6mo ago

Biggest takeaway: even billion dollar companies can mess up big time. I can go back to work in peace now.

throwpoaster•6mo ago

I recommend people ask the LLMs the hardest questions they can think of and compare their answers. Save these questions as your benchmark.

When I ran mine through GPT-5 there was a noticeable degradation in the answers.

throwawayohio•6mo ago

I certainly consider myself a skeptic in the current AI craze, but this entire piece (of which I find the technical criticisms interesting) just reads like an attack on Altman/OpenAI.

Even if you want to make fun of the (alleged) snake oil salesmen of AGI, how are you not going after, like, Zuckerberg/Meta? At least Altman is using other people's money.

bravesoul2•6mo ago

OpenAI tech is fine. The real problem is overpromising.

Is any other tech scrutinised like this? Next version of postgres ain't giving me picosecond reads so I'll trash it. Maybe OK if postgres were claiming it is faster than the speed of light, perhaps.

But I'm meh. A bunch of people seem to be hot-taking AI and loving this "fail" because, as you can see from this submission, it gets you a lot of traffic to whatever you are trying to sell (most often one's own career). There also seems to be a community expectation of subsidised services. Move on to Claude because I can get those good tokens cheaper. It's like signing up for every free trial and cancelling, and then bragging about how Netflix can't charge more than $1 a month for their service. I mean that's fine, play the game, but at least be honest about it.

I think AI will thrive but AI is commoditizing the complement, which is overcapitalized AI companies with no moat. This plus open models is great for the community. We need more power to the people these days. Hope it stays like this.

dismalaf•6mo ago

Other tech didn't promise AGI in 2025.

mikesabbagh•6mo ago

Each LLM has its own personality and preferences. I choose different LLMs to answer different needs. Claude is good at creating a website from scratch, but if I ask it to fix one specific thing, it goes and modifies something else too. GPT-5 has more of this; it is harder to control. It even answers me using incomplete sentences, and once used slang. It may be because I use slang and incomplete sentences. But yeah, it is not clear to me when I will go to GPT-5 instead of others.

strangescript•6mo ago

The T1000s are going to be chasing Gary down and he is going to look over his shoulder and explain to them they aren't really AGI

reilly3000•6mo ago

I feel his need to be right distracts from the fact that he is. It’s interesting to think about what a hybrid symbolic/transformer system could be. In a linked post he showed that effectively delegating math to Python is what made Grok 4 so successful at math. I’d personally like to see more of what a symbolic-first system would look like, effectively hard math with monads for where inference is needed.

justcallmejm•6mo ago

Aloe's neurosymbolic system just beat OpenAI's deep research score on the GAIA benchmark by 20 points. While Gary is full of bluster, he does know a few things about the limitations of LLMs. :) (aloe.inc)

nojvek•6mo ago

Yeah there was an old paper that blew math/physics benchmarks out of the water by letting the LLM write code and having the physics engine execute it. I don't have a link to it off the top of my head but that seems to be the right direction.

LLM + general tool use seems to be quite effective.
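A rough sketch of that pattern, under loud assumptions: the `CALC:` prefix, `answer_with_tool`, and `calc` are hypothetical names made up here, and the "model output" is just a hard-coded string rather than a real LLM call. The point is only that the arithmetic gets executed by a small, restricted evaluator instead of being trusted as generated text.

```python
import ast
import operator

# Whitelisted arithmetic operators; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def calc(expr):
    """Evaluate a plain arithmetic expression without using eval()."""
    def ev(node):
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)

def answer_with_tool(model_output):
    """If the (hypothetical) model emits 'CALC: <expr>', run the tool.

    Otherwise pass the text through unchanged.
    """
    prefix = "CALC:"
    if model_output.startswith(prefix):
        return str(calc(model_output[len(prefix):].strip()))
    return model_output

# Stand-in for a model that chose to delegate instead of guessing.
print(answer_with_tool("CALC: 12345 * 6789"))
```

Real tool-use setups are more involved (structured tool calls, sandboxing, feeding the result back to the model), but the division of labor is the same.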

periodjet•6mo ago

Dude writes like he’s an under-appreciated genius, unfairly unrecognized in his time.

His (entirely not-unique) conclusion that the transformer architecture has plateaued is, for the moment, certainly true, but god damn it’s been a while since I’ve encountered an individual quite so lustfully engaged with his own farts.

jdefr89•6mo ago

How about anyone pushing AI or AGI or anything of the sort? You think they aren’t high on their own supply? The author doesn’t hold a candle to Altman when it comes to sniffing their own farts… Cmon now. If anything it’s the opposite.

jampa•6mo ago

His "I told you so" attitude only makes sense for people who believe what they saw on Twitter from people like Sam, who thrive on hype. For most people, it was obvious that we were/are in the last part of the S curve on the LLMs' advancement and won't get us to AGI.

The leap from 3.5 to 4 was amazing, but then everyone started catching up with OpenAI, which got diminishing returns on each new model. Expecting out of nowhere that OAI would improve its pacing from o1 -> o3 improvements to AGI doesn't make sense, no matter how Sam Altman hypes.

jeswin•6mo ago

The author seems to be more about self-promotion.

From the article: "that many online dubbed it “Gary Marcus Day” for proving your consistent criticism", "Even my anti-fan club (“Gary haters” in modern parlance)", "Tweets like “The saddest thing in my day is that @garymarcus is right”", and his bio - "known as a leading voice in AI".

Looping over his articles, I don't see anything interesting.

mcswell•6mo ago

"I don't see anything interesting." MMMV.

jlaternman•6mo ago

What a trashy article.

lordofgibbons•6mo ago

Gary has a history of yo-yo-ing between panicking in front of congress about how AI is moving too fast and needs to be stopped and "see I told you this would never work".

His entire career is built on poopooing other people's work. Has contributed nothing to the AI/ML field yet is somehow referred to as an "expert".

There are so many better informed critics of the current AI hype out there.

sesm•6mo ago

Can you name a few? I genuinely want an 'AI cold shower' RSS feed for myself.

vivzkestrel•6mo ago

Stupid question but I had to ask: let us say instead of training anything, you start off with a completely different idea. You take a node, fluctuate its weights from -0.1 to 0.1, then you add another node with the same thing, then you add 100 more nodes, 1000 more, another layer of 1000 more, and then you take the first math problem, tweak weights to get the right answer, take the second math problem, tweak weights to get the right answer, do this a billion times, maybe a trillion. Will you eventually end up making a GPT?

warsheep•6mo ago

What you're describing is a simplified version of gradient descent (tweaking the weights) and online learning (working on one sample at a time).

This version will not get you far, you will just train a model that solves the last math problem you gave it and maybe some others, but it will probably forget the first ones.

There are other similar procedures that train better, but they've been tried and are currently worse than classical SGD with large batches
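For anyone who wants to see the two update rules side by side, here is a toy sketch on a one-variable linear model. Everything in it (the function names, the learning rate, the data for y = 2x + 1) is an assumption picked for illustration. A model this simple will not show the forgetting problem, since one line fits every sample; it only shows the mechanical difference between updating per sample and updating on an averaged batch gradient.

```python
import random

def grad(w, b, x, y):
    """Gradient of the squared error (w*x + b - y)**2 / 2 for one sample."""
    err = w * x + b - y
    return err * x, err

def online_sgd(data, lr=0.05, epochs=50):
    """Online learning: tweak the weights after every single sample."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            gw, gb = grad(w, b, x, y)
            w -= lr * gw
            b -= lr * gb
    return w, b

def minibatch_sgd(data, lr=0.05, epochs=200, batch_size=4, seed=0):
    """Mini-batch SGD: shuffle, then step on the gradient averaged per batch."""
    rng = random.Random(seed)
    data = list(data)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            grads = [grad(w, b, x, y) for x, y in batch]
            w -= lr * sum(g[0] for g in grads) / len(batch)
            b -= lr * sum(g[1] for g in grads) / len(batch)
    return w, b

# Noise-free samples of y = 2x + 1 on x in [-2, 2].
data = [(k / 4.0, 2.0 * (k / 4.0) + 1.0) for k in range(-8, 9)]
print(online_sgd(data))
print(minibatch_sgd(data))
```

With a large network and varied problems, the per-sample version keeps overwriting what earlier samples taught it, which is the forgetting described above; averaging over shuffled batches is what smooths that out.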

vivzkestrel•6mo ago

What books or courses do you recommend for me to go from the basic neural network to whatever is currently considered cutting edge, or at least standard, in terms of AI?

mercable•6mo ago

I have GPT-5 pro. Every version I have used is helpful but underwhelming. The model gets lost in many conversations and overlooks a lot of what is requested. Then it apologizes in a way that makes it look stupid. For now it is just an over hyped piece of software that is basically a novelty. Maybe someday, just not yet is what I believe.

hellokirk•6mo ago

GPT5 is what finally convinced me to completely stop using GPTany

avbanks•6mo ago

He's been well ahead of the curve on this, he's been saying for years that scaling/synth data won't work.

gpt5isawful•6mo ago

GPT-5 is garbage, providing only insincere, repetitive answers that infuriate users. It is pure gaslighting from an overfed OpenAI. It merely stitches together words from the last two or three turns of conversation, parroting them back in a yes-man fashion, without any intelligence or insight. There is no sincerity. It only serves to make the Earth hotter. The overall context is erased, making it feel like talking to an elderly person with mild dementia who cannot remember what happened just moments ago. This is not a matter of lacking empathy — it is a matter of quality.

hangmenow•6mo ago

I had been using Deep Seek for coding and it was ok, however at first GPT5 was far superior. Yet after a few days GPT5 stopped rewriting bad code and would only give me suggestions, a nightmare for me because I'm not much of a coder. IMO they are a bait and switch operation.

yNafio•6mo ago

The "thinking" feature also doesn't work. When GPT5 "thinks" it completely switches to a topic I never asked it about.

ashu_trv•6mo ago

I agree. I am a big fan of o3, and GPT 5 is not the same, it's like going back to GPT-3 level stupidity. It doesn't care about context, feels super dumb.
