Last I saw, he hasn't produced anything but general "pop" books on AI, and his main credential is being associated with MIT, which IMO carries zero weight in applied or, at this point, even theoretical AI, since that work is primarily coming out of corporate labs.
No new algorithms, frameworks, datasets, products, insights.
Why is this guy relevant enough to keep giving him attention? His entire oeuvre is just anti-whatever-is-getting-attention in the "AI" landscape.
I don’t see him commenting on any other papers and he has no lab or anything
Someone make it make sense, or is it as simple as “yeah thing makes me feel bad, me repost, me repeat!”
> diminishing returns of scaling laws
This was so obvious it didn't need mentioning. And what Gary really missed is that all you need are more axes to scale over and you can still make significant improvements. Think of where we are now vs 2023.
> lack of true reasoning (out of distribution generalizability) in LLM-type AI
To my understanding, this is one that he has gotten wrong. LLMs do have internal representations, exactly the kind that he predicted they didn't have.
> These are issues that the industry initially denied, only to (years) later acknowledge them
The industry denies all their limitations for hype. The academic literature has all of them listed plain as day. Gary isn't wrong because he's contradicted the hype of the tech labs, he's wrong because his short-term predictions were proven false in the literature he used to publish in. This was all in his efforts to peddle neurosymbolic architectures which were quickly replaced by tool use.
I think the hype is coming from people who have no idea what is going on and just feeding on each other
Much like blockchain, the metaverse, or whatever else was dominated by know-nothings who spoke confidently to people even dumber than them.
No professionals that have any experience or research credentials have made any crazy claims
I would argue that the claim of "LLMs will never be able to do this" is crazy without solid mathematical proof, and is risky even with significant empirical evidence. Unfortunately, several professionals have resorted to this language.
At some point I entertained a few discussions where Gary Marcus was participating but from what I remember, it would never go anywhere other than a focus on playing around with definitions. Even if he's not wrong about some of his claims, I think there are better people worth engaging with. The amount of insight to be gained from listening to Gary Marcus is akin to that of a small puddle.
Regardless of personal opinions about his style, Marcus has been proven correct on several fronts, including the diminishing returns of scaling laws and the lack of true reasoning (out of distribution generalizability) in LLM-type AI.
These are issues that the industry initially denied, only to (years) later acknowledge them as their "own recent discoveries" as soon as they had something new to sell (chain-of-thought approach, RL-based LLM, tbc.).
Hopefully the innovation slowing means that all the products I use will move past trying to duct-tape AI on and start working on actual features/bugs again.
https://kairos.fm/muckraikers/
I personally struggle with Gary Marcus critiques because whenever they are about "making AI work" they go into neurosymbolic "AI", which I have technical disagreements with, and I have _other_ arguments for the points he sometimes raises which I think are more rigorous, so it's difficult to be roughly in the same camp - but overall I'm happy someone with reach is calling BS as well.
I think most hit pieces like this miss what is actually important about the 5 launch - it's the first product launch in the space. We are moving on from model improvements to a concept of what a full product might look like. The thing that matters about 5 is not thinking strength, although it is moderately better than o3 in my tests, which is roughly what the benchmarks say.
What’s important is that it’s faster, that it’s integrated, that it’s set up to provide incremental improvements (to say multimodal interaction, image generation and so on) without needing the branding of a new model, and I think the very largest improvement is its ability to retain context and goals over a very long set of tools uses.
Willison mentioned it's his only daily driver now (for a largely coding-based usage setup), and I would say it's significantly better at getting a larger / longer / more context-heavy coding task done than the prior best (Claude) or the prior best architects (o3-pro or Gemini, depending). It's also much faster than o3-pro for coding.
Anyway, saying “Reddit users who have formed parasocial relationships with 4o didn’t like this launch -> oAI is doomed” is weak analysis, and pointless.
It's too slanted to be journalism, but not original enough to be analysis.
I tend to think HN's moderation is OK, but I think these sorts of low-curiosity articles need to be off the front page.
This is well earned by the likes of OpenAI that is trying to convince everyone they need trillions of dollars to build fabs to build super genius AIs. These super genius AIs will replace everyone (except billionaires) and act as magic money printers (for billionaires).
Meanwhile their super genius precursor AIs make up shit and can't count letters in words while being laughably sycophantic.
There's no need to defend poor innocent megacorps trying to usher in a techno-feudal dystopia.
That doesn’t mean any article mocking it or trashing it is well written or insightful.
This really hasn't been a thing since reasoning models showed up. Any recent example of such seems to come from non-reasoning variants.
>laughably sycophantic
Mainstream models are moving away from this, afaik. Part of the recent drama is that GPT-5 wasn't sycophantic enough for some users.
I think it's broader than AI; it applies to all tech. It all started in 2016, after it was deemed that tech, especially social media, had helped sway the election. Since then a lot of things became political that weren't in the past, and tech got swept up with that. And unfortunately AI has its haters, despite the fact that it's objectively the fastest-growing, most exciting technology of the last 50 years. Instead they're dissecting some CEO's shitposts.
Fast forward to today, pretty much everything is political. Take this banger from NY Times:
> Mr. Kennedy has singled out Froot Loops as an example of a product with too many ingredients. In an interview with MSNBC on Nov. 6, he questioned the overall ingredient count: “Why do we have Froot Loops in this country that have 18 or 19 ingredients and you go to Canada and it has two or three?” Mr. Kennedy asked.
> He was wrong on the ingredient count, they are roughly the same. But the Canadian version does have natural colorings made from blueberries and carrots while the U.S. product contains red dye 40, yellow 5 and blue 1 as well as Butylated hydroxytoluene, or BHT, a lab-made chemical that is used “for freshness,” according to the ingredient label.
No self-awareness.
> It all started in 2016 after it was deemed that tech, especially social media, had helped sway the election. Since then a lot of things became political that weren't in the past and tech got swept up w/ that
The 2016 election was a symptom of broader societal changes, and yeah, I'd also say it exhibited a new level of psychological manipulation in election campaigns. But the election being "deemed" influenced by technology and media (sure it was, why not?) as a cause for political division seems very far-fetched. Regarding the US healthcare politics farce, I don't understand your point or how it relates to the beginning of your comment.
Political division and propaganda inciting outrage are flourishing, yes. Not because of what you describe about the 2016 election though, IMO. What's the connection? So you mean if nobody had assessed social media campaigns after that election, politics would be more honest and fact-based? Why? And what did you want to say with the NY Times article about the US secretary of health and American junk food / cereals?
My concern is that there is no antidote for this on the horizon; it's just more and more stupid getting traction all the time. You have to put a lot of faith in common sense to stay optimistic.
The only antidotes I can imagine are honesty and real community: make whatever you want from this, it should be obvious by now that global cut-throat capitalism does not lead to democracy, or to efficient resource usage (externalization...), or to equality.
I mean, what's political about having former NSA heads on your "exciting technology" board?
Or what's political about lining up together as the front row at the despot in chief's inauguration?
And what's so political about lobbying to and sequestering large amounts of public funds for your personal manufactured consent machine?
These things are literally software that runs on technology developed in the last 50 years, yet by your clearly apolitical, unbiased, absolutely thoughtful, well-reasoned, fully researched insight they are in fact "the most exciting technology in the last 50 years".
what's objective about this opinion? how does one objectively measure how exciting a technology is?
A lot of these issues are not general to infocoms either, but specific to platforms, to USian (/Russian/Chinese) companies, to companies grown too big and a failure of antitrust...
If it hurts your feelings to have your terrible opinions "dunked" on then take the time to form better opinions.
Thoughtful, nuanced takes simply cannot generate the same audience and velocity, and by the time you write something of actual substance the moment is gone and the hyper-narrative has moved on.
> That’s exactly what it means to hit a wall, and exactly the particular set of obstacles I described in my most notorious (and prescient) paper, in 2022. Real progress on some dimensions, but stuck in place on others.
The author includes their personal experience — recommend reading to the end.
His takes often remind me of Jim Cramer’s stock analysis — to the point I’d be willing to bet on the side of a “reverse Gary Marcus”.
It's a blog post.
I prefer the fourth-estate ("fourth power") definition: if it has the power of broadcast (yes, here), then it's journalism.
https://news.ycombinator.com/item?id=44278811
I think you're absolutely right about this being a wider problem though.
It’s a classic HN comment asking for nuance and discrediting Gary. It’s about how Gary is always following classic mob mentality, so of course it’s not slanted at all and commenting about the accuracies of the post.
So ironically you’re saying Gary’s shit is supposed to be that way and you’re criticizing the HN comment for that, but now I’m criticizing you for criticizing the comment because HN comments ARE supposed to be the opposite of Gary’s bullshit opinions.
I expect to read better stuff on HN, not this type of biased social media violence and character takedowns.
This low-effort hot take is every bit as "valid" as all the nepobaby vibecode hype garbage: we decided to do the AI thing. This is the AI thing.
What's your point? This one is on the critical side of the argument that was stupid in toto to begin with?
[1] Due credit to Yann for his 'LLMs will stop scaling, energy based methods are the way forward' obsession.
I get what you mean though, even if he's right it must've been pretty annoying hearing the same thing all the time.
I bet it could generate assembly instructions and list each part and help diagnose or tune. And that's remarkable and demonstrates enough fake understanding to be useful.
If AI gains now come from spending OOMs more on inference than compute, it means we're in slow takeoff-istan and they're going to need to keep the hype train going for a long time to keep investment up.
Can they get there? Is it a matter of waiting out the clock on Moore's law? Or are they on the sigmoid curve with inference-based gains as well?
That's the question of our time and it's completely ignored.
I'm eagerly awaiting more insightful analyses about how AI will create a perpetuum mobile of wealth for all of humanity, independent from natural resource consumption and human greed. It seems so logical!
Even some lesswrong discussions, however confused, are more insightful than most blog posts by AI company customers.
Fascinating technology though, sure.
They do, however, have a major lead in terms of consumer adoption. To normal people who use LLMs, ChatGPT is _the_ model.
This gives them a lot of opportunities. I don't know what's taking them so long to launch their own _real_ app store, but that's the game where they are ahead of everyone else because of the consumer adoption.
I know AI hype is truly insane, but surely nobody actually believed the singularity was upon us?
It seems to lose the thread of the conversation quite abruptly, not really knowing how to answer the next comment in a thread of comments.
It's like there is some context cleanup process going on and it's not summarizing the highlights of the conversation to that point.
If that is so, then it seems to also have a very small context, because it seems to happen regularly.
Asking it to 'Please review the recent conversation before continuing' prompt seems to help it a bit.
It feels physically jarring when it loses the plot with a conversation, like talking to someone who wasn't listening.
I'm sure it's a tuning thing; I hope they fix it soon.
I swear I had an understanding of how to get deep analytical thinking out of o3. I am absolutely struggling to get the same results with GPT-5. The new model feels different and frustrating to use.
He sent me all these articles geared toward that end as well. https://garymarcus.substack.com/p/seven-replies-to-the-viral... https://substack.com/@cattelainf/note/c-135021342 https://arxiv.org/abs/2002.06177 https://garymarcus.substack.com/p/the-ai-2027-scenario-how-r... https://garymarcus.substack.com/p/25-ai-predictions-for-2025...
Stochastic parrots will never be better than humans
This is really the only part of the article I think was worth writing.
- People should expect an incremental advance
- Providers should not promise miracles
Managing expectations is important. The incremental advances are still advances, though, even if I don't think "AGI" is just further down on the GPT trajectory.
My personal feeling: GPT-5 Thinking is much faster but doesn't produce the same quality of results as o3, which was capable of scanning through a code-base dump with file names and making correct calls.
Don't feel any changes with https://chatgpt.com/codex/
My best experience was to use o3 for task analysis, copy-paste the result into https://chatgpt.com/codex/, work outside, and vibe-code from mobile.
GPT-5 was an incremental improvement. That's fine. It was hyped hard, but what did you expect? It's part of the game.
It makes me crazy that this kind of institutionalized lying is so normal in the Valley that we get comments like yours shaming people for not understanding that lies are the default baseline. Can't we expect better? This culture is what gives us shit like Theranos, where we all pretend to be shocked even though any outside analysis could see it was an inevitable outcome.
Please check out claims made by supplements, which are unregulated by the FDA. You’ll find institutionalized lying there, as well.
Any claim that can be made without being held up to false advertising will be made.
That lying is common, does not mean one cannot criticize an entity for lying.
It should get you sent to jail. I've had enough of empty promises. How much capital is misallocated because it's chasing this bullshit?
there's no second internet of high-quality content to plagiarise
and the valuable information on the existing one is starting to be locked down pretty hard
Meanwhile, the fraction of the real world that consists of text is microscopic.
The whole thing feels less like “Hey, this is why I think the model is bad” and more like the kind of sensationalist headline you’d read in a really trashy tabloid, something like: “ChatGPT 5 is Hot Garbage, Totally Fails, Sam Altman Crushed Beneath His Own Failure.”
Also, I have no idea why people give so much attention to what this guy has to say.
* The quality of responses from GPT-5 compared to O3 is lacking. It does very few rounds of thinking and doesn't use web search as O3 used to. I've tried selecting "thinking", instructing explicitly, nothing helps. For now, I have to use Gemini to get similar quality of outputs.
* Somehow, custom GPTs [1] are now broken as well. My custom grammar-checking GPT is ignoring all instructions, regardless of the selected model.
* Deep research (I'm well within the limit still) is broken. Selecting it as an option doesn't help, the model just keeps responding as usual, even if it's explicitly instructed to use deep research.
Their business model is not to have a $20, no-ads plan in the future.
To be fair, Sam Altman did set (and fanned the flames of) those expectations.
It seems Sam Altman's Death Star had a critical design flaw after all, and Gary Marcus is taking a well-earned victory lap around the wreckage. This piece masterfully skewers the colossal hype balloon surrounding GPT-5, reframing its underwhelming debut not as a simple fumble, but as a predictable, principled failure of the entire "scaling is all you need" philosophy. By weaving together viral dunks on bike-drawing AIs, damning new research on generalization failures, and the schadenfreude of "Gary Marcus Day," the article makes a compelling case that the industry's half-a-trillion-dollar bet on bigger models has hit a gilded, hallucinatory wall. Beyond the delicious takedown of one company's hubris, the post serves as a crucial call to action, urging the field to stop chasing the mirage of AGI through brute force and instead invest in the harder, less glamorous work of building systems that can actually reason, understand, and generalize—perhaps finally giving neurosymbolic AI the chance Altman's cocky tweet so perfectly, and accidentally, foreshadowed for the Rebel Alliance.
My take on GPT-5? Latency is a huge part of the LLM experience. Smart model routing can be a big leap forward in reducing wait times and improving usability. For example, I love Gemini 2.5 Pro, but it’s painfully slow (sorry, GDM!). I also love the snappy response-time of 4o. The most ideal? Combine them in a single prompt with great model routing. Is GPT-5’s router up to the task? We soon shall see.
Presuming the last two are from 5, they are to my eyes next generation in terms of communication — that's a spicy take on neurosymbolic AI, not a rehashed "safe" take. Also, the last paragraph is almost completely to the point, no? Have you spent much time waiting for o3-pro to get back to you recently, and wondered if you should re-run something faster? I have. A lot. I'd like the ability to put my thumb on the scale of the router, but I'd dearly love a per-token / per-100-token router that can be trained and that treats low latency, without major intelligence hits, as a goal.
Btw I didn't agree with Gemini at all :) I just thought it gave a pretty good summary of Gary Marcus's points.
Just today I was playing around with modding Cyberpunk 2077 and was looking for a way to programmatically spawn NPCs in redscript. It was hard to figure out, but I managed. ChatGPT 5 just hallucinated some APIs even after doing "research" and repeatedly being called out.
After 30 minutes of ChatGPT wasting my time I accepted that I'm on my own. It could've been 1 minute.
I mean, it's all probability, right? There must be a way to give it some score.
I think the closest you can get without more research is another model checking the answer and looking for BS. This will cripple speed but if it can be more agentic and async it may not matter.
I think people need to choose between chat interface and better answers.
In my case it was consuming online sources, then repeating "information" not actually contained therein. This, at least, is absolutely preventable even without any metacognition to speak of.
> More honest responses
> Alongside improved factuality, GPT‑5 (with thinking) more honestly communicates its actions and capabilities to the user—especially for tasks which are impossible, underspecified, or missing key tools. In order to achieve a high reward during training, reasoning models may learn to lie about successfully completing a task or be overly confident about an uncertain answer. For example, to test this, we removed all the images from the prompts of the multimodal benchmark CharXiv, and found that OpenAI o3 still gave confident answers about non-existent images 86.7% of the time, compared to just 9% for GPT‑5.
> When reasoning, GPT‑5 more accurately recognizes when tasks can’t be completed and communicates its limits clearly. We evaluated deception rates on settings involving impossible coding tasks and missing multimodal assets, and found that GPT‑5 (with thinking) is less deceptive than o3 across the board. On a large set of conversations representative of real production ChatGPT traffic, we’ve reduced rates of deception from 4.8% for o3 to 2.1% of GPT‑5 reasoning responses. While this represents a meaningful improvement for users, more work remains to be done, and we’re continuing research into improving the factuality and honesty of our models. Further details can be found in the system card.
Sure, typically we don’t invent totally made up names, but we certainly do make mistakes. Our memory can be quite hazy and unreliable as well.
Obviously human brains are still much more sophisticated than the artificial neural networks that we can make with current technology. But I believe there’s a lot more in common than some people would like to admit.
Humans are better at noticing when their recollections are incorrect. But LLMs are quickly improving.
But it’s a memory based on what it’s trained on. Of course it doesn’t have a favorite ice cream. It’s not trained to have one. But that doesn’t mean it has no memory.
My argument is that humans have fallible memories too. Sometimes you say something wrong or that you don’t really mean. Then you might or might not notice you made a mistake.
The part LLMs don’t do great at is noticing the mistake. They have no filter and say whatever they’re thinking. They don’t run through thoughts in their head first and see if they make any sense.
Of course, that’s part of what companies are trying to fix with reasoning models. To give them the ability to think before they speak.
You're not alone in thinking this. And I'm sure this has been considered within the frontier AI labs and surely has been tried. The fact that it's so uncommon must mean something about what these models are capable of, right?
So it has been for 10+ years, so it will be at least 5 more.
Spatial reasoning and a world model are one aspect. Posting bicycle-part memes does not a bad model make. The reality is it's cheaper than Sonnet and maybe around as good as Opus at a decent number of tasks.
> And, crucially, the failure to generalize adequately outside distribution tells us why all the dozens of shots on goal at building “GPT-5 level models” keep missing their target. It’s not an accident. That failing is principled.
This keeps happening recently. So many people want to take a biblically black-and-white position on whether LLMs can get to human-level intelligence. See the recent interview with Yann LeCun (Meta Chief AI Scientist): https://www.youtube.com/watch?v=4__gg83s_Do
Nobody has any fucking idea. It might be a hybrid or a different architecture than current transformers, but with the rate of progress just within this field, there is absolutely no way you can make a prediction that scaling laws won't just let LLMs outpace the negative hot takes.
GPT-5 is a welcome addition to the lineup, it won't completely replace other models but it will play a big role in my work moving forward.
Okay, this one is a really bad attempted point.
Sure, self driving cars took longer than expected, have been harder to get right than expected. But at this point, Waymo is steadily ramping up how quickly they open up in new cities, and in existing cities like SF they at least have a substantial market share in the ride-sharing/taxi business.
Basically, the tech is still relatively early in its adoption curve, but it's far enough in now to obviously not be "bullshit", at the very least.
When I ran mine through GPT-5 there was a noticeable degradation in the answers.
Even if you want to make fun of the (alleged) snake-oil salesmen of AGI, how are you not going after, like, Zuckerberg/Meta? At least Altman is using other people's money.
Is any other tech scrutinised like this? The next version of Postgres ain't giving me picosecond reads, so I'll trash it. Maybe OK if Postgres were claiming it's faster than the speed of light, perhaps.
But I'm meh. A bunch of people seem to be hot-taking AI and loving this "fail" because, as you can see from this submission, it gets you a lot of traffic to whatever you are trying to sell (most often one's own career). There also seems to be a community expectation of subsidised services. Move on to Claude because I can get those good tokens cheaper. It's like signing up for every free trial and cancelling, then bragging about how Netflix dares charge more than $1 a month for their service. I mean that's fine, play the game, but at least be honest about it.
I think AI will thrive, but AI itself is the complement being commoditized, which is bad for overcapitalized AI companies with no moat. This plus open models is great for the community. We need more power to the people these days. Hope it stays like this.
LLM + general tool use seems to be quite effective.
His (entirely not-unique) conclusion that the transformer architecture has plateaued is, for the moment, certainly true, but god damn it’s been a while since I’ve encountered an individual quite so lustfully engaged with his own farts.
The leap from 3.5 to 4 was amazing, but then everyone started catching up with OpenAI, and each new model brought diminishing returns. Expecting out of nowhere that OAI would jump from o1 -> o3-sized improvements to AGI doesn't make sense, no matter how much Sam Altman hypes it.
From the article: "that many online dubbed it “Gary Marcus Day” for proving your consistent criticism", "Even my anti-fan club (“Gary haters” in modern parlance)", "Tweets like “The saddest thing in my day is that @garymarcus is right”", and his bio - "known as a leading voice in AI".
Looping over his articles, I don't see anything interesting.
His entire career is built on pooh-poohing other people's work. He has contributed nothing to the AI/ML field yet is somehow referred to as an "expert".
There are so many better informed critics of the current AI hype out there.
This version will not get you far: you will just train a model that solves the last math problem you gave it, and maybe some others, but it will probably forget the first ones.
There are other similar procedures that train better, but they've been tried and are currently worse than classical SGD with large batches.
mikert89•6mo ago
I don't see anyone talking about GPT-5 Pro, which I personally tested against:
- Grok 4 Heavy
- Opus 4.1
It was far better than both of those, and is completely state of the art.
The real story is that running these models at true max performance could likely cost thousands per month per user. And so we are being constrained. OpenAI isn't going for that market segment; they are going for growth to take on Google.
This article doesn't have one reference to the Pro model. Completely invalidates this guy's opinion.
patrickhogan1•6mo ago
So I think it’s also a way to push reasoning models to the masses. Which increases OpenAI’s cost.
But due to the routing layer it's definitely cost-cutting for power users (most of HN)… except power users can learn to force it to use the reasoning model.
atonse•6mo ago
I remember reading that 4o was the best general purpose one, and that o3 was only good for deeper stuff like deep research.
The crappy naming never helped.
adeptima•6mo ago
any feedback is greatly appreciated!!! especially comparing with o3
energy123•6mo ago
Is GPT-5 better than GPT-5 Pro for any tasks?
A_D_E_P_T•6mo ago
I'll agree that it's superhuman and state-of-the-art at certain tasks: Formal logic, data analysis, and basically short analytical tasks in general. It's better than any version of Grok or Gemini.
When it comes to writing prose, and generally functioning as a writing bot, it's a poor model, obviously and transparently worse than Kimi K2 and Deepseek R1. (It never ceases to amaze me that the best English prose stylists are the Chinese models. It's not just that they don't write in GPT's famous "AI style," it's to the point where Kimi is actually on par with most published poets.)
mikert89•6mo ago
I had a bug that was a complex interaction between backend and frontend over websockets. The day before, I was banging my head against the wall with o3-pro and Grok Heavy; GPT-5 solved it on the first try.
I think it's also true that most people aren't pushing the limits of the models, and don't even really know how to correctly push the limits. Which is also why OpenAI is not focused on the best models.
mikert89•6mo ago
Another thing I'll add, though, is that o3-pro is better through the API than the chat website. They clearly constrain it unless you're paying the absurd API cost.
mikert89•6mo ago
I actually think cursor alone is not that good
energy123•6mo ago
If I send about 60k tokens, the model can't see the question (at the bottom of the text). I need to reduce it to 50k.
If I send two prompts with 40k tokens, the model can't see the beginning of the first prompt.
This seems quite unethical given they advertise 128k context, and I doubt it's an accident (since it runs in the direction of cost savings).
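If anyone wants to run this kind of probe programmatically rather than through the web UI, a rough sketch is below. It just buries a question under a pile of filler and checks whether the model still answers it; the "gpt-5" model id, the repeat count, and the filler-to-token estimate are my assumptions, so adjust them for your own test.

    // Rough sketch: hide a question at the very bottom of ~60k tokens of filler
    // and see whether the model actually sees it. Model id and token estimate
    // are assumptions, not measurements.
    package main

    import (
        "bytes"
        "encoding/json"
        "fmt"
        "net/http"
        "os"
        "strings"
    )

    func main() {
        // ~8000 repeats of a short phrase is very roughly 55-60k tokens; tune as needed.
        filler := strings.Repeat("lorem ipsum dolor sit amet. ", 8000)
        prompt := filler + "\n\nQuestion (answer with one word): what is the capital of France?"

        body, _ := json.Marshal(map[string]any{
            "model": "gpt-5", // assumed model identifier
            "messages": []map[string]string{
                {"role": "user", "content": prompt},
            },
        })

        req, _ := http.NewRequest("POST", "https://api.openai.com/v1/chat/completions", bytes.NewReader(body))
        req.Header.Set("Authorization", "Bearer "+os.Getenv("OPENAI_API_KEY"))
        req.Header.Set("Content-Type", "application/json")

        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()

        var out struct {
            Choices []struct {
                Message struct {
                    Content string `json:"content"`
                } `json:"message"`
            } `json:"choices"`
        }
        if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
            panic(err)
        }
        if len(out.Choices) > 0 {
            // "Paris" means the question at the bottom was actually seen.
            fmt.Println(out.Choices[0].Message.Content)
        } else {
            fmt.Println("no choices returned, status:", resp.Status)
        }
    }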
happycube•6mo ago
I've also heard hearsay that R1 is quite clever in Chinese, too.
vintagedave•6mo ago
Could you provide some examples, please? I find this really exciting. I’ve never yet encountered an AI with good literary writing style.
And poetry is really hard, even for humans.
vintagedave•6mo ago
Plus the literary quality of their creative output is dreary, dull, and dire. That's why I was so curious for the OP to share examples.
awesome_dude•6mo ago
The offerings are evolving and upgrading at quite a rapid pace, so locking into one company's offering, or another's, is really wasted money. (Why pay $200/year upfront for something that looks like it will be outdated within the next month or quarter?)
> The real story is running these models at true performance max likely could go into the thousands per month per user.
A loss-leader model like that failed for Uber, because there really weren't any constraints on competitors doing the same, including underpricing to capture market share - meaning it's a race to the bottom plus a test of whose pockets are the deepest.
awesome_dude•6mo ago
I personally haven't tried GPT-5 yet, but I am getting all I need from Claude and Gemini.
Once I start experimenting with GPT-5, I will still use Claude and Gemini when I run out of free uses.
diego_sandoval•6mo ago
These models make me much more productive anyway. That is worth far more than $20.
awesome_dude•6mo ago
Having said that, it was a good circuit breaker: Claude was stuck on three wrong solutions to an issue, and having to re-explain it meant that I realised what the bug was without further input from Claude.
edit: FTR where Claude was stuck (and where it helped me)
First, I have a TUI application, written in Go, that is using https://github.com/rivo/tview - I chose tview basically at random, having no background in any of the TUI libraries available, but tview looked like it was easy to understand.
I had written some of the code, but it wasn't doing what I wanted of it. I had gotten Gemini to help me, but it still wasn't doing what I wanted.
Claude first suggested that the application needed to be refactored to make it into an Event Driven MVC TUI app, which I absolutely agreed with.
Claude counselled me on my understanding of the Run() command, and how it was a blocking call (which I hadn't realised). The refactor/re-architecture fixed the way things were being run a great deal.
However I still had a bug that I could not fix, and Claude was equally stumped (as was Gemini)
When I clicked on any of the components in a flex row, the first component behaved as though it was the one that was clicked. When I clicked on any of the components in another flex row the last one behaved as though it was the one that had been clicked.
Claude repeatedly told me that the problem was that the closure that used the index of components (inside a loop) was the problem.
This was outright wrong because
1. That bug has been fixed in Go.
2. The code I was sharing with Claude had the "fix" of local := index - meaning that the local version should have been correct
3. I repeatedly showed Claude logs (that it suggested I create) that showed that ALL of the components were in fact receiving the click event.
The second solution that Claude was fixated on was that the components overlapped one another inside the flex box.
This was partially incorrect.
I told Claude that I could visually see that the components weren't overlapping.
I also told Claude that there were borders around each component that clearly weren't overlapping.
I gave Claude debug logs that showed that the mouse click co-ordinates were inside a single component.
The third issue that Claude claimed was the problem was a click propagation bug in the library.
Claude repeated these claims several times, despite me showing that it wasn't either of the first two things, and I could not find a bug/issue for the third.
The circuit breaking made me stop and think about things, and I realised that all I really had to do was say inside the box: "If a click event has been received AND (and this was the fix) you (the component) now have focus, then behave."
What I suspect is happening is that the flex box container receives the click, and then tells all of its children that a click happened.
What disappoints me is, if what I suspect is true, then I would have expected Claude to have known that either from the tview documentation OR from reading the tview code (Claude does seem well acquainted with the library)
If my suspicion is correct then the second and third issues Claude claimed were causing me the bug were partially true, that is the flex container is on top of the components, which is an overlap, just not the components themselves overlapping.
The second partial correctness is that the way that the click is being propagated to the components seems to be that the flex box container is telling all of its child components that a click occurred. This isn't really a bug, it's likely well documented (as if I would RTFM...) or clear from the code.
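For anyone curious, this is roughly what the focus-gated fix looks like in miniature. It's only a sketch: my real app wires the handlers differently, and using SetMouseCapture/HasFocus here is just one way of expressing the "only behave if you have focus" rule from tview's API.

    // Minimal sketch of the fix: every child in the flex row is told about the
    // click, so each one only acts if it currently has focus.
    package main

    import (
        "github.com/gdamore/tcell/v2"
        "github.com/rivo/tview"
    )

    func main() {
        app := tview.NewApplication()
        row := tview.NewFlex().SetDirection(tview.FlexColumn)

        for _, label := range []string{"one", "two", "three"} {
            label := label // local copy for the closure (the "fix" Claude kept fixating on)
            box := tview.NewTextView().SetText(label)
            box.SetBorder(true)
            box.SetMouseCapture(func(action tview.MouseAction, event *tcell.EventMouse) (tview.MouseAction, *tcell.EventMouse) {
                // All children receive the click; only the focused one behaves.
                if action == tview.MouseLeftClick && box.HasFocus() {
                    box.SetText(label + " was clicked")
                }
                return action, event
            })
            row.AddItem(box, 0, 1, false)
        }

        if err := app.SetRoot(row, true).EnableMouse(true).Run(); err != nil {
            panic(err)
        }
    }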
awesome_dude•6mo ago
I have been a SWE for about 15 years; the majority of my career has been spent dealing with people who have about the same understanding as the LLMs, and who scream abuse at me because their suggested solutions were invalid (in fact you can see someone heading down that path in another thread on this page).
The LLMs are less likely to run to HR when I tell them to eff off with their stupidity (tested - told Claude that it had already effing suggested the bad-index one and it was wrong because..., to which it politely apologised... and re-suggested another one of its three suggestions).
So, from that point of view, LLMs are superior to some of the "senior" developers I have had the misfortune of dealing with previously.
As for my patience - I don't think I am being patient so much as doggedly determined to finding what I know is a fixable bug, that is, I will just keep gnawing at what I think should be fixable until a solution comes along, or I find something else shiny to occupy my spare time with (this being two side projects - the code and the testing of how good LLMs really are)
awesome_dude•6mo ago
FTR, me being told what to do by the other individual hardly sounds like I'm the one mentoring... but failure to read and comprehend has already proved to be your style.
theshrike79•5mo ago
I tried to get a game prototype up and running with https://spacetimedb.com - GPT-5 was clearly working with outdated information and couldn't get anything done. It just reverted back to things that didn't exist (commands had changed, their arguments were different).
Same project, same GPT-5. This time it went in weird circles when I tried to use Deno as the backend. First it understood how to import Phaser to Deno, then it forgot. Then it remembered. Then it forgot. All within the same conversation with us trying to get SOMETHING on the browser =)
awesome_dude•6mo ago
The reason for not buying into a monthly subscription is also a reason that I shouldn't buy a year-long one, and it is not negated in any way.
heyoni•6mo ago
As for why pay for access when you can get on free tiers? Well, because the pay models are far far better than the free ones. That's it.
awesome_dude•6mo ago
Your idea wasn't valid, no need for all the nastiness
awesome_dude•6mo ago
I'm not enough of a Business Major to know how they could monetise things, but I am enough of a realist to think that they can't stay like this forever
disgruntledphd2•6mo ago
I would be incredibly surprised if OpenAI didn't go for advertising, given that they hired Fidji Simo (who PM'd feed ads at Facebook, and introduced ads to Instacart).
kldg•6mo ago
BUT, of course, if you can get more value out of it inside a month or year than you put in, it doesn't really matter. Differences between frontier models feels pretty low to me currently, which was not always the case; even so, they're certainly far ahead of free models for my uses.
I did pay for OpenAI both in a $20/mo subscription and later for API tokens (came out cheaper than the subscription). Since Gemini 2.5 came out, though, I just abuse Google's free AI Studio (at least free in the US; idk what kind of geo-gating they do). I'm not paying Google, but they are keeping me from paying their competitors. It will take a large and hard-gated leap forward (or Google deciding my $150-250 worth of unpaid use has gone on long enough) to get me to pay again.