Any standard of intelligence devised before LLMs is passed by LLMs relatively easily. They do things that, 10 years ago, people would have said were impossible for a computer to do.
I can run Claude Code on my laptop with an instruction like "fix the sound card on this laptop" and it will analyze my current settings, determine what might be wrong, devise tests for me to gather information it can't gather itself, run commands to probe the hardware's capabilities, offer a menu of solutions, give the commands to implement the chosen one, and finally verify that the solution works. Can you do that?
1. create a skeleton clone of frontend A, named frontend B, which is meant to be the frontend for backend project B, including the OAuth configuration
2. create the Kubernetes YAML and deployment.sh; frontend B should be available under b.mydomain.com. Run the deployment and make sure it worked by checking the page on b.mydomain.com
3. in frontend B, implement the UI for controller B1 from backend B, create the necessary routing to this component, and add a link to it in the main menu; there should be a page /b1 that lists the entries, /b1/xxx to display details, /b1/xxx/edit to edit an entry, and /b1/new to create one
4. in frontend B, implement the UI for controller B2 from backend B, create the necessary routing to this component, and add a link to it in the main menu, etc.
etc.
All of this is done in 10 minutes. Yeah, I could do all of this myself, but it would take longer.
My pocket calculator is not intelligent. Nor are LLMs.
Talk to any model about deep subjects and you'll understand what I am saying. After a while it will start going around in circles.
FFS, ask it to make an original joke, and be amused.
So, like your average human.
> FFS, ask it to make an original joke, and be amused.
Let's try this one on you: say an original joke.
Oh, right, you don't respond to strangers' prompts; thus you have agency, unlike an LLM.
But by some subset of definitions my calculator is intelligent. By some subset of definitions a mouse is intelligent. And, more interestingly, by some subset of definitions a mouse is far more intelligent than an LLM.
I don't think conflating intelligence with "what a computer can do" makes much sense, though. I can't calculate the Xth digit of pi in less than Z; I'm still intelligent (or I pretend to be).
But intelligence isn't really the question; it's a red herring. The real question is utility, and LLMs are useful.
Yes, I have worked in companies small enough that the developers just end up becoming the default IT help desk. I never had any formal training in IT, but most of that kind of IT work can be accomplished with decent enough Google skills. In a way, it worked the same as you and the LLM: I would go poking through settings, run tests to gather info, run commands, and overall just keep trying different solutions until either one worked or it became reasonable to give up. I'm sure many people here have had similar experiences doing the same thing in their own families. I'm not too impressed with an LLM doing that. In this example, it's functionally just improving people's Googling skills.
It works because people have answered similar questions a million times on the internet, and the LLMs are trained on those answers.
So it will work for a while. When the human-generated stuff stops appearing online, LLMs will quickly fall in usefulness.
But that is enough time for the people who think it's going to last forever to make huge investments, and for the AI companies to get away with the loot.
Actually, it is the best kind of scam...
But it's clear LLMs have some real value. Even if we always need a human in the loop to prevent hallucinations, they can still massively reduce the amount of human labour required for many tasks.
NFTs felt like a con, and in retrospect were a con. LLMs are clearly useful for many things.
When a con man sells you a cheap watch for a high price, what you get is still useful (a watch that tells the time), but you were still conned, because what you paid for is not what was advertised. You overpaid because you were tricked about what you were buying.
LLMs are useful for many things, but they're also not nearly as beneficial and powerful as they're being sold as. Sam Altman, while entirely ignoring the societal issues raised by the technology (such as the spread of misinformation and unhealthy dependencies), repeatedly claims it will cure all cancers and other diseases, eradicate poverty, solve the housing crisis, fix democracy… Those claims are bullshit, thus the con description applies.
* LLMs are a useful tool in a variety of circumstances.
* Sam Altman is personally incentivised to spout a great deal of hyped-up rubbish about both what LLMs are capable of and what they could become capable of.
Theranos on the other hand… That was a con and the founder was prosecuted.
And again, Sam Altman has a history of deceit.
https://www.technologyreview.com/2022/04/06/1048981/worldcoi...
https://www.buzzfeednews.com/article/richardnieva/worldcoin-...
The dependency here is that if Sam Altman is indeed a con man, it is reasonable to assume that he has in fact conned many people, who then report an over-inflated metric of the usefulness of the stuff they just bought (people don't like to believe they were conned; cognitive dissonance).
In other words, if Sam Altman is indeed a con man, it is very likely that most metrics of his product's usefulness are heavily biased.
There is a finite number of incremental improvements left between the performance of today's LLMs and the limits of human performance.
This alone should give you second thoughts on "AI doomerism".
That could also apply to LLMs, that there would be a hard wall that the current approach can’t breach.
The "walls" that stopped AI decades ago stand no more. NLP and CSR were thought to be the "final bosses" of AI by many - until they fell to LLMs. There's no replacement.
The closest thing to a "hard wall" LLMs have is probably online learning? And even that isn't really a hard wall. Because LLMs are good at in-context learning, which does many of the same things, and can do things like set up fine-tuning runs on themselves using CLI.
I didn’t say that is the case, I said it could be. Do you understand the difference?
And if it is the case, it doesn’t immediately follow that we would know right now what exactly the wall would be. Often you have to hit it first. There are quite a few possible candidates.
So far, there's a distinct lack of "wall" to be seen - and a lot of the proposed "fundamental" limitations of LLMs were discovered to be bogus with interpretability techniques, or surpassed with better scaffolding and better training.
I do think, though, that the lack of online learning is a bigger drawback than a lot of people believe, because it can often be hidden/obfuscated by training for the benchmarks.
This becomes very visible when you compare performance on more specialized tasks that LLMs were not specifically trained for, e.g. playing games like Pokemon or Factorio: general-purpose LLMs lag far behind humans on those.
But it's only a matter of time until we solve this IMO.
I want to see some numbers before I believe this. So far my feeling is that the best-case scenario is that it reduces the time needed for bureaucratic tasks, tasks that were not needed anyway and could have just been removed for an even greater boost in productivity. Maybe it's also automating the tasks of junior engineers, tasks which they need to perform in order to gain experience and develop their expertise. Though I'd need to see the numbers before I believe even that.
I have a suspicion that AI is not increasing productivity by any meaningful metric that couldn't be increased by much, much cheaper and easier means.
I don't think that's in any doubt. Even beyond programming, imo especially beyond programming, there are a great many things they're useful for. The question is: is that worth the enormous cost of running them?
NFTs were cheap to produce, and the cost didn't really scale with the "quality" of the NFT. With an LLM, if you want to produce something at the same scale as OpenAI or Anthropic, the amount of money you need just to run it is staggering.
This has always been the problem: LLMs (as we currently know them) being a "pretty useful tool" is frankly not good enough for the investment put into them.
At this point the "trick" is to scare white-collar knowledge workers into submission with low pay and a high workload, on the assumption that AI can do some of the work.
And do you know a better way to increase your output without giving OpenAI/Claude thousands of dollars? It's morale: improving morale would increase output in a much more holistic way. Scare the workers and you end up with a spaghetti of everyone merging their crappy LLM-enhanced code.
The main reason being: even the SOTA AIs of today are subhuman at highly agentic, long-horizon tasks - which are exactly the kind of tasks management has to handle. See: "AI plays Pokemon", AccountingBench, Vending-Bench and its "real life" test runs, etc.
The performance at long-horizon tasks keeps going up, mind - "you're just training them wrong" is in full force. But that doesn't change the fact that the systems available today aren't there yet. They don't have the executive function to be execs.
Opus 4.5 saved me about 10 hours of debugging stupid issues in an old build system recently - by slicing through the files like a grep ninja and eventually narrowing in on a thing I surely would have missed myself.
If I were to pay for the tokens I used at API pricing, I'd pay about $3 for that feat. Now, come up with your best estimate: what's the hourly wage of a developer capable of debugging an old build system?
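To put an assumed number on it: at, say, $100/hour, those 10 hours are $1,000 of developer time against roughly $3 of tokens, a gap of more than two orders of magnitude.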
For reference: by now, the lifetime compute use of frontier models is inference-dominated, at a ratio of 1:10 or more. And API prices at all the major providers sell the model at a healthy profit margin.
The "are LLMs intelligent?" discussion should be retired at this point too. It's academic; the answer doesn't matter for businesses and consumers, it matters for philosophers (which everyone is, at least a little bit). "Are LLMs useful for a great variety of tasks?" gets a resounding "yes".
You're lumping together two very different groups of people and pointing out that their beliefs are incompatible. Of course they are! The people who think there is a real threat are generally different people from the ones who want to push AI progress as fast as possible. Those who voice both views generally do so out of a need to compromise; few people actually hold both simultaneously.
I feel this framing says more about our attitudes to nuclear weapons than it does about chatbots. The "Peace Dividend" era, which is rapidly drawing to a close, has made people careless when they talk about the magnitude of the effects a nuclear war would have.
AI can be misused, but it can't be misused to the point where an enormously depopulated humanity is forced back into subsistence agriculture to survive, spending centuries if not millennia getting back to where we are now.
I think that's good, but the whole "AI is literally not doing anything" stance, that it's just some mass hallucination, has to die. Gamers argue it takes jobs away from artists; programmers seem to have to argue it doesn't actually do anything, for some reason. Isn't that telling?
And if AI assisted products are cheaper, and are actually good, then people will have to vote with their wallets. I think we’ve learned that people aren’t very good at doing that with causes they claim to care about once they have to actually part with their money.
Or would you prefer these things be outlawed to increase employment?
It's not really hard to see... spend your whole life defining yourself around what you do that others can't or won't, and then an algorithm comes along that can do a lot of the same. It directly threatens the ego, your understanding of self-image and self-worth, as well as (perceived) future financial prospects. Along with a heavy dose of "change scary, change bad".
Personally, I think the solution is to avoid building your self-image around material things, and to welcome and embrace new tools which always bring new opportunities, but I can see why the polar opposite is a natural reaction for many.
Unless AI is used for code (which it is, surely, almost everywhere), gamers don't give a damn. Also, Larian didn't use it for concept art; they used it to generate a first mood board to give to the concept artists as a guideline. And then there is Arc Raiders, which uses AI for all its VO, and that game is a massive hit.
This is just a breathless bubble; the wider gaming audience couldn't give two shits whether studios use AI or not.
I know LLMs won't vanish again magically, but I wish they would every time I have to deal with their output.
I'm seeing legitimate 10x gains because I'm not writing code anymore – I'm thinking about code and reading code. The AI facilitates both. For context: I'm maintaining a well-structured enterprise codebase (100k+ lines of Django). The reality is my input is still critically valuable. My insights guide the LLM; my code review is the guardrail. The AI doesn't replace the engineer, it amplifies the intent.
Using Claude Code Opus 4.5 right now and it's insane. I love it. It's like being a writer after Gutenberg invented the printing press rather than the monk copying books by hand before it.
It’s like arguing that the piano in the room is out of tune and not bothering to walk over to the piano and hit its keys.
Yes, the technology is interesting and useful. No, it is not a “10x” miracle.
They don't have time to check more stuff, as they are busy with their lives.
People who did check the stuff don't have the time in life to prove it to the ones who argue, in exactly whatever way the person arguing would find convincing.
Personally, like a year ago, I was the person who tried out some ChatGPT and didn't have time to dabble, because all the hype was off-putting, and of course I was finding more important and interesting things to do in my life besides chatting with some silly bot that I could trick easily with trick questions, or dismiss as not useful because it hallucinated something I wanted in a script.
I did take the plunge for a really deep dive into AI around April last year, and I saw it with my own eyes; only that convinced me. Using the API, I built my own agent loop: getting details from images and PDF files, iterating on code, turning unstructured "human" input into structured output I can handle in my programs.
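To make that concrete, here is a minimal sketch of the extraction side of such a loop, assuming the OpenAI Python SDK; the model name and the extracted fields are illustrative placeholders, not what I actually built:

    # Minimal sketch, assuming the OpenAI Python SDK; the model name and
    # the extracted fields are illustrative placeholders.
    import json
    from openai import OpenAI

    client = OpenAI()  # expects OPENAI_API_KEY in the environment

    def extract(raw_text: str) -> dict:
        """Turn unstructured 'human' input into structured output."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # any capable model works here
            messages=[
                {"role": "system",
                 "content": "Extract vendor, date and total from the text. "
                            "Reply with JSON only, keys: vendor, date, total."},
                {"role": "user", "content": raw_text},
            ],
            response_format={"type": "json_object"},  # forces parseable JSON
        )
        return json.loads(response.choices[0].message.content)

    print(extract("Paid 49.90 EUR to ACME GmbH on 2024-03-12 for hosting."))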
*Data classification is easy for an LLM. Data transformation is a bit harder but still great. Creating new data is hard: when answering questions where it has to generate stuff from thin air, it will hallucinate like a madman.*
With data classification like "is it a cat, answer with yes or no", it will be hard to get the latest models to hallucinate.
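As a sketch of why the constrained case is so robust (same assumptions as above: OpenAI Python SDK, illustrative model name): shrink the answer space to yes/no and there is almost no room left to hallucinate.

    # Sketch of a constrained yes/no classifier; the model name is illustrative.
    from openai import OpenAI

    client = OpenAI()  # expects OPENAI_API_KEY in the environment

    def is_cat(description: str) -> bool:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            temperature=0,  # as deterministic as the API allows
            max_tokens=3,   # room for "yes" or "no" and nothing else
            messages=[
                {"role": "system", "content": "Answer strictly with 'yes' or 'no'."},
                {"role": "user", "content": f"Is this a cat? {description}"},
            ],
        )
        return response.choices[0].message.content.strip().lower().startswith("yes")

    print(is_cat("A small furry animal that purrs and chases mice."))  # expect: True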
Do I now get the right to talk badly about all LLM coding, or is there another exercise I need to take?
The LLM marketing exploits fear and sympathy. It pressures people into urgency. Those things can be shown, and have been shown. Whether or not the actual LLM-based tools genuinely help you has nothing to do with that.
Of course it is a little more nuanced than this and I would agree that some of the marketing hype around AI is overblown, but I think it is inarguable that AI can provide concrete benefits for many people.
Yes, yes you can. As I’ve mentioned elsewhere on this thread:
> When a con man sells you a cheap watch for a high price, what you get is still useful (a watch that tells the time), but you were still conned, because what you paid for is not what was advertised. You overpaid because you were tricked about what you were buying.
LLMs are being sold as miracle technology that does way more than it actually can.
It may be extremely dangerous to release. True. Even search engines had the potential to be deemed too dangerous in the nuclear-Pandora's-box arguments of modern times. Then there are high-speed phishing opportunities, etc.
It may be an essential failure to miss the boat. True. If calculators had been upgraded, produced, and disseminated at modern Internet speeds, someone doing accounting by hand who refused to learn for a few years would have been fired.
Its communication builds an unhealthy, parasitic relationship. True. But the Internet, and the way content is critiqued on it, is a source of this too, even if not intentionally added.
I don't like many of the people involved, and I don't think they will be financially successful on merit alone, given that anyone can create an LLM. But LLM technology is being sold by an organic "con"; that is how all technology, calculators included, ends up spreading for individuals to evaluate and adopt. A technology everyone is primarily brutally honest about is a technology that has died, because no one bothers to check whether the brutal honesty has anything to do with their own possible uses.
They literally are. Sam Altman has literally said multiple times this tech will cure cancer.
How do I know? Because I am testing it, and I see a lot of problems that you are not mentioning.
I don’t know if you’ve been conned or you are doing the conning. It’s at least one of those.
That's not how book printing works, and I'd argue the monk could far more easily create new text and devise new interpretations. And they did, in the margins of books. It takes a long time to prepare one print, but hardly longer to print 100 copies, which is where the value of the printing press comes from. It's not the ease of changing or producing large amounts of text, it's the ease of reproducing it; and since copy/paste exists, it is a very poor analogy in my opinion.
I'd also argue the 10x is subject to observer bias, since the subject and the observer are the same person. My experience at this point is that boilerplate is fine with LLMs, and if that's all you do, good for you; otherwise it will hardly speed up anything, as the code is the easy part.
How do you avoid this turning into spaghetti? Do you understand/read all the output?
The line becomes a lot blurrier when you work on non-trivial issues.
A Django app is not particularly hard software; it's hardly software at all, but a conduit from database to screens and vice versa, which has been basic software since the days of terminals. I'm not judging your job; if you get paid well for doing that, all power to you.
What I'm raising, though, is that AI is not that useful for applications that aren't solving what has been solved 100 times before. Maybe it will be, some day, reasoning well enough to anticipate and solve problems that don't exist yet. But it will always be an inference over problems already solved.
Glad to hear you're enjoying it; personally, I enjoy solving problems more than the end result.
Also, almost all problems are composites where each part is either prior art or in itself somewhat trivial. If you can onboard the LLM onto the problem domain and help it decompose the problem, it can tackle a whole lot more than what it has seen during pre- and post-training.
Then why are half of the big tech companies using Microsoft Teams and sending mails with .docx files embedded in them?
Of course marketing matters.
And of course the hard facts also matter, and I don't think anybody is saying that AI agents are purely marketing hype. But regardless, it is still interesting to take a step back and observe what marketing pressures we are subject to.
That's exactly what a con is: selling you something as being more than it actually is. If you agree it's overhyped by its sellers, you agree it's a con.
> Current agents can do around 70% of coding stuff I do
LLMs are being sold as capable of significantly more than coding. Focusing on that singular aspect misses the point of the article.
Hm... is it wrong to think like this?
> This has, of course, not happened.
This is so incredibly shallow. I can't think of even a single doomer who ever claimed that AI would have destroyed us by now. P(doom) is about the likelihood of it destroying us "eventually". And I haven't seen anything in this post or in any recent developments to make me reduce my own p(doom), which is not close to zero.
Here are some representative values: https://pauseai.info/pdoom
And that's the anthropic fallacy. In the worlds where it has happened, the author is dead.
Though I personally hope that we'll have enough of a warning to convince people that there is a problem and give us a fighting chance. I grew up on Terminator and would be really disappointed if the AI kills me in an impersonal way.
What parallel world are they living in? Every single online platform has been flooded with AI-generated content and has had to enact countermeasures, or went the other way, embraced it, and replaced humans with AI. AI use in scams has also become commonplace.
Everything they warned about with the release of GPT‑2 did in fact happen.
But they don't. Instead, "AI safety" organizations all appear to exclusively warn of unstoppable, apocalyptic, and unprovable harms that seem tuned to instill fear.
So there will be laws, because not everyone can be trusted to host and use this "dangerous" new tech.
And then you have a few "trusted" big tech firms forming an AI oligopoly, with all of its drawbacks.
The problem is not that no one is trying to solve the issues you mentioned, but that they are really hard to solve. You would probably have to bring large class-action lawsuits, which is expensive and risky (if it fails, it becomes harder to sue again). Anthropic can make their own models safe [1][2], and PauseAI can organize some protests [3], but neither can easily stop Grok from producing endless CSAM.
[1] https://www.anthropic.com/news/protecting-well-being-of-user...
[2] https://www.anthropic.com/research/team/societal-impacts
[3] https://pauseai.info/risks
The catastrophic AI risk isn't "oh no, people can now generate pictures of women naked".
In a vacuum, I agree with you that there's probably no harm in AI-generated nudes of fictional women per se; it's the rampant use to sexually harass real women and children[0], while "causing poor air quality and decreasing life expectancy" in Tennessee[1], that bothers me.
[0]: https://arstechnica.com/tech-policy/2026/01/x-blames-users-f...
[1]: https://arstechnica.com/tech-policy/2025/04/elon-musks-xai-a...
The whole "AI polluting the neighborhoods" thing falls apart on closer examination. As it turns out, xAI put its cluster in an industrial area that already has a defunct coal power plant, an operational steel plant, and an operational 1 GW grid-scale natural gas power plant that powers the steel plant - the last one right across the road from xAI's cluster.
It's quite hard for me to imagine a world where it's the AI cluster that moves the needle on local pollution.