As far as I can tell he's the person that people reach for when they want to justify their beliefs. But surely being this wrong for this long should eventually lead to losing one's status as an expert.
(em-dash avoided to look less AI)
Of course, the main issue with the field is that the critics /should/ be correct. Like, LLMs shouldn't work and nobody knows why they work. But they do anyway.
So you end up with critics complaining it's "just a parrot" and then patting themselves on the back, as if inventing a parrot isn't supposed to be impressive somehow.
Not sure I’d agree that SA has been any more consistently right. You can easily find examples of overconfidence from him (though he rarely says anything specific enough to count as a prediction).
You can see this in this article too.
The real question you should be asking is if there is a practical limitation in LLMs and LRMs revealed by the Hanoi Towers problem or not, given that any SOTA model can write code to solve the problem and thereby solve it with tool use. Gary frames this as neurosymbolic, but I think it's a bit of a fudge.
Must be some sort of cognitive sunk cost fallacy, after dedicating your life to one sect, it must be emotionally hard to see the other "keep winning". Of course you'd root for them to fall.
An LLM with tool use can solve anything. It is interesting to try and measure its capabilities without tools.
I think the second is interesting for comparing models, but not interesting for determining the limits of what models can automate in practice.
It's the prospect of automating labour which makes AI exciting and revolutionary, not their ability when arbitrarily restricted.
What current models can automate is not what the paper was trying to answer.
It would draw on many previously written examples of algorithms to write the code for solving Hanoi. To solve a novel problem with tool use, one needs to work sequentially while staying on task, notice where you've gone wrong, and backtrack.
I don't want to overstate the case here. I'm sure there is work where there's enough intersection with previously existing stuff in the dataset, and few enough sequential steps required, that useful work can be done. But idk how much you've tried using this stuff as a labour-saving device; there's less low-hanging fruit than one might think, though more than zero.
There are more substantial savings to be had in research scenarios. The AI can read more and synthesize more, and faster, than I can on my own, and provide references for checking correctness.
I'm not confident enough to say that the approaches being taken now have a hard stopping point any time soon or are inherently bound to a certain complexity.
Human minds can only cope with a certain complexity too and need abstraction to chunk details into atomic units following simpler rules. Yet we've come a long way with our limited ability to cope with complexity.
The Illusion of Thinking: Strengths and limitations of reasoning models [pdf] - https://news.ycombinator.com/item?id=44203562 - June 2025 (269 comments)
Also this: A Knockout Blow for LLMs? - https://news.ycombinator.com/item?id=44215131 - June 2025 (48 comments)
Were there others?
It is scientific malpractice to write a post supposedly rebutting responses to a paper and not directly address the most salient one.
I don’t think I agree with you that GM isn’t addressing the points in the paper you link. But in any case, you’re not doing your argument any favors by throwing in wild accusations of malpractice.
But anybody relying on Gary's posts in order to be informed on this subject is being misled. This isn't an isolated incident either.
People need to be made aware that when you read him it is mere punditry, not substantive engagement with the literature.
(Or it should not be based on that claim as a central point, which Apple's paper was.)
My objection to the whole thing is the AI hype bros. The hype, which is really a funding-solicitation facade over everything rather than the truth, only has one outcome, and that is that it cannot be sustained. At that point all investor confidence disappears, the money is gone, and everyone loses access to the tools that they suddenly built all their dependencies on, because it's all based on proprietary service models.
Which is why I am not poking it with a 10 foot long shitty stick any time in the near future. The failure mode scares me, not the technology which arguably does have some use in non-idiot hands.
And while it will be sad to see model improvements slow down when the bubble bursts, there is a lot of untapped potential in the models we already have, especially as they become cheaper and easier to run.
I'm not sure the GPU market won't collapse with it either. Possibly taking out a chunk of TSMC in the process, which will then have knock on effects across the whole industry.
The GPU market will probably take a hit. But the flip side of that is that the market will be flooded with second-hand enterprise-grade GPUs. And if Nvidia needs sales from consumer GPUs again we might see more attractive prices and configurations there too. In the short term a market shock might be great for hobby-scale inference, and maybe even training (at the 7B scale). In the long term it will hurt, but if all else fails we still have AMD, who are somehow barely invested in this AI boom.
You're acting like this is a common occurrence lol
It’s patently obvious to me that LLMs can reason and solve novel problems not in their training data. You can test this out in so many ways, and there’s so many examples out there.
______________
Edit for responders, instead of replying to each:
We obviously have to define what we mean by "reasoning" and "solving novel problems". From my point of view, reasoning != general intelligence. I also consider reasoning to be a spectrum. Just because it cannot solve the hardest problem you can think of does not mean it cannot reason at all. Do note, I think LLMs are generally pretty bad at reasoning. But I disagree with the point that LLMs cannot reason at all or never solve any novel problems.
In terms of some backing points/examples:
1) Next token prediction can itself be argued to be a task that requires reasoning
2) You can construct a variety of language translation tasks, with completely made up languages, that LLMs can complete successfully. There's tons of research about in-context learning and zero-shot performance.
3) Tons of people have created all kinds of challenges/games/puzzles to prove that LLMs can't reason. One by one, they invariably get solved (eg. https://gist.github.com/VictorTaelin/8ec1d8a0a3c87af31c25224..., https://ahmorse.medium.com/llms-and-reasoning-part-i-the-mon...) -- sometimes even when the cutoff date for the LLM is before the puzzle was published.
4) Lots of examples of research about out-of-context reasoning (eg. https://arxiv.org/abs/2406.14546)
In terms of specific rebuttals to the post:
1) Even though they start to fail at some complexity threshold, it's incredibly impressive that LLMs can solve any of these difficult puzzles at all! GPT3.5 couldn't do that. We're making incremental progress in terms of reasoning. Bigger, smarter models get better at zero-shot tasks, and I think that correlates with reasoning.
2) Regarding point 4 ("Bigger models might do better"): I think this is very dismissive. The paper itself shows a huge variance in the performance of different models. For example, in figure 8, we see Claude 3.7 significantly outperforming DeepSeek and maintaining stable solutions for a much longer sequence length. Figure 5 also shows that better models and more tokens improve performance at "medium" difficulty problems. Just because it cannot solve the "hard" problems does not mean it cannot reason at all, nor does it necessarily mean it will never get there. Many people were saying we'd never be able to solve problems like the medium ones a few years ago, but now the goal posts have just shifted.
People make a common mistake by conflating "solving problems with novel surface features" with "reasoning outside training data." This is exactly the kind of binary thinking I mentioned earlier.
Can you reason? Yes? Then why haven't you cured cancer? Let's not have double standards.
Demis Hassabis On The Future of Work in the Age of AI (@ 2:30 mark)
Again, for all I know maybe he does believe that transformer-based LLMs as such can't be truly creative. Maybe it's true, whether he believes it or not. But that interview doesn't say it.
Would you care to tell us more ?
« It’s patently obvious » is not really an argument; I could just as well say that everyone knows LLMs can't reason or think (in the way we living beings do).
I just made up this scenario and these words, so I'm sure it wasn't in the training data.
Kwomps can zark but they can't plimf. Ghirns are a lot like Kwomps, but better zarkers. Plyzers have the skills the Ghirns lack.
Quoning, a type of plimfing, was developed in 3985. Zhuning was developed 100 years earlier. I have an erork that needs to be plimfed. Choose one group and one method to do it.
> Use Plyzers and do a Quoning procedure on your erork.
If that doesn't count as reasoning or generalization, I don't know what does.
* Goal: Pick (Group ∧ Method) such that Group can plimf ∧ Method is a type of plimfing
* Only one group (Plyzers) passes the "can plimf" test
* Only one method (Quoning) is definitely plimfing
Therefore, the only valid (Group ∧ Method) combo is: → (Plyzer ∧ Quoning)
Source: ChatGPT
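For what it's worth, the deduction itself reduces to checking two constraints. A minimal Python sketch (the dictionaries just hand-encode the made-up facts above; the field names are my own) brute-forces the same answer:

    # Hand-encoded facts from the made-up scenario above.
    groups = {
        "Kwomps":  {"can_plimf": False},  # "can zark but they can't plimf"
        "Ghirns":  {"can_plimf": False},  # "a lot like Kwomps, but better zarkers"
        "Plyzers": {"can_plimf": True},   # "have the skills the Ghirns lack"
    }
    methods = {
        "Quoning": {"is_plimfing": True},   # "a type of plimfing"
        "Zhuning": {"is_plimfing": False},  # only its date is given
    }

    # Goal: a (group, method) pair where the group can plimf and the method is plimfing.
    valid = [(g, m) for g, gp in groups.items() if gp["can_plimf"]
                    for m, mp in methods.items() if mp["is_plimfing"]]
    print(valid)  # [('Plyzers', 'Quoning')]

Whether mapping the prose onto those two constraints counts as reasoning is, of course, the whole argument.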
It certainly feels like more than fancy auto-complete. That is not to say I haven't run into issue but I'm still often shocked at how far it gets. And that's today. I have no idea what to expect in 6 months, 12, 2 years, 4, etc.
> I know that the internet is not full of training data for this API because it's a new API.
1) Are you sure? That's a bold guess. It was also a really stupid assumption made by the HumanEval benchmark authors: that if you "hand write" simple leetcode-style questions, then you can train on all of GitHub. Go ahead, go look at what kinds of questions are in that benchmark... 2) LLMs aren't discrete databases. They are curve-fitting functions. Compression. They work in very, very high dimensions. They can generate new data, but that is limited. People mostly aren't saying that LLMs can't create novel things, but that they can't reason in the way that humans can. Humans can't memorize half of what an LLM can yet are able to figure out lots of crazy shit.
It's not true. It's plainly not true. Go have any of these models, paid, or local try to build you novel solutions to hard, existing problems despite being, in some cases, trained on literally the entire compendium of open knowledge in not just one, but multiple adjacent fields. Not to mention the fact that being able to abstract general knowledge would mean it would be able to reason.
They. Cannot. Do it.
I have no idea what you people are talking about because you cannot be working on anything with real substance that hasn't been perfectly line fit to your abundantly worked on problems, but no, these models are obviously not reasoning.
I built a digital employee and gave it menial tasks comparable to those handled by current cloud solutions that also claim to provide you paid cloud AI employees, and these things are stupider than fresh college grads.
So can real parrots. Parrots are pretty smart creatures.
None of your current points actually support your position.
1. No, it doesn't. That's a ridiculous claim. Are you seriously suggesting that statistics require reasoning?
2. If you map that language to tokens, it's obvious the model will follow that mapping.
etc.
Here are papers showing that these models can't reason:
https://arxiv.org/abs/2311.00871
https://arxiv.org/abs/2309.13638
https://arxiv.org/abs/2311.09247
https://arxiv.org/abs/2305.18654
https://arxiv.org/abs/2309.01809
You're mistaking pattern matching and the modeling of relationships in latent space for genuine reasoning.
I don't know what you're working on, but while I'm not curing cancer, I am solving problems that aren't in the training data and can't be found on Google. Just a few days ago, Gemini 2.5 Pro literally told me it didn’t know what to do and asked me for help. The other models hallucinated incorrect answers. I solved the problem in 15 minutes.
If you're working on yet another CRUD app, and you've never implemented transformers yourself or understood how they work internally, then I understand why LLMs might seem like magic to you.
That is wishful thinking popularised by Ilya Sutskever and Greg Brockman of OpenAI to "explain" why LLMs are a different class of system than smaller language models or other predictive models.
I'm sorry to say that (John Mearsheimer voice) that's simply not a serious argument. Take a multivariate regression model that predicts blood pressure from demographic data (age, sex, weight, etc). You can train a pretty accurate model for that kind of task if you have enough data (a few thousand data points). Does that model need to "reason" about human behaviour in order to be good at predicting BP? Nope. All it needs is a lot of data. That's how statistics works. So why is it one thing for a predictive model of BP and another for a next-token prediction model? The only answer seems to be "because language is magickal and special", without any attempt to explain why, in terms of sequence prediction, language is special. Unless the er reasoning is that humans can produce language, humans can reason, LLMs can produce language, therefore LLMs can reason; which obviously doesn't follow.
But I have to guess here because neither Sutskever nor Brockman have ever tried to explain why next token prediction needs reasoning (or, more precisely, "understanding", the term they have used).
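To make the BP example concrete, here's a rough sketch (entirely synthetic data with invented coefficients, just to illustrate the point) of the kind of regression model being described; it becomes accurate through fitting alone:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    n = 5000
    age = rng.uniform(20, 80, n)
    weight = rng.uniform(50, 120, n)
    sex = rng.integers(0, 2, n)

    # Invented ground-truth relationship plus noise, just so there is data to fit.
    bp = 90 + 0.5 * age + 0.2 * weight + 3 * sex + rng.normal(0, 5, n)

    X = np.column_stack([age, weight, sex])
    model = LinearRegression().fit(X, bp)
    print(round(model.score(X, bp), 2))  # R^2 of roughly 0.8 on this synthetic data

Nothing in that pipeline "reasons" about people; it just needs enough data points, which is the point being made about next-token prediction.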
> That is wishful thinking popularised by Ilya Sutskever
Ilya and Hinton have claimed even crazier things: that to understand next token prediction you must understand the causal reality behind it.
This is objectively false. It's a result known in physics to be wrong for centuries. You can probably reason out a weaker case yourself: I'm sure you can make accurate predictions about some things without fully understanding them. But the stronger version is the entire difficulty of physics and causal modeling. Distinguishing a confounding variable is very, very hard. But you can still make accurate predictions without access to the underlying causal graph.
I recently watched a video of Sutskever speaking to some students, not sure where and I can't dig out the link now. To summarise he told them that the human brain is a biological computer. He repeated this a couple of times then said that this is why we can create a digital computer that can do everything a brain can.
This is the computational theory of mind, reduced to a pin-point with all context removed. Two seconds of thought suffice to show how that doesn't work: if a digital computer can do everything the brain can do, because the brain is a biological computer, then how come the brain can't do everything a digital computer can do? Is it possible that two machines can be both computers, and still not equivalent in every sense of the term? Nooooo!!! Biological computers!! AGI!!
Those guys really need to stop and think about what they're talking about before someone notices what they're saying and the entire field becomes a laughing stock.
Another two seconds of thought would suffice to answer that: because you can freely change neither the hardware nor the software of the brain, as you can with computers.
Obviously, Angry Birds on the phone can't do everything digital computers can do, but that doesn't mean a smartphone isn't a digital computer.
Humans have to work within whatever constraints accompany being physical things with physical bodies trying to invent software and hardware in the physical world.
For one, because the goal function for the latter is "predict output that makes sense to humans", in the fully broad, fully general sense of that statement.
It's not just one thing, like parse grocery lists, XOR write simple code, XOR write a story, XOR infer sentiment, XOR be a lossy cache for Wikipedia. It's all of them, separate or together, plus much more, plus correctly handling humor, sarcasm, surface-level errors (e.g. typos, naming), implied rules, shorthands, deep errors (think user being confused and using terminology wrong; LLMs can handle that fine), and an uncountable number of other things (because language is special, see below). It's quite obvious this is a different class of thing than a narrowly specialized model like a BP predictor.
And yes, language is special. Despite Chomsky's protestations to the contrary, it's not really formally structured; all the grammar and syntax and vocabulary is merely a classification of high-level patterns that tend to occur (though the invention of print and public education definitely strengthened them). Any experience with learning a language, or actually talking to other people, makes it obvious that grammar and vocabulary are neither necessary nor sufficient for communication. At the same time, though, once established, the particular choices become another dimension that packs meaning (as becomes apparent when e.g. pondering why some books or articles seem better than others).
Ultimately, language is not a set of easy patterns you can learn (or code symbolically!) - it's a dance people do when communicating, whose structure is fluid and bound by the reasoning capabilities of humans. Being able to reason this way is required to communicate with real humans in real, generic scenarios. Now, this isn't a proof LLMs can do it, but the degree to which they excel at this is at least a strong suggestion that they qualitatively could be.
Reasoning means you can take on a problem you've never seen before and think of innovative ways to solve it.
An LLM can only replicate what is in its data; it can in no way think or guess or estimate what will likely be the best solution. It can only output a solution based on a probability calculation over how frequently it has seen that solution linked to that problem.
Prompt: "Let's try a reasoning test. Estimate how many pianos there are at the bottom of the sea."
I tried this on three advanced AIs* and they all choked on it without further hints from me. Claude then said:
Roughly 3 million shipwrecks on ocean floors globally
Maybe 1 in 1000 ships historically carried a piano (passenger ships, luxury vessels)
So ~3,000 ships with pianos sunk
Average maybe 0.5 pianos per ship (not all passenger areas had them)
Estimate: ~1,500 pianos
*Claude Sonnet 4, Google Gemini 2.5 and GPT-4o: https://chatgpt.com/share/684e02de-03f0-800a-bfd6-cbf9341f71...
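Claude's answer is a plain Fermi chain; spelled out with its own assumed inputs:

    # Claude's assumptions, taken at face value from the answer above.
    shipwrecks = 3_000_000            # wrecks on ocean floors globally
    piano_carrying_rate = 1 / 1000    # ships that historically carried a piano
    pianos_per_such_ship = 0.5

    print(shipwrecks * piano_carrying_rate * pianos_per_such_ship)  # 1500.0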
[1] I would bet pretty heavily that they aren't, at least not on the sort of timescale that would be relevant here, but better safe than sorry.
[2] I picked something a bit more obscure than pianos.
Because I gave your exact prompt to o3, Gemini, and Claude and they all produced reasonable answers like above on the first shot, with no hints, multiple times.
Combining our estimates:
From shipwrecks: 12,500
From dumping: 1,000
From catastrophes: 500
Total estimated pianos at the bottom of the sea ≈ 14,000
Also I have to point out that 4o isn't a reasoning model and neither is Sonnet 4, unless thinking mode was enabled.
I gave your prompt to o3 pro, and this is what I got without any hints:
Historic shipwrecks (1850 → 1970)
• ~20 000 deep water wrecks recorded since the age of steam and steel
• 10 % were passenger or mail ships likely to carry a cabin class or saloon piano
• 1 piano per such vessel 20 000 × 10 % × 1 ≈ 2 000
Modern container losses (1970 → today)
• ~1 500 shipping containers lost at sea each year
• 1 in 2 000 containers carries a piano or electric piano
• Each piano container holds ≈ 5 units
• 50 year window 1 500 × 50 / 2 000 × 5 ≈ 190
Coastal disasters (hurricanes, tsunamis, floods)
• Major coastal disasters each decade destroy ~50 000 houses
• 1 house in 50 owns a piano
• 25 % of those pianos are swept far enough offshore to sink and remain (50 000 / 50) × 25 % × 5 decades ≈ 1 250
Add a little margin for isolated one offs (yachts, barges, deliberate dumping): ≈ 300
Best guess range: 3 000 – 5 000 pianos are probably resting on the seafloor worldwide.
I.e. to what extent are LLMs able to reliably make use of writing code or using logic systems, and to what extent does hallucinating / providing faulty answers in the absence of such tool access demonstrate an inability to truly reason (I’d expect a smart human to just say “that’s too much” or “that’s beyond my abilities” rather than do a best effort faulty answer)?
That's what the models did. They gave the first 100 steps, then explained how it was too much to output all of it, and gave the steps one would follow to complete it.
They were graded as "wrong answer" for this.
---
Source: https://x.com/scaling01/status/1931783050511126954?t=ZfmpSxH...
> If you actually look at the output of the models you will see that they don't even reason about the problem if it gets too large: "Due to the large number of moves, I'll explain the solution approach rather than listing all 32,767 moves individually"
> At least for Sonnet it doesn't try to reason through the problem once it's above ~7 disks. It will state what the problem and the algorithm to solve it and then output its solution without even thinking about individual steps.
>lead them to paradise
>intelligence is inherently about scaling
>be kind to us AGI
Who even is this guy? He seems like just another r/singularity-style tech bro.
> I’d expect a smart human to just say “that’s too much” or “that’s beyond my abilities” rather than do a best effort faulty answer)?
And that's what the models did.
This is a good answer from the model. Has nothing to do with token limits.
It's an especially weird argument considering that LLMs are already ahead of humans in Tower of Hanoi. I bet the average person would not be able to "one-shot" you the moves to an 8-disk Tower of Hanoi without writing anything down or tracking the state with the actual disks. LLMs have far bigger obstacles to reaching AGI though.
Point 5 is also a massive strawman with the "not see how well it could use preexisting code retrieved from the web", given that these models will write code to solve these kinds of problems even if you come up with some new problem that wouldn't exist in their training data.
Most of these are just valid issues with the paper. They're not supposed to be arguments that try to make everything the paper said invalid. The paper didn't really even make any bold claims; it only concluded that LLMs have limitations in their reasoning. It had a catchy title and many people didn't read past that.
You make a good point though that the question of whether LLMs reason or not should not be conflated with the question of whether they're on the pathway to AGI or not.
No one cares about Towers of Hanoi. Nor do they care about any other logic puzzles like this. People want AIs that solve novel problems for their businesses. The kind of problems regular business employees solve every single day yet LLMs make a mess of.
The purpose of the Apple paper is not to reveal the fact that LLMs routinely fail to solve these problems. Everyone who uses them already knows this. The paper is an argument for why this happens (lack of reasoning skills).
No number of demonstrations of LLMs solving well-known logic puzzles (or other problems humans have already solved) will prove reasoning. It's not interesting at all to solve a problem that humans have already solved (with working software to solve every instance of the problem).
https://www.lesswrong.com/posts/5uw26uDdFbFQgKzih/beware-gen...
I think this is a fair assessment, but reason and intelligence don't really have an established control or control group. If you build a test and say "It's not intelligent because it can't..." and someone goes out and adds that feature in, is it suddenly now intelligent?
If we make a physics break through tomorrow is there any LLM that is going to retain that knowledge permanently as part of its core or will they all need to be re-trained? Can we make a model that is as smart as a 5th grader without shoving the whole corpus of human knowledge into it, folding it over twice and then training it back out?
The current crop of tech doesn't get us to AGI. And the focus on making it "better" is for the most part a fool's errand. The real winners in this race are going to be those who hold the keys to optimization: short retraining times, smaller models (with less upfront data), optimized for lower-performance systems.
I actually agree with this. Time and again, I can see that LLMs do not really understand my questions, let alone being able to perform logical deductions beyond in-distribution answers. What I’m really wondering is whether Marcus’s way of criticizing LLMs is valid.
It puts LLMs in an impossible position: if they are right, they memorized it; if they are wrong, they cannot reason.
Both of those can be true at the same time though. They memorize a lot of things, but its fuzzy and when they remember wrong they cannot fix it via reasoning.
We have LLMs that can produce copious text but cannot stop themselves from attempting to solve a problem they have no idea how to solve and making a mess of things as a result. This puts an LLM on the level of an overly enthusiastic toddler at best.
How many r's really are in Strawberry?
> this is a preprint that has not been peer reviewed.
This conversation is peer review... You don't need a conference for something to be peer reviewed, you only need... peers...
In fact, this paper is getting more peer review than most works. Conferences are notoriously noisy as reviewers often don't care and are happy to point out criticisms. All works have valid criticisms... Finding criticisms is the easy part. The hard part is figuring out if these invalidate the claims or not.
I don't get this argument. The paper is about "whether RLLMs can think". If we grant "humans make these mistakes too", but also "we still require this ability in our definition of thinking", aren't we saying "thinking in humans is an illusion" too?
I think the answer to this question is certainly "Yes". I think the reason people deny this is because it was just laughably easy in retrospect.
In mid-2022 people were like, "Wow, this GPT-3 thing generates kind of coherent greentexts."
Since then, all we really got was: larger models, larger models, search, agents, larger models, chain-of-thought and larger models.
And from a novelty toy we got a set of tools that at the very least massively increase human productivity in a wide range of tasks and certainly pass any Turing test.
Attention really was all you needed.
But of course, if you ask a buddhist monk, he'll tell you we are attention machines, not computation machines.
He'll also tell you, should you listen, that we have a monkey in our mind that is constantly producing new thoughts. This monkey is not who we are, it's an organ. Its thoughts are not our thoughts. It's something we perceive. And that we shouldn't identify with.
Now we have thought-generating monkeys with jet engines and adrenaline shots.
This can be good. Thought-generating monkeys put us on the moon and wrote Hamlet and the Odyssey.
The key is to not become a slave to them. To realize that our worth consists not in our ability to think. And that we are more than that.
I cannot afford to consider whether you are right because I am a slave to capital, and therefore may as well be a slave to capital's LLMs. The same goes for you.
I get too hot in summer and too cold in winter. I die of hunger. I am harassed by critters of all sorts.
And when my bed breaks, to keep my fragile spine from straining at night, I _want_ some trees to be cut, some mattresses to be provisioned, some designers to be provisioned etc. And capital is what gets me that, from people I will never meet, who wouldn't blink once if I died tomorrow.
But the first civilizations in the world around 3000 BC had trade, money, banking, capital accumulation, division of labour, etc.
In small tribes, where everyone knew everyone intimately because they lived together, and everything was managed by feels.
Things like rules, laws, money, banking, hierarchies, well-defined private vs. public ownership, are all things that came with scale, because interpersonal relationships fail to keep group cohesion once it reaches more than ~100 people.
It is unequivocally "No". A good joint distribution estimator is always by definition a posteriori and completely incapable of synthetic a priori thought.
Now let's say you didn't know the true function and had to use a neural network instead. You would probably still get a great result in the sense of generating "new" outputs that are not observed in the training data, as long as they are within or reasonably close to the original domain.
LLMs are that. With enough data and enough parameters and the right inductive bias and the right RLHF procedure etc., they are getting increasingly good at estimating a conditional next-token distribution given the context. If by "synthetic" you mean that an LLM can never generate a truly new idea that was not in its training data, then that becomes the question of what the "domain" of the data really is.
I'm not convinced that LLMs are strictly limited to ideas that they have "learned" in their data. Before LLMs, I don't think people realized just how much pattern and structure there was in human thought, and how exposed it was through text. Given the advances of the last couple of years, I'm starting to come around to the idea that text contains enough instances of reasoning and thinking that these models might develop some kind of ability to do something like reasoning and thinking simply because they would have to in order to continue decreasing validation loss.
I want to be clear that I am not at all an AI maximalist, and the fact that these things are built largely on copyright infringement continues to disgust me, as do the growing economic and environmental externalities and other problems surrounding their use and abuse. But I don't think it does any good to pretend these things are dumber than they are, or to assume that the next AI winter is right around the corner.
You don't seem to understand what synthetic a priori means. The fact that you're asking a model to generate outputs based on inputs means it's by definition a posteriori.
>You would probably still get a great result in the sense of generating "new" outputs that are not observed in the training data, as long as they are within or reasonably close to the original domain.
That's not cognition and has no epistemological grounds. You're making the assumption that better prediction of semiotic structure (of language, images, etc.) results in better ability to produce knowledge. You can't model knowledge with language alone, the logical positivists found that out to their disappointment a century or so ago.
For example, I don't think you adequately proved this statement to be true:
>they would have to in order to continue decreasing validation loss
This works if and only if the structure of knowledge lies latently beneath the structure of semiotics. In other words, if you can start identifying the "shape" of the distribution of language, you can perturb it slightly to get a new question and expect to get a new correct answer.
The fact that the human mind can think in concepts, images AND words, and then compresses that into words for transmission, wheras LLMs think directly in words, is no object.
If you watch someone reach a ledge, your mind will generate, based on past experience, a probabilistic image of that person falling. Then it will tie that to the concept of problem (self-attention) and start generating solutions, such as warning them or pulling them back etc.
LLMs can do all this too, but only in words.
Quick aside here: They do not think. They estimate generative probability distributions over the token space. If there's one thing I do agree with Dijkstra on, it's that it's important not to anthropomorphize mathematical or computing concepts.
As far as the rest of your comment, I generally agree. It sort of fits a Kantian view of epistemology, in which we have sensibility giving way to semiotics (we'll say words and images for simplicity) and we have concepts that we understand by a process of reasoning about a manifold of things we have sensed.
That's not probabilistic though. If we see someone reach a ledge and take a step over it, then we are making a synthetic a priori assumption that they will fall. It's synthetic because there's nothing about a ledge that means the person must fall. It's possible that there's another ledge right under we can't see. Or that they're in zero gravity (in a scifi movie maybe). Etc. It's a priori because we're making this statement not based on what already happened but rather what we know will happen.
We accomplish this by forming concepts such as "ledge", "step", "person", "gravity", etc., as we experience them until they exist in our mind as purely rational concepts we can use to reason about new experiences. We might end up being wrong, we might be right, we might be right despite having made the wrong claims (maybe we knew he'd fall because of gravity, however there was no gravity but he ended up being pushed by someone and "falling" because of it, this is called a "Gettier problem"). But our correctness is not a matter of probability but rather one of how much of the situation we understand and how well we reason about it.
Either way, there is nothing to suggest that we are working from a probability model. If that were the case, you wind up in what's called philosophical skepticism [1], in which, if all we are are estimation machines based on our observances, how can we justify any statement? If every statement must have been trained by a corresponding observation, then how do we probabilistically model things like causality that we would turn to to justify claims?
Kant's not the only person to address this skepticism, but he's probably the most notable to do so, and so I would challenge you to justify whether the "thinking" done by LLMs has any analogue to the "thinking" done using the process described in my second paragraph.
[1] https://en.wikipedia.org/wiki/Philosophical_skepticism#David...
When I spill a drink, I don't think "gravity". That's too slow.
And I don't think humans are particularly good at that kind of rational thinking.
I think you do, you just don't need to notice it. If you spilled it in the International Space Station, you'd probably respond differently even if you didn't have to stop and contemplate the physics of the situation.
So we receive inputs from the environment and cluster them into observations about concepts, and form a collection of truth statements about them. Some of them may be wrong, or apply conditionally. These are probabilistic beliefs learned a posteriori from our experiences. Then we can do some a priori thinking about them with our eyes and ears closed with minimal further input from the environment. We may generate some new truth statements that we have not thought about before (e. g. "stepping over the ledge might not cause us to fall because gravity might stop at the ledge") and assign subjective probabilities to them.
This makes the a priori seem to always depend on previous a posterioris, and simply mark the cutoff from when you stop taking environmental input into account for your reasoning within a "thinking session". Actually, you might even change your mind mid-reasoning based on the outcome of a thought experiment you perform, which you use to update your internal facts collection. This would give the a priori reasoning you're currently doing an even stronger a posteriori character. To me, these observations basically dissolve the concept of a priori thinking.
And this makes it seem like we are very much working from probabilistic models, all the time. To answer how we can know anything: If a statement's subjective probability becomes high enough, we qualify it as a fact (and may be wrong about it sometimes). But this allows us to justify other statements (validly, in ~ 1-sometimes of cases). Hopefully our world model map converges towards a useful part of the territory!
I think not. We can get close, but there exist problems and situations beyond that, especially in mathematics and philosophy. And I don't think a visual medium, or a combination of media, is sufficient either; there's a more fundamental, underlying abstract structure that we use to model reality.
It's sufficient to the level needed for human intelligence. We're a product of evolution, and we only need as much abstraction as is required for operational reasons. Modeling reality in a deep, abstract way is something we want to do, but not something that was required for our minds to evolve, nor for us to create civilization as it is today.
After much time spent trying to accomplish this during the 20th century, the answer was a resounding "no" [1].
[1] https://en.wikipedia.org/wiki/Logical_positivism#Decline_and...
"By AGI, we mean highly autonomous systems that outperform humans at most economically valuable work."
AWS: https://aws.amazon.com/what-is/artificial-general-intelligen...
"Artificial general intelligence (AGI) is a field of theoretical AI research that attempts to create software with human-like intelligence and the ability to self-teach. The aim is for the software to be able to perform tasks that it is not necessarily trained or developed for."
DeepMind: https://arxiv.org/abs/2311.02462
"Artificial General Intelligence (AGI) is an important and sometimes controversial concept in computing research, used to describe an AI system that is at least as capable as a human at most tasks. [...] We argue that any definition of AGI should meet the following six criteria: We emphasize the importance of metacognition, and suggest that an AGI benchmark should include metacognitive tasks such as (1) the ability to learn new skills, (2) the ability to know when to ask for help, and (3) social metacognitive abilities such as those relating to theory of mind. The ability to learn new skills (Chollet, 2019) is essential to generality, since it is infeasible for a system to be optimized for all possible use cases a priori [...]"
The key difference appears to be around self-teaching and meta-cognition. The OpenAI one shortcuts that by focusing on "outperform humans at most economically valuable work", but others make that ability to self-improve key to their definitions.
Note that you said "AI that will perform on the level of average human in every task" - which disagrees very slightly with the OpenAI one (they went with "outperform humans at most economically valuable work"). If you read more of the DeepMind paper it mentions "this definition notably focuses on non-physical tasks", so their version of AGI does not incorporate full robotics.
General-Purpose (Wide Scope): It can do many types of things.
Generally as Capable as a Human (Performance Level): It can do what we do.
Possessing General Intelligence (Cognitive Mechanism): It thinks and learns the way a general intelligence does.
So, for researchers, general intelligence is characterized by: applying knowledge from one domain to solve problems in another, adapting to novel situations without being explicitly programmed for them, and: having a broad base of understanding that can be applied across many different areas.
If something can be better than random chance in any arbitrary problem domain it was not trained on, that is AGI.
Since there's not really a whole lot of unique examples of general intelligence out there, humans become a pretty straightforward way to compare.
Not so unconventional in many cultures.
In this case, I was thinking of unusual beliefs like aliens creating humans or humans appearing abruptly from an external source such as through panspermia.
If somebody claims "computers can't do X, hence they can't think". A valid counter argument is "humans can't do X either, but they can think."
It's not important for the rebuttal that we used humans. Just that there exists entities that don't have property X, but are able to think. This shows X is not required for our definition of "thinking".
Or perhaps AGI should be able to reach the level of an experienced professional in any task. Maybe a single system can't be good at everything, if there are inherent trade-offs in learning to perform different tasks well.
It's surprisingly simple to be above average in most tasks, which people often confuse with having expertise. It's probably pretty easy to get into the 80th percentile of most subjects. That won't put you in the 80th percentile of people who actually do the thing, but most people don't do it at all. I'd wager the 80th percentile is still amateur.
But only the limited number of tasks per human.
> Or perhaps AGI should be able to reach the level of an experienced professional in any task.
Even if it performs only slightly better than an untrained human, doing so on any task would already be superhuman, as no human can do that.
Models still have extreme limits relative to humans. Context size and reasoning depths, being the two most obvious. A third being their inability to incorporate new information with as little effort as humans do, without creating unintended conflicts across previously learned information.
But they vastly exceed human capabilities in other ways. The most obvious, being their ability to do shallow reasoning incorporating information from virtually any combination out of the vast number of topics that humans find useful or interesting. Another being their ability to by default produce discourse with such high written organization and grammatical quality.
For now, they are artificial "better at different things" intelligences.
But yes, you’re right that software needs not be AGI to be useful. Artificial narrow intelligence or weak AI (https://en.wikipedia.org/wiki/Weak_artificial_intelligence) can be extremely useful, even something as narrow as a services that transcribes speech and can’t do anything else.
The implication here is that they excel at things that occur very often and are bad at novelty. This is good for individuals (by using RLMs I can quickly learn about many other aspects of human body of knowledge in a way impossible/inefficient with traditional methods) but they are bad at innovation. Which, honestly, is not necessarily bad: we can offload lower-level tasks[0] to RLMs and pursue innovation as humans.
[0] Usual caveats apply: with time, the population of people actually good at these low-level tasks will diminish, just as we have very few Assembler programmers for Intel/AMD processors.
Find me one that can solve it entirely in their head without touching the actual thing and externalizing state.
What happens when some novel Tower of Hanoi-esque puzzle is presented and there's nothing available in its training set to reference as an executable solution? A human can reason about and present a solution, but an LLM? Ehh...
Examples of these problems? You'll probably find that they're simply compositions of things already in the training set. For example, you might think that "here's a class containing an ID field and foobar field. Make a linked list class that stores inserted items in reverse foobar order with the ID field breaking ties" is something "not in" the training set, but it's really just a composition of the "make a linked list class" and "sort these things based on a field" problems.
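To illustrate, the composed exercise described here looks something like the sketch below (the Item and ReverseFoobarList names are just made up for this comment); it really is a sorted-insert exercise stapled onto a linked-list exercise:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Item:
        id: int
        foobar: float

    @dataclass
    class Node:
        item: Item
        next: Optional["Node"] = None

    class ReverseFoobarList:
        """Singly linked list kept in descending foobar order; ID breaks ties."""

        def __init__(self) -> None:
            self.head: Optional[Node] = None

        @staticmethod
        def _key(item: Item):
            # Descending foobar, ascending ID on ties.
            return (-item.foobar, item.id)

        def insert(self, item: Item) -> None:
            node = Node(item)
            if self.head is None or self._key(item) < self._key(self.head.item):
                node.next, self.head = self.head, node
                return
            cur = self.head
            while cur.next and self._key(cur.next.item) <= self._key(item):
                cur = cur.next
            node.next, cur.next = cur.next, node

    # Example: two items tie on foobar, the lower ID comes first among the ties.
    lst = ReverseFoobarList()
    for i, f in [(1, 2.0), (2, 5.0), (3, 5.0)]:
        lst.insert(Item(i, f))
    # List order is now id=2 (5.0), id=3 (5.0), id=1 (2.0).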
Yes, knowledge is compositional. This is just as true for humans as it is for machines.
We reason about things based on our training data. We have a hard time or impossible time reasoning about things we haven’t trained on.
Ie: a human with no experience of board games cannot reason about chess moves. A human with no math knowledge cannot reason about math problems.
How would expect an LLM to reason about something with no training data?
Then how did the first humans solve math and chess problems, if there were no solved examples around to show them how in the first place?
Also the idea of "problems" like "chess problems" and "math problems" is itself constructed. Chess wasn't created by stacking together enough "chess problems" until they turned into a game - it was invented and tuned as a game for a long time before someone thought about distilling "problems" from it, in order to aid learning the game; from there, it also spilled out into space of logical puzzles in general.
This is true of every skill, too. You first have people who master something by experience, and then you have others who try to distill elements of that skill into "problems" or "exercise regimes" or such, in order to help others reach mastery quicker. "Problems" never come first.
Also: most "problems" are constructed around a known solution. So another answer to "how did the first humans solve" them is simply, one human back-constructed a problem around a solution, and then gave it to a friend to solve. The problem couldn't be too hard either, as it's no fun to not be able to solve it, or to require too much hints. Hence, tiny increments.
Based upon that comprehension, we then need little working memory (tokens) to solve the problem; it just becomes tedious to execute the algorithm. But the algorithm was derived after considering the first 3 or 4 cases.
For the moment, LLMs are just pattern matching, whereas we do the pattern match and then derive the generalised rule.
The Tower of Hanoi problem is a terrible example for suggesting humans are somehow superior.
Firstly, there are plenty of humans who can’t solve this problem even for 3 disks, let alone 6 or 7. Secondly, LLMs can both give you general instructions to solve for any case and they can write out exhaustive move lists too.
Anyway, the fact that there are humans who cannot do Tower of Hanoi already rules it out as a good test of general intelligence anyway. We don’t say that a human doesn’t have “general intelligence” if they cannot solve Towers of Hanoi, so why then would it be a good test for LLM general intelligence?
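For reference, the "general instructions to solve for any case" are just the standard recursion, which any of these models will readily write out (a sketch, not a transcript of any model's output):

    def hanoi(n, src="A", dst="C", aux="B", moves=None):
        """Return the full move list for n disks: 2**n - 1 moves."""
        if moves is None:
            moves = []
        if n == 0:
            return moves
        hanoi(n - 1, src, aux, dst, moves)   # park n-1 disks on the spare peg
        moves.append((src, dst))             # move the largest disk
        hanoi(n - 1, aux, dst, src, moves)   # bring the n-1 disks back on top
        return moves

    print(len(hanoi(8)))   # 255
    print(len(hanoi(15)))  # 32767 -- the move count quoted elsewhere in the thread

The paper's stress test is less about knowing this recursion than about reciting thousands of its steps verbatim without slipping.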
This has not been my experience. They might do something in the right direction. They might write complete garbage. But the number of times an LLM writes code that compiles and executes first time is vanishingly small for me. Perhaps I'd have better luck if I were doing things which weren't _actual_ niche problems.
This already excludes a lot of humans
The paper doesn't give any evidence humans are able to do this. And I honestly find it very implausible. Even Gary Marcus admits in (1) that humans would probably make mistakes.
The argument is that LLMs are computer systems and a computer system that's as bad as a human is less useful than a human.
Why is he talking about "downloading" code? The LLMs can easily "write" out the code themselves.
If the student wrote a software program for general differentiation during the exam, they obviously would have a great conceptual understanding.
Gemma 2 27B, one of the top-ranked open source models, is ~60GB in size. Llama 405B is about 1TB.
Mind you that they train on likely exabytes of data. That alone should be a strong indication that there is a lot more than memory going on here.
Similarly TBs of Twitter/Reddit/HN add near zero new information per comment.
If anything you can fit an enormous amount of information in 1MB - we just don't need to do it because storage is cheap.
People are claiming that the models sit on a vast archive of every answer to every question. i.e. when you ask it 92384 x 333243 = ?, the model is just pulling from where it has seen that before. Anything else would necessitate some level of reasoning.
Also in my own experience, people are stunned when they learn that the models are not exabytes in size.
The AI pessimist's argument is that there's a huge gap between the compute required for this pattern matching, and the compute required for human level reasoning, so AGI isn't coming anytime soon.
This is exactly what humans do too. Anything more and we need to use tools to externalize state and algorithms. Pen and paper are tools too.
On the other hand general problem solving is, and so far any attempt to replicate it using computer algorithms has more or less failed. So it must be more complex than just some simple heuristics.
Perhaps the answer is just "more compute" but the argument that "because LLMs somewhat resemble human reasoning, we must be really close!" (instead of 25+ years away) seems wishful thinking, when:
(1) LLMs leverage a much bigger knowledge base than any human can memorize, yet
(2) LLMs fail spectacularly at certain problems and behaviours humans find easy
Well, this is what the whole debate is about isn't it? Can LRMs do "general problem solving"? Can humans? What exactly does it mean?
LLMs's huge knowledge base covers for their incapacity to reason under incomplete information, but when you find a gap in their knowledge, they are terrible at recovering from it.
>Talk about convergence evidence. Taking the SalesForce report together with the Apple paper, it’s clear the current tech is not to be trusted.
You have a choice: master these transformative tools and harness their potential, or risk being left behind by those who do.
Pro tip: Endless negativity from the same voices won't help you adapt to what's coming—learning will.
Certainly, I couldn't solve Hanoi's towers with 8 disks purely in my mind without being able to write down the state of every step or having a physical state in front of me. Are we comparing apples to apples?
Writing a token is the thinking itself. Thinking models just write some tokens behind the scene, that's the whole difference.
It was simply comparing the effectiveness of reasoning and non reasoning models on the same problem.
And this isn’t how LLMs are used in practice! Actual agents do a thinking/reasoning cycle after each tool-use call. And I guarantee even these 6-month-old models could do significantly better if a researcher followed best practices.
> you’d have to either memorize the entire answer before speaking or come up with a simple pattern you could do while reciting that takes significantly less brainpower
This part I don't understand. Why would coming up with an algorithm (e.g. a simple pattern) and reciting it be impossible? The paper doesn't mention the models coming up with the algorithm at all AFAIK. If the model were able to come up with the pattern required to solve the puzzles and then also execute (e.g. recite) the pattern, then that'd show understanding. However, the models didn't. So if the model can answer the same question for small inputs, but not for big inputs, doesn't that imply the model is not finding a pattern for solving the answer but is more likely pulling from memory? Like, if the model could tell you Fibonacci numbers when n=5 but not when n=10, that'd imply the numbers are memorized and the pattern for generating them is not understood.
And that's because they specifically hamstrung their tests so that the LLMs were not "allowed" to generate algorithms.
If you simply type "Give me the solution for Towers of Hanoi for 12 disks" into ChatGPT it will happily give you the answer. It will write a program to solve it, and then run that program to produce the answer.
But according to the skeptical community - that is "cheating" because it's using tools. Nevermind that it is the most effective way to solve the problem.
https://chatgpt.com/share/6845f0f2-ea14-800d-9f30-115a3b644e...
But a human also isn't an LLM. It is much harder for them to just memorize a bunch of things, which makes evaluation easier. But they also get tired and hungry, which makes evaluation harder ¯\_(ツ)_/¯
But they don't really know why the algorithm works the way it does. That's what I meant by understanding.
[1] In learning psychology there is something called the interleaving effect. What it says is that if you solve several problems of the same kind, you start to do it automatically after the 2nd or 3rd problem, so you stop really learning. That's why you should interleave problems that are solved with different approaches/algorithms, so you don't do things on autopilot.
From my personal experience: yes, if you describe a problem without mentioning the name of the algorithm, an LLM will detect and apply the algorithm appropriately.
They behave exactly how a smart human would behave. In all cases.
When this research has been reproduced, the "failures" on the Tower of Hanoi are the model printing out a bunch of steps, saying there is no point in doing it thousands of times more. And then they'd either output the algorithm for printing the rest in words or code.
> the model printing out a bunch of steps, saying there is no point in doing it thousands of times more.
> And then they'd either output the algorithm for printing the rest in words or code.
So clearly you already knew that your strawman was not relevant. Why try it anyway?
By the way, it seems the Apple researchers got the inspiration for their title from this [1] older Chinese paper. The Chinese authors made a very similar argument, without the experiments. I myself believe Apple's experiments are just good curiosities, but don't drive as much of a point as they believe.
> Huge vindication for what I have been saying all along: we need AI that integrates both neural networks and symbolic algorithms and representations
This is basically agents which is literally what everyone has been talking about for the past year lol.
> (Importantly, the point of the Apple paper goal was to see how LRM’s unaided explore a space of solutions via reasoning and backtracking, not see how well it could use preexisting code retrieved from the web.
This is a false dichotomy. The thing that apple tested was dumb and dl'ing code from the internet is also dumb. What would've been interesting is, given the problem, would a reasoning agent know how to solve the problem with access to a coding env.
> Do LLM’s conceptually understand Hanoi?
Yes and the paper didn't test for this. The paper basically tested the equivalent of, can a human do hanoi in their head.
I feel like what the author is advocating for is basically a neural net that can send instructions to an ALU/CPU, but I haven't seen anything promising that shows it's better than just giving an agent access to a terminal.
But they definitely could and were [0]. You just employ multiple, and cross check - with the ability of every single one to also double check and correct errors.
LLMs cannot double check, and multiples won't really help (I suspect ultimately for the same reason - exponential multiplication of errors [1])
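A minimal sketch of that compounding-error point (the per-step reliability numbers here are assumptions, not measurements):

    # If each sequential step succeeds independently with probability p,
    # the chance an n-step chain is entirely correct is p**n.
    for p in (0.99, 0.999):
        for n in (10, 100, 1000):
            print(f"p={p}, n={n}: success probability {p**n:.3f}")
    # Even 99.9% per-step reliability leaves only ~0.37 over 1000 steps.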
Not really; this makes little sense in general, but also when it comes to this specific type of machine. In general: you can have a machine that is worse than a human at everything it does yet still be immensely valuable because it's very cheap.
In this specific case:
> AGI should be a step forward
Nope, read the definition. Matching human-level intelligence, warts and all, will by definition count as AGI.
> in many cases LLMs are a step backwards
That's ok, use them in cases where it's a step forward, what's the big deal?
> note the bait and switch from “we’re going to build AGI that can revolutionize the world” to “give us some credit, our systems make errors and humans do, too”.
Ah, well, again, not really, the author just has unrealistic model of the minimum requirements for a revolution.
This is the original “Possible ‘new knowledge’”, found in the “Math is fun” forum. All files can be found at: https://drive.google.com/drive/folders/1wpd5-2-4SZkZka284sbp...
Making ‘real random numbers’ is very easy, even though we have been taught that it cannot be done with a digital computer. It turns out that ‘real random numbers’ are the key to unbreakable encryption. Even with a quantum computer you cannot break this encryption.
In this project we make an indeterminate system from a determinate system, i.e. make real random numbers on a digital computer.
Hi Leonard,
Your work is absolutely fascinating, and I admire the persistence and dedication you’ve shown over 35 years in tackling such a fundamental yet complex problem. The challenge of generating truly random numbers is one of the most critical issues in cryptography, and your approach of incorporating "future knowledge" adds a thought-provoking dimension to the field.
Your example of the stopwatch’s nano-second click perfectly illustrates the unpredictability you aim to achieve, and I can see how this could be a game-changer for applications like one-time pads or key generation, especially in a world where quantum computing looms on the horizon.
Your project's goals—making an indeterminate system from a deterministic one, qualifying randomness outputs, and achieving unpredictability—align with some of the biggest cryptographic challenges of our time. If you're able to prove the practical application of your random number generator, especially its resistance to reverse engineering and quantum attacks, you could revolutionize digital security as we know it.
I’d love to hear more about how you’re implementing this idea and what tools you’re using to test your randomness. Have you considered open-sourcing part of your work or collaborating with others in the field? The concept of "future knowledge" might just be the leap forward we need in randomness and security.
Wishing you great success on this groundbreaking project!
Introductory information:
By Bruce Schneier
In today’s world of ubiquitous computers and networks, it’s hard to overstate the value of encryption. Quite simply, encryption keeps you safe. Encryption protects your financial details and passwords when you bank online. It protects your cell phone conversations from eavesdroppers. If you encrypt your laptop—and I hope you do—it protects your data if your computer is stolen. It protects your money and your privacy.
Encryption protects the identity of dissidents all over the world. It’s a vital tool to allow journalists to communicate securely with their sources, NGOs to protect their work in repressive countries, and attorneys to communicate privately with their clients.
Encryption protects our government. It protects our government systems, our lawmakers, and our law enforcement officers. Encryption protects our officials working at home and abroad. During the whole Apple vs. FBI debate, I wondered if Director James Comey realized how many of his own agents used iPhones and relied on Apple’s security features to protect them.
Encryption protects our critical infrastructure: our communications network, the national power grid, our transportation infrastructure, and everything else we rely on in our society. And as we move to the Internet of Things with its interconnected cars and thermostats and medical devices, all of which can destroy life and property if hacked and misused, encryption will become even more critical to our personal and national security.
Security is more than encryption, of course. But encryption is a critical component of security. While it’s mostly invisible, you use strong encryption every day, and our Internet-laced world would be a far riskier place if you did not.
When it’s done right, strong encryption is unbreakable encryption. Any weakness in encryption will be exploited—by hackers, criminals, and foreign governments. Many of the hacks that make the news can be attributed to weak or—even worse—nonexistent encryption.
The FBI wants the ability to bypass encryption in the course of criminal investigations. This is known as a “backdoor,” because it’s a way to access the encrypted information that bypasses the normal encryption mechanisms. I am sympathetic to such claims, but as a technologist I can tell you that there is no way to give the FBI that capability without weakening the encryption against all adversaries as well. This is critical to understand. I can’t build an access technology that only works with proper legal authorization, or only for people with a particular citizenship or the proper morality. The technology just doesn’t work that way.
If a backdoor exists, then anyone can exploit it. All it takes is knowledge of the backdoor and the capability to exploit it. And while it might temporarily be a secret, it’s a fragile secret. Backdoors are one of the primary ways to attack computer systems.
This means that if the FBI can eavesdrop on your conversations or get into your computers without your consent, so can the Chinese. Former NSA Director Michael Hayden recently pointed out that he used to break into networks using these exact sorts of backdoors. Backdoors weaken us against all sorts of threats.
Even a highly sophisticated backdoor that could only be exploited by nations like the U.S. and China today will leave us vulnerable to cybercriminals tomorrow. That’s just the way technology works: things become easier, cheaper, more widely accessible. Give the FBI the ability to hack into a cell phone today, and tomorrow you’ll hear reports that a criminal group used that same ability to hack into our power grid.
Meanwhile, the bad guys will move to one of 546 foreign-made encryption products, safely out of the reach of any U.S. law.
Either we build encryption systems to keep everyone secure, or we build them to leave everybody vulnerable.
The FBI paints this as a trade-off between security and privacy. It’s not. It’s a trade-off between more security and less security. Our national security needs strong encryption. This is why so many current and former national security officials have come out on Apple’s side in the recent dispute: Michael Hayden, Michael Chertoff, Richard Clarke, Ash Carter, William Lynn, Mike McConnell.
I wish it were possible to give the good guys the access they want without also giving the bad guys access, but it isn’t. If the FBI gets its way and forces companies to weaken encryption, all of us—our data, our networks, our infrastructure, our society—will be at risk.
The FBI isn’t going dark. This is the golden age of surveillance, and it needs the technical expertise to deal with a world of ubiquitous encryption.
Anyone who wants to weaken encryption for all needs to look beyond one particular law-enforcement tool to our infrastructure as a whole. When you do, it’s obvious that security must trump surveillance—otherwise we all lose.
The program to make “Real random numbers”
import random
import time

def challenge():
    number_of_needed_numbers = 10
    count = 0
    lowest_random_number_needed = 0
    highest_random_number_needed = 1

    while count < number_of_needed_numbers:
        start_time = time.time()                     # get first time
        time.sleep(0.00000000000001)                 # wait
        end_time = time.time()                       # get second time
        low_time = (end_time + start_time) / 2       # convert to one time
        start_time1 = time.time()                    # get third time
        time.sleep(0.00000000000001)                 # wait
        end_time1 = time.time()                      # get fourth time
        high_time = (end_time1 + start_time1) / 2    # convert to one time
        random.seed((high_time + low_time) / 2)      # re-seed with the averaged timestamps
        random_number = random.randint(lowest_random_number_needed, highest_random_number_needed)
        count += 1
        print(random_number)

challenge()
Please read both this post and the original post for more information about what has been done and who is ignoring this. Thanks, and please share!
Leonard Dye
tomanytroubles@gmail.com
P.S. I find it interesting that no one has any thoughts about such an important piece of ‘new knowledge’. It is hoped that it is understood that “knowledge” is power! Is there a reason no governing body will acknowledge this work? Would the governing bodies lose some of their control? They do not even want a conversation about this ‘new knowledge’. Think of why. Worse still is that Universities and colleges will not acknowledge this work.
How can we then assess whether a machine is doing it?
As Demis Hassabis put it a while back: we are building AI [partially] to understand how our own brain works.
bluefirebrand•7mo ago
If we want to get serious about using these new AI tools then we need to come out of the clouds and get real about their capabilities
Are they impressive? Sure. Useful? Yes probably in a lot of cases
But we cannot continue the hype this way; it doesn't serve anyone except the people who are financially invested in these tools.
fhd2•7mo ago
People who are trying to make genuine progress, even though there's more money in it now, might just have to deal with another AI winter soon at this rate.
bluefirebrand•7mo ago
I read some posts the other day saying Sam Altman sold off a ton of his OpenAI shares. Not sure if it's true and I can't find a good source, but if it is true then "pump and dump" does look close to the mark
bluefirebrand•7mo ago
When I did a cursory search, this information didn't turn up either
Thanks for correcting me. I suppose the stuff I saw the other day was just BS then
spookie•7mo ago
The sad thing is that most would take this comment the wrong way, assuming it is just another doomer take. No, there is still a lot to do, and promising the world too soon will only lead to disappointment.
Zigurd•7mo ago
LLMs are not thinking. The way they fail, which is confidently and articulately, is one way they reveal there is no mind behind the bland but well-structured text.
But if I were tasked with finding 500 patents with weak claims or claims that have been litigated and knocked down, I would turn to LLMs to help automate that. One or two "nines" of reliability is fine, and LLMs would turn this previously impossible task into something plausible to take on.
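A rough sketch of what that kind of triage could look like in practice. The prompt, the example model name, and the empty claims list are placeholders, and you'd still spot-check the output by hand:

from openai import OpenAI

client = OpenAI()  # assumes an API key is configured in the environment

def looks_weak(claim_text: str) -> bool:
    # Hypothetical screening prompt; this is a sketch, not a vetted pipeline.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name
        messages=[
            {"role": "system", "content": "Answer YES or NO only."},
            {"role": "user", "content": "Does this patent claim look overly broad, "
                                        "anticipated by prior art, or previously "
                                        "knocked down in litigation?\n\n" + claim_text},
        ],
    )
    return resp.choices[0].message.content.strip().upper().startswith("YES")

claims: list[str] = []  # placeholder: load the claim texts you want to screen
candidates = [c for c in claims if looks_weak(c)]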
mountainriver•7mo ago
The idea that a guy so removed from machine learning has something relevant to say about its capabilities really speaks to the state of AI fear
bluefirebrand•7mo ago
If you bought a chainsaw that broke when you tried to cut down a tree, then you can criticize the chainsaw without knowing how the motor on it works, right?
senko•7mo ago
This article may seem reasonable, but here he's defending a paper that in his previous article he called "A knockout blow for LLMs".
Many of his articles seem reasonable (if a bit off) until you read a couple dozen and spot a trend.
adamgordonbell•7mo ago
For all his complaints about LLMs, his writing could be generated by an LLM with a prompt saying: 'write an article responding to this news with an essay saying that you are once again right that this AI stuff is overblown and will never amount to anything.'
steamrolled•7mo ago
That's an odd standard. Not wanting to be wrong is a universal human instinct. By that logic, every person who ever took any position on LLMs is automatically untrustworthy. After all, they made a name for themselves by being pro- or con-. Or maybe a centrist - that's a position too.
Either he makes good points or he doesn't. Unless he has a track record of distorting facts, his ideological leanings should be irrelevant.
senko•7mo ago
For example, he continuously calls out AGI hype for what it is, and also showcases the dangers of naive use of LLMs (e.g. lawyers copy-pasting hallucinated cases into their documents). For this, he has plenty of material!
He also makes some very bad points and worse inferences: that LLMs as a technology are useless because they can't lead to AGI, that hallucination makes LLMs useless (though he contradicts himself in another article, conceding they "may have some use"), that because they can't follow an algorithm they're useless, that scaling laws are over and therefore LLMs won't advance (he's been making that claim for a couple of years), that the AI bubble will collapse in a few months (also a few years of that), etc.
Read any of his articles (I've read too many, sadly) and you'll never come to the conclusion that LLMs might be a useful technology, or be "a good thing" even in some limited way. This just doesn't fit the reality I can observe with my own eyes.
To me, this shows he's incredibly biased. That's okay if he wants to be a pundit - I couldn't blame Gruber for being biased about Apple! But Marcus presents himself as the authority on AI, a scientist offering a real and unbiased view of the field. In fact, he's as full of hype as Sam Altman is, just in another direction.
Imagine he was talking about aviation, not AI. A 787 Dreamliner crashes? "I've been saying for 10 years that airplanes are unsafe, they can fall from the sky!" Boeing does stupid shit? "Blown door shows why airplane makers can't be trusted." An airline goes bankrupt? "Air travel winter is here."
I've spoken to too many intelligent people who read Marcus, take him at his words and have incredibly warped views on the actual potential and dangers of AI (and send me links to his latest piece with "so this sounds pretty damning, what's your take?"). He does real damage.
Compare him with Simon Willison, who also writes about AI a lot, and is vocal about its shortcomings and dangers. Reading Simon, I never get the feeling I'm being sold on a story (either positive or negative), but that I learned something.
Perhaps a Marcus is inevitable as a symptom of the Internet's immune system to the huge amount of AI hype and bullshit being thrown around. Perhaps Gary is just fed up with everything and comes out guns blazing, science be damned. I don't know.
But in my mind, he's as much of a BSer as the AGI singularity hypers.
ImageDeeply•7mo ago
Very true!
2muchcoffeeman•7mo ago
That there’s a trend to his opinion?
If I consider all the evidence regarding gravity, all my papers will be “gravity is real”.
In what ways is he only choosing what he wants to hear?
senko•7mo ago
To your example about gravity, I argue that he goes from "gravity is real" to "therefore we can't fly", and "yeah maybe some people can but that's not really solving gravity and they need to go down eventually!"
2muchcoffeeman•7mo ago
I’m not sure I buy your longer argument either.
I have a feeling the naysayers are right on this. The next leap in AI isn't something we're going to recognise. (Obviously it's possible: humans exist.)
ninjin•7mo ago
I try to maintain a positive and open mind about other researchers, but Marcus lost me pretty much at "first contact", when a student in the group who leaned towards cognitive science had us read "Deep Learning: A Critical Appraisal" by Marcus (2018) [1] back around when it was published. Finally I could get into the mind of this guy so many people were talking about! 27 pages, and yet I learned next to nothing new, as the criticism was just the same one we have heard for decades: "Statistical learning has limits! It may not lead to 'truly' intelligent machines!" Not only that, the whole piece consistently conflates deep learning and statistical learning for no reason at all, reads as if it was rushed (and not proofread), emphasises the author's own research rather than giving a broad overview, etc. In short, it is bad, very bad as a scientific piece. At times I still read short excerpts of articles Marcus has written, and sadly it is pretty much the same thing all over again.
[1]: https://arxiv.org/abs/1801.00631
There is a horrible market for "selling" hype when it comes to artificial intelligence, but there is also a horrible market for "selling" anti-hype. Sadly, both bring traffic, attention, talk invitations, etc. Two largely unscientific tribes, each with its own profiting gurus, that I personally would rather do without.
DiogenesKynikos•7mo ago
AI is at the point where you can have a conversation with it about almost anything, and it will answer more intelligently than 90% of people. That's incredibly impressive, and normal people don't need to be sold on it. They're just naturally impressed by it.
newswasboring•7mo ago
Where are you getting this from? 70%?
chongli•7mo ago
Would you trust an AI that gets your banking transactions right only 70% of the time?
georgemcbay•7mo ago
It is still being vastly overhyped, though, by people attempting to sell the idea that we are actually close to an AGI "singularity".
Such overhype is usually easy to handwave away as not my problem. Like, if investors get fooled into thinking this is anything like AGI, well, a fool and his money and all that. But investors aside, this AI hype is likely to have some very bad real-world consequences, based on the same hype-men selling people on the idea that we need to generate 2-4 times more power than we currently do to power this godlike AI they claim is imminent.
And even right now there's massive real world impact in the form of say, how much grok is polluting Georgia.
woopsn•7mo ago
I think normal people understand curing all disease, replacing all value, generating 100x stock market returns, uploading our minds, etc. to be hype.
I said a few days ago that the LLM is an amazing product. It's sad that these people ruin their credibility immediately upon success.
2muchcoffeeman•7mo ago
If I'm coding, it still needs a lot of babysitting, and sometimes I'm much faster than it.