Also I got caught on this one kind of irrelevant point regarding the characterization of the Matrix: I would say the Matrix is not just disguised as a story about escaping systems of control, it's quite clearly about oppressive systems in society, with specific reference to gender expression. Lilly Wachowski has explicitly stated that it was supposed to be an allegory for gender transition.
Going to rewatch the Matrix tonight.
https://www.them.us/story/lilly-wachowski-work-in-progress-s...
Scott Jenson is one of my favorite authors.
He's really big on integrating an understanding of basic human nature into design.
I would love Portal 3, even more.
But arguably much deeper is the fact that nothing in this game, or any single-player game, is a living thing in any form. Arguably the game's characterization of GLaDOS hits even harder on the anthropomorphism angle.
How do you know LLMs aren't intelligent, if you can't define what that means?
Doesn't sound like a test of intelligence to me, so no.
I think winning a Go or a chess competition does demonstrate intelligence. And winning a math competition does even more so.
I do not think a trivia competition like Jeopardy demonstrates intelligence much at all, however. Specifically because it reads like it's not about intelligence, but about knowledge: it tests for association and recall, not for performing complex logical transformations.
This isn't to say I consider these completely independent. Most smart people are both knowledgeable and intelligent. It's just that they are distinct dimensions in my opinion.
You wouldn't say something tastes bad because its texture feels weird in your mouth, would you?
Chess is a simple alpha-beta pruned minimax search tree. If that's intelligent, then a drone flight controller or a calculator is as well.
> association and recall, not for performing complex logical transformations.
By that definition humans doing chess aren't as intelligent as a computer doing chess, since high level chess is heavily reliant on memory and recall of moves and progressions.
So your definition falls apart.
I did not share any definitions, only vague opinions. Not that I'd know what it means for a definition to "fall apart".
And the specific bit you cite is barely even a vague opinion; it is my interpretation of the show "Jeopardy!" based on the Wiki article (I've never seen a single episode, wasn't really a thing where I'm from):
> Specifically because it reads like it's about (...) knowledge: it tests for association and recall (...)
Also:
> By that definition humans doing chess aren't as intelligent as a computer doing chess, since high level chess is heavily reliant on memory and recall of moves and progressions.
Yes, I did find this really quite disappointing and disillusioning when I first learned about it. A colleague of mine even straight up quit competitive chess over it.
You are spot on though. I mostly wanted to argue that no decent distinction can be made here.
> I did find this really quite disappointing and disillusioning when I first learned about it
ye... same here.
---
I'm personally in the camp that "intelligence" is a human concept: a metric to compare humans. Applying it to computers makes us anthropomorphize computers and think of them as people. Thinking of LLMs as people makes us trust them with decisions we shouldn't.
So we should call them impressive, fluent, fast, useful, good at tasks. Computers already beat us at most math, statistics, searching for information, spatial visualization, information recollection, lossless communication. LLMs just add to that list, but do nothing new to make the word "intelligent" applicable. Even if we reach the AGI singularity, thinking of them as humans or using human terminology to describe them is a fatal error.
(Destroying earth to make paperclips is arguably the least intelligent thing you could ever do.)
As for solving math problems, that is an important part of recursive self-improvement. If it can come up with better algorithms and turn them into code, that will translate into raising its own intelligence.
But they are not at all similar for computers. Chess has a constrained small set of rules and it is pretty straightforward to make a machine that beats humans by brute force computation. Pre-Leela chess programs were just tree search, a hardcoded evaluation function, and lots of pruning heuristics. So those programs are really approaching the game in a fundamentally different way from strong humans, who rely much more on intuition and pattern-recognition rather than calculation. It just turns out the computer approach is actually better than the human one. Sort of like how a car can move faster than a human even though cars don’t do anything much like walking.
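For concreteness, the core of that recipe fits in a few lines. Here's a toy sketch (Python; the tiny hand-built tree and leaf scores stand in for real move generation and a real evaluation function):

```python
# Toy sketch of the pre-Leela recipe: minimax search with alpha-beta pruning
# over a fixed evaluation. The "game" is just a hand-built tree whose leaves
# carry heuristic scores; a real engine would generate chess moves and score
# positions instead.
GAME_TREE = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
LEAF_SCORES = {"a1": 3, "a2": 5, "b1": -2, "b2": 9}

def alphabeta(node, alpha, beta, maximizing):
    if node in LEAF_SCORES:               # leaf: apply the evaluation function
        return LEAF_SCORES[node]
    best = float("-inf") if maximizing else float("inf")
    for child in GAME_TREE[node]:
        score = alphabeta(child, alpha, beta, not maximizing)
        if maximizing:
            best = max(best, score)
            alpha = max(alpha, best)
        else:
            best = min(best, score)
            beta = min(beta, best)
        if alpha >= beta:                 # prune: this branch can't change the result
            break
    return best

print(alphabeta("root", float("-inf"), float("inf"), True))  # 3 (b2 is never visited)
```

Nothing in there "understands" chess; the strength comes from searching deeply and pruning aggressively.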
Math is not analogous: there’s no obvious algorithm for discovering mathematical proofs or solving difficult problems that could be implemented in a classical, pre-Gen AI computer program.
Fundamentally the opposite. Computer algorithms have been part of math research since they were invented, and mathematical proof algorithms are widespread and excellent.
The LLMs that are now "intelligent enough to do maths" are just trained to rephrase questions into Prolog code.
Do you have a source that talks about this?
But math olympiad questions have been beaten before by AlphaGeometry and a few others using Prolog or similar logic-evaluation engines. And it works quite well. (Simply searching "LLM Prolog" gives a lot of results on Google and Google Scholar.)
If OpenAI did it through brute-force text reasoning, it's both impressive and frighteningly inefficient.
Even just normal algebra is something LLMs struggle with, hence using existing algebra solvers is far more effective.
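To illustrate the hand-off, a minimal sketch (sympy is just one example of an off-the-shelf solver; the equation is invented):

```python
# Instead of having the model reason through the algebra token by token,
# have it emit something machine-checkable and let a CAS do the math.
from sympy import Eq, solve, symbols

x = symbols("x")
# Suppose the model has rephrased "twice a number plus three is eleven" as:
equation = Eq(2 * x + 3, 11)

print(solve(equation, x))  # [4] -- exact, no step-by-step text reasoning needed
```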
But Jeopardy is not like that. In fact, it is infinite in scope also, yet we earlier dismissed that as not demonstrating intelligence.
Because advocates of LLMs don't use their alleged intelligence as a defense, but opponents of LLMs do use their alleged non-intelligence as an attack.
Really, whether or not the machine is "intelligent", by whatever definition, shouldn't matter. What matters is whether it is a useful tool.
On the other hand, one early researcher said something to the effect of, Researchers in physics look at the universe and wonder how it all works. Researchers in biology look at living organisms and wonder how they can be alive. Researchers in artificial intelligence wonder how software can be made to wonder such things.
I feel like we are still way off from having a working solution there.
As a tool it's useful, yes; that is not the issue:
- they're used as psychologists and life coaches,
- as judges of policy and law documents,
- as writers of life-affecting computer systems,
- as judges of job applications,
- as sources of medical advice,
- as legal advisors,
- and increasingly as the thing to blame when any of the above goes awry.
If we think of LLMs as very good text-writing tools, the responsibility to make "intelligent" decisions and, more crucially, to take responsibility for those decisions remains with real people rather than dice.
But if we think of them as intelligent humans, we are making a fatal misjudgment.
Pragmatic people see news like "LLMs achieve gold in Math Olympiad" and think "oh wow, it can do maths at that level, cool!" This gets misinterpreted by so-called "critics of LLMs" who scream "NO, THEY ARE JUST STOCHASTIC PARROTS" at every opportunity yet refuse to define what intelligence actually is.
The average person might not get into that kind of specific detail, but they know that LLMs can do some things well but there are tasks they're not good at. What matters is what they can do, not so much whether they're "intelligent" or not. (Of course, if you ask a random person they might say LLMs are pretty smart for some tasks, but that's not the same as making a philosophical claim that they're "intelligent")
Of course there's also the AGI and singularity folks. They're kinda loony too.
LLMs give a very strong appearance of intelligence, because humans are super receptive to information provided via our native language. We often have to deal with imperfect speakers and writers, and we must infer context and missing information on our own. We do this so well that we don't know we're doing it. LLMs have perfect grammar, and we subtly feel that they are extremely smart because subconsciously we recognize that we don't have to think about anything that's said; it is all syntactically perfect.
So, LLMs sort of trick us into masking their true limitations and believing that they are truly thinking; there are even models that call themselves thinking models, but they don't think, they just predict what the user is going to complain about and say that to themselves as an additional, dynamic prompt on top of the one you actually enter.
LLMs are very good at fooling us into the idea that they know anything at all; they don't. And humans are very bad at being discriminating about the source of the information presented to them if it is presented in a friendly way. The combination of those things is what has resulted in the insanely huge AI hype cycle that we are currently living in the middle of. Nearly everyone is overreacting to what LLMs actually are, and the few of us that believe that we sort of see what's actually happening are ignored for being naysayers, buzz-kills, and luddites. Shunned for not drinking the Kool-Aid.
For example, I can spin up any LLM and get it to translate some English text into Japanese with maybe 99% accuracy. I don't need to believe whether it "really knows" English or Japanese, I only need to believe the output is accurate.
Similarly I can ask a LLM to code up a function that does a specific thing, and it will do it with high accuracy. Maybe there'll be some bugs, but I can review the code and fix them, which in some cases boosts my productivity. I don't need to believe whether it "really knows" C++ or Rust, I only need it to write something good enough.
I mean, just by these two examples, LLMs are really great tools, and I'm personally hyped for these use cases alone. Am I fooled by the LLM? I don't think so, I don't have any fantasy about it being extremely intelligent or always being right. I doubt most reasonable people these days would either.
So basically you're going about assuming people are fooled by LLMs (which they might not be), and wondering why you're unpopular when you're basically telling everyone they're gullible and foolish.
no assumption. people believe that these things think. I encourage you to hang out in non-technical places and listen to people talking about what LLMs do. "Mine has a personality!" People who do not understand the limitations of LLMs believe these things are alive.
I see statements like this a lot, and I find them unpersuasive because any meaningful definition of "intelligence" is not offered. What, exactly, is the property that humans (allegedly) have and LLMs (allegedly) lack, that allows one to be deemed "intelligent" and the other not?
I see two possibilities:
1. We define "intelligence" as definitionally unique to humans. For example, maybe intelligence depends on the existence of a human soul, or specific to the physical structure of the human brain. In this case, a machine (perhaps an LLM) could achieve "quacks like a duck" behavioral equality to a human mind, and yet would still be excluded from the definition of "intelligent." This definition is therefore not useful if we're interested in the ability of the machine, which it seems to me we are. LLMs are often dismissed as not "intelligent" because they work by inferring output based on learned input, but that alone cannot be a distinguishing characteristic, because that's how humans work as well.
2. We define "intelligence" in a results-oriented way. This means there must be some specific test or behavioral standard that a machine must meet in order to become intelligent. This has been the default definition for a long time, but the goal posts have shifted. Nevertheless, if you're going to disparage LLMs by calling them unintelligent, you should be able to cite a specific results-oriented failure that distinguishes them from "intelligent" humans. Note that this argument cannot refer to the LLMs' implementation or learning model.
the ability for long-term planning and, more cogently, actually living in the real world where time passes
1. LLMs seem to be able to plan just fine.
2. LLMs clearly cannot be "actually living" but I fail to see how that's related to intelligence per se.
Depends on the content of the list.
A list of the names of the seven dwarfs: no, not an act of planning.
A list of steps necessary to build a web site: yes, an act of planning.
my genuine question is how does a LLM handle that situation? and as you point out, it’s an absurd comparison
If you want to know how an LLM would handle that situation, why don't you ask it?
a LLM cannot actually be intelligent if it cannot operate in a temporal context ;)
> a LLM cannot actually be intelligent if it cannot operate in a temporal context ;)
When I have a conversation with an LLM, that conversation happens in time. It has a beginning, a middle, and an end. The conversation can refer to earlier parts of the conversation. How is that not a "temporal context"?
Furthermore, can you explain why a temporal context is necessary for intelligence? For example, if a human being could download their entire brain into a computer and exist there, as if they were an LLM, would they cease to be intelligent, in your view?
This is not what I mean for a few reasons:
1. This context literally has limits; we'll get back to the grocery store.
2. This is a point-in-time conversation.
On the latter point: you can have the same conversation tomorrow. The LLM has not "learned" anything; it has not adapted in any way. Yes, you are experiencing time, and the conversation is happening over time, but the LLM is not experiencing nor aware of time and is not intelligently adapting to it. Yes, they get trained and "updated" in that way, but it's not the same thing.
If you don't respond for an hour, then do, the LLM is not aware of that unless its system injects a "datetime.now()" somewhere in the prompt. Point of this being: an LLM is not an adaptable system. Now you can play the "What if?" game ad infinitum -- make it aware of the current time, current location, etc. etc.
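For what it's worth, that injection is trivial on the wrapper's side; a hypothetical sketch (the prompt wording and function name are made up):

```python
from datetime import datetime, timezone

def build_system_prompt(base_prompt: str) -> str:
    # The model itself has no clock; the surrounding system has to supply one.
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return f"{base_prompt}\n\nCurrent time (UTC): {now}"

print(build_system_prompt("You are a helpful assistant."))
```

But that only papers over the point: the awareness of time lives in the harness, not in the model.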
Hence my grocery store example. If I go out into the real world, I experience real things, and I make intelligent decisions based off those experiences. An LLM cannot do that, just full stop. And again, you can go "well what if I put the LLM in a robot body, and give it a system, then it can go grocery shopping". And only at this point are we kinda-sorta close to having a discussion about intelligence. If this mythical creature can go to the grocery store, notice it's not there, look up what happened, maybe ask some friends who live in the same city if they know, maybe make some connection months later to some news article... an LLM, or a system we build on an LLM, cannot do this. It cannot go into the store and think "ya know, if I buy all this ice cream and eat it, that could be bad" and connect it to the million other things a real person is doing and considering in their day-to-day life.
The actual world is practically infinitely complex. Saying "an LLM writing a list is planning, and that shows intelligence" is a frightening attenuation of what intelligence in the real world means, and anthropomorphization to a very high degree. Reframing as "intelligence needs to be able to adapt to the world around it over time" is a much better starting point IMO.
They do learn, OpenAI has a memory feature. I just opened up a chat, asked "What do you know about me?" and got a long list of things specific to me that it certainly did not infer from the chat so far. It's a bit unsettling really, someone at OpenAI would probably have little difficulty matching my OpenAI account to my HN one, it looks like they have quite a few bits of information to work with. Privacy is a hard thing to maintain.
I really don't see the "LLMs don't learn" position as a defensible one long term, given the appalling limitations of human memory and the strengths computers have at it. Given the improvements in RAG and large context windows, it actually seems pretty likely that LLMs will be quite a lot better than humans when it comes to memory; they have SSDs. We just haven't built LLMs with memory right yet, for whatever reason.
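As a toy sketch of what such a memory layer looks like (keyword overlap stands in for the embeddings and vector store a real RAG setup would use; all the names here are made up):

```python
# Toy external "memory" for an LLM: store notes, retrieve the most relevant
# ones, and prepend them to the next prompt. Real systems use embeddings and
# a vector store; keyword overlap keeps this example self-contained.
memory = []

def _tokens(text):
    return set("".join(c if c.isalnum() else " " for c in text.lower()).split())

def remember(note):
    memory.append(note)

def recall(query, k=3):
    q = _tokens(query)
    return sorted(memory, key=lambda m: len(q & _tokens(m)), reverse=True)[:k]

def build_prompt(user_message):
    notes = "\n".join(f"- {n}" for n in recall(user_message))
    return f"Known facts about the user:\n{notes}\n\nUser: {user_message}"

remember("User's name is Alex and they are learning Rust.")
remember("User prefers concise answers.")
print(build_prompt("Any tips for my Rust project?"))
```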
adding text to storage and searching over it is not memory. "knowing" those things about you is not learning. and guess what, context still fills up. try putting that LLM again in the real world, facing real human challenges, with all the real sensory input around you. it's nonsensical
and it’s not about “limits” of humans. machines can do math and many things better, that’s been apparent for decades. yes, they can “remember” 8k video streams much better than us. that’s not “memory” in the human sense and machines don’t “learn” from it in the human sense
(your IP address is much easier to link your accounts than your text)
Why not? If humans store data in their brains, isn't that learning? Of course data can include skills and connections, for both humans and AIs.
> and guess what, context still fills up.
Human memory also has limits.
Sorry, but I really don't see how the distinctions you're trying to make even exist, much less qualify as markers of intelligence.
No. We’re back to my earlier point of you and I have fundamentally different understanding of cognition, intelligence, and learning. And genuinely not trying to be condescending, but I suspect you don’t have a good grounding in the technology we’re discussing
Yeah, that definitely came off as condescending. Especially on HN, where pretty much everyone here has a grounding in the technology we're discussing. In any case, your arguments have not dealt with technology at all, but on hand-wavy distinctions like "temporality."
Anyway, to the larger point: I agree that "you and I have fundamentally different understanding of cognition, intelligence, and learning" but your inability to explain your own understanding of these terms and why they are relevant is why your arguments are unpersuasive.
First, stay calm—it's unusual for a store to vanish overnight, but it could be due to closure, renovation, relocation, or even a mix-up with the address. Here's a step-by-step plan to handle it:
### Immediate Steps
- **Double-check the location**: Use your phone's maps app (like Google Maps or Apple Maps) to search for the store's name or address. It might show if it's temporarily closed, has moved nearby, or if there's an error in navigation.
- **Look for signs or info on-site**: If you're already there, check for any posted notices, construction signs, or talk to nearby people/businesses to ask what happened.
### Find Alternatives
- **Locate nearby stores**: Search for other grocery stores in the area. Apps like Yelp, Google Maps, or store-specific apps (e.g., Walmart, Kroger) can show options with ratings, hours, and directions.
- **Use delivery or online shopping**: If you're in a pinch, order groceries via services like Instacart, Amazon Fresh, DoorDash, or the apps of chains like Whole Foods or Target. Upload your list and get it delivered.
- **Adapt your list if needed**: If certain items are unique to that store, prioritize essentials and substitute where possible (e.g., if it's a specialty ethnic store, find similar options elsewhere).
### Prevent Future Issues
- **Diversify your options**: Add a couple of backup stores to your routine.
- **Stay updated**: Follow local news or the store's social media/website for announcements about changes.
If you provide more details like the store's name, location, or what might have happened (e.g., was it demolished?), I can help search for specifics or suggest exact alternatives.

It still matters that there's nobody in there. You're figuring out better ways to tap into the history of language-users having represented planning in language. As such, this seems a brittle way to represent 'planning'.
Isn't "echoing previous examples of X" the meaning of learning? If I learn, for example, calculus, or C++, or oil painting, aren't I just echoing pieces of math, code, or art that I've experienced elsewhere, but arranged into new patterns?
Can't we say that the LLM "learned" planning, just as we can say that a person learned planning? I fail to see how an "echo of examples of plans" in an LLM is any less authentic than a human being making similar echoes in their own plans.
I've found the 'Triarchic theory of intelligence' [0] to be helpful here. It's a much more rounded approach to human intelligence that, imo, is closer to what people are trying to get at during these types of 'AI vs human intelligence' discussions. The full scope of human intelligence is not something that can simply be chalked up to some set of measurements -- nor is it about a soul or whatever. If you want to say LLMs are good at a particular set of test measurements, then okay, say that. But that's not equivalent to human intelligence; it only represents a tiny measurable subset that happens to be a part of the whole.
[0] https://en.wikipedia.org/wiki/Triarchic_theory_of_intelligen...
sure, but it feels like this is just looking at what distinguishes humans from LLMs and calling that “intelligence.” I highlight this difference too when I talk about LLMs, but I don’t feel the need to follow up with “and that’s why they’re not really intelligent.”
But the problem of anthropomorphizing is real. LLMs are deeply weird machines - they’ve been fine-tuned to sound friendly and human, but behind that is something deeply alien: a huge pile of linear algebra that does not work at all like a human mind (notably, they can’t really learn from experience at all after training is complete). They don’t have bodies or even a single physical place where their mind lives (each message in a conversation might be generated on a different GPU in a different datacenter). They can fail in weird and novel ways. It’s clear that anthropomorphism here is a bad idea. Although that’s not a particularly novel point.
But we're not there, at least in my mind. I feel no guilt or hesitation about ending one conversation and starting a new one with a slightly different prompt because I didn't like the way the first one went.
Different people probably have different thresholds for this, or might otherwise find that LLMs in the current generation have enough of a context window that they have developed a "lived experience" and that ending that conversation means that something precious and unique has been lost.
Actually, that's another level of humans-being-tricked going on: The "personality" most people are thinking of is a fictional character we humans perceive in a document.
Controversies about the word niggardly ~ https://en.m.wikipedia.org/wiki/Controversies_about_the_word...
This is a spectacularly, objectively invalid take.
Now, within some contexts it is best to stick to standard precise definitions for some words. Still, the meaning of a word within a community is determined by how it is used and understood within that community, not by what is in a dictionary.
You could also point to how our brains process vast amounts of information unconsciously as a backdrop to the conscious part of us being alive at all, and how they pull all of this, plus awareness, off on the same energy that powers a low-energy light bulb, but that's expanding beyond the basic and obvious difference stated above.
The Turing test has been broken by LLMs, but this only shows that it was never a good test of sentient artificial intelligence to begin with. I do incidentally wish Turing himself could have stuck around to see these things at work, and ask him what he thinks of his test and them.
Furthermore, our history is full of instances where we invented something that a) we didn't understand and b) had uses way beyond what we intended. As a recent example, GLP-1 was intended for glucose regulation but is now basically a wonder drug that can treat a bunch of issues ranging from obesity to addiction, inflammation, dementia, fertility and more.
Given we don't know how to define intelligence, I think it's fine to apply some "duck typing" here.
If your argument is that only things made in the image of humans can be intelligent (i.e. #1), then it just seems like it's too narrow a definition to be useful.
If there's a larger sense in which some system can be intelligent (i.e. #2), then by necessity this can't rely on the "implementation or learning model".
What is the third alternative that you're proposing? That the intent of the designer must be that they wanted to make something intelligent?
Thus we have to abandon any sort of metric for intelligence, call it a tautology, and rely on something that we can define to be the litmus for whatever property we are looking for. I think 'agency' should be under consideration for this, since it is actually somewhat definable and testable.
If we measure intelligence as results-oriented, then my calculator is intelligent because it can do math better than me; but that’s what it’s programmed/wired to do. A text predictor is intelligent at predicting text, but it doesn’t mean it’s general intelligence. It lacks any real comprehension of the model or the world around it. It just knows words.
It’s cool technology, but the burden of proof of real intelligence shouldn’t be “can it answer questions it has great swaths of information on”, because that is what it was designed to do.
It should be focused on whether it can truly synthesize information and know its limitations -- as any programmer using Claude, Copilot, Gemini, etc. will tell you, it fabricates false information/APIs/etc. on a regular basis and has no fundamental awareness that it even did that.
Or alternatively, ask these models leading questions that have no basis in reality -- and watch what they come up with. It’s become a fun meme in some circles to ask models for definitions of nonsensical made-up phrases and see what crap they come up with (again, without even knowing that they’re making it up).
> but the goal posts have shifted
Is this shocking? We don't have a rigorous definition of intelligence, so doesn't it make sense? The question isn't so much that the goal post is moving as how it is moving. It is perfectly acceptable for the definition to be refined, while it wouldn't be to rewrite it in a way that isn't similar to the previous one.

So I think there are a lot more than your two possibilities. I mean, psychologists and neuroscientists have been saying for decades that tests aren't a precise way to measure knowledge or intelligence, but that they are still a useful proxy.
> "quacks like a duck" behavioral
I see this phrase used weirdly frequently. The duck test is: "If it looks like a duck, swims like a duck, and quacks like a duck, then it ***probably*** is a duck."
I emphasize probably because the duck test doesn't allow you to distinguish a duck from a highly sophisticated animatronic. It's a good test, don't get me wrong, but that "probably" is a pretty important distinction.

I think if we all want to be honest, the reality is "we don't know". There are arguments to be made in both directions, with varying definitions of intelligence and different nuances involved. I think these arguments are fine, as they make us refine our definitions, but they can also turn entirely dismissive, and that doesn't help us refine and get closer to the truth. We are all going to have opinions on this stuff, but frankly, the confidence of our opinions needs to be proportional to the amount of time and effort spent studying the topic. The lack of a formal definition means nuances dominate the topic. Even if things are simple once you understand them, that doesn't mean they aren't wildly complex before that. I mean, I used to think calculus was confusing and now I don't. Same process, but not on an individual scale.
Why is it an important distinction? The relevance of the duck test is that if you can't tell a duck from a non-duck, then the non-duck is sufficiently duck-like for the difference to not matter.
I'm assuming you do, since apples aren't oranges and ducks aren't robots no matter what conspiracy theorists joke or say. You can't eat an animatronic duck. It doesn't lay eggs, no matter how much another duck tries to mate with it.
> then the non-duck is sufficiently duck-like for the difference to not matter.
Here's where things fall apart. This is not actually true. It is about your measurements, not about the actual duck-iness of the thing we're applying the duck test to. I know this sounds funny, but let's just say we're using only sight and sound. Does my duck get destroyed when flying through a magnetic field? A biological duck won't, but an animatronic duck will. Now let's say you're blind (you can't see magnetic fields): can something pass the duck test for you but obviously not be a duck for someone who can see? This is obviously true[0]. I'm sure we can find a bird that you'd think is a duck but isn't.

So it matters: do you care if something is a duck or not? And in what way? There's no single duck measurement; there's a whole suite of tests that need to be combined carefully and thoughtfully depending on how you answer the previous questions.
Obviously this is extra important when we aren't exactly sure what a duck is...
(Also, see Turing Test)
[0] https://images.squarespace-cdn.com/content/547e053ae4b04768d...
But the point of the article is a distinct claim: personification of a model, expecting human or even human-like responses, is a bad idea. These models cannot be held independently responsible for their answers, because they are tools. They should be used as tools until they are powerful enough to be responsible for their actions and interactions legally.
But we're not there. These are tools. With tool limitations.
That ain't shortening because none of that was in his post.
That reflection seems totally off to me: fluent, and flavored with elements of the article, but also not really what the article is about and a pretty weird/tortured use of the elements of the allegory of the cave, like it doesn't seem anything like Plato's Cave to me. Ironically demonstrates the actual main gist of the article if you ask me.
But maybe you meant that you think that summary is good and not textually similar to that post so demonstrating something more sophisticated than "shortening".
A New Yorker book review often does the opposite of mere shortening. The reviewer:
* Places the book in a broader cultural, historical, or intellectual context.
* Brings in other works—sometimes reviewing two or three books together.
* Builds a thesis that connects them, so the review becomes a commentary on a whole idea-space, not just the book’s pages.
This is exactly the kind of externalized, integrative thinking Jenson says LLMs lack. The New Yorker style uses the book as a jumping-off point for an argument; an LLM “shortening” is more like reading only the blurbs and rephrasing them. In Jenson’s framing, a human summary—like a rich, multi-book New Yorker review—operates on multiple layers: it compresses, but also expands meaning by bringing in outside information and weaving a narrative. The LLM’s output is more like a stripped-down plot synopsis—it can sound polished, but it isn’t about anything beyond what’s already in the text.
I don't think the Plato's Cave analogy is confusing, I think it's completely wrong. It's "not in the article" in the sense that it is literally not conceptually what the article is about and it's also not really what Plato's Cave is about either, just taking superficial bits of it and slotting things into it, making it doubly wrong.
You can see where the LLM has gone wrong. It’s hooked in to “summary” and therefore given excessive emphasis to this part of the article. The plato’s cave analogy is stupid; and what on earth is it going on about with ghosts?
It’s not shortening, sure, it’s dribbling nonsense.
Humans seem to get wrapped around these concepts like intelligence, consciousness, etc. because they seem to be the only thing differentiating us from every other animal, when in fact it's all a mirage.
LLM intelligence is in the spot where it is simultaneously genius-level but also just misses the mark a tiny bit, which really sticks out for those who have been around humans their whole lives.
I feel that, just like more modern CGI, this will slowly fade with certain techniques and you just won't notice it when talking to or interacting with AI.
Just like in his post during the whole Matrix discussion.
> "When I asked for examples, it suggested the Matrix and even gave me the “Summary” and “Shortening” text, which I then used here word for word. "
He switches in AI-written text and I bet you were reading along just the same until he pointed it out.
This is our future now I guess.
As opposed to what grandmasters actually did, which was less look ahead and more pattern matching to strengthen the position.
Now LLMs successfully leverage pattern matching, but interestingly it is still a kind of brute force pattern matching, requiring the statistical absorption of all available texts, far more than a human absorbs in a lifetime.
This enables the LLM to interpolate an answer from the structure of the absorbed texts with reasonable statistical relevance. This is still not quite “what humans do” as it still requires brute force statistical analysis of vast amounts of text to achieve pretty good results. For example training on all available Python sources in github and elsewhere (curated to avoid bad examples) yields pretty good results, not how a human would do it, but statistically likely to be pertinent and correct.
The common LLMs are not trying to be a better human than a single human. They are trying to be more useful per $ of cost than a bevy of humans doing some basket of human tasks.
Making a replacement for an individual complete human, an effort that was popular before the Internet made that idea seem quaint, is a very different challenge, and more niche, because it's not as economically efficient.
My impression is, there is currently one tendency to "over-anthropomorphize" LLMs and treat them like conscious or even superhuman entities (encouraged by AI tech leaders and AGI/Singularity folks) and another to oversimplify them and view them as literal Markov chains that just got lots of training data.
Maybe those articles could help guard against both extremes.
[1] https://www.verysane.ai/p/do-we-understand-how-neural-networ...
I’ll be the first to say one should pick their battles. But hearing that over and over from a crowd like this that can be quite pedantic is very telling.
If you keep redefining real intelligence as the set of things machines can’t do, then it’s always going to be true.
Language is really powerful, I think it's a huge part of our intelligence.
The interesting part of the article to me is the focus on fluency. I have not seen anything that LLMs do well that isn't related to powerful utilization of fluency.
This is right out of Community
I heard from Miguel Nicolelis that language is a filter for the human mind, so you can never build a mind from language. I interpreted this like trying to build an orange from its juice.
On the contrary, all their training data is their "experience".
The "summary vs shortening" distinction is moving the goalposts. They makes the empirical claim that LLMs fail at summarizing novel PDFs without any actual evidence. For a model trained on a huge chunk of the internet, the line between "reworking existing text" and "drawing on external context" is so blurry it's practically meaningless.
Similarly, can we please retire the ELIZA and Deep Blue analogies? Comparing a modern transformer to a 1960s if-then script or a brute-force chess engine is a category error. It's a rhetorical trick to make LLMs seem less novel than they actually are.
And blaming everything on anthropomorphism is an easy out. It lets you dismiss the model's genuinely surprising capabilities by framing it as a simple flaw in human psychology. The interesting question isn't that we anthropomorphize, but why this specific technology is so effective at triggering that response from humans.
The whole piece basically boils down to: "If we define intelligence in a way that is exclusively social and human, then this non-social, non-human thing isn't intelligent." It's a circular argument.
- Even if CEOs are completely out of touch and the tool can't do the job, you can still get laid off in an ill-informed attempt to replace you. Then, when the company doesn't fall over because the leftover people, desperate to keep covering rent, fill the gaps, it just looks like efficiency from the top.
- I don't think our tendency to anthropomorphize LLMs is really the problem here.
They aren’t just intelligence mimics, they are people mimics, and they’re getting better at it with every generation.
Whether they are intelligent or not, whether they are people or not, it ultimately does not matter when it comes to what they can actually do, what they can actually automate. If they mimic a particular scenario or human task well enough that the job gets done, they can replace intelligence even if they are “not intelligent”.
If by now someone still isn’t convinced that LLMs can indeed automate some of those intelligence tasks, then I would argue they are not open to being convinced.
Asking an LLM to take all this knowledge and apply it to a new domain? That will take a whole new paradigm.
If/when LLMs or other AIs can create novel work / discover new knowledge, they will be "genius" in the literal sense of the word.
More genius would be great! (probably) . But genius is not required for the vast majority of tasks.
I mean, don't most people break down in this case too? I think this needs to be more precise. What is the specific task that you think can reliably distinguish between an LLM's capability in this sense vs. what a human can typically manage?
That is, in the sense of [1], what is the result that we're looking to use to differentiate.
How is that a summary? It reads as a one-liner review I would leave on Letterboxd, or something I would say trying to be pretentious and treating the movie as a work of art. It is a work of art, because all movies are art, but that's an awful summary.
A philosophical exploration of free will and reality disguised as a sci-fi action film about breaking free from systems of control.
-----
Do I need to explain more?
>The original Turing Test was designed to compare two participants chatting through a text-only interface: one AI and one human. The goal was to spot the imposter. Today, the test is simplified from three participants to just two: a human and an LLM.
By the original meaning of the test it's easy to tell an LLM from a human.
In fact, the example provided by the author is a great illustration of this:
> A philosophical exploration of free will and reality disguised as a sci-fi action film about breaking free from systems of control.
The words here refer back to the notions "free will" that is prominent in Western discourse from St. Augustine through Descartes and thereafter and similarly of "sci-fi". These are notions an uneducated East-Asian with limited Internet use and pop culture fluency will simply not understand. They would in fact prefer the latter description. The author and this hypothetical East-Asian live in very different zeitgeists, and correspondingly experience the movie differently and value different summaries of the film. They each prefer a summary that agrees with the zeitgeist, rather than a novel, unorthodox summary (relative to their zeitgeist) that requires a justification as to its deviation.
On the other hand, if you ask LLMs to explain material and concepts, one mode in which they do so is using formulaic, novel, and unorthodox analogies. By formulaic and novel, I mean that the objects and scenarios in the analogies are frequently of a certain signature kind that they have been trained on, but they are novel in that the analogies are not found in the wild on the internet.
If you have frequently used LLMs for the purpose of explaining concepts, you will have encountered these analogies and know what I mean by this. The analogies are frequently not too accurate, but they round out the response by giving an ELI5-style answer.
Ironically, the author may have succumbed to LLM sycophancy.
Unless you believe in the magic sauce of a supernatural soul/mind/etc, our brains function as deterministic biological machines. There's essentially nothing that separates the processing and memory potential of silicon from neurons. And if the building blocks can be made analogous, then all the same emergent properties are possible. There's no reason to believe a circuit couldn't be made to behave in exactly the same way as a human brain. I'm not saying that's where LLMs are; only that it is theoretically possible. So if you imagine such a machine, and you deem that it is not intelligent, you have reserved intelligence as an exclusive human trait and this entire discussion is meaningless.
Secondly, although I'm not in either pro or anti LLM-intelligence camp, I find a lot of the arguments against machine intelligence disingenuous and/or unbalanced.
For instance the "Can't process information it's not familiar with" argument. Another commenter stated the case of scientific papers that it doesn't have any reference for, that it may hallucinate a garbage interpretation of the paper. Not surprising, but guess what, a human would do the same thing if they were forced! Imagine holding a gun to someone's head and telling them to explain a concept or system they've never heard of. That's essentially what we're doing with LLMs; obviously we don't need to threaten, because we haven't given them agency to say no.
Another example is the "Can't be novel, unique, or create something completely new." First of all, difficult to prove, but okay let's take it as given that an LLM can't be novel. Can you prove that a human can? We make all these assumptions on how intelligent and creative we are as humans, and how original our thoughts can be... but how original are they, and can we prove it? How do you know your original thought, or Beethoven's 5th, or the fast inverse square root trick was completely separate from any prior influences? Or... was that "original thought" the conglomeration of a thousand smaller inputs and data points that you trained on, that became part of your brains subconscious processing system, and came together in a synthesis that looks like brilliance.
Finally, whenever this discussion comes up with friends, I ask them to think of the least intelligent person they know. Then imagine how many of those there are in the world (likely millions). Could you imagine any conceivable test of any length or depth which would designate all of those humans as intelligent, and all the LLMs as not? I certainly can't. I highly doubt there could be anything approaching 100% accuracy at this point.
Ultimately I think we should ditch both the intelligence and consciousness questions. We can't define them in ourselves, we certainly can't define them in another entity. Let's just come to terms with the fact we're highly functioning biological machines who are both scared and excited to have created something so similar to us.
Sure, AI often makes mistakes, but so do humans. Ask a random person a random question and I guarantee that at least 90% of the times an LLM will produce a better answer. So if knowledge or reasoning mistakes are a benchmark, then humans are not intelligent either.
What is the difference then? I feel that whoever claims that AI is not intelligent simply feels uncomfortable thinking that they are, and is going out of their way to rationalize that, by using various double standards. Like, AI making occasional mistakes counts as evidence against it, but the fact that the vast majority of people are incapable of answering some of the most basic questions somehow doesn't count against human intelligence.
And then there are various contrived "definitions of intelligence", such as "living in the physical world". These have nothing to do with cognitive intelligence and are just made up to make humans fit the definition and AI to not fit. Doing "truly novel things"? How many people have ever come up with anything truly novel? What about the other 99.99%, are they intelligent?
I think there's one thing that's tripping people up. Humans do have a very high COLLECTIVE intelligence. Collectively (and given enough time) we're capable of accomplishing great things, much more than current AI. But each individual human on their own is no more intelligent than a caveman (we have essentially the same brain).
sobiolite•5mo ago
"A true summary, the kind a human makes, requires outside context and reference points. Shortening just reworks the information already in the text."
Then later says...
"LLMs operate in a similar way, trading what we would call intelligence for a vast memory of nearly everything humans have ever written. It’s nearly impossible to grasp how much context this gives them to play with"
So, they can't summarize, because they lack context... but they also have an almost ungraspably large amount of context?
kayodelycaon•5mo ago
How unique the text is determines how accurate the summarization is likely to be.
timmg•5mo ago
It did a really good -- surprisingly good -- job. That incident has been a reference point for me. Even if it is anecdotal.
BobaFloutist•5mo ago
It feels like they've mastered language, but it's looking more and more like they've actually mastered canon. Which is still impressive, but very different.
throwway120385•5mo ago
We are warned in statistics to be careful when extrapolating from a regression analysis.
hannofcart•5mo ago
https://en.m.wikipedia.org/wiki/Hero%27s_journey
originalcopy•5mo ago
I think LLMs do great summaries. I am not able to come up with anything where I could criticize it and say "any human would come up with a better summary". Are my tasks not "truly novel"? Well, then I am not able, as a human, to come up with anything novel either.
tovej•5mo ago
In other words the LLM does not contain the knowledge of what the words represent.
neerajsi•5mo ago
This is probably true for some words and concepts but not others. I think we find that LLMs make inhuman mistakes only because they don't have the embodied senses and inductive biases that are at the root of human language formation.
If this hypothesis is correct, it suggests that we might be able to train a more complete machine intelligence by having models participate in a physics simulation as one part of the training, i.e. have a multimodal AI play some kind of block-world game. I bet if the AI is endowed with just sight and sound, it might be enough to capture many relevant relationships.
btown•5mo ago
Imagine an oracle that could judge/decide, with human levels of intelligence, how relevant a given memory or piece of information is to any given situation, and that could verbosely describe which way it's relevant (spatially, conditionally, etc.).
Would such an oracle, sufficiently parallelized, be sufficient for AGI? If it could, then we could genuinely describe its output as "context," and phrase our problem as "there is still a gap in needed context, despite how much context there already is."
And an LLM that simply "shortens" that context could reach a level of AGI, because the context preparation is doing the heavy lifting.
The point I think the article is trying to make is that LLMs cannot add any information beyond the context they are given - they can only "shorten" that context.
If the lived experience necessary for human-level judgment could be encoded into that context, though... that would be an entirely different ball game.
entropicdrifter•5mo ago
IMO we already have the technology for sufficient parallelization of smaller models with specific bits of context. The real issue is that models have weak/inconsistent/myopic judgement abilities, even with reasoning loops.
For instance, if I ask Cursor to fix the code for a broken test and the fix is non-trivial, it will often diagnose the problem incorrectly almost instantly, hyper-focus on what it imagines the problem is without further confirmation, implement a "fix", get a different error message while breaking more tests than it "fixed" (if it changed the result for any tests), and then declare the problem solved simply because it moved the goalposts at the start by misdiagnosing the issue.
BobaFloutist•5mo ago
It's like the difference between teaching kids rate problems and then putting ones with negative values or nested rates on a test versus giving them a continuous compound interest problem and expecting them to derive e, because it is fundamentally about rates of change, isn't it?
jchw•5mo ago
My primary assertion is that LLMs struggle to generalize concepts and ideas, hence why they need petabytes of text just to often fail basic riddles when you muck with the parameters a little bit. People get stuck on this for two reasons: one, because they have to reconcile this with what they can see LLMs are capable of, and it's just difficult to believe that all of this can be accomplished without at least intelligence as we know it; I reckon the trick here is that we simply can't even conceive of how utterly massive the training datasets for these models are. We can look at the numbers but there's no way to fully grasp just how vast it truly is. The second thing is definitely the tendency to anthropomorphize. At first I definitely felt like OpenAI was just using this as an excuse to hype their models and come up with reasons for why they can never release weights anymore; convenient. But also, you can see even engineers who genuinely understand how LLMs work coming to the conclusion that they've become sentient, even though the models they felt were sentient now feel downright stupid compared to the current state-of-the-art.
Even less sophisticated pattern matching than what humans are able to do is still very powerful, but it's obvious to me that humans are able to generalize better.
strangattractor•5mo ago
My own impression is something more akin to a natural language search query system. If I want a snippet of code to do X it does that pretty well and keeps me from having to search through poor documentation of many OSS projects. Certainly doesn't produce anything I could not do myself - so far.
Ask it about something that is currently unknown and it lists a bunch of hypotheses that people have already proposed.
Ask it to write a story and you get a story similar to one you already know but with your details inserted.
I can see how this may appear to be intelligent but likely isn't.
jchw•5mo ago
But seriously, how would you determine if an LLM's output was novel? The training data set is so enormous for any given LLM that it would be hard to know for sure that any given output isn't just a trivial mix of existing data.
Earw0rm•5mo ago
I've no priors as to how original you are, nor how humble, so I mean this in a general, rather than personal, sense.
jjaksic•5mo ago
Remember, humans needed some 100,000 years to figure out that you can hit an animal with a rock, and that's using more or less the same brain capacity we have today. If we were born in the stone age, we'd all be nothing but cavemen.
What genuinely novel thing have you figured out?
jchw•5mo ago
Really? How do I know that with such great certainty?
Well, I don't know how much text I've read in one lifetime, but I can tell you it's less than the literally multiple terabytes of text fed into the training process of modern LLMs.
Yet, LLMs can still be found failing logic puzzles and simple riddles that even children can figure out, just by tweaking some of the parameters slightly, and it seems like the best thing we can do here is just throw more terabytes of data and more reinforcement learning at it, only for it to still fail, even if a little more sparingly each time.
So what novel things do average people do anyways, since beating animals with rocks apparently took 100,000 years to figure out? Hard call. There's no definitive bar for novel. You could argue almost everything we do is basically just mixing things we've seen together before, yet I'd argue humans are much better at it than LLMs, which need a metric shit load of training data and burn tons of watts. In return, you get some superhuman abilities, but superhuman doesn't mean smarter or better than people; a sufficiently powerful calculator is superhuman. The breadth of an LLM is much wider than any individual human, but the breadth of knowledge across humanity is obviously still much wider than any individual LLM, and there remain things people do well that LLMs definitely still don't, even just in the realm of text.
So if I don't really believe humans are all that novel, why judge LLMs based on that criteria? Really two reasons:
- I think LLMs are significantly worse at it, so allowing your critical thinking abilities to atrophy in favor of using LLMs is really bad. Therefore people need to be very careful about ascribing too much to LLMs.
- Because I think many people want to use LLMs to do truly novel things. Don't get me wrong, a lot of people also just want it to shit out another React Tailwind frontend for a Node.js JSON HTTP CRUD app or something. But, a lot of AI skeptics are no longer the types of people that downplay it as a cope or out of fear, but actually are people who were at least somewhat excited by the capabilities of AI then let down when they tried to color outside the lines and it failed tremendously.
Likewise, imagine trying to figure out how novel an AI response is; the training data set is so massive, that humans can hardly comprehend the scale. Our intuition about what couldn't possibly be in the training data is completely broken. We can only ever easily prove that a given response isn't novel, not that it is.
But honestly maybe it's just too unconvincing to just say all of this in the abstract. Maybe it would better to at least try to come up with some demonstration of something I think I've come up with that is "novel".
There's this sort-of trick I came up with when implementing falling blocks puzzle games for handling input that I think is pretty unique. See, in most implementations, to handle things like auto-repeating movements, you might do something like have a counter that increments, then once it hits the repeat delay, it gets reset again. Maybe you could get slightly more clever by having it count down and repeat at zero: this would make it easier to, for example, have the repeat delay be longer for only the first repeat. This is how DAS normally works in Tetris and other games, and it more or less mirrors the key repeat delay. It's easier with the count down since on the first input you can set it to the high initial delay, then whenever it hits zero you can set it to the repeat delay.
I didn't like this though because I didn't like having to deal with a bunch of state. I really wanted the state to be as simple as possible. So instead, for each game input, I allocate a signed integer. These integers are all initialized to zero. When a key is pressed down, the integer is set to 1 if it is less than 1. When a key is released, it is set to -1 if it is greater than 0. And on each frame of game logic, at the end of the frame, each input greater than 0 is incremented, and each input less than 0 is decremented. This is held in the game state and when the game logic is paused, you do nothing here.
With this scheme, the following side effects occur:
- Like most other schemes, there's no need to special-case key repeat events, as receiving a second key down doesn't do anything.
- Game logic can now do a bunch of logic "statelessly", since the input state encodes a lot of useful information. For example, you can easily trigger an event upon an input being pressed by using n == 1, and you can easily trigger an event upon an input being released using n == -1. You do something every five frames an input is held by checking n % 5 == 0, or slightly more involved for a proper input repeat with initial delay. On any given frame of game logic, you always know how long an input has been held down and after it's released you know how many frames it has been since it was pressed.
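To make that concrete, here is a minimal sketch of the scheme (Python, with made-up input names; a real game would call these from its event loop and frame loop):

```python
# One signed counter per input. During a logic frame:
#   n == 1  -> pressed this frame      n == -1 -> released this frame
#   n > 0   -> frames it has been held  n < 0  -> frames since it was released
INPUTS = ["left", "right", "rotate", "drop"]
state = {name: 0 for name in INPUTS}

def on_key_down(name):
    if state[name] < 1:            # repeated OS key-down events are ignored
        state[name] = 1

def on_key_up(name):
    if state[name] > 0:
        state[name] = -1

def end_of_frame():                # run once at the end of each logic frame;
    for name, n in state.items():  # simply skipped while the game is paused
        if n > 0:
            state[name] = n + 1
        elif n < 0:
            state[name] = n - 1

# "Stateless" queries the game logic can make at any point in a frame:
def just_pressed(name):  return state[name] == 1
def just_released(name): return state[name] == -1
def held_every(name, k): return state[name] > 0 and state[name] % k == 0
```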
Now I don't talk to tons of other game developers, but I've never seen or heard of anyone doing this, and if someone else did come up with it, then I discovered it independently. It was something I came up with when playing around with trying to make deterministic, rewindable game logic. I played around with this a lot in highschool (not that many years ago, about 15 now.)
I fully admit this is not as useful for the human race as "hitting animals with a rock", but I reckon it's the type of thing that LLMs basically only come up with if they've already been exposed to the idea. If I try to instruct LLMs to implement a system that has what I think is a novel idea, it really seems to rapidly fall apart. If it doesn't fall apart, then I honestly begin to suspect that maybe the idea is less novel than I thought... but it's a whole hell of a lot more common, so far, for it to just completely fall apart.
Still, my point was never that AI is useless, a lot of things humans do aren't very novel after all. However, I also think it is definitely not time to allow one's critical thinking skills to atrophy as today's models definitely have some very bad failure modes and some of the ways they fail are ways that we can't afford in many circumstances. Today the biggest challenge IMO is that despite all of the data the ability to generalize really feels lacking. If that problem gets conquered, I'm sure more problems will rise to the top. Unilaterally superhuman AI has a long way to go.
jjaksic•5mo ago
I guess disagreement about this question often stems from what we mean by "human", even more than what we mean by "intelligence".
There are at least 3 distinct categories of human intelligence/capability in any given domain:
1) average human (non-expert) - LLMs are already better (mainly because the average human doesn't know anything, but LLMs at least have some basic knowledge),
2) domain expert humans - LLMs are far behind, but can sometimes supplement human experts with additional breadth,
3) collective intelligence of all humans combined - LLMs are like retarded cavemen in comparison.
So when answering if AI has human-level intelligence, it really makes sense to ask what "human-level" means.
usefulcat•5mo ago
> "It’s nearly impossible to grasp how much context this gives them to play with"
Here, I think the author means something more like "all the material used to train the LLM".
> "A true summary, the kind a human makes, requires outside context and reference points."
In this case I think that "context" means something more like actual comprehension.
The author's point is that an LLM could only write something like the referenced summary by shortening other summaries present in its training set.
cainxinth•5mo ago