good (if superficial) post in general, but on this point specifically, emphatically: no, they do not -- no shade, nobody does, at least not in any meaningful sense
There is a lot left to learn about the behaviour of LLMs, higher-level conceptual models to be formed to help us predict specific outcomes and design improved systems, but this meme that "nobody knows how LLMs work" is out of control.
This is likely (certainly?) impossible. So not a useful definition.
Meanwhile, I have observed a very clear binary among people I know who use LLMs: those who treat them like a magic AI oracle, vs. those who understand the autoregressive model, the need for context engineering, the fact that outputs are somewhat random (hallucinations exist), setting the temperature correctly...
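On that last point, here is a minimal sketch of what "setting the temperature" means in practice, using the OpenAI Python SDK (the model name and prompts are placeholders, not a recommendation):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Low temperature: sampling concentrates on the highest-probability tokens,
# which is usually what you want for factual or technical queries.
factual = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize RFC 2616 in one sentence."}],
    temperature=0.0,
)

# High temperature: the sampling distribution flattens, so outputs vary more
# between runs; fine for brainstorming, risky for anything factual.
creative = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Invent a name for a sci-fi starship."}],
    temperature=1.2,
)

print(factual.choices[0].message.content)
print(creative.choices[0].message.content)
```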
"we" are not, what i quoted and replied-to did! i'm not inventing strawmen to yell at, i'm responding to claims by others!
This is really cool, I was wondering how memory had been implemented in ChatGPT. Very interesting to see the completely different approaches. It seems to me like Claude's is better suited for solving technical tasks while ChatGPT's is more suited to improving casual conversation (and, as pointed out, future ads integration).
I think it probably won't be too long before these language-based memories look antiquated. Someone is going to figure out how to store and retrieve memories in an encoded form that skips the language representation. It may actually be the final breakthrough we need for AGI.
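One way to read "skips the language representation": memories stored and matched as vectors that are never rendered back into text. A toy sketch of that shape (the encode function, class name, and example strings are all invented for illustration; a real system would use a learned encoder and feed the retrieved vectors back into the model directly):

```python
import numpy as np

def encode(experience: str) -> np.ndarray:
    # Stand-in for a learned encoder; a real system would use a trained model.
    rng = np.random.default_rng(abs(hash(experience)) % (2**32))
    v = rng.standard_normal(64)
    return v / np.linalg.norm(v)

class VectorMemory:
    """Memories kept only as vectors: storage and retrieval never
    round-trip through a language representation."""

    def __init__(self) -> None:
        self.vectors: list[np.ndarray] = []

    def store(self, experience: str) -> None:
        self.vectors.append(encode(experience))

    def recall(self, cue: str, k: int = 3) -> list[np.ndarray]:
        q = encode(cue)
        sims = [float(q @ v) for v in self.vectors]
        order = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)
        return [self.vectors[i] for i in order[:k]]

mem = VectorMemory()
mem.store("user prefers concise answers")
mem.store("user is debugging a race condition in Go")
nearest = mem.recall("what does the user like?", k=1)
```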
I disagree. As I understand them, LLMs right now don’t understand concepts. They actually don’t understand, period. They’re basically Markov chains on steroids. There is no intelligence in this, and in my opinion actual intelligence is a prerequisite for AGI.
- a map of the world, or concept space, or a codebase, etc
- causality
- "factoring" which breaks down systems or interactions into predictable parts
Language alone is too blurry to do any of these precisely.
It is not "language alone" anymore. LLMs are multimodal nowadays, and it's still just the beginning.
And keep in mind that these results are produced by a cheap, small and fast model.
And how's that not like stored information (memories) and weighted links between them and groups of them?
Markov chains can’t deduce anything logically. I can.
Do you? Or do you just have memory and are run in a short loop?
In my uninformed opinion it feels like there's probably some meaningful learned representation of at least common or basic concepts. It just seems like the easiest way for LLMs to perform as well as they do.
My interpretation of what you're saying is that since the next token is simply a function of the preceding tokens, i.e. a Markov chain on steroids, it can't come up with something novel. It's just regurgitating existing structures.
But let's take this to the extreme. Are you saying that systems that act in this kind of deterministic fashion can't be intelligent? Like if the next state of my system is simply some function of the current state, then there's no magic there, just unrolling into the future. That function may be complex but ultimately that's all it is, a "stochastic parrot"?
If so, I kind of feel like you're throwing the baby out with the bathwater. The laws of physics are deterministic (I don't want to get into a conversation about QM here; there are senses in which that's deterministic too, and regardless I would hope you wouldn't need to invoke QM to get to intelligence), but we know that there are physical systems that are intelligent.
If anything, I would say that the issue isn't that these are Markov chains on steroids, but rather that they might be Markov chains that haven't taken enough steroids. In other words, it comes down to how complex the next token generation function is. If it's too simple, then you don't have intelligence but if it's sufficiently complex then you basically get a human brain.
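To make the "next state is a function of the current state" framing concrete, here is a toy illustration (nothing like a production model; the corpus and the stand-in "LLM" function are invented for the example). The only difference between a bigram Markov chain and an autoregressive LLM is how much context the next-token function sees and how complicated that function is:

```python
import random
from collections import defaultdict

corpus = "the cat sat on the mat the cat ate the rat".split()

# Bigram Markov chain: the next token depends only on the single previous token.
bigram = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:] + corpus[:1]):  # wrap around so every token has a successor
    bigram[prev].append(nxt)

def markov_next(token: str) -> str:
    return random.choice(bigram[token])

# "Markov chain on steroids": the next token is still just a function of what
# came before, but the function sees the whole prefix. In an LLM that function
# is a large neural network; here it is a trivial stand-in that only looks at
# the last token, so this sampler is no smarter than the bigram one.
def llm_like_next(prefix: list[str]) -> str:
    return markov_next(prefix[-1])  # placeholder for: sample from p(next | entire prefix)

tokens = ["the"]
for _ in range(8):
    tokens.append(llm_like_next(tokens))
print(" ".join(tokens))
```

The disagreement above is essentially about whether making `llm_like_next` sufficiently complex is enough to cross into intelligence, or whether no function of the prefix ever can.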
Does the mechanism really disqualify it from intelligence if behaviorally, you cannot distinguish it from “real” intelligence?
I’m not saying that LLMs have certainly surpassed the “cannot distinguish from real intelligence” threshold, but saying there’s not even a little bit of intelligence in a system that can solve more complex math problems than I can seems like a stretch.
How do you define "understanding a concept" - what do you get if a system can "understand" a concept vs. not "understand" a concept?
The idea that "understanding" may be able to be modeled with general purpose transformers and the connections between words doesn't sound absolutely insane to me.
But I have no clue. I'm a passenger on this ride.
This argument is circular.
A better argument should address (given the LLM successes in many types of reasoning, passing the turing test, and thus at producing results that previously required intelligence) why human intelligence might not also just be "Markov chains on even better steroids".
https://ai.meta.com/research/publications/large-concept-mode...
Edit: They apparently just announced this as well: https://www.anthropic.com/news/memory
It will be very interesting to see which approach is deemed to "win out" in the future
There is a clear directionality for ChatGPT. At some point they will monetize by ads and affiliate links. Their memory implementation is aimed at creating a user profile.
Claude's memory implementation feels more oriented towards the long-term goal of accessing abstractions and past interactions. It's very close to how humans access memories, albeit via a search feature. Although they have not implemented it yet AFAIK, there is a clear path where they leverage their current implementation with RL post-training such that Claude "remembers" the mistakes you pointed out last time. In future iterations it could derive abstractions from a given conversation (e.g. "the user asked me to make xyz changes on this task last time, maybe the agent can proactively do it" or "this was the process the agent followed last time").
At the most basic level, ChatGPT wants to remember you as a person, while Claude cares about how your previous interactions were.
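A rough sketch of the two shapes being contrasted here; the class names and fields are my own guesses at the interfaces, not either vendor's actual implementation:

```python
from dataclasses import dataclass, field

# Profile-style memory (the ChatGPT direction described above): conversations
# get distilled into durable facts about the user, which are injected into
# every future request.
@dataclass
class UserProfile:
    facts: list[str] = field(default_factory=list)        # e.g. "works on macOS tooling"
    preferences: list[str] = field(default_factory=list)  # e.g. "prefers terse answers"

    def to_system_prompt(self) -> str:
        return "Known about the user: " + "; ".join(self.facts + self.preferences)

# Retrieval-style memory (the Claude direction described above): past
# conversations are kept as-is and searched on demand, so what is remembered
# is how previous interactions went rather than who the user is.
@dataclass
class ConversationStore:
    transcripts: list[str] = field(default_factory=list)

    def search(self, query: str, k: int = 2) -> list[str]:
        # naive keyword match as a placeholder for real retrieval
        words = query.lower().split()
        hits = [t for t in self.transcripts if any(w in t.lower() for w in words)]
        return hits[:k]
```

The ads angle follows from the same split: a `UserProfile` is already most of the way to an ad-targeting profile, while a pile of searchable transcripts is not.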
I am coming from a data privacy perspective; while I know the LLM is getting it anyway during inference, I'd prefer not to just spell it out for them. "Interests: MacOS, bondage, discipline, Baseball"
aleph_minus_one•47m ago
Rather: use your time to learn serious, deep knowledge instead of wasting your time reading (and particularly: spreading) the science-fiction stories the AI bros tell all the time. These AI bros are insanely biased, since they will likely lose a lot of money if these stories turn out to be false, or likely even if people just stop believing in these science-fiction fairy tales.
visarga•1h ago
Running LLMs is expensive and we can swap models easily. The fight for attention is on, and it acts like an evolutionary pressure on LLMs. We have already seen the sycophancy trend as a result of it.