Have they? They still seem to be a dead end toward AGI.
I think the solution lies in cracking the core algorithms nature uses to build the brain. Too bad it’s such an inscrutable hairball of analog spaghetti code.
Humans are not intrinsically machines. Through the education system and so on, humans are taught to behave somewhat like machines.
> Some people were technical, but they didn't do technical work for many months, or longer, and now are no longer technical, they fell behind, but still think they are.
Seems that he is able to garner support for his ideas and to make progress at the leading edge. Yes, the “I know better” style is a little hard to take, but then many innovations are driven by narcissism.
For how "naive" transformer LLMs seem, they sure set a high bar.
Saying "I know better" is quite easy. Backing that up is really hard.
That's kind of awkward timing to say that, as alternatives to transformers have flourished over the past few weeks (Qwen3-Next, Granite 4).
But IIRC Le Cun's criticism applies to more than just transformers and to next-token predictors as a whole.
Improvements in long context efficiency sure are nice, and I do think that trying to combine transformers with architectures that aren't cursed with O(n^2) on sequence length is a promising approach. But it's promising as an incremental improvement, not a breakthrough that completely redefines the way AIs are made, the way transformer LLMs themselves did.
Long context is a massive capability improvement.
> But it's promising as an incremental improvement, not a breakthrough that completely redefines the way AIs are made, the way transformer LLMs themselves did.
Transformers themselves were an incremental improvement over RNNs with attention, and in terms of capabilities they weren't immediately superior to their predecessors.
What changed the game was that they were vastly cheaper to train, which made it possible to train massive models on phenomenal amounts of data.
Linear attention models being much more compute-efficient than transformers on longer context may result in a similar breakthrough.
It's very hard to tell in advance what will be a marginal improvement and what will be a game changer.
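To make the efficiency point concrete, here's a minimal numpy sketch (a toy illustration, not any particular model's code; the relu+1 feature map and the dimensions are placeholder assumptions): standard softmax attention materializes an n x n score matrix, while the kernelized "linear attention" reordering never does, so its cost grows linearly with sequence length.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard scaled dot-product attention: the (n, n) score matrix makes
    # time and memory quadratic in sequence length n.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                     # (n, n)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V                                      # (n, d_v)

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1.0):
    # Kernelized variant: computing phi(K).T @ V first gives a (d, d_v)
    # summary that doesn't depend on n, so the total cost is roughly
    # O(n * d * d_v) instead of O(n^2 * d). A sketch, not a production kernel.
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.T @ V                                     # (d, d_v)
    z = Qp @ Kp.sum(axis=0, keepdims=True).T          # (n, 1) normalizer
    return (Qp @ kv) / (z + 1e-9)

n, d = 2048, 64
Q, K, V = (np.random.randn(n, d) for _ in range(3))
print(softmax_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)
```

The two outputs aren't identical (the kernel trick changes the weighting), which is exactly the trade-off: you give up exact softmax attention in exchange for linear scaling in context length.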
intalentive•4mo ago
Also I’m skeptical that self-supervised learning is sufficient for human-level learning. Some of our ability is innate. I don’t believe it’s possible for statistical methods to learn language from raw audiovisual data the way children can.
suddenlybananas•4mo ago
krallistic•4mo ago
Some people just believe there is no innate knowledge, or that we don't need it if we just scale/learn better (in the direction of the Bitter Lesson).
(ML) academia is also heavily biased against it, mainly for two reasons:
- It's harder to publish: if you learn task X with innate knowledge, it's not as general, so reviewers can claim it's just (feature) engineering, which hurts acceptance. So people always try to frame their work as generally as possible.
- Historical reasons, due to the conflict with the symbolic community (which relies heavily on innate knowledge).
yorwba•4mo ago
So the existence of a sensorimotor feedback loop for a basic behavior is innate (e.g. moving forward to seek food), but the fine-tuning for reliably executing this behavior while adapting to changing conditions (e.g. moving over difficult terrain with an injured limb after spotting a tasty plant) needs to be learned through interacting with the environment. (Stumbling around eating random stuff to find out what is edible.)
suddenlybananas•4mo ago
That's not the only way one could encode innate knowledge. Besides, we have experimentally demonstrated many times that animals have innate knowledge; the only reason we can't do this with humans is that it would be horrifically unethical.
>Stumbling around eating random stuff to find out what is edible
Plenty of animals have innate knowledge about what is and isn't edible: it's why, for example, tasty things generally smell good and why things that are bad for us (rotting meat) smell horrific.
yorwba•4mo ago
I'm saying that there are limits to how much knowledge can be inherited. I.e. the question isn't "Where could innate knowledge be encoded other than in synapses?" but "Considering the extremely large number of synapses involved in complex behavior far exceeds genetic storage capacity, how are their weights determined?" And since we know that in addition to having innate behaviors, animals are also capable of learning (e.g. responding to artificial stimuli not found in nature), it stands to reason that most synapse weights must be set by a dynamic learning process.
suddenlybananas•4mo ago
bemmu•4mo ago
Maybe sections could be read from DNA and broadcast as action potentials?
There are already ribosomes that move along RNA. You'd need a variant which, instead of stringing amino acids into proteins, would read out the bases and produce something that triggers action potentials based on the contents.
geremiiah•4mo ago
ACCount37•4mo ago
This puts a severe limit on how much "innate knowledge" a human can possibly have.
Sure, the human brain has a strong inductive bias. It also has a developmental plan, and it follows that plan. It guides its own learning, and ends up being better at self-supervised learning than even the very best of our AIs. But that guidance, that sequencing and that bias must all be created by the rules encoded in the DNA, and there's only so much data in the DNA.
It's quite possible that the human brain has a bunch of simple and clever learning tricks that, if we pried them out and applied them to our AIs, would give us 100x the learning rate and 1000x the sample efficiency. Or it could be that a single neuron in the human brain is worth 10,000 neurons in an artificial neural network, and thus the biggest part of the "secret" of the human brain is just that it's hilariously overparameterized.
intalentive•4mo ago
I think of DNA analogously to the rules of cellular automata. The entropy of the rules is much less than the entropy of the dynamical system the rules describe.
The body is filled with innate knowledge. The organs all know what to do. The immune system learns to detect intruders (without synapses). Even a single-celled organism is capable of complex and fluid goal-oriented behavior, as Michael Levin attests.
I think the assumption that all knowledge exists in the brain, and all knowledge in the brain is encoded by neuronal weights, is probably too simplistic.
Regarding language and vision, I think the cognitive scientists are right: it is better to view these as organs or “modules” suited to a function. Damage Broca’s area and you get Broca’s aphasia. Damage your lung and you get trouble breathing. Neither of these looks like the result of statistical learning from randomly initialized parameters.
ACCount37•4mo ago
The human brain has specialized regions, but there's still a lot of flexibility in it. It isn't a hard-wired, fixed-function system at all. A lung can't just start pumping blood to compensate for a heart issue, but similar things do happen to brain regions. The regions can end up repurposed, and an impressive amount of damage can be routed around.
A lot of the "brain damage" studies seem to point at a process not too dissimilar to ablation in artificial neural networks. You can null out some of the weights in a pretrained neural network, and that can fuck it up. But if you start fine-tuning the network afterwards, or train from scratch, with those weights still pinned to zero? The resulting performance can end up quite similar to a control case.
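For what that looks like in practice, here's a minimal PyTorch-style sketch of the kind of ablation experiment meant here (the toy model, random data, and 30% mask are all made up for illustration): zero out a subset of weights, then fine-tune while keeping those weights pinned to zero.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))

# "Lesion" the network: build a random mask and zero out ~30% of the weights.
masks = {}
for name, p in model.named_parameters():
    if p.dim() > 1:                                   # weight matrices only
        masks[name] = (torch.rand_like(p) > 0.3).float()
        p.data.mul_(masks[name])

# Fine-tune with the ablated weights pinned to zero by re-applying the mask
# after every optimizer step (a simple way to keep them out of the model).
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
for step in range(100):
    x = torch.randn(128, 32)
    y = torch.randint(0, 10, (128,))
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])
```

With enough fine-tuning, the surviving weights take over much of the lost capacity, which is the rough analogy to routing around damage.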
A major difference is that the human brain doesn't separate training from inference. Both are always happening, but the proportion varies. It may be nigh-impossible to fully "undo" some types of damage if it happens after a certain associated developmental window has closed, but easy enough if the damage happens beforehand.
littlestymaar•4mo ago
Citation needed.
Cerebral plasticity is a thing, but it's not magic either.
ACCount37•4mo ago
Way too much weird re-routing and re-purposing can happen in the brain for that to be the case.
Human brain implements a learning algorithm of some kind - neuroscientists disagree on the specifics, but not on the core notion. It doesn't ship with all the knowledge it needs, or anywhere close. To work well, it has to learn, and it has to learn by getting information from the environment.
littlestymaar•4mo ago
You cannot confidently disprove anything unless you can back up your statement.
> information-theoretic reasons somehow weren't enough for you.
Your “information-theoretic reasoning” is completely pointless though.
> Human brain implements a learning algorithm of some kind - neuroscientists disagree on the specifics, but not on the core notion. It doesn't ship with all the knowledge it needs, or anywhere close. To work well, it has to learn, and it has to learn by getting information from the environment
Nobody said otherwise. But that doesn't mean everything is learned either. There are many things a human is born with that it doesn't have to learn. (It's pretty obvious when you have kids: as primates, humans are naturally drawn to climbing trees, and they will naturally collect stones and sticks, which is what primitive tools are made of.)
ACCount37•4mo ago
1 gigabyte. That's the absolute limit of how much "innate knowledge" a human brain can have in it! Every single instinct, every learning algorithm, every innate behavior and every little cue a brain uses to build itself has to fit into a set of data just 1 gigabyte in size.
Clearly, nature must have found some impressively large levers to be able to build and initialize a brain with 90 billion connected neurons off something this small.
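Back-of-the-envelope for where that figure comes from (rough, commonly cited numbers; only a fraction of the genome has anything to do with the brain, so the real budget is smaller still):

```python
base_pairs = 3.1e9              # approximate size of the human genome
bits = base_pairs * 2           # 4 possible bases -> 2 bits per base pair
genome_bytes = bits / 8
print(f"whole genome: ~{genome_bytes / 1e6:.0f} MB")             # ~775 MB

synapses = 1e14                 # order-of-magnitude estimate for an adult brain
print(f"DNA bytes per synapse: ~{genome_bytes / synapses:.1e}")  # ~8e-06
```

So there isn't even a thousandth of a bit of genome per synapse; whatever the DNA specifies, it has to be rules and biases, not wiring diagrams.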
littlestymaar•4mo ago
Yes, the same way Turing completeness fits in 8 bits, which is both perfectly true (see rule 110) and perfectly useless for deriving any conclusion about the limits of innate knowledge.
Similarly, just because you can encode the number pi in just two bytes (the ASCII codes for the letters “p” and “i”), it doesn't mean the number contains only two bytes of entropy.
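For what it's worth, the rule 110 point is easy to make runnable: the entire update table is one byte, yet iterating it unfolds into complex (Turing-complete) dynamics. A quick sketch (wrap-around edges are a simplification of the usual infinite tape):

```python
RULE = 110  # the whole "program" fits in a single byte

def step(cells, rule=RULE):
    # Each cell's next state is looked up from the rule byte using its
    # 3-bit neighborhood (left, self, right).
    n = len(cells)
    return [(rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
            for i in range(n)]

cells = [0] * 79 + [1]          # start from a single live cell
for _ in range(40):
    print("".join("#" if c else " " for c in cells))
    cells = step(cells)
```

The byte is tiny; the behavior it licenses is not, which is exactly why counting the bytes of the encoding tells you little by itself.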
ACCount37•4mo ago
littlestymaar•4mo ago
And for that reason, your argument about 1GB of data makes absolutely no sense at all.
ACCount37•4mo ago
littlestymaar•4mo ago
littlestymaar•4mo ago
Being overparameterized alone doesn't explain how fast we learn things compared to deep neural nets though. Quite the opposite actually.
nialse•4mo ago
bjornsing•4mo ago
arcwhite•4mo ago
DNA is the ultimate demoscene exe
bjornsing•4mo ago