Have they? They still seem to be a dead end toward AGI.
I think the solution lies in cracking the core algorithms nature uses to build the brain. Too bad it’s such an inscrutable hairball of analog spaghetti code.
Humans are not intrinsically machines. Through the education system and so on, humans are taught to behave somewhat like machines.
> Some people were technical, but they didn't do technical work for many months, or longer, and now are no longer technical, they fell behind, but still think they are.
Seems that he is able to garner support for his ideas and to make progress at the leading edge. Yes, the “I know better” style is a little hard to take, but then many innovations are driven by narcissism.
For how "naive" transformer LLMs seem, they sure set a high bar.
Saying "I know better" is quite easy. Backing that up is really hard.
That's kind of awkward timing to say that, as alternatives to transformers have flourished over the past few weeks (Qwen3-Next, Granite 4).
But IIRC Le Cun's criticism applies to more than just transformers and to next-token predictors as a whole.
Improvements in long context efficiency sure are nice, and I do think that trying to combine transformers with architectures that aren't cursed with O(n^2) on sequence length is a promising approach. But it's promising as an incremental improvement, not a breakthrough that completely redefines the way AIs are made, the way transformer LLMs themselves did.
Long context is a massive capability improvement.
> But it's promising as an incremental improvement, not a breakthrough that completely redefines the way AIs are made, the way transformer LLMs themselves did.
Transformers themselves were an incremental improvement over RNNs with attention, and in terms of capabilities they weren't immediately superior to their predecessors.
What changed the game was that they were vastly cheaper to train, which made it possible to train massive models on phenomenal amounts of data.
Linear attention models being much more compute-efficient than transformers on longer context may result in a similar breakthrough.
It's very hard to tell in advance what will be a marginal improvement and what will be a game changer.
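To make the efficiency point concrete, here's a minimal numpy sketch (a toy illustration, not any particular model's code; the relu+1 feature map and the dimensions are placeholder assumptions): standard softmax attention materializes an n x n score matrix, while the kernelized "linear attention" reordering never does, so its cost grows linearly with sequence length.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard scaled dot-product attention: the (n, n) score matrix makes
    # time and memory quadratic in sequence length n.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                     # (n, n)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V                                      # (n, d_v)

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1.0):
    # Kernelized variant: computing phi(K).T @ V first gives a (d, d_v)
    # summary that doesn't depend on n, so the total cost is roughly
    # O(n * d * d_v) instead of O(n^2 * d). A sketch, not a production kernel.
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.T @ V                                     # (d, d_v)
    z = Qp @ Kp.sum(axis=0, keepdims=True).T          # (n, 1) normalizer
    return (Qp @ kv) / (z + 1e-9)

n, d = 2048, 64
Q, K, V = (np.random.randn(n, d) for _ in range(3))
print(softmax_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)
```

The two outputs aren't identical (the kernel trick changes the weighting), which is exactly the trade-off: you give up exact softmax attention in exchange for linear scaling in context length.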
intalentive•4mo ago
Also I’m skeptical that self-supervised learning is sufficient for human-level learning. Some of our ability is innate. I don’t believe it’s possible for statistical methods to learn language from raw audiovisual data the way children can.
suddenlybananas•4mo ago
krallistic•4mo ago
Some people just believe there is no innate knowledge, or that we don't need it if we just scale/learn better (in the direction of the Bitter Lesson).
(ML) academia is also heavily biased against it, mainly for two reasons:
- It's harder to publish: if you learn task X with innate knowledge, it's not as general, so reviewers can claim it's just (feature) engineering, which hurts acceptance. So people always try to frame their work as generally as possible.
- Historical reasons, due to the conflict with the symbolic community (which relies heavily on innate knowledge).
yorwba•4mo ago
So the existence of a sensorimotor feedback loop for a basic behavior is innate (e.g. moving forward to seek food), but the fine-tuning for reliably executing this behavior while adapting to changing conditions (e.g. moving over difficult terrain with an injured limb after spotting a tasty plant) needs to be learned through interacting with the environment. (Stumbling around eating random stuff to find out what is edible.)
suddenlybananas•4mo ago
That's not the only way one could encode innate knowledge. Besides, we have experimentally demonstrated many times that animals have innate knowledge; the only reason we can't do this with humans is that it would be horrifically unethical.
>Stumbling around eating random stuff to find out what is edible
Plenty of animals have innate knowledge about what is and isn't edible: it's why, for example, tasty things generally smell good and why things that are bad for us (rotting meat) smell horrific.
yorwba•4mo ago
I'm saying that there are limits to how much knowledge can be inherited. I.e. the question isn't "Where could innate knowledge be encoded other than in synapses?" but "Considering the extremely large number of synapses involved in complex behavior far exceeds genetic storage capacity, how are their weights determined?" And since we know that in addition to having innate behaviors, animals are also capable of learning (e.g. responding to artificial stimuli not found in nature), it stands to reason that most synapse weights must be set by a dynamic learning process.
suddenlybananas•4mo ago
bemmu•4mo ago
Maybe sections could be read from DNA and broadcast as action potentials?
There are already ribosomes that move along RNA. You'd need a variant which, instead of stringing amino acids into proteins, would read out the bases and produce something that triggers action potentials based on the contents.
geremiiah•4mo ago
ACCount37•4mo ago
This puts a severe limit on how much "innate knowledge" a human can possibly have.
Sure, the human brain has a strong inductive bias. It also has a developmental plan, and it follows that plan. It guides its own learning, and ends up being better at self-supervised learning than even the very best of our AIs. But that guidance, that sequencing and that bias must all be created by the rules encoded in the DNA, and there's only so much data in the DNA.
It's quite possible that the human brain has a bunch of simple and clever learning tricks that, if we pried them out and applied them to our AIs, would give us 100x the learning rate and 1000x the sample efficiency. Or it could be that a single neuron in the human brain is worth 10,000 neurons in an artificial neural network, and thus the biggest part of the "secret" of the human brain is just that it's hilariously overparameterized.
intalentive•4mo ago
I think of DNA analogously to the rules of cellular automata. The entropy of the rules is much less than the entropy of the dynamical system the rules describe.
The body is filled with innate knowledge. The organs all know what to do. The immune system learns to detect intruders (without synapses). Even a single-celled organism is capable of complex and fluid goal-oriented behavior, as Michael Levin attests.
I think the assumption that all knowledge exists in the brain, and all knowledge in the brain is encoded by neuronal weights, is probably too simplistic.
Regarding language and vision, I think the cognitive scientists are right: it is better to view these as organs or “modules” suited to a function. Damage Broca’s area and you get Broca’s aphasia. Damage your lung and you get trouble breathing. Neither of these looks like the result of statistical learning from randomly initialized parameters.
ACCount37•4mo ago
The human brain has specialized regions, but there's still a lot of flexibility in it. It isn't a hard-wired, fixed-function system at all. A lung can't just start pumping blood to compensate for a heart issue, but similar things do happen to brain regions. The regions can end up repurposed, and an impressive amount of damage can be routed around.
A lot of the "brain damage" studies seem to point at a process not too dissimilar to ablation in artificial neural networks. You can null out some of the weights in a pretrained neural network, and that can fuck it up. But if you start fine-tuning the network afterwards, or train from scratch, with those weights still pinned to zero? The resulting performance can end up quite similar to a control case.
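For what that looks like in practice, here's a minimal PyTorch-style sketch of the kind of ablation experiment meant here (the toy model, random data, and 30% mask are all made up for illustration): zero out a subset of weights, then fine-tune while keeping those weights pinned to zero.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))

# "Lesion" the network: build a random mask and zero out ~30% of the weights.
masks = {}
for name, p in model.named_parameters():
    if p.dim() > 1:                                   # weight matrices only
        masks[name] = (torch.rand_like(p) > 0.3).float()
        p.data.mul_(masks[name])

# Fine-tune with the ablated weights pinned to zero by re-applying the mask
# after every optimizer step (a simple way to keep them out of the model).
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
for step in range(100):
    x = torch.randn(128, 32)
    y = torch.randint(0, 10, (128,))
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])
```

With enough fine-tuning, the surviving weights take over much of the lost capacity, which is the rough analogy to routing around damage.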
A major difference is that the human brain doesn't separate training from inference. Both are always happening, but the proportion varies. It may be nigh-impossible to fully "undo" some types of damage if it happens after a certain associated developmental window has closed, but easy enough if the damage happens beforehand.
littlestymaar•4mo ago
Citation needed.
Cerebral plasticity is a thing, but it's not magic either.
ACCount37•4mo ago
Way too much weird re-routing and re-purposing can happen in the brain for that to be the case.
Human brain implements a learning algorithm of some kind - neuroscientists disagree on the specifics, but not on the core notion. It doesn't ship with all the knowledge it needs, or anywhere close. To work well, it has to learn, and it has to learn by getting information from the environment.
littlestymaar•4mo ago
You cannot confidently disprove anything unless you can back up your statement.
> information-theoretic reasons somehow weren't enough for you.
Your “information-theoretic reasoning” is completely pointless though.
> Human brain implements a learning algorithm of some kind - neuroscientists disagree on the specifics, but not on the core notion. It doesn't ship with all the knowledge it needs, or anywhere close. To work well, it has to learn, and it has to learn by getting information from the environment
Nobody said otherwise. But that doesn't mean everything is learned either. There are many things a human is born with that it doesn't have to learn. (It's pretty obvious when you have kids: as primates, humans are naturally drawn to climbing trees, and they will naturally collect stones and sticks, which is what primitive tools are made of.)
ACCount37•4mo ago
1 gigabyte. That's the absolute limit of how much "innate knowledge" a human brain can have in it! Every single instinct, every learning algorithm, every innate behavior and every little cue a brain uses to build itself has to fit into a set of data just 1 gigabyte in size.
Clearly, nature must have found some impressively large levers to be able to build and initialize a brain with 90 billion connected neurons off something this small.
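Back-of-the-envelope for where that figure comes from (rough, commonly cited numbers; only a fraction of the genome has anything to do with the brain, so the real budget is smaller still):

```python
base_pairs = 3.1e9              # approximate size of the human genome
bits = base_pairs * 2           # 4 possible bases -> 2 bits per base pair
genome_bytes = bits / 8
print(f"whole genome: ~{genome_bytes / 1e6:.0f} MB")             # ~775 MB

synapses = 1e14                 # order-of-magnitude estimate for an adult brain
print(f"DNA bytes per synapse: ~{genome_bytes / synapses:.1e}")  # ~8e-06
```

So there isn't even a thousandth of a bit of genome per synapse; whatever the DNA specifies, it has to be rules and biases, not wiring diagrams.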
littlestymaar•4mo ago
Yes, the same way Turing completeness fits in 8 bits, which is both perfectly true (see rule 110) and perfectly useless for deriving any conclusion about the limits of innate knowledge.
Similarly, just because you can encode the number pi in just two bytes (the ASCII codes for the letters “p” and “i”), it doesn't mean the number contains only two bytes of entropy.
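For what it's worth, the rule 110 point is easy to make runnable: the entire update table is one byte, yet iterating it unfolds into complex (Turing-complete) dynamics. A quick sketch (wrap-around edges are a simplification of the usual infinite tape):

```python
RULE = 110  # the whole "program" fits in a single byte

def step(cells, rule=RULE):
    # Each cell's next state is looked up from the rule byte using its
    # 3-bit neighborhood (left, self, right).
    n = len(cells)
    return [(rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
            for i in range(n)]

cells = [0] * 79 + [1]          # start from a single live cell
for _ in range(40):
    print("".join("#" if c else " " for c in cells))
    cells = step(cells)
```

The byte is tiny; the behavior it licenses is not, which is exactly why counting the bytes of the encoding tells you little by itself.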
ACCount37•4mo ago
littlestymaar•4mo ago
And for that reason, your argument about 1GB of data makes absolutely no sense at all.
ACCount37•4mo ago
littlestymaar•4mo ago
littlestymaar•4mo ago
Being overparameterized alone doesn't explain how fast we learn things compared to deep neural nets though. Quite the opposite actually.
nialse•4mo ago
bjornsing•4mo ago
arcwhite•4mo ago
DNA is the ultimate demoscene exe
bjornsing•4mo ago