I suspect the calculus is more favorable for robotics
- Reinforcement Learning (2026)
- General Intelligence (2027)
- Continual Learning (2028)
EDIT: lol, funny how the idiots downvote
1. Robust to adversarial attacks (e.g. in classification models or LLM steering).
2. Solving ARC-AGI.
Current models are optimized to solve the specific problem they're presented with, not to find the most general problem-solving techniques.
Edit: I'm trying the ARC-AGI tests now and it's looking bad for me: https://arcprize.org/play?task=e3721c99
LeCun: Energy Based Self-Supervised Learning
Chollet: Program Synthesis
Fei-Fei: ???
Are there any others with hot takes on the future architectures and techniques needed for A-not-quite-G-I?
Underrated and unsung. Fei-Fei Li launched ImageNet way back in 2007, a hugely influential move that sparked much of the computer-vision deep learning that followed. I remember jph00 saying in a lecture about 7 years ago that "text is just waiting for its ImageNet moment" -> then came the GPT explosion. Fei-Fei was massively instrumental in where we are today.
Their success is due to datasets and to the tooling that allowed models to be trained on large amounts of data sufficiently fast, using GPU clusters.
> I spent years building ImageNet, the first large-scale visual learning and benchmarking dataset and one of three key elements enabling the birth of modern AI, along with neural network algorithms and modern compute like graphics processing units (GPUs).
Datasets + NNs + GPUs. Three "vastly different" advances that came together. ImageNet was THE dataset.
There are actually a lot of people trying to figure out spatial intelligence, but those groups are usually in neuroscience or computational neuroscience. Here is a summary paper I wrote discussing how the entorhinal cortex, grid cells, and coordinate transformation may be the key: https://arxiv.org/abs/2210.12068

All animals are able to transform coordinates in real time to navigate their world, and humans have the most coordinate representations of any known living animal. I believe human-level intelligence is knowing when and how to transform these coordinate systems to extract useful information. I wrote this before the huge LLM explosion and I still personally believe it is the path forward.
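To make "coordinate transformation" concrete, here is a minimal, hypothetical sketch (mine, not from the paper) of mapping an egocentric observation -- a landmark seen relative to the animal's body -- into allocentric world coordinates; all names here are illustrative:

    import numpy as np

    def egocentric_to_allocentric(obs_xy, agent_pos, agent_heading):
        # obs_xy: landmark position in the body frame (forward, left), metres
        # agent_pos: agent position in the world frame
        # agent_heading: heading in radians (0 = world +x axis)
        c, s = np.cos(agent_heading), np.sin(agent_heading)
        rot = np.array([[c, -s],
                        [s,  c]])        # rotate the body frame into the world frame
        return np.asarray(agent_pos) + rot @ np.asarray(obs_xy)

    # A landmark 2 m straight ahead of an agent at (5, 3) facing +y:
    print(egocentric_to_allocentric((2.0, 0.0), (5.0, 3.0), np.pi / 2))  # ~[5., 5.]

The sketch is just the geometry; the interesting question the comment raises is how the brain (or a model) learns when to apply which transformation.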
Yes, you and the Mosers who won the Nobel Prize all believe that grid cells are the key to animals understanding their position in the world.
https://www.nobelprize.org/prizes/medicine/2014/press-releas...
There's a whole giant gap between grid cells and intelligence.
I believe the null hypothesis would be that a model natively understanding both would work best / come closest to human intelligence (and possibly other modalities are also needed).
Also, speaking as a complete layman: the fact that our language has so many interconnections with spatial concepts (topic: place; subject: lying under or near; respect/prospect: looking back/ahead; etc.) also points towards a multi-modal intelligence. In my understanding these connections only make their way into LLMs' representations secondarily.
While virtual world systems and physical world systems look similar on paper, a bit like chemistry and chemical engineering, they are largely unrelated problems with limited theoretical overlap. A virtual world model is essentially a special, trivial case that becomes tractable because it defines away most of the hard computer-science problems in physical world models.
A good argument could be made that spatial intelligence is a critical frontier for AI; many open problems are reducible to it. I don't see any evidence that this company is positioned to make material progress on it.
Key distinction: constant and continuous updating, i.e. feedback loops of observation, prediction, action (agency), and then observation again.
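As a rough illustration (my own sketch, not anyone's actual architecture), such a loop might look like the following, with `Environment` and the update rule as placeholders:

    import random

    class Environment:                       # placeholder world, not a real API
        def observe(self):
            return random.random()           # stand-in for a sensor reading
        def step(self, action):
            pass                             # apply the action to the world

    def run_agent(env, steps=10, lr=0.1):
        prediction = 0.0
        for _ in range(steps):
            obs = env.observe()              # observation
            error = obs - prediction         # compare prediction with reality
            prediction += lr * error         # continuous updating from feedback
            action = "explore" if abs(error) > 0.5 else "exploit"
            env.step(action)                 # act (agency), then observe again

    run_agent(Environment())

The point is only the shape of the loop: the model is never "done" training; every action changes what it observes next.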
It should have survival and preservation as a fundamental architectural feature.