We made a sandwich, but it cost 10x more than it would for a human, and it was slower. It might slowly become faster and more efficient, but by the time you get really good at it, it's simply not transferable unless the model is genuinely able to make the leap across domains the way humans naturally do.
I'm afraid this is where the barrier between general intelligence and human intelligence lies. With enough of these geospatial motor-skill databases, we might get something that mimics humans very well but still runs into problems at the edges, and this last-mile problem really is a hindrance in so many domains where we come close but never complete.
I wonder if this will change with some shift in computing, as well as in how we interface with digital systems (without mouse or keyboard); then we might be able to close that 'last mile' gap.
if you haven't - highly recommended.
> the core insight: predict in representation space, not pixels
We've been doing this since 2014? Not only that, others have been doing it at a similar scale. e.g. Nvidia's world foundation models (although those are generative).
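For readers who haven't seen the idea before, here's a minimal numpy sketch of what "predict in representation space, not pixels" means in a JEPA-style setup. Everything here (the linear "encoders", the dimensions, the EMA coefficient) is an illustrative assumption, not the actual architecture from the article:

```python
import numpy as np

rng = np.random.default_rng(0)

D_PIX, D_REP = 64, 8                       # toy "pixel" and representation dims

W_ctx = rng.normal(size=(D_PIX, D_REP)) / np.sqrt(D_PIX)  # context encoder (trainable)
W_tgt = W_ctx.copy()                       # target encoder (EMA copy of context)
W_pred = np.eye(D_REP)                     # predictor acting in latent space

def jepa_loss(context_frame, target_frame):
    """L2 loss between predicted and actual *representations*, never pixels."""
    z_ctx = context_frame @ W_ctx          # embed the visible context
    z_tgt = target_frame @ W_tgt           # embed the masked/future target
    z_hat = z_ctx @ W_pred                 # predict the target's embedding
    return float(np.mean((z_hat - z_tgt) ** 2))

def ema_update(w_tgt, w_ctx, tau=0.99):
    """Target encoder slowly tracks the context encoder (standard EMA trick)."""
    return tau * w_tgt + (1.0 - tau) * w_ctx

frame_t = rng.normal(size=D_PIX)
frame_t1 = frame_t + 0.01 * rng.normal(size=D_PIX)   # a nearby "future" frame
loss = jepa_loss(frame_t, frame_t1)
W_tgt = ema_update(W_tgt, W_ctx)
print(f"latent-space loss: {loss:.6f}")
```

The point of the design is that the loss lives in the low-dimensional embedding space, so the model never has to waste capacity reconstructing pixel-level detail it doesn't need.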
> zero-shot generalization (aka the money shot)
This is easily beaten by flow-matching imitation learning models like what Pi has.
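For context, the flow-matching recipe those imitation-learning policies use is simple to sketch. This is a hedged toy version in numpy: the 7-dim action, the "expert" vector, and the oracle velocity field (standing in for a trained network) are all made up for illustration and are not Pi's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)
ACT_DIM = 7                                 # e.g. a 7-DoF arm action vector

def flow_matching_pair(expert_action):
    """One training example: a noisy interpolant and the velocity to regress."""
    t = rng.uniform()                        # random time in [0, 1)
    noise = rng.normal(size=ACT_DIM)
    x_t = (1.0 - t) * noise + t * expert_action   # straight-line interpolant
    target_v = expert_action - noise              # constant target velocity
    return t, x_t, target_v

def sample_action(velocity_fn, steps=10):
    """At test time, integrate the learned velocity field starting from noise."""
    x = rng.normal(size=ACT_DIM)
    dt = 1.0 / steps
    for k in range(steps):
        x = x + dt * velocity_fn(x, k * dt)
    return x

expert = np.linspace(-1.0, 1.0, ACT_DIM)    # a made-up demonstrated action

def oracle_velocity(x, t):
    # Stand-in for the trained network: the ideal field that carries any
    # point onto the expert action by time 1.
    return (expert - x) / max(1.0 - t, 1e-8)

t, x_t, v = flow_matching_pair(expert)
action = sample_action(oracle_velocity)
```

Training regresses `target_v` with a plain MSE loss; inference is just a few Euler steps, which is why these policies can run fast enough for real robot control loops.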
> accidentally solved robotics
They're doing 65% success on very simple tasks.
The research is good. This article however misses a lot of other work in the literature. I would recommend you don't read it as an authoritative source.
My first thought upon reading this was that an LLM had been instructed to add a pithy meme joke to each paragraph. They don't make sense in context, and while some terminally online people do speak in memes, those people aren't quoting doge in 2025.
There's also a sense of incoherence in the whole piece. For instance, this section:
"- after: 22 million videos + 1 million images (now we're talking)
they basically hoovered up everything: something-something v2, kinetics, howto100m, and a billion youtube videos"
Was it a billion vids or 22m? It turns out the second sentence is just rephrasing the list of sources in a cool, casual way, and the last source is called YT-Temporal-1B. That's a billion frames of video, not a billion videos.
mouse_•5h ago
Who cares at this point? No one is stopping ML datasets from being primarily pirated. The current power is effectively dismantling copyright for AI-related work.
perching_aix•5h ago
Out of the loop apparently, could you elaborate? By "the current power" I take it you mean the current US administration?
bgwalter•5h ago
https://www.heise.de/en/news/After-criticism-of-AI-training-...
The "Big Beautiful Bill" contains a clause that prohibits state "AI" legislation.
Trump has a "Crypto and AI czar" who is very active in promoting "AI" on his YouTube propaganda outlet. The same czar also promoted, pre-election of course, accelerated peace with Russia and then stopped talking about the subject altogether.
snickerdoodle12•5h ago
Anyone who has a shred of integrity. I'm not a fan of overreaching copyright laws, but they've been strictly enforced for years now. Decades, even. They've ruined many lives, like how they killed Aaron Swartz.
But now, suddenly, violating copyright is totally okay and carries no consequences whatsoever because the billionaires decided that's how they can get richer now?
If you want to even try to pretend that you don't live in a plutocracy and that the rule of law matters at all, these developments should concern you.
perching_aix•5h ago
And since scraping of publicly available data is not illegal (in the US, according to the aforementioned "lawyer"), it seems like it's okay?
Not legal advice.
[0] https://www.skadden.com/insights/publications/2024/05/distri...