This guy a month ago for example: https://youtu.be/SGJC4Hnz3m0
The game is called "Explorers' Guild", or "xg" for short. It's easier for Claude to act as a player than a director (xg's version of a dungeon master or game master), again mainly because of permanence and learning issues, but to the extent that I can help it past those issues it's also fairly good at acting as a director. It does require some pretty specific stuff in the system prompt to, for example, avoid confabulating things that don't fit the world or the scenario.
But to really build a version of xg on Claude it needs better ways to remember and improve what it has learned about playing the game, and what it has learned about a specific group of players in a specific scenario as it develops over time.
I don't have access to the DeepMind demo, but from the video it looks like it takes the idea up a notch.
(I don't know the exact lineage of these ideas, but a general observation is that it's a shame that it's the norm for blog posts / indie demos to not get cited.)
[1] https://news.ycombinator.com/item?id=43798757
[2] https://madebyoll.in/posts/world_emulation_via_dnn/demo/
- That forest trail world is ~5 million parameters, trained on 15 minutes of video, scoped to run on a five-year-old iPhone through a twenty-year-old API (WebGL GPGPU, i.e. OpenGL fragment shaders). It's the smallest '3D' world model I'm aware of.
- Genie 3 is (most likely) ~100 billion parameters trained on millions of hours of video and running across multiple TPUs. I would be shocked if it's not the largest-scale world model available to the public.
There are lots of neat intermediate-scale world models being developed as well (e.g. LingBot-World https://github.com/robbyant/lingbot-world, Waypoint 1 https://huggingface.co/blog/waypoint-1) so I expect we'll be able to play something of Genie quality locally on gaming GPUs within a year or two.
Regarding the specific boiling-textures effect: there's a tradeoff in recurrent world models between jittering (constantly regenerating fine details to avoid accumulating error) and drifting (propagating fine details as-is, even when that leads to accumulating error and a simplified/oversaturated/implausible result). The forest trail world is tuned way towards jittering (you can pause with `p` and step frame-by-frame with `.` to see this). So if the effect resembles LSD, it's possible that LSD applies some similar random jitter/perturbation to the neurons within your visual cortex.
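The jitter/drift tradeoff is easy to see in a toy recurrent rollout. This is an illustrative sketch only — the noise scales, decay constant, and blending rule are made-up stand-ins, not parameters from any real world model:

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout(steps=200, jitter=0.0, decay=0.999):
    """Toy recurrent rollout over a vector of 'fine details'.

    With jitter=0 the state is propagated as-is and small prediction
    errors compound (drift); with jitter>0 details are partially
    regenerated every step, so error stays bounded but details flicker.
    """
    target = rng.normal(size=64)       # the "true" world, held fixed here
    state = target.copy()              # start from a correct frame
    errors = []
    for _ in range(steps):
        # imperfect model step: slight decay plus prediction noise
        state = decay * state + rng.normal(scale=0.01, size=64)
        if jitter > 0:
            # regenerate: blend toward a fresh, slightly noisy re-render
            fresh = target + rng.normal(scale=0.05, size=64)
            state = (1 - jitter) * state + jitter * fresh
        errors.append(np.abs(state - target).mean())
    return errors

drift = rollout(jitter=0.0)   # details stay put, error accumulates
jit = rollout(jitter=0.3)     # details "boil", error stays bounded
```

The drifted rollout ends far from the target while the jittered one stays close — at the cost of constantly regenerating its fine details, which is exactly the boiling-texture look.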
If making games out of these simulations works, it'd be the end for a lot of big studios, and might be a renaissance for small to one-person game studios.
There's obviously something insanely impressive about these google experiments, and it certainly feels like there's some kind of use case for them somewhere, but I'm not sure exactly where they fit in.
If I am wrong, then the huge supply of fun games will completely saturate demand and be no easier for indie game devs to stand out.
You COULD create a sailing sim but after ten minutes you might be walking on water, or in the bath, and it would use more power than a small ferry.
There's no way this tech can run on a PS5 or anything close to it.
I mean, if making a game eventually boils down to cooking a sufficient prompt (which to be clear, I'm not talking about text, these prompts are probably going to be more like video databases) then I'm not sure if it will be a renaissance for "one person game studios" any more than AI image generation has been a renaissance for "one person artists".
I want to be optimistic, but it's hard to deny the massive distribution stranglehold that the media publishing landscape has, and that has nothing to do with technology.
Indie games are already bigger than ever as far as I know.
The goal of world models like Genie is to be a way for AI and robots to "imagine" things. Then, they could practice tasks inside of the simulated world or reason about actions by simulating their outcome.
I think he's lucky he got out with his reputation relatively intact.
When the right move (strategically, economically) is to not compete, the head of the AI division acknowledging the above and deciding to focus on the next breakthrough seems absolutely reasonable.
Non-developers I know use them to organise meetings, write emails, research companies, write down and summarise counselling sessions (not the clients, the counselor), write press reports, help with advertising campaigns management, review complex commercial insurance policies, fix translations... The list of uses is endless, really. And I'm only talking of work-related usage, personal usage goes of course well beyond this.
I'm being factual. You are the one making the extraordinary claim that LLMs will find substantial new markets / go through a transformative breakthrough.
> Everywhere I look, everyone I talk to, is using LLMs
And everywhere I look, I don't. It might be the case that you stand right in the middle of an LLMs niche. Never did I say that one doesn't exist or that LLMs are inadequate at parroting existing code.
> Non-developers I know use them […]
among those are:
- things that have nothing to do with LLMs/AI
- things that you should NOT use LLMs for, because they will give you confidently wrong and/or random answers (it's not in their training data/cut-off window, it's non-public information, or they lack the computing abilities to produce meaningful results)
- things that are low-value/low-stakes for which people won't be willing to pay for when asked to
> The list of uses is endless
no, it is not
> And I'm only talking of work-related usage
and we will get to see, sooner rather than later, how much businesses actually value LLMs when the real costs are finally passed on to them.
These are things that have to do with intelligence. Human or LLM doesn't matter.
> things that you should NOT use LLMs for / parroting existing code / not in their training data/cut-off window, it's non-public information, they don't have the computing abilities to produce meaningful results
Sorry, but I just get the picture that you have no clue what you're talking about- though most probably you're just in denial. This is one of the most surprising things about the emergence of AI: the existence of a niche of people that is hell-bent on denying its existence.
Being enthusiastic about a technology isn't incompatible with objective scrutiny. Throwing an ill-defined "intelligence" up in the air certainly doesn't help with that.
Where I stand is where measured and fact-driven (aka. scientists) people do, operating with the knowledge (derived from practical evidence¹) that LLMs have no inherent ability to reason, while making a convincing illusion of it as long as the training data contains the answer.
> Sorry, but I just get the picture that you have no clue of what you're talking about- though most probably you're just in denial.
This isn't a rebuttal. So, what is it? An insult? Surely that won't help make your case stronger.
You call me clueless, but at least I don't have to live with the same cognitive dissonances as you, just to cite a few:
- "LLMs are intelligent, but when given a trivially impossible task, they happily make stuff up instead of using their `intelligence` to tell you it's impossible"
- "LLMs are intelligent because they can solve complex highly-specific tasks from their training data alone, but when provided with the algorithm extending their reach to generic answers, they are incapable of using their `intelligence` and the supplemented knowledge to generate new answers"
¹: https://arstechnica.com/ai/2025/06/new-apple-study-challenge...
I don't really think it's possible to convince you. Basically everyone I talk to is using LLMs for work, and in some cases- like mine- I know for a fact that they do produce enormous amounts of value- to the point that I would pay quite some money to keep using them if my company stopped paying for them.
Yes, LLMs have well-known limitations, but they're still a brand-new technology in its very early stages. ChatGPT appeared little more than three years ago, and in the meantime it went from barely useful autocomplete to autonomously writing whole features. There's already plenty of software that has been 100% coded by LLMs.
"Intelligence", "understanding", "reasoning".. nobody has clear definitions for these terms, but it's a fact that LLMs in many situations act as if they understood questions, problems and context, and provide excellent answers (better than the average human). The most obvious is when you ask an LLM to analyse some original artwork or poem (or some very recent online comic, why not?)- something that can't be in its training data- and they come up with perfectly relevant and insightful analyses and remarks. We don't have an algorithm for that, we don't even begin to understand how those questions can be answered in any "mechanical" sense, and yet it works. This is intelligence.
And other people try it - really sincerely try it - and they don't "get it". It doesn't work for them. And those who "get it" tell those who don't that they just need to really try it, and keep trying it until they get it. And some people never get it, and are told that they didn't try enough (and also it gets implied that they are stupid if they really can't get it).
But I think that at least part of it is in how peoples' brains work. People think in different ways. Some languages just work for some people, and really don't work very well for other people. If a language doesn't work for you, it doesn't mean either that it's a bad language or that you're stupid (or just haven't tried). It can just be a bad fit. And that's fine. Find a language that fits you better.
Well, I wonder if that applies to LLMs, and especially to LLMs doing coding. It's a tool. It has capabilities, and it has limitations. If it works for you, it can really work for you. And if it doesn't, then it doesn't, and that doesn't mean that it's a bad tool, or that you are stupid, or that you haven't tried. It can just be a bad fit for how you think or for what you're trying to do.
I can relate to this. And I can understand that, depending on how and what you code, LLMs might have different value, or even none. Totally understand.
At the same time.. well, let's put it this way. I've been fascinated with programming and computers for decades, and "intelligence", whatever it is, for me has always been the holy grail of what computers can do. I've spent a stupid amount of time thinking about how intelligence works, how a computer program could unpack language, solve its ambiguities, understand the context and nuance, notice patterns that nobody told it were there, etc. Until ten years ago these problems were all essentially unsolved, despite more than half a century of attempts, large human-curated efforts, funny chatbots that produced word salads with vague hints of meaning and infuriating ones that could pass for stupid teenagers for a couple of minutes provided they selected sufficiently vague answers from a small database... I've seen them all. In 1968's "2001: A Space Odyssey" there's a computer that talks (even if "experts prefer to say that it mimics human intelligence") and in 2013's "Her" there's another one. In between, in terms of actual results, there's nothing. "Her" is as much science fiction as "2001", with the aggravating factor that in "Her" the AI is presented as a novel consumer product: absurd. As if anything like that were possible without a complete societal disruption.
All this to say: I can't for the life of me understand people who act blasé when they can just talk to a machine and the machine appears to understand what they mean, doesn't fall for trivial language ambiguities but will actually even make some meta-fun about it if you test them with some well known example; a machine that can read a never-seen-before comic strip, see what happens in it, read the shaky lettering and finally explain correctly where the humour lies. You can repeat to yourself a billion times "transformers something-something" but that doesn't change the fact that what you are seeing is intelligence, that's exactly what we always called intelligence- the ability to make sense of messy inputs, see patterns, see the meanings behind the surface, and communicate back in clear language. Ah, and this technology is only a few years old- little more than three if we count from ChatGPT. These are the first baby steps.
So it's not working for you right now? Fine. You don't see the step change, the value in general and in perspective? Then we have a problem.
I'm banning my wife from ever buying any Alexander Wang clothing, because his leadership is so poor in comparison that he's also going to devalue the name-collision fashion brand he shares a name with. That's how bad his leadership is going to be in comparison to Yann's. Scale AI was only successful for the same reason LangChain was: it's easy to be a big fish in a pond with no other fish.
Genie looks at the video: "when this group of pixels looks like this and the user presses 'jump', I will render the group differently in this way in the next frame."
Genie is an artist drawing a flipbook. To tell you what happens next, it must draw the page. If it doesn't draw it, the story doesn't exist.
JEPA is a novelist writing a summary. To tell you what happens next, it just writes "The car crashes." It doesn't need to describe what the twisted metal looks like to know the crash happened.
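The flipbook-vs-novelist distinction is essentially an interface difference: one path must emit every pixel of the next frame, the other only a compact summary. A toy contrast — the frame size, latent width, and update rules here are invented stand-ins, not the actual Genie or JEPA architectures:

```python
import numpy as np

FRAME = (128, 128, 3)   # full pixel frame (hypothetical resolution)
LATENT = 32             # compact latent summary (hypothetical width)

def genie_style_step(frame, action):
    """Flipbook artist: must draw the full next page of pixels."""
    # stand-in for a large autoregressive video model
    return np.clip(frame + 0.01 * action, 0.0, 1.0)

def jepa_style_step(latent, action):
    """Novelist: predicts only a compact summary of what happens next."""
    # stand-in for a latent-space predictor; no pixels are produced
    return np.tanh(latent + 0.1 * action)

frame = np.zeros(FRAME)
latent = np.zeros(LATENT)

next_frame = genie_style_step(frame, action=1.0)
next_latent = jepa_style_step(latent, action=1.0)

# The pixel path carries ~3 orders of magnitude more state per step:
print(next_frame.size, next_latent.size)  # 49152 vs 32
```

The point of the sketch: a latent predictor can assert "the car crashes" in 32 numbers, while a frame predictor has to commit to 49,152 of them every step, whether or not anyone needs the twisted metal rendered.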
The deadness you're talking about is there in procedural worlds too, and it stems from the fact that there's not actually much "there." Think of it as a kind of illusion or a magic trick with math. It replicates some of the macro structure of the world but the true information content is low.
Search YouTube for procedural landscape examples. Some of them are actually a lot more visually impressive than this, but without the interactivity. It's a popular topic in the demo scene too where people have made tiny demos (e.g. under 1k in size) that generate impressive scenes.
I expect to see generative AI techniques like this show up in games, though it might take a bit due to their high computational cost compared to traditional procedural generation.
"Sure it can write a single function but the code is terrible when it tries to write a whole class..."
Look at how much prompting it takes to vibe code a prototype. And they want us to think we'll be able to prompt a whole world?
This is only a useful premise if it can do any of those things accurately, as opposed to dreaming up something kinda plausible based on an amalgamation of every vaguely related YouTube video.
What's the use? Current scientific models clearly showing natural disasters and how to prevent them are being ignored. Hell, ignoring scientific consensus is a fantastic political platform.
LLMs can barely remember the coding style I keep asking them to stick to despite numerous prompts, stuffing that guideline into my (whatever is the newest flavour of product-specific markdown file). They keep expanding the context window to work around that problem.
If they have something for long-term learning and growth that can help AI agents, they should be leveraging it for competitive advantage.
Let's say, you simulate a long museum hallway with some vases in it. Who holds what? The basic game engine has the geometry, but once the player pushes it and moves it, it needs to inform the engine it did, and then to draw the next frame, read from the engine first, update the position in the video feed, then again feed it back to the engine.
What happens if the state diverges. Who wins? If the AI wins then...why have the engine at all?
It is possible, but then who controls the physics? The engine, or the AI? The AI could have a different understanding of the details of the vase. What happens if the vase has water inside? Who simulates that? What happens if the AI decides to break the vase? Who simulates the AI?
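One possible answer to the "who wins?" question is to make the engine authoritative and re-condition the generative model on the engine's state every frame, so the model can never diverge for long. A minimal sketch of that arbitration — all the names here are hypothetical, no real engine or model API is implied:

```python
class Engine:
    """Owns the authoritative object state and physics."""
    def __init__(self):
        self.state = {"vase_x": 0.0}

    def apply(self, player_action):
        if player_action == "push_vase":
            self.state["vase_x"] += 0.5
        return dict(self.state)          # authoritative snapshot

class Renderer:
    """Stand-in for a generative world model used only for drawing."""
    def draw(self, state):
        # conditioned on the engine's snapshot each frame, so any
        # divergence in the model is overwritten before the next draw
        return f"frame with vase at x={state['vase_x']:.1f}"

engine, renderer = Engine(), Renderer()
for action in ["wait", "push_vase", "push_vase"]:
    snapshot = engine.apply(action)      # 1. engine updates physics
    frame = renderer.draw(snapshot)      # 2. model draws from snapshot
print(frame)  # frame with vase at x=1.0
```

The design choice this encodes: the engine always wins, and the model's job shrinks to rendering — which is exactly why, if the AI were allowed to win instead, the engine would become redundant.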
I don't doubt that some sort of scratchpad to keep track of stuff in game would be useful, but I suspect the researchers are expecting the AI to keep track of everything in its own "head" cause that's the most flexible solution.
> Why are they not training models to help write games instead?
Genie isn't about making games... Granted, for some reason they don't put this at the top. Classic Google, not communicating well...

> It simulates physics and interactions for dynamic worlds, while its breakthrough consistency enables the simulation of any real-world scenario — from robotics and modelling animation and fiction, to exploring locations and historical settings.
The key part is simulation. That's what they are building this for. Ignore everything else. Same with Nvidia's Earth 2 and Cosmos (and a bit like Isaac). Games or VR environments are not the primary drive; the primary drive is training robots (including non-humanoids, such as Waymo) and just getting the data. It's exactly because of this that perfect physics (or, let's be honest, realistic physics[0,1]) matters. Getting 50% of the way there in simulation really does cut down the costs of development, even if we recognize that the cost steepens as we approach "there". I really wish they didn't call them "world models" or, more specifically, didn't shove the word "physics" in there, but hey, is it really marketing if they don't claim a golden goose can not only lay actual gold eggs but also diamonds and that its honks cure cancer?
[0] Looking right does not mean it is right. Maybe it'll match your intuition or undergrad general-physics-with-calculus classes, but talk to a real physicist if you doubt me here. Even one with just an undergrad degree will tell you this physics is unrealistic, and anyone worth their salt will tell you how unintuitive physics ends up being as you get realistic, even well before approaching quantum. Go talk to the HPC folks and ask them why they need supercomputers... Sorry, physics can't be done from observation alone.
[1] Seriously, I mean look at their demo page. It really is impressive, don't get me wrong, but I can't find a single video that doesn't have major physics problems. That "A high-altitude open world featuring deformable snow terrain." looks like it is simulating Legolas[2], not a real person. The work is impressive, but it isn't anywhere near realistic https://deepmind.google/models/genie/
I think it really comes down to dev time and adaptability. But honestly, I largely agree with you. I don't think this is a great route. I have a lot of experience in synthetic data generation, and nothing beats high-quality data. I do think we should develop world models, but I wouldn't call something a world model unless it actually models a physics. And I mean "a physics", not "what people think of as 'physics'" (i.e. the real world). I mean having a counterfactual representation of an environment. Our physics equations are an extremely compressed representation of our reality. You can't generate these representations through observation alone, and that is the naive part of the usual way world models are developed. But we'd need to go into metaphysics, and that's a long conversation not well suited for HN.
These simulations are helping, but they have a clear limit to their utility. I think too many people believe that if you just feed the models enough data, they'll learn. Hyperscaling is a misunderstanding of the Bitter Lesson that slows development despite showing some progress.
Problem is, that's not what we've observed to happen as these models get better. In reality there is some metaphysical coarse-grained substrate of physics/semantics/whatever[1] which these models can apparently construct for themselves in pursuit of ~whatever~ goal they're after.
The initially stated position, and your position: "trying to hallucinate an entire world is a dead-end", is a sort of maximally-pessimistic 'the universe is maximally-irreducible' claim.
The truth is much much more complicated.
Eh? Context rot is extremely well known. The longer you let the context grow, the worse LLMs perform. Many coding agents will pre-emptively compact the context or force you to start a new session altogether because of this. For Genie to create a consistent world, it needs to maintain context of everything, forever. No matter how good it gets, there will always be a limit. This is not a problem if you use a game engine and code it up instead.
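The pre-emptive compaction that coding agents do can be sketched in a few lines. Everything here — the threshold, the word-count "tokenizer", the summarizer — is a hypothetical stand-in for what a real agent would do with an LLM call:

```python
MAX_TOKENS = 100   # hypothetical budget; real agents use far larger ones

def summarize(messages):
    # stand-in for an LLM call that compresses old turns into one note
    return f"[summary of {len(messages)} earlier messages]"

def add_turn(context, message, budget=MAX_TOKENS):
    context = context + [message]
    used = sum(len(m.split()) for m in context)   # crude token count
    if used > budget:
        # compact everything except the most recent turns
        head, tail = context[:-2], context[-2:]
        context = [summarize(head)] + tail
    return context

ctx = []
for i in range(60):
    ctx = add_turn(ctx, f"turn {i}: edit file and run tests")
print(len(ctx))  # stays small: one summary plus recent turns
```

The tradeoff the sketch makes visible: the context never blows past the budget, but everything in the summarized head is now only as good as the summary — which is the "forever" consistency problem a generated world would face at much larger scale.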
Once you hit a billion or so parameters, rocks suddenly start to think.
I've tried using it a couple of times, but can't get in. It is either down or hopelessly underprovisioned by Google. Do you have any links to videos showing that the quality degrades after only a few seconds?
Edit: no, it just doesn't work in Firefox. It works incredibly well, at least in Chrome, and it does not lose coherence to any great extent. The controls are terrible, though.
- https://youtu.be/15KtGNgpVnE?si=rgQ0PSRniRGcvN31&t=197 walking through various cities
- https://x.com/fofrAI/status/2016936855607136506 helicopter / flight sim
- https://x.com/venturetwins/status/2016919922727850333 space station, https://x.com/venturetwins/status/2016920340602278368 Dunkin' Donuts
- https://youtu.be/lALGud1Ynhc?si=10ERYyMFHiwL8rQ7&t=207 simulating a laptop computer, moving the mouse
- https://x.com/emollick/status/2016919989865840906 otter airline pilot with a duck on its head walking through a Rothko inspired airport
Ironically, he covered PixVerse's world model last week and it came close to your ask: https://youtu.be/SAjKSRRJstQ?si=dqybCnaPvMmhpOnV&t=371
(Earlier in the video it shows him live prompting.)
World models are popping up everywhere, from almost every frontier lab.
From a product perspective, I still don't have a good sense of what the market for WMs will look like. There's a tension between serious commercial applications (robotics, VFX, gamedev, etc. where you want way, way higher fidelity and very precise controllability), vs current short-form-demos-for-consumer-entertainment application (where you want the inference to be cheap-enough-to-be-ad-supported and simple/intuitive to use). Framing Genie as a "prototype" inside their most expensive AI plan makes a lot of sense while GDM figures out how to target the product commercially.
On a personal level, since I'm also working on world models (albeit very small local ones https://news.ycombinator.com/item?id=43798757), my main thought is "oh boy, lots of work to do". If everyone starts expecting Genie 3 quality, local WMs need to become a lot better :)
https://www.youtube.com/watch?v=FyTHcmWPuJE
It's an experimental research prototype, but it also feels like a hint of the future. Feel free to ask any questions.
It's neat I guess that I can use a few words and generate the equivalent of an Unreal 5 asset flip and play around in it. Also I will never do that, much less pay some ongoing compute cost for each second I'm doing it.
They were too concerned with whether or not they could, they never stopped to think if they should.
>Genie 3’s consistency is an emergent capability. Other methods such as NeRFs and Gaussian Splatting also allow consistent navigable 3D environments, but depend on the provision of an explicit 3D representation. By contrast, worlds generated by Genie 3 are far more dynamic and rich because they’re created frame by frame based on the world description and actions by the user.
Although that probably precludes her from having animations in those worlds...
I mean, yes, the probability of having that level of tech in decades is quite high.
But the technology is moving very fast right now. It sounds crazy, but I think there is a 50% chance of having Ready Player One-level technology before 2030.
It's absolutely possible it will take more time to become economical.
Your neighbors in the street protesting for comprehensive single payer healthcare? Yeah they're perfectly fine leaving your existence up to "market forces".
Copy-paste office workers everywhere reciting memorized platitudes and compliance demands.
You're telling me I could interact even less with such selfish (and often useless given their limited real skillset) people? Deal.
America needs to rethink the compensation package if it wants to survive as a socio-political meme. Happy to call myself Canadian or Chinese if their offer is better. No bullets needed.
You have a dangerously low opinion of your fellow man, and while I sympathize with your frustration, I would humbly suggest you direct that anger at owners of companies/politicians, rather than aim it at your everyday citizen.
Perhaps better to roam a virtual reality than be starved in the real world.
Quite how they stopped a line forming three decks long outside every holodeck on the Enterprise is a mystery to me.
You probably need captain approval for NSFW content. I wonder if there will ever be an AI service that is not "Enterprise" filtered.
It's reality privilege. Most of humanity will yearn for the worlds that AI will cook up for them, customized to their whims.
What data/metric are you drawing from to arrive at this conclusion? How could you even realistically make such a statement?
I'm developing filmmaking tools with World Labs' Marble world model:
https://www.youtube.com/watch?v=wJCJYdGdpHg
https://github.com/storytold/artcraft
I think we'll eventually get to the point where these are real time and have consistent representations. I've been excited about world models since I saw the in-the-browser Pokemon demo:
https://madebyoll.in/posts/game_emulation_via_dnn/demo/
At some point, we'll have the creative Holodeck. If you've seen what single improv performers can do with AI, it's ridiculously cool. I can imagine watching entertainers in the future that summon and create entire worlds before us:
https://www.youtube.com/watch?v=MYH3FIFH55s
(If you haven't seen CodeMiko, she's an incredibly talented engineer and streamer. She develops mocap + AI streams.)
Just like how people in the 50s thought we would have flying cars and nuclear fusion by 2000.
Maybe they can unplug from 500+ AQI pollution and spend time with their loved ones and friends in a simulated clean world?
Imagine working for 10-12 hours a day, and you come home to a pod (and a building could house thousands of pods, paid for by the company) where you plug in and relax for a few hours. Maybe a few more decades of breakthroughs can lead to simulated sleep as well so they get a full rest.
Wake up, head to the factory to make whatever the developed world needs.
(holy fuck that is a horrible existence but you know some people would LOVE for that to be real)
They'd have their own economy and "life" and leave the rest of us alone. It would be completely transactional, so I'd have zero reason to feel bad if they do it voluntarily.
If they can be happy in a simulated world, and others can be happy in the real world, then everyone wins!
Except you'll never have to leave your pod. Extract the $$ from their attention all day, then sell them manufactured virtual happiness all night. It's just a more streamlined version of how many people live right now.
I'll be running away from that hellscape, thanks.
Although, I am feeling a bit lazy so let me see if I can simulate a walk.
That is not the goal.
The purpose of world models like Genie is to be the "imagination" of next-generation AI and robotics systems: a way for them to simulate the outcomes of potential actions in order to inform decisions.
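That "simulate outcomes to inform decisions" loop is essentially model-predictive control: roll out candidate actions inside the model, score the imagined end states, act on the best one. A toy sketch, with invented one-dimensional dynamics standing in for a learned world model like Genie:

```python
def world_model(state, action):
    # toy stand-in for learned dynamics: the "imagination"
    return state + action

def score(state, goal):
    return -abs(state - goal)          # closer to the goal is better

def plan(state, goal, candidates=(-1.0, 0.0, 1.0), horizon=3):
    """Imagine each candidate action repeated for `horizon` steps,
    entirely inside the model, and pick the best-scoring one."""
    best_action, best_score = None, float("-inf")
    for a in candidates:
        s = state
        for _ in range(horizon):       # rollout happens only in the model
            s = world_model(s, a)
        if score(s, goal) > best_score:
            best_action, best_score = a, score(s, goal)
    return best_action

print(plan(state=0.0, goal=2.5))  # 1.0
```

Nothing in the loop touches a real environment until the chosen action is executed — that's the whole appeal for robotics: practice is free and parallelizable.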
The whole reason for LLMs inferencing human-processable text, and "world models" inferencing human-interactive video, is precisely so that humans can connect in and debug the thing.
I think the purpose of Genie is to be a video game, but it's a video game for AI researchers developing AIs.
I do agree that the entertainment implications are kind of the research exhaust of the end goal.
When you simulate a stream of those latents, you can decode them into video.
If you were trying to make an impressive demo for the public, you probably would decode them into video, even if the real applications don't require it.
Converting the latents to pixel space also makes them compatible with existing image/video models and multimodal LLMs, which (without specialized training) can't interpret the latents directly.
I think robots imagining the next step (in latent space) will be useful. It’s useful for people. A great way to validate that a robot is properly imagining the future is to make that latent space renderable in pixels.
[1] “By using features extracted from the world model as inputs to an agent, we can train a very compact and simple policy that can solve the required task. We can even train our agent entirely inside of its own hallucinated dream generated by its world model, and transfer this policy back into the actual environment.”
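The quoted idea can be sketched end to end: train a very compact policy entirely inside hallucinated dynamics, never touching the real environment. The dream dynamics, reward, and one-weight policy below are toy stand-ins chosen so the result is checkable, not anything from the cited work:

```python
import numpy as np

rng = np.random.default_rng(1)

def dream_step(z, action):
    # hallucinated latent dynamics, standing in for a learned world model
    return 0.9 * z + action

def dream_return(policy_w, steps=20):
    """Roll out a one-weight linear policy inside the dream and
    accumulate reward for keeping the latent near zero."""
    z, total = 1.0, 0.0
    for _ in range(steps):
        action = policy_w * z          # compact policy: a single weight
        z = dream_step(z, action)
        total += -z * z
    return total

# random search over the policy weight, evaluated entirely in the dream
weights = rng.uniform(-2.0, 2.0, size=200)
best_w = max(weights, key=dream_return)
# best_w lands near -0.9: the policy learns to cancel the dream dynamics
```

If the dream dynamics are faithful, `best_w` transfers back to the real environment — which is exactly the transfer claim in the quote, and exactly where decoded video helps a human check that the dream is faithful in the first place.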
Yeah, I think this is what the person above was saying as well. This is what people at google have said already (a few podcasts on gdm's channel, hosted by Hannah Fry). They have their "agents" play in genie-powered environments. So one system "creates" the environment for the task. Say "place the ball in the basket". Genie creates an env with a ball and a basket, and the other agent learns to wasd its way around, pick up the ball and wasd to the basket, and so on. Pretty powerful combo if you have enough compute to throw at it.
I do wonder if I can frankenstein together a passable VLA using pretrained LTX-2 as a base.
Even if you just wire this output (or probably multiples running different counterfactuals) into a multimodal LLM that interprets the video and uses it to make decisions, you have something new.
Soft disagree. What is the purpose of that imagination if not to map it to actual real-world outcomes? To compare them to the real world, and possibly backpropagate through them, you'll need video frames.
If you don't decode, how do you judge quality in a world where generative metrics are famously very hard and imprecise? How do you go about integrating RLHF/RLAF in your pipeline if you don't decode, which is not something you can skip anymore to get SotA?
Just look at the companies that are explicitly aiming for robotics/simulation, they *are* doing video models.
It's not really as much of a boon as you'd think though, since throwing together a 3D model is not the bottleneck to making a sellable video game. You've had model marketplaces for a long time now.
It is for filmmaking! They're perfect for constructing consistent sets and blocking out how your actors and props are positioned. You can freely position the camera, control the depth of field, and then storyboard your entire scene I2V.
Example of doing this with Marble: https://www.youtube.com/watch?v=wJCJYdGdpHg
Marble definitely changes the game if the game is "move the camera", just most people would not consider that a game (but hey there's probably a good game idea in there!)
First of all, there are a variety of different types of world models. Simulation, video, static asset, etc. It's a loaded term, just as the use cases are widespread.
There are world models you can play in your browser inferred entirely by your CPU:
https://madebyoll.in/posts/game_emulation_via_dnn/ (my favorite, from 2022!)
https://madebyoll.in/posts/world_emulation_via_dnn/ (updated, in 3D)
There are static asset generating world models, like WorldLabs' Marble. These are useful for video games, previz, and filmmaking.
I wrote open source software to leverage marble for filmmaking (I'm a filmmaker, and this tech is extremely useful for scene consistency):
https://www.youtube.com/watch?v=wJCJYdGdpHg
https://github.com/storytold/artcraft
There are playable video-oriented models, many of which are open source and will run on your 3080 and above:
https://github.com/Robbyant/lingbot-world
There are things termed "world models" that really shouldn't be:
https://github.com/Tencent-Hunyuan/HunyuanWorld-1.0
There are robotics training oriented world models:
https://github.com/leggedrobotics/robotic_world_model
Genie is not strictly robotics-oriented.
The other examples you've given are neat, but for players like Google they are mostly an afterthought.
Gaming: $350B TAM
All media and entertainment: $3T TAM
Manufacturing: $5T TAM
Roughly the same story.
This tech is going to revolutionize "films" and gaming. The entire entertainment industry is going to transform around it.
When people aren't buying physical things, they're distracting themselves with media. Humans spend more time and money on that than anything else. Machines or otherwise.
AI impact on manufacturing will be huge. AI impact on media and entertainment will be huge. And these world models can be developed in a way that you develop exposure and competency for both domains.
edit: You can argue that manufacturing will boom when we have robotics that generalize. But you can also argue that entertainment will boom when we have holodecks people can step into.
Robots is also just one example. A hypothetically powerful AI agent (which might also use a world model) that controls a mouse and keyboard could replace a big chunk of white-collar work too.
Those are worth 10's of trillions of dollars. You can argue about whether they are actually possible, but the people backing this tech think they are.
They would try it once, think it's cool, and stop there. You would probably have a niche group of "world surfers" that would keep playing with it.
Most people have no idea what they would want to play or how it should look; they want a curated experience. As games adapted to the mass market, they became more and more curated experiences with lots of hand-holding the player.
Yeah, a holodeck would be popular, but that's a whole different technology ballpark and akin to talking about flying cars in this context.
This will have a giant impact on robotics and general models though, since they can now simulate action/reaction inside a world in parallel and choose the best course, starting from just a picture of the world plus either a generated image of the end result or "validators" that check whether the task is accomplished.
And while robotics is $88B TAM nowadays, expect it to hit $888B in the next 5-10 years, with world simulators like this being one of the reasons.
From the team side, gotta be cool to build this, feels like one of those things all devs dream about.
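The parallel simulate-and-choose loop described above can be sketched as random-shooting planning inside a toy world model. To be clear, the dynamics and reward below are hand-coded stand-ins for illustration, not a learned model:

```python
import numpy as np

rng = np.random.default_rng(0)

def world_model(state, action):
    # Toy "learned" dynamics: the agent moves by its action vector.
    # A real system would use a generative video/latent world model here.
    return state + action

def reward(state, goal):
    return -np.linalg.norm(state - goal)

def plan(state, goal, n_candidates=256, horizon=5):
    """Random-shooting planning: simulate many action sequences inside
    the world model and keep the first action of the best sequence."""
    best_score, best_action = -np.inf, None
    for _ in range(n_candidates):
        seq = rng.uniform(-1, 1, size=(horizon, 2))
        s = state
        for a in seq:
            s = world_model(s, a)          # imagined rollout, no real steps
        score = reward(s, goal)
        if score > best_score:
            best_score, best_action = score, seq[0]
    return best_action

# Replan every step (MPC-style); the agent walks toward the goal.
state, goal = np.zeros(2), np.array([3.0, 3.0])
for _ in range(10):
    state = world_model(state, plan(state, goal))
```

Swapping the hand-coded `world_model` for a learned video/latent model is the part Genie-style systems aim to provide; the planning loop itself stays this simple.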
You cannot invent data.
For example, image generators like stable diffusion carry strong representations of depth and geometry, such that performant depth estimation models can be built out of them with minimal retraining. This continues to be true for video generation models.
Early work on the subject: https://arxiv.org/pdf/2409.09144
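As a toy illustration of that "minimal retraining" idea: freeze a backbone and fit only a small linear head on top. The backbone below is a random nonlinear projection and the data is synthetic, purely stand-ins for a real diffusion/video model's intermediate activations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pretrained backbone: a fixed random nonlinear
# projection from "pixels" to features. In the real setting this would
# be intermediate activations of a pretrained diffusion/video model.
W_backbone = rng.standard_normal((64, 64)) / 8.0

def frozen_features(x):
    return np.tanh(x @ W_backbone)  # frozen: never updated

# Synthetic stand-in data: "depth" is a hidden linear function of the
# input, so it is recoverable if the features preserve that structure.
X = rng.standard_normal((512, 64))
true_w = rng.standard_normal(64)
depth = X @ true_w

# "Minimal retraining": fit only a small linear head on frozen features,
# leaving the backbone untouched.
F = frozen_features(X)
head, *_ = np.linalg.lstsq(F, depth, rcond=None)
pred = F @ head

corr = np.corrcoef(pred, depth)[0, 1]
```

The point is just the division of labor: all the representational work lives in the frozen backbone, and the task-specific part is a tiny readout.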
If instead of a photo you have a video feed, this is one step closer to implementing subjective experience.
This is a paper that recently got popular ish and discusses the counter to your viewpoint.
> Paradox 1: Information cannot be increased by deterministic processes. For both Shannon entropy and Kolmogorov complexity, deterministic transformations cannot meaningfully increase the information content of an object. And yet, we use pseudorandom number generators to produce randomness, synthetic data improves model capabilities, mathematicians can derive new knowledge by reasoning from axioms without external information, dynamical systems produce emergent phenomena, and self-play loops like AlphaZero learn sophisticated strategies from games
In theory yes, something like the rules of chess should be enough for these mythical perfect reasoners that show up in math riddles to deduce everything that *can* be known about the game. And similarly a math textbook is no more interesting than a book with the words true and false and a bunch of true => true statements in it.
But I don't think this is the case in practice. There is something about rolling things out and leveraging the results you see that seems to have useful information in it even if the roll out is fully characterizable.
What I object to are the "scaling maximalists" who believe that if enough training data were available, that complicated concepts like a world model will just spontaneously emerge during training. To then pile on synthetic data from a general-purpose generative model as a solution to the lack of training data becomes even more untenable.
Besides, we already know that agents can be trained with these world models successfully. See[1]:
> By learning behaviors in imagination, Dreamer 4 is the first agent to obtain diamonds in Minecraft purely from offline data, without environment interaction. Our work provides a scalable recipe for imagination training, marking a step towards intelligent agents
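A heavily simplified sketch of the imagination-training idea: the policy is improved using only rollouts inside the model, with zero environment steps. The "world model" here is hand-coded and the policy is a one-parameter controller, both stand-ins (Dreamer itself learns both from data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D task: keep the state near the origin. The "world model" is
# assumed to have been learned from offline data; here it's hand-coded.
def world_model(s, a):
    return s + a

def imagine_return(policy_gain, s0, horizon=20):
    """Roll out the policy purely inside the model ("imagination")."""
    s, total = s0, 0.0
    for _ in range(horizon):
        a = -policy_gain * s      # one-parameter linear policy
        s = world_model(s, a)
        total += -abs(s)          # reward: stay close to 0
    return total

# Improve the policy using imagined returns only -- no real environment
# interaction at any point, which is the core of imagination training.
starts = rng.uniform(-5, 5, size=32)
gains = np.linspace(0.0, 1.5, 31)
scores = [np.mean([imagine_return(g, s0) for s0 in starts]) for g in gains]
best_gain = gains[int(np.argmax(scores))]
```

Here the search correctly discovers the gain that cancels the state in one step; the real recipe replaces grid search with gradient-based actor-critic learning, but the data flow is the same.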
I prefer real danger as living in the simulation is derivative.
So, like, it's very important to understand the lineage of training and not just the "this is it"
As soon as this thing is hooked up to VR and reaches a tipping point with the general public we all know exactly what is going to happen. The creation of the most profitable, addictive and ultimately dystopian technology Big Tech has ever come up with.
Surely a small percentage, at least, would go on to colonize.
I think you are anthropomorphising the AI too much. Imagination is inspired by reality, which AI does not have. Introducing a reality which the AI fully controls (looking beyond issues of vision and physics simulation) would only induce psychosis in the AI itself since false assumptions would only be amplified.
I think you're anthropomorphising the AI too much: what does it mean for an LLM to have psychosis? This implies that LLMs have a soul, or a consciousness, or a psyche. But... do they?
Speaking of reality, one can easily become philosophical and say that we humans don't exactly "have" a reality either. All we have are sensor readings. LLMs' sensors are texts and images they get as input. They don't have the "real" world, but they do have access to tons of _representations_ of this world.
I don't get it. Is that supposed to be a gotcha? Have you tried maliciously messing with an LLM? You can get it into a state that resembles psychosis. I mean, you give it a context that is removed from reality, yet close enough to reality to act on, and it will give you crazy output.
Read "Stars don't dream" by Chi Hui (vol1 of "Think weirder") :)
But I do think it's a partial existence proof.
Humanity goes into the box and it never comes back out. It's better in there than it is out there for 99% of the population.
I can't even fathom what it will mean for simulation and the physical world when this gets far more accurate and realistic.
This is most evident in the way things collide.
If it continues to improve at a rate similar to LLMs, a way to simulate fluid dynamics or structural dynamics with reasonable accuracy and speed could unlock a much faster pace of innovation in the physical world (validated with rigorous scientific methods).
A nice thing about numerical simulation from first principles is that it innately supports arbitrary speed/precision tradeoffs; that's in fact the backbone of the mathematical analysis for why it works.
In some cases, as with CFD, we're actually mathematically screwed, because you have to resolve the small scales to get the macro dynamics right. So the standard remains a kind of hack: introduce additional equations (turbulence models) that steer the dynamics in place of the small, unresolved scales. We know how to do better (DNS), but it costs an arm and a leg (like years to millennia on a supercomputer).
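The arbitrary-precision point is easy to see in the simplest first-principles solver: forward Euler, whose global error provably shrinks linearly with step size. A minimal check on y' = -y:

```python
import math

def euler_solve(f, y0, t_end, n_steps):
    """Forward Euler from first principles: global error is O(dt),
    so you can buy accuracy just by shrinking the step size."""
    dt, y = t_end / n_steps, y0
    for _ in range(n_steps):
        y += dt * f(y)
    return y

exact = math.exp(-1.0)  # y' = -y, y(0) = 1, evaluated at t = 1
err_coarse = abs(euler_solve(lambda y: -y, 1.0, 1.0, 100) - exact)
err_fine = abs(euler_solve(lambda y: -y, 1.0, 1.0, 200) - exact)
ratio = err_coarse / err_fine  # ~2: halving dt roughly halves the error
```

A learned simulator offers no such guarantee: there is no knob you can turn to make its error provably go to zero, which is exactly the contrast being drawn here.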
How are you justifying the enormous energy cost this toy is using, exactly?
I don't find anything "responsible" about this. And it doesn't even seem like something that has any actual use - it's literally just a toy.
Of course, maybe it's a bridge to something else, but all I see is an image generator that's working really fast, so nothing novel.
RIP Stadia.
While "journalists" were busy bootlicking a laggy 720p Android-only xCloud beta, Stadia was already delivering flawless 4K@60FPS in a web browser
They killed the only platform that actually worked just to protect Microsoft
This will be a textbook case study in how a legacy monopoly kills innovation to protect its own mediocrity
Microsoft won't survive the century, they are a dinosaur on borrowed time that has already lost the war in mobile, AI, and robotics
They don't create, they just buy market share to suffocate the competition and ruin every product they touch
Even their cloud dominance is about to end, as they are already losing their grip on the European market to antitrust and sovereign alternatives
No idea how long it is supposed to take. They can pull a 3D world out of thin air but they apparently can't vibe-code a progress bar...
Edit: Now it's saying "We'll notify you when it's ready, and you'll have 30 seconds to enter your world. You are 37th in the queue." Go to restroom, come back 1 minute later: "The time to enter your world ran out." Lame-o.
Oh, is this the joke?
We saw a very diverse group of users. The most common were paragliders, glider pilots, and other pilots who wanted to view their own or other people's flights; ultramarathons, mountain bike races, and some road races, where it provided an interactive way to visualize the course from any angle and distance; and transportation infrastructure projects, to display train routes to be built. The list goes on.
In this view, we are essentially living inside a high-fidelity generative model. Our brains are constantly 'hallucinating' a predicted reality based on past experience and current goals. The data from our senses isn't the source of the image; it's the error signal used to calibrate that internal model. Much like Genie 3 uses latent actions and frames to predict the next state of a world, our brains use 'Active Inference' to minimize the gap between what we expect and what we experience.
It suggests that our sense of 'reality' isn't a direct recording of the world, but a highly optimized, interactive simulation that is continuously 'regularized' by the photons hitting our retinas.
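A minimal numerical cartoon of that error-signal view (purely illustrative; real active inference involves generative models and precision weighting): the internal estimate is never replaced by the sensor reading, only nudged by the prediction error.

```python
import numpy as np

rng = np.random.default_rng(0)

true_state = 10.0   # the hidden state of the world
estimate = 0.0      # the internal model's running prediction
gain = 0.2          # how strongly prediction errors update the model

for _ in range(50):
    sensation = true_state + rng.normal(0.0, 1.0)  # noisy sensor reading
    prediction_error = sensation - estimate        # the "surprise"
    estimate += gain * prediction_error            # nudge, don't replace
```

The estimate converges on the true state even though no single sensation is trusted outright, which is the sense in which perception is a model "regularized" by input rather than a recording of it.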
At what point does the processing become so strong that it's less a photograph and more a work of computational impressionism?
This actually happened.
It goes through how sensations fit into this highly constrained, highly functional hallucination that models the outside world as a sort of Bayesian prediction, shaped by your concerns and capabilities as a human, and then it has a very interesting discussion of emotions as they relate to inner bodily sensations.
Hinton had a course on Coursera around 2015 that covered a lot of pre NN deep learning. Sadly I don't think it's up anymore.
https://www.youtube.com/playlist?list=PLoRl3Ht4JOcdU872GhiYW...
Also See essentia foundation videos
It's also easy to find this treated in various philosophies and religions across time and space. And anyway, since consciousness is eager to project whatever looks like a possible fit, elements suggesting prior art can be inferred back as far as traces can be found.
And more specifically Analytic Idealism
https://youtu.be/P-rXm7Uk9Ys?si=q7Kefl7PbYfGiChZ
Google DeepMind’s Project Genie is being framed as a “world model.” Given a text prompt, it generates a coherent, navigable, photorealistic world in real time. An agent can move through it, act within it, and the world responds consistently. Past interactions are remembered. Physics holds. Cause and effect persist.
From a technical standpoint, this is impressive engineering. From a philosophical standpoint, it’s an unexpectedly clean metaphor.
In analytic idealism, the claim is not that the physical world is fake or arbitrary. The claim is that what we call the “physical world” is how reality appears from a particular perspective. Experience is primary. The world is structured appearance.
Genie makes this intuitive.
There is no “world” inside Genie in the classical sense. There is no pre-existing ocean, mountain, fox, or library. There is a generative substrate that produces a coherent environment only when a perspective is instantiated. The world exists as something navigable because there is a point of view moving through it.
Change the character, and the same environment becomes a different lived reality. Change the prompt, and an entirely different universe appears. The underlying system remains, but the experienced world is perspective-dependent.
This mirrors a core idealist intuition: reality is not a collection of objects waiting to be perceived. It is a structured field of possible experiences, disclosed through perspectives.
The interesting part is not that Genie “creates worlds.” It’s that the worlds only exist as worlds for an agent. Without a perspective, there is no up, down, motion, danger, beauty, or meaning. Just latent structure.
Seen this way, Genie is not a model of consciousness. It’s a model of how worlds arise from viewpoints.
If you replace “agent” with “local mind,” and “world model” with “cosmic mental process,” the analogy becomes hard to ignore. A universal consciousness need not experience everything at once. It can explore itself through constrained perspectives, each generating a coherent, law-bound world from the inside.
That doesn’t prove idealism. But it makes the idea less mystical and more concrete. We are already building systems where worlds are not fundamental, but perspectival.
And that alone is worth sitting with.
But does that mean I don't believe all the words written above are valid? No, absolutely not.
I reviewed and copyedited what I posted, and the meaning is exactly what I intended, so I'm not sure what the issue is here.
If we use LLMs to expound on our own thoughts, is that a crime? They are literal masters of wordplay and rote clarification of complex topics, so I think this is a very legitimate use case for them, since I was going for clarity as an objective, especially considering the topic.
Also none of my previous posts were LLM written (including this one)
People are a little over-sensitive on this topic these days
For example, you can spin around, or change position, or close your eyes, and you're still you. As you navigate and interact with the evolving universe, the only continual, relatively unchanging part of the experience is what your brain uses to differentiate itself from the rest of your perceptions.
We are never able to interact with the physical world directly, we first perceive it and then interpret those perceptions. More often than not, our interpretation ignores and modifies those perceptions, so we really are just living in a world created by our own mental chatter.
This is one of the core tenets of Buddhism, and it's also expounded on Greg Egan's short novel "Learning to Be Me". He's one of my favorite sci-fi authors and this particular short led me down a deep rabbit hole of reading many of his works within a few months.
I found a copy online, if you haven't read it, do yourself a favor and check it out. You won't be able to put it down and the ending is sublime. https://gwern.net/doc/fiction/science-fiction/1995-egan.pdf
This is an orthodox position in modern philosophy, dating back to at least Locke, strengthened by Kant and Schopenhauer. It’s held up to scrutiny for the past ~400 years.
But really it’s there in Plato too, so 2300+ years. And maybe further back
The idea that there's an objective but imperceivable world (except by philosophers) is... a slippery slope to philosophical excess.
It's easy to spin whatever fancy you want when nobody can falsify it.
For Kant, and therefore for Schopenhauer, the visible world is composed merely of objects, which are by definition only mental representations: a world of objects "exists" only in the mind of a subject. If there is a Thing-in-Itself (which even Kant does not doubt, if I recall correctly), it certainly cannot be a mental representation: the nature of the Thing-in-Itself is unknowable (says Kant), but certainly in no way at all like the mere object that appears to our mental processes. (Schopenhauer says the Thing-in-Itself is composed of pure Will, whatever that means.) The realest world is "behind" or "below" the visible one, completely divorced from human reason, and by definition completely inaccessible to any form of cognition (which includes the sensory perception we share with the animals, as well as the reason that belongs to humans alone).
The Third Man paradox makes no sense at all for Kant: first because whatever the ineffable Thing-in-Itself is, it certainly won't literally "partake" of any mental concept we might come up with, and secondly because it would be a category error to suppose that any property could be true of both a mental object and a thing-in-itself, which are nothing alike. (The Thing-in-Itself doesn't even exist in time or space, nor does it have a cause. Time, space, and causality are all purely human frameworks imposed by our cognitive processes: to suppose that space has any real existence simply because you perceive it is, again, a category error, akin to supposing that the world is really yellow-tinged just because you happen to be wearing yellow goggles.)
What I wasn't able to properly highlight is how this belief has become a fundamental part of my day to day, moment to moment experience. I enjoy the constant and absolute knowledge that everything that's happening is my interpretation. And it gives me a superpower -- because for most of my life the world felt unforgiving and unpredictable. But it's actually the complete opposite, since whatever we interpret is actually in our control.
I also credit my understanding of this as a reality, rather than an intellectual concept, to Siddhartha Gautama and his presentation of "samsara". But wherever it comes from, it is an inescapable idea, and I encourage all HNers to dive deeper.
What would count as anything or anyone interacting with the physical world directly?
Personally I often catch myself making reading mistakes and knowing for a fact that the mistake wasn't just conceptual, but an actual visual error where my brain renders the wrong word. Sometimes it's very obvious because the effect will last for seconds before my vision "snaps" back into reality and the word/phrase changes.
I first noticed this phenomenon in my subjective experience when I was 5 and started playing Pokémon. For many months I thought Geodude was spelled and pronounced Gordude, until my neighbor said the name correctly one day and it "unlocked" my brain's ability to see the word spelled correctly.
The effect is so strong sometimes that I can close my eyes and imagine a few different moments in my life, even as a child, where my brain suddenly "saw" the right word while reading and it changed before my eyes.
My brain seems to store/recall words phonetically, possibly because I taught myself to read at age 3 with my own phonetic approach, but also possibly due to how I trained myself out of a long spell of aphasia during high school by consciously relearning how to speak in a way that engaged the opposite hemisphere of my brain; thinking in pitches, intonation, rhyme, rhythm, etc. and turning speaking into a musical expression. I'd read about this technique and after months of work I managed to make it work for me. So in that aspect, there really might be some crossed wires out of necessity.
I was homeless in high school and thus too poor to visit doctors and get scans done, so I'm really not sure if the assumed damage to my left hemisphere which I experienced was temporary or permanent, or even detectable. The aphasia was coupled with years of intense depersonalization and derealization as well. The brain is a very strange thing and many events in my life such as the ones described above have only reinforced to me how subjective my experience really is.
- How come we have 2 eyes but see one 3d world?
- We hear sounds and music coming from various directions, but all of this is created from 2 vibrating eardrums
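On the two-eyes point: a toy version of how a single depth value falls out of two horizontally offset views, via the standard disparity relation depth = focal_length * baseline / disparity (the numbers below are illustrative, not physiological measurements):

```python
# Two offset "eyes" see the same point at slightly different image
# positions; that mismatch (disparity) encodes depth directly.
focal_length = 0.017   # metres -- illustrative, roughly eye-scale
baseline = 0.065       # metres -- roughly an interpupillary distance

def depth_from_disparity(disparity):
    """Recover depth from the positional mismatch between two views."""
    return focal_length * baseline / disparity

near = depth_from_disparity(0.001)   # larger disparity -> nearer object
far = depth_from_disparity(0.0005)   # halving disparity doubles the depth
```

This is the same relation stereo cameras use; the brain's fusion of the two images into one 3D percept is doing a far fancier version of this computation.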
How much compute do you need to convince a brain its environment is "real"?
What happens if I build a self replicating super computer in this environment that finds solutions to some really big SAT instances that I can verify?
Dreams run into contradictions quite quickly.
Do jump forward to the contents' discussion marker unless you enjoy British professor banter.
I'm not certain but I think the LLM is also generating the physics itself. It's generating rules based on its training data, e.g. watch a cat walk enough and you can simulate how the cat moves in the generated "world".
Or, in gaming terms, do these models think FPS or RTS?
Text models and pixel-grid vision models are easy to picture, but I'm struggling to wrap my head around what a world model "sees", so to speak.
> Diego Rivas, Shlomi Fruchter, and Jack Parker-Holder from the Project Genie team join host Logan Kilpatrick for an in-depth discussion on Google DeepMind’s latest breakthrough in world models. Project Genie is an experimental research prototype that allows users to generate, explore, and interact with infinitely diverse, photorealistic worlds in real-time. Learn more about the shift from passive video generation to interactive media, the technical challenges of maintaining world consistency and memory, and how these models serve as an essential training ground for AI agents.
In 6-12 months they will announce another really cool tech demo. And so on.
They have been doing this for decades. To us this seems like the starting point of something really cool. To them it's a delivery; finally time to move on to something else.
The only test I ever want to see with these frame-gen models is a full 1080 degree camera spin. Miss me with that 30 degree back and forth crap. I want multiple full turns. Some jitter and a little back-and-forth wobble is fine. But I want multiple full spins.
I’m pretty sure I know why they don’t do this :)
> a full 1080 degree camera spin
Do you mean 3 full turns, or do you mean 180 (one half-turn)?
Like I want to take my skyrim character, drop it into Diablo 2, then drop Diablo (the demon) into Need for Speed, then have my need for speed car show up on another planet and upgrade it into a space ship, while the space ship takes me to fight some mega aliens. All while offering a coherent & unique experience. As you play, the game saves major forks in your story & game genre, so you can invite/share your game recipe with other humans to enjoy.
Also, when are we getting a new Spore game? That game is a sleeping giant waiting to be awakened.
Notice we didn’t see the cat go behind the couch. Maladaptive cut.
They also don't mention that the longer it runs, the more it hallucinates; past around 15 seconds it degrades into a garbled mess.
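That degradation over time is the classic compounding-error problem of autoregressive models: each frame is predicted from the model's own previous output, so small per-step errors accumulate. A toy rollout with a slightly-wrong one-step model shows the drift (all numbers made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def true_step(x):
    return 0.99 * x + 0.1

def learned_step(x):
    # An almost-right one-step model: small systematic bias plus noise,
    # as any learned frame predictor will have.
    return 0.99 * x + 0.1 + 0.01 + rng.normal(0.0, 0.005)

x_true = x_model = 1.0
errors = []
for _ in range(300):
    x_true = true_step(x_true)
    x_model = learned_step(x_model)   # autoregressive: eats its own output
    errors.append(abs(x_model - x_true))

# Early steps track the truth closely; by the end the rollout has drifted.
```

A one-step error of 1% is invisible frame to frame, but the open-loop rollout ends up far from the true trajectory, which is roughly what the garbling after ~15 seconds looks like.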
Nobody wants to be responsible for writing the film they're watching, as they watch it. Nobody wants to engage with a virtual world that has no story, narrative, meaning, purpose.
What is the actual use case for this, once we move beyond the tech demo stage?
The idea it will wholesale replace existing 3D graphics/rendering pipelines and processes any time soon seems so far-fetched to me (even just logistically) that I can't wrap my head around what people are thinking here.
This strikes me as a fancy parlor trick. Unless I'm mistaken, we need proper world models to do the things people are erroneously assuming this to be capable of one day.
I really don't want games to turn into a soulless, AI generated mush. This could also bring a huge amount of slop-generated content flooding the game market.
On the other hand I think this could be very cool from some specific use cases like outdoor scenarios in simulators. I've always wanted a game like Euro Truck Simulator where I can drive a car around a whole 1:1 representation of a country and this might just allow that, obviously I don't care about an accurate representation of every building or tree or hallucinations, just for it to be believable enough.
I wonder if it can be integrated into already existing engines though, because it seems like a big stretch to write actual game logic as an LLM prompt.
Try it in Google Labs: https://labs.google/projectgenie
(Project Genie is available to Google AI Ultra subscribers in the US, ages 18+.)