This guy a month ago for example: https://youtu.be/SGJC4Hnz3m0
The game is called "Explorers' Guild", or "xg" for short. It's easier for Claude to act as a player than a director (xg's version of a dungeon master or game master), again mainly because of permanence and learning issues, but to the extent that I can help it past those issues it's also fairly good at acting as a director. It does require some pretty specific instructions in the system prompt to, for example, avoid confabulating things that don't fit the world or the scenario.
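To give a feel for what I mean, here's a minimal sketch using the Anthropic Python SDK. The game name "xg" is from above, but the prompt wording, the scenario-notes structure, and the model name are my own illustrative assumptions, not my actual prompt:

```python
# A hypothetical anti-confabulation guardrail for a director-mode system
# prompt. The prompt text and scenario format are invented for illustration.
import anthropic

SYSTEM_PROMPT = """You are the director of an Explorers' Guild (xg) session.
Rules:
- Only reference locations, NPCs, and items listed in the scenario notes below.
- If players ask about something not in the notes, improvise SMALL details
  (flavor, dialogue) but never invent new factions, magic, or technology.
- When unsure whether something fits the world, ask the players rather than
  asserting it as fact.

Scenario notes:
{scenario_notes}
"""

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder; any recent model works
    max_tokens=1024,
    system=SYSTEM_PROMPT.format(scenario_notes="..."),
    messages=[{"role": "user", "content": "We enter the ruined lighthouse."}],
)
print(response.content[0].text)
```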
But to really build a version of xg on Claude it needs better ways to remember and improve what it has learned about playing the game, and what it has learned about a specific group of players in a specific scenario as it develops over time.
I don't have access to the DeepMind demo, but from the video it looks like it takes the idea up a notch.
(I don't know the exact lineage of these ideas, but a general observation is that it's a shame that it's the norm for blog posts / indie demos [1][2] to not get cited.)
[1] https://news.ycombinator.com/item?id=43798757
[2] https://madebyoll.in/posts/world_emulation_via_dnn/demo/
If making games out of these simulations works, it'd be the end for a lot of big studios, and might be a renaissance for small-to-one-person game studios.
There's obviously something insanely impressive about these google experiments, and it certainly feels like there's some kind of use case for them somewhere, but I'm not sure exactly where they fit in.
If I am wrong, then the huge supply of fun games will completely saturate demand, making it no easier for indie game devs to stand out.
You COULD create a sailing sim but after ten minutes you might be walking on water, or in the bath, and it would use more power than a small ferry.
There's no way this tech can run on a PS5 or anything close to it.
I mean, if making a game eventually boils down to cooking up a sufficient prompt (which, to be clear, won't be plain text; these prompts will probably be more like video databases), then I'm not sure it will be a renaissance for "one person game studios" any more than AI image generation has been a renaissance for "one person artists".
I want to be optimistic, but it's hard to deny the massive distribution stranglehold the media publishing landscape has, and that has nothing to do with technology.
The goal of world models like Genie is to be a way for AI and robots to "imagine" things. Then, they could practice tasks inside of the simulated world or reason about actions by simulating their outcome.
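A minimal sketch of what "reasoning about actions by simulating their outcome" looks like: random-shooting planning over a learned dynamics model. The `world_model` and `reward` functions here are toy stand-ins I made up, not anything from Genie's actual system:

```python
# Plan by "imagining" candidate action sequences inside a learned world model
# and picking the best one. The dynamics and reward are toy placeholders.
import numpy as np

def world_model(state: np.ndarray, action: np.ndarray) -> np.ndarray:
    """Learned dynamics: predict the next state. Stubbed as a linear model."""
    return 0.99 * state + 0.1 * action

def reward(state: np.ndarray) -> float:
    """Task reward, e.g. negative distance to a goal at the origin."""
    return -float(np.linalg.norm(state))

def plan(state: np.ndarray, horizon: int = 10, n_candidates: int = 256) -> np.ndarray:
    """Roll out n_candidates random action sequences entirely in the model,
    then return the first action of the best-scoring sequence."""
    best_score, best_first_action = -np.inf, None
    for _ in range(n_candidates):
        actions = np.random.uniform(-1, 1, size=(horizon, state.shape[0]))
        s, score = state, 0.0
        for a in actions:  # this rollout happens purely "in imagination"
            s = world_model(s, a)
            score += reward(s)
        if score > best_score:
            best_score, best_first_action = score, actions[0]
    return best_first_action

print(plan(np.array([3.0, -2.0])))
```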
I think he's lucky he got out with his reputation relatively intact.
When the right move (strategically, economically) is to not compete, the head of the AI division acknowledging the above and deciding to focus on the next breakthrough seems absolutely reasonable.
Genie looks at the video and learns, "when this group of pixels looks like this and the user presses 'jump', I render that group differently in this way in the next frame."
Genie is an artist drawing a flipbook. To tell you what happens next, it must draw the page. If it doesn't draw it, the story doesn't exist.
JEPA is a novelist writing a summary. To tell you what happens next, it just writes "The car crashes." It doesn't need to describe what the twisted metal looks like to know the crash happened.
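The contrast is just in the prediction target. A toy sketch, with made-up shapes and untrained modules that look nothing like either real architecture:

```python
# Two prediction targets: every pixel of the next frame vs. a compact latent.
import torch
import torch.nn as nn

FRAME = 3 * 64 * 64   # toy 64x64 RGB frame, flattened
LATENT = 128          # toy latent summary size

# Genie-style: given the current frame and an action, draw every pixel of
# the next frame ("draw the page").
genie_step = nn.Linear(FRAME + 8, FRAME)

# JEPA-style: predict only a compact latent summary of the next state
# ("the car crashes"), never rendering pixels.
encoder = nn.Linear(FRAME, LATENT)
jepa_step = nn.Linear(LATENT + 8, LATENT)

frame = torch.randn(1, FRAME)
action = torch.randn(1, 8)

next_frame = genie_step(torch.cat([frame, action], dim=-1))       # 12288 numbers
next_latent = jepa_step(torch.cat([encoder(frame), action], -1))  # 128 numbers
print(next_frame.shape, next_latent.shape)
```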
The deadness you're talking about is there in procedural worlds too, and it stems from the fact that there's not actually much "there." Think of it as a kind of illusion or a magic trick with math. It replicates some of the macro structure of the world but the true information content is low.
Search YouTube for procedural landscape examples. Some of them are actually a lot more visually impressive than this, but without the interactivity. It's a popular topic in the demo scene too where people have made tiny demos (e.g. under 1k in size) that generate impressive scenes.
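To see how little is actually "there", here's the whole trick in a few lines: a handful of summed sine waves at doubling frequencies reads as endless terrain, but the entire world compresses down to these constants. (A toy of my own, not any particular demo.)

```python
# Fractal-ish terrain from a few octaves of sines: rich-looking macro
# structure, tiny true information content.
import math

def height(x: float, z: float) -> float:
    h, freq, amp = 0.0, 0.05, 8.0
    for _ in range(4):          # four octaves, each finer and fainter
        h += amp * math.sin(freq * x) * math.cos(freq * z)
        freq *= 2.0
        amp *= 0.5
    return h

# Render a tiny ASCII contour map of the "landscape".
chars = " .:-=+*#%@"
for z in range(20):
    row = ""
    for x in range(60):
        h = height(x * 1.7, z * 1.7)
        row += chars[min(len(chars) - 1, max(0, int((h + 15) / 30 * len(chars))))]
    print(row)
```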
I expect to see generative AI techniques like Genie's show up in games, though it might take a while given their high computational cost compared to traditional procedural generation.
"Sure it can write a single function but the code is terrible when it tries to write a whole class..."
Look at how much prompting it takes to vibe code a prototype. And they want us to think we'll be able to prompt a whole world?
This is only a useful premise if it can do any of those things accurately, as opposed to dreaming up something kinda plausible based on an amalgam of vaguely related YouTube videos.
What's the use? Current scientific models clearly showing natural disasters and how to prevent them are being ignored. Hell, ignoring scientific consensus is a fantastic political platform.
> Why are they not training models to help write games instead?
Genie isn't about making games... Granted, for some reason they don't put this at the top. Classic Google, not communicating well...

> It simulates physics and interactions for dynamic worlds, while its breakthrough consistency enables the simulation of any real-world scenario — from robotics and modelling animation and fiction, to exploring locations and historical settings.
The key part is simulation. That's what they are building this for. Ignore everything else. Same with Nvidia's Earth 2 and Cosmos (and, to a lesser extent, Isaac). Games or VR environments are not the primary drive; the primary drive is training robots (including non-humanoids, such as Waymo) and just getting the data. It's exactly because of this that perfect physics (or, let's be honest, realistic physics[0]) isn't actually the point. Getting 50% of the way there in simulation really does cut down the costs of development, even if we recognize that the cost steepens as we approach "there". I really wish they didn't call them "world models", or more specifically didn't shove the word "physics" in there, but hey, is it really marketing if they don't claim a golden goose can lay not only actual gold eggs but also diamonds, and that its honks cure cancer?
[0] Looking right does not mean it is right. Maybe it'll match your intuition or your undergrad general-physics-with-calculus classes, but talk to a real physicist if you doubt me here. Even one with just an undergrad degree will tell you this physics is unrealistic, and anyone worth their salt will tell you how unintuitive physics becomes as you get realistic, even well before approaching quantum. Go talk to the HPC folks and ask them why they need supercomputers... Sorry, physics can't be done from observation alone.
- https://youtu.be/15KtGNgpVnE?si=rgQ0PSRniRGcvN31&t=197 walking through various cities
- https://x.com/fofrAI/status/2016936855607136506 helicopter / flight sim
- https://x.com/venturetwins/status/2016919922727850333 space station, https://x.com/venturetwins/status/2016920340602278368 Dunkin' Donuts
- https://youtu.be/lALGud1Ynhc?si=10ERYyMFHiwL8rQ7&t=207 simulating a laptop computer, moving the mouse
- https://x.com/emollick/status/2016919989865840906 otter airline pilot with a duck on its head walking through a Rothko inspired airport
I mean, yes, the probability of having that level of tech within decades is quite high.
But the technology is moving very fast right now. It sounds crazy, but I think there's a 50% chance of having Ready Player One-level technology sooner than that.
It's absolutely possible it will take more time to become economical.
That is not the goal.
The purpose of world models like Genie is to be the "imagination" of next-generation AI and robotics systems: a way for them to simulate the outcomes of potential actions in order to inform decisions.
The whole reason for LLMs inferencing human-processable text, and "world models" inferencing human-interactive video, is precisely so that humans can connect in and debug the thing.
I think the purpose of Genie is to be a video game, but it's a video game for AI researchers developing AIs.
I do agree that the entertainment implications are kind of the research exhaust of the end goal.
When you simulate a stream of those latents, you can decode them into video.
If you were trying to make an impressive demo for the public, you probably would decode them into video, even if the real applications don't require it.
Converting the latents to pixel space also makes them compatible with existing image/video models and multimodal LLMs, which (without specialized training) can't interpret the latents directly.
I think robots imagining the next step (in latent space) will be useful. It’s useful for people. A great way to validate that a robot is properly imagining the future is to make that latent space renderable in pixels.
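The pattern being described, sketched with untrained toy modules (none of this is Genie's real architecture): simulate forward in latent space, and decode to pixels only when a human needs to look.

```python
# Roll a world model forward in latent space; decode to video for debugging.
import torch
import torch.nn as nn

LATENT, ACTION = 128, 8

dynamics = nn.Linear(LATENT + ACTION, LATENT)   # learned world model (stub)
decoder = nn.Sequential(                        # latent -> 64x64 RGB frame
    nn.Linear(LATENT, 3 * 64 * 64), nn.Sigmoid()
)

z = torch.randn(1, LATENT)                      # current world state
latents = []
for t in range(16):                             # "imagine" 16 steps ahead
    a = torch.randn(1, ACTION)                  # some policy's action
    z = dynamics(torch.cat([z, a], dim=-1))
    latents.append(z)

# Decoding is optional: a robot could act on `latents` directly, but
# rendering them lets humans inspect what the model "imagined".
video = torch.stack([decoder(z).view(3, 64, 64) for z in latents])
print(video.shape)  # torch.Size([16, 3, 64, 64])
```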
It's not really as much of a boon as you'd think though, since throwing together a 3D model is not the bottleneck to making a sellable video game. You've had model marketplaces for a long time now.
Try it in Google Labs: https://labs.google/projectgenie
(Project Genie is available to Google AI Ultra subscribers in the US, ages 18+.)