Reminds me of when image AIs weren't able to generate text. It wasn't too long until they fixed it.
What I’d really love to see more of is augmented video. Like, the stormtrooper vlogs. Runway has some good stuff but man is it all expensive.
Walking/running/steps have already been solved pretty well with NNs, but simulation of vehicle engines and vehicle physics has not, at least not to my knowledge. I suspect iRacing would be extremely interested in such a model.
Edit: I take it back, PINNs are a thing and now I have a new rabbit hole…
I don't think Humans are the target market for this model, at least right now.
Sounds like the use case is creating worlds for AI agents to play in.
I DECLARE BANKRUPTCY vibes here
It's a whitepaper release to share the SOTA research. This doesn't seem like an economically viable model, nor does it look polished enough to be practically usable.
We know how James Webb works and it's developed by an international consortium of researchers. One of our most trusted international institutions, and very verifiable.
We do not know how Genie works, it is unverifiable to non-Google researchers, and there are not enough technical details to move external teams forward much. Worst case, this page could be a total fabrication intended to derail competition by lying about what Google is _actually_ spending their time on.
We really don't know.
I don't say this to defend the other comment and say you're wrong, because I empathize with both points. But I do think that treating Google with total credulity would be a mistake, and the James Webb comparison is a disservice to the JW team.
I would actually turn that around. The Telescope is released. It's flying around up there taking photos. If they kept it in some garage while releasing flashy PR pages about how groundbreaking it is, then I'd be pretty skeptical.
The main product of the telescope is its data, not the ability for anyone to play with the instruments.
The main product of the model is the ability for anyone to play with it.
Strange rebuttal.
…and this is the worst the capabilities will ever be.
Watching the video created a glimmer of doubt that perhaps my current reality is a future version of myself, or some other consciousness, that’s living its life in an AI hallucinated environment.
Personal jetpacks are the worst they’ll ever be. Doesn’t mean they’re anywhere close to being useful.
Your comparison is incorrect
Not sure if that's what you are trying to say about AI, or not.
Have they become better over the past 20 years?
> …and this is the worst the capabilities will ever be.
I guess if this bothers you (and I can see how it might) you can take some small comfort in thinking that (due to enshittification) this could in fact be the _best_ the capabilities will ever be.
[1]: https://www.worldlabs.ai/
[3]: https://runwayml.com/research/introducing-general-world-mode...
- Google search
- Web browsers
- Web content
- Internet Explorer
- Music
- Flight process at Mosul airport
- Star Wars
(Also, implying that music has gotten worse is a boomer-ass take. It might not be to your liking, but there's more of it than ever before, and new sonic frontiers are being discovered every day.)
And then you watched Mandalorian and Andor?
Jokes aside, Google Search results are worse thanks to so much web content being just ad scaffolding, but the interesting one here is music.
Music is typically imagined to be its best at whatever ages one most listened to it, partly trained in and partly thanks to meanings/memories/nostalgia attached to it. As a consequence, for most everyone, more recent music seems to be “getting worse”!
That said, and back to the SEO effect on Google Results, I'd argue mass distribution/advertising/marketing has resulted in most audio airtime getting objectively* less complex, but if one turns off the mass distribution, and looks around, there seems to be plenty of just as good — even building on what came before — music to be found.
* https://www.researchgate.net/publication/387975100_Decoding_...
Your brain, compared to the sensory richness of the reality you experience around you, has very limited direct inputs from the outside world; it must construct a rich internal model from them.
Due to this physical limitation, what you 'see' in front of you, widely accepted as ground-truth reality, cannot possibly be real: it's a hallucination produced by your brain.
It's very weird (at least to me) that the boundary between reality and assumption (basically educated guessing) is very arbitrary, and definitely only exists in our heads.
The next step is to realize that, if life is a cheap simulation, not everyone might have... uh... fully simulated minds. Player Characters vs NPCs is what gamers would say, though it doesn't have to be binary like that, and the term NPC has already been ruined by social media rants. (Also, NPC is a bad insult because most of the coolest characters in games are NPC rivals or bosses or whatnot.)
> Genie 3’s consistency is an emergent capability
So this just happened from scaling the model, rather than being a consequence of deliberate architecture changes?
Edit: here is some commentary on limitations from someone who tried it: https://x.com/tejasdkulkarni/status/1952737669894574264
> - Physics is still hard and there are obvious failure cases when I tried the classical intuitive physics experiments from psychology (tower of blocks).
> - Social and multi-agent interactions are tricky to handle. 1vs1 combat games do not work
> - Long instruction following and simple combinatorial game logic fails (e.g. collect some points / keys etc, go to the door, unlock and so on)
> - Action space is limited
> - It is far from being a real game engines and has a long way to go but this is a clear glimpse into the future.
Even with these limitations, this is still bonkers. It suggests to me that world models may have a bigger part to play in robotics and real world AI than I realized. Future robots may learn in their dreams...
Unbelievable. How is this not a miracle? So we're just stumbling onto breakthroughs?
It's basically what every major AI lab head has been saying from the start. It's the peanut gallery that keeps saying they are lying to get funding.
We had one breakthrough a couple of years ago with GPT-3, where we found that neural networks / transformers + scale does wonders. Everything else has been a smooth continuous improvement. Compare today's announcement to Genie-2[1] release less than 1 year ago.
The speed is insane, but not surprising if you put it in the context of how fast AI is advancing. Again, nothing _new_. Just absurdly fast continuous progress.
[1] - https://deepmind.google/discover/blog/genie-2-a-large-scale-...
We don't inherit any software, so cognitive function must bootstrap itself from its underlying structure alone.
Hardware and software, as metaphors applied to biology, I think are better understood as a continuum than a binary, and if we don't inherit any software (is that true?), we at least inherit assembly code.
To stay with the metaphor, DNA could be rather understood as firmware that runs on the cell. What I mean with software is the 'mind' that runs on a collection of cells. Things like language, thoughts and ideas.
There is also a second level of software that runs not on a single mind alone, but on a collection of minds, to form cliques or societies. But this is not encoded in genes, but in memes.
I think it's like Chomsky said, that we don't learn this infrastructure for understanding language any more than a bird "learns" their feathers. But I might be losing track of what you're suggesting is software in the metaphor. I think I'm broadly on board with your characterization of DNA, the mind and memes generally though.
> We don't inherit any software
I wonder, though. Many animal species just "know" how to perform certain complex actions without being taught the way humans have to be taught. Building a nest, for example. If you say that this is emergent from the "underlying structure alone", doesn't this mean that it would still be "inherited" software (though in this case, maybe we think of it like punch cards)?
But then you have things like language or societal customs that are purely 'software'.
A biological example that I like: the neural structures for vision develop almost fully formed from the very beginning. The state of our network at initialization is effectively already functional. I’m not sure to which extent this is true for humans, but it is certainly true for simpler organisms like flies. The way cells achieve this is through some extremely simple growth rules as the structure is being formed for the first time. Different kinds of cells behave almost independently of each other, and it just so happens that the final structure is a perfectly functional eye. I’ve seen animations of this during a conference talk and it was one of the most fascinating things I’ve ever seen. It truly shows how the complexity of a biological organism is just billions of times any human technology. And at the same time, it’s a beautiful illustration of the lack of intelligent design. It’s like watching a Lego assemble by just shaking the pieces.
How do you claim to know this?
Not to detract from what has been done here in any way, but it all seems entirely consistent with the types of progress we have seen.
It's also no surprise to me that it's from Google, who I suspect is better situated than any of its AI competitors, even if it is sometimes slow to show progress publicly.
I think this was the first mention of world models I've seen circa 2018.
This is based on VAEs though.
I suppose it depends what you count as "the start". The idea of AI as a real research project has been around since at least the 1950s. And I'm not a programmer or computer scientist, but I'm a philosophy nerd and I know debates about what computers can or can't do started around then. One side of the debate was that it awaited new conceptual and architectural breakthroughs.
I also think you can look at, say, Ted Talks on the topic, with guys like Jeff Hawkins presenting the problem as one of searching for conceptual breakthroughs, and I think similar ideas of such a search have been at the center of Douglas Hofstadter's career.
I think in all those cases, they would have treated "more is different" like an absence of nuance, because there was supposed to be a puzzle to solve (and in a sense there is, and there has been, in terms of vector space and back propagation and so on, but it wasn't necessarily clear that physics could "pop out" emergently from such a foundation).
[1] http://www.incompleteideas.net/IncIdeas/BitterLesson.html
Kind of like how a single neuron doesn't do much, but connect 100 billion of them and well...
So prescient. I definitely think this will be a thing in the near future ~12-18 months time horizon
What's with this insane desire for anthropomorphism? What do you even MEAN learn in its dreams? Fine-tuning overnight? Just say that!
> What's with this insane desire for anthropomorphism?
Devil's advocate: Making the assumption that consciousness is uniquely human, and that humans are "special" is just as ludicrous. Whether a computational medium is carbon-based or silicon-based seems irrelevant. Call it "carbon-chauvinism".
Since consciousness is closely linked to being a moral patient, it is all the more important to err on the side of caution when denying qualia to other beings.
No-one cares. It's just terminology.
A neural net can produce information outside of its original data set, but it is all directly derived from that initial set. There are fundamental information constraints here. You cannot use a neural net to generate, from its existing data set, wholly new and original full-quality training data for itself.
You can use a neural net to generate data, and you can train a net on that data, but you'll end up with something which is no good.
Are you sure? I've been ingesting boatloads of high definition multi-sensory real-time data for quite a few decades now, and I hardly remember any of it. Perhaps the average quality/diversity of LLM training data has been higher, but they sure remember a hell of a lot more of it than I ever could.
The LLM has plenty of experts and approaches etc.
Give it tool access, let it formulate its own experiments, etc.
The only question here is if it becomes a / the singularity because of this, gets stuck in some local minimum or achieves random perfection and random local minimum locations.
There is an uncountably large number of models that perfectly replicate the data they're trained on; some generalize out of distribution much better. Something like dreaming might be a form of regularization: experimenting with simpler structures that perform equally well on training data but generalize better (e.g. by discovering simple algorithms that reproduce the data equally well as pure memorization but require simpler neural circuits than the memorizing circuits).
Once you have those better generalizing circuits, you can generate data that not only matches the input data in quality but potentially exceeds it, if the priors built into the learning algorithm match the real world.
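To make that concrete, here is a toy sketch (numpy only, not from any paper): fit the same noisy training points with an unregularized high-degree polynomial and a ridge-regularized one. Both match the training data, but the "simpler" regularized fit holds up much better just outside the training range.

```python
# Toy illustration of "many models fit the data, the simpler one generalizes":
# compare a memorizing polynomial fit with a ridge-regularized one.
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.uniform(-1, 1, 15)
y_train = np.sin(3 * x_train) + 0.05 * rng.normal(size=15)

def features(x, degree=14):
    # Degree-14 polynomial features: enough capacity to memorize 15 points.
    return np.vander(x, degree + 1)

A = features(x_train)
w_memorize = np.linalg.lstsq(A, y_train, rcond=None)[0]      # no regularization
lam = 1e-3
w_simple = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y_train)  # ridge

x_test = np.linspace(-1.2, 1.2, 200)   # slightly outside the training range
y_test = np.sin(3 * x_test)
for name, w in [("memorizing fit", w_memorize), ("regularized fit", w_simple)]:
    train_mse = np.mean((features(x_train) @ w - y_train) ** 2)
    test_mse = np.mean((features(x_test) @ w - y_test) ** 2)
    print(f"{name}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
```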
We have truly reached peak hackernews here.
I.e. if the simulation has enough videos of firefighters breaking glass where it seems to drop instantaneously and in the world sim it always breaks, a firefighter robot might get into a problem when confronted with unbreakable glass, as it expects it to break as always, leading to a loop of trying to shatter the glass instead of performing another action.
He seems too enthusiastic to me, to the point that I feel Google asked him in particular because they trusted him to write very positively.
I've been thinking about this a while and it's obvious to me:
Put Minecraft (or something similar) under the hood. You just need data structures to encode the world and to enable mutation, location, and persistence.
If the model is given additional parameters such as a "world mesh", then it can easily persist where things are, what color or texture they should be, etc.
That data structure or server can be running independently on CPU-bound processes. Genie or whatever "world model" you have is just your renderer.
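A rough sketch of that split, with all names hypothetical (there is no such API today): the authoritative world state lives in ordinary CPU-side data structures, and a Genie-like model would only be asked to turn that state plus a camera into pixels.

```python
# Sketch of the proposed split (hypothetical, not a real API): world state is
# plain data on the CPU; the generative model is only used as a renderer.
from dataclasses import dataclass, field

@dataclass
class Block:
    position: tuple         # (x, y, z) voxel coordinates
    material: str           # e.g. "mud", "stone"
    texture_hint: str = ""  # optional prompt fragment for the renderer

@dataclass
class WorldState:
    blocks: dict = field(default_factory=dict)    # (x, y, z) -> Block
    players: dict = field(default_factory=dict)   # player_id -> (x, y, z)

    def set_block(self, pos, material):
        self.blocks[pos] = Block(position=pos, material=material)

    def dig(self, pos):
        # Mutation and persistence are handled here, not by the model.
        self.blocks.pop(pos, None)

def frame_prompt(state: WorldState, camera_pos, camera_dir) -> str:
    """Serialize nearby state into conditioning for a neural renderer."""
    nearby = [b for b in state.blocks.values()
              if sum((a - c) ** 2 for a, c in zip(b.position, camera_pos)) < 32 ** 2]
    return (f"camera at {camera_pos} facing {camera_dir}; blocks: "
            + "; ".join(f"{b.material}@{b.position}" for b in nearby))

# A hypothetical world_model.render(frame_prompt(...)) would produce the frame;
# everything else stays deterministic, cheap to persist, and shareable.
```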
It probably won't happen like this due to monopolistic forces, but a nice future might be a future where you could hot swap renderers between providers yet still be playing the same game as your friends - just with different looks and feels. Experiencing the world differently all at the same time. (It'll probably be winner take all, sadly, or several independent vertical silos.)
If I were Tim Sweeny at Epic Games, I'd immediately drop all work on Unreal Engine and start looking into this tech. Because this is going to shore them up on both the gaming and film fronts.
I think in this context, it could be amazing for game creation.
I’d imagine you would provide item descriptions to vibe-code objects and behavior scripts, set up some initial world state (maps) populated with objects made of objects - hierarchically vibe-modeled - make a few renderings to give inspirational world-feel and textures, and vibe-tune the world until you had the look and feel you want. Then once the textures, models, and world were finalised, it would be used as the rendering context.
I think this is a place where there are enough feedback loops and supervision that, with decent tools along these lines, you could 100x the efficiency of game development.
It would blow up the game industry, but also spawn a million independent one or two person studios producing some really imaginative niche experiences that could be much, much more expansive (like a AAA title) than the typical indie-studio product.
> It would blow up the game industry, but also spawn a million independent one or two person studios producing some really imaginative niche experiences that could be much, much more expansive (like a AAA title) than the typical indie-studio product.
All video games become Minecraft / Roblox / VRChat. You don't need AAA studios. People can make and share their own games with friends.
Scary realization: YouTube becomes YouGame and Google wins the Internet forever.
I've seen Roblox's creative tools, even their GenAI tools, but they're bolted on. It's the steam powered horse problem.
https://en.wikipedia.org/wiki/Manufacturing_Consent#:~:text=...
https://www.goodreads.com/book/show/12617.Manufacturing_Cons...
Though this is often associated with his and Herman's "Propaganda Model," Chomsky has also commented that the same appears in scholarly literature, despite the overt propaganda forces of ownership and advertisement being absent:
https://en.wikipedia.org/wiki/Propaganda_model#:~:text=Choms...
The lead in to the quote starts at https://youtu.be/GjENnyQupow?t=662
"I don't say you're self-censoring - I'm sure you believe everything you're saying; but what I'm saying is, if you believed something different, you wouldn't be sitting where you're sitting." -- Noam Chomksy to Andrew Marr
& you're basically seeing GPT-3 and saying it will never be used in any serious application.. the rate of improvement in their model is insane
Use the CPU and RAM for world state, then pass it off to the model to render.
Regardless of how this is done, Unreal Engine with all of its bells and whistles is toast. That C++ pile of engineering won't outdo something this flexible.
I think this puts Epic Games, Nintendo, and the whole lot into a very tough spot if this tech takes off.
I don't see how Unreal Engine, with its voluminous and labyrinthine tomes of impenetrable legacy C++ code, survives this. Unreal Engine is a mess, gamers are unhappy about it, and it's a PITA to develop with. I certainly hate working with it.
The Innovator's Dilemma is fast approaching the entire gaming industry, and they don't even see it coming; it's happening that fast.
Exciting that building games could become as easy as having the idea itself. I'm imagining something like VRChat or Roblox or Fortnite, but where new things are simply spoken into existence.
It's absolutely terrifying that Google has this much power.
This is 100% going to happen on-device. It's just a matter of time.
Maybe just as a kind of DLSS on steroids, where the engine only renders very simple objects and a world model translates these into the actual graphics.
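A minimal sketch of that "DLSS on steroids" loop, with hypothetical functions standing in for both sides: the engine keeps doing geometry and game logic, and a neural image-to-image pass is bolted onto the end of each frame.

```python
# Sketch of neural "final rendering" over a cheap engine pass (hypothetical).
import numpy as np

def rasterize_proxy(world_state, camera, res=(720, 1280)):
    """Engine side: cheap flat-shaded proxy render plus auxiliary buffers."""
    return {
        "color": np.zeros((*res, 3), dtype=np.uint8),  # flat-shaded proxies
        "depth": np.zeros(res, dtype=np.float32),       # depth buffer
        "ids":   np.zeros(res, dtype=np.int32),         # per-pixel object ids
    }

def enhance(buffers, style_prompt, prev_frame=None):
    """Stand-in for the neural pass that turns proxy buffers into final pixels.
    Conditioning on the previous output frame is one way to keep it stable."""
    return buffers["color"]

def game_loop(world_state, camera, n_frames=3):
    prev = None
    for _ in range(n_frames):
        buffers = rasterize_proxy(world_state, camera)
        prev = enhance(buffers, "overgrown ruins at dusk", prev_frame=prev)
    return prev
```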
https://kylekukshtel.com/diffusion-aaa-gamedev-doom-minecraf...
But even when I wrote that I thought things were still a few years out. I facetiously said that Rockstar would be nerd-sniped on GTA6 by a world model, which sounded crazy a few months ago. But seeing the progress already made since GameNGen and knowing GTA6 is still a year away... maybe it will actually happen.
I'm starting to think some of the names behind LLMs/GenAI are cover names for aliens and any actual humans involved have signed an NDA that comes with millions of dollars and a death warrant if disobeyed.
I'm having trouble parsing your meaning here.
GTA isn't really a "drive on the street simulator", is it? There is deliberate creative and artistic vision that makes the series so enjoyable to play even decades after release, despite the graphics quality becoming more dated every year by AAA standards.
Are you saying someone would "vibe model" a GTAish clone with modern graphics that would overtake the actual GTA6 in popularity? That seems extremely unlikely to me.
I despise the creative and artistic vision of GTA online, but I’m clearly in a minority there gauging by how much money they’ve made off it.
Anyways, crafting pretty-looking worlds is one thing, but you still need to fill them in with something worth doing, and that's something we haven't really figured out. That's one of the reasons why the sandbox MMORPG was developed as opposed to "themeparks". The underlying systems, the backend, are the real meat here. At most, the world models right now replace 3D artists and animators, but I would not say that is a real bottleneck in relation to one's own limitations.
Not for video games it isn’t.
I for one would love a video game where you're playing in a psychedelic, dream-like fugue.
It is not currently, or in the near term, realistic to make a video game where a meaningful portion of the simulation is part of the model.
There will probably be a few interactive model-first experiences. But they’ll be popular as short novelties, not as meaningful or long experiences.
A simple question to consider is how you would adjust a set of simple tunables in a model-first simulator. For example, giving the player more health, making enemies deal 2x damage, increasing move speed, etc. You cannot.
Reality is not composed of words, syntax, and semantics. A human modality is.
Other human modalities are sensory only, no language.
So vision learning and energy models that capture the energy to achieve a visual, audio, physical robotics behavior are the only real goal.
Software is for those who read the manual with their new NES game. Where are the words inside us?
Statistical physics of energy to make a machine draw the glyphs of language, not opinionated clustering of language, is what will close the keyboard and mouse input loop. We're, like, replicating human work habits. Those are real physical behaviors, not just descriptions in words.
https://www.theguardian.com/technology/2025/aug/05/google-st...
Gemini Robot launch 4 mo ago:
good writers will remain scarce though.
maybe we will have personalized movies written entirely through A.I
Actually that game felt a lot like these videos, because often you would turn around and then look back and the game had deleted the NPCs and generated new ones, etc.
I think some people want to play, and some want to experience, in different proportions. Tetris is the emanation of pure gameplay, but then you have to remember "Colossal Cave Adventure" is even older than Tetris. So there's a long history of both approaches, and for one of them, these models could be helpful.
Not that it matters. Until the models land in the hands of indie developers for long enough for them to prove their usefulness, no large developer will be willing to take on the risks involved in shipping things that have the slightest possibility of generating "wrong" content. So, the AI in games is still a long way off, I think.
You must be young. As people get older they (usually) care less about that.
Yes.
> No most people are testing skills in video games
That's not mutually exclusive with playing for scenery.
Games, like all art, have different communities that enjoy them for different reasons. Some people do not want their skills tested at all by a game. Some people want the maximum skill testing. Some want to experience novel fantasy places, some people want to experience real places. Some people want to tell complex weaving narratives, some people want to optimize logistics.
A game like Flower is absolutely a game about looking at pretty scenery and not one about testing skill.
At the limit, if you could stay engaged you would be an expert in pretty much anything.
"It doesn't help students understand why things happened, and what the consequences were and how they have impacted the rest of history of the modern world." I would say the opposite, let's recreate each step in that historical journey so you can see exactly what the concequenses were, exactly why they happened and when.
That's an insane product right there just waiting to happen. Too bad Google sleeps so hard on the tech they create.
You'd have some "please wait in this lobby space while we generate the universe" moments, but those are easy to hide with clever design.
But once you can get N cameras looking at the same world-state, you can make them N players, or a player with 2 eyes.
Really great work though, impressive to see.
1. You can see fine textures "jump" every 4 frames - which means they're most likely using a 4x-temporal-downscaling VAE with at least 4-frame interaction latency (unless the VAE is also control-conditional). Unfortunately I didn't see any real-time footage to confirm the latency (at one point they intercut screen recordings with "fingers on keyboard" b-roll? hmm).
2. There's some 16x16 spatial blocking during fast motion which could mean 16x16 spatial downscaling in the VAE. Combined with 1, this would mean 24x1280x720/(4x16x16) = 21,600 tokens per second, or around 1.3 million tokens per minute (arithmetic spelled out in the snippet below).
3. The first frame of each clip looks a bit sharper and less videogamey than later stationary frames, which suggests this could be a combination of a text-to-image + image-to-world system (where the t2i system is trained on general data but the i2w system is finetuned on game data with labeled controls). Noticeable in e.g. the dirt/textures in [2]. I still noticed some trend towards more contrast/saturation over time, but it's not as bad as in other autoregressive video models I've seen.
[1] https://x.com/demishassabis/status/1940248521111961988
[2] https://deepmind.google/api/blob/website/media/genie_environ...
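Spelling out the token-rate arithmetic from point 2, under the assumed 4x temporal and 16x16 spatial downscaling at 24 fps, 1280x720 (none of this is confirmed by DeepMind):

```python
fps, width, height = 24, 1280, 720
temporal_ds, spatial_ds = 4, 16

tokens_per_second = fps * width * height // (temporal_ds * spatial_ds * spatial_ds)
print(tokens_per_second)       # 21600
print(tokens_per_second * 60)  # 1296000, i.e. ~1.3M tokens per minute
```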
[1] https://x.com/holynski_/status/1952756737800651144
[2] https://togelius.blogspot.com/2025/08/genie-3-and-future-of-...
so better than Stadia?
But if you believe reality is a simulation, why would these “efficient” world-generation methods convince you of anything? The tech our reality would have to be running on is still inconceivable science fiction.
Not like this we haven't. This is convincing because I can have any of you close your eyes and imagine a world where pink rabbits hand out parking tickets. We're a neurolink away from going from thought > to prompt > to fantasy.
To add: our reality does not have to be rendered in its entirety; we'll just have very convincing and unscripted first-person view simulations. Only what you look at is getting rendered (e.g. tiny structures only get rendered when you use a microscope).
Obviously, none of these are super viable given the low accuracy and steerability of world models out today, but positive applications for this kind of tech do exist.
Also (I'm speculating now instead of restating the video), I think pretty soon someone will hook up a real time version of this to a voice model, and we will get some kind of interactive voice + keyboard (or VR) lucid dream experience.
/s
It's a nice step towards gains in embodied AI. Good work, DeepMind.
Sora was described very similarly to this, as a "world simulator", but ultimately that never materialized.
This one is a bit more hopeful from the videos though.
Creativity is being taken from us at an exponential rate. And I don't buy the argument from people who say they are excited to live in this age. I could get that if this technology stopped at its current state and remained just a tool for our creative endeavours, but that doesn't seem to be the endgame here. Instead it aims to be a complete replacement.
Granted, you can say "you can still play musical instruments/paint pictures/etc. for yourself", but I don't think there was ever a period of time when creative works were created just for their own sake rather than for sharing them with others en masse.
So what is the final state here for us? A return to menial, not-yet-automated work? And when that is eventually automated, what's left? Plugging our brains into personalized autogenerated worlds tailored to trigger the relevant neuronal circuitry, producing ever-increasing dopamine levels until our brains finally burn out (which is arguably already happening with TikTok-style leisure)? And how are you supposed to pay for that if all work is automated? How is the economics of that supposed to work?
Looks like a pretty decent explanation of the Fermi paradox: no one would know how the technology works, there are no easily available resources left to make use of simpler tech, and the planet is littered to the point of no return.
How to even find the value in living given all of that?
With business as usual capital is power and capital is increasingly getting centralized.
Work is a fundamental part of society and will never be eliminated, regardless of its utility/usefulness. The caste/class system determines the type of work. The amount (time) of work is set, as it was discovered that additional leisure and reducing work do not improve individuals' happiness.
1. Universal Basic Income as we're on the way to a post-scarcity society. Unlikely to actually happen due to greed.
2. We take inspiration from the French Revolution and then return to a simpler time.
Luigi Mangione has shown that all it takes is one person in the right time and place to remove some evil from the world.
Greed makes no sense in a truly post-scarcity society. There is no scarcity that would let you take from another in a zero-sum way.
Status is the real issue. Humans use status to select sexually, and the display is both competitive and comparative. It doesn't matter absolutely how many pants you have, only that you have more and better than your competition.
I actually think this thing is baked into our DNA and until sex itself is saturated (if there is such a thing), or DNA is altered, we will continue to have a however subtle form of competition undergirding all interactions.
>Vote for me and we'll hand free money to everyone and the robots will do the work
The problem at the moment is that the robots doing the work don't exist. Things will change when they do.
It's not. We will be replaced, but the AI will carry on.
a lot of these comments border on cult thinking. it's a fucking text to 3D image model, not R Daneel Olivaw, calm down
I'll concede that it might take even longer to get full artificial human capabilities (robust, self-repairing, self-replicating, adaptable), but the writing is on the wall.
Even the very best case that I see (non-malicious AI with a soft practical ceiling not too far beyond human capabilities) poses giant challenges for our whole society, in resource allocation alone (because people, as workers, become practically worthless, undermining our whole system completely).
I don't want to live in a world where these things are generated cheaply and easily for the profit of a very select few group of people.
I know the world doesn't work like I described in the top paragraph. But it's a lot closer to it than the bottom.
There will be two classes of media:
- Generated, consumed en-masse by uncreative, uninspired individuals looking for cheap thrill
- Human created, consumed by discerning individuals seeking out real human talent and expression. Valuing it based merely on the knowledge that a biological brain produced (or helped produce) it.
I tend to suspect that the latter will grow in value, not diminish, as time progresses
People said the world could literally end if we trained anything bigger than GPT-4... I would take these projections with a handful of salt.
There’s no bright line between computer and human-created video - computer tools are used everywhere.
Rewarded how? 99.99% of people who do things like sports or artistic like writing never get "rewarded for doing so", at least in the way I imagine you mean the phrase. The reward is usually the experience itself. When someone picks up a ball or an instrument, they don't do so for some material reward.
Why should anyone be rewarded materially for something like this? Why are you so hung up on the <0.001% who can actually make some money now having to enjoy the activity more as a hobby than as a profession?
Why am I so "hung up" on the livelihood of these people?
Doing art as a hobby is a good in and of itself. I did not say otherwise. But when I see a movie, when I listen to a song, I want to appreciate the integrity and talent of the people who wrote them. I want them to get paid for that enjoyment. I don't think that's bizarre.
That world has only existed for the last hundred or so years, and the talent is usually brutally exploited by people whose main talent is parasitism. Only a tiny percentage of people who sell creative works can make a living out of it; the living to be made is in buying their works at a premium, bundling them, and reselling them, while offloading almost all of the risk to the creative as an "advance."
Then you're left in a situation where both the buyer of art and the creator of art are desperate to pander to the largest audience possible because everybody is leveraged. It's a dogshit world that creates dogshit art.
A better example would be Spotify replacing artist-made music recommendations with low-quality alternatives to reduce what it pays to artists. Everyone except Spotify loses in this scenario.
The future with AI is not going to be our current world with some parts replaced by AI. It will be a whole new way of life.
Water cooler talk about what happened this week in M.A.S.H. or Friends is extinct.
Worse, in the long run even community may be synthesized. If a friend is meat or if they're silicon (or even carbon fiber!), does it matter if you can't tell the difference? It might to pre-modern boomers like me and you.
Virtual influencers might be a big thing, Hatsune Miku has lots of fans. But it's still a shared fandom.
For example, robot boxing: https://www.youtube.com/watch?v=rdkwjs_g83w
Most commercial artists are very much unknown, in the background. This is a different situation from sport
But it might also go the way of pottery, glass-making and weaving. They’re still around but extremely niche.
Numerous famous writers, painters, artists, etc counter this idea, Kafka being a notable example, whose significant works only came to light after his passing and against his will. This doesn't take away from the rest of your discussion point, but art always has and always will also exist solely for its own sake.
Synthetic data can be useful up to a certain point, but you can’t expect to have a better model on synthetic data alone indefinitely.
GDM's moat here is YouTube. They have a bazillion gameplay and whatever videos. But here it is.
The downside I can see is that most people will stop publishing content online for free, since these companies have absolutely no respect whatsoever for the humans who created the data they use.
- Because you enjoy it
- Because you get pats in the back from people you share it with
- Because you want to earn money from it
The 1st one will continue to be true in this dystopian AI-art future; the others, not so much.
And sincerely, I find that kind of human art, the kind that comes from a pure inner force, the more interesting one.
EDIT: list formatting
No it won’t, you’ll be too busy trying to survive off of what pittance is left for you to have any time to waste on leisure activities.
My only hope is that we could have created 100k nukes of monstrous yields but collectively decided not to. We instead created 10k smaller ones. We could have destroyed ourselves long ago but managed to avoid it.
If humans are not stretched to their limits, and are still able to be creative, then the tools will help us find our way through this infinite space.
AI will never be able to generate everything for us, because that means it will need infinite computation.
Humans have demonstrated time and again, even things beyond our experience can be explored by us; quantum mechanics for example. Humans find a way to map very complex subjects to our own experience using analogy. Maybe AI can help us go further by allowing us to do this on even more complex ideas.
Edit: left the page open for a while before responding, and the other person responded with basically the same thing within that time.
Similar to how synths meant we no longer need to play instruments by plucking strings, it hasn’t affected the higher-level creativity of creating music, only expanded it.
I can understand it's very interesting from a researcher's point-of-view (I'm a software dev who's worked adjacent to some ML researchers doing pipeline stuff to integrate models into software), but at the same time: Where are the robots to do menial work like clean toilets, kitchens, homes, etc?
I assume the funding isn't there? Or maybe working out algorithms for the best way to clean toilets is much less exciting to research than diffusion networks for image generation :)
also the billionaires have help so they don't give a shit if the menial stuff is automated or not. throw in a little misogyny by and large too; I saw a LinkedIn Lunatic in the wild (some C-level) saying laundry is already automated because laundry machines exist
fucking.. tell me you don't ever do the laundry without telling me. That guy's poor wife.
I wonder how advanced world models like Genie 3 would change the approach, if at all.
I sit and play guitar by myself all the time, I play for nobody but myself, and I enjoy it a lot. Your argument is absurd.
Kids do it all the time.
> So what is final state here for us?
Something I haven't seen discussed too much is taste - human tastes change based on what has come before. What we will care about tomorrow is not what we care about today.
It seems plausible to me that generative AI could get higher and higher quality without really touching how human tastes changes. That would leave a lot of room for human creativity IMO - we have shared experience in a changing world that seems very hard to capture with data.
And even so, music production has been a constant evolution of replacing prior technologies and making it easier to get into. It used to be gatekept by expensive hardware.
I wonder if mental exercises will move to the same category? Not necessarily a way to earn money, but something everybody does as a way of flourishing as a human.
Nothing can take away your ability to have incredible experiences, except if the robots kill us all.
For too long has humanity been collectively submerged in this hyper-consumption of the arts. We, our parents, and our grandparents have been getting bombarded by some form or other of artificial dopamine sweets - from videos to reels to xeets to "news" to ads to tunes to mainstream media - every second of the day, every single day. The kind of media consumption we have every day is something our forefathers would have been overwhelmed by within an hour. It is not natural.
This complete cheapening of the arts is finally giving us a chance to shed off this load for good.
"Nothing human makes it out of the near-future."
With UBI, probably. With a central government formed by our robot overlords. But why even pay us at that point?
Wow. What a picture! Here's an optimistic take, fwiw: Whenever we have had a paradigm shift in our ability to process information, we have grappled with it by shifting to higher-level tasks.
We tend to "invent" new work as we grapple with the technology. The job of a UX designer did not exist in 1970s (at least not as a separate category employing 1000s of people; now I want to be careful this is HN, so there might be someone on here who was doing that in the 70s!).
And there is capitalism -- if everyone has access to the best-in-class model, then no one has true edge in a competition. That is not a state that capitalism likes. The economics _will_ ultimately kick in. We just need this recent S-curve to settle for a bit.
I think we have a long way to go yet. Humanity is still in the early stages of its tech tree with so many unknown and unsolved problems. If ASI does happen and solves literally everything, we will be in a position that is completely alien to what we have right now.
> How to even find the value in living given all of that?
I feel like a lot of AI angst comes from people who place their self-worth and value on external validation. There is value in simply existing and doing what you want to do even if nobody else wants it.
We can use these to create entire virtual worlds, games, software that incorporates these, and to incorporate creativity and media into infinitely more situations in real life.
We can create massive installations that are not a single image but an endless video with endless music, and then our hand turns to stabilizing and styling and aestheticizing those exactly in line with our (the artist's) preferences.
Romanticizing the idea that picking at a guitar is somehow 'more creative' than using a DAW to create incredibly complex and layered and beautiful music is the same thing that's happening here, even if the primitives seem 'scarier' and 'bigger'.
Plus, there are many situations in life that would be made infinitely more human by the introduction of our collective work in designing our aesthetic and putting it into the world, and encoding it into models. Installations and physical spaces can absolutely be more beautiful if we can produce more, taking the aesthetic(s) that we've built so far and making them dynamic to spaces.
Also for learning: as a young person learning to draw and sing and play music and so many other things, I would have tremendously appreciated the ability to generate and follow subtle, personalized generation - to take a photo of a scene in front of me and have the AI first sketch it loosely so that I can copy it, then escalate and escalate until I can do something bigger.
What argument is required for excitement? Excitement is a feeling not a rational act. It comes from optimism and imagination. There is no argument for optimism. There is often little reason in imagination.
> How to even find the value in living given all of that?
You might have heard of the Bhagavad Gita, a 2000+ year old spiritual text. It details a conversation between a warrior prince and a manifestation of God. The warrior prince is facing a very difficult battle and he is having doubts justifying any action in the face of the decisions he has to make. He is begging this manifestation of God to give him good reasons to act, good reasons not just to throw his weapons down, give away all his possessions and sit in a cave somewhere.
There are no definite answers in the text, just meditations on the question. Why should we act when the result is ultimately pointless, we will all die, people will forget you, situations will be resolved with or without you, etc.
This isn't some new question that LLMs are forcing us to confront. LLMs are just providing us a new reason to ask the same age-old questions we have been facing for as long as writing has existed.
You don't think there was ever a time without a mass media culture? Plenty of people have furniture older than mass media culture. Even 20 years ago people could manage to be creative for a tiny audience of what were possibly other people doing creative things. It's only the zoomers who have never lived in a world where you never thought to consider how you could sell the song you were writing in your bedroom to the Chinese market.
It used to be that music didn't come on piano rolls, records, tapes, CDs or files. It used to be that your daughter would play music on the piano in the living room for the entire family. Even if it was music that wouldn't really sell, and wasn't perfectly played, people somehow managed to enjoy it. It was not a situation that AI could destroy. If anything, AI could assist.
If your value in living is in any way affected by AI, ever, then, well, let's just say I would never choose that for myself. Good luck.
There's a whole host of "art" that has been created by people - sometimes for themselves, sometimes for a select few friends - which had little purpose beyond that creation[1]. Some people create art because they simply have to create art - for pleasure, for therapy, for whatever[2]. For many, the act of creation was far more important than the act of distribution[3].
For me, my obsession is constructing worlds, maps, societies and languages that will almost certainly die with me. And that's fine. When I feel the compulsion, I'll work on my constructions for a while, until the compulsion passes - just as I have done (on and off) for the past 50 years. If the world really needs to know about me, then it can learn more than it probably wants to know through my poetry.
[1] - Emily Dickinson is an obvious example: https://en.wikipedia.org/wiki/Emily_Dickinson
[2] - Coral Castle, Florida: https://en.wikipedia.org/wiki/Coral_Castle
[3] - Federico Garcia Lorca almost certainly didn't write his Sonetos del amor oscuro for publication - he just needed to write them: https://es.wikisource.org/wiki/Sonetos_del_amor_oscuro
People still value Amish furniture or woodworking despite Ikea existing. I love that if I want a cheap chair made of cardboard and glue that I can find something to satisfy that need; but I still buy nice furniture when I can.
AI creations are analogous. I've seen some cool AI stuff, but it definitely doesn't replace the real "organic" art one finds.
These fears aren't realized if AI never achieves superhuman performance, but what if they do?
(2) AI has already achieved superhuman performance in breadth and, with tuning, depth.
My only hope is this: I think the depression is telling us something real, we are collectively mourning what we see as the loss of our humanity and our meaning. We are resilient creatures though, and hopefully just like the ozone layer, junk food, and even the increasing rejections of social media and screen time, we will navigate it and reclaim what’s important to us. It might take some pain first though.
Yes, AI can make music that sounds decent and lyrics that rhyme and can even be clever. But listen to a couple songs and your brain quickly spots the patterns. Maybe AI gets there some day, but the uncanny valley seems to be quite a chasm - and anything that approaches the other side seems to do so by piling lots of human intention along the way.
The main challenge over the next decade as all our media channels are flooded with generated media will become curation. We desperately need ways to filter human-created content from generated content. Not just for the sake of preserving art, but for avoiding societal collapse from disinformation, which is a much more direct and closer threat. Hell, we've been living with the consequences of mass disinformation for the past decade, but automated and much more believable campaigns flooding our communication platforms will drastically lower the signal-to-noise ratio. We're currently unable to even imagine the consequences of that, and are far from being prepared for it.
This tech needs strict regulation on a global scale. Anyone against this is either personally invested in it, or is ignorant of its dangers.
The way I see it, most people aren't creative. And the people who are creatives are mostly creating for the love of it. Most books that are published are read exclusively by the friends and family of the author. Most musicians, most stand-up comedians, most artist get to show off their works for small groups of people and make no money doing so. But they do it anyway. I draw terrible portraits, make little inventions and sometimes I build something for the home, knowing full well that I do these things for my own enjoyment and whatever ego boost I get from showing these things off to people I know.
I'm doing a marathon later and I've been working my ass off for the prospect of crossing the finishing line as number four thousand and something, and I'll do it again next year.
Or kittens and puppies. Do you think there won't be kittens and puppies?
And that's putting aside all the obvious space-exploration stuff that will probably be more interesting than anything the previous 100 billion humans ever saw.
The merge. (https://blog.samaltman.com/the-merge)
I'm quite enthusiastic. I've always thought mortality sucks.
Nothing is being taken away.
Till then, I just learn the tools with the deepest understanding that I can muster and so far the deeper I go, the less impressed with "automated everything" I become, because it isn't really going to be capable of doing anything people are going to find interesting when the creativity well dries up.
Additionally, video seems like a pretty straightforward output shape to me - a 2D image with a time component. If we were talking 3D assets and animations, I wouldn't even know where to start with modeling that as input data for training. That seems really hard to model as a fixed-input-size problem to me.
If there was comparable 3D data available for training, I'd guess that we'd see different issues with different approaches.
A couple of examples that I could think of quickly: Using these to build games, might be easier if we could interact with the underlying "assets". Getting photorealistic results with intricate detail (e.g. hair, vegetation) might be easier with video based solutions.
There’s absolutely no reason that a game needs to be generated frame-by-frame like this. It seems like a deeply unserious approach to making games.
(My feeling is that it must be easier to train this way.)
3D model rendering would be useful however for interfacing with robots.
In VR, for example, the same 3D scene will be rendered twice, once for each eye, from two viewpoints 10-15cm apart.
If you don’t have an internal 3D representation of the world, the AI would need to generate exactly the same scene from a very slightly different perspective for each eye, without any discrepancies or artefacts.
And that’s not even discussing physics, collisions or any form of consistent world logic that happens off-screen. Or multiplayer!
No need to explore; I can tell you how. Release the weights to the general public so that everyone can play with it and non-Google researchers can build their work upon it.
Of course this isn't going to happen because "safety". Even telling us how many parameters this model has is "unsafe".
While I don't fully align with the sentiment of other commenters that this is meaningless unless you can go hands-on... it is crazy to think how different this announcement is from a few years ago, when it would have been accompanied by an actual paper that shared the research.
Instead... we get this thing that has a few aspects of a paper - authors, demos, a bibtex citation(!) - but none of the actual research shared.
I was discussing with a friend that my biggest concern with AI right now is not that it isn't capable of doing things... but that we switched from research/academic mode to full value extraction so fast that we are way out over our skis in terms of what is being promised, which, in the realm of exciting new field of academic research is pretty low-stakes all things considered... to being terrifying when we bet policy and economics on it.
To be clear, I am not against commercialization, but the dissonance of this product announcement, made to look like research and written in this way, at the same time that one of the preeminent mathematicians is writing about how our shift in funding of real academic research is having real, serious impact, is... uh... not confidence-inspiring for the long term.
I wonder how much it costs to run something like this.
I feel like as time goes on more and more of these important features are showing up as disconnected proofs of concept. I think eventually we'll have all the pieces and someone will just need to hook them together.
I am more and more convinced that AGI is just going to eventually happen and we'll barely notice because we'll get there inch by inch, with more and more amazing things every day.
I've only ever seen demos of these models where things happen from a first-person or 3rd-person perspective, often in the sort of context where you are controlling some sort of playable avatar. I've never seen a demo where they prompted a model to simulate a forest ecology and it simulated the complex interplay of life.
Hence, it feels like a video game simulator, or put another way, a simulator of a simulator of a world model.
This is a pretty clear example of video game physics at work. In the real world, both the jetski and floating structure would be much more affected by a collision, but in the context of video game physics such an interaction makes sense.
So yeah, it's a video game simulator, not a world simulator.
I don't doubt they're trying to create a world simulator model, I just think they're inadvertently creating a video game simulator model.
It is interesting to think about. This kind of training and model will only capture macro effects. You cannot use this to simulate what happens in a biological cell or tweak a gravity parameter and see how plants grow etc. For a true world model, you'd need to train models that can simulate at microscopic scales as well and then have it all integrated into a bigger model or something.
As an aside, I would love to see something like this for the human body. My belief is that we will only be able to truly solve human health if we have a way of simulating the human body.
While watching the video I was just imagining the $ increasing by the second. But then it's not available at all yet :(
I'm most excited for when these methods will make a meaningful difference in robotics. RL is still not quite there for long-horizon, sparse reward tasks in non-zero-sum environments, even with a perfect simulator; e.g. an assistant which books travel for you. Pay attention to when virtual agents start to really work well as a leading signal for this. Virtual agents are strictly easier than physical ones.
Compounding on that, mismatches between the simulated dynamics and real dynamics make the problem harder (sim2real problem). Although with domain randomization and online corrections (control loop, search) this is less of an issue these days.
Multi-scale effects are also tricky: the characteristic temporal length scale for many actions in robotics can be quite different from the temporal scale of the task (e.g. manipulating ingredients to cook a meal). Locomotion was solved first because it's periodic imo.
Check out PufferAI if you're scale-pilled for RL: just do RL bigger, better, get the basics right. Check out Physical Intelligence for the same in robotics, with a more imitation/offline RL feel.
This feels almost exactly like that, especially the weird/dreamlike quality to it.
I know that everyone always worries about trapping people in a simulation of reality etc. etc. but this would have blown my mind as a child. Even Riven was unbelievable to me. I spent hours in Terragen.
https://extraakt.com/extraakts/google-s-genie-3-capabilities...
Another interesting angle is retrofitting existing 2D content (like videos, images, or even map data) into interactive 3D experiences. Imagine integrating something like this into Google Maps: suddenly Street View becomes a fully explorable 3D simulation generated from just text or limited visual data.
Genie 3 isn’t that though. I don’t think it’s actually intended to be used for games at all.
This is starting to feel pretty **ing exponential.
Are they just multimodal for everything?
Are foundational time series models included in this category?
I think it just outputs image frames...
1. Almost all human-level civilizations go extinct before reaching a technologically mature “posthuman” stage capable of running high-fidelity ancestor simulations.
2. Almost no posthuman civilizations are interested in running simulations of their evolutionary history or beings like their ancestors.
3. We are almost certainly living in a computer simulation.
I think some/all of these things can be roughly true at the same time. Imagine an infinite space full of chaotic noise from which arises a solitary Boltzmann brain, the top-level universe and top-level intelligence. This brain, seeking purpose and company in the void, dreams of itself in various situations (lower-level universes), and some of those universes' societies seek to improve themselves through deliberate construction of karmic-cycle ancestor simulations. A hierarchy of self-similar universes.
It was incredibly comforting to me to think that perhaps the reason my fellow human beings are so poor at empathy, inclusion, justice, is that this is a karmic kindergarten where we're intended to be learning these skills (and the consequences for failing to perform them) and so of course we're bad at it, it's why we're here.
Why would beings in simulations be conscious?
Or maybe running simulations is really expensive and so it's done sometimes (more than "almost none") but only sometimes (nowhere near "we are almost certainly").
Or simulations are common but limited? You don't need to simulate a universe if all you want to do is simulate a city.
The "trilemma" is an extreme example of black-and-white thinking. In the real world, things cost resources and so there are tradeoffs -- so middle grounds are the rule, not extremes.
In game engines it's the engineers, the software developers who make sure triangles are at the perfect location, mapping to the correct pixels, but this here, this is now like a drawing made by a computer, frame by frame, with no triangles computed.
That's not the point, video games are worth chump-change compared to robotics. Training AIs on real-world robotic arms scaled poorly, so they're looking for paths that leverage what AI scales well at.
Eg: Using AI to generate textures, wire models, motion sequences which themselves sum up to something that local graphics card can then render into a scene.
I'm very much not an expert in this space, but to me it seems if you do that, then you can tweak the wire model, the texture, move the camera to wherever you want in the scene etc.
The model can infinitely zoom in to some surface and depict(/predict) what would really be there. Trying to do so via classical rendering introduces many technical challenges
So for example, a game designer might tell the AI the floor is made of mud, but won’t tell the AI what it looks like if the player decides to dig a 10 ft hole in the mud, or how difficult it is to dig, or what the mud sounds like when thrown out of the hole, or what a certain NPC might say when thrown down the hole, etc.
This is already happening to some extent, some games struggle to reach 60 FPS at 4K resolution with maximum graphics settings using traditional rasterization alone, so technologies like DLSS 3 frame generation are used to improve performance.
From my best guess: it's a video generation model like the ones we already had, but they condition on inputs (movement direction, view angle). Perhaps they aren't relative inputs but absolute, and there is a bit of state simulation going on? [although some demo videos show physics interactions like bumping against objects - so that might be unlikely, or maybe it's 2D and the up axis is generated??]
It's clearly trained on a game engine as I can see screenspace reflection artefacts being learned. They also train on photoscans/splats... some non realistic elements look significantly lower fidelity too..
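To make the guess above concrete, here is a minimal sketch of what an action-conditioned autoregressive loop could look like (purely speculative, with a dummy function standing in for the real network):

```python
# Speculative sketch: an autoregressive video model conditioned on per-step
# control inputs (movement direction, view angle). Dummy model only.
import numpy as np

H, W = 720, 1280
CONTEXT = 16  # how many past frames the model would condition on

def predict_next_frame(frame_history, action):
    """Stand-in for the real network: frame history + one action in, frame out.
    A real system would also carry latent state between steps."""
    return np.clip(frame_history[-1] + np.random.randint(-2, 3, (H, W, 3)), 0, 255)

frames = [np.zeros((H, W, 3), dtype=np.int32)]  # e.g. decoded from a text-to-image prompt
actions = [{"move": "forward", "yaw": 0.0},
           {"move": "forward", "yaw": 5.0},
           {"move": "none",    "yaw": 5.0}]     # absolute vs. relative is the open question

for action in actions:
    frames.append(predict_next_frame(frames[-CONTEXT:], action))
```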
some inconsistencies I have noticed in the demo videos:
- wingsuit disocclusions are lower fidelity (maybe initialized by a high-resolution image?)
- garden demo has different "geometry" for each variation, look at the 2nd hose only existing in one version (new "geometry" is made up when first looked at, not beforehand).
- school demo has half a car outside the window? and a suspiciously repeating pattern (infinite loop patterns are common in transformer models that lack parameters, so they can scale this even more! also might be greedy sampling for stability)
- museum scene has an odd reflection in the amethyst box: the rear mammoth doesn't have reflections on the rightmost side of the box before it's shown through the box. The tusk reflection just pops in. This isn't a Fresnel effect.
Our product was a virtual 3d world made up of satellite data. Think of a very quick, higher-res version of google earth, but the most important bit was that you uploaded a GPS track and it re-created the world around that space. The camera was always focused on the target, so it wasn't a first person point of view, which, for the most part, our brains aren't very good at understanding over an extended period of time.
For those curious about the use case, our product was used by every paraglider in the world, commercial drone operations, transportation infrastructure sales/planning, and outdoor event promotions (specifically bike and ultramarathon races).
Though I suspect we will see a new form of media come from this. I don't pretend to suggest exactly what this media will be, but mixing this with your photos we can see the potential for an infinitely re-framable and zoomable type of photo media.
Creating any "watchable" content will be challenging if the camera is not target focused, and it makes it difficult to create a storyline if you can't dictate where the viewer is pointed.
What this means is that a robot model could be trained 1000x faster on GPUs compared to training a robot in the physical world where normal spacetime constraints apply.