Veo 3 and Imagen 4, and a new tool for filmmaking called Flow

https://blog.google/technology/ai/generative-media-models-io-2025/

364•youssefarizk•5h ago

Comments

IncreasePosts•4h ago

I don't care about AI animals but the old salt offended me.

Animats•4h ago

The ad for Flow would be much better if they laid off the swirly and wavy effects, and focused on realism.

Soon, you should be able to put in a screenplay and a cast, and get a movie out. Then, "Google Sequels" - generates a sequel for any movie.

colesantiago•4h ago

Definitely plausible.

All this is in line with my prediction that the first entirely AI-generated film (made with Sora or other AI video tools) to win an Oscar is less than 5 years away.

And we're only 5 months in.

https://news.ycombinator.com/item?id=42368951

zanellato19•4h ago

I believe an AI-generated film will never win an Oscar.

I bet they will soon add rules barring AI movies from even competing.

zombiwoof•4h ago

Separate categories

wongarsu•3h ago

Oscar in what category?

We are about six years into transformer models. By now we can get transformers to write coherent short stories, and you can get to novel lengths with very careful iterative prompting (e.g. let the AI generate an outline, then chapter summaries, consistency notes, and world building, then generate the chapters). But to get anything approaching a good story you still need a lot of manual intervention at every step of the process. LLMs go off the rails, get pacing completely wrong, and demonstrate gaping holes in their understanding of the real world. Progress on new models is mostly focused in other directions, with better storytelling a byproduct. I doubt we get to "best screenplay" level of writing in five years.

Best Actor/Actress/Director/etc are obviously out for an AI production since those roles simply do not exist.

Similarly with Best Visual Effects: I doubt AI-generated films qualify.

That leaves us with categories that rate the whole movie (Best Picture, Best International Feature Film, etc.), sound-related categories (Best Original Score, Original Song, Sound) and maybe Best Cinematography. I doubt the first group is in reach. Video generation will be good enough in five years. But editing? Screenwriting? Sound design?

My bet would be on the first AI-related Oscar to be for an AI generated original score or original song, and that no other AI wins Oscars within five years.

Unless we go by a much wider definition of "entirely AI generated" that would allow significant human intervention and supervision. But the more humans are involved, the less it has any claim to being "entirely AI". Most AI-generated trailers or the Balenciaga-Potter-style videos still require a lot of human work.
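A minimal sketch of the outline-first prompting process described above, in Python. `generate` is a hypothetical stand-in for any LLM call (prompt in, text out); the prompt wording and the 2,000-character carry-over window are illustrative assumptions, not a tested recipe, and the sketch does nothing about the manual intervention the comment says is still needed at every step:

```python
from typing import Callable, List

# Hypothetical: any function that sends a prompt to an LLM and returns its text.
Generate = Callable[[str], str]

def write_novel(premise: str, n_chapters: int, generate: Generate) -> List[str]:
    """Hierarchical drafting: outline -> chapter summaries -> consistency notes -> chapters."""
    outline = generate(f"Write a {n_chapters}-chapter outline for: {premise}")
    # One short summary per chapter, each conditioned on the full outline.
    summaries = [
        generate(f"Outline:\n{outline}\n\nSummarize chapter {i + 1} in one paragraph.")
        for i in range(n_chapters)
    ]
    # Consistency notes / world building shared by every chapter prompt.
    notes = generate(
        f"Outline:\n{outline}\n\nList characters, setting facts and rules to keep consistent."
    )
    chapters: List[str] = []
    for i, summary in enumerate(summaries):
        # Carry over the tail of the previous chapter to help continuity (assumed window size).
        previous = chapters[-1][-2000:] if chapters else "(none)"
        chapters.append(generate(
            f"Consistency notes:\n{notes}\n\n"
            f"End of previous chapter:\n{previous}\n\n"
            f"Write chapter {i + 1} following this summary:\n{summary}"
        ))
    return chapters
```

For n chapters this makes 2n + 2 model calls, which is part of why the process stays "very careful" and slow in practice.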

mrandish•2h ago

> to win an Oscar being less than 5 years away.

You're assuming Oscar voting is primarily driven by film quality but this hasn't been true for a long time (if it ever was). Many academy voters are biased by whatever cultural and political trends are currently ascendant among the narrow subset of Hollywood creatives who belong to the academy (the vast majority of people listed in movie credits will never be academy voters). Due to the widespread impact of Oscar wins in major categories, voters heavily weight meta-factors like "what should the Hollywood community be seen as endorsing?"

No issue in recent memory has been as overwhelmingly central as AI replacing creatives among the Hollywood community. The entire industry is still recovering from the unprecedented strikes which shut down the industry, and one of the main issues was the use of AI. The perception of AI use will remain cultural/political poison among the rarefied community of academy voters for at least a decade. Of course, studios are businesses and will hire vendors who use AI to cut costs, but those vendors will be smart enough to downplay that fact because it's all about perception, not reality. For the next decade "AI" will be to Academy-centric Hollywood what "child labor" is to shoe manufacturing. The most important thing is not that it doesn't happen; it's ensuring there's no clear proof it's happening, especially on any movie designed to be 'major category Oscar-worthy' (such films are specifically designed to check the requisite boxes for consideration from their inception). I predict that in the near term AI in the Oscars will be limited to, at most, a few categories awarded in the separate Technical Oscars (which aren't broadcast on TV or covered by the mainstream media).

FirmwareBurner•4h ago

>Soon, you should be able to put in a screenplay and a cast, and get a movie out.

This "fixes" Hollywood's biggest "issues". No more highly paid actors demanding 50 million to appear in your movie, no more pretentious movie stars causing dramas and controversies, no more workers' unions or strikes, and all gains funneled directly to shareholders. The VFX industry being turned into a gig meatgrinder was already the canary in the coal mine for this shift.

Most of the major Hollywood productions from the last 10 years have been nothing but creatively bankrupt sequels, prequels, spinoffs and remakes, all rehashed from previous IP anyway, so how much worse than this can AI do, since it's clear they're not interested in creativity anyway? Hell, it might even be an improvement over what they're making today, and at much lower cost to boot. So why wouldn't they adopt it? From the bean-counter MBA perspective it makes perfect sense.

com2kid•4h ago

> Hollywood's wet dream.

Except it bankrupts Hollywood; they are no longer needed. If people can generate full movies at home, there is no more Hollywood.

The end game is endless ultra personalized content beamed into people's heads every free waking hour of the day. Hollywood is irrelevant in that future.

FirmwareBurner•4h ago

Good point, this is indeed a threat to them. Look at how many young people now watch streamers instead of worshiping present-day music, TV or movie stars like in the '90s. The likes of YouTube and Twitch could be more valuable than Hollywood.

That's why I think Hollywood is rushing to adopt gen-AI, so they can churn out personalized content faster and cheaper straight to streaming, at the same rate as indie producers.

jsheard•4h ago

> If people can generate full movies at home, there is no more Hollywood.

LLMs have been in the oven for years longer than this, and I'm not seeing any signs of people generating their own novels at home. Well, besides the get-rich-quick grifters spamming the Kindle store with incoherent slop in the hopes they can trick someone into parting with a dollar before they realize they've been had.

FirmwareBurner•4h ago

> I'm not seeing any signs of people generating their own novels at home

Most humans aren't good at writing great scripts/novels either. Just look at the movies that bring in billions of dollars at the box office. Do you think you need a famous novelist to write you a Fast & Furious 11 script?

Sure, there are still great writers that can make scripts that tickle the mind, but that's not what the studios want anymore. They want to push VFX heavy rehashed slop that's cheap to make, easy to digest for the doom-scrolling masses of consumers, and rakes in a lot of money.

You're talking about what makes gourmet Michelin-star food, but the industry is making money selling McDonald's.

com2kid•3h ago

Look at view counts for short form videos that are 100% AI generated.

The good "creators" are already making bank, helped by app algorithms matching people up to content they'll find addictive to view.

The content doesn't have to be good it just has to be addictive for 80% of the population.

echelon•1h ago

You're describing the difference between The Godfather and Skibidi Toilet.


pelagicAustral•3h ago

I wish I could feed Dan Simmons' books to an AI and watch them at my leisure.

myth_drannon•3h ago

That's potentially not far in the future. If you can drop in a couple of research PDFs and generate a podcast discussion about them, it is even more straightforward to generate a video based on a text. The limit is mostly hardware.

gh0stcat•2h ago

Infinite Jest?

quesera•4h ago

Actors will license their appearance, voice, and mannerisms to these new media projects. (Maybe by established Hollywood studios, maybe not).

Then the first fully non-human (but human-like) actors will be created and gain popularity. The IP of those characters will be more valuable than the humans they replaced. They will be derided by old people as "Mickey Mouse" AI actors. The SAG will be beside themselves. Younger people will not care. The characters will never get old (or they will be perfectly rendered when they need to be old).

The off-screen dramas and controversies are part of the entertainment, and these will be manufactured too. (If there will even be an off-screen...)

This is the future, and we've been preparing for it for years by presenting the most fake versions of ourselves on social media. Viewers have zero expectation of authenticity, so biological status is just one more detail.

It will be perfect, and it will be awful. Kids born five years from now will never know anything different.

FirmwareBurner•4h ago

>Actors will license their appearance, voice, and mannerisms to these new media projects

Very few actors have an appearance or a voice worth a lot in licenses. That's like the top 1% of actors, if that.

I think if done right, humans could also end up getting emotionally attached to 100% AI generated characters, not just famous celebrities.

quesera•4h ago

Sure, but I'd bet that 1% of actors (of the total pool of SAG on-screen talent membership?) comprise 75%+ of branding/name recognition for consumers.

So the appearance licenses for these 1% are valuable in Stage 1 of the takeover.

The rest are just forgotten collateral damage. Hollywood is full of 'em.

dimal•4h ago

The swirly effects are probably used to distract from the problems of getting realism right.

suddenlybananas•4h ago

Generating banal stock footage is wildly different than generating a film.

bilbo0s•2h ago

This doesn't necessarily preclude the possibility of making a model that can generate a film. It's still something they can work out. In fact, I wouldn't be surprised if the models we're seeing these days turn out to be a necessary first step in that process.

suddenlybananas•2h ago

I'm not saying it's impossible in principle; I'm saying this doesn't show that it will happen soon.

esafak•4h ago

AI trailers already exist: https://www.youtube.com/playlist?list=PL_52fVxPZcIiEvGocuVn6...

jsheard•3h ago

I feel like we should probably draw a distinction between "AI trailers exist as a replacement for traditional trailers" and "AI trailers exist because they're the clickbait du jour for cynical social media engagement farmers". For now they're 100% the latter.

elzbardico•4h ago

Got a bit of an uncanny valley feeling with the owl and the old man videos. And the origami video gave me a sort of sinister feeling; it seemed vaguely threatening, aggressive.

vjerancrnjak•4h ago

It's a reflection of yourself.

Origami for me was more audio than video. Felt like it's exactly how it would sound.

jjcm•4h ago

Lower on the page there's a knitted characters version that feels much better. It seems like for some of these, divorcing yourself from reality a little bit helps avoid the uncanny valley.

thinkingtoilet•3h ago

The owl one had that glow that so many AI images have for some reason. The man was very impressive to me.

phh•4h ago

Of course they had to give a proprietary filmmaking tool the name of an award-winning film, made with open-source tools, that was released less than a year ago...

paxys•4h ago

"Flow" is one of the most generic names in tech. I can think of 10+ products called that off the top of my head.

debugnik•4h ago

There's no way they named their AI filmmaking tool after the last winner of the Academy Award for Best Animated Feature by accident.

debugnik•4h ago

I still remember a style transfer paper which proudly mimicked a popular artist who had passed away barely a few years before (Qinni). Many AI researchers seemingly want to wear the skins of the people they rip off.

woah•3h ago

Seems pretty obvious that they named it after Facebook's JS type checker from 2015

quantumHazer•4h ago

Like most AI image or video generation tools, they produce results that look good at first glance, but the more you watch, the more flaws and sloppiness you notice, and they really lack storytelling.

superb_dev•4h ago

You don’t even have to look close for some of these. The owl suddenly flipping direction in the first video was jarring

quantumHazer•4h ago

Yeah, the owl "animation" is terrible. Surely they could have found better examples? If not, I don't know what to think.

JamesBarney•4h ago

It looked to me like the owl was turning around to land.

billyp-rva•4h ago

When it's in silhouette you don't know which direction it is facing, technically. I think what's happening is that when you see a shot of something flying in front of something prominent (in this case, the moon), your brain naturally perceives it as going away from the camera and toward the object.

llm_nerd•4h ago

I think that's just the silhouette illusion[1]. In this case likely abetted by the framing elements moving near the edges.

[1] - https://en.wikipedia.org/wiki/Spinning_dancer

Workaccount2•4h ago

It doesn't flip, it's an illusion. The owl is always facing the camera.

harikb•4h ago

They don't have to be as good as the best film production team - they just need to be better than the average/B-grade ones to gain adoption.

In a media- and entertainment-hungry world, which is about to get hungrier with an unemployed/underemployed TikTok generation needing "content", something like this has a role to play.

inerte•4h ago

Or replace cutscenes in video games, or short video ads on mobile (they're already small and people are barely paying attention).

Nowadays, when I randomly open a news website to read some article, I've noticed that about a third of the images in the generic ads at the bottom of the page ("hack to lose your belly", "doctors recommend weird japanese device", "how seniors can fly business class") seem to be AI generated...

nathan_compton•4h ago

God, what a dismal future we're building.

nine_k•4h ago

Warning: browsing the Web without an ad blocker is hazardous to your mental health. If you regularly see ads permeating most web pages, and don't know how to avoid that, you may need to see a specialist.

AStonesThrow•4h ago

Perhaps some of us choose not to rip off payments that are due to the people who provide a service that we're using. If I visit a website that's so infested with ads that I don't like being there, it's on me to stop visiting that website.

I simply don't think it's fair to cheat service providers when we don't like their service. You have a choice, and that choice is to not use that service at all. They're providing it under the terms that it is ad-supported. If you don't want to support it, but you still want to use it, then you're cheating someone. That is dishonest and unethical.

quesera•3h ago

If ad-supported sites returned an optional response header indicating as much, along with their stance on ad-blocking visitors, e.g.:

  Advertisement-Permission: [required|requested]

and my adblockers had a config option to abort page loads with an appropriate error message when the value is `required` or `requested`, then I would happily use it.

In the meantime, I'm browsing every site with all content blockers set at maximum, because any other choice is incomprehensible on the modern web.

If I consequently visit some sites that want me to consume advertising of which I am unaware, then that is entirely their issue, not mine.
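A minimal sketch of the adblocker-side check proposed above, in Python. `Advertisement-Permission` is the hypothetical header from the comment, not part of any HTTP standard, and `should_abort` is an illustrative helper name; a real blocker would hook this into its own request pipeline:

```python
from typing import Mapping, Optional, Tuple

# Hypothetical header proposed in the comment; no browser or server sends it today.
HEADER = "advertisement-permission"

def should_abort(headers: Mapping[str, str],
                 abort_on: Tuple[str, ...] = ("required", "requested")) -> Optional[str]:
    """Return an error message if the page load should be aborted, else None."""
    # HTTP header names are case-insensitive, so normalize names and values first.
    lowered = {k.lower(): v.strip().lower() for k, v in headers.items()}
    value = lowered.get(HEADER)
    if value in abort_on:
        return f"Page load aborted: site marks advertising as '{value}'."
    return None
```

Making `abort_on` a parameter mirrors the config option in the comment: a user who only wants to skip sites that insist on ads can pass `abort_on=("required",)` and still load `requested` sites.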

bsimpson•4h ago

Jon Stewart did a gag in this week's episode where there were happy meal toy versions of a bunch of congressmen on screen for a few seconds.

A lot of content is like this - you just need an approximation to sell an idea, not a perfect reproduction. Makes way more sense to have AI generate you a quick image for a sight gag than to have someone spend all day trying to comp it by hand. And as AI imagery gets more exposure in these sort of scenarios, more people will be accustomed to it, and they'll be more forgiving of its faults.

The bar for "good enough" is gonna get a lot lower as the cost of producing it comes way down with AI.

jhaile•4h ago

Yeah, but it's early days and I think that will go away as the tools get better. Also, did you watch the sample short films they have?

nine_k•4h ago

But you don't have to outsource 100% of your creative work to your tools. This is a toolbox, not a complete automatic masterpiece generator. If you want serious production, don't remove yourself from the loop.

Drive the storytelling, consult with AI on improving things and exploring variations.

Generate visuals, then adjust / edit / postprocess them to your liking. Feed the machine your drawings and specific graphic ideas, not just vague words.

Use generated voices where they work well, record real humans where you need specific performance. Blend these approaches by altering the voice in a recording.

All these tools just allow you to produce things faster, or to produce things at all that would be too costly to shoot in real life.

Closi•4h ago

I don’t get how someone can look at these videos and think “wow there’s lots of flaws and it’s sloppy and no storytelling” rather than “holy smokes this stuff is improving fast!”

In 2 years we have moved from AI video being mostly a pipe dream to some incredible clips! The question isn't what this is like now, but what it will be like in 10 years!

onlyrealcuzzo•3h ago

AI used to be quite bad at coding just - what - 2 years ago?

Now it's "good enough" for a lot of cases (and the pace of improvement is astounding).

AI is still not great at image gen and video gen, but the pace of improvement is impressive.

I'm skeptical image, video, and sound gen are "too difficult" for AI to get "good enough" at for many use cases within the next 5 years.

spiderice•3h ago

You should see the terrible results it's possible to generate with AfterEffects, Blender, Houdini, etc..

sergiotapia•4h ago

How do you use Imagen 4 in Gemini? I don't see it in the model picker; I just see 2.5 Flash and 2.5 Pro (Upgrade).

vunderba•4h ago

It's not at all obvious from Gemini - probably the easiest way is through Whisk.

https://labs.google/fx/tools/whisk

vunderba•4h ago

After doing some testing, Imagen 4 doesn't score any higher than Imagen 3 on my comparison chart: approximately 60% prompt adherence accuracy.

https://genai-showdown.specr.net

Onavo•4h ago

How do companies like https://icon.com do their image Gen if the existing SOTA for prompt adherence is so poor?

peab•3h ago

Fine-tuning and prompting techniques can go a long way. That, plus cherry-picking results.

yorwba•3h ago

People who generate images for ads probably don't often need strict prompt adherence, just a random backdrop to slap a picture of their product on top of. The kind of thing they'd have used a stock image library for before.

Also "create static + video ads that are 0-99% complete" suggests the performance is hit or miss.

htrp•1h ago

multishot generation with discriminators
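One way to read "multishot generation with discriminators" is best-of-n sampling: generate several candidates and keep the one a scoring model rates highest. A minimal Python sketch under that assumption; `generate` and `score` are hypothetical stand-ins for an image model and a prompt-adherence scorer, not any particular vendor's pipeline:

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")

def best_of_n(prompt: str, n: int,
              generate: Callable[[str, int], T],
              score: Callable[[str, T], float]) -> Tuple[T, float]:
    """Generate n candidates with different seeds; return the highest-scoring one and its score."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates: List[Tuple[T, float]] = []
    for seed in range(n):
        sample = generate(prompt, seed)          # one "shot" per seed
        candidates.append((sample, score(prompt, sample)))
    # The discriminator/scorer decides which shot survives.
    return max(candidates, key=lambda c: c[1])
```

The cost grows linearly with n, so this trades compute for apparent prompt adherence; it is essentially automated cherry-picking.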

xixixao•4h ago

Awesome showcase! Fun descriptions. Are there similar sites?

peab•3h ago

great website!

snug•3h ago

How do you decide how many attempts to make before marking a result as failing?

mcphage•1h ago

It's listed in purple to the right of the model name.

zamadatix•1h ago

I think they're asking how the number to stop at was determined, not what the number stopped at was.

My guess as to determining whether it's 64 attempts to a pass for one and 5 attempts to a fail for another is simply "whether or not the author felt there was a chance random variance would result in a pass with a few more tries based on the initial 5ish". I.e. a bit subjective, as is the overall grading in the end anyways.

vunderba•18m ago

That's exactly what it was. It's hard to define a discrete rubric for grading at an inherently qualitative level. Usually more attempts means that it seemed like the model had the "potential" to get across the finish line so I gave it more opportunities.

If there are only a few attempts ending in a failure, there's a pretty good chance that I could sort of tell that the model had ZERO chance.

woolion•3h ago

The winning image entry for "The Yarrctic Circle" by OpenAI 4o doesn't actually wield a cutlass. It's very aesthetically pleasing, even though it's so wrong in all fundamental aspects (the perspective is nonsensical and the anatomy is messed up, with one leg 150% longer than the other, ...).

It's a very interesting resource to map some of the limits of existing models.

NoahZuniga•1h ago

I can't find the image you're talking about. Link pls?

echelon•1h ago

Google Flow is remarkable as video editing UX, but Imagen 4 doesn't really stand out amongst its image gen peers.

I want to interrupt all of this hype over Imagen 4 to talk about the totally slept on Tencent Hunyuan Image 2.0 that stealthily launched last Friday. It's absolutely remarkable and features:

- millisecond generation times

- real time image-to-image drawing capabilities

- visual instructivity (eg. you can circle regions, draw arrows, and write prompts addressing them.)

- incredible prompt adherence and quality

Nothing else on the market has these properties in quite this combination, so it's rather unique.

Release Tweet: https://x.com/TencentHunyuan/status/1923263203825549457

Tencent Hunyuan had a bunch of model releases all wrapped up in a product that they call "Hunyuan Game", but the Hunyuan Image 2.0 real time drawing canvas is the real star of it all. It's basically a faster, higher quality Krea: https://x.com/TencentHunyuan/status/1924713242150273424

More real time canvas samples: https://youtu.be/tVgT42iI31c?si=WEuvie-fIDaGk2J6&t=141 (I haven't found any other videos on the internet apart from these two.)

You can see how this is an incredible illustration tool. If they were to open source this, this would immediately become the top image generation model over Flux, Imagen 4, etc. At this point, really only gpt-image-1 stands apart as having godlike instructivity, but it's on the other end of the [real time <--> instructive] spectrum.

A total creative image tool kit might just be gpt-image-1 and Hunyuan Image 2.0. The other models are degenerate cases.

More image samples: https://x.com/Gdgtify/status/1923374102653317545

If anyone from Tencent or the Hunyuan team is reading this: PLEASE, PLEASE, PLEASE OPEN SOURCE THIS. (PLEASE!!)

Narciss•35m ago

This is amazing, can't believe I missed it. Thank you!

dheera•12m ago

> but Imagen 4 doesn't really stand out amongst its image gen peers.

In this AI rat race, whenever one model gets ahead, they all tend to reach parity within 3-6 months. If you can wait 6 months to create your video I'm sure Imagen 5 will be more than good enough.

It's honestly kind of ridiculous the pace things are moving at these days. 10 years ago waiting a year for something was very normal; nowadays people are judging the model-of-the-week against last week's model-of-the-week, but last week's org won't be sleeping either, and they'll release another one next week.

danpalmer•10m ago

In my own testing between the two this is what I’ve noticed. Imagen will follow the instructions, and 4o will often not, but produces aesthetically more pleasing images.

I don’t know which is more important, but I would say that people mostly won’t pay for fun but disposable images, and I think people will pay for art but there will be an increased emphasis on the human artist. However users might pay for reliable tools that can generate images for a purpose, things like educational illustrations, and those need to be able to follow the spec very well.

strongpigeon•2h ago

How can you tell you're using Imagen 4 and not Imagen 3? Gemini seems unable to tell me which model it's using. Are you using Vertex AI?

EGreg•2h ago

Tell me you’re using Imagen 3 without telling me you’re using Imagen 4… or something

sidibe•59m ago

Well, they've labelled it 3/4, so I'm guessing they can't, but you can use 4 in Whisk.

vunderba•24m ago

I used Whisk. The model listing shows 3/4 because testing against Imagen 4 did not result in a measurable increase in accuracy from Imagen 3.

https://labs.google/fx/tools/whisk

tintor•2h ago

More difficult examples:

- wine glass that is full to the edge with wine (ie. not half full)

- wrist watch not showing V (hands at 10 and 2 o'clock)

- 9 step IKEA shelf assembly instruction diagram

- any kind of gymnastics / sport acro

tintor•2h ago

The hands in the winning entry for "Not the Bees" look very unlike any driver's. I wouldn't count it as a pass.

vunderba•16m ago

I hate to say it but I feel like as a result of staring at so many equivalents of Tyrone Rugen since the dark ages of Stable Diffusion 1.5 - I literally DID NOT EVEN notice that until you called it out. The training data in my wetware has been corrupted.

mcphage•1h ago

> "A dolphin is using its fluke to discipline a mermaid by paddling it across the backside."

Hmm.

zamadatix•1h ago

I love the writing style in this.

carlosdp•4h ago

Wow, this is incredible work! Blown away at how well the audio/video matches up, and the dialogue is better sounding / on-par with dedicated voice models.

bowsamic•4h ago

I'm surprised at how bad these are

airstrike•4h ago

On a technical level, this is a great achievement.

On a more societal level, I'm not sure continuously diminishing costs for producing AI slop is a net benefit to humanity.

I think this whole thing parallels some of the social media pros and cons. We gained the chance to reconnect with long-lost friends (from whom we probably drifted apart for real reasons, consciously or not) at the cost of letting the general level of discourse tank to its current state thanks to engagement-maximizing algorithms.

pelagicAustral•4h ago

Have they revealed anything similar to Claude Code yet? I sure hope they are saving that for I/O next month... these video/photo reveals are too gimmicky for my liking, though I'm probably biased because I don't really have a use for them.

dmd•4h ago

https://jules.google/ posted here today https://news.ycombinator.com/item?id=44034918

pelagicAustral•4h ago

Yeah, I saw that... not quite the same... I used it for a bit but it's more like an agent that clings to a Github repo and deals with tickets up there, can't really test live on local, it just serves a different purpose.

lxgr•3h ago

Google I/O is happening right now. This is one of the announcements, I believe.

jader201•4h ago

I'm surprised no one has mentioned the name "Flow" yet, which is also the title of the 2025 Oscar-winning animated movie, built using Blender. [1]

This naming seems very confusing, as I originally thought there must be some connection. But I don't think there is.

[1] https://news.ycombinator.com/item?id=43237273

imp0cat•3h ago

Would Google really stoop so low and try to use the success of the movie to prop up their AI video generator tool?

But then again, the "Don't be evil" motto is long gone, so I guess anything goes now?

Legend2440•3h ago

It's a common word. There are like 50 things named Flow. It's unrelated.

lnyan•3h ago

Note that it's very likely that Veo models are based on "Flow Matching" [1]

[1] https://arxiv.org/abs/2210.02747
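For readers unfamiliar with the linked paper: whether Veo actually uses flow matching is the commenter's guess, but the core training objective is easy to show. A minimal pure-Python sketch of the optimal-transport conditional flow matching loss from that paper, where `x1` is a data sample, `x0` is Gaussian noise, and `model` is a hypothetical velocity network; real systems operate on large latent tensors, not short lists:

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]

def cfm_sample(x1: Sequence[float], t: float, x0: Sequence[float],
               sigma_min: float = 1e-4) -> Tuple[Vector, Vector]:
    """Point on the OT conditional path and its target velocity.

    x_t = (1 - (1 - sigma_min) * t) * x0 + t * x1
    u_t = x1 - (1 - sigma_min) * x0
    """
    xt = [(1 - (1 - sigma_min) * t) * a + t * b for a, b in zip(x0, x1)]
    ut = [b - (1 - sigma_min) * a for a, b in zip(x0, x1)]
    return xt, ut

def cfm_loss(model: Callable[[Vector, float], Vector], x1: Sequence[float],
             rng: random.Random, sigma_min: float = 1e-4) -> float:
    """Single-sample estimate of the conditional flow matching loss."""
    t = rng.random()                              # t ~ U[0, 1]
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]        # noise sample
    xt, ut = cfm_sample(x1, t, x0, sigma_min)
    vt = model(xt, t)                             # predicted velocity
    return sum((v - u) ** 2 for v, u in zip(vt, ut))  # squared L2 error
```

At sampling time the learned velocity field is integrated from noise at t = 0 toward data at t = 1 with an ODE solver.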

Workaccount2•4h ago

I'm sure by this point, and if not, pretty soon, everyone will have seen a clip of AI generated video and not thought twice about it.

It's something that is only obvious when it is obvious. And the more obvious examples you see, the more non-obvious examples slip by.

pier25•4h ago

what do they use to train these models? youtube videos?

jonplackett•4h ago

Has anyone actually tried Veo3 and know if it’s as good as this looks?

The demo videos for Sora look amazing but using it is substantially more frustrating and hit and miss.

sebau•3h ago

The future is not bright. While we are endlessly talking about details, the reality is that AI has taken over so many jobs.

Not in 10 years but now.

People who just see this as terrible are wrong. AI's improvement curve is exponential.

People's adaptability is at best linear.

This makes me really sad. For creativity. For people.

mindvirus•3h ago

Maybe. The internet was also exponential, and while it has its drawbacks, I think it's resulted in a huge increase in creativity. The world looks very different than it did 30 years ago, and I think mostly for the better.

jampekka•3h ago

> The future is not bright. While we are endlessly talking about details, the reality is that AI has taken over so many jobs.

Of course this is not because of AI. It's because of the ridiculous system of social organization where increased automation and efficiency makes people worse off.

jjcm•3h ago

It finally feels like the professional tools have greatly outpaced the open source versions. While wan and hunyuan are solid free options, the latest from Google and Runway have started to feel like a league above. Interestingly it feels like the biggest differentiator is editing tools - ability to prompt motion, direction, cuts, or weaving in audio, rather than just pure ability to one shot.

These larger companies are clearly going after the agency/hollywood use cases. It'll be fascinating to see when they become the default rather than a niche option - that time seems to be drawing closer faster than anticipated. The results here are great, but they're still one or two generations off.

javchz•3h ago

I think open source still has an important advantage in the pro environment despite being less convenient: the possibility of adding things in between the generation process, like ControlNet, and custom LoRAs with new concepts or characters.

Plus, with local generation you're not limited by platform moderation, which can be too strict and arbitrary and fail with false positives.

Yes, ComfyUI can be intimidating at first compared with an easy-to-use ChatGPT-like UI, but the lack of control makes me feel these tools still won't be used in professional productions in the short term, but rather in small YouTube channels and smaller productions.

popalchemist•2h ago

ControlNet etc. can be served via API; the intrinsic advantage of open source is the ability to train and run inference privately.

MrScruff•1h ago

I don't think this is just about convenience - you're not going to get these results with a 14B video model. I'd much prefer to have something I could hack on in ComfyUI but the open weights models don't compete with this anymore than a 32B LLM competes with Gemini 2.5 Pro for coding. And at least in coding you can easily edit the output from the LLM regardless...

echelon•1h ago

> you're not going to get these results with a 14B video model

Foundation models are starting to outstrip any consumer hardware we have.

If Nvidia wants to stay ahead of Google's data center TPUs for running all of these advanced workloads, they should make edge GPU compute a priority.

There's a future where everything is a thin client to Google's data centers. Nvidia should do everything in its power to prevent that from happening.

irq-1•2h ago

> the agency/hollywood use cases.

It's for advertising.

echelon•1h ago

> While wan and hunyuan are solid free options, the latest from Google and Runway

The Tencent Hunyuan team is cooking.

Hunyuan Image 2.0 [1] was announced on Friday and it's pretty amazing. It's extremely high quality text-to-image and image-to-image with millisecond latency [2]. It's so fast that they've built a real time 2D drawing canvas application with it that pretty much duplicates Krea's entire product offering.

Unfortunately it looks like the team is keeping it closed source unlike their previous releases.

Hunyuan 3D 2.0 was good, but they haven't released the stunning and remarkable Hunyuan 3D 2.5 [3].

Hunyuan Video hasn't seen any improvements over Wan, but Wan also recently had VACE [4], which is a multimodal control layer and editing layer. The Comfy folks are having a field day with VACE and Wan.

[1] https://wtai.cc/item/hunyuan-image-2-0

[2] https://www.youtube.com/watch?v=1jIfZKMOKME&t=1351s

[3] https://www.reddit.com/r/StableDiffusion/comments/1k8kj66/hu...

[4] https://github.com/ali-vilab/VACE

nrjames•3h ago

This is technically impressive and I commend the team that brought it to life.

It makes me sad, though. I wish we were pushing AI more to automate non-creative work and not burying the creatives among us in a pile of AI generated content.

yieldcrv•3h ago

I’m a creative and I’m really glad that more people can express themselves

Just wanted to add representation to that feeling

lilwobbles•3h ago

Expressing themselves by generating boilerplate content?

Creativity is a conversation with yourself and God. Stripping away the struggle that comes with creativity defeats the entire purpose. Making it easier to make content is good for capital, but no one will ever get fulfillment out of prompting an AI and settling with the result.

ivape•2h ago

Check out all the creatives on /r/screenwriting, half the time they are trying to figure out how to "make connections" just to get a story considered. It's a fucking nightmare out there. Whatever god is providing us with AI is the greatest gift I could imagine to a creative.

onemoresoop•21m ago

AI could be useful if used like any other tool, but not as an all-in-one box where everything is done for you except the prompt. I'm actually worried people will become lazy.

yieldcrv•1h ago

Exactly. Creatives and everyone else can always do something fulfilling for themselves, just like before AI. They can struggle all they want and keep doing it for no capital, because that process is fulfilling to them.

onemoresoop•23m ago

Yes, but how many will sign up for that? I'm sure a few will continue to do so, but creativity will certainly take a big hit.

StefanBatory•3h ago

They always could express themselves.

yieldcrv•1h ago

Not close to the way they wanted, and at too much sacrifice to the other things they were interested in or supported their family with

toenail•3h ago

Do you find it sad that people can use recordings, and don't have to hire musicians any more?

mirkodrummer•3h ago

Recordings of who? Not only sad but a disaster. I'm sorry, but anyone who has ever tried to play an instrument seriously knows how much the human touch and imperfections come into play; otherwise you're just an anonymous guy playing in a cover band (like the AI will do).

onemoresoop•28m ago

A tsunami of effortless content is upon us and that will change many things including tastes, probably for the worse. People not having to learn instruments because same can be done with a prompt is a tragic loss for humanity, not because human work is better but because of the lost experience and joy of learning, connection with the self and others and so many other things.

ehsankia•3h ago

> burying the creatives among us in a pile of AI generated content.

Isn't the creativity in what you put in the prompt? Isn't spending hundreds of hours manually creating and rigging models based on existing sketches the non-creative work that is being automated here?

mirkodrummer•3h ago

How does a prompt describe creativity? It's a vision so far off that it's frustrating. Greater creativity came from limited tools, from imperfections, from a different point of view, from love, from the slightly off touch of a painter or a guitar player, from the wood of the instrument and the humidity affecting it. I could go on and on. No matter how much you can express via a prompt, it reduces everything you'd want to describe to the bare minimum.

ahmedfromtunis•3h ago

So deliberately writing a prompt that meticulously describes what a generated photo will look like isn't creative, but pushing a button for a machine to take the photo for you is??!! If anything, it's the other way around!

Of course that's not what I believe, but let's not limit the definition of creativity based on historical limitations. Let's see how the new generation of artists and creators will use this new capability to mesmerize us!

mirkodrummer•3h ago

To answer your first question: carpe diem! And historical limitations? Go visit the Sistine Chapel, unmatched still today

_DeadFred_•2h ago

Your meticulous prompt is using the work of thousands of experts, and generating a mashup of what they did/their work/their commitment/their livelihood.

Their placement of books. Their aesthetic. The collection of cool things to put into a scene to make it interesting. The lighting. Not yours. Not from you/not from the AI. None of it is yours/you/new/from the AI. It's ALL based underneath on someone else's work, someone else's life, someone else's heart and soul, and you are just taking it and saying 'look what I made'. The equivalent of a 4 year old being potty trained saying 'look I made a poop'. We celebrate it as a first step, not as the friggen end goal. The end goal is you making something uniquely you, based on your life experience, not on Bob the prop guy and Betty the set designer, whose work/style you stole and didn't even have the decency to reference/thank.

And your prompt won't ever change dramatically, because there isn't going to be much new, truly creative seed corn for AI to digest. Entertainment will literally go into limbo/Groundhog Day: just the same generative, derivative things/aesthetics from the same AI dataset.

ahmedfromtunis•1h ago

And that's exactly how your brain works. What you call "creativity" is nothing more than exactly that: mixing ideas and thoughts you were exposed to. We're all building on others' work. The only difference is that computers do it on a much larger scale. But it's the very same process.

renerick•1h ago

This is a completely absurd and reductive point of view, which I always assume is a cop-out. Just because it's called "machine learning" doesn't mean it actually has anything to do with how human learning or the human brain works, and it's certainly not "exactly how" or "very same". There's much more going on in the human creative process aside from mere "mixing": personal experience, understanding of the creative process, technique and style development, subtext, hidden ideas and nuances, etc. Computers are very good at mixing and combining, but this is not even close to what goes into the actual creative process. I hate this argument.

_DeadFred_•11m ago

But you aren't being creative here. You are literally using the 'average' of tons of actually creative peoples work to create an 'average', computer predicted scene. The opposite of art.

If I see a painting, I see an interpretation that makes me think through someone else's interpretation.

If I see a photograph, I don't analyze as much, but I see a time and place. What is the photographer trying to get me to see?

If I see AI, I see a machine-dithered averaging that is/means/represents/construes nothing but a computer-predicted average. I might as well generate a UUID; I would get more novelty. No backstory, because items in the scene just happened to be averaged in. No style, because it's just a machine-dithered blend, not a creative re-imagining. It represents nothing no matter the prompt you use, because the majority is still just machine-averaged/dithered non-meaning. Not placed with intention. Not focused with real intention/vision. Nothing obvious excluded with intention. Just purely exactly what a machine thinks is exactly average for the scene it had described to it.

Just like a 3-4 year old taking a poop averages the meals that their parents lovingly prepared into... something. Again, the poop is a start to an individual developing their own capabilities, enabled by external inputs, cheered on by people that see development coming. It is not the end goal. No one cheers on a 40 year old for pooping. But AI prompt people seem to think it's worthy. It's not. It's 'computer please generate for me the exact average scene you would predict based on the following:'. And you can never get away from that. It will always just be the averaging of thousands of hours of creative work put in by other people and stolen to use in AI.

hooverd•2h ago

Eh, every AI "artist" wants the cachet of being an artist without any of the effort, but they're competing with other AI "artists", so they have no choice but to unleash a firehose of content sludge onto the commons in a race to the bottom.

blargey•39m ago

No, it isn’t, because the prompt doesn’t have a millionth of the information density of the output.

Merely changing a seed number will provide endless different outputs from the same single prompt from the same model; rng.nextInt() deserves as much artist credit as the prompter.
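A toy illustration of the seed argument. The `toy_generate` function below is hypothetical and is not how Veo, Imagen or any real model works internally; it only assumes, as the comment does, that the output is a deterministic function of (prompt, seed). Holding the prompt fixed and sweeping the seed yields many different outputs, while repeating a seed reproduces the same one.

```python
import random

# Hypothetical stand-in for a generative model: NOT a real model's API.
PALETTE = ["red", "teal", "amber", "violet", "gray", "gold"]
SHAPES = ["circle", "square", "spiral", "wave", "star"]

def toy_generate(prompt: str, seed: int, n: int = 4) -> list[str]:
    # Seed a private RNG from (prompt, seed): same inputs -> same output.
    rng = random.Random(f"{prompt}|{seed}")
    return [f"{rng.choice(PALETTE)} {rng.choice(SHAPES)}" for _ in range(n)]

prompt = "a lighthouse at dusk"
same_a = toy_generate(prompt, seed=1)
same_b = toy_generate(prompt, seed=1)

# One fixed prompt, 100 seeds: collect the distinct outputs.
different = {tuple(toy_generate(prompt, seed=s)) for s in range(100)}

print(same_a == same_b)  # reproducible for a fixed seed
print(len(different))    # many distinct results from a single prompt
```

Seeding a private `random.Random` instance (rather than the global RNG) keeps each call reproducible regardless of what else has consumed random numbers.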

ahmedfromtunis•3h ago

The amount of gatekeeping I see whenever this topic is brought up is outstanding! Why can't people be happy that more individuals will soon be able to create freely and more accessibly?

Personally, I can't wait to see the new creative doors AI will open for us!

StefanBatory•3h ago

Creative? There's nothing creative in it.

ahmedfromtunis•3h ago

It's funny that you're defending creativity by being closed-minded about a creative new way to explore it. You're basing your judgment of an entirely new medium on a few early examples. It's as if you were dismissing photography based on the first few blurry, dark shots ever produced. Let's keep our minds open to new forms of creativity if we really care about it so much.

kleiba•3h ago

What grandparent means is that the AI enables human creativity in a new way.

onemoresoop•32m ago

It does, but it also encourages effortlessness and haste. It definitely is something new; it remains to be seen whether creatives will take it to new heights as with other mediums. I remain a bit skeptical but am open to it. One thing is certain: a tsunami of content is upon us.

gamblor956•3h ago

There's nothing creative in having someone or something else doing the work for you.

"Creating" with an AI is like an executive "inventing" the work actually done by their team of researchers, or a team owner "winning" a game played by their team.

That being said, AI output is very useful for brainstorming and exploring a creative space. The problem is when the brainstorming material is used for production.

MichaelZuo•3h ago

> There's nothing creative in having someone or something else doing the work for you.

This would include almost everyone who’s used any editing software more advanced than photoshop CS4.

andoando•3h ago

I don't think that's true. Is a film director not a creative?

You could come up with your own story and direct the AI to generate it for you.

gamblor956•1h ago

In your example, the "come up with your own story" part is the creative part. But you're not "directing" the AI to generate it for you. You're just giving it a command. You're selecting from the results it outputs, but you're not controlling the output.

A film director is a creative. Ultimately, they are in charge of "visualizing" a screenplay: the setting, the design of the set or the use of real locations, the staging of the actors within a scene, the "direction" of the actors (i.e., how they should act out dialog or a scene), lighting, the cinematography, the use of stunts, staging shots to accommodate VFX, and the editing (meaning, the actual footage that comprises the movie).

There's an old show on HBO, Project Greenlight, that demonstrates what a director does. They give 2 directors the same screenplay and budget and they make competing movies. The competing movies are always completely different...even though the scripts are the same. (In the most extreme example, from one of the later seasons, one of the movies was a teen grossout comedy, and the competing movie was some sort of adult melodrama.)

andoando•57m ago

So 1. Being able to bring your own story to life automatically is cool in itself, and would result in a lot of creative media that isn't possible now. Do you know how many people have their own stories, plays, etc. that are dying to find someone rich enough to get them published?

2. Using AI can be an iterative process. Generate this scene, make this look like that, use brighter colors, remove this, add this, etc. That's all carefully crafting the output. Now generate the second scene, make the transition this way, etc. I don't see how that's at all different from a director giving commands to workers, except now you actually have more creative control (assuming AI gets good enough).

cloverich•2h ago

Not even if you are directing and refining it? What if I smudge out sections repeatedly and, over the course of say 20 iterations, produce a unique image that closely matches what I am imagining and that has never been seen before?

kmijyiyxfbklao•2h ago

Buñuel would disagree with you: "The peak of film-making will be reached when you are able to take a pill, switch off the lights, sit facing a blank wall and project on it, directly from your eyes, the film that passes through your head."

jampa•2h ago

The first two paragraphs of your argument could be used to discuss whether Photography (Camera is doing most of the work) or Digital Drawing (Photoshop is doing most of the work) are art.

Both were dismissed as not art at first, but both are widely accepted as artistic media nowadays.

MattGrommes•2h ago

I see this comparison to a camera a lot, but I don't think it works (not that you're saying this, I'm just contributing). I'm not an expert, but to me the camera is doing very little of the work involved in taking an artistic picture. The photographer chooses which camera to use to get a certain effect, which lenses, the framing, etc. All the camera is doing is recording the output of what the person is specifying.

Clamchop•50m ago

Photography mostly eliminated the once-indispensable portrait artist, among other formerly-dependable lines of work.

There's a line to be drawn somewhere between artist and craftsperson. Creating beautiful things to a brief has always been a teachable skill, and now we're teaching it to machines. And, we've long sought to mass-produce beautiful things anyway. Think textiles, pottery, printmaking, architectural adornments.

Can AI replace an artist? Or is it just a new tool that can be used, as photography was, for either efficiency _or_ novel artistic expression?

gamblor956•1h ago

> The first two paragraphs of your argument could be used to discuss whether Photography (Camera is doing most of the work) or Digital Drawing (Photoshop is doing most of the work) are art.

The work a camera does is capturing the image in front of the photographer. "Art" in the context of photography is the choice of what in the image should be in focus, the angle of the shot, the lighting. The camera just captures that; it doesn't create anything that isn't already there. So, not even remotely the same thing as AI Gen.

The work of Krita/Inkscape/etc (and technically even Photoshop) is to convert the artistic strokes into a digital version of how those strokes would appear if painted on a real medium using a real tool. It doesn't create anything that the artist isn't deliberately creating. So, not even remotely the same thing as AI Gen.

AI Gen, as demonstrated in the linked page and in the tool comparison, is doing all of the work of generating the image. The only work a human does is to select which of the generated images they like the best, which is not a creative act.

blargey•48m ago

Everyone has a phone camera, and takes photos, but not everyone is a photographer, and even photographers wouldn’t proclaim all their photos “art”.

AI cannot “democratize art” any more than the camera did, until the day it starts teaching artistry to its users.

_DeadFred_•2h ago

Remember how there were all those cake shows all of a sudden, and they were making cakes that looked super pretty, but they were just fondant and sheet cakes? We're not thrilled having to wade through the AI equivalent.

alpaca128•2h ago

How is the requirement to use a computer and maybe pay a cloud subscription in the long term more accessible than other kinds of art? Which individuals are gatekept exactly? Before you bring up disabled people (as often happens when the term accessibility is used), know that many of them are not happy to be used as a shield for this without ever being asked and would rather speak for themselves.

I've tried AI image generation myself and was not impressed. It doesn't let me create freely, it limits me and constantly gravitates towards typical patterns seen in training data. As it completely takes over the actual creation process there is no direct control over the small decisions, which wastes time.

Edit: another comment about a different meaning of accessibility: the flood of AI content makes real content less accessible.

Clamchop•1h ago

There's a pretty standard argument that creating artworks should or must require the hardship of developing the skill to bring vision into reality, or paying someone who can. That can be debated, but the position is textbook gatekeeping.

Other disapproval comes from different emotional places: a retreading of ludditism born out of job insecurity, criticism of a dystopia where we've automated away the creative experience of being human but kept the grim work, or perceptions of theft or plagiarism.

Whether AI has worked well for you isn't just irrelevant, but contrarian in the face of clear and present value to a lot of people. You can be disgusted with it but you can't claim it isn't there.

rcarr•5m ago

Try and tell me that a single individual would have been able to create this 10 years ago on a normal salary:

https://www.youtube.com/@NeuralViz

It would have cost millions. Now one person can do it with a laptop and a few hundred dollars of credits a month.

AI is 100% making filmmaking more accessible to creative people who otherwise would never have access to the kind of funding and networks required to realise their visions.

duped•1h ago

> Why can't people be happy that more individuals would be soon able to create freely in a more accessible way?

The gates are wide open for those who want to put in the effort to learn. What AI is actually doing is letting cheap and lazy people put creative professionals out of a job.

Art is not inaccessible. It's never been cheaper and easier to make art than today even without AI.

> Personally I can't wait to see the new creative doors ai will open for us!

It's opening zero doors but closing many

---

What really irks me about this is that I have _seen_ AI used to take away work from people. Last weekend I saw a show where the promotional material was AI generated. It's not like tickets were cheaper or the performers were paid more or anything was improved. The producers pocketed a couple hundred bucks by using AI instead of paying a graphic designer. Extrapolate that across the market for arts and wonder what it's going to do to creativity.

It's honestly disgusting to me that engineers who don't understand art are building tools at the whims of the financiers behind art who just want to make a bit more money. This is not a rising tide that lifts all ships.

ahtihn•1h ago

> The gates are wide open for those that want to put in effort to learn.

Why is effort a requirement?

Why should being an artist be a viable job?

Would you be against technology that makes medical doctors obsolete?

ZoomZoomZoom•1h ago

> Why is effort a requirement?

That's how human brains work. People have an intrinsic need to sort, build hierarchies and prioritize. Effort spent is one of the viable heuristics for these processes.

> Why should being an artist be a viable job?

Art itself has great value; if it didn't, museums, theaters and live shows wouldn't exist.

> Would you be against technology that makes medical doctors obsolete?

The analogy doesn't work. The result of a medical process is a [more] healthy person. That result has no link to the one performing it. The result of an artistic creative process is an art piece, and art is tied to its creator by definition.

kranke155•1h ago

Individuals won’t be able to do anything. The artist here is the LLM. There is no AI art where the human in the loop carries any significance. Proof of that is that you can’t replicate their work using the same LLM. In AI art, the AI is the artist. The human is just a client making a request.

And who owns the AI?

It’s delusional. Stop falling for the mental jiu-jitsu from the large AI labs. You are not becoming an artist by using a machine to make art for you. The machine is the artist. And you don’t own it.

sarks_nz•3h ago

Distribution of art (particularly digital) is a recent phenomenon. Prior to that, art in human history was one-off. Are we just going back to that time?

Similarly with music, prior to recording tech, live performance was where it was at.

You could look at the digital era as a weird blip in art history.

woah•3h ago

It makes me sad that the US and western Europe which have been the most flexible and forward-thinking societies in the world for generations have now memed themselves into fretting and hand-wringing about technical advances that are really awesome. And for what? The belief that illustration and filmmaking which have always been hobbies for the vast majority of participants should be some kind of jobs program?

dmonitor•2h ago

People aren't looking forward to companies playing the "how much sawdust can you put in a rice crispy before people notice the difference" experiment on the entertainment industry. The quality of acting, scripting, lighting, and animation in the film/television industry already feels second rate to stuff being made before 2020. The cost cutting and gutting of cultural products is becoming ridiculous, and this technology will only be an accelerant.

woah•2h ago

If you don't like a movie, then don't watch it.

swalsh•3h ago

I think the non-creative work is coming... but it's harder, needs more accuracy, and generally takes more effort. But it's 100% coming. AI today can one-shot things to about 80% perfection. But for use cases that need more than that, the last 20% is grueling to gain. It's like taking a jet across the country and then getting jammed in traffic in the taxi to your hotel.

dyauspitr•3h ago

There’s limited training data for physical movements. Once there is enough of that the non creative space will start getting their own LLMs.

TechDebtDevin•2h ago

80% on todo apps

cadamsdotcom•2h ago

Plenty of non-creative work can be automated.

Have a look at the workflow and agent design patterns in this video by youtuber Nate Herk when he talks about planning the architecture:

https://m.youtube.com/watch?v=Nj9yzBp14EM

There’s less talk about automating non-creative work because it’s not flashy. But I can promise it’s a ton of fun, and you can co-design these automations with an LLM.

ivape•2h ago

Dude.

Making a movie is not accessible to most people and it's EVERYONE'S dream. This is not even there yet, but I have a few movies I need to make and I will never get a cast together and go do it before I die. If some creatives need to take a backseat so a million more creatives can get a chance, then so be it.

onemoresoop•38m ago

Yeah, there will be so many AI generated videos that many will go unwatched. Not sure where this is heading but it's certainly an interesting future.

dktp•1h ago

For better or worse, a big chunk (if not most) of the AI development probably does go into non-creative work like matching ads against users and ranking search results

It's just not what gets the exciting headlines and showcases

BosunoB•1h ago

Robotics will come in the next few years. If you believe the AI2027 guys, though, the majority of work will be automated in the next 10 years, which seems more and more plausible to me every day.

ugh123•49m ago

This kind of tech will open up filmmaking to a much wider base of creative talent.

rvz•3h ago

Well, all the AI labs wanted to "Feel the AGI" and the smoke from Google...

They all got smoked by Google with what they just announced.

ugh123•3h ago

When can I change the camera view and have everything stay consistent?

skybrian•3h ago

What’s the easiest way to try out Imagen 4?

Edit: https://labs.google/fx/tools/whisk

flakiness•3h ago

How does this compare with sora (pro)?

echelon•1h ago

Sora, the video model, is shit. Kling, Runway, and a whole host of other models are better. You don't have to do much to be better than Sora.

Sora, the image model (gpt-image-1), is phenomenal and is the best-in-class.

I can't wait to see where the new Imagen and Veo stack up.

999900000999•3h ago

Ehh, really, for $20? Break dancers with no music, people just popping in and out?

Google, what is this?

How would anyone use this for a commercial application?

lenerdenator•3h ago

I do find myself wondering if the people working on this stuff ever give any real thought to the impact on society that this is going to have.

I mean obviously the answer is "no" and this is going to get a bunch of replies saying that inventors are not to blame but the negative results of a technology like this are fairly obvious.

We had a movie two years ago about a blubbering scientist who blatantly ignored that to the detriment of his own mental health.

tmpz22•3h ago

How could you possibly push back on the societal benefit of a director being able to buy a vacation home in Lake Tahoe?

bowsamic•3h ago

It's really being forced on us too. Jira, Confluence, and Notion are three products I've used where they've purposefully ignored requests to allow us to disable or hide the bundled generative AI. It's really intrusive. I also switched to DuckDuckGo because of the new AI on Google.

tootie•37m ago

Remember when they fired Timnit Gebru for publishing on AI safety?

StefanBatory•3h ago

Thanks to them, we will be able to enter a new era of politics, where nothing is true and everything is vibe-based.

Thank you, researchers, for making our world worse. Thank you for helping to kill democracy.

matthewaveryusa•3h ago

"The Bloomberg terminal for creatives"

crat3r•3h ago

This doesn't look (any?) better than what was shown a year or two ago for the initial Sora release.

I imagine video is a far tougher thing to model, but it's kind of weird how all these models are incapable of not looking like AI-generated content. They are all smooth and shiny and robotic; year after year it's the same. If anything, the earlier generators, like that horrifying "Will Smith eating spaghetti" generation from about three years ago, look LESS robotic than any of the floaty clips that are generated now.

I'm sure it will get better, whatever, but unlike the goal of LLMs for code/writing, where the primary concern is how correct the output is, video won't be accepted as easily unless it does NOT look like AI.

I am starting to wonder if that's even possible, since these are effectively making composite guesses based on training data, and the outputs do ultimately look similar to those "Here is what the average American's face looks like, based on 1000 people's faces superimposed onto each other" images that used to show up on Reddit all the time. Uncanny, soft, and not particularly interesting.

ahmedfromtunis•3h ago

It has long been established that Veo has a waaay better understanding of physics, and consistency over multiple frames, than Sora. Not even close.

crat3r•3h ago

I want to be clear, I don't think Sora looks better. What I am saying is they both look AI generated to a fault, something I would have thought would be not as prominent at this point.

I don't follow the video generation stuff, so the last time I saw AI video it was the initial Sora release, and I just went back to that press release and I still maintain that this does not seem like the type of leap I would have expected.

We see pretty massive upgrades every release between all the major LLM models for code/reasoning, but I was kind of shocked to see that the video output seems stuck in late 2023/early 2024 which was impressive then but a lot less impressive a year out I guess.

htrp•3h ago

is it still a waitlist?

Imnimo•3h ago

>Imagen 4 is available today in the Gemini app, Whisk, Vertex AI and across Slides, Vids, Docs and more in Workspace.

I'm always hesitant with rollouts like this. If I go to one of these, there's no indication which Imagen version I'm getting results from. If I get an output that's underwhelming, how do I know whether it's the new model or if the rollout hasn't reached me yet?

minimaxir•2h ago

Google is typically upfront about which model versions you're using in those tools. Not as behind-the-scenes as ChatGPT.

However, looking at the UI/UX in Google Docs, it's less transparent.

nico•3h ago

Wow, the audio integration really makes a huge difference, especially given it does both sounds and voices.

Can’t wait to see what people start making with these

gloosx•3h ago

>>models create, empowering artists to bring their creative vision

Interesting logic the new era brings: something else creates, and you only "bring your vision to life". What that means is left for the reader to question: is your "vision" here just your text prompt?

We're at a crossroads where the tools are powerful enough to make the process optional.

That raises uncomfortable questions: if you don’t have to create anymore, will people still value the journey? Will vision alone be enough? What's the creative purpose in life? To create, or to bring a creative vision to life? Isn't the act of creation being subtly redefined?

dmonitor•2h ago

It's being redefined in such a way that 2-3 very large entities get to hold the means of production. It's a very convenient redefinition for them.

teitoklien•1h ago

???? I’m not a musician. My parents wanted me to focus on my studies; during my important exam years in high school they removed me from my favourite sports.

Even if they hadn’t, I’d still struggle; I was a horrible guitarist. I could only sing decently, so even when I wanted to make music, I couldn’t.

Now with suno.com's AI, I make songs daily, for myself, for my friends, for everyone, and it has had a huge impact on my positivity day to day, even after gruelling workweeks.

I don’t know about your means of production stuff, but i sure as hell couldn’t afford to spend $10000 a month hiring or bringing singers, musicians to compose songs for me.

Now I can, for $10-$15/month. My mom, who can’t code and is barely tech literate, uses OpenAI's Advanced Voice Mode to create prompts for building software with Replit Agent (Replit then builds an entire app from one or two prompts and deploys it for her).

She copy-pastes those prompts into Replit and gets back financial dashboards to help her analyse the options market and her trading portfolio, build simple calculators, etc. She can drill down into math she has no clue about with a teensy bit of my help and a ton of help from Gemini, the Replit assistant and Claude.

She makes good money now by herself which is a big deal for her as she was a housewife before and always wanted to build her own thing. These AI tools have given her fulfillment that nothing else before could. She reads and understands complex books now with gemini and openai by clicking photos of pages, and if she’s confused she asks them to translate with examples in her mother tongue (non english speaker). She is far more confident now and positive about her life and looks forward to it everyday.

I don’t know about your means-of-production theory, but with model distillation making small AI models affordable for anyone to train, and even unknown startups constantly launching better and better open-source models, it’s the common plebs like me and my mom who finally have the means of production.

gloosx•1h ago

If your focus is to solve the problem, then it makes sense to treat the process as secondary. The tools are just means to an end.

This view also aligns with how generative AI is marketed – it's a way to accelerate realization, not a way to focus on the act of crafting.

That said, outcome-first thinking does run the risk of disconnection, and our current culture is all about disconnection.

teitoklien•1h ago

Even the process is more democratized now. Want to learn to code?

Build out your app idea first with Replit -> export the codebase to your computer -> run Claude Code on it and ask it to scan all the files, describe the tech stack and how it works, and list the major components you need to learn to understand it, with YouTube channel and book recommendations plus exercises for each topic -> use Perplexity Deep Research once a week to dig further into each topic as you start learning it

If you’re a busy man/woman, make a Gumloop or Lindy AI workflow that checks your calendar, packs in time slots for all this learning, and then auto-sends you worksheets by email as homework to test your skills.

All of this for 1/15th the price of a college degree (not even an expensive college).

This is not hypothetical conjecture; I do this daily.

So everyone now has 1) low-cost access to build stuff with one prompt and realise the value of the tools, and 2) a personal tutor that can help you scour the depths of the craft and push you to practice and learn deeply, with the added motivation of knowing what’s possible.

So it has the potential to connect us more, too; it’s up to humans whether they do in the end, though. That is their liberty.

gloosx•1h ago

This raises another interesting question to think about: if everyone has (1) low-cost access to build anything, and (2) low-cost access to learn how to build anything at the same time...

...what do you think a human would choose the most?

teitoklien•1h ago

Hence why I said it’s their liberty. I’m not contesting what you’re worried about: that idiocracy will rise and our next generation will only have surface-level thoughts, with AI increasingly becoming the subtle decider of every human’s choices in the background.

That is probably the likelier possibility. However, it just shows the lack of philosophy in our modern times: people don’t do things they are lazy about, and the choice between the easy way and the hard way is no longer a choice for the majority; the easy way’s dangling carrot is the final ultimatum.

I think I’ll leave it at this thought: as time progresses, to find value in day-to-day life and to force ourselves to choose the right thing, philosophy will again have to become a much stronger actor in our lives, or else we’ll all drive off a cliff at the current rate.

In the end, what happens will be decided by the choices and liberty of humans as their options expand.

oblio•1h ago

That's step one, and you are right.

Step two is... sure, every pleb can now create art.

That devalues art. More than that, that makes for a "winner takes all" marketplace. So even fewer people than now benefit from it. More than that, guess who wins out: middlemen, the marketplace owners.

Read The Black Swan by Nassim Taleb, especially the chapters about Extremistan and Mediocristan. Basically, every time we invent something that scales and unlocks something for a great number of people, we commoditize it, and the quality of life for the average person in that field goes down, while the leeches, pardon me, the middlemen, are the only ones who become consistently rich after the initial struggle to achieve market dominance (so when the market matures).

teitoklien•1h ago

Tell that to the kings in medieval times, who were the only ones who could afford music from the finest craftsmen and silk shirts.

This allowed people in those crafts to be very rich.

Then came the evil leeches, the middlemen, who brought cheap fine clothes to the masses and put music in the hands of every broke college kid in his dorm room. So evil!

Get the sarcasm? Calling the masses' ability to do more things somehow horrible is elitist; calling the people who make that possible leeches is just elitism in a velvet glove.

As long as the majority gets more value in their daily life, the world is better.

jplusequalt•10m ago

>Calling ability for masses to do more things is somehow horrible is elitist

What are they doing other than consuming with extra bells and whistles tacked on to it? I'm sorry, but it's not art.

ZoomZoomZoom•1h ago

> I make songs daily

Sorry for being blunt, but you do not. You receive some music matching your request from a service offered by an entity which aims to control as much of content creation and distribution as possible, up to total monopolization.

> I’m not a musician... Now I can with $10-$15 / month.

If you want to create music, do it; it doesn't require much money. If you just want to listen, there are literally thousands of artists creating all kinds of authentic, sincere, daring, skillfully performed, carefully mixed music, giving it away for next to nothing and still striving to find their listeners.

What you pay for is avoiding the effort of finding what suits you.

teitoklien•57m ago

No, amigo, no artist out there made a Japanese song about my anime-obsessed schoolteacher friend and his life.

When he received that song from me, he was super excited for the next 3-5 days, and he still listens to it and shows it off to his friends.

The same thing happened with a lot of my other friends, and with me too: I have an Apple Shortcuts script that generates songs for me daily based on my routine for that day, pulled from todoist.com.

I still listen to and pay for other people’s music and songs, but this experience with AI is entirely different.

What I pay for is not avoiding the effort of finding what suits me, but creating what suits me.

> You receive some music matching your request from a service offered by an entity which aims to control as much of content creation and distribution as possible, up to total monopolization.

What I receive is a high-fidelity song made from my prompts, which I’m given full ownership of, and which, when pooled at scale across all users, allows people to make their own song generators with GPUs.

Its very nature is the opposite of a monopoly. I’d love to hear how you think the big 3 corps (Universal, Sony, etc.), who own almost all the music globally, are not a monopoly? Have you never had albums on Spotify or Apple Music disappear randomly because of those big 3 corps?

My friend’s song about himself will never disappear: it’s an mp3 file he can store on a pen drive, load anywhere, gift to anyone. How is that a “monopoly”?

ZoomZoomZoom•39m ago

> when pooled at scale between all users allows people to make their own song generators with GPUs.

I can assure you, local generation is to become a fringe activity, same as self-hosting web services, only worse, because the quality gap (which in case of software is often negligible) will be insurmountable.

> I’d love to hear how you think the big 3 corps (Universal, Sony, etc) who own all the music almost globally are not a monopoly?

It's not a monopoly, it's a cartel. Luckily, they don't own everything, though they do own too much.

> no artist out there made a japanese song on my anime obsessed schoolteacher friend and his life.

OK, what you describe is commissioning. Yeah, you can't argue with the fact that it can now be done almost for free and is becoming good enough for most, but you have to keep in mind that this process has been feeding a considerable number of artists, who do it to keep producing their art. Cutting this source of income is not wrong per se, but the consequences are the opposite of supporting the diversity and abundance in arts.

teitoklien•10m ago

> but the consequences are the opposite of supporting the diversity and abundance in arts.

I made 5 songs about 5 different people in a week, with carefully crafted lyrics and tones described by me in the custom prompt, which led to 10 mp3 files (Suno generates 2 songs per prompt). Those songs are out there; they’re different, they’re not sloppy, they’re actually quite enjoyable.

Now there is more diversity and abundance: those songs wouldn’t have existed without AI, and there are millions out there doing it like me. The artists who produce songs also have the same tools as me; they can be better than me, faster, make more albums, edit stuff to perfection, ideate and iterate faster. Who is stopping them?

Tell me this song is trash and slop: https://suno.com/song/c36741d6-ec62-4922-86f9-6fd0b6f37497

This replaces me and my friends listening to the same 20-40 artists who would be on the Billboard charts each month.

Tell me it has hurt the abundance and diversity of songs out there, that it stopped someone from making their own thing or others from listening to their songs. I listen to that song; it’s made by someone else with AI, and I don’t mind, it’s awesome!

lxe•2h ago

I've been doing AI art since 2022 and I'm still both disappointed and not quite surprised that this is still a pervasive view of what it takes to create anything high quality using AI.

If you take any high quality AI content and ask its creator what their workflow is, you'll quickly discover that actually creating something high-quality, something that actually "fulfills your vision", requires an incredible amount of complexity and nuance.

Whether you measure quality through social media metrics, reach, or artistic metrics, like novelty or nuance, high quality content and art requires a good amount of skill and effort, regardless of the tool.

Standard reading for context: https://archive.org/details/Bazin_Andre_The_Ontology_of_Phot...

kevinventullo•2h ago

Given the pervasiveness of AI slop with hundreds of thousands of likes on Facebook, I wouldn’t be so sure about using social media metrics as proof of high skill and effort.

jplusequalt•2h ago

>If you take any high quality AI content and ask their creator what their workflow is, you'll quickly discover that the complexity and nuance required to actually create something high-quality and something that actually "fulfills your vision" is incredibly complex

This comes off as so tone-deaf, given that your AI artwork is only possible due to the millions of hours spent by real people who created the art used to train these models. Maybe it's easier to understand why people don't respect AI "artists" with this in mind.

tintor•2h ago

Text prompts are very short now, but that can quickly change if prompt following improves.

Software Engineers bring their vision to life through the source code they input to produce software, systems, video games, ...

hooverd•2h ago

LLM providers want to a) make you dependent on their services as you outsource your skills and cognition and b) use that dependency to skim the cream off every economic activity.

kkarakk•1h ago

We can see what happened to opera/theater/hand-drawn art as a conclusive answer. Humans generally move on to the newer, easier-to-create/consume thing (digital music/TV/digital art), and a small percentage of people treat the older mode of creation as high art because it's more difficult and expensive to learn and implement.

gloosx•1h ago

Calling cinema a "new form" of theatre is quite a simplification. It was certainly inspired by theatre, but the two differ in almost every aspect: medium, communication language, cultural role, and audience dynamics. Most people throughout history probably never experienced theatre or opera – so they didn’t move from them to cinema; rather, cinema emerged as a more accessible and reproducible medium for those.

Theatre and opera are regarded as high art because they are performed live in front of an audience every time, demanding presence, skill, and immediacy – unlike cinema, which relies on a recorded and edited performance.

klabb3•1h ago

> but what it means is left for readers questioning, your "vision" here is your text prompt?

Right. Imo you have to be imagination-handicapped to think that creative vision can be distilled to a prompt, let alone that a prompt is the natural medium a creative vision lives in. The exact relation between vision, artifact, process and art itself can be philosophically debated endlessly, but to think artifacts are the only meaningful substrate at which art exists sounds like a dull and hollowed-out existence, like a Plato’s-cave-level confusion between the true meaning and the representation. Or, in a (horrible) analogy for my fellow programmers, confusing pointers to data with the data itself.

julianpye•2h ago

An indie film with poor production values, even bad acting can grip you, make you laugh and make you cry. The consistency of quality is key - even if it is poor. The directing is the red thread throughout the scenes. Anything with different quality levels interrupts your flow and breaks your experience. The problem with AI video content at this stage is that the clips are very good 'in themselves', just as LLM results are, but putting them together to let you engage beyond an individual clip will not be possible for a long time. It will work where the red thread is in the audio (e.g. a title sequence) and you put some clips together to support the thread. But Hollywood has nothing to fear at this stage. In addition, remember that visual artists are control freaks of the purest kind. Film is still used because of the grain, not despite it. 24p prevails.

doctorpangloss•41m ago

There’s already more good content than anyone can watch. It’s impossible to disentangle strength of the art from strength of distribution. Google, the world’s biggest distributor of culture, is focusing on this problem they do not need to solve, instead of the one everyone in art actually suffers from, because: they’re bad at this. It’s that simple.

sandspar•41m ago

AI video may be to Hollywood as photography was to painting. Photography wasn't "painting, but better" - it was a different thing. AI-native video may not resemble typical Hollywood 3-act structure. But if it takes enough eyeballs away from Hollywood then Hollywood will die all the same.

pedalpete•25m ago

I think you're contradicting your own argument. Painting didn't die from photography.

Photography increased the abstract and more creative aspects of painting and created a new style, because photography removed much of the need to capture realism. That said, I am still entranced by the realist painting style myself; it serves a different purpose than capturing a moment.

rcarr•9m ago

You might want to look up NeuralViz on YouTube. 180k subscribers. They've been building out an entire cinematic universe using AI video tools. And it's by far the funniest show I've watched in years. So the claim that "let you engage beyond an individual clip will not be possible for a long time" isn't true. People are already doing it.

https://www.youtube.com/@NeuralViz

brm•2h ago

I think it's a good thing to have more people creating things. I also think it's a good thing to have to do some work and some thinking and planning to produce a work.

curvaturearth•2h ago

The first video is problematic? The owl faces forward, then seamlessly turns around. Something is very off there.

The guy in the third video looks like a dressed up Ewan McGregor, anyone else see that?

I guess we can welcome even more quality 5 second clips for Shorts and Instagram

itissid•1h ago

Who is doing all the work of making physical agents that can be as good as a UBI generator? Something that can not just create videos, but go get groceries (hell, grow my food), help a construction worker lay tiles, help a nurse fetch supplies.

https://www.figure.ai/ does not exist yet, at least not for the masses. Why are Meta and Google just building the next coder and not the next robot?

It's because those problems are at the bottom of the economic ladder. But they have the money for it, and it would create so much abundance that it would crash the cost of living and free up human labor to imagine and do things more creatively than whatever Veo 4 can ever do.

pj_mukh•1h ago

Welcome to the defining paradox of the 21st century:

https://en.wikipedia.org/wiki/Moravec%27s_paradox

BosunoB•1h ago

There are companies working on this, but my understanding is that the training data is more challenging to get because it involves reinforcement learning in physical space.

In the forecast of the AI-2027 guys, robotics come after they've already created superintelligent AI, largely just because it's easier to create the relevant data for thinking than for moving in physical space.

ericskiff•1h ago

Has anyone gotten access to Imagen 4 for image editing, inpaint/outpaint or using reference images yet? That's core to my workflow and their docs just lead to a google form. I've submitted but it feels like it's a bit of a black hole.

cryptoegorophy•1h ago

For anyone with access, can you ask it to make a pickup truck drive through mud? I’ve tested various AIs and they all suck at physics, with tires spinning the wrong way; it is just embarrassing. Demos look amazing, but when it comes to actual use, there is none that worked for me. I guess it is all to increase “investor value”.

lelandbatey•39m ago

I think Google's got something going wrong with their usage limits, they're warning I'm about to hit my video limit after I gave two prompts. I have a Google AI Pro subscription (came free for 1 year with a phone) and I logged into Flow and provided exactly 2 prompts. Flow generated 2 videos per prompt, for a total of 4 videos, each ~8 seconds long. I then went to the gemini.google.com interface, selected the "Veo 2" model, and am now being told "You can generate 2 more videos today".

Since Google seems super cagey about what their exact limits actually are, even for paying customers, it's hard to know if that's an error or not. If it's not an error, if it's intentional, I don't understand how that's at all worth $20 a month. I'm literally trying to use your product Google, why won't you let me?

kapildev•39m ago

Google has partnered with Darren Aronofsky’s AI-driven studio, Primordial Soup. I still don't understand why SAG-AFTRA's strike to ban AI from Hollywood studios didn't affect this new studio. Does anyone know?

cjkaminski•15m ago

Primordial Soup isn't a guild signatory, which means they aren't bound by the agreement negotiated during the strike. It also means they cannot hire guild actors for their projects, but that isn't a likely concern given the nature of the company.
