Soon, you should be able to put in a screenplay and a cast, and get a movie out. Then, "Google Sequels" - generates a sequel for any movie.
All this is in line with my prediction that the first entirely AI-generated film (made with Sora or other AI video tools) will win an Oscar within five years.
And we're only 5 months in.
I bet they will soon add rules so AI movies can't even compete.
We are about six years into transformer models. By now we can get transformers to write coherent short stories, and you can get to novel lengths with very careful iterative prompting (e.g. let the AI generate an outline, then chapter summaries, consistency notes, world-building, then generate the chapters). But to get anything approaching a good story you still need a lot of manual intervention at every step of the process. LLMs go off the rails, get pacing completely wrong and demonstrate gaping holes in their understanding of the real world. Progress on new models is mostly focused elsewhere, with better storytelling a byproduct. I doubt we get to "best screenplay" level of writing in five years.
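For the curious, that iterative workflow looks roughly like this in Python. This is a minimal sketch only: the OpenAI client, model name, premise and prompts are placeholders, and any chat-capable model would be driven the same way.

```python
# Minimal sketch of the hierarchical prompting loop described above.
# Client, model name and prompts are illustrative, not a recommendation.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(prompt: str, context: str = "") -> str:
    """Send one prompt (plus accumulated notes) and return the text reply."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # any capable chat model
        messages=[
            {"role": "system", "content": "You are a careful fiction co-writer."},
            {"role": "user", "content": context + "\n\n" + prompt},
        ],
    )
    return resp.choices[0].message.content

premise = "A lighthouse keeper discovers the fog itself is alive."
outline = ask(f"Write a 12-chapter outline for a novel with this premise:\n{premise}")
world_notes = ask("List world-building and consistency notes for this outline.", outline)

chapters = []
for n in range(1, 13):
    summary = ask(
        f"Summarize chapter {n} in detail, consistent with earlier chapters.",
        outline + "\n" + world_notes + "\n\n" + "\n\n".join(chapters[-2:]),
    )
    chapters.append(ask(f"Write chapter {n} in full, following this summary:\n{summary}",
                        world_notes))
# In practice every step above still needs manual review and editing,
# which is exactly the point being made here.
```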
Best Actor/Actress/Director/etc are obviously out for an AI production since those roles simply do not exist.
Similarly with Best Visual Effects: I doubt AI-generated films qualify.
That leaves us with categories that rate the whole movie (Best Picture, Best International Feature Film etc), sound-related categories (Best Original Score, Original Song, Sound) and maybe Best Cinematography. I doubt the first category is in reach. Video Generation will be good enough in five years. But editing? Screenwriting? Sound Design?
My bet would be on the first AI-related Oscar to be for an AI generated original score or original song, and that no other AI wins Oscars within five years.
Unless we go by a much wider definition of "entirely AI generated" that would allow significant human intervention and supervision. But the more humans are involved, the less claim it has to being "entirely AI". Most AI-generated trailers and the Balenciaga-Potter-style videos still require a lot of human work.
I have done quite a bit with AI generated audio/sound/music.
At some point in the process, the end result feels like your own and the models were used to create material for the end work.
At some point, using AI in the creative process will be such a given that it is left unsaid.
I would assume the screenplay that wins the Oscar next year will have been helped along by a language model. I can't imagine a writer not using a language model to riff on ideas. The delusional idea here is the prompt "write an Oscar winning screenplay" and that somehow that is all there is to the creative process.
You're assuming Oscar voting is primarily driven by film quality but this hasn't been true for a long time (if it ever was). Many academy voters are biased by whatever cultural and political trends are currently ascendant among the narrow subset of Hollywood creatives who belong to the academy (the vast majority of people listed in movie credits will never be academy voters). Due to the widespread impact of Oscar wins in major categories, voters heavily weight meta-factors like "what should the Hollywood community be seen as endorsing?"
No issue in recent memory has been as overwhelmingly central as AI replacing creatives among the Hollywood community. The entire industry is still recovering from the unprecedented strikes which shut down the industry, and one of the main issues was the use of AI. The perception of AI use will remain cultural/political poison among the rarefied community of academy voters for at least a decade. Of course, studios are businesses and will hire vendors who use AI to cut costs, but those vendors will be smart enough to downplay that fact because it's all about perception - not reality. For the next decade "AI" will be to Academy-centric Hollywood what "child labor" is to shoe manufacturing. The most important thing is not that it doesn't happen, it's ensuring there's no clear proof it's happening - especially on any movie designed to be 'major category Oscar-worthy' (such films are specifically designed to check the requisite boxes for consideration from their inception). I predict that in the near term AI in the Oscars will be limited to, at most, a few categories awarded in the separate Technical Oscars (which aren't broadcast on TV or covered by the mainstream media).
This "fixes" Hollywood's biggest "issues". No more highly paid actors demanding 50 million to appear in your movie, no more pretentious movie stars causing dramas and controversies, no more workers' unions or strikes, but all gains being funneled directly to shareholders. The VFX industry being turned into a gig meatgrinder was already the canary in the coal mine for this shift.
Most of the major Hollywood productions from the last 10 years have been nothing but creatively bankrupt sequels, prequels, spinoffs and remakes, all rehashed from previous IP anyway, so how much worse than this can AI do, since it's clear they're not interested in creativity anyway? Hell, it might even be an improvement over what they're making today, and at much lower cost to boot. So why wouldn't they adopt it? From the bean counter MBA perspective it makes perfect sense.
Except it bankrupts Hollywood; they are no longer needed. If people can generate full movies at home, there is no more Hollywood.
The end game is endless ultra personalized content beamed into people's heads every free waking hour of the day. Hollywood is irrelevant in that future.
That's why I think Hollywood is rushing to adopt gen-AI, so they can churn out personalized content faster and cheaper straight to streaming, at the same rate as indie producers.
LLMs have been in the oven for years longer than this, and I'm not seeing any signs of people generating their own novels at home. Well, besides the get-rich-quick grifters spamming the Kindle store with incoherent slop in the hopes they can trick someone into parting with a dollar before they realize they've been had.
Most humans are also not good at writing great scripts/novels either. Just look at the movies that bring in billions of dollars at the box office. Do you think you need a famous novelist to write you a Fast & Furious 11 script?
Sure, there are still great writers that can make scripts that tickle the mind, but that's not what the studios want anymore. They want to push VFX heavy rehashed slop that's cheap to make, easy to digest for the doom-scrolling masses of consumers, and rakes in a lot of money.
You're talking about what makes gourmet Michelin star food but the industry is making money selling McDonald's.
The good "creators" are already making bank, helped by app algorithms matching people up to content they'll find addictive to view.
The content doesn't have to be good it just has to be addictive for 80% of the population.
Whatever gets the views.
Then the first fully non-human (but human-like) actors will be created and gain popularity. The IP of those characters will be more valuable than the humans they replaced. They will be derided by old people as "Mickey Mouse" AI actors. The SAG will be beside themselves. Younger people will not care. The characters will never get old (or they will be perfectly rendered when they need to be old).
The off-screen dramas and controversies are part of the entertainment, and these will be manufactured too. (If there will even be an off-screen...)
This is the future, and we've been preparing for it for years by presenting the most fake versions of ourselves on social media. Viewers have zero expectation of authenticity, so biological status is just one more detail.
It will be perfect, and it will be awful. Kids born five years from now will never know anything different.
Very few actors have an appearance or a voice worth a lot in licenses. That's like the top 1% of actors, if that.
I think if done right, humans could also end up getting emotionally attached to 100% AI generated characters, not just famous celebrities.
So the appearance licenses for these 1% are valuable in Stage 1 of the takeover.
The rest are just forgotten collateral damage. Hollywood is full of 'em.
Origami for me was more audio than video. Felt like it's exactly how it would sound.
Now it just takes giant compute clusters and inference time.
With a media- and entertainment-hungry world, which is about to get worse with the unemployed/underemployed TikTok generation needing "content", something like this has to have a play.
Nowadays when I randomly open a news website to read some article, among all the generic ads at the bottom of the page - "hack to lose your belly", "doctors recommend weird japanese device", "how seniors can fly business class" - I've been noticing lately that about a third of the images seem to be AI generated...
I simply don't think it's fair to cheat service providers when we don't like their service. You have a choice, and that choice is to not use that service at all. They're providing it under the terms that it is ad-supported. If you don't want to support it, but you still want to use it, then you're cheating someone. That is dishonest and unethical.
Advertisement-Permission: [required|requested]
If my adblockers had a config option to abort pageloads with an appropriate error message whenever a site declared `required` or `requested`, then I would use it happily. In the meantime, I'm browsing every site with all content blockers set at maximum, because any other choice is incomprehensible on the modern web.
If I consequently visit some sites that want me to consume advertising of which I am unaware, then that is entirely their issue, not mine.
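To be clear, no such header exists today; both the `Advertisement-Permission` header and the abort-on-match behavior are hypothetical. But if it did exist, the blocker-side logic would be trivial, something like:

```python
# Hypothetical sketch: neither the Advertisement-Permission header nor this
# blocking behavior exists; it only illustrates the proposed contract.
import requests

ABORT_ON = {"required", "requested"}  # user-configurable

def fetch_respecting_ad_policy(url: str) -> bytes | None:
    resp = requests.get(url, timeout=10)
    policy = resp.headers.get("Advertisement-Permission", "").strip().lower()
    if policy in ABORT_ON:
        print(f"Aborting {url}: site declares ads are '{policy}' and I block ads.")
        return None
    return resp.content
```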
A lot of content is like this - you just need an approximation to sell an idea, not a perfect reproduction. Makes way more sense to have AI generate you a quick image for a sight gag than to have someone spend all day trying to comp it by hand. And as AI imagery gets more exposure in these sort of scenarios, more people will be accustomed to it, and they'll be more forgiving of its faults.
The bar for "good enough" is gonna get a lot lower as the cost of producing it comes way down with AI.
Drive the storytelling, consult with AI on improving things and exploring variations.
Generate visuals, then adjust / edit / postprocess them to your liking. Feed the machine your drawings and specific graphic ideas, not just vague words.
Use generated voices where they work well, record real humans where you need specific performance. Blend these approaches by altering the voice in a recording.
All these tools just allow you to produce things faster, or produce things at all that would be too costly to shoot in real life.
In 2 years we have moved from AI video being mostly a pipe dream to some incredible clips! It's not about what this is like now, but what it will be like in 10 years!
Extrapolating that technology will get better in the future when it has got better in the past isn’t a sure bet, but it’s a reasonably reliable one.
Of course it's an indicator of future performance - not a guarantee, but certainly an indicator.
Now it's "good enough" for a lot of cases (and the pace of improvement is astounding).
AI is still not great at image gen and video gen, but the pace of improvement is impressive.
I'm skeptical image, video, and sound gen are "too difficult" for AI to get "good enough" at for many use cases within the next 5 years.
Also "create static + video ads that are 0-99% complete" suggests the performance is hit or miss.
This is probably one of the better known benchmarks but when I see Midjourney 7 and Imagen3 within spitting distance of each other it makes me question what kind of metrics they are using.
My guess as to determining whether it's 64 attempts to a pass for one and 5 attempts to a fail for another is simply "whether or not the author felt there was a chance random variance would result in a pass with a few more tries based on the initial 5ish". I.e. a bit subjective, as is the overall grading in the end anyways.
If there are only a few attempts and it ends in a failure, there's a pretty good chance I could sort of tell that the model had ZERO chance.
It's a very interesting resource to map some of the limits of existing models.
I want to interrupt all of this hype over Imagen 4 to talk about the totally slept on Tencent Hunyuan Image 2.0 that stealthily launched last Friday. It's absolutely remarkable and features:
- millisecond generation times
- real time image-to-image drawing capabilities
- visual instructivity (eg. you can circle regions, draw arrows, and write prompts addressing them.)
- incredible prompt adherence and quality
Nothing else on the market has these properties in quite this combination, so it's rather unique.
Release Tweet: https://x.com/TencentHunyuan/status/1923263203825549457
Tencent Hunyuan had a bunch of model releases all wrapped up in a product that they call "Hunyuan Game", but the Hunyuan Image 2.0 real time drawing canvas is the real star of it all. It's basically a faster, higher quality Krea: https://x.com/TencentHunyuan/status/1924713242150273424
More real time canvas samples: https://youtu.be/tVgT42iI31c?si=WEuvie-fIDaGk2J6&t=141 (I haven't found any other videos on the internet apart from these two.)
You can see how this is an incredible illustration tool. If they were to open source this, this would immediately become the top image generation model over Flux, Imagen 4, etc. At this point, really only gpt-image-1 stands apart as having godlike instructivity, but it's on the other end of the [real time <--> instructive] spectrum.
A total creative image tool kit might just be gpt-image-1 and Hunyuan Image 2.0. The other models are degenerate cases.
More image samples: https://x.com/Gdgtify/status/1923374102653317545
If anyone from Tencent or the Hunyuan team is reading this: PLEASE, PLEASE, PLEASE OPEN SOURCE THIS. (PLEASE!!)
If Tencent wants to keep Google from winning the game, they should open source their models. From my perspective right now, it looks like Google is going to win this entire game, and open source AI might be the only way to stop that from being a runaway victory.
In this AI rat race, whenever one model gets ahead, they all tend to reach parity within 3-6 months. If you can wait 6 months to create your video I'm sure Imagen 5 will be more than good enough.
It's honestly kind of ridiculous the pace things are moving at these days. 10 years ago waiting a year for something was very normal, nowadays people are judging the model-of-the-week against last week's model-of-the-week but last week's org will probably not sleep and they'll release another one next week.
I don’t know which is more important, but I would say that people mostly won’t pay for fun but disposable images, and I think people will pay for art but there will be an increased emphasis on the human artist. However users might pay for reliable tools that can generate images for a purpose, things like educational illustrations, and those need to be able to follow the spec very well.
I'd love to see some financials but I'd tend to agree they're probably doing pretty well.
- wine glass that is full to the edge with wine (ie. not half full)
- wrist watch not showing V (hands at 10 and 2 o'clock)
- 9 step IKEA shelf assembly instruction diagram
- any kind of gymnastics / sport acro
I mean, it's a fun edge case, but I'm practice - does it matter?
*in practice, not I'm practice. (I swear I have a point, I'm not being needlessly pedantic.) In English, in images, mistakes stick out. Thus negative prompts are used a lot for iterative image generation. Even when you're working with a human graphics designer, you may not know what exactly you want, but you know that you don't want (some aspect of) the image in front of you.
Ie: "Not that", for varying values of "that".
Are they still? Negative keywords were popular in the SD era, and the negative prompt was popular with later models in advanced tools. But modern iterations look different - the models capable of editing are perfectly fine with processing the previous image with a prompt like "remove the elephant" or "make the clock show a different time". Are the negative parts of the initial prompt still actually used in iteration?
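For reference, here are the two iteration styles side by side as a rough diffusers sketch. The checkpoints named are just common public examples, not what the commercial tools actually run:

```python
# Sketch of the two iteration styles discussed above (Hugging Face diffusers;
# the checkpoints are common public examples, not the commercial models).
import torch
from diffusers import StableDiffusionPipeline, StableDiffusionInstructPix2PixPipeline

# Style 1: classic negative prompt at generation time (SD-era workflow).
txt2img = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
image = txt2img(
    prompt="living room interior, warm light",
    negative_prompt="elephant, clutter, text, watermark",
).images[0]

# Style 2: an editing-capable model takes the previous image plus a plain instruction.
editor = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")
edited = editor(prompt="remove the elephant", image=image).images[0]
```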
Hmm.
Sorry to dunk so hard, but your example of technology stagnating is actually an example of breakthrough technological innovation deep into a product’s lifecycle: the very thing you were trying to say doesn’t happen.
If you want to doubt that it was in fact a turning point, you'd need to provide very strong arguments.
In my view the reason the iPhone felt so new was almost entirely the incredibly responsive capacitive touch screen with a finger UI; everything I'd used before it was resistive and preferred a pen for detail. A pen actually is better for detail, so in some ways it was that, more than anything else, that turned the device from a creation device into a consumption device, which was a whole new way of thinking about smart personal devices.
Of course it was also sold in a decent package too where Apple did deals that ensured it was available with good mobile internet plans which were also unusual at the time.
Half a year ago, that was sort of possible for some genius really bent on making it happen. A year ago, that was unthinkable. Today, it's a matter of drag&dropping a workflow to a fresh ComfyUI install and downloading a couple dozen GB of img2vid models.
The returns on R&D are not diminishing, the progress is just not happening everywhere evenly and at the same time.
To clarify this test is purely a PASS/FAIL - unsuccessful means that the model NEVER managed to generate an image adhering to the prompt. So as an example, Midjourney 7 did not manage to generate the correct vertical stack of translucent cubes ordered by color in 64 gen attempts.
It's a little beyond the scope of my site but I do like the idea of maintaining a more granular metric for the models that were successful to see how often they were successful.
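Something like this is what I have in mind for the granular version - log every attempt instead of a single PASS/FAIL. (Sketch only; the Midjourney 7 case is the real one from above, the other numbers are made up for illustration.)

```python
# Sketch of a more granular per-model metric: record every attempt and
# report pass rate plus attempts-to-first-pass instead of a single PASS/FAIL.
from collections import defaultdict

# attempts[model][prompt] is a list of booleans, one per generation attempt.
attempts: dict[str, dict[str, list[bool]]] = defaultdict(dict)
attempts["midjourney-7"]["stacked translucent cubes"] = [False] * 64       # never passed
attempts["other-model"]["stacked translucent cubes"] = [False, False, True]  # illustrative

def summarize(model: str) -> None:
    for prompt, tries in attempts[model].items():
        rate = sum(tries) / len(tries)
        first = tries.index(True) + 1 if True in tries else None
        print(f"{model} | {prompt}: pass rate {rate:.0%}, first pass at attempt {first}")

summarize("midjourney-7")
summarize("other-model")
```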
Cool site btw! Thanks for sharing.
Actually, search engines do this too: Google something with many possible meanings -- like "egg" -- and you'll get a set of intentionally diversified results. I get Wikipedia; then a restaurant; then YouTube cooking videos; Big Green Egg's homepage; news stories about egg shortages. Each individual link is very unlike the others, to maximize the chance that one of them is the one you want.
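(The textbook way to get that kind of diversification is maximal marginal relevance. A toy sketch below; this is just the classic MMR formulation, not a claim about how Google actually ranks.)

```python
# Illustrative sketch of result diversification via maximal marginal relevance (MMR).
import numpy as np

def mmr(query_vec, doc_vecs, k=5, lam=0.5):
    """Pick k docs balancing relevance to the query against similarity to prior picks."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
    candidates = list(range(len(doc_vecs)))
    selected = []
    while candidates and len(selected) < k:
        def score(i):
            relevance = cos(query_vec, doc_vecs[i])
            redundancy = max((cos(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```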
Not sure if this affects your results or not, but I couldn't resist chiming in!
I wonder how much the commonality or frequency of names for things affects image generation? My hunch is that it roughly correlates, and you'd get better results for terms with more hits in the training data. I'd probably use Google image search as a rough proxy for this.
On a more societal level, I'm not sure continuously diminishing costs for producing AI slop is a net benefit to humanity.
I think this whole thing parallels some of the social media pros and cons. We gained the chance to reconnect with long lost friends—from whom we probably drifted apart for real reasons, consciously or not—at the cost of letting the general level of discourse tank to its current state thanks to engagement-maximizing algorithms.
This naming seems very confusing, as I originally thought there must be some connection. But I don't think there is.
But then again, the "don't be evil" motto is long gone, so I guess anything goes now?
It's something that is only obvious when it is obvious. And the more obvious examples you see, the more non-obvious examples slip by.
If you look at the shadows in the background, you can see how they appear and disappear, how things float in the air, and have all the AI artifacts. The video is also slowed down (lower FPS) to overcome the length limit of AI video generator.
But the point is not how we can spot these, because it's going to be impossible, but what the future of news consumption is going to look like.
[1] https://www.tiktok.com/@calm.with.word/video/750583708327412...
I don't believe it's entirely fake, just enhanced.
The demo videos for Sora look amazing but using it is substantially more frustrating and hit and miss.
Not in 10 years but now.
People who just see this as terrible are wrong. AI's improvement curve is exponential.
People's adaptability is at best linear.
This makes me really sad. For creativity. For people.
Of course this is not because of AI. It's because of the ridiculous system of social organization where increased automation and efficiency makes people worse off.
These larger companies are clearly going after the agency/hollywood use cases. It'll be fascinating to see when they become the default rather than a niche option - that time seems to be drawing closer faster than anticipated. The results here are great, but they're still one or two generations off.
Plus in local generation you're not limited by the platform moderation that can be too strict and arbitrary and fail with the false positives.
Yes, ComfyUI can be intimidating at first vs an easy-to-use ChatGPT-like UI, but the lack of control makes me feel these tools will still not be used in professional productions in the short term, but more in small YouTube channels and smaller productions.
Foundation models are starting to outstrip any consumer hardware we have.
If Nvidia wants to stay ahead of Google's data center TPUs for running all of these advanced workloads, they should make edge GPU compute a priority.
There's a future where everything is a thin client to Google's data centers. Nvidia should do everything in its power to prevent that from happening.
Last time I checked, they couldn't produce enough H100s/GB100s to satisfy demand from everyone and their mother running a data center. And their most recent consumer hardware offerings have been repeatedly called a "paper launch" - probably because consumer hardware isn't a priority, given the price (and profit) delta.
Nobody is running H100s at home, nor are most video companies running them. So the choice for them is to "rent" them from Google, or... invest a lot in almost-impossible-to-obtain Nvidia hardware? One has lower initial cost, and is available now.
But as long as Google isn't their _only_ customer, why would Nvidia care?
There always has been; the mainframe concept is not new, but it goes in and out of fashion.
>>>> mainframe
<<<< personalpc
>>>> web pages/social media
<<<< personal phones/edge
>>>> cloud ai
<<<< ???? personal robotics, chips and ai ???
>>>> ???? rented swarms ???
It's for advertising.
The Tencent Hunyuan team is cooking.
Hunyuan Image 2.0 [1] was announced on Friday and it's pretty amazing. It's extremely high quality text-to-image and image-to-image with millisecond latency [2]. It's so fast that they've built a real time 2D drawing canvas application with it that pretty much duplicates Krea's entire product offering.
Unfortunately it looks like the team is keeping it closed source unlike their previous releases.
Hunyuan 3D 2.0 was good, but they haven't released the stunning and remarkable Hunyuan 3D 2.5 [3].
Hunyuan Video hasn't seen any improvements over Wan, but Wan also recently had VACE [4], which is a multimodal control layer and editing layer. The Comfy folks are having a field day with VACE and Wan.
[1] https://wtai.cc/item/hunyuan-image-2-0
[2] https://www.youtube.com/watch?v=1jIfZKMOKME&t=1351s
[3] https://www.reddit.com/r/StableDiffusion/comments/1k8kj66/hu...
Generating a long video one shot at a time kind of makes sense, as long as there's good consistency between shots
But what it means is that, with time, open source will be as good as what commercial offerings have now. Hardware will get cheaper, and research is open or delayed-open.
It makes me sad, though. I wish we were pushing AI more to automate non-creative work and not burying the creatives among us in a pile of AI generated content.
Just wanted to add representation to that feeling
Creativity is a conversation with yourself and God. Stripping away the struggle that comes with creativity defeats the entire purpose. Making it easier to make content is good for capital, but no one will ever get fulfillment out of prompting an AI and settling with the result.
It'll lower the barrier of entry (and therefore the quality floor before people feel comfortable sharing something "they made" if they can deflect with an easy "the AI made this" versus "I put XY0 hours into this"), but it'll also empower people who wouldn't otherwise even try to create new things and, presumably, follow their passion to learn and do more.
Here's something you can try to prove it to yourself. Sit down and write a novel. It'll be like squeezing blood out of a rock unless your heart is ready to do it freely. You'll see that if you force yourself through hard work to do it, you'll just end up with something that people will laud as creative due to the execution but it'll lack everything about free-flowing creativity. Good programmers are lazy, so are good creatives, but now I'm just repeating myself.
It's a lot easier squeezing blood out of a heart, especially for the lazy.
It’s not important to me that they do.
> Im sure few will continue to do so but creativity will certainly take a big hit.
I've seen the workflows for AI generated films like https://youtu.be/x6aERZWaarM?si=J2VHYAHLL3og32Ix and I find it to be very creative. It's more interesting to me that this person would never have raised capital and tried to direct this, but this is much closer to what they wanted to create. I'm also entertained by it, whether I was judging it for generative AI issues or not.
It's not really anyone's problem, and it's generally limited to the people who based way too much of their identity on a single field, such that they feel they have to gatekeep it.
It's great that people can express themselves closer to their vision now.
Nothing prevents humans from continuing to do just that, precisely because it brings joy and satisfaction. Painting and photography classes are still popular, if not more so, in the age of digital photography.
Isn't the creativity in what you put in the prompt? Isn't spending hundreds of hours manually creating and rigging models based on existing sketch the non-creative work that is being automated here?
Of course that's not what I believe, but let's not limit the definition of creativity based on historical limitations. Let's see how the new generation of artists and creators will use this new capability to mesmerize us!
Their placement of books. Their aesthetic. The collection of cool things to put into a scene to make it interesting. The lighting. Not yours. Not from you/not from the AI. None of it is yours/you/new/from the AI. It's ALL based underneath on someone else's work, someone else's life, someone else's heart and soul, and you are just taking it and saying 'look what I made'. The equivalent of a 4 year old being potty trained saying 'look I made a poop'. We celebrate it as a first step, not as the friggen end goal. The end goal is you making something uniquely you, based on your life experience, not on Bob the prop guy and Betty the set designer, whose work/style you stole and didn't even have the decency to reference/thank.
And your prompt won't ever change dramatically, because there isn't going to be much new truly creative seedcorn for AI to digest. Entertainment will literally go into limbo/Groundhog Day, just the same generative, derivative things/aesthetics from the same AI dataset.
All of these are just a human being exposed more to life and learning new skills - in other words, having more data. An LLM already learns those skills and encounters the endless experiences of people in its training data.
> I hate this argument
That's very subjective. You don't know how the brain works.
> That's very subjective
I was expressing my opinion of this argument which absolutely is subjective
> You don't know how the brain works.
Neither does grandparent comment's author, didn't stop them from making much bolder claims.
If I see a painting, I see an interpretation that makes me think through someone else's interpretation.
If I see a photograph, I don't analyze as much, but I see a time and place. What is the photographer trying to get me to see?
If I see AI, I see a machine dithered averaging that is/means/represents/construes nothing but a computer predicted average. I might as well generate a UUID, I would get more novelty. No backstory, because items in the scene just happened to be averaged in. No style, just a machine dithered blend. It represents nothing no matter the prompt you use because the majority is still just machine averaged/dithered non-meaning. Not placed with intention, focused with real vision, no obvious exclusions with intention. Just exactly what software thinks is the most average for the scene it had described to it. The better AI gets, the more average it becomes, and the less people will care about 'perfectly average' images.
It won't even work for ads for long. Ads will become wild/novel/distinct/wacky/violations of AI rules/processes/techniques to escape and belittle AI. To mock AI. Technically perfect images will soon be considered worthless AI trash. If for no other reason than artists will only be rewarded for moving in directions AI can't going forward. The second Google/OpenAI reach their goal, the goal posts will move because no one wants procedural/perfectly average slop.
Merely changing a seed number will provide endless different outputs from the same single prompt from the same model; rng.nextInt() deserves as much artist credit as the prompter.
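Concretely, the seed point looks like this (a diffusers sketch; the checkpoint is just a common public example standing in for whatever model you like):

```python
# Same prompt, same model, different seed -> a different image every time.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a lighthouse at dusk, oil painting"
for seed in (1, 2, 3):
    generator = torch.Generator("cuda").manual_seed(seed)
    pipe(prompt, generator=generator).images[0].save(f"lighthouse_{seed}.png")
```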
Imagine a world where you have a scene fully sketched out in your head (i.e. creativity), you have the script of what will happen, sketches of what the scene looks like, visual style, etc. You want to make that become reality. You could spend a ton of time and money, or you could describe it and provide sketches to an AI to make it come true.
Yes, the limitations in the former can make you take creative shortcuts that could themselves be interesting, but the latter could be just as true to your original vision.
Personally I can't wait to see the new creative doors ai will open for us!
Rap is really the best example of how stupid these discussions are.
The language models are amazing at rhyming so by the logic then anyone can become a rapper and that is going to put current rappers out of business.
Only someone who has never tried to rap could possibly believe this. Same thing with images, same thing with music, same thing with literally everything involving human creativity.
"Creating" with an AI is like an executive "inventing" the work actually done by their team of researchers. A team owner "winning" a game played by the their team.
That being said, AI output is very useful for brainstorming and exploring a creative space. The problem is when the brainstorming material is used for production.
This would include almost everyone who’s used any editing software more advanced than photoshop CS4.
You could come up with your own story and direct the AI to generate it for you.
A film director is a creative. Ultimately, they are in charge of "visualizing" a screenplay: the setting, the design of the set or the utilization of real locations, the staging of the actors within a scene, the "direction" of the actors (i.e., how they should act out dialog or a scene), the lighting, the cinematography, the use of stunts, staging shots to accommodate the use of VFX, the editing (meaning, the actual footage that comprises the movie).
There's an old show on HBO, Project Greenlight, that demonstrates what a director does. They give 2 directors the same screenplay and budget and they make competing movies. The competing movies are always completely different...even though the scripts are the same. (In the most extreme example from one of the later seasons, one of the movies was a teen grossout comedy, and the competing movie was some sort of adult melodrama.)
2. Using AI can be an iterative process. Generate this scene, make this look like that, make it brighter colors, remove this, add this, etc. That's all carefully crafting the output. Now generate this second scene, make the transition this way, etc. I don't see how that's at all different from a director giving their commands to workers, except now you actually have more creative control (given AI gets good enough).
2. Current AI can't do what you're describing, so the biggest difference is that you're posing a hypothetical against the real world. But more specifically: the director already has a specific vision in their head; the purpose of the "direction" is to bring this vision into reality within the scope of their budget and resources. With AI, you have a general idea and the AI creates its own vision and you pick what you like the best, until you ultimately realize the AI isn't going to get what you actually want and you settle for the best the AI can do for you. So, completely different.
That's what direction is though. Film directors prompt their actors and choose the results they like best (among many other commands to many other groups)
>You're selecting from the results it outputs, but you're not controlling the output.
The prompt controls the output (and I bet you'd have more control over the AI than you'd have over a drunk Marlon Brando)
Both things which were dismissed as not art at first but are widely accepted as an art medium nowadays.
There's a line to be drawn somewhere between artist and craftsperson. Creating beautiful things to a brief has always been a teachable skill, and now we're teaching it to machines. And, we've long sought to mass-produce beautiful things anyway. Think textiles, pottery, printmaking, architectural adornments.
Can AI replace an artist? Or is it just a new tool that can be used, as photography was, for either efficiency _or_ novel artistic expression?
The work a camera does is capturing the image in front of the photographer. "Art" in the context of photography is the choice of what in the image should be in focus, the angle of the shot, the lighting. The camera just captures that; it doesn't create anything that isn't already there. So, not even remotely the same thing as AI Gen.
The work of Krita/Inkscape/etc (and technically even Photoshop) is to convert the artistic strokes into a digital version of how those strokes would appear if painted on a real medium using a real tool. It doesn't create anything that the artist isn't deliberately creating. So, not even remotely the same thing as AI Gen.
AI Gen, as demonstrated in the linked page and in the tool comparison, is doing all of the work of generating the image. The only work a human does is to select which of the generated images they like the best, which is not a creative act.
AI cannot “democratize art” any more than the camera did, until the day it starts teaching artistry to its users.
It almost definitely can start teaching artistry to its users, and the same people who are mad in this thread will be mad that it's taking away jobs from art instructors.
The central problem is the same and it's what Marshall Brain predicted: If AI ushers in a world without scarcity of labor of all kinds, we're going to have to find a fundamentally new paradigm to allocate resources in a reasonably fair way because otherwise the world will just be like 6 billionaire tech executives, Donald Trump, and 8 billion impoverished unemployed paupers.
And no, "just stop doing AI" isn't an option, any more than "stop having nuclear weapons exist" was. Either we solve the problems, or a less scrupulous actor will be the only ones with the powerful AI, and they'll deploy it against us.
I've tried AI image generation myself and was not impressed. It doesn't let me create freely, it limits me and constantly gravitates towards typical patterns seen in training data. As it completely takes over the actual creation process there is no direct control over the small decisions, which wastes time.
Edit: another comment about a different meaning of accessibility: the flood of AI content makes real content less accessible.
Other disapproval comes from different emotional places: a retreading of ludditism borne out of job insecurity, criticism of a dystopia where we've automated away the creative experience of being human but kept the grim work, or perceptions of theft or plagiarism.
Whether AI has worked well for you isn't just irrelevant, but contrarian in the face of clear and present value to a lot of people. You can be disgusted with it but you can't claim it isn't there.
I have seen what 99% of people are doing with this "clear and present value". Turns out when you give people a button to print dopamine they probably aren't going to create the next Mona Lisa, they're just going to press the button more. Even with AI, creating compelling art is still a skill that needs to be learned, and it's still hard. And why would they learn a skill when they just decided against learning a skill? Incentives matter, and here the incentives massively favor quantity over quality.
> Whether AI has worked well for you isn't just irrelevant
My point was that it's creatively restrictive, and that current models tend to actively steer away from creative outputs. But if you want to limit yourself to what corporations training the models and providing the cloud services deem acceptable, go ahead.
Gatekeeping commonly means excluding others from a group, a label, or an identity. That’s what they’re referring to.
https://www.youtube.com/@NeuralViz
It would have cost millions. Now one person can do it with a laptop and a few hundred dollars of credits a month.
AI is 100% making filmmaking more accessible to creative people who otherwise would never have access to the kind of funding and networks required to realise their visions.
None of the videos I've clicked on required AI for the content to be good, and some of the randomness has no real reason to be there.
Also, they're painfully unoriginal. They're just grabbing bits that The Onion or shows like Rick & Morty have been doing and putting a revolting AI twist to it. It screams to me of 0 effort slop made for the sole purpose of generating money from morons with no creativity clicking on it and being bemused for 10 seconds
Because many other kinds of art require thousands of hours to learn before getting to the level of current AI
The real gate keeper to art isn't the cost of a pencil, it's the opportunity cost of learning how to use it
Some people have creative ideas they cannot realise and tools like AI help them do it. The more people that can realise their creative ideas the better it is for everyone.
There is a fundamental misunderstanding of creating going on here. There's a reason why people always talk about the work/journey/process rather than the end goal. That's what makes someone an artist--not the end result.
You are consuming, not creating.
Going outside and taking a photo requires you to be engaging with the scene around you. You are the actor doing the thing. When you prompt an AI, the AI is the actor doing the thing.
Not to mention that many directors also write the script for the film.
N.B. If directing people is the distinguishing factor, then animation directors are robbed of their claim to artistry as well, as is basically any solo artist in history
Accessibility -- and I don't mean this in the sense particular to disability -- is highly personal; it's not so much that it is more accessible, as that it is differently accessible.
> I've tried AI image generation myself and was not impressed. It doesn't let me create freely, it limits me and constantly gravitates towards typical patterns seen in training data. As it completely takes over the actual creation process there is no direct control over the small decisions, which wastes time.
No offense, but you've only tried the most basic form of AI image generation -- probably something like pure text-to-image -- if that's what you are finding. Sure, that's definitely what the median person doing AI image gen does, dumping a prompt in ChatGPT or Midjourney or whatever. But it's not all that AI image generation offers. You can have as much or as little control over the small (and large) decisions as you want when using AI image generation.
Sometimes I feel like HN comments are working so hard to downplay AI that they’ve lost the plot.
It’s more accessible because you can accomplish amazing storytelling with prompts and a nominal amount of credits instead of spending years learning and honing the art.
Is it at the same level as a professional? No, but it’s more than good enough for someone who wants to tell a story.
Claiming that computer access is too high of a bar is really weird given that computers and phones (which can also operate these sites) are ubiquitous.
> Edit: another comment about a different meaning of accessibility: the flood of AI content makes real content less accessible.
No it does not. Not any more than another person signing up for YouTube makes any one channel less “accessible”. Everyone can still access the content the same.
The gates are wide open for those that want to put in effort to learn. What AI is doing to creative professionals is putting them out of a job by people who are cheap and lazy.
Art is not inaccessible. It's never been cheaper and easier to make art than today even without AI.
> Personally I can't wait to see the new creative doors ai will open for us!
It's opening zero doors but closing many
---
What really irks me about this is that I have _seen_ AI used to take away work from people. Last weekend I saw a show where the promotional material was AI generated. It's not like tickets were cheaper or the performers were paid more or anything was improved. The producers pocketed a couple hundred bucks by using AI instead of paying a graphic designer. Extrapolate that across the market for arts and wonder what it's going to do to creativity.
It's honestly disgusting to me that engineers who don't understand art are building tools at the whims of the financiers behind art who just want to make a bit more money. This is not a rising tide that lifts all ships.
Why is effort a requirement?
Why should being an artist be a viable job?
Would you be against technology that makes medical doctors obsolete?
That's how human brains work. People have an intrinsic need to sort, build hierarchies and prioritize. Effort spent is one of the viable heuristics for these processes.
> Why should being an artist be a viable job?
Art itself has great value; if it didn't, museums, theaters and live shows wouldn't exist.
> Would you be against technology that makes medical doctors obsolete?
The analogy doesn't work. The result of a medical process is a [more] healthy person. The result doesn't have any link to the one performing it. The result of an artistic creative process is an art piece, and art is tied to its creator by definition.
This is an expansive definition and thus not useful, because it would include:
1. Natural phenomena (sand on a vibrating plate is pretty cool).
2. Folk crafts (this hand-woven rug sure ties the room together!).
3. Advertisement.
4. Industrial design (this soap dispenser looks like a droid head, awesome!).
5. Drug induced experiences.
6. Art forgery and plagiarism.
Nothing in the list is really art. Rough definition of art is an intentional process (or the results thereof) of self-expression, and/or interpretation/modeling of reality performed with symbolic means. This implies intentionality and a conscience, which current "AI" doesn't have.
> AI lowers the discoverability of art that falls in the latter category, but I'd say that it's a solvable problem that can be fixed with better recommendation algorithms.
Theoretically it is. However, it won't ever be solved and implemented widely due to the lack of incentives and the fact that just replacing it all with "AI" is much more profitable and exploitable.
I don't think it's worthwhile to explain the inherent value of human created art or that to learn how to do it one must put some effort into it. All I can say is, if you are one of those people who do not understand art, please don't build things that take away someone else's livelihood without very good reason.
I don't think the majority of AI generation for art is useful for anything but killing artists.
Because we as a society have valued it as one for eternity.
Zero-effort output generators like prompting mean people are just generating trash that they themselves don't even care about. So why should I take my time to watch/experience that?
The whole "GenAI is accessible" sentiment is ridiculous in my opinion. Absolutely nothing is stopping people from learning various art mediums, and those are skills they'll always have unlike image generators which can change subscription plans or outright the underlying model.
Absolutely no one should be lauding being chained to a big corp's tool/model to produce output.
---
Why should being an artist be a viable job? Well, people should get paid for their work. That applies to all domains, but technical people love to look down on art while still wanting to watch movies, well-produced YouTube videos, etc. You can see it in action here on HN frequently: someone will link a blog post they took time to write and edit... and then generate an image instead of paying an artist. They want whatever effect a big header image provides, but are not willing to pay a human to do it or do it themselves. Somehow because it's "just art" it's okay to steal.
---
If tech has progressed to the point of true "general artificial intelligence", then likely all jobs will be obsolete and we're going to have to rethink this whole capitalism thing.
I think all industries should be utilizing tech to augment their capabilities, but I don't think actual people should be replaced with our current guesstimator/bullshitter "AI". Especially not critical roles like doctors and nurses.
On jobs: craftsmanship is slightly different from art. Industries are built with people who can craft; there is today an artistic part in it, but it's not the essence of the job: the ads industry can work with lower quality ads provided they can spam 10x. There is however an overlap between art and craftsmanship: a lot of people working in these industries can today be in a balance where they live on a salary and dedicate time to exploring their mediums. We know what will happen when the craftsmanship part is replaced by AI: being an artist will require having that balance in the first place.
It feels like a regression: it leads to a reduction of ideas/explorations, a saturation of the affected mediums, a loss of intent. Eager to see what new things come out of it though.
And who owns the AI?
It’s delusional. Stop falling for the mental jiu Jitsu from the large AI labs. You are not becoming an artist by using a machine to make art for you. The machine is the artist. And you don’t own it.
So the bad news is people are just insecure, jealous, pedantic, easy to offend, highly autistic - and these are the smart ones.
The good news is that with dead internet theory they will all be replaced with bots that will at least be more compelling and make some sort of sense.
"Oh you're posting on hackernews, if you don't suck google's dick and every single gadgets megacorps shit out you must be highly autistic".... interesting take
Sorry, being autistic I didn't even phrase it well; I think people got too offended. I was just saying the bad side of it, HN is great.
Couple of people were so upset at the suggestion they replied in a defensive manner I should have been more careful with my rant
What a weird way to spell "give $200 a month to google"
Similarly with music, prior to recording tech, live performance was where it was at.
You could look at the digital era as a weird blip in art history.
Have a look at the workflow and agent design patterns in this video by youtuber Nate Herk when he talks about planning the architecture:
https://m.youtube.com/watch?v=Nj9yzBp14EM
There’s less talk about automating non-creative work because it’s not flashy. But I can promise it’s a ton of fun, and you can co-design these automations with an LLM.
Making a movie is not accessible to most people, and it's EVERYONE'S dream. This is not even there yet, but I have a few movies I need to make and I will never get a cast together and go do it before I die. If some creatives need to take a backseat so a million more creatives can get a chance, then so be it.
AI creatives can enjoy the brief blip in time where they might get someone else to watch what they've created before their skills become obsolete in an exponentially faster rate just like everyone else's.
It's just not what gets the exciting headlines and showcases
We _could_ use this to empower humans, but many of us instinctively know that it will instead be used to crush the human spirit. The end result of this isn’t going to be an expansion of creative ability, it’s going to be the destruction of creative jobs and the capture of these creative mediums by a few large companies.
I agree, but that's the negative. The positive will be that the cost of almost any service you can imagine (medical diagnosis, tax preparation, higher education) will come down to zero, and with a lag of perhaps a decade or two it will meet us in the physical world with robo-technicians, surgeons and plumbers. The cost of building a new house or railway will plummet to the cost of the material and the land, and it will be finished in 1/10 of the time it takes today. The main problem to me is that there's a lag between the negatives and the positives. We're starting out with the negatives, and the benefits may take a decade or two to reach us all equally.
Why would you want massive amounts more of those things? In fact I might even argue that medicine, taxation and education are a net negative on society already. And that to the extent that there seems to be scarcity, it's mainly a distribution problem having to do with entrenched interests and bureaucracy.
> The cost of building a new house or railway will plummet to the cost of the material and the land
That's the actual scarcity tho.
I'm not sure what you mean. In my country getting a specialist to take a look at you can take weeks; the scarcity is that there are not enough doctors. For sure many people get delayed and suboptimal diagnoses (even if you finally get to see the specialist, he may have 10 minutes for you and 50 other patients to see that day). AI can simply solve this.
> The cost of building a new house or railway will plummet to the cost of the material and the land
>> That's the actual scarcity tho
Not necessarily, the labor costs a tremendous amount, and also it might be that we don't need to cram tens of millions of people around cities anymore if most work is automated, we can start spreading out (again, this will take decades and I'm not denying we have pressing problems in the immediate future).
The same was said about the camera or photoshop.
There is a more sensical distinction between work that is informational in nature, and work that is physical and requires heavy tools in hard-to-reach places. That's hard to do for big tech, because making tests with heavy machinery is hard and time consuming
good lord. talk about pedantic.
They all got smoked by Google with what they just announced.
The image model in Sora (gpt-image-1) is phenomenal and best-in-class.
I can't wait to see where the new Imagen and Veo stack up.
Google what is this?
How would anyone use this for a commercial application.
I mean obviously the answer is "no" and this is going to get a bunch of replies saying that inventors are not to blame but the negative results of a technology like this are fairly obvious.
We had a movie two years ago about a blubbering scientist who blatantly ignored that to the detriment of his own mental health.
Thank you, researchers, for making our world worse. Thank you for helping to kill democracy.
I imagine video is a far tougher thing to model, but it's kind of weird how all these models are incapable of not looking like AI generated content. They all are smooth and shiny and robotic; year after year it's the same. If anything, the earlier generators like that horrifying "Will Smith eating spaghetti" generation from back like three years ago look LESS robotic than any of the recent floaty clips that are generated now.
I'm sure it will get better, whatever, but unlike the goal of LLMs for code/writing where the primary concern is how correct the output is, video won't be accepted as easily without it NOT looking like AI.
I am starting to wonder if that's even possible, since these are effectively making composite guesses based on training data and the outputs do ultimately look similar to those "Here is what the average American's face looks like, based on 1000 people's faces super-imposed onto each other" images that used to show up on Reddit all the time. Uncanny, soft, and not particularly interesting.
I don't follow the video generation stuff, so the last time I saw AI video it was the initial Sora release, and I just went back to that press release and I still maintain that this does not seem like the type of leap I would have expected.
We see pretty massive upgrades every release between all the major LLM models for code/reasoning, but I was kind of shocked to see that the video output seems stuck in late 2023/early 2024 which was impressive then but a lot less impressive a year out I guess.
I'm always hesitant with rollouts like this. If I go to one of these, there's no indication which Imagen version I'm getting results from. If I get an output that's underwhelming, how do I know whether it's the new model or if the rollout hasn't reached me yet?
However, looking at the UI/UX in Google Docs, it's less transparent.
https://aistudio.google.com/generate-image
But this still says it's Imagen 3.0-002, not Imagen 4.
It is so confusing. Ok, I got Gemini Pro through Workspace or something, but not everything is there? Sure, I can try AI Studio, Flow, Veo, Gemini etc to figure out what I can do where, but such bad UX. Just tried using Gemini to create an image; definitely not the newest image gen, as the text was just marbled up. But I can't see which version I'm on, genius.
Edit: After clicking through lots of google products I'm still not able to find a single place I can actually try the new imagegen, despite the article claiming it's available today in X,Y,Z
Can’t wait to see what people start making with these
Interesting logic the new era brings: something else creates, and you only "bring your vision to life". But what that means is left for the reader to question; is your "vision" here just your text prompt?
We're at a crossroads where the tools are powerful enough to make the process optional.
That raises uncomfortable questions: if you don't have to create anymore, will people still value the journey? Will vision alone be enough? What's the creative purpose in life? To create, or to bring creative vision to life? Isn't the act of creation being subtly redefined?
If you take any high quality AI content and ask its creator what their workflow is, you'll quickly discover that actually creating something high-quality, something that actually "fulfills your vision", requires incredible complexity and nuance.
Whether you measure quality through social media metrics, reach, or artistic metrics, like novelty or nuance, high quality content and art requires a good amount of skill and effort, regardless of the tool.
Standard reading for context: https://archive.org/details/Bazin_Andre_The_Ontology_of_Phot...
This comes off as so tone deaf seeing your AI artwork is only possible due to the millions of hours spent by real people who created the art used to train these models. Maybe it's easier to understand why people don't respect AI "artists" with this in mind.
I feel a real artist would make their own tools: brushes, paint, canvas, and above all be truly creative by not unfairly using anything that’s gone before. If they did they aren’t creative; they’re a thief.
p.s. def not an artist.
Is craft making your own chisels for woodworking?
Perhaps there are craftsman who buy chisels made by others.
Okay. Then is craft only making furniture with dovetail joints by hand?
Well, I guess people use planers.
So, no it's not just hand made wood working that's craft.
Someone uses a CNC machine with a design they made to cut wood, then hand sands and polishes. Is that craft?
What if you learned it took them three or four times as many hours to learn the CNC machine and design as it did to hand plane a cedar log?
To be clear, I don't identify as an artist at all, but I do have a stake in this conversation -- which is that I'd like more young folks to be positive, pick up tools at their disposal and build good things with them. The future's coming, and it's going to be built out by people with open minds who are soaking up everything they can about whatever tools are available. It's a sort of brain rot to gatekeep technology advances out of creativity.
People have had access to tools for creating for generations. In the modern era you can buy a pencil and a sketch pad for dollars. You can buy an instrument used for as little as a hundred dollars. Hell, schools teach art and music for free.
>The future's coming, and it's going to be built out by people with open minds who are soaking up everything they can about whatever tools are available.
Not all technology presents a net good for society. These technologies only exist on top of the mountain of stolen artwork created by millions of artists, and this tech will continue to hamper the livelihoods of artists as long as these companies keep pushing it.
>It's a sort of brain rot to gatekeep technology advances out of creativity.
JFC. Don't talk to me about brain rot. The "art" and "creativity" you speak about here is just more finely grained consumption. Now instead of scrolling through a feed, you can ask Google to present your dopamine addicted brain exactly what you want to see in that moment.
In contrast, focusing on improving a craft acts as a sort of antidote to "brain rot" because you're engaging in multiple important things at once:
- critical thinking
- delayed gratification
- habit formation
- emotional exploration
- and more
I agree with the idea of “Amistics” (thanks Neal) - a sort of societal and moral lens to view technologies through and evaluate them. Totally with you there too.
I agree that doomscrolling and social media are cancer-y in the extreme, to the extent that for a number of years I printed a daily personal newspaper. Srsly.
> this tech will continue to hamper the livelihoods of artists …
Nope. We’ll just redefine what an artist is. Pop quiz: did Disney employ more “artists” when each cel of a film was hand drawn and colored, or now when these modern “faux-artists, not like the real ones” have access to rendering clusters?
Or a second pop quiz, when da Vinci or Rubens ran workshops where apprentices painted “da Vincis” or “Rubens(s?)” who was the artist?
By the way, it's right to redefine what an artist is. I'm going to get super controversial, ca. 1900, and say that photographers can be artists. Now I'm going to get super controversial ca. 1910 and say that someone mounting a bicycle wheel as a 'readymade' and displaying it can be an artist. Wait, now I'm going to move ahead to the 1980s and say a cow cut in half and suspended in some sort of formaldehyde can be art. Hang on. A poem on a disk that deletes itself as it's read is art.
The art is the creative endeavor itself. It’s the outcome of a creative person engaging with whatever tools they want to create some output. If someone wants to engage with an LLM or diffusion model or whatever and have it make something to those standards, it’s art. Calling them ‘not an artist’ based on their choice of tools is just totally incorrect.
I’m not saying all uses of diffusion models or any other AI assisted imagery is art. But I am saying that ingesting and summarizing publicized images is not theft, and people choosing to use those tools to instantiate a creative vision can absolutely be art, and further that generally the cheaper a form of creative expression becomes the better on balance for the world.
Here's the crux of the issue I have with this entire conversation--because you now are able to generate "artwork," you expect the artistic community to respect you as an artist. You're waltzing into the room with none of the same battle scars, experiences, or morals and demanding that they bestow upon you the title of "creator".
>By the way, it’s right to redefine what an artist is
Sure, but let artists be the ones who take charge in redefining what art is. How is it right to redefine what "art" and "creating" is without the goodwill or consent of the artistic community at large? You are effectively trying to force a hostile takeover of the space, to demand everyone consider your generated image/song/video be treated with the same amount of respect as actual art.
If you can't even be bothered to respect the artistic community enough to understand why they feel slighted over the creation of these tools, or to empathize with them over their impact in livelihood due to the proliferation of AI slop, why the hell do you expect them to consider you an artist?
If you look through civitai and the stable diffusion subreddit you’ll find people who’ve spent thousands of hours tuning these AI tools to produce something that they imagined. In my mind, they’re artists. It might be bad art, some of it is, some of it is arguably not, but they fit the description to me. They certainly think of themselves as struggling to create things they envisioned, and sometimes achieving it.
As to who gets to define art and what art is: please understand that I’m saying —>> you are gatekeeping <<— by calling people who spend thousands of hours creating imagery they want to create “not part of the artistic community”.
So, I have a broader view of the artistic community than you, full stop. It includes people whose livelihood is going to be disrupted by this technology. It includes a bunch of people who couldn’t create imagery they imagined before but can now.
Just as I can understand why Luddites burned shit in Northern England, I can understand and even respect a fight from interest groups to turn back the clock on new technology. And I am interested to see how strong guilds like SAG navigate and negotiate new economies around creating.
End of the day - I think moralizing in order to limit human creativity with bullshit made up rules about what an artist “is” or should be is foolish, wrong-headed, and ultimately doomed as an endeavor, plus it runs the risk of convincing new creatives not to engage. It’s a net loss for human creative output, while advocates get to pearl-clutch about the evils of tech. It’s just the wrong, wrong, wrong attitude to have about it; probably a waste of time trying to convince one well-spoken person on HN to change their views. But, hopefully you will. You could still rail against the tech by the way, or advocate for protectionism or a bunch of stuff, even if you decided to accept a person could use a diffusion model to make something creative.
Software Engineers bring their vision to life through the source code they input to produce software, systems, video games, ...
Exactly. Probably the most important quote of modern times is, I think it was a CEO of an ISP that said it: "we don't want to be the dumb pipes" (during a comparison with a water utility company).
Everyone wants to seek rents for recurring revenue someone else actually generates.
Theatre and opera are regarded as high art because they are performed live in front of an audience every time, demanding presence, skill, and immediacy – unlike cinema, which relies on a recorded and edited performance.
What is true is that cheapening the cost of creative production will yield a wider variety of expression: we will see what people prefer to consume.
Right. IMO you have to be imagination-handicapped to think that creative vision can be distilled into a prompt, let alone that a prompt is the natural medium a creative vision lives in. The exact relation between vision, artifact, process and art itself can be debated philosophically forever, but to think artifacts are the only meaningful substrate at which art exists sounds like a dull and hollowed-out existence, like a Plato's-cave-level confusion between the true meaning and its representation. Or, in a (horrible) analogy for my fellow programmers, confusing pointers to data with the data itself.
Photography increased the abstract and more creative aspects of painting and created new styles, because photography removed much of the need to capture realism. Though I am still entranced by the realist painting style myself; it serves a different purpose than capturing a moment.
I think you overestimate the public's art appreciation. The average answer will be a blank stare.
Commercial portrait painters died out pretty fast.
> You're not the monolith of me!
These other universe memes are too good.
Artists aren't going to be replaced by AI tools being used by me on my iPhone; those artists were already replaced by bulk art from IKEA et al. Artists who reject new tools for being new will be replaced by artists who don't, just like many painters were replaced by photographers.
Except they already are.
https://societyofauthors.org/2024/04/11/soa-survey-reveals-a...
Ok, I went from being pleasantly surprised to breakout laughter at that point.
But I also think this points out a big problem: high-quality stuff is flying under the radar simply because of how much stuff is out there. I've noticed that when faced with a lot of choice, rather than exploring it, people fall back into popular stuff that they're familiar with in a really sad way. Like a lot of door dash orders will be for McDonalds, or people will go back to watching popular series like Friends, or how Disney keeps remaking movies that people still go to see.
The format of the shows is mostly clip-based - man on the street, news hour, etc. - and obviously the jokes are all written by someone with a good sense of humour.
Not to discount that this is, as you say, an example of someone using AI to successfully create characters and stories that resonate with people. It's just still very much because of a creative human's talent and good taste that it's working.
AI tools used for any content will / are being used to add to the pile of shit.
You can also use it as a communication tool such as making a "live" storyboard to prep location, blocking, maybe even as notes for actors.
Being able to express visual ideas with words is one of the most powerful things of this AI craze. Text/code is whatever.
I'd much rather start seeing individuals creating AI movies where you aren't bogged down by the need to hire actors and whatnot.
The guy in the third video looks like a dressed up Ewan McGregor, anyone else see that?
I guess we can welcome even more quality 5 second clips for Shorts and Instagram
https://www.figure.ai/ does not exist yet, at least not for the masses. Why are Meta and Google just building the next coder and not the next robot?
It's because those problems are at the bottom of the economic ladder. But they have the money for it, and it would create so much abundance that it would crash the cost of living and free up human labor to imagine and do things more creatively than whatever Veo 4 can ever do.
In the forecast of the AI-2027 guys, robotics come after they've already created superintelligent AI, largely just because it's easier to create the relevant data for thinking than for moving in physical space.
https://www.youtube.com/watch?v=SPF4MGL7K5I
Obviously we don't know how hand picked that is so it would be interesting to see a comparison from someone with access.
Since Google seems super cagey about what their exact limits actually are, even for paying customers, it's hard to know if that's an error or not. If it's not an error, if it's intentional, I don't understand how that's at all worth $20 a month. I'm literally trying to use your product Google, why won't you let me?
I can't be the only one wondering where the Swedish beach volleyball channel is, though.
A bit depressing.
1. People like to be entertained.
2. NeuralViz demonstrates AI videos (with a lot of human massaging) can be entertaining
To me the fundamental question is- "will AI make videos that are entertaining without human massaging?"
This is similar to the idea of "will AI make apps that are useful without human massaging"
Or "will AI create ideas that are influential without human massaging"
By "no human massaging", I mean completely autonomous. The only prompt being "Create".
I am unaware of any idea, app or video to date that has been influential, useful or entertaining without human massaging.
That doesn't mean it can't happen. It's fundamentally a technical question.
Right now AI is trained on human-collected data. So, technically, it's hard for me to imagine it can diverge significantly from what's already been done.
I'm willing to be proven wrong.
The Christian in me tells me that humans are able to diverge significantly from what's already been done because each of us is imbued with a divine spirit that AI does not have.
But maybe AI could have some other property that allows it to diverge from its training data.
Technology is inevitable and it's a tool, advancing technology will always leave people who specialize and are unable to adapt in a bad position, but this won't stop technology from advancing.
I think one could argue this is one of the reasons many people would like their community/government to provide social safety nets for them. It would make specializing less risky in a time when technology advances at a fast pace.
In reality the Luddites did not oppose technology per se, but the dramatic worsening of working conditions in the factories, reduced wages, and the concentration of income in the hands of capital holders. These are the same problems that should be addressed today.
They initially tried to address these by political means. But with that failing they moved to sabotage and violence.
https://www.smithsonianmag.com/innovation/when-robots-take-j...
The obvious aim of these foundational image/video generation developments is for the models to become the primary source of value, at a cost and quality unmatched by preexisting human experts, while allowing but not requiring further downstream modification by now heavily commoditized and devalued ex-professional editors, so that those roles can be slowly deprecated.
But the opposite seems to be happening: the better data is still human-generated, generators are increasingly human-curated, and they are used ever closer to the tail end of the pipeline instead of the head. That isn't so threatening or interesting to me, but I do wonder if it's a safe, let alone expected, outcome for those pushing these developments.
Aren't you welding a nozzle onto an open can of worms?
There is an ever growing percentage of new AI-generated videos among every set of daily uploads.
How long until more than half of uploads in a day are AI-generated?
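Nobody outside Google has the real numbers, but the "when does it cross half?" question is just compound growth. A toy sketch, with both inputs as entirely made-up placeholders:

    # Toy model: months until AI-generated uploads are the majority of daily uploads.
    # Both starting numbers are made-up placeholders, not measurements.
    ai, human = 2.0, 98.0        # assumed starting split per 100 daily uploads
    monthly_growth = 1.15        # assumed: AI upload volume grows 15% per month

    months = 0
    while ai / (ai + human) < 0.5:
        ai *= monthly_growth
        months += 1
    print(f"~{months} months until more than half of daily uploads are AI-generated")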
Renewable energy is easily able to provide enough energy sustainably. Batteries can be recycled. Solar panels are glass/plastic and silicon.
Nuclear is feasible, and fusion will happen in 50 years one way or the other.
Existence is what it is. If it means being able to watch cat videos, so be it. We are not watching them for nothing; we watch them for happiness.
Well that's just your opinion.
Yes, we can generate electricity, but it would be nice if we used it wisely.
Nonetheless, survival can't be the life goal; after all, the Moon will drift away from Earth in the future, the Sun will explode, and if we survive that as a species, all bonds between elements will eventually dissolve.
It also can't be about passing your DNA on, because your DNA has very little to no impact after just a handful of generations.
And no, the goal of our society has to be to have as much energy available to us as possible. So much energy that energy doesn't matter. There are enough ways of generating energy without any real issue at all: fusion, renewable energy directly from the sun.
There is also no inherent issue right now preventing us all from having clean, stable energy besides capitalism. We have the technology, we have the resources, we have the manufacturing capacity.
To finish my comment: it's not about energy, it's about entropy. You need energy to create entropy. We don't even consume the energy of the sun; we use it to create entropy and then dissipate it back to space.
Perhaps the difference here is the behaviour would be much more human and thus harder to detect using current fraud detection?
Plus some users might want to legitimately upload things with AI-generated content in it
Why not? Given enough data, it's possible to train models to differentiate - especially since humans can pick up on the difference pretty well.
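As a rough illustration of the "train models to differentiate" route, here is a minimal sketch that fine-tunes an off-the-shelf image classifier on frames labelled real vs. generated. The folder layout and hyperparameters are assumptions, and watermark approaches like SynthID work differently (they look for an embedded signal rather than learned visual tells):

    # Minimal sketch: binary classifier for "real" vs "AI-generated" video frames.
    # Assumes frames were extracted into frames/real/ and frames/generated/;
    # hyperparameters are placeholders, not tuned values.
    import torch
    from torch import nn
    from torchvision import datasets, models, transforms
    from torch.utils.data import DataLoader

    tfm = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ])
    data = datasets.ImageFolder("frames", transform=tfm)   # real/ vs generated/
    loader = DataLoader(data, batch_size=32, shuffle=True)

    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    model.fc = nn.Linear(model.fc.in_features, 2)          # two output classes
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()

    model.train()
    for epoch in range(3):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()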
> Plus some users might want to legitimately upload things with AI-generated content in it
Excluding videos from training datasets doesn't mean excluding them from Youtube.
Ah then sure. It was this part that was problematic.
If users are still allowed to upload flagged content, then false positives almost don't matter, so Youtube could just roll out some imperfect solution and it would be fine
> Today, we're launching SynthID Detector, a verification portal to help people identify AI-generated content. Upload a piece of content and the SynthID Detector will identify if either the entire file or just a part of it has SynthID in it.
> With all our generative AI models, we aim to unleash human creativity and enable artists and creators to bring their ideas to life faster and more easily than ever before.
From the page linked in the post....
So there are different ways to detect AI-generated content (videos/images at least). (https://www.nature.com/articles/s41586-024-08025-4 <-- paper on SynthID / watermarking and detecting it with LLMs)
What they do care about is their training set getting tainted, so I imagine they will push quite hard to have some mechanism to detect AI; it’s useful to them even if users don’t act on it.
The remaining 10% is the solution to generating good hands, of course. And do you think YouTube has been helping anyone achieve that?
I mean, you could limit yourself to the most popular or most interesting 100 million, but that's still an enormous amount of data to download.
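For a sense of scale, a back-of-envelope estimate (all three inputs are rough assumptions):

    # Back-of-envelope: data volume for the "most popular 100 million videos".
    # All inputs are rough assumptions, not measurements.
    videos = 100_000_000
    avg_minutes = 8          # assumed average video length
    mb_per_minute = 10       # assumed ~1080p bitrate

    total_mb = videos * avg_minutes * mb_per_minute
    print(f"~{total_mb / 1e9:.0f} PB")   # about 8 PB with these numbers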
It wouldn't be hard for google to poison competitor training just by throttling bandwidth.
Google will have access to all of these. Competitors will have to do tons of network interactions with Google to pull in only the first set (which Google could detect and block, depending on how these competitors go about it).
This isn't certain. Google do not break out Youtube revenues nor costs. Hosting this amount of videos, globally, redundantly, the vast majority of which are basically never watched, cannot be cheap.
It's entirely plausible that Google's wider benefit from Youtube (such as training video generation algorithms and better behaviour tracking for better targeted ads across the internet) are enough to compensate for Youtube in particular losing money.
Google does break out Youtube revenue.
Latest 10-K: https://abc.xyz/assets/77/51/9841ad5c4fbe85b4440c47a4df8d/go...
See page 10, for youtube Ads revenue.
Of course the alternative is to use game engines, but it's possible that AI would generate a more realistic video stream for the same money spent. Those recent AI-generated videos certainly look much more realistic than any game footage I've ever seen.
If we look at the Veo 3 examples, this is not the typical YouTube video; instead they seem to recreate CGI movies, or actual movies.
Ideogram and GPT-4o pass only a few of them, but not all.
Why is it that all these AI concept videos are completely crazy?
Like if you asked a model to help you create a coffee shop website for a demo and it started looking more like a sex shop, you'd just vibe with it and say that's what you wanted in the first place. I've noticed that the success rate of using AI is proportional to how much you can gaslight yourself.
However, I also think this is meant to show that it can create anything, not just copies of stuff it has seen. If you ask for a painting of a woman and it shows you the Mona Lisa, that's not very impressive.
I've noticed ads with AI voices already, but having it lip-synced with someone talking in a video really sells it more.
I like how Veo supports camera moves, though I wonder if it clearly recognizes the difference between 'in-camera motion' and 'camera motion' and also things like 'global motion' (e.g. the motion of rain, snow etc).
Obligatory link to Every Frame a Painting, where he talks about motion in Kurosawa: https://www.youtube.com/watch?v=doaQC-S8de8
The abiding issue is that artists (animators, filmmakers etc) have not done an effective job at formalising these attributes or even naming them consistently. Every Frame a Painting does a good job but even he has a tendency to hand wave these attributes.
Ironically, this would be a good application of AI: the AI listens in on their calls and flags any conversation that warrants the keyword being said.
The pace is so crazy that was an over estimation! I'll probably get done in 2. Wild times.
0: https://www.linkedin.com/feed/update/urn:li:activity:7317975...
Feels like there's going to be a dichotomy where the individual visuals look pretty good taken by themselves, but the story told by those shots will still be mushy AI slop for a while. I've seen this kind of mushiness consistently hold up over the generations so far; it seems very difficult to remove because it relies on more context than just previous images and text descriptions to manage.
[0] https://www.reddit.com/r/ChatGPT/comments/1kru6jb/this_video...
Created by Ari Kuschnir
Someone will use AI to make the "AI Killed the Video Star" video. Probably the same guy that made this[1] and other masterpieces.
I think the change here will be something we've seen with the other modalities. Text started as interestingly syntactically correct but nonsensical sentences. Then coherent paragraphs, but the end of the article would go off the rails. Then the whole article. Now it's the creativity of the children's story that's in question.
Pictures were awful fever dreams filled with eyes, but you could kind of see a dog. Then you could see what it was meant to be, then it got decent.
Videos were fun in that they kind of worked; then it was surprising that it took a few seconds for the panda to turn into spaghetti; then they kept the general style for a decent stretch of time.
I see this moving towards the creativity being the major thing, or it having a few general styles (softly lit background for example).
This has mostly all shifted in a very short space of time and as someone who put RBMs on GPUs possibly for the first time (I'm gonna claim it) this is absolutely wild.
Had I seen some of this, say, 6 months ago, I'd not have guessed at all that it wasn't real.
It wasn't until I was able to get my jaw off the ground that I told her it was AI. No, not AI like special effects, completely AI.
But they do not allow any people in the image, not even cartoon depictions of humans. This kneecaps a lot of potential usage.
This reminds me of Pixar's animated-lamp short from about 40 years ago. Within a decade Toy Story came out and changed everything about how animated films were made. Looks to me like we are on our way to doing the same thing with realistic movies.
My main issue when trying out Veo 2 was that it felt very static. A couple elements or details were animated, but it felt unnatural that most elements remained static. The Veo 3 demos lack any examples where various elements are animated into doing different things in the same shot, which suggests that it's not possible. Some of the example videos that I've seen are neat, but a tech demo isn't a product.
It would be really cool if Google contracted a bunch of artists / directors to spend like a week trying to make a couple videos or short movies to really showcase the product's functionality. I imagine that they don't do that because it would make the seams and limitations of their models a bit too apparent.
Finally, I have to complain that Flow claims not to be available in Puerto Rico ("Flow is not available in your country yet"), despite Puerto Rico being a US territory and its residents being US citizens.
Also, Google is going to have to tread carefully: people in the entertainment industry are already AI-hostile, and they dictate a surprising amount of public opinion.
I'm pretty sure AI-generated child porn already exists somewhere. But I'm quite lucky: despite knowing rotten.com and plenty of other sites, I've never seen the real thing, so I doubt I will see the fake kind.
What's the elephant in the room now? Nothing has changed. Whoever consumes the real thing will consume the fake too. The FBI/CIA will still try to destroy CP rings.
We could even think it might make the situation somehow better, because they might consume purely virtual CP instead?
https://arstechnica.com/tech-policy/2025/02/25-arrested-so-f...
Your family will be a target, for example. Just imagine your daughter in high school getting bullied with these kinds of generated AI videos. It's easy to say nothing will happen, but when it happens to you, you will realize how fucked up these AI videos are.
At least with AI video you can now always say it's an AI video.
Is it shitty that this is possible? Yes, of course. But hiding knowledge never works.
We have to deal with it as adults. We need to educate about it and we need to talk about it.
We should all be hoping AI-generated CSAM floods the CSAM market, instead of trying to restrict AI so that we artificially prop the market up and cause harm to many more humans.
My last recollection is that a recent case said AI-generated work didn't have copyright?
Think of all of your favorite novels that are deemed "impossible" to adapt to the screen.
Or think of all the brilliant ideas for films that are destined to die in the minds of people who will never, ever have the luck or connections required to make it to Hollywood.
When this stuff truly matures and gets commoditized I think we are going to see an explosion of some of the most mind blowing art.
I can see it using some form of PEFT so that the output stays consistent with both the setting and the characters; then it's about regenerating each short segment over and over until you are happy with the outcome. Then you stitch them together, and if you don't like some part you can always try to regenerate it, change the prompt, ...
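For what it's worth, a minimal sketch of what that PEFT step could look like, assuming a LoRA adapter on a diffusion UNet via the Hugging Face peft/diffusers libraries; the model id, target modules, and hyperparameters are placeholders, not anything Google has described:

    # Sketch: a LoRA-style PEFT adapter on a diffusion UNet so a recurring character
    # and setting stay consistent across regenerated segments. All ids and numbers
    # below are placeholder assumptions.
    from diffusers import StableDiffusionPipeline
    from peft import LoraConfig, get_peft_model

    pipe = StableDiffusionPipeline.from_pretrained(
        "stable-diffusion-v1-5/stable-diffusion-v1-5"  # placeholder base model
    )

    lora_cfg = LoraConfig(
        r=8,
        lora_alpha=16,
        target_modules=["to_q", "to_k", "to_v", "to_out.0"],  # attention projections
    )
    pipe.unet = get_peft_model(pipe.unet, lora_cfg)
    pipe.unet.print_trainable_parameters()

    # ...train the adapter on a handful of reference images of the character,
    # then reuse the same adapter for every segment so identity stays stable.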
In the owl/badger video, the owl should fly silently.
This is an interesting non-trivial problem of generalization and world-knowledge etc., but also?
There's something somewhat sad about that slipping through; it makes me think no one involved in the production of this video, its selection, its passing review, etc., seemed to realize that one of the characteristic things about owls is that you don't hear their wings.
We have owls on our hill right now and see them almost every day and regularly see them fly. It's magic, especially in an urban environment.
https://www.youtube.com/watch?v=-WigEGNnuTE
Longer version: