System card: https://openai.com/index/sora-2-system-card/
These companies and their shareholders really are complete scum in my eyes, just like AI in miltech.
Not because the tech isn't super interesting but because they steal years of hard work and pain from actual artists with zero compensation - and then they brag about it in the most horrible way possible, with zero empathy.
Then comes losing what little humanity is left in mainstream culture, exactly as Miyazaki said, leading to a cold, dead, and even more unjust society.
Communism is tossing the frog into boiling water (tens of millions dead); capitalism is boiling it slowly (poor people in first-world countries might not be able to afford a dentist, but they're not starving yet).
We need a system that rewards work - human time and competence.
There are really only 2 resources in the world - natural resources and human time. Everything else is built on top of those. And the people providing their time should be rewarded, not those who are in positions of power which allow them to extract value while not providing anything in return.
Does anybody here really think rich people deserve to just get richer faster than any working person can? Does anybody really believe that buying up homes and companies and raking in money for doing absolutely nothing is what we should be rewarding?
Then put your name behind it.
Where do you live? Their value has been steadily appreciating in a lot of places in the west due to high demand.
That's because the value of the land under the houses is so high; the house itself is nothing special. But even then, it's mostly because of Prop 13, and it only works out if you live in the house yourself. There's still no one cornering the market in California houses. Almost all landlords only own 1-2 properties.
It's risky to own a lot of buildings, and worse, the risks are correlated if they're all in the same place (there could be a flood or wildfire, etc.).
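The correlation point can be made concrete with a quick Monte Carlo sketch (illustrative numbers only; the 2% annual disaster rate and 10 identical homes are assumptions, not data):

```python
import random

random.seed(42)

def simulate(years, n_homes, p=0.02, correlated=False):
    """Return yearly total losses (1 unit per destroyed home)."""
    losses = []
    for _ in range(years):
        if correlated:
            # One regional event (flood/wildfire) hits every home at once.
            losses.append(n_homes if random.random() < p else 0)
        else:
            # Each home faces its own independent risk.
            losses.append(sum(1 for _ in range(n_homes) if random.random() < p))
    return losses

ind = simulate(10_000, 10, correlated=False)
cor = simulate(10_000, 10, correlated=True)

# Average yearly loss is roughly the same either way...
print(sum(ind) / len(ind), sum(cor) / len(cor))
# ...but the worst single year is far worse when risks are correlated.
print(max(ind), max(cor))
```

The expected loss is identical in both cases; what changes is the tail: the correlated portfolio occasionally loses everything in one year, which is exactly the scenario insurers price hardest.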
Commercial real estate is different because your tenants are (more) professional.
Doesn't mean the next 10 years will see that growth but if you believe your country/area's population will grow it is probably a good investment for now in the western world.
Like I said, you can tell it doesn't work because these businesses don't exist. There are essentially no landlords who own multiple single family homes. They do exist for multifamily and commercial.
Insurance is not free money; it's a pool of money, gathered from monthly payments, that offsets risk. You don't need to distribute the risk over all of your homes; you buy a policy for each home.
This stuff is 101, and it works all around the world. There is even an app called Airbnb that will find short-term rentals for your house.
They certainly don't constitute a depreciating asset!
This is not just about copyright infringement or plagiarism.
Automatically generating text, images and videos based on training data and a tiny prompt is fundamentally about taking someone's work and making money off of it without giving anything in return.
No matter what text you put in the prompt you'll get /something/. Just because you put "studio ghibli anime" in the prompt doesn't mean you're going to actually get that out of it. It'll just be kind of yellow and blobby.
(Also, the style isn't from "people" but a specific guy named Yoshifumi Kondo who isn't around anymore.)
Though… I'm always surprised how respectful Westerners are about Miyazaki. Meanwhile you read other Japanese directors and they're saying all kinds of things about him.
Kids are happy that homework takes less time. Teachers are happy that grading the generated homework takes less time. Programmers are happy they can write the same amount of code in less time. Graphic designers are happy they can get an SVG from a vague description immediately. Writers are happy they can generate filler from a few bullet points quickly.
But then someone comes along, notices people are not working most of the time, fires three quarters of them and demands 4x increased output from the rest. And they can do it because the "AI" is helping them.
Except they don't get paid any more. The company makes the same amount of money for less cost.
So where does the difference go? To the already rich who own the company and the product.
https://commons.wikimedia.org/wiki/File:This_Is_Fine_(meme)....
Here is a direct example of a derived work, to the point where the prompt is "n orange-brown anthropomorphic dog sitting in a chair at a table in a room that is engulfed in flames, happy dog sitting on chair at a table viewed from the side, dog with a hat, room is burning with fire all across the room".
That's covered by Fair Use; I suppose they will argue this if they get sued. Interestingly, Commons doesn't allow Fair Use, but according to Commons, "this is not a derived work".
https://commons.wikimedia.org/wiki/Commons:Deletion_requests...
You tell me if that was a derivative image or not. I argued it was, and the argument was completely ignored.
Inb4 make your own video model and see how easy it is
Sora 2 itself as a video model doesn't seem better than Veo 3/Kling 2.5/Wan 2.2, and the primary touted feature of having a consistent character can be sufficiently emulated in those models with an input image.
It's no use building technology when it's not married with the humanities and liberal arts.
I assume that if you ask normal people how AI affects their lives, they'd think about annoying call-center menus, deepfake porn and propaganda videos, and getting homework done. Not sure any of this is a positive experience for the mind.
It's 2025 and most speech controls for car navigation don't work, Siri is a pile of sh*t and millionaires are trying to convince us that we should either use their AI or a google which has significantly reduced the quality of their search result pages.
It's like a false choice dilemma which allows back-to-the-roots companies such as Kagi to emerge, and I'm happy about it.
Completely agree. The way I think about life is - how will people look back 50 years from now, and make remarks about what is happening?
I don't expect Sora 2 to be SOTA; the Chinese models are further ahead in video/image gen.
I predict that this will move some people over, and IG/TT will lose marketshare.
You need to be in the US/Canada and wait for this notification, and when you get an invite you can start using it in the app and on sora.com. And apparently you get 4 more invite codes that you can share with anyone, e.g. Android users:
> Android users will be able to access Sora 2 via http://sora.com once you have an invite code from someone who already has access
But if you do, that signals to the company this is all perfectly okay.
However, I still don't see how OpenAI beats Google in video generation. As this was likely a data innovation, Google can replicate and improve this with their ownership of YouTube. I'd be surprised if they didn't already have something like this internally.
This is something I would not like to see. I prefer product videos to be real, since I am taking a risk with my money. If the product is given a hallucinated or unrealistic depiction, it would be a kind of fraud.
- Scamming people at scale
- Nonconsensual pornography
- Juicing engagement metrics for fading social media sites
- The ongoing destruction of truth as a concept in our increasingly atomized and divided world
Obviously this will get used for a lot of evil or bad as well
Whether said fun is "worth" the social and economic costs is a separate issue.
What are the benefits of what you do? Does anyone know?
Regardless of the slop, some people will learn to use it well. You have stuff like NeuralViz - quite the sight! - and other creators will follow suit, and figure out how to use the new tools to produce content that's worth engaging with. Bigfoot vlogs and dinosaur chase scenes, all that stuff is mostly just fun.
People like to play. Let them play. This stuff looks fun, and beats Sora 1 by a long shot.
Hopefully it catalyzes
Trust in media? Soaring! Why believe your eyes or ears when you can doubt everything equally?
Journalism? Thriving! Reporters now get to spend their days playing forensic video detective instead of, you know, reporting news.
Social harmony? Better than ever! Nothing brings people together like shared paranoia and the collective shrug of “I guess truth is dead now.”
Honestly, what could possibly go wrong?
https://www.tiktok.com/@dreamrelicc
Before AI, each video on this channel would have taken a large team with a Hollywood budget to create. In a few more years, one person may be able to turn their creative vision into a full-length movie.
(half sarcastic, but you could make the argument that most art has no benefit besides to the person that made the art)
Things are cool because they are unique, very hard to create, and require creativity. When those things become cheap commodities, they are no longer cool.
Making better tools is better for everyone: the median usage of those tools downstream is a separate issue.
AI pictures today are much less impressive than Dall-E 2 pictures were a few years ago, despite the fact that the models are much better nowadays. Currently AI videos can still be impressive, but this will quickly become a thing of the past.
Then people will move from trying to create art to creating "content". That is, non-artistic slop. Advertisements. Porn. Meme jokes. Click bait. Rage bait. Propaganda. Etc.
Then you have Chimamanda Adichie, who has sold millions of copies and won several awards, including the BBC National Short Story Award, widely described as "one of the most prestigious awards for a single short story"
Then another Nigerian writer, Wole Soyinka, won the Nobel fucking Prize in Literature in 1986. Or is that measure not good enough for you, your highness ?
Not only do you come across as racist, you clearly have no idea what you're talking about. Congratulations.
You were thoroughly proven wrong so now your new standard for literary greatness is "writers that average people know" ? (which is really just code for 'writers I know', because millions do know those writers, I wasn't sharing some secret). I guess that means we can throw out Faulkner, Joyce, and Woolf in favor of whoever's currently at the top of the airport bookstore list.
It's not "defamatory" to point out that your argument, which began with a dismissive generalization about an entire country, was based on profound ignorance (the kind that wouldn't have taken anything more than a basic google search to remedy). You were corrected with facts. Instead of going, 'I stand corrected, sorry', you're doubling down. It just makes you look worse, and stupid.
This is the most basic racist playbook happening in real time, and you're the star. If you genuinely think you aren't then you need to take a long, good look at yourself.
People need to be exposed to what is real. Not more artificial stuff.
I think this is the point at which humanity will finally puke and reject this crap.
Just because a small segment of people like it doesn't mean the vast majority will.
[0] https://andymasley.substack.com/p/a-ton-of-ai-images-ive-mad...
Yes, but at the same time the value of video production will quickly drop to 0. Or to whatever it costs to generate that video in terms of tokens.
edit: as per usual it's not yet...
It's not that I disagree with the criticism; it's rather that when you live on the moving edge it's easy to lose track of the fact that things like this are miraculous and I know not a single person who thought we would get results "even" like this, this quickly.
This is a forum frequented by people making a living on the edge—get it. But still, remember to enjoy a little that you are living in a time of miracles. I hope we have leave to enjoy that.
Now we have photorealistic video with sound and, oh yeah, the model can generate an entire script and mini-plot on its own based on the most basic prompt.
I think this is not nearly as important as most people think it is.
In hollywood movies, everyone already knows about "continuity errors" - like when the water level of a glass goes up over time due to shots being spliced together. Sometimes shots with continuity errors are explicitly chosen by the editor because it had the most emotional resonance for the scene.
These types of things rarely affect our human subjective enjoyment of a video.
In terms of physics errors - current human CGI has physics errors. People just accept it and move on.
We know that superman can't lift an airplane because all of that weight on a single point of the fuselage doesn't hold, but like whatever.
This release is clearly capable of generating mind-blowingly realistic short clips, but I don't see any evidence that longer, multi-shot videos can be automated yet. With a professional's time and existing editing techniques, however...
There are lots of tools being built to address this, but they're still immature.
https://x.com/get_artcraft/status/1972723816087392450 (This is something we built and are open sourcing - still has a ways to go.)
ComfyUI has a lot of tools for this, they're just hard to use for most people.
But clearly we also see some major downsides. We already have an epidemic of social media rotting people's minds, and everything about this capability is set to supercharge these trends. OpenAI addresses some of these concerns, but there's absolutely no reason to think that OpenAI will do anything other than what they perceive as whatever makes them the most money.
An analogy would be a company coming up with a way to synthesize and distribute infinite high-fructose corn syrup. There are positive aspects to cheaply making sweet tasting food, but we can also expect some very adverse effects on nutritional health. Sora looks like the equivalent for the mind.
There's an optimistic take on this fantastic new technology making the world a better place for all of us in the long run, after society and culture have adapted to it. It's going to be a bumpy ride before we get there.
Unless there's some fundamental, technical way to distinguish the two, I wonder who would win?
The very fact that I (or billions of others) waste time on shorts is an issue. I don't even play games anymore, it's just shorts. That is a concerning rewiring of the brain :/
Guess what I'm trying to say is that there is a market out there. It's not pretty, but there certainly is.
Will keep trying to not watch these damn shorts...
Maybe this will result in something similar, but it can affect more people who aren’t as wary.
https://www.tiktok.com/discover/ai-homeless-people-in-my-hou...
Of course, the ones focusing on the content can always editorialize the spam out. And in real social networks you can ask your friends to stop making so much slop. But this could finally be the end of Facebook-like stuff.
Please enlighten me. What are they? If my elderly grandma is on her deathbed and I have no way to get to see her before she passes, will she get more warmth and fond memories of me with a clip of my figure riding an AI generated dragon saying goodbye, or a handwritten letter?
My original question was asking for examples of this. Try to keep up, c'mon man
Are there? “A lot” of them? Please name a few that will be more beneficial than the very obvious detrimental uses like “making up life-destroying lies about your political opponents or groups of people you want to vilify” or “getting away with wrongdoing by convincing the judge a real video of yourself is a deepfake”.
That last one has already been tried, by the way.
https://www.theguardian.com/technology/2023/apr/27/elon-musk...
@qoez
> The first entirely AI generated film (with Sora or other AI video tools) to win an Oscar will be less than 5 years away.
https://news.ycombinator.com/item?id=42368951
This prediction of mine was only 10 months ago.
Imagine when we and if we get to 5 years.
Like, it should be preferable to keep all the slop in the same trough. But it's like they can't come up with even one legitimate use case, and so the best product they can build around the technology is to try to create an addictive loop of consuming nothing but auto-generated "empty-calories" content.
I feel like this is the ultimate extension of "it feels like my feed is just the artificial version of what's happening with my friends and doesn't really tell me anything about how they're actually faring."
I have to imagine there will be a rebellion against all of this at some point, when people simply can’t take the false realities anymore. What is the alternative? Ready Player One? The Matrix? Wall-E?
I am bullish on this, albeit with major concerns in many domains. It was fun and addictive as hell with images. With video it will be wild.
The technology itself is super impressive, but a social media app of AI slop doesn't feel like the best use of it. I'm old enough to not really be interested in social media in general anymore, so maybe I'm just out of touch, but I just can't see this catching on. It feels like the type of thing that people will download, use a few times until the novelty wears off and then never open again.
I bet the real goal is to make money from long tail of corporate market ( ads, info videos etc).
Pretty much the same problem we all work on every day in $DAY_JOB.
The worst part is we are already seeing bad actors saying 'I didn't say that' or 'I didn't do that, it was a deep fake'. Now you will be able to say anything in real life and use AI for plausible deniability.
I doubt it will be for the better. The ubiquity of AI deepfakes just reenforces entrenchment around "If the message reinforces my preconceived notion, I believe it and think anyone who calls it fake is stupid/my enemy/pushing an agenda. If the message contradicts my preconceived notion, it's obviously fake and anyone who believes it is stupid/my enemy/pushing an agenda.". People don't even take the time to think "is this even plausible", much less do the intellectual work to verify.
Today's Sora can produce something that resembles reality from a distance, but if you look closely, especially if there's another perspective or the scene is atypical, the flaws are obvious.
Perhaps tomorrow's Sora will overcome the "final 10%" and maintain undetectable consistency of objects across two perspectives. But that would require a spatial awareness and consistency that models still have a lot of trouble with.
It's possible to produce some video or image that looks real, cherry-picked for a demo, but not possible to produce any arbitrary one you want that will end up passable.
I'm optimistic here.
Look at 1900s tech like social security number/card, and paper birth certificates. Our world is changing and new systems of verification will be needed.
I see this as either terribly dystopian - or - a possibility for the mass expansion of cryptography and encrypted/signed communication. Ideally in privacy preserving ways because nothing else will make as much sense when it comes to the verification that countries will need to give each other even if they want backdoor registry BS for the common man.
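As a toy illustration of what signed media could look like: real provenance schemes such as C2PA use public-key signatures and certificate chains, so that verifiers never need the signing secret; the shared-key sketch below is a simplification, and the key and file bytes are made up.

```python
import hashlib
import hmac

# Toy provenance check: a camera (or publisher) holding a secret key tags
# each file; anyone holding the same key can verify it wasn't altered.
# Real schemes (e.g. C2PA / Content Credentials) use asymmetric signatures
# instead, so the verifying public doesn't hold the signing secret.

SECRET = b"camera-device-key"  # hypothetical key, for illustration only

def sign(media_bytes: bytes) -> str:
    """Produce an authentication tag bound to the exact file contents."""
    return hmac.new(SECRET, media_bytes, hashlib.sha256).hexdigest()

def verify(media_bytes: bytes, tag: str) -> bool:
    """Constant-time check that the tag matches the file as-is."""
    return hmac.compare_digest(sign(media_bytes), tag)

original = b"\x00\x01raw video frames..."
tag = sign(original)

assert verify(original, tag)             # untouched footage checks out
assert not verify(original + b"x", tag)  # any edit breaks the tag
```

Note that this only proves a file is unmodified since signing; the harder social problem of getting cameras, platforms, and viewers to adopt such a chain of custody remains.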
Breaking changes get fixes.
However, personalization (teleporting yourself into a video scene) is boring to me. At its core, it doesn't generate new experiences for me. My experience is not defined by the photos/videos I took on a trip.
However, as they hint at a little in the announcement, if video generation becomes good enough at simulating physics and environments realistically, that's very interesting for robotics.
If you never expected Altman to be the figurehead of principled philosophy, none of this should surprise you. Of course the startup-alumni guy is going to project misaligned expectations in the hopes of becoming a multi-trillion-dollar company. The shareholders love that shit; Altman is applying the same lessons he learned at Worldcoin to a more successful business.
There was never any question why Altman was removed, in my mind. OpenAI outgrew its need for grifters, but the grifter hadn't yet outgrown his need for OpenAI.
I understand the cynicism but this is in fact not the job of a businessman. We shouldn't perpetuate the pathological meme that it is.
There is no legal or moral imperative to make antisocial, unethical, or short term decisions that "maximize shareholder value."
This is something that morally weak people tell themselves (and others) to justify the depravity they're willing to sink to in order to satiate their greed.
The concept doesn't even make sense: different shareholders have different priorities and time horizons. A businessperson has no way to know what it objectively means to maximize their returns. They must make a subjective determination, and they have extremely broad latitude to do that.
Increasing shareholder value can be done in the broadest sense by just increasing business
If I fund my own business, I can control growth and _choose_ ethics over profits, in the hope that stunting growth is acceptable if my customers value ethics too, and that whomever I someday pass my company to shares these values
If I take capital investment, I now have a contractual agreement to provide returns on that investment. Yes failure to adhere can result in lawsuits or legal penalties. Or I can be fired/voted out for failing to bring high enough returns. I now _cannot_ choose ethics over profits, due to the conflict of interest of self-preservation
So you are correct - there is no legal or moral contract to behave unethically, but there is instead a strong systemic and self-preserving incentive to do so
I think we almost agree here, but you make it sound as if the exec can simply stand up and do the right thing here. I argue the exec will simply be pushed aside for another
This is what people refer to when they talk about the binds that hold modern day mega-corps
If you yourself are an exec, I personally think you can understand these truths and work with them as best you can, and still be a good human being of course, but that there are lines that should not be crossed to keep a job
It is a collective issue we need to solve that of course starts with each individual seeing the true situation with kindness and compassion
They don’t need to be excused by “well that’s their obligation.” It’s not! Actually, a person’s obligation is to act morally even when there are incentives otherwise, which is approximately all the time for nearly every person.
This is something children learn (lest they be excluded from their society) yet Very Smart People in the upper echelons of the business world conveniently forget.
> If I take capital investment, I now have a contractual agreement to provide returns on that investment. Yes failure to adhere can result in lawsuits or legal penalties.
This is not true. If you've signed a contract that says anything like this, consider getting a real lawyer.
Sounds about as plausible as "ironically taking heroin".
> Nobody's going to get their news from Sora because it's literally 100% fake.
I'm with Neal Stephenson ("Fall", in this case) on this prediction, although I really hope I'm wrong.
In the early years everyone told me that TikTok is actually fun and whimsical (like just after it stopped being musical.ly), and it's all about fun collaboration, and amateur comedy sketches, fun dances and lipsyncs, and people posting fun reactions to each other etc, all lighthearted and that social media is finally fun again!
I have seen what people generate with AI, and I do not have good news for you.
I agree. At best, short videos can be entertainment that destroys your attention span. Anything more is impossible. Even if there were no bad actors producing the content, you can't condense valuable information into this format.
I watch videos for two reasons. To see real things, or to consume interesting stories. These videos are not real, and the storytelling is still very limited.
So, for the same reason you'd go to a local art gallery
One recent disillusionment for me was that lots of police bodycam content is fake, as in basically amateur actors trying to enact a realistic police stop; they even put the usual bodycam numbers and letters and the Axon logo in the corner, etc.
And so many other videos of things happening in the street are more or less obviously fake and staged. Still 90% probably don't notice.
I predict a resurgence in live performances. Live music and live theater. People are going to get tired of video content when everything is fake.
The recent Google Veo 3 paper "Video models are zero-shot learners and reasoners" made a fascinating argument for video generation models as multi-purpose computer vision tools in the same way that LLMs are multi-purpose NLP tools. https://video-zero-shot.github.io/
It includes a bunch of interesting prompting examples in the appendix, it would be interesting to see how those work against Sora 2.
I wrote some notes on that paper here: https://simonwillison.net/2025/Sep/27/video-models-are-zero-...
Going back to sleep. Wake me up when it's available to me.
Edit: looks like this post was actually first, so maybe we'll reverse the merge
1/ 0m23s: The moon polo players begin with the red coat rider putting on a pair of gloves, but they are not wearing gloves in the left-vs-right charge-down.
2/ 1m05s: The dragon flies up the coast with the cliffs on one side, but then the close-up has the direction of flight reversed. Also, the person speaking seemingly has their back to the direction of flight. (And a stripy instead of plain shirt and a harness that wasn’t visible before.)
3/ 1m45s: The ducks aren't taking the right hand corner into the straightaway. They are heading into the wall.
I do wonder what the workflow will be for fixing any more challenging continuity errors.
It’s ok for this to be a fun toy. (And fun toy while also being an astonishing piece of engineering.) But if it wants to push beyond fun toy then it would be interesting to see how that process works.
Will Sora 2 help me sketch out a movie, doing 10% of the work where I have to reshoot the other 90% for real, or will it get me 90% of the way there, leaving me only 10% left to do "by hand"?
(This is the exact same question, I believe, that is being asked about the maintenance burden imposed by vibe-coded products. Do they get you 90% of the way and then fail spectacularly, leaving you to do the bulk of the work again? Or do they get you 90% of the way, leaving you to just fill in the gaps to reach a stable long-term product?)
I will believe it when I see it, because Sora 1 is probably the most disappointing technology, relative to what I thought it was going to be, that I can think of. I waited forever for it and then barely used it because it sucks.
Particularly bad was the snowmobile sequence. It was literally a different snowmobile in every cut.
The racing pool duck scene was a different pool in every shot.
About the only consistent thing was the faces that were spliced into the scenes.
I do not really see anything super significant in the demo. It looks like it suffers from all the same problems of AI-generated video; they just hid them by avoiding more than 5 seconds in the same setting.
The state of things with doomscrolling was already bad; add to it layoffs and replacing people with AI (just admit it: interns are struggling to compete with Claude Code, Cursor, and Codex).
What's coming next? A bunch of people with lots of free time, watching nonsense AI-generated content?
I am genuinely curious, because I was and still am excited about AI, until I saw how doomscrolling is getting worse.
Wasn't this always the outcome of the post labor economy?
For this discussion, let's just say that AI+robots could replace most human labor and thinking. What do people do? Entertainment is going to be the number one time consumer.
They are not. This is false; ZIRP ended, and that is the problem, not LLMs.
Interns at big tech may be impacted less, because their systems are so complex, but when I look at job boards or talk with engineers, I see them mentioning interns less and AI-assisted coding more.
The bar for interns is higher now: why do I need 3 interns to polish the product if I can complete 70% of the job with AI and hire 1 intern to fix the rest?
However, I also think ai coding is hyped way beyond its capability.
I will not comment any further.
The bootcamp (actually, evening classes in coding run in cooperation with the public sector) regularly placed graduates with employers.
They’ve seen a big hit in this since AI, and companies have explicitly cited the fact that AI can complete the same tasks that these junior devs used to perform.
click
takes me to the iPhone app store...
Impressive that THAT was one of the issues to find, given where we were at the start of the year.
I imagine it won’t necessarily be used in long scenes with subtle body language, etc involved. But maybe it’ll be used in other types of scenes?
Like you have an exterior shot of a cabin, the surrounding environment, etc — all generated. Then you jump inside which can be shot on a traditional set in a studio.
Getting that establishing shot in real life might cost $30K to find a location, get the crew there, etc. Huge boon to indie films on a budget, but being able to endlessly tweak the shot is valuable even for productions that could afford to do it IRL.
Kutcher mentions the establishing shots and, I'd forgotten, also points out the utility for relatively short stunt sequences.
> Why would you go out and shoot an establishing shot of a house in a television show when you could just create the establishing shot for $100? To go out and shoot it would cost you thousands of dollars.
> Action scenes of me jumping off of this building, you don’t have to have a stunt person go do it, you could just go do it [with AI].
Jason Blum is also getting really into the tech.
I expect the "cameo" feature is an attempt at capturing that viral magic a second time.
I kid.
Art should require effort. And by that I mean effort on the part of the artist. Not environmental damage. I am SO tired of non tech friends SWOONING me with some song they made in 0.3 seconds. I tell them, sarcastically, that I am indeed very impressed with their endeavors.
I know many people will disagree with me here, but I would be heartbroken if it turned out someone like Nick Cave was AI-generated.
And of course this goes into a philosophical debate. What does it matter if it was generated by AI?
And that's where we are heading. But for me, effort is required, and where we are going means close to zero effort required. Someone here said that just raises the bar for good movies. I say it mostly means we will get 1 billion movies, most of them "free" to produce, displacing the 0.0001% of human-made/good stuff. I dunno. Whoever had the PR machine on point got the blockbuster. Not weird, since the studio tried 300,000,000 of them at the same time.
Who the fuck wants that?
I feel like that ship in Wall-E. Let's invest in slurpies.
Anyway; AI is here and all of that, we are all embracing it. Will be interesting to see how all this ends once the fallout lands.
Sorry for a comment that feels all over the place; on the tram :)
A prompt delivered by Amazon drones would obviously not be the same lovely moment.
So yes, I agree.
The music industry already went through this with AutoTune and we know how that turned out.
They use it, everyone uses it, and it got better to the point where most people don't know it's used. Ever heard of Melodyne? Well, AI made it even better.
And then there has been about 20 years of people using it even as their style of music, notably in hip hop, reggaeton, urbano, country, etc.
Boomers like to think it was just an annoying fad in 2008-2011 or something, but it never went away; now everyone uses it, whether obviously or not.
It's just a way to get a different kind of sound. It won't make good tracks for you.
An example here: https://v.redd.it/fqlqrgumo5rf1
I find this one interesting because Rap has classically been difficult for these models (I think because it's technically difficult to find the right rhythms and flow for a given set of lyrics).
it's just AI slop, like the median
like if you just put a bunch of words together and shipped that. Quantity was never what people wanted imo.
It is impressive if the instrumental track was made with just some prompts though
I've been listening to this across a variety of genres though, maybe these lyrics and vocals are more to your taste:
(similar to Opeth) https://suno.com/song/9ab8da05-c3f2-412d-80b4-c7d0b3ae840f?s...
(indie rock) https://suno.com/song/756dd139-4cba-4e40-b29c-03ace1c69673
like, LLMs are fantastic at generating patterns, so words that match and same with images etc.
But there's not much uniqueness? It's "impressive" in a savant-like way, this ability to come up with rap, but it doesn't really produce something I'd want to listen to..?
I listened to the metal thing and kind of the same thing?
It's very high fidelity; the quality of the drums etc. is quite impressive, but the vocals seem off? It's like a poem being read by TTS and then transformed into a "metal voice",
and it's kind of just an averaging of "metal music" into a track, kind of like stock photos, very formulaic
not to mention many metal bands etc they do formulaic stuff especially if they have an identifying kind of hit
But to me this is cool tech, but I wouldn't listen to it
I've listened to music for a long time, though I don't listen to a wide variety today. For example, pop can be very complex or very simple, but an average or "almost" track will really not make a good song. It can seem simple in hindsight, but blood, sweat, and tears probably went into such songs, or creative energy that might never come back as strong.
just my raw thoughts though. it could be me being biased knowing it's AI, but I don't think so. I think my brain has kind of adapted to a point where I can feel if something is AI because it always seems super "average"/mid?
I don't think people would think anything strange of a lot of these tracks if they just randomly heard them on the radio.
This marked a divergence from thousands of years of vocal performances where singing ability and enjoyment of the music were one and the same.
AutoTune was the first slop, and the general population seems to like it.
The problem with autotune is more that it removes a lot of nuance from singers' voices; it's like listening to MIDI instead of a real piano. This is, however, something that can be improved. Synthesizers can produce wonderful musical effects, and there's lots of highly virtuosic music on synthesizers (including voice distortions, technically pretty similar to autotune) for those who are into it. Prog rock, for example, was all about using new technology in complex and extremely interesting ways. Maybe more interestingly for your particular objection, you can look at early electronic music, say Vangelis or Isao Tomita or Kraftwerk. For at least parts of their songs, they could have just programmed their synthesizers ahead of time and played concerts without even being on stage - but that doesn't take away from the music itself.
Ultimately, if the music sounds good and elicits some feelings and thoughts, it's good music. Whether the musicians can reproduce it live or it's done 90% in a studio doesn't really matter here. Of course, it does mean it may not be worth going to a live show from some particular performer, and it also means that the performer is not necessarily the most relevant artist - the person programming the "auto"tune should at least be considered part of the band.
For me the biggest thing is actually the production, there's many people involved usually and sometimes real magic gets made, and that magic might not even contain any vocals at first
like what is acceptable music? only raw vocals & acoustic instruments?
Yeah, it turned out that almost all mainstream tracks nowadays have post-processing on vocals (the extent varying between genres and styles).
AI could be helpful here, but it's not clear that it is required or an improvement.
I love the casual reminders that we're second-class citizens each time a new technology gets released. Available in the US but always excluding Puerto Rico.
I guess copyright is pretty much dead now that the economy relies on violating it. Too bad those of us not invested into AI still won't be able to freely trade data as we please....
I know, I know. Most people don't care. How exciting.
Anyone, literally anyone, can use it (eventually) to generate incredible scenes. Imagine the person who comes up with a short film about an epic battle between griffins and aliens...Or a simple story of a boy walking in the woods with their dog...Or a story of a first kiss. Previously people were limited to what they had at hand. They couldn't produce a video because it was too costly. Now they can craft a video to meet their vision.
I do find it exciting.
Well, yes? There's a reason why everything that was produced with these tools so far is garbage: because no one actually caring about their art would accept these things. Art is a deliberate thing, it takes effort. These tools are fine for company training videos and TikToks. Of course a few years ago this was science fiction. They are immensely impressive from a technical perspective. Two things can be true.
Sora 2 represents significant progress towards [AGI]. In keeping with OpenAI’s mission, it is important that humanity benefits from these models as they are developed.
This seems like a good time to remind ourselves of the original OpenAI charter: https://web.archive.org/web/20230714043611/https://openai.co...

I wonder how exactly they reconcile the quote above with "We are concerned about late-stage AGI development becoming a competitive race without time for adequate safety precautions"...
The closest thing out there is SignWriting [1], which has about as much traction in the real world as Esperanto.
The second problem is that sign language is heavily influenced by corresponding facial expressions, body language, the motion of the hands, even how emphatic the motions are. Trying to approximate what is effectively a SPATIAL language in written glyphs feels like a complete waste of time.
If your native language is French, why might you prefer things to be written in French rather than, say, Swahili?
The point is that "SIGN LANGUAGE" is idiomatic to the native speaker's tongue. So if you're going to take the time to create a specialized written form of it, you can just write using the native language which can be read by BOTH the Deaf and non-Deaf community.
Deaf people are not magically illiterate.
Creating a written sign language serves no value since it is just a crappier version of the normal written equivalent.
So there's not a lot of value in creating a written form of say the French Sign Language because you can just use French.
Swahili regions have multiple types of sign language including Kenyan Sign Language.
[1]
No, this is not true. French and French Sign Language are totally unrelated languages. Sign languages generally have little to do with the spoken language of the country they’re used in, that’s why for example American Sign Language and British sign language are completely different and not mutually intelligible despite the UK and the US speaking the same language (with only slight differences in accent and vocabulary).
subtitles can work but it's basically a second language. perhaps comparable to many countries where people speak a dialect that's very different from the "standard" written language.
this is why you sometimes have sign language interpreters at events, rather than just captions.
there's not really a widely accepted written form of sign language.
No, the reason is because a) it's in real time, and b) there's no screen to put the subtitles on. If it was possible to simply display subtitles on people's vision, that would be much more preferable, because writing is a form of communication more people are familiar with than sign language. For example, someone might not be deaf, but might still not be able to hear the audio, so a sign language interpreter would not help them at all, while closed captions would.
That argument applies just as equally to sign language - most countries have their own idiosyncratic sign language. (ASL, LSE, etc.). Any televised event that has interpreters will be using the national language version.
The closest thing you're thinking of is IS - International Sign - but it's much more limited in terms of expression, and not every deaf person knows it.
> there's not really a widely accepted written form of sign language.
Because it makes no sense to have it unless there was a regional deaf community that was fluent in sign language and also simultaneously illiterate.
https://www.reddit.com/r/NoStupidQuestions/comments/6t7k1w/h...
If you're going to convert audio to a digital form in realtime anyway we have this new amazing invention called the WRITTEN LANGUAGE.
(Your comment would be just fine without the last sentence)
"Don't be snarky."
For example, I saw a lot of people criticizing "Wish" (2023, Disney) for being a good movie in the first half, and totally dropping the ball in the last half. I haven't seen it yet, but I'm wondering if fans will be able to evolve the source material in the future to get the best possible version of it.
Maybe we will even get a good closure for Lost (2004)!
(I'm ignoring copyright aspects, of course, because those are too boring :D)
Much more mundane, but useful!
You must understand that infinite copyright is the author's right, and AI companies must be sued for 50 trillion dollars.
About 6 months ago I asked a few different AIs if they could translate a song for me as a learning experience, meaning not a simple translation, but more a word by word explanation of what each word meant, how it was conjugated, any more musical/lyrical only uses that aren't common outside of songs, and so on. I was consistently refused on copyright grounds, despite this seeming a fair use given the educational nature. If I pasted a line of the lyrics at a time, it would work initially, but eventually I would need to start a new chat because the AI determined I translated too much at once.
So in this one, if I wanted to ask it to create a video of the moment in Final Fantasy 6 when the bad guy wins, or a video of the main characters of Final Fantasy 7 and 8 having a sword duel, would it outright refuse for copyright reasons?
It sounds like it would block me, which makes me lose a bit of interest in the technology. I could try to get around it, but at what point might that lead to my account being flagged as a trouble maker trying to bypass 'safety' features. I'm hoping in a few years the copyright fights on AI dies down and we get more fair use allowance instead of the tighter limitations to try to prevent calls for tighter regulation.
100% sure we will see people re-doing movie parts. Also see https://en.wikipedia.org/wiki/The_Phantom_Edit
Whether it's text or super-advanced VR holograms, if it's fan fiction it's fan fiction. Which can be interesting and compelling, but that will never be as exciting as the Word of God[0]. Death of the Author is a nice thought experiment but few people really adhere to it, I've found.
> new AI feature/model comes out
> "it's going to replace people in this field! they better start looking for a new job!!!"
why is this a good thing?
I had to chuckle at this. Because the arrogance of OAI et al will finally get them in the end when these projects continue to be negative NPV.
We are at a point now where the problem is no longer how to write the software but how to describe what we want to the software. Video and filmmaking are so generalized that AI needs more information. Typically that information comes from the consistency of a director and their team during production. AI has neither the information for consistency of imagery nor the narrative, and the perspective on the narrative, that a human director and team bring. In time, AI will develop large enough contexts, but will the hardware to run that be affordable? There is a huge amount of context in both an entire script and the world view a film crew brings to any script, and for that reason I think many of the traditional film roles (VFX included) are not going to suddenly disappear. AI video does not replace their consistency at their budget, hands down.
When AI video is able to be just a part of the skill set, for example when it is compatible with compositing, editing, and knows that terminology, AI video will be adopted more. Right now, it is designed as an all or nothing offering.
Won’t the industry change to adopt that massive price cut/productivity gain?
I think feeling like you need to say that in marketing copy is a pretty good clue both that it's not, and that you don't believe it is so much as desperately wish it were.
Sora 2 itself looks and sounds a little poorer than Google Veo 3. (Which is itself not currently ranked as the top video model. The Chinese models are dominating.)
I think Google, with their massive YouTube data set, is ultimately going to win this game. They have all the data and infrastructure in the world to build best-in-class video models, and they're just getting started.
The social battle will be something completely different, though. And that's something that I think OpenAI stands a good chance at winning.
Edit: Most companies that are confident of their image or video models stealthily launch it on the Model Arena a week ahead of the public model release. OpenAI did not arrange to do that for Sora 2.
Nano Banana, Seedream/Seedance, Kling, and several other models have followed this pattern of "stealth ELO ranking, then reveal pole position".
https://artificialanalysis.ai/text-to-video/arena?tab=leader...
The fact that this model is about "friends" and "social" implies that it is an underpowered model. You probably saw a cherry-picked highlight reel generated with a large VRAM context, but the actual consumer product will be engineered for efficiency: built to sustain a high volume of cheap generations, not expensive high-quality ones. A product built to face off against Meta. That model will compete on the basis of putting you into videos with Pikachu, Mario, and Goku.
I don't know. Applying the same thinking to LLMs, Google should have been first and best with text-based LLMs too, considering the datasets they sit on (and the researchers - among others, the people who came up with attention). But OpenAI somehow beat them on that regardless.
it doesn't spark optimism or joy about the future of engaging with the internet & content, which was already at a low point.
old is gold, even more so
> A lot of problems with other apps stem from the monetization model incentivizing decisions that are at odds with user wellbeing. Transparently, our only current plan is to eventually give users the option to pay some amount to generate an extra video if there’s too much demand relative to available compute. As the app evolves, we will openly communicate any changes in our approach here, while continuing to keep user wellbeing as our main goal.
If they got the generation "live" enough, imagine walking past a mirror in a department store and seeing yourself in different clothes.
Wild times.
You can prompt with a normal size 8 dress and "Kim Jong Un wearing a dress" and it will show you something that doesn't help you understand whether that dress would fit or not. You can ask for a tube dress and it will usually give him a big bust to hold it up. It's not useful for the purpose of visualizing fit.
It will definitely be used for that, just like image models already are for cheap Temu clothes, and our online shopping experience will get worse.
Maybe this needs purpose-built models like vibe-net, or maybe you can train a general-purpose model to do it, but if they were spending the effort necessary to do so they'd be calling it out.
https://adage.com/article/digital-marketing-ad-tech-news/car...
A little creepy, but very much in this vein.
We probably haven't even scratched the surface of what will be done with this tech. When video becomes "easy", "quick", "affordable", and "automatable" (something never before possible on any of those dimensions) - it enables countless new things to be done.
But Sora/Veo will probably also revolutionize movie and TV content
https://xcancel.com/Naija_PR/status/1904809073356251634
Then take the next step. Why even spend money going out? Generate a video of yourself with fake friends at a party and post that, while eating ice cream alone at home.
I agree with you regarding online validation. I would even go so far as saying that depending on online validation or fame in general for happiness is unhealthy and anyone who does should make it a priority to find alternative sources.
"Five things you won't believe: We took an actual vacation"
Hey don't be giving away my JOMO secrets.
Now... take it a STEP further. Remember the scene in Futurama where Fry tries on the Lightspeed Briefs and looks in the mirror to see a rather aspirational version of himself?
https://www.youtube.com/watch?v=by0KQRJVFuk
Yeah.
Absolutely cooked.
After the disaster that was chatGPT4.001, study mode, and now this - an impossibly expensive-to-maintain AI video slop copyright violator - their releases are uninspired and bland, and smell of desperation.
Making me giddy for their imminent collapse.
The point is that sora2 demo videos seemed impressive but I just didn't feel any real excitement. I am not sure who this is really helping.
So much visual power, yet so little soul power. We are dying.
>Every AI video demonstration is always about funny stuff and fancy situations.
The thing about AI slop is that by its very nature, unless it's heavily reined in by a human, it's invariably lowest common denominator garbage. It very likely will generate something you yourself could think of within the first five seconds of hearing the prompt, not some very clever take on it, so it can only work as a placeholder (AI as a replacement of stock images is great, for example) or to add background detail where it won't call attention to itself and its genericity.
>imagine building a video about the moment Jesus was born
Given there are multiple paintings on the subject, I very much doubt no one has generated something like that already.
Even moreso than Facebook tags, the person being cast can cause the deletion of the source video at any time.
I love this AI video technology.
Here are some of the films my friends and I have been making with AI. These are not "prompted", but instead use a lot of hand animation, rotoscoping, and human voice acting in addition to AI assistance:
https://www.youtube.com/watch?v=H4NFXGMuwpY
https://www.youtube.com/watch?v=tAAiiKteM-U
https://www.youtube.com/watch?v=7x7IZkHiGD8
https://www.youtube.com/watch?v=Tii9uF0nAx4
Here are films from other industry folks. One of them writes for a TV show you probably watch:
https://www.youtube.com/watch?v=FAQWRBCt_5E
https://www.youtube.com/watch?v=t_SgA6ymPuc
https://www.youtube.com/watch?v=OCZC6XmEmK0
I see several incredibly good things happening with this tech:
- More people being able to visually articulate themselves, including "lay" people who typically do not use editing software.
- Creative talent at the bottom rungs being able to reach high with their ambition and pitch grand ideas. With enough effort, they don't even need studio capital anymore. (Think about the tens of thousands of students that go to film school that never get to direct their dream film. That was a lot of us!)
- Smaller studios can start to compete with big studios. A ten person studio in France can now make a well-crafted animation that has more heart and soul than recent by-the-formula Pixar films. It's going to start looking like indie games. Silksong and Undertale and Stardew Valley, but for movies, shows, and shorts. Makoto Shinkai did this once by himself with "Voices of a Distant Star", but it hasn't been oft repeated. Now that is becoming possible.
You can't just "prompt" this stuff. It takes work. (Each of the shorts above took days of effort - something you probably wouldn't know unless you're in the trenches trying to use the tech!)
For people that know how to do a little VFX and editing, and that know the basic rules of storytelling, these tools are remarkable assets that complement an existing skill set. But every shot, every location, every scene is still work. And you have to weave that all into a compelling story with good hooks and visuals. It's multi-layered and complex. Not unlike code.
And another code analogy: think of these models like Claude Code for the creative. An exoskeleton, but not the core driving engineer or vision that draws it all together. You can't prompt a code base, and similarly, you can't prompt a movie. At least not anytime soon.
What is up with a lot of the voices being left-ear only?
We all told him about the sound mix - he let a couple of videos slip with bad "mono as single-channel stereo" audio renders. On his machine it sounded normal. He got flak for that, and he's been hearing about it for months.
I'm going to show him this thread. I don't think he'll ever forget to check again.
Despite that, he's a really talented guy. Chalk this up as a bad production deploy. We didn't want to delete and re-upload since the videos had legs when we first released them. There's a checklist now.
In the meantime, good old
Settings -> Accessibility -> Audio -> Play Stereo as Mono
helps.
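On the production side, the fix is also simple if you still have the render. Here's a minimal sketch in Python (stdlib only; the file names and the `fix_left_only` helper are hypothetical, and it assumes the bad render is a 16-bit stereo WAV with all the audio in the left channel):

```python
import array
import wave

def fix_left_only(in_path: str, out_path: str) -> None:
    """Copy the left channel over the (silent) right channel of a 16-bit stereo WAV."""
    with wave.open(in_path, "rb") as src:
        assert src.getnchannels() == 2, "expected a stereo file"
        assert src.getsampwidth() == 2, "expected 16-bit samples"
        params = src.getparams()
        # Samples are interleaved: [L0, R0, L1, R1, ...]
        frames = array.array("h", src.readframes(src.getnframes()))

    # Overwrite every right-channel sample with the matching left-channel one.
    frames[1::2] = frames[0::2]

    with wave.open(out_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(frames.tobytes())
```

Most encoders and editors expose the same left-to-both-channels mapping, so the WAV round trip is just the most transparent way to show what's happening.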
Rewind to just one year prior -- 2024.
AI video was brand-spanking new. We'd only just gotten over the "Will Smith" spaghetti video and the meme-y "Pepperoni Hug Spot" and "Harry Potter by Balenciaga" videos.
I was the only person to attempt to use AI in 2024's competition. It was a time when the tools and infrastructure for video barely existed.
On the debut night, I was resoundingly booed by the audience. It felt surreal - working all weekend only to have an audience of peers jeering at you in a dark theater. The judges gave me an award out of sympathy.
Back then, image-to-video models really were not a thing (Luma launched "Dream Machine v1" shortly after this). I was using Comfy, Blender, Mocap, a full Mocap suit (the itchy kind), and a lot of other hacks to build something with extremely crude tools.
We lost a day of filming and had to scramble to get something done in just 24 hours. No sleep, too much caffeine. Lots of sweat and toil.
The resulting film was a total mess, of course:
https://vimeo.com/955680517/05d9fb0c4f (It's seriously bad - I hate it. It might legitimately be the very first time AI was used in a 48 hour competition.)
That said, it felt very much like a real 48 Hour competition to me. Like a game jam. The crude ingredients, an idea, the clock. The hustle. The corners being cut. It was palpable.
I don't think you can say there isn't soul in this process. The process has so much soul.
Anyway, fast forward to this year. Three teams used AI, including my own. (I don't think I have a link to our film, sadly.)
We all got applause. The audience was full of industry folks, students, and hobbyists. They loved it. And they knew we used AI.
The industry is anxious but curious about the tech. But fundamentally, it's a new tool for the tool box. The real task is storytelling.
"The studios and creators who thrive in this new landscape will be those who can effectively harness AI’s capabilities while maintaining the human creativity and vision that ultimately drives the art of cinema."
It is in many ways thrilling to see this come to life, and I couldn't agree with you more.
Just somehow, several years on, these optimistic statements still all end up in the future tense; for all the supposed greatness and benefits, we still don't see really valuable outputs. A lot of us do not want more of the "CONTENT" envisioned by corporate ghouls who want their employees or artists to "thrive" (another word kidnapped by LinkedIn-Linguists). The point is not the speed and easiness of generating outputs, visual and sound effects, etc. The point is the artist's interpretation, their own vision, impressions, etc. Not statistical slop which "likely" fits my preferences (i.e. increases my dopamine levels).
I have a really big problem with letting low-quality stuff infest the species.
This will also be used to create great content.
Beauty is not just an “idea” that someone has and needs to get out onto a medium
It is a process and journey that a person undergoes to get said idea onto said medium
That journey often plays out very differently than the person expects. Things change, the art is different from the idea, and the person learns and grows
Our modern society is so obsessed with results, competition, and efficiency that we no longer see the truth: the journey is to be enjoyed, and from enjoying the journey, comes beauty
I encourage you to meditate on why our society is so sick and depressed right now, and extrapolate to how we got here, before assuming this will be a good thing for society
> I considered renting out sound stages, flying to exotic desert locations, getting a scuba team to shoot the underwater scenes in an aquarium, commissioning custom-made Teletubbies costumes, hiring SAG actors, building dozens of miniature sets, and spending my life savings on making this video. But using AI just seems slightly easier.
Making short films with AI is still incredibly effortful. If you're being careful and diligent, it takes days to "shoot" and edit the entire shot list for a 5-7 minute short.
Would you say that the creators of today's animated TV shows, in mechanizing production with Toon Boom Studio, have stripped the beauty away? I still found "Bojack Horseman" to be a salient dramedy.
Would you say that Pixar, in using motion capture and algorithms to simulate light, physics, and movement, is cutting away the journey?
This is a new adventure and new level of abstraction we're embarking upon.
I'm already thinking about the next way points: real time mocapped improv for D&D campaigns and live community theater fantasy and science fiction productions.
These are tools that bring us to new places, that enable us to tell new stories. Previously you'd have to win Disney budget approval to tell a story matching your vision - now you don't.
Art is not effort. Art is not labour. Beauty is not suffering. Art =/= craft. Art is communication.
If someone wants to suffer through the long endurance journey of becoming skilled at a craft, we can still respect/appreciate it, the same way a sprinter spends 10 years training to run real fast; in the meantime most of us will use a vehicle to get somewhere faster.
What we're going to lose is a bunch of interesting behind the scene videos because no one is going to watch someone prompt for an hour wondering why can't I do that, but rather why didn't I do that.
Proliferating tools for creation is a net good in the same sense that teaching the masses to write is a net good. It's strange that people are opposing lowering the barrier to entry to visual communication. That's what art ultimately is, communication. Once difficult, soon ubiquitous.
I can't find the link now, but I saw a continuous shot video of a grocery store from the perspective of a fly. It was shot in the 90s music video style and looked so damn good.
Some of the stuff being done by these guys is also a whole lot of fun (slightly NSFW and political content), and it fits the music video theme:
Personally, I feel mixed feelings. I'm impressed, but I'm not looking forward to the new "movies" that are going to litter YouTube et al generated from this.
Multiple sci-fi-fantasy tales have been written about technology getting so out of control, either through its own doing or by abuse by a malevolent controller, that society must sever itself from that technology very intentionally and permanently.
I think the idea of AGI and transhumanism is that moment for society. I think it's hard to put the genie back in the bottle because multiple adversarial powers are racing to be more powerful than the rest, but maybe the best thing for society would be if every tensor chip disintegrated the moment they came into existence.
I don't see how society is better when everyone can run their own gooner simulation and share it with videos made of their high school classmates. Or how we'll benefit from being unable to trust any photo or video we see without trusting who sends it to you, and even then doubting its veracity. Not being able to hear your spouse's voice on the phone without checking the post-quantum digital signature of their transmission for authenticity.
Society is heading to a less stable, less certain moment than any point in its history, and it is happening within our lifetime.
CGI for fantasy stuff is unavoidable, but when it's stuff that could have been done by actors but is instead AI, then to me it just feels cheap and nasty - fake.
These LLMs might make content that looks initially impressive but they are absolutely not performing physically based rendering or have any awareness of the lighting arrangement in these scenes. There are a lot of things they get right, but you only have to screw up one small element to throw the whole thing off.
I am willing to bet that Unreal Engine 5 will continue to produce more realistic human faces than OAI ever can with these types of models. You cannot beat the effects of actually running raytracing in a PBR pipeline.
If anything, it looks a lot worse than a lot of AI-generated videos I've seen in the past, despite being a tech demo with carefully curated shots. Veo 3 just blows this out of the water for example.
I'm over here thinking, "It felt like just yesterday I was laughing at trippy, incoherent videos of Will Smith eating spaghetti."
I love the progress we're making. I love the competition between big companies trying to make the most appealing product demos. I love not knowing what the tech world is going to look like in six months. I love not thinking, "Man. The Internet was a cool invention to have grown up in, but now all tech is mundane and extractive." Every time I see AI progress I'm filled with childlike wonder that I thought was gone for good.
I don't know if this represents SOTA for video generation. I don't care. In that moment I found it impressive and was commenting specifically on the joy I experienced watching the video. I find it frustrating to have that joy met with such negativity.
The period we're in is fleeting. I think it should be acknowledged and treasured for what it is rather than viewed with disdain because of what is inevitably to come. I stopped using Facebook and never moved to Insta/TikTok when things began to feel too extractive, but, for a good decade there, I felt so close to so many more people than I ever thought possible. It was a really nice experience that I no longer get to have. I'm not mad at social media. I'm happy I got to experience that window of time.
Right now I'm very happy to be using LLMs without feeling like I'm being preyed upon. I love that programming feels fresh and new to me after 15 years. I'm looking forward to having my ability to self-express magnified ten-fold by leveraging generative audio/visuals, and I look forward to future breakthroughs that occur once all these inventions become glorified ad-delivery mechanisms.
None of this seems bad to me. Innovation and technological progress is responsible for every creature comfort I have experienced in my entire life. People deserve to make livings off of those things even if they weren't solely responsible for the innovation.
Points, though, for the expressionless line delivery; it completely nailed that.
Will Smith eating spaghetti is the dumbest, most uncreative thing. You are impressed by it because it is a meme. It is stupid.
I have no idea why you're so intent on coming across bitter about a fledgling technology. A few years ago this demo video would've been indistinguishable from magic. It will continue to improve.
There's still something off about the movements, faces and eyes. Gollum features.
Brave new internet, where humans are no longer needed for any "social" media: AI will generate slop for bots, without any human interaction, in an endless cycle.
It seems like OpenAI is trying to turn Sora into a social network - TikTok but AI.
The webapp is heavily geared towards consumption, with a feed as the entry point, liking and commenting for posts, and user profiles having a prominent role.
The creation aspect seems about as important as on Instagram, TikTok etc - easily available, but not the primary focus.
Generated videos are very short, with minimal controls. The only selectable option is picking between landscape and portrait mode.
There is no mention of, or attempt to move towards, long-form videos, storylines, advanced editing/controls, etc., like others in this space (e.g. Google Flow).
Seems like they want to turn this into AITok.
Edit: regarding accurate physics ... check out these two videos below...
To be fair, Veo fails miserably with those prompts also.
https://sora.chatgpt.com/p/s_68dc32c7ddb081919e0f38d8e006163...
https://sora.chatgpt.com/p/s_68dc3339c26881918e45f61d9312e95...
Veo:
https://veo-balldrop.wasmer.app/ballroll.mp4
https://veo-balldrop.wasmer.app/balldrop.mp4
Couldn't help but mock them a little; here is a bit of fun... the prompt adherence is pretty good, at least.
NOTE: there are plenty of quite impressive videos being posted, and a lot of horrible ones also.
Social media was heading this way before AI.
OpenAI did not stealthily release Sora 2 to the image and video ELO ranking leaderboards ahead of time as is now somewhat tradition.
This model is probably designed to run fast and cheap as a social play. Emphasis on putting you and your friends into popular franchises and IPs.
OpenAI probably has a totally different model for their Hollywood-grade VFX. One that's too expensive to offer to $20/mo consumers.
- - - - -
EDIT:
Oh my god, OpenAI literally just disrupted TikTok:
https://x.com/GabrielPeterss4/status/1973071380842229781
https://x.com/GabrielPeterss4/status/1973122324984693113
https://x.com/GabrielPeterss4/status/1973121891926942103
https://x.com/GabrielPeterss4/status/1973120058907041902 (potentially dangerous ... )
https://x.com/GabrielPeterss4/status/1973111654524264763
https://x.com/GabrielPeterss4/status/1973090475486879818
https://x.com/GabrielPeterss4/status/1973110596825653720 (is this the same model? It doesn't look like it.)
https://x.com/GabrielPeterss4/status/1973096194508251321
https://x.com/GabrielPeterss4/status/1973086729281347650
https://x.com/GabrielPeterss4/status/1973088038851932522 (this is truly something only kids will love)
https://x.com/GabrielPeterss4/status/1973087595967201449
https://x.com/GabrielPeterss4/status/1973077105903620504
Holy shit!
This is 100% the future of what kids will do. This is incredible for short form vertical video.
It doesn't need to look good, it just needs to let you tell incredible stories with people and things you care about.
This is way better than Meta's social video app.
The younger generations, however, will likely gobble it right up. I try not to judge, because folks said the same thing about Nintendo when I was young.
I've long thought that AI will force new distribution methods because old media is so markedly against it... Maybe this is another Netflix vs Blockbuster moment.
I like appointment television too. Sometimes A24 isn't pretentious enough for me. But I'm not beyond saying that there's absolutely a time and place for saccharine.
This content will grab eyeballs. I'll bet money on that.
It doesn't really matter what you or I think anyway. OpenAI is delivering a stream of hits and will continue growing while we debate on the sidelines.
That's a direct copy of what Midjourney has done already.
https://www.midjourney.com/explore?tab=videos
Many people are just playing with images and the distinctive styles that Midjourney (the model) seems to have developed. It's also trained by ratings and people's interactions.
When you make images you can dial down the "aesthetic".
This is the "Suno" moment for video.
It's easy to make a really compelling composition. Something even Google Veo couldn't do.
It's not the best looking video model, but it has everything else -- rich editing, good voices and lipsync, music and lyrics, animation (cartoon, 3D, anime), SFX. It's wild.
The videos aren't single clips but rather complete beginning-middle-end stories that unfold over several cuts.
They have an onboarding flow where you rate images and it tunes into your aesthetic preferences. You can create mood boards for specific projects.
So I would say it's more community than social media.
Also, I suspect that this won't stay free very long. Silicon Valley loves the model of starting free to build a user base and then monetizing more later.
We are also getting better at producing cheap power. For example, thanks to intermittent sources like solar and wind, wholesale electricity in many places now becomes free at certain times of the day.
AI generation (including video) currently takes at least a second, and users expect that delay. So that means inference is not latency sensitive and you can put these data centres anywhere in the world, wherever power is cheapest. Model training cares even less about latency.
At the moment, the hardware itself is too expensive (and nvidia has long backlogs), so people run them even when power is expensive. But you can easily imagine an alternative future where power never becomes cheaper than today (on average), but we have lots of AI data centres lying in wait around the world and only kicking into full gear when and where power is essentially free.
Power needing to be given away, or people being paid to take it, is more a function of limited storage and limited ability to scale down than of generating unlimited power. The free power is an artifact of how the system is built (and of the type of power source) rather than a sign of success. The same area has to import power at higher cost when the sun or wind isn't as strong.
There's no need to scale down solar or wind power.
Yes, storage is another way to make money from power prices that differ over time.
> [...] the cost has increased per hour a lot over the last 50 years.
Some sources of power, like solar, have been dropping in price a lot recently.
Data centers seem to have wholesale rates of around 4 cents per kilowatt-hour on the higher end.
This gets you 2 cents per video (at the 0.5 kWh per video I assumed). If you're generating 50 million videos per day (an estimate on the higher side of how many TikTok videos are uploaded every day), that costs you a million dollars a day.
So if you entirely subsidized generating all of TikTok's video volume for free, I don't think energy generation costs exceed 365 million a year (and I suspect this severely overestimates costs, but there are some large error bars here).
I'm pretty sure OpenAI (or any company) would be pretty happy to pay 365 million dollars a year for the soft social power of something like TikTok. Just the influence this buys in politics and social discourse would be worth the pricetag alone.
And that's of course leaving aside any form of monetization whatsoever (where in reality you'd likely be charging the heaviest users the most).
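As a sanity check, the back-of-envelope arithmetic above can be written out in a few lines. This is a minimal sketch using the figures from the comments above (0.5 kWh per video, 4 cents/kWh wholesale power, 50 million videos/day), which are estimates, not measured data:

```python
# Rough energy-cost estimate for subsidizing TikTok-scale video generation.
# All inputs are the assumptions stated in the comments above.
kwh_per_video = 0.5          # assumed energy per generated video
price_per_kwh = 0.04         # wholesale rate in dollars (4 cents/kWh, high end)
videos_per_day = 50_000_000  # high-side estimate of daily TikTok uploads

cost_per_video = kwh_per_video * price_per_kwh  # about $0.02
cost_per_day = cost_per_video * videos_per_day  # about $1,000,000
cost_per_year = cost_per_day * 365              # about $365,000,000

print(f"${cost_per_video:.2f}/video, ${cost_per_day:,.0f}/day, ${cost_per_year:,.0f}/year")
```

Note how sensitive the total is to the per-video energy figure: if the 0.5 kWh assumption is an order of magnitude too high, as suggested downthread, the annual bill drops to tens of millions.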
N.B. I'm also not sure it's actually more power efficient for users to post their own content in absolute terms. It seems not unlikely that the amount of energy it takes to produce, edit, and process a TikTok video exceeds half a kilowatt-hour. But maybe you're focused solely on the video hoster.
That would be really remarkable, considering the total power capacity of a phone battery is in the neighborhood of 0.015 kWh
I hedged as "not unlikely" because I'd need to think harder about the amortization of more energy expensive videos vs less energy expensive ones and how much energy you can actually attribute to a video vs the video solely being an activity that would be an add-on to something that would happen anyways.
But it's not just the energy expenditure of a phone.
(I also think that 0.5 kilowatt-hours is an overestimate of energy expenditure by potentially up to two orders of magnitude depending on how much batching is done, but my original comment did say 0.5 kWh).
But this is kind of a worst case cost analysis. I fully expect that the average non-pro Sora 2 video has one to two orders of magnitude less GPU utilization than I listed here (because I think those video tokens are probably generated at a batch size of ~100 per batch).
Where do you get this from?
Timing-wise, I'm making an extremely liberal estimate. (In reality, Sora 2 takes about 2 minutes to generate a video in my testing).
The thing that matters here is ultimately the product of time and memory (the latter of which I use as a stand-in for the number of chips being used). So I'm more or less asserting that the models being used to generate the free Sora 2 videos have fewer than 6 trillion parameters (2 minutes being 1/5 of 10 minutes gives us a 5x multiplier we could apply to 1.3 TB), or fewer than 24 trillion with aggressive quantization. I can explain why I think the parameter count is less than 6 trillion if you'd like, but it's pretty hand-wavy intuition.
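For what it's worth, the time-memory product argument can be restated numerically. This is a sketch under the comment's own assumptions (a 10-minute / 1.3 TB baseline from the earlier estimate, 2-minute observed generation); the bytes-per-parameter figures are my own illustrative assumptions (roughly 1 byte per parameter, e.g. fp8, and about 2 bits with aggressive quantization):

```python
# Restating the time*memory budget from the comment above.
# The 10-minute / 1.3 TB baseline comes from the earlier estimate;
# the bytes-per-parameter figures below are illustrative assumptions.
baseline_minutes = 10
baseline_memory_tb = 1.3
observed_minutes = 2  # Sora 2 generation time in the commenter's testing

# Holding the time*memory product constant frees up a 5x memory multiplier.
multiplier = baseline_minutes / observed_minutes    # 5.0
memory_budget_tb = baseline_memory_tb * multiplier  # 6.5 TB

# At ~1 byte/parameter (e.g. fp8), 6.5 TB holds ~6.5T parameters,
# roughly the "less than 6 trillion" bound in the comment.
params_1byte = memory_budget_tb * 1e12 / 1.0   # ~6.5e12
# At ~2 bits/parameter (aggressive quantization), ~4x more fit,
# roughly matching the "less than 24 trillion" bound.
params_2bit = memory_budget_tb * 1e12 / 0.25   # ~2.6e13

print(f"{params_1byte/1e12:.1f}T at 1 byte/param, {params_2bit/1e12:.0f}T at ~2 bits/param")
```

Both bounds round down to the comment's figures, so the assertion is really about the memory budget, not about any particular quantization scheme.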
Well this is disappointing. I can't even watch your links.
One conversation I remember was complaining about people who constantly want AI pictures of anime feet.
I think OpenAI is just responding to the users.
What's the benefit of this? Curious if anyone has a solid viewpoint steelmanning any positives they can think of.
Is there? Creating AGI sounds like a great way to utterly upend every assumption that our economy and governments are built on. It would be incredibly destabilizing. That's not typically good for business. There's no telling who will profit once that genie is out of the bottle, or if profit will even continue to be a meaningful concept.
> By "defeat," I don't mean "subtly manipulate us" or "make us less informed" or something like that - I mean a literal "defeat" in the sense that we could all be killed, enslaved or forcibly contained.
Linked from https://openai.com/index/planning-for-agi-and-beyond/
Just reminds me of this: <https://en.wikiquote.org/wiki/The_Hitchhiker%27s_Guide_to_th...>
> If you genuinely believed you were 2-4 years away from AGI, is Sora slop really the thing you'd release?
People always like telling stories. Books, comic strips, movies, they're all just telling a story with a different amount of it left up to the viewer's imagination. Lowering the barrier to entry for this type of stuff is so cool.
I think you have to be pretty pessimistic to not just think it's really cool. You can find issues with it for sure, and maybe argue that those issues outweigh the benefit, but hard to say it's not going to be fun for some people.
I do find the actual generation of video very cool as a technical process. I would also say that I can find a lot of things cool or interesting that I think are also probably deleterious to society on the whole, and I worry about the possibility of slop feeds that are optimized to be as addictive as possible, and this seems like another step in that direction. Hopefully it won't be, but definitely something that worries me.
This response just never feels true to me. Many of the most successful web comics are crude drawings of just stick figures and text[1], with potentially a little color thrown in[2], and like half of the videos I see on TikTok are just a person talking into the forward-facing camera of their phone. The barrier to entry in the pre-AI world isn't actually that high if you have something interesting to say. So when I see this argument about lowering the barrier to entry, I can't stop myself from thinking that maybe the problem is that these people have nothing interesting to say, but no one can admit that to themselves, so they must blame it on the production values of their content, which surely will be improved by AI.
[1] - https://xkcd.com/
[2] - https://explosm.net/
I think people have a mistaken view of what makes some form of storytelling interesting. Perhaps this is my own bias, but something could be incredibly technically proficient or realistic and I could find it utterly uninteresting. This is because the interesting part is in what is unique about the perspective of the people creating it and ideas they want to express, in relation to their own viewpoint and background.
Like you pointed out, many famous and widely enjoyed pieces of media are extremely simple in their portrayal.
I completely agree. And now that you mention this, I realize I didn't even point to the most obvious and famous examples of this sort of thing with artists like Picasso and Van Gogh.
If someone criticizes Picasso's or Van Gogh's lack of realism, they are completely missing the point of their work. They easily could have and occasionally did go for a more photorealistic look, but that isn't what made them important artists. What set them apart was the ways they eschewed photorealism in order to communicate something deeper.
Similarly, creating art in their individual styles isn't interesting because it shifts the primary goal from communication to emulation. That is all AI art really is, attempts at imitation, and imitation without iteration just isn't interesting from an artistic or storytelling perspective.
Social media is the new CB radio.
But now with an AI-powered addiction factor so you can never put it down, no matter how bad it is.
Blipverts are next.
It's not just different amounts, but different kinds. A (good) comic strip isn't just the full text of a book plus some pictures.
What exactly is this enabling, other than the mass generation of low quality, throwaway crap that exists solely to fatten up Altman's wallet some more?
One that comes to mind is a sort of podcast-style of two cats having a conversation, and in each "episode" there's some punchline where they end up laughing about some cat stereotype. Definitely low quality garbage, but I guess what I mean by "barrier of entry" (sorry for the buzzword), is just that this is going to enable a new generation of content, memes, whatever you want to call it.
If this works it is a more powerful algorithm shaping mechanism than TikTok’s revealed preference feed. Even if Sora doesn’t take off, it could force TikTok to integrate something similar.
Also, remember that it’s not about benefitting society as a whole, it’s about benefitting the investors. If the investors get rich at the cost of society, it’s a win for OpenAI.
I mainly am curious if anyone has the view that there is broader benefit to the development of this, after all, wasn't that the entire mission statement of OpenAI?
Quoting from their announcement on their site:
> OpenAI is a non-profit artificial intelligence research company. Our goal is to advance digital intelligence in the way that is most likely to benefit humanity as a whole, unconstrained by a need to generate financial return.
This feels like something constrained by the need to generate a financial return, and not something primarily focused on understanding physics and world models, to be blunt.
Or 'everything has to fit into 140 characters' (= Twitter). Or 'replies are designed to be maximally rage bait-y' (= Tumblr).
Personally I think the problem with TikTok is largely based in hyperoptimized content specialized to your interest shaping your worldview and isolating your perspective of the world from others, as well as probably being pretty bad for the ability to maintain attention and engage with long form narratives and ideas. I don't really think TikTok is unique here, other than that it's the best in the game at doing it and keeping people's attention.
But overall I suppose I just see something like this as potentially worse in those regards, but maybe I'm overly pessimistic.
Nonetheless, a platform for AI videos with an audience looking for them, rather than the horrible "boomer-slop" that is prevalent on other social media, is welcome in my eyes.
They'll just cross-post. That's been going on since back when Facebook was The Face Book.
(personal rant) I've been in a mild existential crisis since I read Amusing Ourselves to Death. Can one form of entertainment really be more well-regarded than another? Is fine art fundamentally different from pop art? Are there 'finer' pop cultures among all pop cultures? I do still think reading A Song of Ice and Fire is more meaningful than scrolling TikTok. The crisis part is that I can't justify this belief with words.
1: Reading a long book demands focus over a longer timespan than scrolling TikTok, and by focusing on a single thing for a long time we get a sense of accomplishment. I don't know how to justify this as valuable, but for some reason I feel that it is.
2: A Song of Ice and Fire (and GoT) were consumed by a huge proportion of people, and you now have this in common with them. This act of consuming entertainment also grants you a way to connect with other humans - you have so much to talk about. Contrast that with an algorithmic feed, which is unique just for you - no one else sees your exact feed. Of course, there are tons of people that see some of the same snippets of content, if their interests overlap with yours, but it's not nearly as universal as having read the same series of books (and there's much less to talk about when you've seen the same 17-second short form video than when you've both invested dozens of hours in reading the same series of books).
I don’t think these thoughts fully justify your belief, but hopefully they provide some support to it.
https://x.com/theo/status/1973167911419412985 (Music video with Sam Altman as Skibidi Toilet)
This is pretty fun.
These keep getting wilder and wilder:
https://x.com/MatthewBerman/status/1973115097339011225 (Kinda gross)
https://x.com/cloud11665/status/1973115723309515092 (Japanese)
It can do cartoons:
https://x.com/venturetwins/status/1973158674899280077 (Rick and Morty)
https://x.com/TheJasonRink/status/1973163915476611314 (Family Guy)
https://x.com/cfryant/status/1973162037305024650 (Family Guy Horror)
Incredibly convincing anime:
https://x.com/fofrAI/status/1973164820863262748
Minecraft meets GTA:
https://x.com/Angaisb_/status/1973160337752121435
Super Mario in the real world:
https://x.com/skirano/status/1973184329619743217
Super solid looking movie trailer:
https://x.com/jasonjoyride/status/1973142061114335447
Damn:
I really want to call this the "Suno moment" for AI video.
Prior to Sora 2, you had to prompt a lot of clips which you then edited together. You had to create a starting frame, maybe do some editing. Roll the dice a lot.
Veo 3 gave us the first glimpse of a complex ensemble clip with multiple actors talking in a typically social media or standup comedy fashion. But it was still just an ingredient for some larger composition, and it was missing a lot of the soul that a story with a beginning-middle-end structure has.
Sora 2 has some internal storytelling mechanic. I'm not sure what they did, but it understands narrative structure and puts videos into an arc. You see the characters change over the course of the video. They're not just animated Harry Potter portraits. They're alive. And they do things that change the world they're in.
Furthermore, Sora 2 has really good "taste" and "aesthetic", if that makes sense. It has good understanding of shot types, good compositions, good editing, good audio. It does music. It brings together so much complexity in choice and arranges them into a very good final output.
I'm actually quite blown away by this.
Just like Suno made AI music simple and easy - it handled lyrics, chorus, beat, medley, etc. - this model handles all of the ingredients of a 10 second video. It's stunning.
Sora 2 isn't the highest quality video model. It doesn't have the best animation. But it's the best content machine I've ever seen.
Now, these things aren't new; fake videos and images go back decades, if not a century. But they took some effort to make, whereas this technology makes producing one take less effort than it took me to write this comment.
Of course, it's always my choice; if I stop visiting Reddit and touch grass instead it really won't affect me directly.
Everything has a relevancy and penetration decay curve.
The funny thing is, I think this law applied in the classical era (1950s, 1990s, etc.); we just weren't creating at the scale needed to notice it.
Maybe it's just one dominant variable: novelty. I'd be curious to see how we might model this.
> https://x.com/jasonjoyride/status/1973142061114335447
This isn't AI generated. They're a production company and they made a short film: https://www.youtube.com/watch?v=JGLoTjxd-Ss
>> How do you get HD renders? im getting like super low res shit
>It's because this isn't AI
I haven't watched the film, but the premise is something about an orbiting space station. I could easily imagine scenes featuring rapid day/night cycles like astronauts experience on the ISS.
Remember when you were taught to extract the "moral of the story" in school? That was the whole point. That form of communication is what makes art valuable and it definitely is what makes some art more valuable than others.
I prefer classics myself, but this is exactly why booktok works (and why Fourth Wing blew up the way it did).
My personal process of grappling with this led to a focus on agency and intentionality when defining the difference.
Scrolling TikTok, much as scrolling Twitter or Facebook or Instagram or YouTube's recommendations would be, is an entirely passive activity. You sit back and you allow the Content to be fed to you.
Reading a book requires at least a bare minimum of selecting a book to read, choosing to finish that book, and intentionally choosing at any given time to spend your time reading that particular book. Similar things can be said for selecting movies. The important part in my mind is that you chose it, rather than letting someone or something else pick what they think you'll like.
The process of picking things yourself allows you to develop taste and understand what you like and dislike, mentally offloading that to someone or something else removes the opportunity to develop that capability.
I think there's arguments to be made against this view: how can you decide what to read or watch without getting recommendations or opinions? If you only engage with popular media isn't it just a slower process of the same issue?
But I do believe there is a fundamental difference between passivity and active evaluation of engagement as mental processes, and it's the exact reason why it is harder to do than scrolling is.
Compare https://de.wikipedia.org/wiki/Lesesucht ("Lesesucht", German for "reading addiction"; use Google Translate).
The gist of your linked article is that they were opposed to reading because they believed it distracted people from labor and was undisciplined and immoral. Of course, there also seems to be a healthy dose of misogyny associated with it:
> Poeckel's statement that women should acquire a certain amount of knowledge, but not too much, because then they could become a "burden on human society," is representative of many other texts in which reading regulations played a central role.
Then once you get to the progression of books > comics > movies > YouTube > TikTok (did I miss any?), you can observe a steady decrease in the amount of cognitive effort required to engage with the medium, and a reduction in attention spans. Reduced attention span is a legitimate concern, and it's only getting worse as time goes on (ask teachers).
I actually enjoy TikTok in moderation these days, but I worry about people who struggle to engage with anything but TikTok; it's like a generational ratchet that only seems to go one way, towards shorter and shorter attention spans.
Maybe someone can make the argument that this won't actually matter, but it's incorrect to say that things haven't changed in observable and measurable ways, and that people are just complaining about nothing.
My S.O. probably spends 3 hours a day on TikTok/Reels and I seriously doubt they could remember even 10% of what they saw in that time. It's like a part of their brain turns off while scrolling.
Most short form content would probably score low. It’s short, for one, and it tends to be repetitive and lack anything like plot complexity or nuance.
Of course it’s not like trite pop is new. Way back in the dime store novel days it was called pulp. TikTok is just one of the latest iterations. People have always consumed dumb filler.
On the other hand, reading a book is like getting on a boat. You've made certain preparations for acquiring the vessel and set course through unknown territory. A journey away from the shore and away from what's immediately at hand, which can also turn out to be a journey towards self-discovery.
It depends on what you want to get out of art.
Do you want human connection and shared cultural context so you can talk to real friends about things? Do you want virtual friends and connections? Do you want ideas to inspire you to create your own things, or change how you think?
Do you just want to distract yourself from how hungry you are, how much inequality is in the world, and how depressed you are, letting death draw closer?
All of those are valid things, and different art is more meaningful for different goals.
Scrolling TikTok fits into the last one: it's burning time to avoid thinking about things, moving you closer to death. A Song of Ice and Fire builds a large, coherent world, has bits of morality and human relations, and all of those can spark ideas and be related to your own human suffering, so it does feel more valid to me as a way to reflect and change how you think.
So there isn't necessarily some huge crisis that you need to justify: in some ways reality just is (and this includes subjective reality).
Say you ask why the laws of physics conserve energy locally. You can actually argue that if it were otherwise, life would be much more unlikely, since non-conservation tends to increase instability in various systems (both energy divergence and energy going to 0 make life unlikely). But I'm almost certain you could still conceive of forms of life in non-energy-conserving systems (something like Conway's Game of Life, maybe with more advanced rules if you prefer). So while it makes sense that the physics of our universe is approximately locally conservative (maybe not exactly in GR?), in totality it's just a brute fact, an experimental observation. Our theories help us devise better experiments to test energy conservation and, in a way, map out the landscape of consistent physical laws. But they don't tell you which realization of the consistent or admissible laws you'll find yourself in.
Another way to phrase it: what you feel is in a way real. So if you feel in some fundamental way better reading A than B, then that simply reflects a property of reality and needs no further explanation. The only problem is that in some cases our judgement can be distorted - by substances, by overwhelming blinding desires (which fail to reflect fundamental experiences), by the limitations of our memory, etc. But if we assume that isn't the case (i.e. there's no pathological reason for your preference), then your feeling is valid irrespective of a wordy justification. I think some things really are subjective, but I also believe that in a fundamental and very complex way, subjectivity is actually as objective as anything else. The idea that one experience is actually (with some important caveats and necessary context) better than another, in what might be called an essentially objective sense, is one of the most counterintuitive things we will come to accept about the human mind. We tend to mistake complexity (it's very complex to compare experiences) for impossibility (it's impossible to judge experiences objectively).
I believe in principle there might be an equivalent of the laws of physics (say, Newtonian mechanics) for the human mind, but I suspect we're still very far from it, because it might require analyzing the network of n = 100 trillion synapses in our brain. We might get there one day, but that would probably require computational effort at least several times n, or even on the order of n² or some other poly(n), plus poly(n) memory. If the major objectives of physical law are to make predictions, explain behavior, and aid in engineering and designing structures, then one of the main objectives of the laws of the mind would be to predict whether an experience or mental state is good or not, to explain why it is so, and then perhaps to improve the design of things a little so that we have better experiences - that is, a better life. I guess this is already what psychology, various spiritual traditions, philosophy, and the arts try to achieve (and in many cases they already get pretty close, maybe increasingly closer, to the still-inaccessible, extremely complicated reality of the human mind and brain).
Regardless, we often have to do our best with what we have today, which is our best-effort subjective judgement, aided by language and the various human disciplines :)
[1] https://old.reddit.com/r/slatestarcodex/comments/1n6j1jg/pur...
"I live my life in widening circles
that reach out across the world.
I may not complete this last one
but I give myself to it.
I circle around God, around the primordial tower.
I’ve been circling for thousands of years
and I still don’t know: am I a falcon,
a storm, or a great song?"
-- Rainer Maria Rilke
Say that somebody writes to make certain ideas more visible. For example, somebody wants people to buy the idea that amusing ourselves to death is what we do (the book you mentioned). Somebody else perhaps has found that we are chronically depressed and cynical, when instead we should be thinking that a dead death itself is a fine trophy to hang on the wall during the march of progress[^1].
You can a) decide that you are set in your ways, in which case entertainment should be pure and removed enough from reality that it doesn't mess with your deeply held beliefs, and not read any of those books; or b) run the risk and read the thing with an open mind.
A lot of people are in the a) camp. Those who are in the b) camp would still like to be entertained a little.
[^1] Yours truly. I do that in fiction. https://www.ouzu.im/
> "[...] in its place, he identifies a different kind of crisis. Not the crisis of attention, but the crisis of interest."
Our attention, in fact, has never been as fully absorbed as it is today. In place of books and architecture (as in the film), our attention has shifted towards more rapid forms. Yet in terms of hours spent, our 'attention' towards them has massively increased.
Is the crisis we're feeling, then, one of purported inattention, or a general loss of interest and satisfaction in our surroundings? What has spurred this crisis? Gabriel and Casey's conversation ends:
> "What about everyday life? Are we losing interest in everyday life?"
The film offers a hopeful answer.
This is a no-brainer question. For an extreme example: CSAM is a form of entertainment for some people.
Or perhaps Aristotle’s Poetics - pop culture has value because it is mimetic, and AI generated pop culture is no less a mirror, just one which produces reflections of every moment, all the time - but rather than the grand catharsis we might experience in a work of literature with well wrought characters with whom we empathise, we find the void staring into us as we do into it. Hollow art for hollow men.
Like it or not, the void is culture, and has value because it reflects us, albeit through a glass, darkly.
I think that it reduces down to "reward without effort is bad for you" - in so many different contexts in life, especially entertainment.
Here's one attempt; it's art versus content. Tiktok is content; it's people recording a video, sometimes in one take and publishing, sometimes in multiple takes with some editing etc, sometimes fully professional ones. But overall, it's cheap, rapidly produced content for cheap, rapid consumption. ASoIaF was a labor of years to produce not just a series of books, but a world, a rich history, and later on a multi-media enterprise that involved and employed millions of people, then entertained and excited hundreds of millions of people over the years.
AI is lowering the barrier to entry even more, with anyone able to just punch in some words - less even than this comment - and produce something. For someone to consume. Maybe one in a billion will be remembered or still popular in a decade (like how some of these cheap videos are still popular / remembered / quoted, think vines / memes). But the ratio just keeps getting worse.
ASoIaF to a TikTok video is like... ASoIaF to a tweet.
What is wrong is instead the routine consumption of art created by others in a stupor to rest from the drudgery of daily work.
Create art, don’t waste your life consuming.
But TV and Social Media have their incentives twisted. It's just about ads. They don't really care what you are seeing as long as you are seeing as many ads as possible. The joke about TV was that a valid description of it was advertisements with a little bit of entertainment sprinkled in throughout.
I'm not saying that people haven't been able to use these platforms to build anything meaningful, but that the incentives and the purpose of these platforms are not to entertain, but to keep you glued to the feed for as long as possible to see as many ads as possible (which is why I think "rage bait" is so common).
I am not sure how important fiction novels are (compared to reading non-fiction books or biographies that tell facts about the real world), but I would say they broaden the horizon of the reader. And there is a selection effect in that “literature” was written by pretty smart people.
Scrolling TikTok is often described as mindless, with people unable to recall afterwards what videos they watched. In general, short-form content (TikTok, Instagram, X/tweets) seems to be much more superficial than long-form content (e.g. this HN discussion board).
I did this, and found two things I do for fun, both consuming significant blocks of time. The one that felt useless left no real impact. I want to do more of it, but after spending hours on it, I'm no different than I was before (other than perhaps a bit more skilled at that form of entertainment).
The other form, which was the same thing from an outside perspective (for example, my parents would see them as the same), left me different. It led to me building new goals, reevaluating things happening around me, and spending more time thinking about where I'll be in 10/20 years. It led to me walking an hour a day and starting to jog to build my endurance, despite that form of entertainment being unrelated to physical activity. I don't think this is innately a property of one entertainment form over another, but more about my personal relationship to entertainment.
Using this, how does 'poorly regarded' entertainment impact those engaging in it, compared to 'well regarded' entertainment? Are their lives better for it?
Reading thousand page novels requires actively engaging with the material as you grow your vocabulary, and explore new ideas.
Scrolling TikTok on the other hand is a passive process. Could you recall even a quarter of all the videos you see on your TikTok feed in a single day? I would doubt it.
ASoIaF is a trash derivative "we have Lord of the Rings at home" wannabe, completely devoid of joy, and feels like it was written by an angsty, edgy teenager who hates the world, has learned about medieval history for the first time, and wanted to add zombies and dragons, the most original fantasy tropes.
I would honestly and unsarcastically take a day of scrolling through TikTok over sitting through 1 chapter of ASoIaF.
And apparently lately the author feels so too.
The question is whether people will eventually get bored with this stuff or if it actually will mesmerize people for huge fractions of their waking lives.
If the latter, I suspect we will outlaw it eventually. It’d be like legalizing hard opiates, literally, but minus the ODs and health damage.
Tiktok makes a lot of money, doesn't it? It definitely draws a lot of eyeballs.
Seems pretty clear what the benefit (to the company) is?
Instagram does not welcome this and I don't think they should. It is its own lane. And if it's just a place to sweep AI slop into, that's a good thing.
User engagement. That translates into money.
Now I can see it can make for a fun party game, but that they seriously go after it, when their game should be leading models to do serious work ... is not a great sign to me.
Not only does it have the slot-machine-like addiction factor, it's going to make lots of money and it will take off very quickly.
All OpenAI has to do is to make the video generation much much faster.
Why should a commercial enterprise that has had billions of investments have benefits outside of earning money? Besides the entertainment value that the masses get from making and viewing these, of course.
I'm just curious if such a thing is possible.
Revealed preferences. Keep giving the people exactly what they want (not what they claim to want), in unlimited quantities, until the message is received or we're all dead.
Please note that I’m not necessarily commenting on whether the existence of AI generated video is good or bad for our society, because I think it’s pretty moot what we think about it. It’s not going to just go away even if the majority of people here at HN or in general feel that it’s problematic.
The thread replies show what deadbeats Hacker News users are.
The world's knowledge base could be turned into video.
Like Khan Academy, but rather than old, out-of-date videos, instant upgrades from rerunning or correcting a prompt.
In practice it's not good enough to do this, but no more so than other applications. It could be an interesting first iteration. /end steelmanned.
Do they care at all what a societally terrible thing this is? They found an addictive way of retaining traffic, and they will hold on to it.
The other is possibly there’s no point in a thousand users all turning up to a blank prompt box and using a lot of resources to generate the same thing, or things they are not impressed by. A lot of users will ‘get what they came for’ initially just by seeing a bunch of good examples. Discussions around them will help them produce better outputs faster. Etc
Although we can tell they are inaccurate, what percentage of people can visualize the prompts better in their mind's eye? I bet a substantial number can't even tell the clips are generated if posted without context.
In a few aspects, these world models are already pretty close to what we have in our brains.
Doesn't mean OpenAI can't do other stuff with it as well.
No need to guess; in the article they state the purpose:
We first started playing with this “upload yourself” feature several months ago on the Sora team, and we all had a blast with it. It kind of felt like a natural evolution of communication—from text messages to emojis to voice notes to this.
So today, we’re launching a new social iOS app just called “Sora,” powered by Sora 2. Inside the app, you can create, remix each other’s generations, discover new videos in a customizable Sora feed, and bring yourself or your friends in via cameos. With cameos, you can drop yourself straight into any Sora scene with remarkable fidelity after a short one-time video-and-audio recording in the app to verify your identity and capture your likeness.
Makes sense. I hate it, but the timing is probably good for them to try. There's going to be a mass exodus from TikTok in the US at some point, and those people will land somewhere.
That's my first impression too after seeing the screenshots of the sora app.
> I don't have the privilege to think everything ain't political
Basically proper working persistence of the scene.
For example, I'm currently working with a walking and talking character, using multiple AI video models and systems. Generated clips longer than 8 seconds risk rapid quality loss, though sometimes you can get up to 12-19 seconds without the generation breaking down. That means one needs to simulate a multiple-camera shoot on a stage, so you can cut around the character(s) and create a longer sequence.

But now you need multiple views of the same location to place your character(s) into, and current AI models can't reliably give you different angled views of an environment. We only just got consistent different views of characters, and it'll be another while until environments can generally be examined from any view. So far, people are so fascinated by the fantasy violence and sexual content they can make that nobody realizes you cannot simply "look left and right" in any of these models with any consistency or reliability.

There are workarounds, like creating one's entire set and environments as 3D models, for use as backgrounds and starting frames, but that's now 3D media production plus AI, and none of the AI tools generate media that even has alpha channels, among other similar incompatibilities.
I think OpenAI is actually doing a great job at easing people into these new technologies. It's not such a huge leap in capabilities that it's shocking, and it helps people acclimate for what's coming. This version is still limited but you can tell that in another generation or two it's going to break through some major capabilities threshold.
To give a comparison: in the LLM model space, the big capabilities threshold event for me came with the release of Gemini 2.5 Pro. The models before that were good in various ways, but that was the first model that felt truly magical.
From a creative perspective, it would be ideal if you could first generate a fixed set of assets, locations, and objects, which are then combined and used to bring multiple scenes to life while providing stronger continuity guarantees.
This is a truly _wild_ way to describe "this version isn't much better than the previous one". Would you say "Apple's latest iPhone is a pretty small marginal improvement over the previous one, but it's useful to help people acclimate for what's coming"?
How much are they (and providers of similar tools) going to be able to keep anyone from putting anyone else in a video, shown doing and saying whatever the tool user wants?
Will some only protect politicians and celebrities? Will the less-famous/less-powerful of us be harassed, defamed, exploited, scammed, etc.?
"Consent-based likeness. Our goal is to place you in control of your likeness end-to-end with Sora. We have guardrails intended to ensure that your audio and image likeness are used with your consent, via cameos. Only you decide who can use your cameo, and you can revoke access at any time. We also take measures to block depictions of public figures (except those using the cameos feature, of course). Videos that include your cameo—including drafts created by other users—are always visible to you. This lets you easily review and delete (and, if needed, report) any videos featuring your cameo. We also apply extra safety guardrails to any video with a cameo, and you can even set preferences for how your cameo behaves—for example, requesting that it always wears a fedora."
How do we prepare for this? Societal adjustment only (e.g., disbelieving defamatory video, accepting what pervs will do)? Establishing a common base of cultural expectations for conduct? Increasing deterrence for abusers?
Until you have 2 people that are near identical. They don’t even have to be twins, there are plenty of examples where people can’t even tell other people apart. How is an AI going to do it?
You don’t own your likeness. It’s not intellectual property. It’s a constantly changing representation of a biological being. It can’t even be absolutely defined; it’s always subject to the way in which it was captured. Does a person own their likeness for all time? Or only their current likeness? What about more abstract representations of their likeness?
The can of worms OpenAI is opening by going down this path is wild. We're not currently able to solve such a complex issue. We can't even distinguish robots from humans on the internet.
If Deepfakes remain the tools of nation state actors, laypeople will be easily fooled.
If Deepfakes are available on your iPhone and within TikTok, everyone will just ask "Is it Photoshop?" for every shred of doubt. (In fact, I already see people saying, "This looks like AI".)
This is good. Normalize the magic until it isn't magic anymore.
People will get it. They're smart. They just need exposure.
I really doubt this.
If you are in the creative field, your work will just be reduced to "is this slop?" or "fixed it!" with a low-effort AI-generated rework of your original (fuck copyright, right?).
I already see artists battling and fighting, putting out their best non-AI work, only for their audience to question whether it is real, and the work loses its impressiveness.
This already undermines creators who don't use AI-generated material.
But who cares about them right? "it is the future" and it is most definitely AGI for them.
But then again, the starving artist never really made any money and this ensures that the artform stays dead.
It's either this, or the opposite (eg, misinformation needs to be censored). Seems like we as a society can't quite make up our mind on which approach to take.
Let me guess, the ultimate market will be teenagers "creating" a Skibidi Toilet and cheap TikTok propaganda videos which promote Gazan ocean front properties.
One use case I'm really excited about is simply making animated sprites and rotational transformations of artwork using these videogen models, but unlike with local open models, they never seem to expose things like depth estimation output heads, aspect ratio alteration, or other things that would actually make these useful tools beyond shortform content generation.
> We are giving users the tools and optionality to be in control of what they see on the feed. Using OpenAI's existing large language models, we have developed a new class of recommender algorithms that can be instructed through natural language. We also have built-in mechanisms to periodically poll users on their wellbeing and proactively give them the option to adjust their feed.
So, nothing? I can see this being generated and then reposted to TikTok, Meta, etc for likes and engagement.
Procedural generation is a known quantity in gaming, with well-explored pros and cons.
I'm also hopeful we sort out the problems with big tech eventually. I was initially against it, but I'm starting to think Australia's plan to ban under 16s from social media is actually a very good idea.
It's technically impressive, but all so very soulless.
When everything fake feels real, will everything real feel fake?
Sam looks weirdly like Cillian Murphy in Oppenheimer in some shots. I wonder whether there was dataset bleedover from that.
What am I looking at that's super technically impressive here? The clips look nice, but from one cut to the next there's a lot of obvious differences (usually in the background, sometimes in the foreground).
How many hours a week are you actively using AI tools yourself?
What percentage of public comments that you’ve made about AI tools have been skeptical or critical?
2 or 3. Mostly LLMs to check code.
> What percentage of public comments that you’ve made about AI tools have been skeptical or critical?
Probably around 90%.
So sell me. Why is this super impressive? I'm happy to admit that I'm pretty pessimistic about AI.
I have an eye for continuity issues; they are pretty obvious to me. Am I just too focused on that sort of thing?
If you're already a heavy user of AI tools, you've seen or used previous generations already. So it's just a gradual improvement, nothing to get excited about.
Just like smartphones have been incredibly boring in the last 10 years because the only change has been "slightly more performance" or "marginally thinner".
Their ultimate goal is physical AGI, although it wouldn’t hurt them if the social network takes off as well.
Tangentially related: it's wild to me that people heading such consequential projects have so little life experience. It's all exuberance and shiny things, zero consideration of the impacts and consequences. First Meta with "Vibes", now this.
1: https://www.gurufocus.com/news/3124829/openai-plans-to-launc...
“OpenAI’s New Sora Video Generator to Require Copyright Holders to Opt Out”
https://www.wsj.com/tech/ai/openais-new-sora-video-generator...
And Reuters covered their coverage minus the paywall:
https://www.reuters.com/technology/openais-new-sora-video-ge...
What do you mean by life experience here and how can you tell they have little of it?
There would for sure be large swathes of people who would just lie about what they're doing and use AI to make it seem like they're skateboarding, or skiing or whatever at a pro or semi-pro level and have a lot of people watch it.
My boss sends me complete AI Workslop made with these tools and he goes "Look how wild this is! This is the future" or sends me a youtube video with less than a thousand views of a guy who created UGC with Telegram and point and click tools.
I don't think he ever takes a beat, looks at the end product, and asks himself, "Who is this for? Who even wants this?" And that's aside from the fact that I still think there are so many obvious tells with this content that make you know right away it is AI.
Don't think it's going to end here at some slop feed.
The final target of these "world models" on a 20 year horizon is entirely unmanned factories taking over the economy, and swarm of drones and robots fighting wars and policing citizens.
This is why hundreds of billions are poured into these things, cute Ghibli style videos and vacuum robots wouldn't be worth this much money otherwise.
There are arguably more jobs today as a result of computers than there were before they were invented. So why is the assumption that AI will magically delete all jobs while discounting the fact that it will create careers we haven’t even thought of?
For now AI is deleting many of the jobs the computer created.
The reality is we will more likely end up in a society where wealth/power at the very top will grow and the masses will be controlled by AI.
How is this not entirely obvious to everyone that this is the future? Could be 20, 50, 100 years, but coming for sure.
I think that in a vacuum you could reasonably believe this might be the case, but it isn't just about the technology these days; it's about the hunger C-suites and tech companies have for replacing the workforce with AI and/or automation. It's quite clear that layoffs and mass adoption of AI/automation raise shareholder value, so there is no incentive to create new jobs.
Will there be an organic shift away from tech/IT/computers into new fields? There might, but I think it's a bit naive to believe it will be proportionate to the careers AI will make redundant when there is such a big focus on eliminating as many jobs as possible in favor of AI.
Haha. The current wave of “careers we couldn’t think of” that tech companies have created include being Uber/Doordash/Amazon delivery drivers, data labelers for training AIs, moderator to prevent horrific content spreading on social networks,… with way weaker social benefits & protections than the blue collar jobs of old they replaced.
So yeah, I have a hard time buying this fantasy of everyone doing some magical fulfilling work while AI does all the ugly work, especially when every executive out there is plainly stating that their ideal outcome is replacing 90% of their workforce with AI.
With the way things are headed, AI will take over large economic niches, and humans will fill in at the edges doing the grimy things AI can’t do, with ever diminishing social mobility and safety nets while AI company executives become trillionaires.
But what's coming is: Vision-language-action models and planning, spatial AI (SLAM with semantics and 3D reconstruction with interactability and affordance detection). Video diffusion models, photo-to-gaussian-splats, video-to-3D (e.g. from Hunyuan), the whole DUSt3R/VGGT line of works, V-JEPA 2 etc. Or if you want product names, Gemini Robotics 1.5, Genie 3, etc. The field is progressing incredibly fast. Humanoid robots are progressing fast. Robotic hands with haptic sensors are more dexterous than ever. It's starting to work. We are only seeing the first glimpses of course.
(Unless it's sci-fi and porn that is mainly pushing for human shaped robots.)
Facebook has become the cringe how-do-you-do-fellow-kids uncle that Microsoft has been since the 1990s.
Try Jeff Goldblum in The Fly! I just re-watched and the computer he uses is scarily close to our experiences now with AI. In fact, the entire "accident" (I won't spoil it) is a result of the "AI" deciding what to do and getting it wildly wrong.
> How the FUCK does Sora 2 have such a perfect memory of this Cyberpunk side mission that it knows the map location, biome/terrain, vehicle design, voices, and even the name of the gang you're fighting for, all without being prompted for any of those specifics??
> Sora basically got two details wrong, which is that the Basilisk tank doesn't have wheels (it hovers) and Panam is inside the tank rather than on the turret. I suppose there's a fair amount of video tutorials for this mission scattered around the internet, but still––it's a SIDE mission!
Everyone already assumed that Sora was trained on YouTube, but "generate gameplay of Cyberpunk 2077 with the Basilisk Tank and Panam" would have generated incoherent slop in most other image/video models, not verbatim gameplay footage that is consistent.
For reference, this is what you get when you give the same prompt to Veo 3 Fast (trained by the company that owns YouTube): https://x.com/minimaxir/status/1973192357559542169
Doesn't this already answer your question...? "Let's Play" type videos and streams have been a thing for years now, even for more obscure games. It very well could've been trained on Cyberpunk videos of that mission.
I still maintain that’s the kernel it’s getting it from. It’s impressive, I’m just not really shocked by it as a concept.
If I start a new chat it works.
I'm a Plus subscriber and didn't hit rate limits.
This video gen tool will probably be even more useless.
It'll all be rather funny in retrospect.
But if we find it drifts further and further from the truth in cases of biases in news articles, image generation and others we will find ourselves bombarded with historical deviances where everyone can be nudged to anything.
All in the name of safety.
Until these ai capabilities are as neutral and un-discriminatory as electricity, centralized production means centralized control and policies. Imagine if you are not allowed to use your electricity to power some appliances, because the owner of the power-plant feels it's not conducive to their agenda.
https://medium.com/@joe.richardson.iii/the-curious-case-of-t... https://medium.com/@joe.richardson.iii/openai-slaps-a-band-a...
- What's the problem?
- I think you know what the problem is just as well as I do.
This is classic OpenAI heavy-handed censorship/filtering. Don't expect it to get any better; if anything, it'll get worse thanks to the "think of the children" types.
If you want an uncensored model that doesn't patronize you, then your only recourse is local models, which, fortunately, are pretty good nowadays and are only getting better, thanks to our Chinese friends constantly releasing a stream of freely-licensed models for everyone to use. The "freedom loving" Western labs, by contrast, don't release squat and make even Xi Jinping blush with how strongly they censor whatever they let us lowly plebs access through a paywalled API.
Also I find it neat that they still include an iOSMath bundle (in chatGPT too), makes me wonder how good their models really are at math.
Yes.
> counterbalancing that with the (potential) savings
No. It's all about personalization. Even with all the money in the world you couldn't sit a filming crew, VFX specialist, foley artist, and voice actors next to every user of your app, ready to produce new content in 60 seconds.
I don't get why this keeps being framed as a labor thing, it's unlocking genuinely new forms of interactive media.
> I don't get why this keeps being framed as a labor thing
It's inextricably linked with labour. That doesn't mean that labour is only factor but it's an important one nonetheless.
And no, labor is not a factor in the way you tried to frame it.
There is absolutely no one tying up $250,000 in GPUs to let users spit out a funny clip of Sam Altman jumping over a chair because they think that's a smart way to get out of paying artists.
Because it directly impacts people's ability to earn a living. If you truly don't understand this, I think you should spend some time talking to people who are impacted by it. Artists, and so on. Seriously, this is a head-in-the-sand take.
I build gen AI for entertainment: I don't build to replace anyone, and if my product gets eyeballs existing creators can't, it's because it gives the consumer something they wanted to see in the world.
Past that you're just complaining that consumers don't want what you made.
However having said that, the intention/aim need not be to deliberately replace creatives. That is not the claim I am making or that anyone in this thread or general public discourse is claiming. The minimal claim is simply that the commodification of art decreases the value and employability of people who perform the same task as the AI. It is also not limited to artists. It is all-encompassing. If an employer can now use AI instead of a copywriter, they will often do that -- big and small business alike. The same can be said for many niche fields which previously required specialized education or training.
I am not saying this from an Anti-AI perspective. I own an AI startup.
But use your big boy brain. It is clear that the commodification of intelligence has a downward market pressure on the market value (which is the PERCEIVED value of an employee in the eyes of the EMPLOYER) for many, if not eventually ALL jobs/roles.
The purpose of society was never to pay money to people, and if we figure out how to get grains of sand to replace skilled labor, there's no amount of greed that can outpace what it will do for humanity.
The biggest problem OpenAI has is not having an immense data backbone like Meta/Google/MSFT have. I think this is a step in that direction: create a data moat which in turn will help them make better models.
Can it do Will Smith eating spaghetti? (I can't get access in UK)
Ever since the launch of Veo, there are already so many AI slop videos on YouTube that it sometimes becomes hard to find real ones.
I'm tired, boss.
I saw some promise with the Segment Anything model, but I haven't seen anyone turn it into a motion solver yet. In fact, I'm not sure it can do that at all. It may be that we need an AI algorithm to translate the video into a simpler rendition (colored dots representing the original motion) that can then be tracked more traditionally.
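To make the "colored dots" idea concrete, here is a toy sketch (everything in it — the data shapes, the greedy nearest-neighbour matching — is an illustrative assumption, not any real pipeline): given per-frame dot positions, link each dot to its nearest neighbour in the next frame to recover motion tracks.

```python
import math

def nearest(p, candidates):
    """Return the candidate point closest to p."""
    return min(candidates, key=lambda q: math.dist(p, q))

def track(frames):
    """frames: list of lists of (x, y) dots, one list per video frame.
    Returns one track per dot in the first frame, followed greedily
    through the remaining frames by nearest-neighbour matching."""
    tracks = [[p] for p in frames[0]]
    for dots in frames[1:]:
        for tr in tracks:
            tr.append(nearest(tr[-1], dots))
    return tracks

# Two dots drifting right by 1 px per frame:
frames = [[(0, 0), (10, 0)], [(1, 0), (11, 0)], [(2, 0), (12, 0)]]
print(track(frames))
```

Greedy matching like this breaks down when dots cross paths or disappear; a real motion solver would add appearance features and track re-identification on top, which is where a model like Segment Anything could plausibly feed in.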
Here's to hoping that the industry will adapt to have it aid animators for in-betweening and other things that supplement production. Anime studios are infamously terrible with overworking their employees, so I legitimately see benefits coming from this tool if devs can get it to function as proper frame interpolation (where animators do the keyframes themselves and the model in-betweens).
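As a baseline for what "the model in-betweens" means, a naive linear in-betweening sketch (purely illustrative; real interpolation models use optical flow or learned motion rather than a pixel blend, but the interface is the same: keyframes in, tweens out):

```python
def inbetween(key_a, key_b, n):
    """Return n intermediate frames between two keyframes, where each
    keyframe is a flat list of pixel intensities. Frame i is a linear
    blend, a fraction i/(n+1) of the way from key_a to key_b."""
    tweens = []
    for i in range(1, n + 1):
        t = i / (n + 1)
        tweens.append([a * (1 - t) + b * t for a, b in zip(key_a, key_b)])
    return tweens

# One tween halfway between a black and a white 2x2 frame:
print(inbetween([0, 0, 0, 0], [1, 1, 1, 1], 1))  # [[0.5, 0.5, 0.5, 0.5]]
```

The whole promise of model-based in-betweening is replacing that blend with something that understands drawn motion while keeping the animator's keyframes fixed.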