That is, be able to prove a) that their models were actually trained on the data they claim, b) that they have consent to use said data for AI training, and c) that this consent was given by the actual author or with the author's consent.
I want platforms like SoundCloud, YouTube, etc. to be required to actually send out an e-mail to all of their users: "hey we will be using your content for AI training, please click here to give permission".
Again, I think we should require companies to get the user to actively give their consent to these things. Platforms are free to lock or terminate accounts that don't, but they shouldn't be allowed to steal content because someone didn't read an e-mail.
Wouldn’t sites like YouTube already have a license to make money off your content anyway? This might be a little out of date but it notes that even though you own the material you upload to YouTube, by uploading it you grant them a license to make money off it, sub-license it to others for commercial gain, make derivative works etc. IANAL but this suggests to me that if you upload it to YouTube, YouTube can license it to OpenAI without needing to inform you or get additional consent. [0]
[0]: https://www.theguardian.com/money/2012/dec/20/who-owns-conte...
We can’t ignore the ethical cost of how AI is being developed - especially when it relies on taking other people’s work without permission. Many of today’s most powerful AI systems were trained on vast datasets filled with human-made content: art, writing, music, code, and more. Much of it was used without consent, credit, or compensation. This isn’t conjecture - it’s been thoroughly documented.
That approach isn’t just legally murky - it’s ethically indefensible. We cannot build the future on a foundation of stolen labor and creativity. Artists, writers, musicians, and other creators deserve both recognition and fair compensation. No matter how impactful the tools become, we cannot accept theft as a business model.
https://arstechnica.com/tech-policy/2025/02/meta-torrented-o...
Mistral waves hello. They're alive and well, and competing well.
Also, while the AI Act and copyright are handled at the EU level, I always get the impression that anyone talking about a "EU government" simply doesn't understand the EU. If you think Germans or Slovaks are rooting for Mistral just because they're European you'd be wrong - they'd be more accepting of it, maybe, due to higher trust in them respecting privacy and related rights, but that's.
That's a funny example since broadcasters have to pay a fee to say "The Super Bowl" in the first place. If they don't, they have to use some euphemism like "the big game."
The answer is definitely no. You cannot use something that you don't have a license for unless it belongs to you.
(For what it's worth, Claude disagrees and claims that news organizations ARE allowed to use the term Super Bowl, but companies that aren't official sponsors can't use it in their ads. But Claude is not a lawyer, so <shrug>)
So in practice, no, it shouldn't. Not because that information itself is bad, but because it probably isn't limited to just that answer.
In summary, I think it is definitely a problem when:
1. The model is trained on a certain type of intellectual property
2. The model is then asked to produce content of the same type
3. The authors of the training data did not consent
And slightly less so, but still questionable when instead:
2. The IP becomes an integral part of the new product
which, arguably, is the case for any and all AI training data; individually you could take any of them out and not much would happen, but remove them all and the entire product is gone.
I want "please mail back this physical form, signed".
It's way too easy with dark-patterns to make people inadvertently click buttons. Or to pretend that people did.
For example, pretty much everyone agrees that the current copyright regime that allows large corporations to hold copyrights on vast libraries of content for near perpetuity isn't what's best for society, but it has earned a lot of companies and a few people a lot of money with which they can influence politics, and so it remains.
Now, it seems that there is a lot of money to be made through training AI on these vast libraries of material, and the companies making that money can use it to influence politics to allow it.
There is, of course, the remote possibility that policy on this topic is going to be formed by rich and famous people persuading people of the rightness of their cause, but I don't have high hopes for that outcome -- and the even remoter chance that the much larger number of "creatives" on the lower rungs manage to band together and lobby over it.
This whole conversation is basically just a public negotiation between corporations on who gets to make the most money from "content" that they didn't create.
It is very clear that a huge portion of AI's value is specifically the destruction of incentive to the original creator of the work, ergo courts will over time find that AI will have to pay for that right. Or courts will have to decide we no longer value music, art, literature, etc. made by humans -- seems like a long shot to me.
Just because copyright was made with good intentions does not mean it's actually helping. It's pretty clear that copyright is used almost entirely to hoard wealth.
That's not really different from what I said. "What society values" is what people spend money on, and that money is used to influence policy.
> Or courts will have to decide we no longer value music, art, literature, etc. made by humans -- seems like a long shot to me.
The courts only interpret the law and it's pretty clear to me the law is going to change.
AI companies haven't just been scraping, they've been pirating. I think it's important because people consent to put their comments on websites, but authors don't consent to have their work stolen. Mixing the two together as if they are the same is an attempt by AI companies to muddy the ethical waters a bit more than I'm comfortable with.
Because I consented to a comment being posted on a website to be read by people does not mean I consented to it being used to train LLMs.
For example, lots of TOS make the works of users property of the platform owners. All of that can stop being relevant if that practice becomes illegal.
People consented to put their comments out publicly; they didn't consent to feeding a machine that steals their ideas. One ruling establishing that the wording of the terms is not good enough, and all of that can fall.
Enforcement would be kind of impossible though. How would you prove the age of the data? And what about for things like websites - if a website from 2015 gets scraped in 2025, where does that fall?
So, maybe not a very practical solution. But I think the ideal outcome is one where AI companies don't have to throw away progress from the past few years, but artists have control over their participation going forward. It's hard to say if there's any solution that can respect both.
More likely than not, one party is going to be shafted; and it's probably not going to be the one with the money behind it.
The vibe that all creative work is based on something from earlier doesn't hold up in court. Breaking copyright isn't legal, it's theft.
Isn't this the key?
If AI can learn how to write a song like the Beatles, is it a crime that it has learnt it, or is it a crime that someone can use AI to produce work that resembles those artists' creations?
Maybe the controls should be aimed at preventing AI from plagiarizing, rather than at preventing the scraping of specific content from the open web.
If I were a musician, I could draw upon my life's experience of music to create new, never-before-heard music in any style I chose.
My music would belong to me because I created it.
How is it different for an "innocent" AI who does exactly the same thing?
It's very different.
Then please bear with me and extend the comparison just a tiny bit.
Replace the AI with an advanced species from Mars. Or even better yet an augmented human ... a trans-human if you will.
Both the Martian and the trans-human have kids, families, feelings, etc... AND both can absorb the entire internet in, say, a week or two. They are musicians and create new music from their internet experience.
Does their music belong to them?
Do they have to pay rights to all the authors they listened to so far on the internet?
And finally ... what is the difference between my martian and a human today?
This article isn't even a discussion on AI copyright ownership. It's about training.
What if an evil robot race came along and began draining your Martians of all their history, expressions, and entertainment? Would the Martians not be within their rights to go "hey, don't do that"?
Parent edited rather than responded:
Your "martians" already did pay for the rights to access the music (streaming services, cds, etc)! And if it's highly derivative they'll continue to have to pay (royalties). OpenAI/Meta/Anthropic did not pay, that's the inherent issue. They took when they should not have had the right to access, and trained against it.
The question is not the difference between your mythical Martians and humans, it's the difference between your Martians and AI. Your Martians, like humans, have lived experience. They have emotions, fears, beliefs. They have family, children, loved ones. This all affects how they would express themselves. An "AI" has none of this.
AI also doesn't have life experience to pull from.
Who does the music that AI creates belong to?
An "innocent" AI cannot and does not have any of that. To say all a musician does to create new music is "listen to [existing] music" is reductive and ignores the inherently human expression that is music.
You would also only hear the songs you had discovered or had been introduced to, at the speed they were meant to be listened to. The music would also often be part of a scene that might include different underlying philosophies or associated fashion.
Your new music would be the outcome of all of those individual musical experiences, coupled with your creative and technical ability.
None of that is true for AI. It has just harvested everything without anyone’s permission and now makes artless slop from it for anyone who wants it 24/7.
Or, consider an analogy: how would you feel if some machine could use DNA from strands of hair to clone entire humans? Wouldn't it feel wrong?
- a pre-trained model: DNA
- and a prompt: environment/input
Is that much different from AI?
Fight for big copyright, or let it all be scraped?
It's not just about famous works now. We live in a gray area era. Every post on social media, tap on a keyboard, click, can be used to train AI.
Who owns that intelligence? It feels like AI products should acknowledge the names of billions of contributors. Common people, feeding the machine, mostly unknowingly.
So, copyright could be an unusual ally. It's often a villain for common folk, but again, we live in a gray area era regarding those rights. Copyright has a precedent with pedigree. Most scraped content is not famous songs and movies, it's random internet stuff.
We need to remember we want AI, but we also want good songs and movies for entertainment, and we also want to have our lesser works protected (internet banal stuff). There is a fine line connecting all that stuff somewhere.
He still sucks up to the copyright lobby.
Pathetic old man (
superkuh•3h ago
surfingdino•3h ago
gloxkiqcza•2h ago
AI just raises the level of abstraction and therefore the capabilities of an individual.
evrimoztamur•3h ago
superkuh•2h ago
As for my qualifications, I've trained a very small (~0.8B) LLM myself on 3GB of IRC logs. So I kind of know what I'm doing and have a basic understanding of the theory involved and the pragmatic issues.
sorcerer-mar•2h ago
evrimoztamur•2h ago
hu3•1h ago
to be fair that's not a bastion of quality either.
evrimoztamur•1h ago
My up-to-date list of recognised certificate authorities in one of the two major smartphone OSes does not validate his certificate. This is not a matter of 'quality' of Safari, or iOS.
cush•2h ago
SirFatty•2h ago
superkuh•2h ago
I do admit there are many quite informed and technically capable professional musicians but I doubt they'd support a law like this.
mrkeen•2h ago
They are demanding to be informed.
cosmotic•2h ago
kstrauser•2h ago
pantulis•2h ago
9283409232•2h ago
DarkWiiPlayer•2h ago
imgabe•2h ago
prophesi•2h ago
But they're up against multi-billion dollar companies, and the richest man in the world appointed by the US president to dismantle federal oversight.
bilbo0s•2h ago
In four years there will be yet another president. Probably even from a different party the way things are going. They know the system swings back and forth like that. These people have been working that system for over a hundred years now.
How do you think they got d@mn near a century of IP protection on a fricken mouse? Who’s in power doesn’t matter to these people.
prophesi•57m ago
sp527•2h ago
People like you can only see copyright infringement when it's blatantly staring you in the face, like Studio Ghibli style AI images. Why is it obvious in that case? Because we have a large enough preexisting sample of Studio Ghibli style frames from anime to make it obvious.
But move closer to zero-shot generation, which anyone with a modicum of knowledge on this subject understands was the directional impetus for generative AI development, and the monkey brain short circuits and suddenly creatives who want to protect their art can go fuck themselves, because money.
You may not find common cause with multi-millionaire artists trying to protect their work now, but you certainly will in hindsight if the only fiscally-sustainable engines of creativity left to us in the future are slop-vomiting AI models.
detectivestory•2h ago
amelius•2h ago
Except the mechanical loom folks are dependent on the IP of the weavers.
In other words, not a good analogy.
TiredOfLife•1h ago