However, of course OpenAI will ignore this; at worst nothing will change, and at best they get a slap on the wrist and a fine and continue scraping.
You can’t take that stuff out of the models at this point anyway.
But realistically, all that will happen is that the "Pauschalabgabe" (the flat-rate copyright levy) is extended to AI subscriptions, making stuff more expensive for everyone.
Soon the music industry will be begging OpenAI for exposure of their content, just like the media industry is begging Google to scrape it.
I guess the main difference between the situation with language models and humans is one of scale.
I think the question should be viewed like this: if I as a corporation did the same thing but with humans, would it be legal or not? Take the hypothetical of hiring a bunch of people, having them read a bunch of lyrics, and then having them answer questions about lyrics. If no law prohibits the hypothetical with people, then I don't see why it should be prohibited with language models, and if it is prohibited with people, then there should be no specific AI ruling needed.
All this being said, Europe is rapidly becoming even more irrelevant than it was, living off the largesse of the US and China. It's like some uncontacted tribe ruling that satellites can't take aerial photos of them. It's all well and good, just irrelevant. I guess Germany can always go the route of North Korea if they want.
I think the difference here is that your example is what a search engine might do, whereas AI is taking the lyrics, using them to create new lyrics, and then passing them off as its own.
Is this not something every single creative person ever has done? Is this not what creating is? We take in the world, and then create something based on that.
If you sell tickets to an event where you read the lyrics aloud, it's commercial performance and you need to pay the author. (Usually a cover artist would be singing, but that's not a requirement.)
So it's not like a human can recite the lyrics anywhere freely either.
If they hire me primarily to recite lyrics, then sure, that would probably be some manner of infringement if I don't license them. But I feel like the case with a language model is much more the former than the latter.
But then with the analogy, if I'm a secretary and the copyright holder of lyrics calls me and asks if I know the lyrics of one of their songs, I don't think it's infringement to say yes and then repeat it back to them.
The LLM is not publicising anything, it's just doing what you ask it to do, it's the humans using it publicising the output.
With all major models now basically trained on nearly all available data, beyond the financial AI bubble about to burst there’s also a big content bubble that’s about exhausted, as folks are just pumping out slop vs producing original creative human output. That may be the ultimate long term tragedy of the present AI hype cycle. Expect “made by a human” to soon be a tag associated with premium brands and customer experiences.
I think people would still produce original things as long as they have the means for doing it. I guess we could say it is our nature. My fear is AI monopolizing the wealth that once would go to support people producing art.
I went to a grammar school and I write in mostly pretty high-quality sentences with a bit of British English colloquialism. I spell well, spend time thinking about what I am saying and try to speak clearly, etc.
I've always tried to be kind about people making errors but I am currently retraining my mind to see spelling mistakes and grammar errors as inherent authenticity. Because one thing ChatGPT and its ilk cannot do (I guess architecturally) is act convincingly like those who misspell, accidentally coin new eggcorns, accidentally use malapropisms, or use novel but terrible grammar.
And you're right: IMO the rage against the cultural damage AI will do is only just beginning, and I don't think people have clocked on to the fact that economic havoc is built-in, success or failure.
The web/AI/software-tech industry will be loathed even more than it is now (and this loathing is increasingly justified)
Just wait a few more years until the majority of ChatGPT training data is filled with misspellings, accidental eggcorns, malapropisms and terrible grammar.
That, and AI slop itself.
Tastes will mature, society will more vocally mock this crap, and we’ll stop seeing the sloppier stuff come out of reputable locations.
Plastic/synthetics are the slop of the physical world. They're a side product of extracting oil and gas so they're extremely cheap.
Yet if you look at synthetics by volume, probably 99% of them are used just because they're cheaper than the natural alternative. Yes, some have characteristics that are novel, but by and large everything we do with plastics is ultimately based on "they're cheaper".
Plastics, unfortunately, aren't going away.
But I'd be surprised if that was generally the case. It's easy to see why ChatGPT 1:1 reproducing a song's lyrics would be a copyright issue. But creating a derivative work based on the song?
What if I made a website that counts the number of alliterations in certain songs' lyrics? Would that be copyright infringement, because my algorithm uses the original lyrics to derive its output?
If this ruling really applied to any algorithm deriving content from copyright-protected works, it would be pretty absurd.
But absurd copyright laws would be nothing new, so I won't discount the possibility.
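To make the alliteration example concrete, here is a minimal sketch of what such a counter could look like. The function name `count_alliterations` and the definition used (two adjacent words on the same line starting with the same letter) are my own assumptions for illustration, not anything from the ruling or an existing site.

```python
import re


def count_alliterations(lyrics: str) -> int:
    """Count adjacent word pairs on the same line that share a first letter.

    Assumption: this simple definition of "alliteration" is illustrative only.
    """
    total = 0
    for line in lyrics.splitlines():
        # Words must start with a letter; inner apostrophes are kept ("don't").
        words = re.findall(r"[a-z][a-z']*", line.lower())
        total += sum(1 for a, b in zip(words, words[1:]) if a[0] == b[0])
    return total


if __name__ == "__main__":
    # Made-up text, not real lyrics.
    print(count_alliterations("Silly snakes slide slowly\nBig dogs bark"))
```

The input is the full text, but the only thing that leaves the function is a single integer, which is the point of the question.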
Did I suggest either of those things?
They're not saying no LLMs, they're saying no LLMs using lyrics without a license. OpenAI simply need to pay for a license, or train an LLM without using lyrics.
They already "filter" the code to prevent it from happening (reproducing exact works). My guess is that it is just superficially changing things around so it is harder to prove copyright violations.
> I don’t think a country’s government can justify no commercial LLMs to its populace.
Counter-argument: can any country's government justify allowing its population and businesses to become completely dependent on an overseas company which does not comply with its laws? (For Americans, think "China" in this case)
It doesn't appear that modern LLMs are really that hard to build; expensive perhaps, but if you have a monopoly on a large enough market, price isn't really your main concern.
That's not how laws and regulations work in European or even EU countries. Courts/the legal system in Germany cannot set legal precedents for other countries, and countries don't use legal precedents from other countries, as they obviously have different laws. A ruling could be cited as an authority, but no one is obligated to follow it.
What could happen, for example, is that EU law is interpreted through the CJEU (Court of Justice of the European Union), and its rulings bind EU member states, but that's outside of what individual countries do.
Sidenote, I'm not a native English speaker, but I think it's "precedent", not "precedence"; similar words, but the first one is specifically what I think you meant.
I think you're right, also not a native English speaker.
No, you're right that a German court can't influence e.g. the similar lawsuit against Suno in Denmark, but as you point out, its ruling can, and most likely will, be cited, and I think it's often the case that this carries a lot of weight.
Yes, even if just looking at other court cases in Germany, the role of precedent is "in general" not quite as powerful (as courts are supposed to follow what the law says, not what other courts say). To be clear, this is quite a bit oversimplified. Other court rulings do still matter in practice, especially if they are from higher courts. But it's very different from how it is commonly presented to work in the US (can't say if it actually works that way).
But EU member states also synchronize the general workings of many laws to make a unified market practically possible, and this includes the general way copyright works (by implementing different country-specific laws which all follow the same general framework, so details can differ),
and the parts which are the same are pretty clear about that:
- if you distribute a copy of something, it's a copyright violation no matter the technical details
A human memorizing the content and then reproducing it would still be copyright infringement, so it should be pretty obvious that this applies to LLMs too, where you could potentially even argue that it's not just "memorizing it" but storing it compressed and a bit lossy.
And that honestly isn't just the case in Germany or the EU; the main reason AI companies have mostly gotten away with it so far is judges being pressured to rule leniently because "it's the future of humanity", "the country wouldn't be able to compete", etc. Or in other words, corruption (as politicians are supposed to change laws if things change, not tell judges to not do their job properly).
Second, it probably would be good for the EU and even the US, as it would de-monopolize the market a bit before that becomes fully impossible.
God I can only hope
https://www.digitaltrends.com/social-media/rap-genius-deserv... (2013)
Long ago, the first site I remember doing this was lyrics.ch, which has long since been shut down by litigation. I'm not endorsing the status quo here, but if the licensing system exists it is obviously unfair to exempt parties from it simply because they're too big to comply.
E.g. why offer lame chat agents as a service when you can keep the value generation in-house? E.g. have a strategy board that identifies possible use cases for your model, then spin off a company that just does agentic coding or music generation. Just cut off the end users/public from model access, and flood the market with AI generated apps/content/works yourself (or with selected partners). Then have a lawyer check right before publishing.
So this court decision may make everything worse? I don't know.
If there was a lot of gold to find they wouldn't sell the shovels.
Of course, maybe OpenAI et al should have gotten a license before training on the lyrics, or avoided training on copyrighted content. But the first would be expensive and the latter would require them to develop actual intelligence.
Same goes for websites where you can watch piracy streams. "The action is the user pressing play" sounds like it might win you an internet argument, but I'm 99% sure none of the courts will play those games: you, as the operator who enabled whatever the user could do, end up liable.
At the very least, the users being liable instead of OpenAI makes no sense. Like arresting only drug users and not dealers.
If that were the case, Google wouldn't receive DMCA takedowns for piracy links and would instead have to offer up users searching for pirated content. The former is more prevalent than the latter because, one, the latter requires invasion of privacy (you have to serve up everyone's search results), and
two, it requires understanding of intent.
The issue is the same here. First, OpenAI would then need to share all chats for courts to sift through, and second, how do you judge intent? If someone asks for a German pop song and OpenAI decides to output Bochum, whose fault is that?
Everyone knows that these LLMs were trained on copyrighted material, and as a next-token prediction model, LLMs are strongly inclined to reproduce text they were trained on.
I don't think that we will end up with such a scenario here: lyrics are pervasive and probably also quoted in a lot of other publications. Furthermore, it's not just about lyrics; one can make a similar argument about any published literary work. GEMA is for music, but for literary publications there is VG Wort, which in fact already has an AI license.
I rather think that OpenAI will license the works from GEMA instead. Ultimately this will be beneficial for the likes of OpenAI because it can serve as a means to keep out the small players. I'm sure that GEMA won't talk to the smaller startups in the field about licensing.
Is this good for the average musician/author? These organizations will probably distribute most of the money to the most popular ones, even though AI models benefit from quantity of content rather than popularity.
https://www.vgwort.de/veroeffentlichungen/aenderung-der-wahr...
The result was mostly comical: the commentaries for vacuous pop music all sounded more or less the same. “‘Shake Your Booty’ by KC and the Sunshine Band expresses the importance of letting one’s hair down and letting loose. The song communicates to listeners how liberating it is to gyrate one’s posterior and dance.” Definitely one of the first signs that this new tech was not going to be good for the web.