OpenAI may not use lyrics without license, German court rules

https://www.reuters.com/world/german-court-sides-with-plaintiff-copyright-case-against-openai-2025-11-11/

224•aiz0Houp•2mo ago

Comments

HeinzStuckeIt•2mo ago

Lyrics produced some of the first AI slop I noticed after ChatGPT was launched in late 2022, even if the large models hadn’t been trained on them specifically. Overnight there were a bunch of different advertising-laden sites that clearly scraped Genius or other lyric websites, and then had GPT generate commentaries on what the lyrics supposedly mean, so that these would get picked up by search engines.

The result was mostly comical, the commentaries for vacuous pop music all sounded more or less the same: “‘Shake Your Booty’ by KC and the Sunshine Band expresses the importance of letting one’s hair down and letting loose. The song communicates to listeners how liberating it is to gyrate one’s posterior and dance.” Definitely one of the first signs that this new tech was not going to be good for the web.

portaouflop•2mo ago

It would be so hilarious if GEMA was actually useful for once and not a detriment to society and artists in general.

However of course OpenAI will ignore this and at worst nothing will change and at best they get a slap on the wrist and a fine and continue scraping.

You can’t take that stuff out of the models at this point anyway.

jstummbillig•2mo ago

I made a living from GEMA payments a while back, but dear lord, so much of how the institution does what it does feels so bad and zero-sum. Might just be that the world would be better off without it. It does something important for rights holders for sure, but (and I understand, I am heavily back-seating here without offering a solution) there must be better ways to go about it.

shadyKeystrokes•2mo ago

Now, without the Filmförderung, all those grimdark arthouse movies where people yell "Scheisse!" in Berlin stairwells would never be made. And all that shovelware made to please public committees, looking extra cute and boring and clogging up the app stores with zero sales: what would we do without that? Take anything popular on streaming and ask yourself whether it would have made it through. And if it would have been stopped, by what and by whom? Fire that, to fix Germany's media sector.

HotHotLava•2mo ago

It'd be equally hilarious if that VC money would be used to actually better society by crushing GEMA in court.

But realistically, all that will happen is that the "Pauschalabgabe" is extended to AI subscriptions, making stuff more expensive for everyone.

portaouflop•2mo ago

Damn I didn’t even consider the second part…

riazrizvi•2mo ago

Nah. It’s so easy for OpenAI to modify their output. I’m already seeing them restrict news article re-generation by newspaper name. They do it to reduce liability. There’s also a big copyright infringement case coming up in the USA this year, and being able to point to responsiveness to complaints will be a key part of their legal defense I bet.

portaouflop•2mo ago

You can modify the output but the underlying model is always susceptible to jailbreaks. A method I tried a couple months ago to reliably get it to explain to me how to cook meth step by step still works. I’m not gonna share it, you just have to take my word on this.

riazrizvi•2mo ago

I believe you, but you only need to establish a safety standard where jailbreaking is required by the end-user to show you are protecting property in good faith, AFAIK.

randomNumber7•2mo ago

Why is this so problematic? You can read all this stuff in old papers and patents that are available on the web.

And if you are not capable of doing this you will likely not succeed with the ChatGPT instructions.

portaouflop•2mo ago

I’m not saying it’s not possible to get this information elsewhere - but it’s impossible to prevent ChatGPT from telling you how to do illegal stuff; something that the model explicitly should not be able to do, according to its makers.

hastamelo•2mo ago

Member when music sites were suing YouTube for music videos, and now they are begging people to watch them there and YT view counts are a bragging topic?

Soon the music industry will be begging OpenAI for exposure of their content, just like the media industry is begging Google for scraping.

Lionga•2mo ago

YouTube pays the music owner. OpenAI can never pay, as even with stealing content they still manage to lose 5 dollars for every dollar they make.

ayhanfuat•2mo ago

That's exactly the difference between using with or without license.

flanked-evergl•2mo ago

Of course the models are not human, but if you consider this situation as if they are persons, then the question becomes: May a person read lyrics and tell it to someone when asked, and the court's ruling basically says no, this may not happen, which makes little sense.

I guess the main difference between the situation with language models and humans is one of scale.

I think the question should be viewed like this, if I as a corporation do the same thing but just with humans, would it be legal or not. Given a hypothetical of hiring a bunch of people, having them read a bunch of lyrics, and then having them answer questions about lyrics. If no law prohibits the hypothetical with people, then I don't see why it should be prohibited with language models, and if it is prohibited with people, then there should be no specific AI ruling needed.

All this being said, Europe is rapidly becoming even more irrelevant than it was, living off the largesse of the US and China. It's like some uncontacted tribe ruling that satellites can't take aerial photos of them. It's all good and well, just irrelevant. I guess Germany can always go the route of North Korea if they want.

Steve16384•2mo ago

> May a person read lyrics and tell it to someone when asked, and the court's ruling basically says no, this may not happen, which makes little sense.

I think the difference here is that your example is what a search engine might do, whereas AI is taking the lyrics, using them to create new lyrics, and then passing them off as its own.

flanked-evergl•2mo ago

> whereas AI is taking the lyrics, using them to create new lyrics, and then passing them off as its own.

Is this not something every single creative person ever has done? Is this not what creating is? We take in the world, and then create something based on that.

pavlov•2mo ago

> "May a person read lyrics and tell it to someone when asked"

If you sell tickets to an event where you read the lyrics aloud, it's commercial performance and you need to pay the author. (Usually a cover artist would be singing, but that's not a requirement.)

So it's not like a human can recite the lyrics anywhere freely either.

hugh-avherald•2mo ago

You don't even have to sell tickets: if it's a free concert, copyright is likely infringed. This is likely true in all jurisdictions.

flanked-evergl•2mo ago

If someone hires me as a secretary, and they ask me what the lyrics of a song are, there is no law that prohibits me from telling them if I know, and I don't have to license the lyrics in order to do so.

If they hire me primarily to recite lyrics, then sure, that would probably be some manner of infringement if I don't license them. But I feel like the case with a language model is much more the former than the latter.

Attrecomet•2mo ago

As soon as you take the LLM output and publicize it, it turns around and is a lot more akin to having your secretary read out the lyrics publicly. If you don't publicize it in any way, how would the copyright holder ever find out?

flanked-evergl•2mo ago

But the LLM is not advertised as a lyrics DB, and it in no way guarantees that it will reproduce the lyrics accurately, and similarly the copyright holder will never know that it's reproducing the lyrics unless they snoop on my conversations with it, or go ask it directly.

But then with the analogy, if I'm a secretary and the copyright holder of lyrics calls me and asks if I know the lyrics of one of their songs, I don't think it's infringement to say yes and then repeat it back to them.

The LLM is not publicising anything, it's just doing what you ask it to do, it's the humans using it publicising the output.

JCM9•2mo ago

With AI slop showing up everywhere, there’s a real danger that folks will just no longer be motivated to produce real original content.

With all major models now basically trained on nearly all available data, beyond the financial AI bubble about to burst there’s also a big content bubble that’s about exhausted as folks are just pumping out slop vs producing original creative human output. That may be the ultimate long term tragedy of the present AI hype cycle. Expect “made by a human” to soon be a tag associated with premium brands and customer experiences.

sixeyes•2mo ago

I will not stop writing music or drawing my furry bullshit, no matter the culture climate around me. Don't get your hopes up ;3

philipwhiuk•2mo ago

When you're the only one doing it, you'll have a large impact on model generation

gabrielgio•2mo ago

> With AI slop showing up everywhere, there’s a real danger that folks will just no longer be motivated to produce real original content.

I think people would still produce original things as long as they have the means for doing it. I guess we could say it is our nature. My fear is AI monopolizing the wealth that once would go to support people producing art.

JohnFen•2mo ago

This. I still produce original things and will continue to do so until I am incapable anymore. What's changed, though, is that I no longer put or discuss those things on the open internet because there's no realistic way to prevent it from getting used to train genAI models.

exasperaited•2mo ago

> Expect “made by a human” to soon be a tag associated with premium brands and customer experiences.

I went to a grammar school and I write in mostly pretty high-quality sentences with a bit of British English colloquialism. I spell well, spend time thinking about what I am saying and try to speak clearly, etc.

I've always tried to be kind about people making errors but I am currently retraining my mind to see spelling mistakes and grammar errors as inherent authenticity. Because one thing ChatGPT and its ilk cannot do -- I guess architecturally —- is act convincingly like those who misspell, accidentally coin new eggcorns, accidentally use malapropisms, or use novel but terrible grammar.

And you're right: IMO the rage against the cultural damage AI will do is only just beginning, and I don't think people have clocked on to the fact that economic havoc is built-in, success or failure.

The web/AI/software-tech industry will be loathed even more than it is now (and this loathing is increasingly justified)

gorbachev•2mo ago

> one thing ChatGPT and its ilk cannot do -- I guess architecturally —- is act convincingly like those who misspell, accidentally coin new eggcorns, accidentally use malapropisms, or use novel but terrible grammar

Just wait a few more years until the majority of ChatGPT training data is filled with misspellings, accidental eggcorns, malapropisms and terrible grammar.

That, and AI slop itself.

riazrizvi•2mo ago

AI slop is like 90’s websites and desktop publishing - there’s a novelty for AI-newbie-creators driving them to churn out lazy crap, while being oblivious to how it lands with strangers.

Tastes will mature, society will more vocally mock this crap, and we’ll stop seeing the sloppier stuff come out of reputable locations.

HeinzStuckeIt•2mo ago

You assume that the public recognizes AI slop for what it is. Across platforms now, people are readily engaging with blatant AI text posts and generated images as if they are bona fide. In fact, if you point out that the poster is a bot, you may well get some flak from the community.

mock-possum•2mo ago

People are already upset over that ‘walk my walk’ song on the country music charts

oblio•2mo ago

We already have this in the physical world.

Plastic/synthetics are the slop of the physical world. They're a side product of extracting oil and gas so they're extremely cheap.

Yet if you look at synthetics by volume, probably 99% of them are used just because they're cheaper than the natural alternative. Yes, some have characteristics that are novel, but by and large everything we do with plastics is ultimately based on "they're cheaper".

Plastics, unfortunately, aren't going away.

TiredOfLife•2mo ago

> With AI slop showing up everywhere, there’s a real danger that folks will just no longer be motivated to produce real original content.

The BBC truly was ahead of its time with their deletion of TV shows.

Levitz•2mo ago

It is of no cost to me when someone else writes a book, plays a song or draws a picture. It is also true that, basically whatever I ever do, someone else has done better. This does not stop me from doing those things because the value within them is in doing them.

We have cars, buses and planes, yet people do partake in pilgrimages. The process matters, even if only personally.

mock-possum•2mo ago

> folks will just no longer be motivated to produce real original content.

Honestly if your only motivation for creating art was “computers can’t do what I do” then… I don’t want to be too gatekeepy about it, but that doesn’t sound like you’re a ‘real’ artist to me. Real artists create art because they enjoy doing it, not because it’s the exclusive domain of humans.

You don’t need to be special, you don’t need to be the best, you don’t need to even be good or successful or recognized or appreciated (although of course all those things are nice) - you just have to be creating art.

petesergeant•2mo ago

I am curious what happens if they call their bluff on this and cut off ChatGPT in Germany. Not that I think OpenAI is doing the right thing, just, I don’t think a country’s government can justify no commercial LLMs to its populace.

gmerc•2mo ago

I'm curious why you think the rule of law is a bluff.

burnished•2mo ago

Probably pattern recognition

petesergeant•2mo ago

I come from the country with the world’s oldest continuous parliament, and they change the law all the time. Arguably that’s all the majority of politicians do.

aniviacat•2mo ago

Claude and Gemini would become more popular.

pavlov•2mo ago

There are many competing providers of commercial LLMs with equal capabilities, so another vendor would probably be happy to serve a leading Western market of 83 million people.

petesergeant•2mo ago

Yeah? Which commercial provider’s model do you think was trained without using lyrics?

aniviacat•2mo ago

I would imagine providers who want to comply will scan the LLM's output and pay a license fee to the owner if it contains lyrics.
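A rough sketch of what such an output scan could look like, assuming the provider holds a catalog of licensed lyrics to compare against. The function names, the catalog shape and the 8-word window are all made up for illustration, and real systems would need fuzzier matching than exact n-gram overlap:

```python
import re


def word_ngrams(text, n=8):
    """Return the set of n-word sequences in text, lowercased and stripped of punctuation."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def find_lyric_matches(output, catalog, n=8, min_shared=1):
    """Map each catalog title to the number of n-grams it shares with the model output.

    catalog is a hypothetical dict of {title: full lyrics text}.
    Titles sharing fewer than min_shared n-grams are left out.
    """
    out_grams = word_ngrams(output, n)
    hits = {}
    for title, lyrics in catalog.items():
        shared = len(out_grams & word_ngrams(lyrics, n))
        if shared >= min_shared:
            hits[title] = shared
    return hits
```

A provider could then block the response or record a usage for licensing whenever `find_lyric_matches` returns a non-empty result. Whether that would satisfy the court is a separate question.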

petesergeant•2mo ago

They scan for commercial work already. Isn’t the law about training, not output?

aniviacat•2mo ago

Perhaps; I didn't read the court ruling.

But I'd be surprised if that was generally the case. It's easy to see why ChatGPT 1:1 reproducing a song's lyrics would be a copyright issue. But creating a derivative work based on the song?

What if I made a website that counts the number of alliterations in certain songs' lyrics? Would that be copyright infringement, because my algorithm uses the original lyrics to derive its output?

If this ruling really applied to any algorithm deriving content from copyright-protected works, it would be pretty absurd.

But absurd copyright laws would be nothing new, so I won't discount the possibility.
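To make the alliteration counter concrete, here is a minimal sketch. It assumes "alliteration" simply means two adjacent words starting with the same letter, which is cruder than the literary definition, and the function name is invented for this example:

```python
import re


def count_alliterations(text):
    """Count adjacent word pairs that start with the same letter."""
    words = re.findall(r"[a-z][a-z']*", text.lower())
    return sum(1 for a, b in zip(words, words[1:]) if a[0] == b[0])
```

The output is a single number derived from the lyrics, and the lyrics cannot be recovered from it, which is the contrast with reproducing the text 1:1.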

dathinab•2mo ago

> But creating a derivative work based on the song?

1. it wouldn't matter as derivative work still needs the original license

2. except if it's not derivative but just inspired,

and the court case was about it being pretty much _the same work_

OpenAI's defense also wasn't that it's derived or inspired but, to quote

> Since the output would only be generated as a result of user inputs known as prompts, it was not the defendants, but the respective user who would be liable for it, OpenAI had argued.

and the court order said more or less

- if it can reproduce the song lyrics it means it stored a copy of the song lyrics somehow somewhere (memorization), but storing copies requires a license and OpenAI has no license

- if it outputs a copy of the song lyrics it means it's making another copy of them and giving them to the user which is copyright infringement

and this makes sense, if a human memorizes a song and then writes it down when asked it still is and always has been copyright infringement (else you could just launder copyright by hiring people to memorize things and then write them down, which would be ridiculous).

and technically speaking LLMs are at the core a lossy compressed storage of their training content + statistical models about them. And to be clear that isn't some absurd around-five-corners reasoning. It's a pretty core aspect of their design. And to be clear these are things well known even before LLMs became a big deal and OpenAI got huge investment. OpenAI pretty much knew about this being a problem from the get-go. But like any recent big US "startup" following the law doesn't matter.

it technically being an unusual form of lossy compressed storage means that the memorization counts as a copyright infringement (with current law)

but I would argue the law should be improved in that case, so that under some circumstances "memorization" in LLMs is treated as "memorization" in humans (i.e. not an illegal copy, until you make it one by writing it down). But you can't make it all circumstances because like mentioned you can use the same tech to basically do lossy file compression and you don't want people to launder copyright by training an LLM on a single text/song/movie and then distributing that...

knollimar•2mo ago

That seems like a really broad interpretation of "technically memorization" that could have unintended side effects (like say banning equations that could be used to generate specific lyrics), but I suppose some countries consider loading into RAM a copy already. I guess we're already at absurdity

cycomanic•2mo ago

> but I suppose some countries consider loading into RAM a copy already. I guess we're already at absurdity

FYI most do. Have a look at many software licenses. In particular Microsoft (who as we know invested lots into OpenAI), will argue it is so.

I would also say it makes sense. If it wasn't the case we can just load a program into lots of computers using only a single license/installation medium.

knollimar•2mo ago

I think it's absurd. In my opinion the copy is for copying the usable part (e.g. installation).

Is running a program making a copy? If I run it on some distributed system is it then making more copies than allowed? This gets insane quickly.

I think it's just a bandaid for fixing removable drive installations. These should have had their own laws/rules/etc.

It has knock-on effects like being able to enforce other IP law to someone you just licensed your software to.

Similarly I think this is more an "interpret words to get the desired outcome instead of the likely spirit or meaning of the words".

dathinab•2mo ago

It _really_ isn't absurd.

The law doesn't care what technical trickery you use to encode/compress copyrighted material. If you take data and then create an equation based on it that contains it and can reproduce the data trivially then yes, IMHO obviously, this form of embedding copyrighted data still is embedding copyrighted data.

Think about it: if that weren't the case I could just transform a video into an equation system and then distribute the latest movies, books, whatever to everyone without permission and without violating copyright, even though de facto I'm doing exactly what copyright law is supposed to prevent... (1)

Just because you come up with a clever technical trick to encode copyrighted content doesn't mean you can launder/circumvent copyright law, or any law at that. Law mostly doesn't care about technical tricks but the outcomes.

Maybe even more importantly, LLMs under the hood are basically, at the core, compression systems: by not giving them enough entropy to store the information, you force them to generalize, and with that they happen to create an illusion of sentience.

E.g. what is the simplest case of training a transformer? You put in data to create the transformer state (which has much smaller entropy) and then output it from that state and then you find a "transformation" where this works as well as possible for a huge amount of different data. That is a compression algorithm!!! And sure in reality it's more complex you don't train to compress a specific input but more like a dictionary of "expected" input->output mappings where the output parts need to be fully embedded i.e. memorized in the algorithm in some form.

LLMs are basically obscure multi-layered hyper-dimensional lossy compression systems which compress a simple input->output mapping (i.e. database) defined by all entries in its training data. A compressed mapping which, due to forcing a limited entropy, needs to do compression through generalization....

And since when is compression allowing you to avoid copyright??

So if you want it to be handled differently by law because it isn't used as a compressed database you have to special-case it in law.

But it is used as a compressed database, in that case e.g. it was used to look up lyrics based on some clues. That's basically a lookup in a lossy compressed obscure database system no matter how you would normally think about LLMs.

(1): And in case it's not clear this doesn't mean every RNG is a violation because under some unknown seed it probably would reproduce copyrighted content. Because the RNG wasn't written "based on" the copy righted content.

knollimar•2mo ago

In regards to "Because the RNG wasn't written "based on" the copy righted content."

Does that mean I can distribute the seed if I find one and this RNG wasn't trained on that content?

Does it prevent me from sharing that number on the internet?

It seems like there's a lot of subjective intent here that I'm extremely skeptical of

For an LLM also:

If it's lossy enough that it needs RAG to fix the results is that okay?

-------------------

In my opinion I think actually getting the output is where the infringement happens. Having and distributing the LLM weights shouldn't be infringement (in my head) because of the enforceability of results. Otherwise you risk banning RNGs or them all being forced to prove they didn't train on copyrighted content

dathinab•2mo ago

> If it's lossy enough that it needs RAG to fix the results is that okay?

but then the only way RAG can "fix" the result is if the RAG system stored the song text in its vector database

in which case the law case and solutions to fix the issue are much more clear

in a certain way an LLM which only encodes language but not knowledge and then uses RAG and similar is the most desirable (not just for copyright reasons but also e.g. update-ability, traceability, remove-ability of misinformation etc.)

sadly AFAIK it doesn't work as language and knowledge details are too much interleaved

> Does that mean I can distribute the seed if I find one and this RNG wasn't trained on that content?

honestly I think this falls outside of situations copyright law considers. But also if you consider that copyright law mostly doesn't care about technical implementation details and that the "spirit of law" (intent of law maker) matters in unclear cases I think I also have a best guess answer:

Neither the RNG nor the seed by themselves are a copyright violation, but if you spread them with the intent to spread a non-licensed copy you still commit a copyright violation, and in that context the seed might be, idk, taken down from sharing sites even if by itself it isn't a copyright violation.

The thing is in the end you can transform _any_ digital content into

- "just a number"

- or "just an equation", "equation system" etc.

- or an image, matrix, graph, human readable text , or pretty much anything

so fundamentally you can't have a clean cut between what can and can't be a copyright violation
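To illustrate the "just a number" point, a minimal sketch: any text can be turned into one big integer and back without loss. The function names are invented for this example, and it assumes UTF-8 text that does not start with a NUL character (leading zero bytes would be dropped on the way back):

```python
def text_to_number(text: str) -> int:
    """Interpret the UTF-8 bytes of text as one big-endian integer."""
    return int.from_bytes(text.encode("utf-8"), "big")


def number_to_text(number: int) -> str:
    """Inverse of text_to_number: rebuild the bytes and decode them."""
    length = (number.bit_length() + 7) // 8
    return number.to_bytes(length, "big").decode("utf-8")
```

Sharing the integer is then functionally the same as sharing the text, which is why the representation alone can't decide whether something is a copy.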

which is why it matters so much that law acts on a higher abstraction level than what exactly happens technically.

And why intent of law (in gray area cases) matters so much.

And why law really shouldn't be a declarative definition of strict mathematics rules.

freejazz•2mo ago

>But creating a derivative work based on the song?

You need a license to create derivative works.

Semaphor•2mo ago

No, it’s specifically about (mostly) verbatim producing big chunks of lyrics in the output. The court PR specifically mentioned memorization, retaining training data, multiple times.

dathinab•2mo ago

they clearly didn't do that properly, or we wouldn't have the current lawsuit

the lawsuit was also not about whether it is or isn't copyright infringement. It was about who is responsible (OpenAI or the user who tries to bait it into making another illegal copy of song lyrics).

A model outputting song lyrics means it has it stored somehow somewhere. Just because the storage is in a lossy compressed obscure hyper dimensional transformation of some kind, doesn't mean it didn't store an illegal copy. Or it wouldn't have been able to output it. _Technical details do not protect from legal responsibilities (in general)_

you could (maybe should) add new laws which in some form treat LLM memorized things the same as if a human did memorize it, but currently LLMs have no special legal treatment when it comes to them storing copies of things.

pavlov•2mo ago

The point is that some other vendor will do the work to implement the filtering required by Germany even if OpenAI doesn't.

beezlewax•2mo ago

This assumes that tech companies can act above the law because they've got a new feature to jam down our throats. Have you considered that not everyone wants that? Or that it might not be the best thing?

petesergeant•2mo ago

> Have you considered that not everyone wants that? Or that it might not be the best thing?

Did I suggest either of those things?

barrucadu•2mo ago

> I don’t think a country’s government can justify no commercial LLMs to its populace

They're not saying no LLMs, they're saying no LLMs using lyrics without a license. OpenAI simply need to pay for a license, or train an LLM without using lyrics.

Myrmornis•2mo ago

But lyrics are just one example. Are you saying that training experiments must filter out all substrings from the training input that bear too close a resemblance to a substring of a copyrighted work?

barrucadu•2mo ago

Obviously there's a limit, reproducing a single sentence is unlikely to be copyright infringement just because there are only so many words in a language; but if reproducing some text would be copyright infringement if a human did it, I don't see why LLM companies should get a free pass.

If it's really essential that they train their models on song lyrics, or books, or movie scripts, or articles, or whatever, they should pay license fees.

freejazz•2mo ago

At some point, use of the lyrics becomes de minimis

luke5441•2mo ago

This obviously applies to all copyrighted works. I could sue OpenAI when it reproduces my source code that I published on the Internet.

They already "filter" the code to prevent it from happening (reproducing exact works). My guess is it is just superficially changing things around so it is harder to prove copyright violations.

akersten•2mo ago

Oi, you got a loisense to read those words and then repeat them back to me when asked?

barrucadu•2mo ago

I take it you think copyright shouldn't exist at all, then?

akersten•2mo ago

That is a separate opinion, but with respect to the question at hand, the utilitarian value of being able to ask a computer "what are the lyrics to x" and having it produce them outweighs whatever small ideological sanctity the music labels assign to being able to gatekeep the written words of a composition to a small blessed few. It's not like ChatGPT is serving up the mp3 file to you. So correct, it is insane to me that mere reproduction of just the lyrics is afforded such weighty copy protection.

(Vis a vis, I take it you write a certified letter to Universal before reproducing Happy Birthday in public? ;) That is actually a far more egregious violation indeed, as it is both a performance of the copyrighted work and in front of an audience - neither of which are the case for the chatbot - yet one we all seem to understand to be fair use.)

pjc50•2mo ago

Conversely, last week we had Spain being willing to cut off Cloudflare (!) to protect football match royalties.

> I don’t think a country’s government can justify no commercial LLMs to its populace.

Counter-argument: can any country's government justify allowing its population and businesses to become completely dependent on an overseas company which does not comply with its laws? (For Americans, think "China" in this case)

mrweasel•2mo ago

There are 80 million Germans. If you were OpenAI, or its shareholders, would you leave that market open for a competitor? No, you'd make a version of your product without the lyrics. More EU countries are going to follow and reach the same conclusion, especially now that Germany has set a legal precedence. Should OpenAI just pull out of a market with 500 million people and leave it to Claude, Perplexity or someone else entirely?

It doesn't appear that modern LLMs are really that hard to build, expensive perhaps, but if you have monopoly on a large enough market, price isn't really your main concern.

embedding-shape•2mo ago

> More EU countries are going to follow and reach the same conclusion, especially now that Germany has set a legal precedence.

That's not how laws and regulations work in European or even EU countries. Courts/the legal system in Germany can not set legal precedents for other countries, and countries don't use legal precedents from other countries, as they obviously have different laws. It could be cited as an authority, but no one is obligated to follow that.

What could happen for example, would be that EU law is interpreted through the CJEU (Court of Justice of the European Union), and its rulings bind EU member states, but that's outside of what individual countries do.

Sidenote, I'm not a native English speaker, but I think it's "precedent", not "precedence", similar words but the first one is specifically what I think you meant.

mrweasel•2mo ago

> I think it's "precedent", not "precedence",

I think you're right, also not native English speaker.

No, you're right that a German court can't influence e.g. the similar lawsuit against Suno in Denmark, but as you point out, it can be cited, and most likely will be, and I think it's often the case that this carries a lot of weight.

dathinab•2mo ago

> That's not how laws and regulations work in European or even EU countries

yes, even if just looking at other court cases in Germany the role of precedent is "in general" not quite as powerful (as courts are supposed to follow what the law says, not what other courts say). To be clear this is quite a bit oversimplified. Other court rulings do still matter in practice, especially if they are from higher courts. But it's very different from how it is commonly presented to work in the US (can't say if it actually works that way).

but also EU member states do synchronize the general working of many laws to make a unified market practically possible and this does include the general way copyright works (by implementing different country-specific laws which all follow the same general framework, so details can differ)

and the parts which are the same are pretty clear about that

- if you distribute a copy of something it's a copyright violation no matter the technical details

a human memorizing the content and then reproducing it would still make it a copyright infringement, so it should be pretty obvious that this applies to LLMs too, where you potentially could even argue that it's not just "memorizing it" but storing it compressed and a bit lossy....

and that honestly isn't just the case in Germany, or the EU, the main reason AI companies got mostly away with it so far is due to judges being pressured to rule leniently as "it's the future of humanity", "the country wouldn't be able to compete" etc. etc. Or in other words corruption (as politicians are supposed to change laws if things change not tell judges to not do their job properly).

tremon•2mo ago

> countries don't use legal precedents from other countries, as they obviously have different laws

The seminal authority for all copyright laws, the Berne Convention, is ratified by 181 countries. Its latest revisions are TRIPS (concerning authorship of music recordings) and the WIPO Copyright Treaty (concerning digital publication), both of which are ratified by the European Union as a whole. It's not directly obvious to me that EU member states have different laws in this particular area.

That said, the EU uses the civil law model and precedent doesn't quite have the same weight here as it does under common law.

freejazz•2mo ago

US copyright law originates in the constitution and the US does not follow a number of elements of the Berne convention, such as moral rights.

freejazz•2mo ago

>That's not how laws and regulations work in European or even EU countries. Courts/the legal system in Germany can not set legal precedents for other countries, and countries don't use legal precedents from other countries, as they obviously have different laws. It could be cited as an authority, but no one is obligated to follow that.

Do you have some sort of different understanding of copyright law where it's legal to commercially use lyrics (verbatim, mind you) without a license?

shagie•2mo ago

> Do you have some sort of different understanding of copyright law where it's legal to commercially use lyrics (verbatim, mind you) without a license?

Some places have a concept of de minimis as applied to copyright. It is often not prosecuted to have an acoustic guitar and an open case and play music on a park bench. You may need a license for busking in some places - but that's not tied to the music that you play (it could be your own or it could be covers).

I am not saying that it is legal, but rather that it is beneath the notice of the courts.

freejazz•2mo ago

De minimis is when you only use a small portion of the lyrics, so it is not considered infringement.

dathinab•2mo ago

first due to how the EU unified market works they would have to cut it from all of the EU not just Germany

second it probably would be good for the EU and even US as it would de-monopolize the market a bit before that becomes fully impossible

lvncelot•2mo ago

> cut off ChatGPT in Germany

God I can only hope

techblueberry•2mo ago

German student performance will plateau, while all other countries slowly decline.

jeroenhd•2mo ago

AI is actively harming kids' abilities while inflating their grades when they make AI do their homework.

German student performance may plateau, but when student performance in other countries falls, that still leaves them in a better place.

mathieu4v•2mo ago

I found this bit very revealing:

> Since the output would only be generated as a result of user inputs known as prompts, it was not the defendants, but the respective user who would be liable for it, OpenAI had argued.

Another glimpse into the "mind" of a tech corporation allowing itself full freedom to profit from the vast body of human work available online, while explicitly declining any societal responsibility at all. It's the user's fault, he wrote an illegal prompt! We're only providing the "technology"!

llbbdd•2mo ago

This is largely how it works for nearly all copyrightable work. I can draw Mickey Mouse but legally I'm not doing anything wrong until I try to sell it. It certainly doesn't put Crayola or Adobe at legal risk for me to do so.

cycomanic•2mo ago

Not really, if I ask an artist to draw me a Mickey Mouse (for money) who is committing copyright infringement?

It's an interesting observation that the big AI corps very much argue that learning "is the same that humans do", so fair use. But then when it comes to using that learning they argue the other way, i.e. "this is just a machine, it's the person asking who is doing the infringement".

llbbdd•2mo ago

Companies care about material damages in practice. I'm not a lawyer but my understanding is that in that case, the artist drawing and selling the work is infringing (to a degree, because this seems to be a case Disney et al doesn't care about) but that if you take their work and publish and promote it and sell it, YOU become Disney's problem. If the wind and rain and erosion and time and God managed to produce a perfect post-Steamboat-Willie Mickey Mouse in the desert sand, visible from space, that wouldn't be infringement until you monetized it, called it Mickey Mouse and charged people to see it. A lot of the entities trying to get their piece of Infringement Pie seem to think their authority and their works are in the first position here instead; that my newfound capability to generate a Mickey Mouse from scratch on a whim affects their pockets, when in fact we're back to a variant of the classic piracy argument - I was not ever going to pay for it under any condition. If I decide this weekend to have one of the robots help me publish To Kill a Mockingbird Part 2, then sue me into the ground.

moontear•2mo ago

But you are not the one drawing Mickey Mouse in this scenario, are you? You are instructing the AI company to draw something or, closer to the original post, you are prompting it to generate the lyrics for song X.

Your prompt may be asking for something illegal (i.e. reproducing the lyrics), but the one reproducing the lyrics is the AI company, not you yourself.

In your example you are asking Adobe to draw Mickey Mouse and Adobe happily draws a perfect rendition of Mickey Mouse for you and you have to pay Adobe for that image.

llbbdd•2mo ago

This keeps coming up, and I am not a lawyer, but as far as I can tell none of that matters. I can pay someone to draw Mickey Mouse for me and hang it up in my house. If I invite people to visit my Mickey Mouse House and charge them for the privilege, I'm in violation. Maybe the artist I paid to draw the mouse is also in some smaller violation but it all comes back to distribution and impact. I don't think it devalues Mickey Mouse in any way if I have a slot machine that spits out pictures of Mickey Mouse. If it does devalue it, maybe it doesn't have much value to begin with.

Reproduction (again, IANAL) seems to consist of a lot more than "I made it", it consists of how you use it and whether that usage constitutes infringement.

EDIT: To add, genuine question, what does "asking" come down to? I can ask Photoshop to draw Mickey Mouse through a series of clever Mickey-Mouse-shaped brush strokes. I can ask Microsoft Word to reproduce lyrics by typing them in. At what gradient between those actions and text prompting am I (or OpenAI, or Adobe) committing copyright infringement?

moontear•2mo ago

Now I get where you are coming from (also not a lawyer):

- You asking the painter to create a Mickey Mouse painting: not illegal. You still are asking for a derivative work without permission, but if used privately you're good (this is different per jurisdiction)
- The artist creating the painting of a derivative work is acting illegally - they are selling you the picture and hence this is a commercial act and trademark infringement
- Displaying the bought Mickey Mouse image publicly is likely infringement, but worse is if you would charge admission to show the picture, that would definitely be illegal
- If you were to hide the image in your basement and look at it privately, it would most likely not be illegal (private use - but see first point since this is different per jurisdiction)

Comparing violations doesn't really make sense (the artist creating it vs. you displaying it) - the act of creating the image for money is illegal. If it were the artist creating the image for him/herself - that would be fine.

Now getting back to the LLM and your question, which the court also answered (jurisdiction: Germany). The court's opinion is that the AI recreating these lyrics by itself is illegal (think about the artist creating the image for you for money).

Personally I would think the key part and similarity is the payment. You pay for using OpenAI. You pay for it creating those lyrics/texts. In my head I can create a similar reasoning to your Mickey Mouse example. If we'd take open source LLMs and THEY would create perfect lyrics, I think the court would have a much harder case to make. Who would you be suing and for what kind of money? It would all be open source and nobody is paying anyone anything to recreate the lyrics. It would be and is very hard to prove that the LLMs were trained on copyrighted material - in the lyrics example, they may have ingested illegal lyrics-sharing sites, but they may also just have ingested Twitter or Reddit where people talk about the lyrics - how could any LLM know that these contents were illegal or not to be ingested.

skeptrune•2mo ago

*edit. Will this actually change OpenAI's behaviour to any meaningful extent?

dicknuckle•2mo ago

Does what a US court rules really matter?

skeptrune•2mo ago

Probably not for something like this honestly. I feel like it would just keep getting appealed up. But what do I know? I'm not an attorney.

pjc50•2mo ago

It does in Germany? And quite likely in the rest of the EU?

skeptrune•2mo ago

I guess. But I doubt openai will change its behaviour due to this.

pjc50•2mo ago

Do you think that the German courts will just shrug and accept noncompliance with a court order?

skeptrune•2mo ago

I just expect openai to suspend service to Germany such that Germans have to use a VPN.

mrweasel•2mo ago

Other countries are currently going through the same. KODA is running a similar lawsuit on behalf of the Danish musicians, they can now point to Germany as an example, making it much easier for them to win.

Iolaum•2mo ago

I'm not sure about the problem here, lyrics are public: you can search '$songname lyrics' and get the result on a website (or even on the search engine page). What's the issue with an LLM producing those lyrics if you ask?

pjc50•2mo ago

They aren't! They're subject to licensing!

https://www.digitaltrends.com/social-media/rap-genius-deserv... (2013)

Long ago the first site I remember to do this was lyrics.ch, which was long since shut down by litigation. I'm not endorsing the status quo here, but if the licensing system exists it is obviously unfair to exempt parties from it simply because they're too big to comply.

mrweasel•2mo ago

Just because you can find them freely online doesn't make them public in the legal sense. If that was the case music piracy would also be legal.

rmoriz•2mo ago

While I partially understand (but do not support) the hate against AI due to possible plagiarism and "low effort generation" of works, think about the whole process: if model providers are liable for generating output that resembles lyrics or very short texts that fall under copyright law, they will just change their business model.

E.g. why offer lame chat agents as a service when you can keep the value generation in-house? E.g. have a strategy board that identifies possible use cases for your model, then spin off a company that just does agentic coding or music generation. Just cut off the end users/public from model access, and flood the market with AI generated apps/content/works yourself (or with selected partners). Then have a lawyer check right before publishing.

So this court decision may turn everything worse? I don't know.

inexcf•2mo ago

The fact they don't already do that, sounds to me like the things produced by AI are not worth the investment. Especially since the output is not copyrightable, right?

If there was a lot of gold to find they wouldn't sell the shovels.

wongarsu•2mo ago

There is a lot of value in specialization. It allows capitalism to do its magic to elevate the best uses of your technology without yourself taking on any of the risk. Trying to inhouse everything often smothers innovation and leads to bad resource allocation. It can be done, but in fields with a lot of ongoing innovation it's extremely hard to get right

There is a reason that Cisco doesn't offer websites, and you are probably actively ignoring whatever websites your ISP has. ASML isn't making chips, and TSMC isn't making chip designs

rmoriz•2mo ago

But think of the Apple approach. And while all cloud providers started with mainstream hardware, they evolved to proprietary systems. The current AI phase may just be the „good old days“ with access just limited by financial power paving to be cut down once the dust settles and some model vendors lose.

thisisit•2mo ago

If there is such an immense value in spinning off and selling models separately you can bet that will happen - without court saying so. At the end running these models is a costly job and you'd want to squeeze out every value.

dangus•2mo ago

This sounds like a much more niche product that doesn't justify the over half-trillion dollars invested into it so far.

For AI to have a positive ROI, it has to be highly applicable to basically every industry, and has to be highly available.

philipwhiuk•2mo ago

> Then have a lawyer checking right before publishing.

Your cheap app just got really expensive

whilenot-dev•2mo ago

> turn everything worse?

A media generation company that is forced to publish uncopyrightable works, because it cannot make the usage to these media generators public, since that would violate copyright - that does sound like a big win for everyone but that company.

How is that worse?

rmoriz•2mo ago

„Record companies“ without artists, but exclusive access to automated creation, selection and a working distribution.

whilenot-dev•2mo ago

Uncopyrightable works result in 0 royalties. How many record companies do you know that are sustainable without royalties?

friendzis•2mo ago

> why offering lame chat agents as a service

Because that's the only business model that the management of these model provider companies suspect to have a chance of generating income, at the current state.

> While I partially understand (but not support) the hate against AI due to possible plagiarism

There's no *possible* plagiarism, every AI slop IS result of plagiarism.

> E.g. have a strategy board that identifies possible use cases for your model, then spin off a company that just does agentic coding, music generation.

Having lame chat agents as a service does not preclude them from doing this. The fact that they are only selling the shovels should be somewhat insightful.

lifestyleguru•2mo ago

These people would stream German schlager to every screen and speaker in Europe and charge for it 100 EUR monthly per breathing person, if they could. They are violent.

YINN•2mo ago

I feel compelled to support banning AI from infringing on art, even though most pop songs are terrible.

tremon•2mo ago

"pop" music had its own avalanche of slop long before the advent of AI. Soulless reproductions and remixes of once-popular songs are everywhere.

cnqso•2mo ago

There's a major risk to being the market leader in a new, controversial technology. Look what happened to Juul

trollbridge•2mo ago

Highly addictive nicotine formulations targeted at teens are not exactly “new technology”.

estebarb•2mo ago

However, the lyrics are shown because the user requested them, so shouldn't the user be liable instead? The same way social networks are not liable for content uploaded by users? I think there is somewhat of a double standard here.

Of course, maybe OpenAI et al should have gotten a license before training on the lyrics, or avoided training on copyrighted content. But the first would be expensive and the latter would require them to develop actual intelligence.

embedding-shape•2mo ago

> However, the lyrics are shown because an action is the user so, shouldn't be the user be liable instead?

Same goes for websites where you can watch piracy streams. "The action is the user pressing play" sounds like it might win you an internet argument, but I'm 99% sure none of the courts will play those games, you as the operator who enabled whatever the user could do ends up liable.

estebarb•2mo ago

I think that is completely different. Piracy websites do only one thing. Chatbots are different.

My concern is where we are going to draw the line: If I type a copyrighted song in Word, is Microsoft liable? If I upload a lyric to ChatGPT and ask it to analyze or translate it, is it a copyright violation?

I totally understand your line of thinking. However, the one I'm suggesting could be applied as well and it has precedents in law (intellectual authors of crimes are punishable, not only the perpetrators).

dpoloncsak•2mo ago

> I think that is completely different. Piracy websites do only one thing. Chatbots are different.

Well...YouTube is liable for any copyrighted material on their site, and do 'more than one thing'

estebarb•2mo ago

Not really. Youtube is not liable as long as they remove the content after a copyright complaint and other mechanisms.

The problem is that if OpenAI is liable for reproducing copyrighted content, so will be other products such as word processors, video editors and so on. So, as a society, where will we draw the line?

Are we going to tolerate some copyright infringement in these tools or are we going to pursue copyright infringements even in other tools as we already got the tools to detect it?

We cannot have double standards, law should be applied equally to everyone.

I do think that overall making OpenAI liable for output is a bad precedent, because of repercussions beyond AI tools. I'm all fine with making them liable for having trained on copyrighted content and so on...

barrucadu•2mo ago

How does OpenAI being liable for reproducing copyrighted material imply that a word processor should be as well? Last time I checked, word processors don't have a black box text generator trained on pre-existing works: a word processor only has the text that the user types into it.

> Not really. Youtube is not liable as long as they remove the content after a copyright complaint and other mechanisms.

They have to take action precisely because they're liable for the material on their platform.

hrimfaxi•2mo ago

Why should the user be liable? They didn't reproduce the copyrighted work and the machine is totally capable of denying output (like it already does for other categories of material).

At the very least, the users being liable instead of OpenAI makes no sense. Like arresting only drug users and not dealers.

estebarb•2mo ago

There are countries where drug consumption/possession is penalized too. There is a similar example in another area: for instance, in Sweden, Norway and Belize selling sex (aka prostitution) is legal, but buying it is not legal. So, your example actually exists in world legislation.

I'm just asking where are we going to put the line and why.

hrimfaxi•2mo ago

You had originally said the user should be liable instead of OpenAI being liable.

> However, the lyrics are shown because the user requested them, so shouldn't the user be liable instead?

I would imagine the sociological rationale for allowing sex work would not map to a multi-billion-dollar company.

And to add, the social network example doesn't map because the user is producing the content and sharing it with the network. In OpenAI's case, they are creating and distributing copyrighted works.

estebarb•2mo ago

No, the edited wording still conveys the same meaning. My edit was to fix another grammar typo.

The social networks are distributing such content AND benefiting from selling ads on them. Adding ads on top is a derivative work.

Personally I'm on the side of penalizing the side that provides the input, not the output:

- OpenAI training on copyrighted works. - Users requesting custom works based on copyrighted IP

That is my opinion on how it should be layered, that's it. I'm happy to discuss why it should be that way or why not. As I put in another comment, my concern is that mandating copyright filtering on each generative tool would end up propagating to every single digital tool, which as a society we don't really want.

hrimfaxi•2mo ago

I am curious why you are of the opinion that the user should be in trouble for requesting the copyright material and not the provider of the material. I feel like there is a distinction in something that was local-first compared to a SaaS. Like a local AI model that reproduced copyrighted works for your own use might not be problematic compared to a remote model reproducing a copyrighted work and distributing it over the internet to you. Most jurisdictions treat remote access across jurisdictional boundaries differently than completely local acts.

thisisit•2mo ago

This is such a bad take.

If that were the case, Google wouldn't receive DMCA takedowns for piracy links; it would instead have to offer up the users searching for pirated content. The former is more prevalent than the latter because going after users, one, requires an invasion of privacy - you have to serve up everyone's search results

two, it requires understanding of intent.

The same issue applies here. OpenAI would first need to share all chats for courts to sift through, and second, how do you judge intent? If someone asks for a German pop song and OpenAI decides to output Bochum - whose fault is that?

loudmax•2mo ago

Simon Willison had an analysis of Claude's system prompt back in May. One of the things that stood out was the effort they put in to avoiding copyright infringement: https://simonwillison.net/2025/May/25/claude-4-system-prompt...

Everyone knows that these LLMs were trained on copyrighted material, and as a next-token prediction model, LLMs are strongly inclined to reproduce text they were trained on.

miltonlost•2mo ago

All AI companies know they're breaking the law. They all have prompts effectively saying "Don't show that we broke the law!". That we continue to have tech companies consistently breaking the law and nothing happens is an indictment of our current economy.

mock-possum•2mo ago

I don’t read this as “don’t show we broke the law,” I read it as “don’t give the user the false impression that there’s any legal issue with this generated content.”

There’s nothing law breaking about quoting publicly available information. Google isn’t breaking the law when it displays previews of indexed content returned by the search algorithm, and that’s clearly the approach being taken here.

Q6T46nT668w6i3m•2mo ago

Masked token prediction is reconstruction. It goes far beyond “quoting.”

lokar•2mo ago

The whole industry is based on breaking the law. You don’t get to be Microsoft, Google, Amazon, meta, etc without large amounts of illegality.

And the VC ecosystem and valuations are built around this assumption.

admaiora•2mo ago

And it's a question of whether we accept breaking the law for the possibility of having the greatest technological advancement of the 21st century. In my opinion, the legal system has become a blocker for a lot of innovation, not only in AI but elsewhere as well.

saghm•2mo ago

Without agreeing or disagreeing with your view, I feel like the issue with that paradigm is inconsistency. If an individual "pirates", they get fines and possible jail time, but if a large enough company does it, they get rewarded by stockholders and at most a slap on the wrist by regulators. If as a society we've decided that the restrictions aren't beneficial, they should be lifted for everyone, not just ignored when convenient for large corporations. As it stands right now, the punishments are scaled inversely to the amount of damage that the one breaking the law actually is capable of doing.

rpdillon•2mo ago

This is a point that I don't see discussed enough. I think Anthropic decided to purchase books in bulk, tear them apart to scan them, and then destroy those copies. And that's the only source of copyrighted material I've ever heard of that is actually legal to use for training LLMs.

Most LLMs were trained on vast troves of pirated copyrighted material. Folks point this out, but they don't ever talk about what the alternative was. The content industries, like music, movies, and books, have done nothing to research or make their works available for analysis and innovation, and have in fact fought industries that seek to do so tooth and nail.

Further, they use the narrative that people that pirate works are stealing from the artists, where the vast majority of money that a customer pays for a piece of copyrighted content goes to the publishing industry. This is essentially the definition of rent seeking.

Those industries essentially tried to stop innovation entirely, and they tried to use the law to do that (and still do). So, other companies innovated over the copyright holder's objections, and now we have to sort it out in the courts.

Q6T46nT668w6i3m•2mo ago

I don’t follow. You’re punishing the publishing industry by punishing authors?

rpdillon•2mo ago

I'm saying that LLMs are worthwhile useful tools, and that I'm glad that we built them, and that the publishing industry, which holds the copyright on the material that we would use to train the LLMs, have had no hand in developing them, have done no research, and have actively tried to fight the process at every turn. I have no sympathy for them.

The authors have been abused by the publishing industry for many decades. I think they're just caught in the middle, because they were never going to get a payday, whether from AI or selling books. I think the percentage of authors that are commercially successful is sub 1%.

cycomanic•2mo ago

So the argument is that because LLMs are useful and the publishing industry was not involved in their creation, we should disregard the property rights of the publishing industry and allow using their work without a license? By that same argument (if something useful is being built, we ignore existing rights) shouldn't we also just take the code/models from OpenAI etc. and just publish them somewhere? Why not also their datacenters?

rpdillon•2mo ago

It's not really an argument. It's an observation that they sat on their hands while other industries out-innovated them. They were complacent and now they're paying the price.

We have laws and rules, but those are intended to work for society. When they fail to do so, society routes around them. Copyright in particular has been getting steadily weaker in practice since the advent of the Internet, because the mechanisms it uses to extract value are increasingly impractical since they are rooted in the idea of printed media.

visarga•2mo ago

> So, other companies innovated over the copyright holder's objections, and now we have to sort it out in the courts.

I think they try to expand copyright from "protected expression" to "protected patterns and abstractions", or in other words "infringement without substantial similarity". Otherwise why would they sue AI companies? It makes no sense:

1. If I wanted a specific author, I would get the original works, it is easy. Even if I am cheap it is still much easier to pirate than use generative models. In fact AI is the worst infringement tool ever invented - it almost never reproduces faithfully, it is slow and expensive to use. Much more expensive than copying which is free, instant and makes perfect replicas.

2. If I wanted AI, it means I did not want the original, I wanted something else. So why sue people who don't want the originals? The only reason to use AI is when you want to steer the process to generate something personalized. It is not to replace the original authors; if that is what I needed, no amount of AI would be able to compare to the originals. If you look carefully almost all AI outputs get published in closed chat rooms, with a small fraction being shared online, and even then not in the same venues as the original authors. So the market substitution logic is flimsy.

sidewndr46•2mo ago

You're using the phrase "actually legal" when the ruling in fact meant it wasn't piracy after the change. Training on the shredded books was not piracy. Training on the books they downloaded was piracy. That is where the damages come from.

Nothing in the ruling says it is legal to start outputting and selling content based off the results of that training process.

gruez•2mo ago

>Nothing in the ruling says it is legal to start outputting and selling content based off the results of that training process.

Nothing says it's illegal, either. If anything the courts are leaning towards it being legal, assuming it's not trained on pirated materials.

>A federal judge dealt the case a mixed ruling in June, finding that training AI chatbots on copyrighted books wasn't illegal but that Anthropic wrongfully acquired millions of books through pirate websites.

https://www.npr.org/2025/09/05/g-s1-87367/anthropic-authors-...

rpdillon•2mo ago

I think your first paragraph is entirely congruent with my first two paragraphs.

Your second paragraph is not what I'm discussing right now, and was not ruled on in the case you're referring to. I fully expect that, generally speaking, infringement will be on the users of the AI, rather than the models themselves, when it all gets sorted out.

sidewndr46•2mo ago

I'm in agreement that it will be targeted at the users of AI as well. Once that prevails legally someone will try litigating against the users and the AI corporations as a common group.

1718627440•2mo ago

> Folks point this out, but they don't ever talk about what the alternative was.

That LLMs would be priced as expensively as they really are in societal and energy costs? A lot of things are possible; whether they are economically feasible is determined by giving them a price. When that price doesn't reflect the real costs, society starts to waste work on weird things, like building large AI centers, because of a financial bubble. And yes, putting people out of business does come with a cost.

"Innovation" is not an end goal.

rpdillon•2mo ago

Innovation is absolutely an end goal, at least in terms of our legal framework. The primary impetus for copyright and patent law is innovation: to give those that innovate their due, and I do think this stems from our society seeing innovation as an end goal. But the intent of the system is always different than its actual effect, and I'm fairly passionate about examining the shear.

I run my AI models locally, paying for the hardware and electricity myself, precisely to ensure the unit economics of the majority of my usage are something I can personally support. I do use hosted models regularly, though not often these days, which is why I say "the majority of my usage".

In terms of the concerns you express, I'm simply not worried. Time will sort it out naturally.

Q6T46nT668w6i3m•2mo ago

You’re willing to eliminate the entire concept of intellectual property for a possibility something might be a technological advancement? If creators are the reason you believe this advancement can be achieved, are you willing to provide them the majority of the profits?

thedevilslawyer•2mo ago

That's an absolutely good tradeoff. There's no longer any need for copyright. Patents should go next. Only trademarks can stay.

delaminator•2mo ago

> There's no longer any need for copyright

So you assign zero value to the process of creation?

Zero value to the process of production?

So people who write and produce books, shows and films should all do what? Give up their craft?

thedevilslawyer•2mo ago

Creation isn't special, or constrained in number.

Process of creation itself is gratifying and valuable to those who will pursue it. No reason to additionally reward it.

Lamp lighters had to give up their craft I suppose and made way to a better world.

delaminator•2mo ago

> Creation isn't special, or constrained in number.

> Process of creation itself is gratifying and valuable to those who will pursue it.

spoken like someone who has never made anything in the real world

Holding a boom mic in the air is not gratifying and valuable to anyone who has to do it.

The fruits of your labour are not your labour.

_DeadFred_•2mo ago

Bullshit. Read up and understand the history of these things and their benefits to society. There is a reason they were created in the first place. Over a very long time. With lots of thoughts into the tradeoff/benefits to society. That Disney fucked with it does not make the original tradeoff not a benefit to society.

thedevilslawyer•2mo ago

The fact that you don't actually call out the specific benefit is telling. We're in a world of plenty and don't need copyright to have those benefits for our fellow humans.

hulitu•2mo ago

> And it's a question of do we accept breaking law for the possibility to have the greatest technological advancement of the 21st century

You mean like, murder ?

blibble•2mo ago

and training on mountains of open source code with no attribution is exactly the same

the code models should also be banned, and all output they've generated subject to copyright infringement lawsuits

the sloppers (OpenAI, etc) may get away with it in the US, but the developed world has far more stringent copyright laws

and the countries that have massive industries based on copyright aren't about to let them evaporate for the benefit of a handful of US tech-bros

terminalshort•2mo ago

No thank you. I am perfectly fine with AI training on my open source code and it is perfectly legal because my open source code does not include a license that bans AI training.

blibble•2mo ago

which license is that then?

because other than public domain they all require at least displaying the license, which "AI" ignores

Workaccount2•2mo ago

Training on copyright is not illegal. Even in the lawsuit against Anthropic it was found to be fair use.

Pirating material is a violation of copyright, which some labs have done, but that has nothing to do with training AI and everything to do with piracy.

dahart•2mo ago

There is US precedent for training being deemed not fair use. https://www.dglaw.com/court-rules-ai-training-on-copyrighted...

Why wouldn’t training be illegal? It’s illegal for me to acquire and watch movies or listen to songs without paying for them*. If consuming copyrighted material isn’t fair use, then it doesn’t make sense that AI training would be fair use.

* I hope it’s obvious but I feel compelled to qualify that, of course, I’m talking about downloading (for example torrenting) media, and not about borrowing from the library or being gifted a DVD, CD, book or whatever, and not listening/watching one time with friends. People have been successfully prosecuted for consuming copyrighted material, and that’s what I’m referring to.

terminalshort•2mo ago

That interpretation is not correct. The owner explicitly denied license to the data and then the company went to a third party to gain access to the data that they were denied license to.

> When building its tool, Ross sought to license Westlaw’s content as training data for its AI search engine. As the two are competitors, Thomson Reuters refused. Instead, Ross hired a third party, LegalEase, to provide training data in the form of “Bulk Memos,” which were created using Westlaw headnotes. Thomson Reuters’s suit followed, alleging that Ross had infringed upon its copyrighted Westlaw headnotes by using them to train the AI tool.

dahart•2mo ago

You’re contradicting the conclusion / interpretation written on dglaw.com? What is incorrect, exactly? It doesn’t seem like your summary challenges either my comment or the article I linked to, it’s not clear what you’re arguing. The court did find in this case that the use of the unlicensed data used for AI training was not fair use.

Workaccount2•2mo ago

The case isn't on LLMs or transformers, it's on using some other form of non generative AI to create an index of case law. The details are light, but I would guess that the "AI" was just copying over the data from Thomson Reuters.

boredhedgehog•2mo ago

> Training on copyright is not illegal.

The court decision this thread is about holds that it is, on the grounds that the training data was copied to the LLM's memory.

_DeadFred_•2mo ago

If my for profit/for sale product couldn't exist without inputting copyrighted works into it, then my product is derivative of those works. It's a pretty simple concept. No 'but human brains learn'. Humans aren't a corpo's for profit product.

'Would this product have the same value without the copyrighted works?'

If yes then it's not derivative. If no then it is.

terminalshort•2mo ago

This is incorrect. Two judges have now ruled that training on copyrighted data is fair use. https://www.whitecase.com/insight-alert/two-california-distr...

hulitu•2mo ago

You can always vote, but there is always someone going through the back door paying politicians and judges.

qustrolabe•2mo ago

Post-trained models are strongly inclined to produce responses similar to whatever got them a high RL score, so it's slightly wrong to keep thinking of LLMs as just next-token prediction from the dataset's probability distribution, as if they were some Markov chain.

est31•2mo ago

Another instance of GEMA fighting an American company. Anyone who was on the German internet in the first half of the last decade remembers the "not available in your country" error messages on YouTube because Google didn't make a deal with GEMA.

I don't think that we will end up here with such a scenario: lyrics are pervasive and probably also quoted in a lot of other publications. Furthermore, it's not just about lyrics but one can make a similar argument about any published literary work. GEMA is for music but for literary publications there is VG Wort who in fact already have an AI license.

I rather think that OpenAI will license the works from GEMA instead. Ultimately this will be beneficial for the likes of OpenAI because it can serve as a means to keep out the small players. I'm sure that GEMA won't talk to the smaller startups in the field about licensing.

Is this good for the average musician/author? These organizations will probably distribute most of the money to the most popular ones, even though AI models benefit from quantity of content instead of popularity.

https://www.vgwort.de/veroeffentlichungen/aenderung-der-wahr...

EdgeExplorer•2mo ago

The obsession with protecting access to lyrics is one of the strangest long-running legal battles to me. I will skip tracks on Spotify sometimes specifically because there are no lyrics available. Easy access to lyrics is practically an advertisement for the music. Why do record companies not want lyrics freely available? In most cases, it means they aren't available at all. How is that a good business decision?

fosk•2mo ago

They probably fear a domino effect if they let go of this. And so they defend it vehemently to avoid setting a precedent.

Think about compositions, samples, performance rights, and so on. There is a lot more at stake.

DANmode•2mo ago

Hot take: it’s all bullshit.

Like software patents - when you’re not a normie.

ajuc•2mo ago

What's the benefit of protecting monetary IP rights to art?

We'll only get the art that artists really wanted to make? Great!

verzali•2mo ago

What's the benefit of getting paid for your work? We'll only get the work people really want to do? Great!

ajuc•2mo ago

Art existed before IP rights. Artists did get paid.

vkou•2mo ago

> What's the benefit of protecting monetary IP rights to art?

What's the benefit of protecting monetary IP rights to software?

What's the benefit of consolidating all meaningful access to computing services to a few trillion-dollar gate-keeping corpos?

freejazz•2mo ago

It's a good decision because it must be an incredible minority of people who only listen to music when the lyrics can be displayed. I'd imagine most people aren't even looking at the music playing app while listening to music. Regardless, they are copyrighted and they get license fees from parties that do license them and they make money that way. Likely much more money than they would make from the streams they are losing from you.

slaymaker1907•2mo ago

I think it depends on the music. Most people will have a greatly improved experience when listening to opera if they have access to (translated) lyrics. Even if you know the language of an opera, it can be extremely difficult for a lot of people to understand the lyrics due to all the ornamentation.

freejazz•2mo ago

What percentage of streaming income does opera, as a genre, represent such that it could even factor into this business decision?

griffzhowl•2mo ago

I think having the lyrics reproducible in text form isn't the problem. Many sites have been doing that for decades and as far as I know record companies haven't gone after them. But these days with generative AI, they can take lyrics and just make a new song with them, and you can probably see why artists and record companies would want to stop that.

Plus, from TFA,

"GEMA hoped discussions could now take place with OpenAI on how copyright holders can be remunerated."

Getting something back is better than nothing

griffzhowl•2mo ago

Had a couple of drive-by downvotes... Is it that stupid an opinion? Granted I know nothing about the case except for what's in TFA

ToucanLoucan•2mo ago

Likely because you're a "luddite", which in the current atmosphere of HN and other tech spaces means you have a problem with a "research institution" (one with a separate for-profit enterprise face that it wears when it feels like it) having free and open access to the collected works of humanity so it can create a plagiarism machine that it can then charge people to access.

I don't respect this opinion but it is unfortunately infesting tech spaces right now.

fwn•2mo ago

> Had a couple of drive-by downvotes... Is it that stupid an opinion?

While I do not agree with your take, FWIW I found your comment substantive and constructive.

You seem to be making two points that are both controversial:

The first is that generative AI makes the availability of lyrics more problematic, given new kinds of reuse and transformation it enables. The second is that AI companies owe something (legally or morally) to lyric rights holders, and that it is better to have some mechanism for compensation, even if the details are not ideal.

I personally do not believe that AI training is meaningfully different from traditional data analysis, which has long been accepted and rarely problematized.

While I understand that reproducing original lyrics raises copyright issues, this should only be a concern in terms of reproduction, not analysis. Example: Even if you do no data analysis at all and your random character generator publishes the lyrics of a famous Beatles song (or other forbidden numbers) by sheer coincidence, it would still be a copyright issue.

I also do not believe in selective compensation schemes driven by legal events. If a legitimate mechanism for rights holders cannot be constructed in general, it is poor policy craftsmanship to privilege the music industry specifically.

Doing so relieves the pressure to find a universal solution once powerful stakeholders are satisfied. While this might be seen as setting a useful precedent by small-scale creators, I doubt it will help them.

BeFlatXIII•2mo ago

If anything, AI would scramble the lyrics more than a human "taking lyrics to make a new song from them".

griffzhowl•2mo ago

Maybe, but it's also possible to get an AI to produce a song with the exact same lyrics. And a human copying lyrics would also be a copyright issue in any case.

But anyway it seems I misinterpreted the issue and record companies have always been against reproduction of lyrics whether an AI or human is doing it

EdgeExplorer•2mo ago

I'm not one of the downvoters, but it may be this: "Many sites have been doing that for decades and as far as I know record companies haven't gone after them."

Record companies have in fact, for decades, been going after sites for showing lyrics. If you play guitar, for example, it's almost impossible to find chords/tabs that include the lyrics because sites get shut down for doing that.

griffzhowl•2mo ago

Hmm, alright. I actually do play guitar and used to find chords/tabs with lyrics easily. I haven't been doing that for maybe 10-15 years. Anyway, maybe those sites were paying for a license and I just never considered it

charcircuit•2mo ago

It's like saying that movie studios haven't gone after Netflix over movies, so what's the issue with hosting pirated movies on your own site. The reason movie studios don't go after Netflix is that they have a license to show it.

mjr00•2mo ago

I didn't downvote, but

> I think having the lyrics reproducible in text form isn't the problem. Many sites have been doing that for decades and as far as I know record companies haven't gone after them.

Reproducing lyrics in text form is, in fact, a problem, independent of AI. The music industry has historically been aggressively litigious in going after websites which post unlicensed song lyrics[0]. There are many arcane and bizarre copyright rules around lyrics. E.g., if you've ever watched a TV show with subtitles where there's a musical number but none of the lyrics are subtitled, you might think it was just laziness, but it's more likely the subtitlers didn't have permission to translate and subtitle the lyrics. And many songs on Spotify which you'd assume would have lyrics available just don't, because Spotify doesn't have the rights to publish them.

[0] https://www.billboard.com/music/music-news/nmpa-targets-unli...

griffzhowl•2mo ago

Thanks. Maybe that misconception was the problem. Taking a hammering in downvotes, lol

lokar•2mo ago

The composition and lyrics are owned separately from the recorded performance.

slaymaker1907•2mo ago

I'm pretty sure you could even have lyrics with a separate copyright from the composition itself. For example, you can clearly have lyrics without the music and you can have the composition alone in the case that it is performed as an instrumental cover or something.

EnPissant•2mo ago

This is a tough one for the HN crowd. It's like that man not sure which button to push meme.

1) RIAA is evil for enforcing copyrights on lyrics?

2) OpenAI is evil for training on lyrics?

maximilianburke•2mo ago

I think you mean the RIAA

RAII is a different kind of (necessary) evil

EnPissant•2mo ago

Indeed, too much C++. Edited.

slaymaker1907•2mo ago

Why not both? As the GP mentioned, lyrics are also invaluable for people besides training for AI.

nomel•2mo ago

I think the perceived lack-of-value for them is related to how easy it is to write lyrics down, compared to any other aspect of the music. Anyone can do it within the time of the song, usually first try. Any other aspect of the song can't just be written down from ear (yes, including the sheet music, which isn't nearly expressive enough to reproduce a performance*).

*There are some funny "play from sheet music without knowing the song" type videos out there, with funny results. YouTube/Google search is no longer usable, so I can't find any.

raverbashing•2mo ago

Actually in Germany it's GEMA

briandear•2mo ago

Very true. Just the other day, another “copyright is bad” post on the front page. Today it's “copyright is good” because otherwise people might get some use of material in LLMs.

Considering this is hacker news, it seems to be such an odd dichotomy. Sometimes it feels like anti-hacker news. The halcyon days of 2010 are long gone. Now we need to apparently be angry at all tech.

LLMs are amazing and I wish they could train on anything and everything. LLMs are the smartphone to the fax machines of Google search.

gruez•2mo ago

Sounds like it was never about copyright as a principle, only symbolic politics (ie. copyrights benefit megacorps? copyright needs to be weaker! copyright hurts megacorps? copyright needs to be stronger!)

cycomanic•2mo ago

> Very true. Just the other day, another “copyright is bad” post on the front page. Today it's “copyright is good” because otherwise people might get some use of material in LLMs.

> Considering this is hacker news, it seems to be such an odd dichotomy. Sometimes it feels like anti-hacker news. The halcyon days of 2010 are long gone. Now we need to apparently be angry at all tech.

> LLMs are amazing and I wish they could train on anything and everything. LLMs are the smartphone to the fax machines of Google search.

Sorry, this is such a (purposefully?) naive take. In reality the thoughts are much more nuanced. For one, open source/free software doesn't exist without copyright. Then there is the whole issue that these companies use vast amounts of copyrighted material to train their models, arguing that all this is fair use. But on the other hand they lock their models behind walls, disallow training on them, keep the training methods and data selection secret...

This tends to be what people disagree with. It feels very much different rules for thee and me. Just imagine how outraged Sam Altman would act if someone leaked the code for Gpt5 and all the training scripts.

If we agree that copyright does not apply to training LLMs, then it should also not apply to the LLMs themselves, and the companies should be required to release all their models and the way of training them.

Ea-Nasir•2mo ago

Does that mean you would support open LLM model training on copyrighted data?

cycomanic•2mo ago

I think that opens several other cans of worms, but in principle I would support a solution that allows using copyrighted materials if it is for the common good (I.e the results are released fully open, means not just weights but everything else).

As a side note, I am definitely not big on IP rights, but I can see the benefits of copyright much more clearly than patents.

EdgeExplorer•2mo ago

My point wasn't supposed to be that copyright is bad (or that it's good), just that the business logic of fighting the sharing of lyrics is incomprehensible to me.

That aside, I think there's a lot more complexity than you're presenting. The issue is who gets to benefit from what work.

As hackers, we build cool things. And our ability to build cool things comes in large part from standing on the shoulders of giants. Free and open sharing of ideas is a powerful force for human progress.

But people also have to eat. Which means even as hackers focused on building cool things, we need to get paid. We need to capture for ourselves some of the economic value of what we produce. There's nothing wrong with wanting to get paid for what you create.

Right now, there is a great deal of hacker output the economic value of which is being captured almost exclusively by LLM vendors. And sure, the LLM is more amazing than whatever code or post or book or lyric it was trained on. And sure, the LLM value comes from the sum of the parts of its source material instead of the value of any individual source. But fundamentally the LLM couldn't exist without the source material, and yet the LLM vendor is the one who gets to eat.

The balance between free and open exchange of ideas and paying value creators a portion of the value they create is not an easy question, and it's not anti-hacker to raise it. There are places where patents and other forms of exclusive rights seem to be criminally mismanaged, stifling progress. But there's also "some random person in Nebraska" who has produced billions of dollars in value and will never see a penny of it. Choosing progress alone as the goal will systematically deprive and ultimately drive away the very people whose contributions are enabling the progress. (And of course choosing "fair" repayment alone as the goal will shut down progress and allow less "fair" players to take over... that's why this isn't easy.)

tharne•2mo ago

I know nuance takes the fun out of most online discussions, but there's a qualitative difference between a bunch of college kids downloading mp3's on a torrent site and a $500 billion company whose goal among other things is to become the primary access point to all things digital.

EnPissant•2mo ago

Should young adults be allowed to violate copyright and no one else? The damages caused seem far worse than an LLM being able to reproduce song lyrics.

Is it simply "we like college kids" and "we hate OpenAI" that dictates this?

I'm ready, hit me with the nuance.

therouwboat•2mo ago

What damages? You can learn lyrics by listening to the song.

EnPissant•2mo ago

Sounds like you agree with me.

shermantanktop•2mo ago

Sometimes, sometimes not.

https://www.kissthisguy.com/

shagie•2mo ago

I'm still trying to work out the lyrics to Prisencolinensinainciusol. https://youtu.be/fU-wH8SrFro

... Alright?

shakna•2mo ago

A young adult who pirates is also more likely to make purchases in that industry, and has an impact that is limited.

A corporation that pirates is more likely to pirate en masse everything that it can get its hands on, in an ongoing manner, and throw everything it can at contesting its right to do so in court.

EnPissant•2mo ago

This is neither true nor relevant.

ghssds•2mo ago

Maybe individuals and corporations are different enough that copyright should not work the same way for both.

delecti•2mo ago

3) Some types of data are more ethical to train on than others.

Training on Wikipedia? Cool! Training on pirated copies of books? Not cool! Training on lyrics? IMO that's on the "cool" side of the line, because the "product" is not the words, it's the composition and mastered song.

saghm•2mo ago

One amusing part of lyrics on Spotify to me is how they don't seem to track which songs are instrumentals or not and use that to skip the message about them not knowing the lyrics. An instrumental will pop up and it will say something like "Sorry, we don't have the lyrics to this one yet".

The only thing funnier than that is when they do have the lyrics to a song that probably doesn't need them, like Hocus Pocus by Focus: https://open.spotify.com/track/2uzyiRdvfNI5WxUiItv1y9?si=7a7...

smelendez•2mo ago

I’ve also seen cases where they list lyrics for a song that doesn’t have any (usually an instrumental jazz version of an old standard).

input_sh•2mo ago

Oh they track that, it's in their API as the "instrumentalness" score: https://developer.spotify.com/documentation/web-api/referenc...

The fact that they don't do anything with that information is unrelated.

saghm•2mo ago

Interesting, especially that it's a probability rather than a boolean! The line can be blurry sometimes (like in the example I mentioned), so it makes sense that it might not be possible to come up with a consistent way of classifying them that everyone would agree with.
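To make the probability-versus-boolean point concrete, here is a minimal sketch of how a client could use such a score to pick a friendlier placeholder. It assumes the track is available as a plain dict carrying an `instrumentalness` value between 0.0 and 1.0; the function name `lyrics_placeholder`, the 0.5 cutoff, and the first message are hypothetical and not taken from any real Spotify client.

```python
def lyrics_placeholder(track, threshold=0.5):
    """Pick the message to show when no lyrics are stored for a track.

    `track` is assumed to be a dict that may contain an "instrumentalness"
    score in [0.0, 1.0]. The threshold is an illustrative assumption.
    """
    score = track.get("instrumentalness")
    if score is not None and score >= threshold:
        # Probably no vocals, so avoid implying that lyrics are missing.
        return "This track appears to be instrumental."
    return "Sorry, we don't have the lyrics to this one yet."
```

Because the score is a probability, any cutoff will misclassify borderline tracks like the one mentioned above, which may be one reason not to act on it at all.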

rpdillon•2mo ago

The content industries should have been the ones to invent LLMs, but their head is so stuck in the past and in regressive thinking about how they protect their revenue streams that they're incapable of innovating. Publishing houses should have been the ones to have researchers looking into how to computationally leverage their enormous corpus of data. But instead, they put zero dollars into actual research and development and paid the lawyers instead. And so it leads to attitudes like this.

Q6T46nT668w6i3m•2mo ago

“The content industries.”

Why would people invest in destroying what they love?

rpdillon•2mo ago

There is no destruction.

sam_lowry_•2mo ago

He meant, the stream of free money from unsuspecting monkeys.

gruez•2mo ago

That's always been the case, eg. how they were latecomers to streaming.

cubefox•2mo ago

Streaming had to compete with digital music piracy. As a result, Spotify is impossibly cheap compared to buying individual albums or singles in the past. So musicians hardly receive any money from recorded music anymore. Nowadays they basically have only concerts left as a means to earn money.

ghssds•2mo ago

The only people seeing themselves as "content creators" are people giving social media stuff so their users get something they can doom scroll. Other people see themselves as artists, entertainers, musicians, authors, etc.

rpdillon•2mo ago

I'm referring to the rent seekers sitting in between the artists and the public.

dragonwriter•2mo ago

> The content industries should have been the ones to invent LLMs

While exclusively-controlled LLMs would be mildly useful to them, the technology existing is dangerous to them, and they already have a surplus supply of content at low cost that they monetize by controlling discovery, gatekeeping, and promotion, so I don't think it makes sense for them to put energy into LLMs even if they had the technical acumen to recognize the possibilities (much the same way that Google, despite leading in developing the underlying technology, had very little incentive to productize it since it was disruptive to their established business, until someone else already did and the choice was to compete on that or lose entirely.)

rpdillon•2mo ago

You have to get ahead of the disruption that will destroy you. At least, if you care about longevity of your company. I realize this isn't always the case.

riazrizvi•2mo ago

Thoughts by someone who doesn’t make a living by songs?

I’m guessing you’d want to restrict lyrics to encourage more plays of the song by people who are motivated to understand them. Along with the artist’s appreciation of that experience of extracting what you’re fascinated by. Burdensome processes generate love and connection to things.

Not everything is a functional commodity to be used and discarded at whim.

alienbaby•2mo ago

Can't they just ask for copies of the lyrics they are not allowed to use and s/lyrics//g the training set? I imagine the volume of text that will be removed would be relatively minuscule.

friendzis•2mo ago

They should ask for lyrics they are allowed to use. The volume of the text that's left would be minuscule.

jeroenhd•2mo ago

That's not a solution for the same reason I'm not allowed to pirate unless movie studios personally ask me not to do so.

Sol-•2mo ago

I think in the end they will just pay off copyright holders. The German GEMA is mostly interested in rent-seeking through whatever means available, it's basically the whole point of the organization.

They'll easily be paid off once all legal avenues are exhausted for OpenAI. Though they'll of course keep fighting in court in the hopes of some more favorable negotiating position.

nba456_•2mo ago

If the copyright costs get too high then we'll just use Chinese AI, unless they try to ban that, too.

Buttons840•2mo ago

You know, I'm a bit of a lyricist myself. These very words are lyrics to a tune in my head, and thus enjoy the increased legal protection of lyrics.

dotdi•2mo ago

I am torn because on one hand, fuck record companies. On the other hand, fuck AI companies torrenting, stealing and defrauding.

gizajob•2mo ago

Yeah it is tricky in the current climate who to say “fuck you” to first. GEMA does at least represent human artists a bit. Nobody I know in music or any other creative industries has given a blanket allowance for AI companies hoovering up their artworks to then regurgitate for profit. Pirating music to hear it is one thing. Cloning it with modifications to resell is a whole other thing.

soulofmischief•2mo ago

Copyright law continues to stifle innovation. The DMCA needs to be abolished, and we need to entirely rethink our modern economic system with respect to creative industries (including software development). The cat is not going back in the bag.

We are sitting on the precipice of the greatest technological advancement in history, and rent-seeking industry titans have convinced us that we must stop this unstoppable technological advancement in order to protect the livelihood of artists who already receive cents on the dollar for their efforts.

Doesn't that sound familiar? This is what they have done time and again, and each time they have lost, leading to a huge loss of potential revenue for creatives as people make use of technological breakthroughs.

Then when companies like Netflix finally get everyone on board with streaming and paying for content with modern conveniences, industry titans step back in to demand larger slices of the pie, until the entire system is ruined and people return to piracy and consumption of older media. Don't even get me started on Spotify.

The technological benefit of modern machine learning models is just too large to ignore. These are becoming important tools, which put power back in the hands of the people, of the consumer. A lot of the grassroots anti-AI movements we see in the creative space can be traced back to corporate propaganda or financial backing. A lot of these people really think they're doing what's best for artists. But I just see Blockbuster all over again. We should make an effort not to be on the wrong side of history.

_DeadFred_•2mo ago

Ah yes, the protection that also protects me, the small person, should I write a song, write a book, come up with a compelling software concept, or come up with a way to improve food growth, should go away because it's 'rent seeking', in order to be replaced with.... rent seeking trillion dollar valuation tech companies?

Prior to the current tech bros economy one of the number one ways average people moved up to being rich in the USA was all enabled purely by the copyright/patents laws protections you want to do away with.

soulofmischief•2mo ago

I said nothing about patents (though I definitely have feelings about software patents you probably wouldn't like), I simply stated that it is farcical and dishonest to build an economic system predicated upon the restriction of first-amendment rights of consumers, such as the DMCA which prevents me from making copies of my files and sharing them with other people, which, outside of national security threats, is an ethically bankrupt proposition.

The reality is that these small people of which you speak are still beholden to a rent-seeking industry that exploits artists en masse. Most creatives historically got next to nothing. Most artists don't get anything close to rich off of their work.

Yet today, we are able to directly support creatives, and many creatives do quite well by managing the long tail and curating a small, but dedicated patronage. Many of these creatives make more money with such a model, while still allowing their work to be shared with proper attribution.

We have just been brainwashed by a century of corporate interests sticking their hand into every facet of the creative industry, convincing us that the systems they've built over decades are the only way for things to be, even if it infringes upon the rights of others.

Also, while I develop software as a trade, I have also been an artist my entire life, working across many mediums, and so my opinions about the intersection of creativity and technology are not just those of some "tech bro", and I don't think that kind of framing is productive or fair. Especially considering I grew up poor and homeless as a teenager, and have had to reckon with the economic prospects of which you speak much more closely than most.

blogus•2mo ago

I would think as a matter of practice AI companies would attempt to detect long strings that appeared frequently in their corpus and dedup them out. There isn’t any value in training over and over again on the same data, and the copyright danger of being able to exactly reproduce your training set is obvious. Perhaps they did it intentionally, using the ability to reproduce copyrighted material as a way to get customers early on, knowing they would have to pay a paltry fee for it later.
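A rough sketch of the kind of deduplication described above, assuming the corpus fits in memory as a list of strings. The function names, the word-window size `n`, and the `min_docs` cutoff are all hypothetical choices for illustration; this is not a description of what any AI company actually does.

```python
from collections import Counter

def shingles(text, n=8):
    """Yield every run of n consecutive whitespace-separated words."""
    words = text.lower().split()
    for i in range(len(words) - n + 1):
        yield " ".join(words[i:i + n])

def frequent_shingles(corpus, n=8, min_docs=3):
    """Return the shingles that occur in at least min_docs documents."""
    doc_counts = Counter()
    for doc in corpus:
        # Use a set so a document counts each shingle only once.
        doc_counts.update(set(shingles(doc, n)))
    return {s for s, c in doc_counts.items() if c >= min_docs}

def dedup(corpus, n=8, min_docs=3):
    """Keep the first document with each frequent passage, drop later ones."""
    frequent = frequent_shingles(corpus, n, min_docs)
    seen = set()
    kept = []
    for doc in corpus:
        hits = frequent.intersection(shingles(doc, n))
        if hits & seen:
            continue  # repeats a passage that an earlier kept document has
        seen |= hits
        kept.append(doc)
    return kept
```

Dropping whole documents is the bluntest option. A real pipeline would more likely cut out only the repeated span and would use hashing or suffix arrays to scale, but the idea of counting long repeated strings is the same.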

mleroy•2mo ago

A key takeaway from this ruling is that "the systems contain copies of the original works." Does this mean that offering any open-weight model capable of reproducing copyrighted text snippets or lyrics will be prohibited? That would be a big setback for AI development in the EU.

moontear•2mo ago

That's what the New York Times lawsuit is about - OpenAI reproducing complete texts of NYT articles without paying for the reproduction of said articles. This is not just an EU issue, but an unresolved legal grey zone for the whole AI market.
