As a (creative) friend of mine flatly said, they refuse to use an LLM until it can prove where it learned something from and cite its original source. Artists and creatives can cite their inspirational sources, while LLMs, by design, cannot (because their developers care about output, not credit). To them, that's the line in the sand, and I think that's a reasonable one given that not a single creative in my circles has been paid by these multi-billion-dollar AI companies for the unauthorized use of their works in training these models.
"See, those developers themselves have used CoPilot, so they approve the copyright infringement."
Even humans have a lot of internalized unconscious inspirational sources, but I get your point.
What fair use? Were the books promised to them by god or something?
True, but not the only relevant thing.
If the output of the LLM is "not very different from the original work" then the output could be the infringement. Putting a hypercomplex black box between the source work and the plagiarised output does not in itself make it "not infringing". The "LLM output as a service" business is then based on selling something built on other people's work, that they do not have rights to.
It's falling for misdirection, "pay no attention to the LLM behind the curtain" to think otherwise.
I will disagree with that characterisation. IMHO: In some cases no, it's not different, there are clear lines from inputs to outputs. In some cases yes, it's different from any one input work, it's distributed micro-plagiarism of a huge number of sources. In no case is it original.
But I think that this is legally undecided and won't be decided by you or me, and it is going to be a more interesting and relevant question than "is the LLM model very like the original work?", which it clearly isn't. That's like asking "is this typewriter like this novel?" It can't be, but what you typed with it could be.
Secondly, there's an argument that the infringement happens only when the LLM produces output based in part or whole on the source material.
In other words, training a model is not infringing in itself. You could "research" with it. But selling the output as "from your model" is highly suspect. Your business is then based on selling something built on other people's work, that you do not have rights to.
We need to frame this case - and the ongoing artist-vs-AI stuff - using a pseudoscience headline I saw recently: 'the average person reads 60k words/day'.
I won't bother sourcing this, because I don't think it's true, but it illustrates the key point: consumers spend X amount of time/day reading words.
> It seems like the authors are setting up for failure by making the case about whether the AI generation hinders the market for books. AI book writing is such a tiny segment of what these models do that if needed Meta would simply introduce guard rails to prevent copying the style of an author and continue to ingest the books.
and from the article:
> When he turned to the authors’ legal team, led by high-profile attorney David Boies, Chhabria repeatedly asked whether the plaintiffs could actually substantiate accusations that Meta’s AI tools were likely to hurt their commercial prospects. “It seems like you’re asking me to speculate that the market for Sarah Silverman’s memoir will be affected,” he told Boies. “It’s not obvious to me that is the case.”
The market an author (or any other artist type) is competing with Meta for is not 'what if an AI wrote celebrity memoirs?'. Meta isn't about to start a print publishing division.
Authors are competing with Meta for 'whose words did you read today?' Were they exclusively Meta's - Instagram comments, Whatsapp group chat messages, Llama-generated slop, whatever - or did an author capture any of that share?
The current framing is obviously ludicrous; it also does the developers of LLMs (the most interesting literary invention since... how long ago?) a huge disservice.
Unfortunately the other way of framing it (the one I'm saying is correct) is (probably) impossible to measure (unless you work for Meta, maybe?) and, also, almost equally ridiculous.
To make fair use of a book's passage, you have to cite it. The excerpt has to be reasonably small.
Without fair use, it would not be possible to write essays and book reviews that give quotes from books. That's what it's for. Not for having a machine read the whole book so it can regurgitate mashups of any part of it without attribution.
Making a parody is a kind of fair use, but parodies are original expression based on a certain structure of the work.
That's not true. That's what's required for something not to be plagiarism, not for something not to be copyright infringement.
Fair use is not at all the same as academic integrity, and while academic use is one of the fair use exceptions, it's only one. The most you would have to do with any of the other fair use exceptions is credit where you got the material (not cite individual passages), because you're not necessarily even using those passages verbatim.
Why? Was it legal for me to download copyrighted songs from Limewire as "fair use"? Because a few people were made examples of.
I'm a musician, so 80% of the music I listen to is for learning so it's fair use, right? ;)
I would be happy with that outcome. I’m a fanfiction writer, and a lot of the stories I read are very much for learning. ;-)
> At times, it sounded like the case was the authors’ to lose, with [Judge] Chhabria noting that Meta was “destined to fail” if the plaintiffs could prove that Meta’s tools created similar works that cratered how much money they could make from their work. But Chhabria also stressed that he was unconvinced the authors would be able to show the necessary evidence. When he turned to the authors’ legal team, led by high-profile attorney David Boies, Chhabria repeatedly asked whether the plaintiffs could actually substantiate accusations that Meta’s AI tools were likely to hurt their commercial prospects. “It seems like you’re asking me to speculate that the market for Sarah Silverman’s memoir will be affected,” he told Boies. “It’s not obvious to me that is the case.”
> When defendants invoke the fair use doctrine, the burden of proof shifts to them to demonstrate that their use of copyrighted works is legal. Boies stressed this point during the hearing, but Chhabria remained skeptical that the authors’ legal team would be able to successfully argue that Meta could plausibly crater their sales. He also appeared lukewarm about whether Meta’s decision to download books from places like LibGen was as central to the fair use issue as the plaintiffs argued it was. “It seems kind of messed up,” he said. “The question, as the courts tell us over and over again, is not whether something is messed up but whether it’s copyright infringement.”
Now that big capital wants to steal from individuals, big capital wins again.
(Unrelatedly, has Boies ever won a high profile lawsuit? I remember him from the Bush/Gore recount issue, where he represented the Democrats.)
Here's this:

> Boies also was on the Theranos board of directors, raising questions about conflicts of interest. Boies agreed to be paid for his firm's work in Theranos stock, which he expected to grow dramatically in value.
https://en.wikipedia.org/wiki/David_Boies
That was one of the decisions of all time.
[0] https://en.m.wikipedia.org/wiki/Dowling_v._United_States_(19...
They seek to convert them into more products. The needs of the copyright holders, who are relatively small businesses and individuals, are outweighed by the needs of Meta.
Sarah wanting to watch a movie or listen to music... Too bad she doesn't have an elite team of lawyers to justify whatever she wants.
In practice Meta has the money to stretch this out forever and at most pay inconsequential settlements.
YouTube largely did the same thing: knowingly violate copyright law, stack the deck with lawyers, and fix it later.
It's going to take centuries to undo the damage wrought by IP-supported private enterprise. And now we also have to put up with fucking chatbots. This is the worst timeline.
You are free to copy bytes as you see fit, and the internet treats them identically whether they are random noise or whether a codec can turn them into music, film, books, or whatever inspires you.
The problem is that some humans, justifying their behavior by claiming it as "official", may act out with violence against you if they (rightly or wrongly, that's important to note) perceive that your actions are causing the internet to copy bytes to which they object.
Enduring nonviolence is likely yet ahead as consensus grows over the end of the legitimacy of these legacy states.
edit: i'm serious. many americans would be much happier taking this option if they knew it existed. i may take it myself
I have this debate with a friend of mine. He's terrified of AI making all of our jobs obsolete. He's a brilliant musician and programmer both, so he's both enthused and scared. So let's go with the Swift example they use.
Performance artists have always tried to cultivate an image, an ideal, a mythos around the character(s). I've observed that as the music biz has gotten tougher, the practice of selling merch at shows has ramped up. Social media is monetized now. There's been a big diversification in the effort to make more money from everything surrounding the music itself. So too will it be with artists.
You're starting to see this already. Artists who got big not necessarily because of the music, but because of the weird cult of personality they built. One who comes to mind is Poppy, who ironically enough built a cult of personality around being a fake AI bot...
https://en.wikipedia.org/wiki/Poppy_(singer)
You've definitely got counter-examples like Hatsune Miku - but the novelty of Miku was because of the artificiality (within a culture that, like, really loves robots and shit). AI pop stars will undoubtedly produce listenable records and make some money, but I don't expect that they will be able to replace the experience of fans looking for a connection with an artist. Watch the opening of a Taylor Swift concert, and you'll probably get it.
Has making music for a living ever not been tough?
Fair.
> I think that argument is further hampered (taylor being an exception) by the fact that most pop stars already don't write their own songs.
That accounts for the big artists on the radio (yes, some people listen to that). But what about everyone else? I would posit that most artists (the one-hit wonders, the ones without radio success, etc.) write their own songs. It seems like there are acts who make a go of it just fine, who write their own songs and really nail the connection with fans. I would point to a regional band near me: Mr. Blotto.
1. Training AI on freely available copyrighted material - Ambiguous legality, not really tested in court. AI doesn't actually directly copy the material it trains on, so it's not easy to make this ruling.
2. Circumventing payment to obtain copyrighted material for training - Unambiguously illegal.
Meta is charged with doing the latter, but it seems the plaintiffs want to also tie in the former.
https://archive.is/Hg4Xr