As a (creative) friend of mine flatly said, they refuse to use an LLM until it can prove where it learned something from and cite its original source. Artists and creatives can cite their inspirational sources, while LLMs, by design, cannot (because their developers care about output, not credit). To them, that's the line in the sand, and I think that's a reasonable one given that not a single creative in my circles has been paid by these multi-billion-dollar AI companies for the unauthorized use of their works in training these models.
"See, those developers themselves have used CoPilot, so they approve the copyright infringement."
Even humans have a lot of internalized unconscious inspirational sources, but I get your point.
What fair use? Were the books promised to them by god or something?
True, but not the only relevant thing.
If the output of the LLM is "not very different from the original work" then the output could be the infringement. Putting a hypercomplex black box between the source work and the plagiarised output does not in itself make it "not infringing". The "LLM output as a service" business is then based on selling something built on other people's work, that they do not have rights to.
It's falling for misdirection, "pay no attention to the LLM behind the curtain" to think otherwise.
I will disagree with that characterisation. IMHO: In some cases no, it's not different, there are clear lines from inputs to outputs. In some cases yes, it's different from any one input work, it's distributed micro-plagiarism of a huge number of sources. In no case is it original.
But I think that this is legally undecided and won't be decided by you or me, and it is going to be a more interesting and relevant question than "is the LLM model very like the original work?", which it clearly isn't. That's like asking "is this typewriter like this novel?" It can't be, but what you typed with it could be.
Secondly, there's an argument that the infringement happens only when the LLM produces output based in part or whole on the source material.
In other words, training a model is not infringing in itself. You could "research" with it. But selling the output as "from your model" is highly suspect. Your business is then based on selling something built on other people's work, that you do not have rights to.
We need to frame this case - and the ongoing artist-vs-AI stuff - using a pseudoscience headline I saw recently: 'the average person reads 60k words/day'.
I won't bother sourcing this, because I don't think it's true, but it illustrates the key point: consumers spend X amount of time/day reading words.
> It seems like the authors are setting up for failure by making the case about whether the AI generation hinders the market for books. AI book writing is such a tiny segment of what these models do that if needed Meta would simply introduce guard rails to prevent copying the style of an author and continue to ingest the books.
and from the article:
> When he turned to the authors’ legal team, led by high-profile attorney David Boies, Chhabria repeatedly asked whether the plaintiffs could actually substantiate accusations that Meta’s AI tools were likely to hurt their commercial prospects. “It seems like you’re asking me to speculate that the market for Sarah Silverman’s memoir will be affected,” he told Boies. “It’s not obvious to me that is the case.”
The market an author (or any other artist type) is competing with Meta for is not 'what if an AI wrote celebrity memoirs?'. Meta isn't about to start a print publishing division.
Authors are competing with Meta for 'whose words did you read today?' Were they exclusively Meta's - Instagram comments, Whatsapp group chat messages, Llama-generated slop, whatever - or did an author capture any of that share?
The current framing is obviously ludicrous; it also does the developers of LLMs (the most interesting literary invention since... how long ago?) a huge disservice.
Unfortunately the other way of framing it (the one I'm saying is correct) is (probably) impossible to measure (unless you work for Meta, maybe?) and, also, almost equally ridiculous.
To make fair use of a book's passage, you have to cite it. The excerpt has to be reasonably small.
Without fair use, it would not be possible to write essays and book reviews that give quotes from books. That's what it's for. Not for having a machine read the whole book so it can regurgitate mashups of any part of it without attribution.
Making a parody is a kind of fair use, but parodies are original expression based on a certain structure of the work.
That's not true. That's what's required for something not to be plagiarism, not for something not to be copyright infringement.
Fair use is not at all the same as academic integrity, and while academic use is one of the fair use exceptions, it's only one. The most you would have to do with any of the other fair use exceptions is credit where you got the material (not cite individual passages), because you're not necessarily even using those passages verbatim.
Why? Was it legal for me to download copyrighted songs from Limewire as "fair use"? Because a few people were made examples of.
I'm a musician, so 80% of the music I listen to is for learning so it's fair use, right? ;)
I would be happy with that outcome. I’m a fanfiction writer, and a lot of the stories I read are very much for learning. ;-)
> At times, it sounded like the case was the authors’ to lose, with [Judge] Chhabria noting that Meta was “destined to fail” if the plaintiffs could prove that Meta’s tools created similar works that cratered how much money they could make from their work. But Chhabria also stressed that he was unconvinced the authors would be able to show the necessary evidence. When he turned to the authors’ legal team, led by high-profile attorney David Boies, Chhabria repeatedly asked whether the plaintiffs could actually substantiate accusations that Meta’s AI tools were likely to hurt their commercial prospects. “It seems like you’re asking me to speculate that the market for Sarah Silverman’s memoir will be affected,” he told Boies. “It’s not obvious to me that is the case.”
> When defendants invoke the fair use doctrine, the burden of proof shifts to them to demonstrate that their use of copyrighted works is legal. Boies stressed this point during the hearing, but Chhabria remained skeptical that the authors’ legal team would be able to successfully argue that Meta could plausibly crater their sales. He also appeared lukewarm about whether Meta’s decision to download books from places like LibGen was as central to the fair use issue as the plaintiffs argued it was. “It seems kind of messed up,” he said. “The question, as the courts tell us over and over again, is not whether something is messed up but whether it’s copyright infringement.”
Now that big capital wants to steal from individuals, big capital wins again.
(Unrelatedly, has Boies ever won a high profile lawsuit? I remember him from the Bush/Gore recount issue, where he represented the Democrats.)
Here's this:

> Boies also was on the Theranos board of directors, raising questions about conflicts of interest. Boies agreed to be paid for his firm's work in Theranos stock, which he expected to grow dramatically in value.
https://en.wikipedia.org/wiki/David_Boies
That was one of the decisions of all time.
[0] https://en.m.wikipedia.org/wiki/Dowling_v._United_States_(19...
They seek to convert them into more products. The needs of the copyright holders, who are relatively small businesses and individuals, are outweighed by the needs of Meta.
Sarah wanting to watch a movie or listen to music... Too bad she doesn't have an elite team of lawyers to justify whatever she wants.
In practice Meta has the money to stretch this out forever and at most pay inconsequential settlements.
YouTube largely did the same thing: knowingly violate copyright law, stack the deck with lawyers, and fix it later.
It's going to take centuries to undo the damage wrought by IP-supported private enterprise. And now we also have to put up with fucking chatbots. This is the worst timeline.
You are free to copy bytes as you see fit, and the internet treats them identically whether they are random noise or whether a codec can turn them into music, film, books, or whatever inspires you.
The problem is that some humans, justifying their behavior by claiming it as "official", may act out with violence against you if they (rightly or wrongly, that's important to note) perceive that your actions are causing the internet to copy bytes to which they object.
Enduring nonviolence is likely yet ahead as consensus grows over the end of the legitimacy of these legacy states.
edit: i'm serious. many americans would be much happier taking this option if they knew it existed. i may take it myself
I have this debate with a friend of mine. He's terrified of AI making all of our jobs obsolete. He's a brilliant musician and programmer both, so he's both enthused and scared. So let's go with the Swift example they use.
Performance artists have always tried to cultivate an image, an ideal, a mythos around the character(s). I've observed that as the music biz has gotten tougher, the practice of selling merch at shows has ramped up. Social media is monetized now. There's been a big diversification in the effort to make more money from everything surrounding the music itself. So too will it be with artists.
You're starting to see this already. Artists who got big not necessarily because of the music, but because of the weird cult of personality they built. One who comes to mind is Poppy, who ironically enough built a cult of personality around being a fake AI bot...
https://en.wikipedia.org/wiki/Poppy_(singer)
You've definitely got counter-examples like Hatsune Miku - but the novelty of Miku was because of the artificiality (within a culture that, like, really loves robots and shit). AI pop stars will undoubtedly produce listenable records and make some money, but I don't expect that they will be able to replace the experience of fans looking for a connection with an artist. Watch the opening of a Taylor Swift concert, and you'll probably get it.
Has making music for a living ever not been tough?
Fair.
> I think that argument is further hampered (taylor being an exception) by the fact that most pop stars already don't write their own songs.
That accounts for the big artists on the radio (yes, some people listen to that). But what about everyone else? I would posit that most artists (the one-hit wonders, the ones without radio success, etc.) write their own songs. It seems like there are acts who make a go of it just fine, who write their own songs and really nail the connection with fans. I would point to a regional band near me: Mr. Blotto.
1. Training AI on freely available copyrighted material - Ambiguous legality, not really tested in court. AI doesn't actually directly copy the material it trains on, so it's not easy to make this ruling.
2. Circumventing payment to obtain copyrighted material for training - Unambiguously illegal.
Meta is charged with doing the latter, but it seems the plaintiffs want to also tie in the former.
https://archive.is/Hg4Xr