Federal judge rules copyrighted books are fair use for AI training

https://www.nbcnews.com/tech/tech-news/federal-judge-rules-copyrighted-books-are-fair-use-ai-training-rcna214766

31•JSLegendDev•7mo ago

Comments

toomuchtodo•7mo ago

I remember the folks here who were dragging the Internet Archive for controlled digital lending, just trying to be a digital library, like it was infringing on authors by attempting to get back what was taken by publishers requiring libraries to buy licenses for ebooks that expired and had to be repurchased time and time again. “Universal access to all knowledge.”

Now, with judicial opinions on this fair use firming up, I am hopeful this will allow them to train on every book they’ve ever acquired and release those models to the world.

jplusequalt•7mo ago

>I hope they train on every book they’ve ever acquired and release those models to the world

Yay, more AI slop to pollute the internet with.

toomuchtodo•7mo ago

Was Google Search slop?

Edit: I respect your position to not engage. We can just do things regardless of those who would disagree (which is good imho when the risk of harm is low and the benefit potentially high), in this case while still adhering to the law as set forth by a court. The judicial interpretation guides that you can train if you own a copy of the work; build accordingly.

When Search Engine Services meet Large Language Models: Visions and Challenges - https://arxiv.org/html/2407.00128v1

jplusequalt•7mo ago

It is now.

I'm not going to engage with anyone who tries to force a weak comparison between an LLM and a different kind of technology. I think we as developers and scientists are well aware that these LLMs are too different from what's come before for such a comparison to work.

JumpCrisscross•7mo ago

> folks here who were dragging the Internet Archive for controlled digital lending, just trying to be a digital library

The problem with the Internet Archive was it jumped into uncontrolled lending. Basically, there was no practical reason to buy a book that the Internet Archive would "lend" you. That simply isn't true for an LLM citing a book, which may still cause me to read it.

toomuchtodo•7mo ago

Disclosure: No affiliation besides sending them financial support as well as physical collections. I am a supporter, to put it mildly, because we need libraries that cannot be burned.

[section removed; refer to harshreality's context in sibling comment, which is more thorough than what I had commented]

As a reasonable person who believes one should be reasonable until they can no longer be reasonable, as a hacker, this is a green light to be unreasonable as it relates to onerous copyright regulations infringing on the commons. Copyright stakeholders took too much from the commons, and this enables people to take back (with a combination of judicial review and technology).

https://www.library.upenn.edu/news/hachette-v-internet-archi...

https://blog.archive.org/tag/controlled-digital-lending/

https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...

JumpCrisscross•7mo ago

> While there was a brief period during COVID where lending restrictions were removed from the lending system due to quarantines and shutdowns (and you could quite literally not get into some physical libraries)

This is the moment their liability accelerated. It was stupid, impulsive and put--and may continue to put--the whole project at risk.

toomuchtodo•7mo ago

Well, they made a bold move to protect what they believe in as a bona fide library [1], and while it was not successful, I am more than happy to see a bigger bully come along and even things out. Copyright holders aren't just going to allow fair use, or let profits go uncaptured, ya know? They're going to keep taking until there is nothing left and everything is locked behind a paywall for ~150 years [2] (life + 70 years).

Edit (wrt your reply): Drop into a weekly lunch at the SF location and have a conversation on this. Additional context is always helpful, I find.

[1] https://blog.librarylaw.com/librarylaw/2007/07/internet-arch...

[2] https://www.copyright.gov/help/faq/faq-duration.html

JumpCrisscross•7mo ago

> Copyright holders aren't just going to allow fair use

What they allow doesn't matter. Under controlled lending, the Archive was operating within precedent. I'm not against launching a test case for uncontrolled lending. But doing it with the entire library, irrespective of publication date or jurisdiction, and through the main organisation was just stupid.

mistrial9•7mo ago

People were in a panic under lockdown here in urban California, and elsewhere. The whole of the Internet Archive is experimental and out-of-the-box. It is beyond unjust to watch corporate data hoarders profit at large while IA gets excoriated by fellow "smarty" people. IA is wonderful in their weird ways.

JumpCrisscross•7mo ago

> beyond unjust to watch corporate data hoarders profit at large while IA gets excoriated by fellow "smarty" people. IA is wonderful in their weird ways

I love the Archive. And as I said above, I'm not fundamentally against uncontrolled lending as a legal test case. But doing it with the entire catalogue, instead of just e.g. older books, and doing it out of the main organisation such that there is no legal segregation between the experiment and the entire project struck me as incredibly impulsive. They're wonderful and weird. But perhaps not the best steward of what's pitched as an archive.

harshreality•7mo ago

During the initial COVID lockdown, IA implemented their National Emergency Library lending program, which is the only uncontrolled (not 1:1) lending of works known to be under copyright that I know of:

1. Almost all physical libraries, including at universities, had shut down, and IA's efforts were to compensate for that. Supply of previously-purchased-for-public-use books had dried up overnight.

2. Normal 1:1 lending was not observed, but DRM was still applied. No books lent during that period were usable outside of a narrow window without the end user intentionally circumventing the DRM, which is a distinct crime and isn't trivial for the average computer user.

3. IA entered into dialogue with major academic publishers before implementing the emergency library. They published an opt-out address on their blog. That wasn't as visible to all publishers or independent authors as it probably should have been, but everyone knows if it had been opt-in instead, authors and publishers all would have reflexively refused, despite the unique situation. Mainstream publishers and most authors simply do not care if people can't access library books for a month or two. They would ban libraries if they could. They're so high on their intellectual property rights they think those can't be relaxed no matter what.

4. As far as I know, all NEL books were scanned PDFs, from donated library book collections. Casual readers don't read scanned books for entertainment. The idea that this significantly displaced kindle purchases, for instance, or lending of real ebooks from local libraries, needs a citation.

There's plenty of additional background here: https://blog.archive.org/national-emergency-library/

mosdl•7mo ago

There is a big difference between the public being able to read the books and using them to build a commercial product. Especially if that product will be used to generate commercial work that competes with your work.

jjk166•7mo ago

Is there? Copyright is not supposed to prevent the generation of commercial work that might compete with your own.

subscribed•7mo ago

Llama is open weight and they have been caught training on torrents of the copyrighted books, so you can use it already :)

criddell•7mo ago

Legally acquired copyrighted books.

> Despite siding with the AI company on fair use, Alsup wrote that Anthropic will still face trial for the pirated copies it used to create its massive central library of books used to train AI.

meristohm•7mo ago

Huh. Us meatbags are not just artificial intelligences, we're organic intelligences and thus more important than robots (who are not alive; even if we make golems and infuse them with "life", they are not human animals), and all of us are in training throughout our lives, so this means training on copyrighted material is fair use.

Edit: I see another commenter, presumably human, clarified: "legally-acquired copyrighted books". Even with the arguments about AI being potentially helpful to disabled humans, one healthier route is to help each other out directly instead of dividing and conquering with technology, in the name of helping. Feels like one of the aims of Capitalists is to put us each into our Matrix (1999 movie) battery capsules and bleed us dry while we're distracted.

phendrenad2•7mo ago

While this may seem like a big issue, it's really to be expected. Asking judges to rule on new technology applied to old laws is like asking a bus driver to design an energy-efficient bus motor. Judges are technicians, not scientists. They apply the law, they can't think creatively about new laws. For that, we have experts (political think-tanks). And I'm sure political think-tanks are kicking into overdrive as they realize the ramifications of this ruling. This will have an impact, but it's hard to determine what that will be. To some degree, it will disincentivize writing books. If this ruling only applies to SELLING books, then some people will make their books subscription-only, and will test this law against that (do AI companies need a perpetual subscription if they've trained an AI on a subscription-only book?). If AI companies are the primary consumers of books, and everyone just gets their information from AI, then direct-to-consumer books will cease to be a thing, and authors will sell their books directly to AI companies for $100,000 or $1,000,000.

JumpCrisscross•7mo ago

> judges are technicians, not scientists. They apply the law, they can't think creatively about new laws

You've never read an opinion that required creativity?

phendrenad2•7mo ago

THAT's the part of my post you disagree with? lmao

JumpCrisscross•7mo ago

> THAT's the part of my post you disagree with?

Yes. Judges in a common-law system don't just "apply the law," they literally make law. Treating judges as automatons fundamentally misunderstands their role, which makes any predictions based on that misunderstanding likely specious.

phendrenad2•7mo ago

I think we're getting hung up on a technicality here, so I'll concede that judges "make law" in whatever way you say they do.

And I don't just base my predictions on that understanding, it's more of a pass-through thing. I base my predictions on observed reality, where there is a legislative branch that makes laws and a judicial branch that interprets them within narrow confines (supposedly, they basically have free rein to go wild and say up is down and down is up, but that rarely happens because they don't want to be laughed at in public).

megaman821•7mo ago

I don't think there should be many (if any at all) copyright restrictions on training. This doesn't give AIs the right to violate copyright laws in their output. Plus, stopping AIs from training may also stop the mere gathering of metadata on materials, like word frequency, genre or protagonist age.

gnabgib•7mo ago

Discussion (166 points, 1 day ago, 197 comments) https://news.ycombinator.com/item?id=44367850

heavyset_go•7mo ago

This might lead to some creators and publishers silo-ing off valuable content in tightly controlled environments. Tightly controlled in terms of both DRM used to prevent screen/web scraping and potential contractual obligations restricting use for training if granted access.

Think Blu-ray DRM but for more than video; it's already happened with publishers and college textbooks.

mmonaghan•7mo ago

It's the right move and authors + publishers should be rooting for it. Either your work lives in the corpus of human knowledge, which AIs will increasingly reflect more perfectly over time, or you're forgotten. You've also got precedent that they have to pay for access to your work like any other human.

As long as they're not violating copyright laws in output, it's fine and good.

bgwalter•7mo ago

You have a nice book there. It would be a shame if something happened to it!

bgwalter•7mo ago

“Consistent with copyright’s purpose in enabling creativity and fostering scientific progress, ‘Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them — but to turn a hard corner and create something different.’”

More smug nonsense emanating from Misanthropic. There is no creativity that is enabled. People tweak the prompts like children until something that was stolen from others emerges.

Most people working on "AI" have never created anything substantial. They rely on utilizing other people's creations. It is very sad that Alsup caves to big tech and issues a vibe ruling.

DamnInteresting•7mo ago

I say this as an author who has definitely had a lot of my work slurped up by these machine-learning goblins: This was the right call. I learned to write by reading other authors' works, so I'd have to be quite the hypocrite to stop others from learning from me. Still, it makes me sad and tired to know that I'm unwittingly training my own replacement--one that will never be sad or tired itself.

In my view, one real gray area is in image/video generation, especially "x in the style of y" kinds of shenanigans. As a society we may need to consider some better protections for an artist's/studio's style, otherwise distinct and novel and interesting styles will become watered down into a sea of bland mimicry until the sweet release of the heat death of the universe.

123yawaworht456•7mo ago

>In my view, one real gray area is in image/video generation, especially "x in the style of y" kinds of shenanigans.

it's not a gray area when humans do it, so it's not a gray area

well over 50% of those obnoxiously loud anti-ai artists make a living off fan art, and until 3 years ago, copyright concerns would be scoffed at
