
Anthropic Faces Potentially "Business-Ending" Copyright Lawsuit

https://www.obsolete.pub/p/anthropic-faces-potentially-business
60•Invictus0•11h ago

Comments

biglyburrito•10h ago
Oh no. Anyway...
dowager_dan99•9h ago
The courts have been pretty clear in this area so far, siding with some variation on "progress" over the ownership argument. It definitely feels like a "too big to fail" scenario at this stage.

>> The judge ruled last month, in essence, that Anthropic's use of pirated books had violated copyright law

This is not what certification of a class action lawsuit means. It's procedural, not substantive, and doesn't weigh in on the merits of the lawsuit. It's about the mechanics of bringing an action on behalf of a potentially huge group of class members. The article then goes on to speculate about how Anthropic would post bond for any billions that might be awarded while trying to fundraise, so the clickbait title is backed up with "what if" fan fiction and there's nothing substantive here.

reverendsteveii•8h ago
>It definitely feels like a "too big to fail" scenario at this stage

The courts are just reading the room. If you're a judge appointed by a guy who wants it to be illegal to regulate AI at all, you're not gonna be too keen on regulating AI.

dcre•9h ago
The article responds to its own headline with “...but not really”:

> While the risk of a billion-dollar-plus jury verdict is real, it’s important to note that judges routinely slash massive statutory damages awards — sometimes by orders of magnitude. Federal judges, in particular, tend to be skeptical of letting jury awards reach levels that would bankrupt a major company.

daveguy•9h ago
Would Anthropic be considered a "major company"? Paper value is not the same thing as being a Microsoft, Google, or even OpenAI. Making an example of one company to set precedent and get the "major companies" to stop stealing from creators might be allowed. Not saying they'll allow a bankrupting judgement, but I don't think it will be dropped to slap-on-the-wrist level either.
ijk•8h ago
Anthropic is arguing, among other things, that bankrupting the smallest major AI player while all the bigger ones do the same thing should be a reason to reduce the consequences.
daveguy•8h ago
Well, I agree with that argument in the sense that their punishment should be relative to their size. But I was just talking about plaintiff strategy. A judgement against Anthropic of any magnitude might open the floodgates for class action lawsuits by creators against major tech companies. I also think that's a good thing, because AI models are clearly laundering work done by original authors.
reverendsteveii•8h ago
This might not be the right way to highlight it, but I personally am very interested in the ways in which the prescribed penalties for breaking the law greatly diverge from the actual penalties inflicted, with the variance being directly correlated to the offender's budget for legal defense. Doubly so because giving one appointed official the final say over actual community members kinda feels like an inversion of the way our government claims to work and a backdoor implementation of different laws for different people.
lvl155•9h ago
Just another pathetic money grab attempt in the AI space.
Incipient•9h ago
Even IF they were awarded such damages, there would be a queue of rich people lining up to foot the bill to take a stake in Anthropic.
TuringNYC•8h ago
This. And I think the US Gov will step up and decide that Gen AI is too critical to disrupt for continued US hegemony vs China.
tonyhart7•8h ago
You're acting like copyright would stop China from doing the same thing.
TuringNYC•8h ago
That is my point -- China will proceed forward regardless. The US Government will likely squash this lawsuit to compete with China.
CuriouslyC•8h ago
Yeah, zero percent chance we don't get lawmakers and the executive stepping in to squash this if the courts don't rule the way they want.
reverendsteveii•8h ago
Yeah, we're getting wartime government when it comes to AI. We're getting a government of endless special cases, "here's why the law shouldn't apply this time" every time.
dfedbeef•8h ago
Here's the thing: it's probably not critical for creative works. You can train on open source code, sure. Reddit? Sure, maybe.

But creative works... It's muddy. Just get a fucking license for copyrighted shit. It's really not even as difficult as people think; every publishing industry is so consolidated now that it would be entirely possible to work out some industry standard AI training license for publishers to opt into and pay some pittance to creators like Spotify does. Creators then have to opt in if they want a large publisher to publish their shit.

You can still self-publish and retain all your rights, but 99% of people won't do it because... Idk, they're lazy. Look at the music industry. Other than, like, Insane Clown Posse, NOFX, and $uicideboy$, most bands just sign with a label and let them work out the copyright stuff.

You then sign a deal with the label to train on the works where they own the publishing right. It's opt in for the creators, people can still make money from creating things. It's the same in the film business. If you want your movie in theatres you sign a deal with some publisher. There aren't really that many.

Has literally nobody in tech considered how doable this is? Or are tech people just so used to being able to do anything they want that nobody looked into it?

dfedbeef•8h ago
Uh... no.

Copyright violations are something like $150k per instance, so each time it was copied. Picture a bunch of bootleg DVDs: the fine is not one-time for 7 DVDs, it's 7 x $150k.

Anthropic will potentially ruin the value of every work they trained on. Google should have been fined too, tbh. Napster got shut down but Google gets a pass? It's nonsense. The recording industry just had more weight to throw around, and I guess nobody gives a shit about books.

I doubt anyone would be stoked to foot a bill for trillions of dollars of damages when an AI company is hit for training on millions of copyrighted works. Multiply the number of works by $100,000 to get a floor on the maximum statutory damages.
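For scale, here's a rough back-of-the-envelope sketch of that multiplication. The 7 million figure for pirated works comes from a comment further down the thread, and the per-work amounts are the general US statutory-damages range ($750 minimum, up to $150,000 for willful infringement); treat both as assumptions, not findings from the case.

```python
# Back-of-the-envelope statutory damages (illustrative assumptions, not case figures).
works = 7_000_000            # assumed count of pirated works (figure cited downthread)
per_work_minimum = 750       # general US statutory minimum per infringed work
per_work_willful = 150_000   # general US statutory maximum for willful infringement

print(f"floor:   ${works * per_work_minimum:,}")   # floor:   $5,250,000,000
print(f"ceiling: ${works * per_work_willful:,}")   # ceiling: $1,050,000,000,000
```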

dfedbeef•8h ago
Don't be mad at me, be mad at Disney and Sonny Bono.
comrade1234•9h ago
Why is it fair use (hashed out in court already) for Google to copy every book they can get a hold of, store the full pages, and use them to create their n-grams data and presumably to train their AI, but not for this company?

If they had bought each book themselves would it be fair use? So this is only about the piracy?

sidewndr46•9h ago
The way I understand the case, yes. If Anthropic had just one copy of each book, that would be permitted. Apparently they at some point bought copies of most books, then shredded them. I'm not entirely sure how that makes any difference, but it was done after the piracy.
qgin•9h ago
At some point this starts to feel like witchcraft spells.
ACCount36•8h ago
Machine learning is 20% programming, 30% math and 50% demon summoning.
danaris•8h ago
That may be so, but this is really more about the law.

So, y'know, probably more like 80% demon summoning.

s1mplicissimus•8h ago
> Apparently they at some point bought copies of most books then shredded them.

An awfully convenient explanation of what went down. Gives some good "dog ate my homework" vibes.

The double standard vs. Google etc. is of course despicable.

tonyhart7•8h ago
they would target google too

but google money is so big that they could just buys the entire thing + publisher

Google selling books as a publisher on playstore so it can get away with it

justonceokay•9h ago
N-grams don't threaten anyone's business. To have a lawsuit you need damages.
ctkhn•9h ago
With n-grams it isn't just repeating the exact content of the book back to you the way the book itself does, which is what AI does; it's a statistical overview of the words from many books. Knowing what year use of the word "slouch" peaked in print isn't any kind of substitute for reading a book that uses the word.
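To make "statistical overview" concrete, here's a minimal sketch of the kind of aggregate n-gram counting a dataset like Google's is built on (illustrative code only, not Google's actual pipeline):

```python
from collections import Counter

def ngram_counts(text: str, n: int = 2) -> Counter:
    """Count n-grams (consecutive word tuples) in one text."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

# Aggregating counts across many books yields frequency statistics,
# not the text of any particular book.
corpus = ["the cat sat on the mat", "the dog sat on the log"]
totals = Counter()
for book in corpus:
    totals += ngram_counts(book)

print(totals.most_common(3))  # e.g. [(('sat', 'on'), 2), (('on', 'the'), 2), (('the', 'cat'), 1)]
```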
bigmadshoe•9h ago
“which is what AI does” what? The term “statistical overview of the words from many books” is a great way to describe an LLM. It’s not like the weights encode every book verbatim.
Workaccount2•8h ago
>repeating the exact content of the book back to you the way the book itself does, which is what AI does

Are you sure about that?

Maybe on a good day you'll get a paragraph, but getting a few pages equivalent to a "book preview"? No shot.

kristofferR•9h ago
Yes, only about the piracy.
dfedbeef•8h ago
There's a 'first sale doctrine' in the US. Once you purchase a book, you can make a copy of it for yourself. Same with an album or game.

"Copy" right. Get it? Right to copy.

JohnFen•8h ago
True. But you have to treat the thing you bought and all copies as an indivisible unit. If you sell or give away the thing or any of its copies, you have to include (or destroy) all other copies of that thing.
dfedbeef•8h ago
Yep. Google shouldn't have won that suit, just like Marvin Gaye should not have won his suit against Bruno Mars.

The courts don't always make rational decisions. They're dumb and corrupt.

ijk•8h ago
> If they had bought each book themselves would it be fair use? So this is only about the piracy?

The earlier ruling covered exactly that question:

- Anthropic downloaded many books (from LibGen and elsewhere). This piracy is what the current case is about, and is unrelated to the training.

- Separately, Anthropic bought and scanned a million used books. They trained the AI on this data. This was ruled as fair use, and is not involved in the current case.

Ajedi32•8h ago
That's very interesting, because it totally makes sense legally, but the practical effect is ludicrously stupid. The law is effectively forcing companies to spend millions re-scanning the same books over and over for no reason. It'd be like if we had a law which stated "Before you can train an AI, you must light 1 million dollars on fire. After that you can do whatever you want." It serves no purpose but to waste societal resources on nothing.
aswegs8•8h ago
Why would they need to "re-scan the same books over and over"? It's as simple as this: they can use the books to train their AI if they bought them.
Ajedi32•8h ago
Because company A needs to scan the books, then company B wants to train their AI so they need to scan the same books, then company C wants to train their AI so they need to scan the same books... etc.

It would be one thing if they were buying "used" digital copies of the books, but the fact that this is only legal with scanned physical copies makes it extremely wasteful.

ijk•8h ago
It would probably be legal with digital copies; it's just that book publishers have been very zealous in preventing the existence of a market for used digital books.

Copyright has been very silly in the digital realm from the beginning and is unlikely to get less unhinged from reality absent a major overhaul that makes it completely unrecognizable.

triceratops•8h ago
Digital media, in particular its resale, is the one good use case for blockchains that no one seems to be interested in (and don't provide me a link of some obscure project working on it; what can I buy with it?). Probably because it's useful for consumers but not for making money.
alias_neo•8h ago
I don't have any sympathy for big orgs; they can follow the same rules as the rest of us, and should be slapped even harder for this than an individual accused of the same thing. However, I'm curious: why can't they buy digital copies in the first place?

Is there some nuance to the law that allows them to scan/copy them if they're physical but not if they're digital?

ACCount36•7h ago
Not every book is readily available as a digital copy. Things like textbooks, older technical books or just books that weren't too popular can be easier to source as physical books and scan destructively.

A lot of digital copies are also DRM'd to shit - to obtain raw text usable for AI training, you'd have to break DRM. Which isn't that hard, on a technical level - but DMCA exists.

DMCA is a shit law that should have been dismantled two decades ago - but as long as it's around, bypassing DRM on things you own can be illegal. Scanning sidesteps that.

alias_neo•7h ago
I'm anti-DRM personally, but I suppose in this case we could argue it's serving its purpose; it's just that workarounds have been found in the form of scanning physical books.

If no physical copies existed and there were only DRMd digital copies of everything, the companies scanning books for AI training would be forced to work out some deal with the DRM-overlords to have it removed for their use. That (I think) would be a net benefit as hopefully the authors would get paid too.

Ajedi32•7h ago
You say you're anti-DRM but that sounds like a very pro-DRM stance to me.

Within the bounds of personal use, copyright holders should have no say over what people do with media after it is sold. That goes equally when the entity that buys the media is a company rather than a person. The entire reason DRM is a problem is that it subverts that principle using technical means.

alias_neo•6h ago
That wasn't a stance, it was a hypothetical.

I'm totally in agreement with you: once we buy something, it should be ours to do with as we wish, company or person. DRM is a sketchy technical solution that doesn't really serve a technical purpose (it's easily broken) but serves a legal one; the act of breaking it is the legal issue.

I make my stance by avoiding buying DRMd content where possible: DRM-free games and digital books. But it's not always possible to avoid; if I buy a BD, I can't rip it to my NAS without subverting the DRM.

Linux is also the only OS running in my home (on computers with screens and keyboards) so I mostly can't even legitimately play those DRMd things if I buy them, whether it's a BD, or Netflix in my web browser, or whatever else if I wanted to.

I'm very, very much anti-DRM.

EDIT: Typo

GuinansEyebrows•6h ago
> Within the bounds of personal use, copyright holders should have no say over what people do with media after it is sold. That goes equally when the entity that buys the media is a company rather than a person.

How can a company be covered under personal use?

fragmede•5h ago
The problem is the transition from analog to digital. It is entirely legal for an entity to buy physical books and then loan them out, aka a library. That entity is free to charge money, or might even be part of the local government. But copyright is a thing that was invented; why should it even exist in the first place? Of course we can't argue with the fact that the world we live in has copyright, but in countries with less copyright protection, it doesn't seem like the sky has fallen there either. We want to promote science and the useful arts and incentivize creation. It's supposed to be a temporary monopoly granted by the government before works fall into the public domain. Originally 14 years, with another 14 if the author was still alive. We should absolutely do what we can to encourage science and the arts, but Disney has managed to take it way further than it was originally specified for.
ACCount36•1h ago
Given that "training AI on books you own" was ruled fair use, the "purpose" DRM is serving here is preventing fair use.

Which is the kind of thing you would expect it to do.

bcrl•55m ago
Or they could just pay the authors of the books directly for a license... Isn't that kinda like how a lot of software companies are compensated?
JohnFen•8h ago
The cost might reduce the number of entities who can afford to do it, though, which would reduce the amount of abuse.
bostonsre•8h ago
It needs to hook into the existing legal book supply chain so that authors could potentially get compensated (I doubt they do for used book resale tho..).
Eric_WVGG•8h ago
"no reason"? Try telling that to the people who wrote the books.
Ajedi32•8h ago
The process of repeatedly re-scanning used books benefits the authors? How?
yuliyp•8h ago
They're not getting anything out of the small bump in resale prices well after they wrote and sold those books.
tpmoney•8h ago
I've been thinking recently that an overhaul to the copyright system could solve this. Return to a very low default (10 years? 20?). Allow extensions, but a requirement for extension is submitting the work to a government-managed digital data set that is licensed out to people to use as training data for these sorts of systems (or anything else a massive digitized, cataloged library could be useful for). Licensing costs some nominal amount of money, and the revenue from that is distributed to copyright holders who have submitted their works, in proportion to the recency and volume of content (with some cap to avoid flooding the system with content just to get more payouts).

I’m sure there’s lots of unintended problems with this, but it does feel like a common base set of training data like this is exactly the sort of thing the government can and should do.

alias_neo•8h ago
> The law is effectively forcing companies to spend millions re-scanning the same books over and over for no reason

Would anyone agree if you replaced companies with people in that argument?

Why shouldn't a company follow the same rules as everyone else just because the scale at which they're doing it is so large?

I'd argue a company doing something like this should be forced to buy the books NEW and benefit the authors, and if they're found guilty of copyright infringement they should be punished at a scale a few orders of magnitude larger than an individual would be.

> Before you can train an AI, you must light 1 million dollars on fire

If I want to train an AI, I probably need to spend a larger part of my budget as an individual to do so than an org would. Should I be given the resources for free or severely discounted because I want to make money out of it?

I suppose one _could_ argue in favour of such a practice if it was going to benefit society as a whole, but is it?

Ajedi32•7h ago
I'm not saying companies should follow different rules than people, I'm saying the rules as written make no sense. This particular example just happens to make that fact more readily apparent due to the sheer scale of the needless waste involved.
alias_neo•7h ago
I'm anti-DRM myself, but someone else could argue that the rules are partly doing their job: preventing companies from just gobbling up digital copies. It just happens that they have the resources to take advantage of a loophole by scanning the books in themselves.

The best solution I can come up with would be a digital library where one org, say the Internet Archive, has scanned everything once, and then they charge a licence fee to these orgs to ingest a copy, with part of the payment going to the author. No big wastage, the information gets archived, and the orgs pay their share.

watwut•7h ago
> Before you can train an AI, you must light 1 million dollars on fire.

I mean, demanding you pay money to the source of your data, in your quest to create a monopoly you are pretty much guaranteed to abuse later on while becoming filthy rich, is not exactly unfair.

troyvit•6h ago
> The law is effectively forcing companies to spend millions re-scanning the same books over and over for no reason.

Oh, but the reason is that they're now making $3 billion/year, partially because of those books. I see an argument for the inefficiency of having to rescan books that are already scanned, but not the cost. If there were a way to buy pre-scanned books from Google Books or whatever, then I'd somewhat see where you're coming from.

I argue that there were positive effects of Anthropic having to buy and scan physical books:

* The choices people made about which physical books to buy and scan helped make Claude what it is. Personally I sense a difference between Claude and OpenAI and Gemini, and part of it comes down to the choices they made in training material. Sorry to go on and on, but how many choices here were made because it was a rainy day and the trains were down, so an intern went to bookstore A instead of bookstore B?

* While buying the books used didn't help the authors, it helped the struggling bookstores selling their books. Literal dollars into the hands of local workers. When I fast forward to today and see how LLM companies are literally stealing energy from the communities their data centers are based in, and polluting them with shitty power plants, I can at least think of that as one positive outcome, even if it only happened once.

As far as the 7 million+ books Anthropic didn't pay for, their series B in 2022 brought in $580 million. They could have afforded those books.

tedivm•5h ago
The law isn't forcing people to do this; economics is. Nothing about the law forces people to use physical books, just that they actually pay for the books instead of stealing them. The company thinks they can get away with this more cheaply than negotiating for a digital copy of the book, so that's what they are doing.
BobaFloutist•1h ago
I mean I don't think the law precludes them legally purchasing ebooks.
noitemtoshow•9h ago
Nope. Books aren't worth much.
dbalatero•8h ago
I can tell.
AznHisoka•8h ago
"As a matter of practice (and sometimes doctrine), judges rarely issue rulings that would outright force a company out of business.... So while the jury’s damages calculation will be the headline risk, it probably won’t be the last word."

That should've been the first sentence in this article. Nothing to see here, folks. Just another baity headline.

volleygman180•8h ago
How many members of the class have abstained from using the LLMs provided by the major AI companies? Raise your hands...

That's what I thought. Members of the class should be disqualified from joining based on this criterion alone.

x______________•6h ago

> But Alsup split a very fine hair. In the same ruling, he found that Anthropic’s wholesale downloading and storage of millions of pirated books — via infamous “pirate libraries” like LibGen and PiLiMi — was not covered by fair use at all. In other words: training on lawfully acquired books is one thing, but stockpiling a central library of stolen copies is classic copyright infringement.

I am not actively following this trend, but didn't Meta do the exact same thing and successfully argue that it was fair use because... they didn't seed or upload any data?

What am I missing here?

EMIRELADERO•10m ago
You're not missing anything. The Anthropic and Meta judges simply disagreed with each other and issued two opposite holdings.