>> The judge ruled last month, in essence, that Anthropic's use of pirated books had violated copyright law
This is not what certification of a class action lawsuit means. Certification is procedural, not substantive, and doesn't weigh in on the merits of the lawsuit. It's about the mechanics of bringing an action on behalf of a potentially huge class. The article then goes on to speculate about how Anthropic would post bond for billions in hypothetical damages while trying to fundraise, so the clickbait title is backed up with "what if" fan fiction and there's nothing substantive here.
The courts are just reading the room. If you're a judge appointed by a guy who wants it to be illegal to regulate AI at all, you're not gonna be too keen on regulating AI.
> While the risk of a billion-dollar-plus jury verdict is real, it’s important to note that judges routinely slash massive statutory damages awards — sometimes by orders of magnitude. Federal judges, in particular, tend to be skeptical of letting jury awards reach levels that would bankrupt a major company.
But creative works... It's muddy. Just get a fucking license for copyrighted shit. It's really not even as difficult as people think; every publishing industry is so consolidated now that it would be entirely possible to work out some industry standard AI training license for publishers to opt into and pay some pittance to creators like Spotify does. Creators then have to opt in if they want a large publisher to publish their shit.
You can still self-publish and retain all your rights; but 99% of people won't do it because... Idk, they're lazy. Look at the music industry. Other than like, Insane Clown Posse, NOFX, and $uicideboy$, most bands just sign with a label and let them work out the copyright stuff.
You then sign a deal with the label to train on the works where they own the publishing right. It's opt in for the creators, people can still make money from creating things. It's the same in the film business. If you want your movie in theatres you sign a deal with some publisher. There aren't really that many.
Has literally nobody in tech considered how doable this is? Or are tech people just so used to being able to do anything they want that nobody looked into it?
Statutory damages for willful copyright infringement run up to $150k per work infringed. Picture a bunch of bootleg DVDs: the fine isn't a one-time charge covering all 7 DVDs, it's 7 x $150k.
Anthropic will potentially ruin the value of every work they trained on. Google should have been fined too, tbh. Napster got shut down but Google gets a pass? It's nonsense. The recording industry just had more weight to throw around, and I guess nobody gives a shit about books.
I doubt anyone would be stoked to foot a bill for trillions of dollars of damages when an AI company is hit for training on millions of copyrighted works. Multiply the number of works by the $150,000 statutory maximum for willful infringement to see how high the ceiling goes.
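The arithmetic behind the "trillions" claim can be sketched out. This assumes the ~7 million pirated works figure mentioned elsewhere in the thread, and uses the U.S. statutory damages range of $750 minimum to $150,000 maximum (willful) per infringed work:

```python
# Back-of-envelope statutory damages, per 17 U.S.C. § 504(c).
# WORKS is an assumption taken from the thread, not a court finding.
WORKS = 7_000_000
MIN_PER_WORK = 750        # statutory minimum per work
MAX_PER_WORK = 150_000    # statutory maximum per work (willful infringement)

floor = WORKS * MIN_PER_WORK      # lowest possible statutory award
ceiling = WORKS * MAX_PER_WORK    # highest possible statutory award

print(f"floor:   ${floor:,}")    # $5,250,000,000  (~$5.25 billion)
print(f"ceiling: ${ceiling:,}")  # $1,050,000,000,000  (~$1.05 trillion)
```

Even at the statutory minimum, the floor is in the billions; at the willful maximum it passes a trillion dollars.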
If they had bought each book themselves would it be fair use? So this is only about the piracy?
So, y'know, probably more like 80% demon summoning.
An awfully convenient explanation of what went down. Gives some good "dog ate my homework" vibes.
The double standard vs. Google etc. is of course despicable,
but Google's money is so big that they could just buy the entire thing, publisher included.
Google sells books as a publisher on the Play Store, so it can get away with it.
Are you sure about that?
Maybe on a good day you'll get a paragraph, but getting a few pages equivalent to a "book preview"? No shot.
"Copy" right. Get it? Right to copy.
The courts don't always make rational decisions. They're dumb and corrupt.
The earlier ruling covered exactly that question:
- Anthropic downloaded many books (from LibGen and elsewhere). This piracy is what the current case is about, and is unrelated to the training.
- Separately, Anthropic bought and scanned a million used books. They trained the AI on this data. This was ruled as fair use, and is not involved in the current case.
It would be one thing if they were buying "used" digital copies of the books, but the fact that this is only legal with scanned physical copies makes it extremely wasteful.
Copyright has been very silly in the digital realm from the beginning and is unlikely to get less unhinged from reality absent a major overhaul that makes it completely unrecognizable.
Is there some nuance to the law that allows them to scan/copy them if they're physical but not if they're digital?
A lot of digital copies are also DRM'd to shit - to obtain raw text usable for AI training, you'd have to break DRM. Which isn't that hard, on a technical level - but DMCA exists.
DMCA is a shit law that should have been dismantled two decades ago - but as long as it's around, bypassing DRM on things you own can be illegal. Scanning sidesteps that.
If no physical copies existed and there were only DRMd digital copies of everything, the companies scanning books for AI training would be forced to work out some deal with the DRM-overlords to have it removed for their use. That (I think) would be a net benefit as hopefully the authors would get paid too.
Within the bounds of personal use, copyright holders should have no say over what people do with media after it is sold. That goes equally when the entity that buys the media is a company rather than a person. The entire reason DRM is a problem is that it subverts that principle using technical means.
I'm totally in agreement with you: once we buy something, it should be ours to do with as we wish, company or person. DRM is a sketchy technical measure that doesn't really serve a technical purpose (it's easily broken) but serves a legal one: the act of breaking it is the legal issue.
I take my stand by avoiding buying DRMd content where possible: DRM-free games and digital books. But it's not always possible to avoid; if I buy a BD, I can't rip it to my NAS without subverting the DRM.
Linux is also the only OS running in my home (on computers with screens and keyboards) so I mostly can't even legitimately play those DRMd things if I buy them, whether it's a BD, or Netflix in my web browser, or whatever else if I wanted to.
I'm very, very much anti-DRM.
EDIT: Typo
How can a company be covered under personal use?
Which is the kind of thing you would expect it to do.
I’m sure there’s lots of unintended problems with this, but it does feel like a common base set of training data like this is exactly the sort of thing the government can and should do.
Would anyone agree if you replaced companies with people in that argument?
Why shouldn't a company follow the same rules as everyone else just because the scale at which they're doing it is so large?
I'd argue a company doing something like this should be forced to buy the books NEW and benefit the authors, and if they're found guilty of copyright infringement they should be punished at a scale a few orders of magnitude larger than an individual would be.
> Before you can train an AI, you must light 1 million dollars on fire
If I want to train an AI, I probably need to spend a larger part of my budget as an individual to do so than an org, should I be given the resources for free or severely discounted because I want to make money out of it?
I suppose one _could_ argue in favour of such a practice if it was going to benefit society as a whole, but is it?
The best solution I can come up with would be a digital library where one org, say the Internet Archive, has scanned everything once; they then charge a licence fee to these orgs to ingest a copy, and part of the payment goes to the author. No big wastage, the information gets archived, and the orgs pay their share.
I mean, demanding you pay money to the source of data in your quest to create a monopoly you are pretty much guaranteed to abuse later on while becoming filthy rich is not exactly unfair.
Oh but the reason is that they're now making $3 billion/year, partially because of those books. I see an argument for the inefficiency behind having to rescan books that are already scanned, but not the cost. If there was a way to buy pre-scanned books from Google Books or whatever then I somewhat see where you're coming from.
I argue that there were positive effects of Anthropic having to buy and scan physical books:
* The choices people made choosing which physical books to buy and scan helped make Claude what it is. Personally I sense a difference between Claude and OpenAI and Gemini, and part of it comes down to the choices they made in training material. Sorry to go on and on, but how many choices here were made because it was a rainy day and the trains were down, so an intern went to bookstore A instead of bookstore B?
* While buying the books used didn't help the authors it helped the struggling bookstores selling their books. Literal dollars into the hands of local workers. When I fast forward to today and see how LLM companies are literally stealing the energy from the communities their data centers are based in, and polluting them with shitty power plants I can at least think of that as one positive outcome, even if it only happened once.
As far as the 7 million+ books Anthropic didn't pay for, their series B in 2022 brought in $580 million. They could have afforded those books.
That should've been the first sentence in this article. Nothing to see here, folks. Just another baity headline.
That's what I thought. Members of the class should be disqualified from joining based on this criterion alone.
>But Alsup split a very fine hair. In the same ruling, he found that Anthropic’s wholesale downloading and storage of millions of pirated books — via infamous “pirate libraries” like LibGen and PiLiMi — was not covered by fair use at all. In other words: training on lawfully acquired books is one thing, but stockpiling a central library of stolen copies is classic copyright infringement.
I am not actively following this, but didn't Meta do the exact same thing and successfully argue that it was fair use because... they didn't seed or upload any data? What am I missing here?