It has been acknowledged that Anthropic knew this trial could have totally bankrupted them had they maintained their innocence and continued to fight the case.
But of course there's too much money on the line: even in settling (effectively admitting they profited off pirated books), Anthropic knew there was no way they could win that case, and the risk wasn't worth taking.
> The pivotal fair-use question is still being debated in other AI copyright cases. Another San Francisco judge hearing a similar ongoing lawsuit against Meta ruled shortly after Alsup's decision that using copyrighted work without permission to train AI would be unlawful in "many circumstances."
The first of many.
It’s important in the fair-use assessment to understand that the training itself is fair use; the pirating of the books is the issue at hand here, and is what Anthropic “whoopsied” into when acquiring the training data.
Buying used copies of books, scanning them, and training on them is fine.
Rainbows End was prescient in many ways.
That's how Google Books, the Internet Archive, and Amazon (their book preview feature) operated before ebooks were common. It's not scalable-in-a-garage but perfectly scalable for a commercial operation.
The books that are destroyed in scanning are a small minority compared to the millions discarded by libraries every year for simply being too old or unpopular.
Book burnings are symbolic (unless you're in the world of Fahrenheit 451). The real power comes from the political threat, not the fact that paper with words on it is now unreadable.
It's 500,000 books in total, so did they really scan all of them instead of using the pirated versions? Even back when they didn't have much money, in the early phases of the model race?
Agreed. Great book for those looking for a read: https://www.goodreads.com/book/show/102439.Rainbows_End
The author, Vernor Vinge, is also responsible for popularizing the term 'singularity'.
Reminds me of Permutation City.
“Marooned in Real Time” remains my fav.
What I'm wondering is if they, or others, have trained models on pirated content that has flowed through their networks?
I’m surprised Google hasn’t hit its competitors harder with the fact that it actually got permission to scan books from its partner libraries, while Facebook and OpenAI just torrented books2/books3. But I guess they have an aligned incentive to benefit from a legal framework that doesn’t look too closely at how you went about collecting source material.
https://www.reddit.com/r/libgen/comments/1n4vjud/megathread_...
I'm pretty sure that's just a frontend for Uptime Kuma https://github.com/louislam/uptime-kuma
The whole incident is written up in detail, https://swartz-report.mit.edu/ by Hal Abelson (who wrote SICP among other things). It is a well-researched document.
The report speculates about his motivations on page 31, but they seem to be unknown with any certainty.
Information may want to be free, but sometimes it takes a revolutionary to liberate it.
But also, prior to that he had written the Guerilla Open Access Manifesto, so it wasn't great optics to be caught doing that.
This is to teach a lesson because you cannot prosecute all thieves.
The Yale Law Journal has actually written about this: the goal is to deter crime, because in most cases damages cannot be recovered, or the criminal will never be caught in the first place.
(Probability of getting caught) 0.01 × (cost if caught) 1000 = 10 in expected cost, which is 10× the expected gain, so: not worth it.
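A minimal sketch of that expected-value arithmetic, using the hypothetical numbers above (purely illustrative, not from any real case):

```python
# Deterrence as expected value: expected cost = P(getting caught) * penalty.
p_caught = 0.01   # hypothetical 1% chance of getting caught
penalty = 1000    # hypothetical penalty, in the same units as the gain
gain = 1          # hypothetical gain from committing the act

expected_cost = p_caught * penalty  # 0.01 * 1000 = 10
print(f"expected cost {expected_cost} vs. gain {gain}")
print("worth it" if gain > expected_cost else "not worth it")  # -> not worth it
```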
It’s crazy to imagine, but there was surely a document or Slack thread discussing where to get thousands of books, and they just decided to pirate them and that was OK. This was a decision based entirely on ease or cost, not on the assumption it was legal. Piracy can result in jail time IIRC, so honestly it’s lucky that the employee who suggested this, or took the action, avoided direct legal liability.
Oh, and I’m pretty sure other companies (Meta) are in litigation over this issue, and the publishers knew that settling below the full legal limit would limit future revenue.
Well actively generating revenue at least.
Profits are still hard to come by.
It's not the same as debt from a loan, because people are buying a percentage stake in the company. If the value of the company happens to go to zero there's nothing left to pay.
But yeah, the amount of investment a company attracts should have something to do with the perception that it'll operate at a profit at some point
In this specific case the settlement caps the lawyer fees at 25%, and even that is subject to the court's approval. In addition they will ask for $250k total ($50k per plaintiff) for the lead plaintiffs, also subject to the court's approval.
If anything it's too little based on precedent.
I think that this is a distinction many people miss.
If you take all the works of Shakespeare and reduce them to tokens and vectors, is it Shakespeare, or is it factual information about Shakespeare? It is the latter, and as much as organizations like MLB might want to be able to copyright a fact, you simply cannot do that.
Take this one step further. If you buy the work and vectorize it, that's fine. But if you feed in the vectors for Harry Potter so many times that the model can reproduce half of the book, it becomes a problem when it spits out that copy.
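As a toy illustration of what "reducing text to tokens and vectors" means in practice (a made-up vocabulary and a tiny embedding size, nothing like a real training pipeline):

```python
import numpy as np

text = "to be or not to be"
vocab = sorted(set(text.split()))                   # toy vocabulary: ['be', 'not', 'or', 'to']
token_ids = [vocab.index(w) for w in text.split()]  # [3, 0, 2, 1, 3, 0]

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(vocab), 4))       # one 4-dimensional vector per vocabulary entry

vectors = embeddings[token_ids]                     # the "vectorized" text: numbers, not prose
print(token_ids)      # the text as token ids
print(vectors.shape)  # (6, 4): six tokens, four dimensions each
```

Whether those numbers are Shakespeare or merely information derived from Shakespeare is exactly the question.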
And what about all the other stuff that LLMs spit out? Who owns that? Well, at present, no one. If you train a monkey or an elephant to paint, you can't copyright that work because they aren't human, and neither is an LLM.
If you use an LLM to generate your code at work, can you take that code with you when you quit? Does GPLv3, or something like the Elasticsearch license, even apply if there is no copyright?
I suspect we're going to be talking about court cases a lot for the next few years.
This seems too cute by half, courts are generally far more common sense than that in applying the law.
This is like saying that using `rails generate model Example` results in a bunch of code that isn't yours, because the tool generated it according to your specifications.
I’d guess the legal scenario for `rails generate` is that you have a license to the template code (by way of how the tool is licensed) and the template code was written by a human so licensable by them and then minimally modified by the tool.
[1] https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput...
'The Board’s decision was later upheld by the U.S. District Court for the District of Columbia, which rejected the applicant’s contention that the AI system itself should be acknowledged as the author, with any copyrights vesting in the AI’s owner. The court further held that the CO did not act arbitrarily or capriciously in denying the application, reiterating the requirement that copyright law requires human authorship and that copyright protection does not extend to works “generated by new forms of technology operating absent any guiding human hand, as plaintiff urges here.”' From: https://www.whitefordlaw.com/news-events/client-alert-can-wo...
The court is using common sense when it comes to the law. It is very explicit and always has been... That word "human" has some long standing sticky legal meaning (as opposed to things that were "property").
So things get even darker, because what counts as distribution can have a really vague definition, and maybe the AI companies will follow the law only just barely, for the sake of not getting hit with a lawsuit like this again. But I wonder if all this case did was compensate the authors this one time. I doubt we'll see a meaningful change in AI companies' attitudes towards fair use and, essentially, exploiting authors.
I feel like they will try to use as much legalspeak as possible to extract as much from authors as they (legally) can without compensating them, which I feel is unethical, but sadly the law doesn't run on ethics.
Note that the law specifically regulates software differently, so what you cannot do is just willy-nilly pirate games and software.
What distribution means in this case is defined in Swiss law. However, Swiss law as a whole is in some ways vague, leaving a lot up to interpretation by the judiciary.
I would assume it would compensate the publisher. Authors often hand ownership to the publisher; there would be obvious exceptions for authors who do well.
So to me, if you are doing literally any human review, edits, control over the AI then I think you'll retain copyright. There may be a risk that if somebody can show that they could produce exactly the same thing from a generic prompt with no interaction then you may be in trouble, but let's face it should you have copyright at that point?
This is, however, why I favor stopping slightly short of full agentic development at this point. I want the human watching each step and an audit trail of the human interaction in doing it. Sure I might only get to 5x development speed instead of 10x or 20x but that is already such an enormous step up from where we were a year ago that I am quite OK with that for now.
To rephrase the question:
Is a PDF of the complete works of Shakespeare Shakespeare, or is it factual information about Shakespeare?
Reencoding human-readable information into a form that's difficult for humans to read without machine assistance is nothing new.
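For instance, a trivial reencoding (base64 here, purely as an illustration) makes the same words unreadable to a human without machine assistance while losing nothing:

```python
import base64

line = "Friends, Romans, countrymen, lend me your ears"
encoded = base64.b64encode(line.encode()).decode()
print(encoded)                              # opaque to a human reader without tooling
print(base64.b64decode(encoded).decode())  # and perfectly recoverable by machine
```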
It remains deranged.
Everyone has more than a right to freely read everything that is stored in a library.
(Edit: in fact initially I wrote 'is supposed to' in place of 'has more than a right to' - meaning that "knowledge is there, we made it available: you are supposed to access it, with the fullest encouragement").
They're merely doing what anyone is allowed to with the books that they own, loaning them out, because copyright law doesn't prohibit that, so no license is needed.
What is in a library, you can freely read. Find the most appropriate way. You do not need to have bought the book.
¹(Edit: or /may/ not be allowed, see posts below.)
I'd be interested to know if you know of one with bright-line rules delineating what is and isn't allowed.
(I know by practice but not from the letter of the law; to give you details I should do some research and it will take time - if I will manage to I will send you an email, but I doubt I will be able to do it soon. The focus is anyway on western European Countries.)
They didn't think it would be a good idea to re-bind them and distribute them to a library or someone in need.
They did not destroy old, valuable books which individually were worth millions.
https://arstechnica.com/ai/2025/06/anthropic-destroyed-milli...
As for needy people, they already have libraries and an endless stream of books being donated to thrift stores. Nothing of value was lost here.
Every human has the right to read those books.
And now, this is obvious, but it seems to be frequently missed - an LLM is not a human, and does not have such rights.
Additionally:
> Every human has the right to read those books.
Since when?
I strongly disagree - knowledge should be free.
I don't think the author's arrangement of the words should be free to reproduce (ie, I think some degree of copyright protection is ethical) but if I want to use a tool to help me understand the knowledge in a book then I should be able to.
[1] https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,....
Since in our legal system, only humans and groups of humans (the corporation is a convenient legal proxy for a group of humans that have entered into an agreement) have rights.
Property doesn't have rights. Land doesn't have rights. Books don't have rights. My computer doesn't have rights. And neither does an LLM.
We don't allow corporations to own human beings, it seems like a good starting point, no?
If you use the commons to create your model, perhaps you should be obligated to distribute the model for free (or I guess for the cost of distribution) too.
A vacuum removes what it sucks in. The commons are still as available as they ever were, and the AI gives one more avenue of access.
That is false. As a direct consequence of LLMs:
1. The web is increasingly closed to automated scraping, and more marginally to people as well. Owners of websites like Reddit now have a stronger incentive to close off their APIs and sell access.
2. The web is being inundated with unverified LLM output which poisons the well
3. More profoundly, increasingly basing our production on LLM outputs, making the human merely "in the loop" rather than the driver, and sometimes eschewing even the human in the loop, leads to new commons that are less adapted to the evolution of our world, less original, and of lesser quality.
By this logic one shouldn't be able to research for a newspaper article at a library.
They'll either go out of business or make better models paid while providing only weaker models for free despite both being trained on the same data.
I presume you (as people do) have exploited the knowledge that society has made freely accessible, in principle and largely in practice, to build a profession, which is now for-profit: you charge parties for the skills that available knowledge has given you.
The "profit" part is not the problem.
As soon as OpenAI open-sources their model's source code, I'll agree.
(The "for sale" side does not limit the purpose to sales only, before somebody wants to attack that.)
Knowledge costs money to gain/research.
Are you saying people who do the most valuable work of pushing the boundaries of human knowledge should not be fairly compensated for their work?
An LLM isn't an index.
I think it is obvious instead that readers employed by humans fit the principle.
> rights
Societally, it is more of a duty. Knowledge is made available because we must harness it.
Also, at least so far, we don't call computers "someone".
Probably so, because with "library" I did not mean the "building". It is the decision of the society to make knowledge available.
> we don't call computers "someone"
We do instead, for this purpose. Why should we not? Anything that can read fits the set.
--
Edit: Come up with the arguments, sniper.
There is an asymmetry between agreement and disagreement: the latter requires arguments.
"Sneering and leaving" is antisocial, and that is what underlies most of the downvoting.
Stop this deficient, unproductive and disruptive culture.
He who has the gold makes the rules
Buying used copies of books, scanning them, and printing them and selling them: not fair use
Buying used copies of books, scanning them, and making merchandise and selling it: not fair use
The idea that training models is considered fair use just because you bought the work is naive. Fair use is not a law to leave open usage as long as it doesn’t fit a given description. It’s a law that specifically allows certain usages like criticism, comment, news reporting, teaching, scholarship, or research. Training AI models for purposes other than purely academic fits into none of these.
Unless legislation changes, model training is pretty much analogous to that. Now of course if the employee in question - or the LLM - regurgitates a copyrighted piece verbatim, that is a violation and would be treated accordingly in either case.
Does this still hold true if multiple employees are "trained" from scanned copies at the same time?
Regardless, the issue could be resolved by buying as many copies as you have concurrent model training instances. It isn't really an issue with training on copyrighted work, just a matter of how you do so.
But nobody was ever going to do that, not when there are billions in VC dollars at stake for whoever moves fastest. Everybody will simply risk the fine, which tends to be nowhere near enough to have a deterrent effect in the future.
That is like saying Uber would have not had any problems if they just entered into a licensing contract with taxi medallion holders. It was faster to just put unlicensed taxis on the streets and use investor money to pay fines and lobby for favorable legislation. In the same way, it was faster for Anthropic to load up their models with un-DRM'd PDFs and ePUBs from wherever instead of licensing them publisher by publisher.
If the choice is between risking a $1.5 billion payout and safely paying $15 million, they might.
Option 2: near-$0 valuation, $15M purchasing cost.
To an investor, that just looks like a pretty good deal, I reckon. It's just the cost of doing business, which in my opinion is exactly what is wrong with practices like these.
What's actually wrong with this?
They paid $1.5B for a bunch of pirated books. Seems like a fair price to me, but what do I know.
The settlement should reflect society's belief of the cost or deterrent, I'm not sure which (maybe both).
This might be controversial, but I think a free society needs to let people break the rules if they are willing to pay the cost. Imagine if you couldn't speed in a car. Imagine if you couldn't choose to be jailed for nonviolent protest.
This isn't some case where they destroyed a billion dollars worth of pristine wilderness and got off with a slap on the wrist.
so you don't think super rich people should be bound by laws at all?
Unless you made the cost proportional (maybe exponential) to somebody's wealth, you would be creating a completely lawless class who would wreak havoc on society.
It was broken by a company of people who were not very rich at all and have managed to produce billions in value (not dollars, value) by breaking said laws.
They're not trafficking humans or doing predatory lending, they're building AI.
This is why our judicial system literally handles things on a case by case basis.
Your argument is that this is all fine because it wasn't done by people who were super rich but instead done by people who became super rich and were funded by the super rich?
I just want to check that I have that right. You're arguing that if I'm a successful enough bank robber, it's fine, because I pay some fine that is a small portion of what I heisted? I mean, I wouldn't have been trafficking humans or doing predatory lending. I was just stealing from the banks, and everyone hates the banks.
But if I'm only a slightly successful bank robber stealing only a few million and deciding that's enough, then straight to jail do not pass go, do not collect $200?
It's unclear to me, because in either case I create value for the economy as long as I spend that money. Or is the key part what I do with that money? Like, you're saying I get a pass if I use that stolen money to invent LLMs?
I think the company's bank account would beg to differ on that.
> managed to produce billions in value (not dollars, value) by breaking said laws.
Ah, so breaking the law is ok if enough "value" is created? Whatever that means?
> They're not trafficking humans or doing predatory lending, they're building AI.
They're not trafficking humans or doing predatory lending, they're infringing on the copyright of book authors.
Not sure why you ended that sentence with "building AI", as that's not comparing apples to apples.
But sure, ok, so it's ok to break the law if you, random person on the internet, think their end goals are worthwhile? So the ends justify the means, huh?
> This is why our judicial system literally handles things on a case by case basis.
Yes, and Anthropic was afraid enough of an unfavorable verdict in this particular case that they paid a billion and a half to make it go away.
GP is entrenched in the idea that pure self-interest is the only metric needed in society.
I do agree that in the case of victimless crimes, having some ability to compensate for damages instead of outright banning the thing means that we can enact many massively net-positive scenarios.
Of course, most crimes aren’t victimless and that’s where the negative reactions are coming from (eg company pollutes the commons to extract a profit).
It's because they did not choose to pay for the books; they were forced to pay and they would not have done so if the lawsuit had not fallen this way.
If you are not sure why this is different from "they paid for pirated books (as if it were a transaction)", then this may reflect a lack of awareness of how fair exchange and trust both function in a society.
Settling isn't "forced", but it's a choice that tells you that the company believes settling is a better deal for them than letting the trial go forward. That's something.
> They paid $1.5B for a bunch of pirated books.
They didn't pay, they settled. And considering flesh-and-blood people get sued for tens of thousands per download when there isn't a profit motive, that's a bargain.
> The settlement should reflect society's belief of the cost or deterrent.
No, it reflects the maximum amount the lawyers believe they can get out of them.
> This might be controversial, but I think a free society needs to let people break the rules if they are willing to pay the cost.
So how much should a politician need to pay to legally murder their opponent? Are you okay with your ex killing you for a $5000 fine?
> Imagine if you couldn't speed in a car.
Speed enough and you lose your license, no need to imagine.
Why does this company get away with it, while warez groups get raided by SWAT teams, labeled a "criminal enterprise" or "crime gang", and sentenced to decades in jail? Why does the law not apply when you are rich?
Settlements have nothing to do with either of those things. Settlement has to do with what the plaintiff believes is good enough for the cost that will avoid the uncertainty of trial. This is a civil case, "society" doesn't really come into play here. (And you can't "settle" a criminal case; closest analogue would be a plea deal.)
If the trial went forward to a guilty verdict, then the fines would represent society's belief of cost or deterrent. But we didn't get to see that happen.
A) Make 100M, pay 10M in taxes
or
B) Make 100M, pay 10M in lawsuit settlements, pay 9M in taxes
You come out ahead every time by not paying the settlement in the first place.
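Spelling out that arithmetic (assuming, hypothetically, a flat 10% tax rate and a fully tax-deductible settlement):

```python
revenue = 100_000_000
tax_rate = 0.10
settlement = 10_000_000

# Option A: no settlement to pay
keep_a = revenue * (1 - tax_rate)                 # 90M kept

# Option B: pay the settlement, deduct it, pay tax on what's left
keep_b = (revenue - settlement) * (1 - tax_rate)  # 81M kept

print(keep_a, keep_b)  # deductibility softens the settlement but never erases it
```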
I get what you are going for, but my point was that a dataset existed, and the only way it could be compiled was illegally.
Uber could have made the same decision and worked with regulators to be allowed into markets one at a time. It was an intentional choice to lean on the fact that Uber drivers blended into traffic and could hide in plain sight until Uber had enough market share and customer base to give them leverage.
With Uber you had a company that wanted to enter an existing market but couldn't due to legally-granted monopolies on taxi service. And given that existing market, you can be sure that the incumbents would lobby to keep Uber locked out.
With Waymo you have a new technology that has a computer driving the car autonomously. There isn't really any directly-incumbent party with a vested (conflict of) interest to argue against it. Waymo is a kind of taxi, though, so presumably existing taxi operators -- and the likes of Uber and Lyft -- could argue against it in order to protect their advantages. But ironically Uber and Lyft "softened" those regulatory bars already, so it might not have been worth it to try.
At any rate, the regulatory and safety concerns are also very different between the two.
I think I am also just a little more sympathetic to early Uber, given how terrible and cartel-like taxi service was in the past. But I would not at all be sympathetic toward Waymo putting driverless cars on the streets without regulatory approval and oversight, especially if people got injured or killed.
My assumption is that they could have found ways to work around that by technically having someone in the driver's seat, for example, but maybe I'm wrong there!
Judge: "But this app facilitated them."
Lawyer: "Well, you presume so-called genuine carpoolers are not facilitated? The manufacturers of their cell phones, the telecom operators, their employers or the bar where they met, or the bus company at whose bus stop they met, they all facilitated their carpooling behavior."
Judge: "But your company profits from this coordination!"
Lawyer: "Well we pay taxes, just like the manufacturer of the cell phone, the telecom operator, their employers, the bus company or the bar... But let's ignore that, what you -representing the government (which in turn supposedly represents the people)- are really after is money or power. As a judge you are not responsible for setting up the economy, or micromanaging the development of apps, so its not your fault that the government didn't create this application before our company did. In a sense you are lucky that we created the app given that the government did not create this application in a timely fashion!"
Judge: "How so?"
Lawyer: "If the population had created this app they would have started thinking about where the proceeds should go. They would have gotten concerned about the centralization of power (financial and intelligence). They would have searched for ways to decentralize and secure their app. They would have eventually gotten cryptographers involved. In that world, no substantial income would be generated, your fleet of taxi's would be threatened as well, and you wouldn't even have the juicy intel we occasionally share either!"
This conversation almost never takes place, since it only needs to take place once, after which a naive judge has learned how the cookie crumbles. Most judges have lost this naivety before even becoming a judge. They learn this indirectly when small "annoyances" threaten the scheme (one could say the official taxi fleet was an earlier such scheme).
Didn't Google have a long standing project to do just that?
- Anthropic
- Any Chinese company who do not care about copyright laws
What is the cost of buying and scanning books?
Copyright law needs to be fixed, and its ridiculous hundred-year term chopped down.
> Anthropic also agreed to delete the pirated works it downloaded and stored.
Also:
> As part of the settlement, Anthropic said that it did not use any pirated works to build A.I. technologies that were publicly released.
And even if they didn't use the illegally-obtained work to train any of the models they released, of course they used them to train unreleased prototypes and to make progress at improving their models and training methods.
By engaging in illegal activity, they advanced their business faster and more cheaply than they otherwise would have been able to. With this settlement, other new AI companies will see it on the record that they could face penalties if they do this, and will have to go the slower, more expensive route -- if they can even afford to do so.
It might not make it impossible, but it makes the moat around the current incumbents just that much wider.
Oh so now we're at "just trust me bro" levels of absurdity
The Google Books project also faced a copyright lawsuit, which was eventually decided in favor of Google.
After contacting major publishers about possibly licensing their books, [the former head of the Google Books project] bought physical books in bulk from distributors and retailers, according to court documents. He then hired outside organizations to disassemble the books, scan them and create digital copies that could be used to train the company's A.I. technologies.
Judge Alsup ruled that this approach was fair use under the law. But he also found the company’s previous approach — downloading and storing books from shadow libraries like Library Genesis and Pirate Library Mirror — was illegal.
Obviously, that's not part of the current settlement. I'm no expert on this, so I don't know the extent to which the earlier ruling applies.
> Notably, in its motion, Anthropic argues that pirating initial copies of Authors’ books and millions of other books was justified because all those copies were at least reasonably necessary for training LLMs — and yet Anthropic has resisted putting into the record what copies or even sets of copies were in fact used for training LLMs.
> We know that Anthropic has more information about what it in fact copied for training LLMs (or not). Anthropic earlier produced a spreadsheet that showed the composition of various data mixes used for training various LLMs — yet it clawed back that spreadsheet in April. A discovery dispute regarding that spreadsheet remains pending.
Ethically speaking, if Anthropic (a) did later purchase every book it pirated or (b) compensated every author whose book was pirated, would it absolve an illegally trained model of its "sins"?
To me, the taint still remains. Which is a shame, because it's considered the best coding model so far.
No, in part because it removes agency from the authors/rightsholders. Maybe they don't want to sell Anthropic their books, maybe they want royalties, etc.
You can't sell a Blu-ray disc to a movie theater and thereby give them the right to charge an audience to watch it in the theater later.
If rightsholders are worried about certain uses of their IP being found to be fair use, they might then change the terms of release contractually to stop or at least partially prevent training.
And thank god they did. There was no perfectly legal channel to fix the taxi cartel. Now you don't even have to use Uber in many of these places because taxis had to compete - they otherwise never would have stopped pulling the "credit card reader is broken" scam, taking long routes on purpose, and started using tech that made them more accountable to these things as well as harder for them to racially profile passengers. (They would infamously pretend not to see you if they didn't want to give you service back when you had to hail them with an IRL gesture instead of an app..)
But I know what I'm going to pay up-front, can always pay with a credit card (which happens automatically without annoying post-trip payment), the ride is fully tracked, and I can report issues with the driver that I have an expectation will actually be acted upon. And when I'm in another country where there are known to be taxis that scam foreigners, Uber is a godsend.
Yes, pre-Uber taxis were expensive and crappy, and even if Uber is expensive now, it's not crappy; it's actually worth the price. And I'm not convinced Uber is even that expensive. We always forget to account for inflation... sure, we might today say, "that ride in a taxi used to cost $14, but in an Uber it costs $18". But that ride in a taxi was 15 years ago.
That's typically considered to be somewhere between assholish and straight up illegal in most civilized economies.
Sure, now that it costs them money, they're reacting, making things worse for literally everyone: the taxi drivers, who've been victimized by governments not reacting when they should have; the customers, who are now paying more; the Uber drivers, who are certainly not the ones getting the money.
A great lawyer will tell you laws don't matter if they're not applied, and then tell you how laws are applied and what you can and can't get away with (this is a necessity, since most laws aren't very clear at all, especially when it comes to actual real-world cases or penalties). The EU are absolute masters of that. The famous GDPR, for example, isn't protecting anyone's data in any way that matters, since governments have the power to grant themselves exceptions to it. Which has led to all the things the GDPR tried to avoid: insurers getting private medical data (insurers are mostly part of governments in the EU), private medical data being used by the police or in court, just to give some examples.
Hell, it's now been confirmed every two years or so since 2015 that essentially all European countries think all of the FANGs are abusing their market position. Google, Facebook, Amazon, Apple... they've been given billions of dollars in fines. Tell me, what has been fixed? US advertising companies are more deeply entrenched than ever before (even outside of the internet, i.e. Clear Channel). Law is supposed to fix the problems. Well, obviously the problem of US companies' dominance is not solved; in fact it's gotten a lot worse.
And this is nothing new. Take what EU countries signed in the Budapest Memorandum. You will find that it states that if Russia ("any of the ... blah blah", which includes Russia) takes Crimea, a bunch of EU countries (France, the UK) would, first, declare war on the country that did it (Russia) and initiate actual hostile action against that country (i.e. not just support for Ukraine). That meant they agreed to have British and French (and ...) soldiers attack Russia. That was the security guarantee Ukraine had, and it was an international treaty, which in the EU (look it up) has the power of law.
As everyone and their grandmother's cat knows, they didn't actually follow through. They "gave support". That's just one, at the moment important, example.
And of course, the effect is the same: it became worse and worse. Russia's actions became worse and worse and worse. Now the EU countries have given the same guarantees for countries like Poland, Latvia and even Estonia, either directly or through NATO. Will Russia attack? Why not? It's not like these countries will (or let's be real: can) actually fight under any circumstance.
No country gave guarantees, only assurances, and it has even been highlighted that the US Senate would never have voted for it favourably, and thus it never was a treaty.
On the other hand, breaking these assurances guarantees that no other country will ever give up its nuclear arsenal; of course, that's no consolation prize for Ukraine. Guarantees within NATO, which is indeed a ratified treaty covering Poland, Latvia, and Estonia, would be stronger, but of course I would not put all my eggs in that basket.
After a few years of operation, the government realised it was serious and pressured Uber to stop its « UberPop » taxi operations, until the legislative change got through.
I used Uber from the first year it was here. As the service got popular with young adults, people took notice, and public debate began, the police were instructed to fine Ubers. Then the drivers asked us passengers to sit up front and pretend we were friends. (Not sure if the app had instructions related to this or not.) Once the legislation change was clear, they closed operations officially for a brief period, as stated in the article.
I just thought it was exciting at the time..
They acquired market power by killing the incumbents through predatory pricing, leaving them unprofitable and forcing them to exit, while creating a steep barrier to entry for any newcomers and strategically courting riders and drivers with subsidised rides and initially generous take rates to create artificial demand and inflate market share. Then, once they had kicked out the incumbents, they exercised their market power to raise prices and their percentage of the take on each transaction, leaving consumers and drivers worse off.
We can talk all day about the nice UX, blah blah. But the reality is that, financially, they could not have succeeded without a very dubious and unethical approach.
But I remember when I started using Uber back in 2012. It was amazing compared to every single other option out there. Yes, they entered the market in questionably-legal or often probably outright illegal ways. But illegal is not the same thing as immoral. And I don't think it's unethical to force out competition when that competition is a lazy, shitty, legally-enforced monopoly that treats its customers poorly.
As pointed out here, many governments have laws stating that they will step in ... and they didn't.
Creating the gig economy doesn't get any moral points from me.
Now, I see people at the airport walk over to the pickup lot, joining a crowd of others furiously messing with their phones while scanning the area for presumably their driver.
All the while the taxis waiting immediately outside the exit door were $2 more expensive, last time I checked.
I have no idea what I'm going to get with those taxis waiting immediately outside the exit door. Even in my home country, at the airport next to my city, I have no idea. I know exactly what I'm getting with an Uber/Lyft, every time. That's valuable to me.
I was just in another country a couple months ago, and when trying to leave the airport, I was confused where I'd need to go in order to get an Uber. I foolishly gave up and went for one of those "conveniently-waiting" taxis, where I was quoted a price up-front, in my home currency, that I later (after doing the currency conversion on the Uber price) realized was a ripoff. The driver also aggressively tried to get me to instead rent his "friend's car" rather than take me to the rental car place like I asked. And honestly I consider that lucky: he didn't try to kidnap me or threaten me in any way, but I was tense during the whole ride, wondering if something bad was going to happen.
That sort of thing isn't an anomaly; it happens all the time to tourists in many countries.
I won't recount what recently happened to a friend in Milwaukee. It was an unpopular story (because the ripoff was Uber-based, and not the traditional taxi).
There's bad actors in every industry. I have found that industries that get "entrenched," tend to breed the most bad actors.
If anything turns into a "pseudo-monopoly," expect the grifters to start popping up. They'll figure out how to game the system.
Is that true?
Uber was a godsend for everyone living outside of like 4 metro areas in the US.
I lived in SF when Uber started. We used to call Veteran's Cab because they were the only company that wouldn't ditch on the way to pick you up, but it was completely normal to wait more than an hour for a cab in the dark hinterlands of 24th and Dolores or the industrial wasteland of 2nd and Folsom. An hour during which you had to be ready to jump as soon as the car arrived. Everybody had at least one black-car driver's cell number for downtown use because if they happened to be free, you could at least get picked up.
Uber would have had a religious following of fanpersons even if all they'd done was an estimated pickup time that was accurate to within 20 minutes.
I happily pay a premium to never deal with any of those things again.
That said, uh, the point of getting a taxi to or from the airport was just not having to park at the airport, which generally costs a lot of money, and in certain areas it's a little sketchy whether or not your car will get cracked open while you're away.
The US has tons of cities like this that I imagine would have had issues with taxis: most of the Bay Area peninsula and East Bay, cities in Texas, Denver, etc. Most cities are not like NYC or Boston, and even in Chicago, unless you lived downtown, you likely didn't see taxis driving around.
Uber at least has fixed rates matching what was displayed, and there are logs of which driver was doing dodgy stuff.
And instead Uber offloaded everything onto gig workers and society. And still lost 20 billion dollars in the process (price dumping isn't cheap).
I always laugh when Americans poke fun at Europeans… we have it much better over here. I assure you of that.
And the drivers have the free will to choose to drive for Uber.
Yup. The drivers should have to pay everything because despite working for Uber they are "free contractors"
> And the drivers have the free will to choose to drive for Uber
Ah yes, I forgot that's exactly how price dumping works: there are multiple companies to choose from and all of them offer competitive wages.
I mean, it's not ancient history. For half of Uber's existence the ongoing story was: drivers have to drive almost 24 hours a day to make a living wage, with Uber randomly stealing their wages.
This only somewhat changed once governments stepped in and forced Uber to change some of its practices.
Trust me, Snell is far from a fire breathing libertarian conservative.
It’s not the responsibility of a corporation to decide what a “living wage” is. Should Uber pay more to a single mother with three kids than a single father with no kids? Again it’s society’s responsibility to provide for a safety net and to tax corporations to fund it.
On the federal level, that's what the earned income tax credit was supposed to do, and until 2016 it had wide bipartisan support and was championed by both Republican and Democratic presidents.
You have to decide whether you want the society to provide safety nets through healthcare, strong labor protections etc. or not.
> Again it’s society’s responsibility to provide for a safety net and to tax corporations to fund it.
Indeed. That's why governments and regulators eventually stepped in.
You can't in good conscience or good faith argue that Uber didn't offload anything onto society and people working for it just because "it's not the job of a company" etc. Uber literally engaged in multiple illegal and borderline illegal practices across the globe, including the US.
And yes, it's the literal job of a taxi company to make sure its drivers work a healthy amount of hours. In Uber's case it meant that it had to pay drivers enough money to cover the costs Uber offloaded onto them, and enough money left over so that they didn't have to drive 18-20 hours a day to make ends meet.
And yeah, not everyone can become Jason Snell
My argument is simply that the only “labor protections” the government should enforce on private enterprise is that a company can’t actively harm employees - OSHA protections, discrimination etc.
> And yes, it's the literal job of a taxi company to make sure its drivers work a healthy amount of hours. In Uber's case it meant that it had to pay drivers enough money to cover the costs Uber offloaded onto them, and enough money left over so that they didn't have to drive 18-20 hours a day to make ends meet.
It’s up to individuals to decide whether the tradeoffs are worth it. It’s not the responsibility of private industry to calculate what a “living wage” is for an individual. Uber never put a gun to anyone’s head to force them to drive for Uber. If anything the government should enforce how long someone can drive because it puts others in danger. But does the government stop people from working two jobs that might add up to 20 hours? What should happen when the driver drives for Uber, Lyft and DoorDash?
The illegal practices, at least in New York, were around the taxi medallion monopoly, where taxi drivers were getting hundreds of thousands of dollars into debt to own a medallion for the right to drive.
As for not everyone being Jason Snell, there were other freelance writers, and contractors like truck drivers, who had to leave California to save their businesses:
https://www.foxnews.com/opinion/i-had-leave-california-save-...
It even affected 1099 (as opposed to W2) tech workers who were contractors.
If you trust the overlord you didn't choose more than the one you did, then you might want to rethink your career.
Did you try to get insurance on the open market before 2012 with a pre-existing condition? Every other industrial country in the world has health insurance not tied to your employer. Even smaller countries like Costa Rica and Panama have better more affordable insurance. Yes I’ve done my research on caja, Costa Rica’s national health care system. We will be staying there a couple of months in the winter starting next year and it’s our Plan B to retire there.
This is the business model: get more money out of customers (because no real alternative) and the drivers (because zero negotiating power). Not to mention that they actually got to that position by literally operating at a loss for over a decade (because venture money). Textbook anti-competitive practices.
However, the idea itself (that is, having an app to order a taxi) is spectacular. It is also something a high-school kid could make in a month in his garage. The actual strength of the business model is the network effects and the anti-competitive practices, not the app or anything having to do with service quality.
For instance: monopolies often don't actually limit supply. You only make it so customers can't choose an alternative, and set prices accordingly (that is, higher than they would have been if there were real alternatives). Big-tech companies do this all the time. Collusion is also not required; it is only one form (today virtually unheard of, or very rare) of how this may happen. For instance, big-tech companies often don't actually encroach on core parts of each other's businesses. Google, Microsoft, Apple, and Uber are all totally different businesses with little competitive overlap. They are not doing this because of outright collusion. It's live and let live. Why compete with them when they are leaving us alone in our corner? Also, trying to compete is expensive (for them), risky, and may hurt them in other ways. This is one of the dirty little secrets: established companies don't (really) want to compete with other big companies. They all just want to protect what's theirs and keep it that way. If you don't believe me, have a look at the (publicly available) emails from execs that are public record. Anti-competitive thinking through and through.
And it wasn't much of a cartel in NYC before, anyway. Most subway stops in Brooklyn had a black car nearby if you knew how to look for them.
In NYC, prior to Uber entering the market, taxi medallions changed hands for up to $1mm. Prices were fixed by the TLC.
If those aren't strong indications of a cartel, I don't know what is.
https://www.nytimes.com/2025/08/06/business/uber-sexual-assa...
It won't be a ChatGPT or a coding model, of course; that's not what they're going for. But it'll be interesting to see its quality, as it's all done fairly and honestly. Transparently.
Otherwise, of course they would tell them to just pound sand.
Anthropic did. That was the part of their operation that they didn't get in trouble for, but the news spun it as "Anthropic destroyed millions of books to make AI".
However, the judge already ruled on the only important piece of this legal proceeding:
> Alsup ruled in June that Anthropic made fair use of the authors' work to train Claude...
No, trial court decisions are never binding precedent; if they are "published" decisions, they may generally be cited as persuasive precedent. Appellate decisions (circuit courts in the federal system) are binding on the trial courts subordinate to that appellate court (and even on panels of the same appellate court) until reversed by the same court sitting en banc or by a higher court (the US Supreme Court in the federal system).
Even if the ruling legally remains in place after the settlement, district court rulings are at most persuasive precedent and not binding precedent in future cases, even ones handled by the same court. In the US federal court system, only appellate rulings at either the circuit court of appeals level or the Supreme Court level are binding precedent within their respective jurisdictions.
The settlement was a smart decision by Anthropic to remove a huge uncertainty. $1.5 billion is not small, but it won't stop them or slow them down significantly.
There are also a lot of usage rules that now make many games unfeasible.
We dug into the private markets seeking less Faustian terms, but found just as many legal submarines lying in wait... "AI" plagiarism-driven projects are just late to the party. =3
I could read a book, but it's highly unlikely I could regurgitate it, much less months or years later. An LLM, however, can. While we can say "training is like reading," it's also not like reading at all, due to permanent, perfect recall.
Not only does an LLM have perfect recall, it also has the ability to distribute plagiarized ideas at a scale no human can. There's a lot of questions to be answered about where fair use starts/ends for these LLM products.
This has not been my experience. These days they are pretty good at googling though.
The 'lossy encyclopedia' analogy is quite apt
And even if one could, it would be illegal to do. Always found this argument for AI data laundering weird.
A xerox machine can reproduce an exact copy of a book if you ask it to, but that doesn't make a xerox machine inherently a copyright violation, nor does it make every use of a xerox machine a violation of copyright, even when the inputs are materials under copyright. So far the judge in this case has ruled that training an AI is sufficiently transformative, and that using legally acquired works for that purpose is not a violation of copyright. That outcome seems entirely unsurprising given the years of case law around copyright and technology that can duplicate copyrighted works. See the aforementioned xerox machines, but also CD ripping, DVRs, VHS recording of TV shows, audio cassette recording, emulators, the Java API lawsuit and also the Google Books lawsuit.
The way this technology is being used clearly violates the intent behind copyright law, it undermines its goals and results in harm that it was designed to prevent. I believe that doing this without extensive public discussion and consensus is anti-democratic.
We always end up discussing concrete implementation details of how copyright is currently enforced, never the concept itself. Is there a good word for this? Reification?
> but AI doesn't change the motivations and goals behind copyright
That's the point they're making. One of the fundamental things in how copyright is handled is copying in general, or performing a work multiple times. So I can accept the argument that training a model one time, and then using a singular instance of that model, is analogous to human learning.
But when you get to running multiple copies of the model, we are clearly past that.
The judge presiding over this case has already issued a ruling to the effect that training an LLM like Anthropic's AI with legally acquired material is in fact fair use. So unless someone comes up with some novel claims that weren't already attempted, claims that a different form of AI is significantly different from a copyright perspective from an LLM or tries their hand in a different circuit to get a split decision, the "jury" is pretty much settled on how fair use applies to AI. Legally acquired material used to train LLMs is fair use. Illegally obtaining copies of material is not fair use, and the transformative nature of LLMs don't retroactively make it fair use.
I have an author friend who felt like this was just adding insult to injury.
So not only had his work been consumed into this machine that is being used to threaten his day job as a court reporter, not only was that done without seeking his permission in any way, but they didn’t even pay for a single copy.
Really embodies raising your middle finger to the little guy while you steamroll him.
IIUC this is very far from settled, at least in US law.
Awesome, so I just need enough perceptrons to overfit every possible copyrighted work, then?
I'm so over this shift in America's business model.
Original Silicon Valley model, and generally the engine of American innovation/growth/wealth equality for 200 years: Come up with a cool technology, build it in your garage, get people to fund it and sell it because it's a better mousetrap.
New model: Still come up with a cool idea, still get it funded and sold, but the idea involves committing crime at a staggering scale (Uber, Google, AirBnB, all AI companies, long list here), and then paying your way out of the consequences later.
Look some of these laws may have sucked, but having billionaires organize a private entity that systematically breaks them and gets off with a slap on the wrist, is not the solution. For one thing, if innovation requires breaking the law, only the rich will be able to innovate because only they can pay their way out of the law. For another, obviously no one should be able to pay their way out of following the law! This is basic "foundations of society" stuff that the vast majority of humans agree on in terms of what feels fair and just, and what doesn't.
Go to a country which has really serious corruption problems, like is really high on the corruption index, and ask the people there what they think about it. I mean I live in one and have visited many others so I can tell you, they all hate it. It not only makes them unhappy, it fills them with hopelessness about their future. They don't believe that anything can ever get better, they don't believe they can succeed by being good, they believe their own life is doomed to an unappealing fate because of when and where they were born, and they have no agency to change it. 25 years ago they all wanted to move to America, because the absence of that crushing level of corruption was what "the land of opportunity" meant. Now not so much, because America is becoming more like their country.
This timeline ends poorly for all of us, even the corrupt rich who profit from it, because in the future America will be more like a Latin American banana republic where they won't be able to leave their compounds for fear of getting Luigi'ed. We normal people get poverty, they get fear and death, everyone loses. The social contract is collapsing in front of our eyes.
The federal courts are a joke - the supreme court now has at least one justice whose craven corruption is notorious — openly accepting material value (ie bribes) from various parties. The district courts are being stuffed with Trump appointees with the obvious problems that go with that.
The congress is supine. Obviously they cannot act in any meaningful capacity.
We don’t have street level corruption today. But we’ve fired half the civil service, so I doubt that will continue.
Imagine a future where election results are casually and publicly nullified if the people with the guns don't like the result, and no one can do anything about it. Or where you can start a business but if it succeeds and you don't have the right family name it'll be taken from you and you'll be stripped of all you own and possibly put in prison for a while. That's reality in some countries, the US is not there yet, but those are the stakes we're playing for here, and why change needs to happen.
Right now, the President is sending federal troops to occupy cities, and just bombed a ship off Venezuela.
So exactly when was there “wealth equality” in the US? Are you glossing over that whole segregation, redlining, era of the US?
And America was built on slavery and genocide.
You realize there are countries that are even worse to their citizens right? Like I'm really asking, why do so many people online seek to eliminate all conversation that isn't a simple and un-nuanced condemnation of America?
I am able to have criticisms of America while also thinking there are good things about it and that there are also worse places, but some people seem incapable of holding those three ideas in their heads simultaneously. Especially the idea that there actually are countries worse than the US, they just can't fathom that it seems, or don't consider it a fact that should receive any attention.
Right this very second, the same Republican Party that fights tooth and nail for the right to bear arms is trying not to let transgender people carry guns.
Which industrial country has a higher rate of incarceration than the US? A higher infant mortality rate? Less people covered by health insurance? A lower life expectancy?
There is absolutely no objective quality-of-life measurement you can name where the median American citizen is better off than the median citizen of a country in Europe, or of Canada, or the UK.
Not creative destruction. But pure corruption.
Is this completely settled legally? It is not obvious to me it would be so
Or can they buy the book, and then use the pirated copy?
Additionally, sharing copyrighted works without permission (the data sets or data lakes) is its own tort. You're liable just for sharing copies, before any training happens. Some copyrighted works are also commercial, copyrighted with a ban on others' commercial use, or patented. Some are NDA'd, but third parties leaked them. Sources like Common Crawl probably have plenty of such content.
Additionally, there are often contractual terms of use governing access to the content. Even Singapore's and others' laws allowing training on copyrighted content usually require that you lawfully accessed that content in the first place. The terms of use are the weakest link there.
I'd like to see these two issues turned by law into a copyright exception that no contract can override. It needs to specifically allow sharing scraped, publicly-visible content. Anything you can just view or download which the copyright owner put up. The law might impose or allow limits on daily scraping quantity, volume, etc to avoid damage scrapers are doing.
Sure, training by itself isn't worth anything.
Distributing and collecting payment for the usage of a trained model which may violate copyright, etc. that's still an open legal question and worth billions as well.
I can only imagine the pitch: yes, please give us billions of dollars; we are going to make a huge investment, like paying off our lawsuits.
It basically does nothing for them besides that. Given the split decisions so far, I'm not sure what value the Alsup decision is going to bring to the industry, moving forward, when it's in the context of books that Anthropic physically purchased. The other AI cases are generally not fact patterns where the LLM was trained with copyrighted materials that the AI company legally purchased copies of.
> Although the payment is enormous, it is small compared with the amount of money that Anthropic has raised in recent years. This month, the start-up announced that it had agreed to a deal that brings an additional $13 billion into Anthropic’s coffers. The start-up has raised a total of more than $27 billion since its founding in 2021.
You never know; it's a game of interests and incentives. One thing's for sure: does the fed want the private sector to own and control a technology of this kind? Nope.
So long as there is an excuse to justify the money flows, that's fine; big capital doesn't really care about the excuse, so long as it is just persuasive enough to satisfy the regulators and the judges.
Money flows happen independently, then later, people try to come up with good narratives. This is exactly what happened in this case. They paid the authors a lot of money as a settlement and agreed on a narrative which works for both sets of people; that training was fine, it's the pirating which was a problem...
It's likely why they settled; they preferred to pay a lot of money and agree on some false narrative which works for both groups rather than setting a precedent that AI training on copyrighted material is illegal; that would be the biggest loss for them.
Yes, and FWIW that's very succinctly stated.
Some individuals in society find a way through that and figure out a way to strategically achieve their goals. Rare though.
It's not the way we expect people to do business under normal circumstances, but in new markets with new products? I guess I don't see much actually wrong with this. Authors still get paid a price they were willing to accept, and Anthropic didn't need to wait years to come to an agreement (again, publishers weren't actually selling what AI companies needed to buy!) before training their LLMs.
In a business where all the value is in data, all they lost is a bit of money.
You’ve never authored, created, or published something? Never worked for a company that sells something protected by copyright?
I.e., never created software in exchange for money.
Copying and distributing works isn’t identical to theft (deliberately depriving someone of their property), but you’re enjoying someone’s work without compensating them, so it isn’t totally unlike depriving them of something.
I guess it depends how you feel about refusing to pay a window washer. Or indeed you not being paid by your employer. It isn’t theft, but someone is clearly stiffing someone else.
As for only big companies benefitting from the copyright regime… seems like an ideological assumption. I know plenty of authors and they are quite happy having legal protections around their work which means they can earn from their labour.
Which is foreseen in a societal decision: libraries (again and again).
> refusing to pay a window washer
The window washer is providing a service for a price, that service is not equivalent to knowledge production, and nobody has decided that that service (cleaning windows) should be done for free.
As for window washing vs knowledge production, not sure what you mean. Books have a price. Nobody’s decided they should be free either.
In the standard system for libraries, the book is paid once.
That is "libraries" as in "we have societally decided to make published knowledge freely available".
> window washing vs knowledge production
Societies have not decided that window washing should be freely available; on the other hand, they have decided that published knowledge should be freely available (that is the meaning of the establishment of libraries).
Re: what “society has decided”, are you arguing that because libraries exist, no one may sell books for a price?
Seems extreme, not widely agreed by the population or the relevant parties, and likely to cause immense problems with the economics of knowledge production, but it’s certainly one point of view!
Similarly I argue that because open source code exists, software engineers must all work for free, and that because public parks exist, everyone’s home gardens are open to all.
Obviously there would be handling costs + scanning costs, so that’s the floor.
Maybe $20 million total? Plus, of course, the time it would take to execute.
The cost of the books is negligible in comparison.
Once a book is scanned, 99% of them go into the furnace at the district heating boiler next door. The other 1% go back to a developed country for resale.
> otherwise only the big companies who can afford to pay off publishers like Anthropic will be able to do so
Only well funded companies can afford to hire a lot of expensive engineers and train AI models on hundreds of thousands of expensive GPUs, too.
Something tells me many of the grassroots LLM-training people are less concerned about the legality of their source training set than the big companies anyway.
Book authors may see some settlement checks down the line. So might newspapers and other parties that can organize and throw enough $$$ at the problem. But I'll eat my hat if your average blogger ever sees a single cent.
More broadly, I think that's a goofy argument. The books were "freely available" too. Just because something is out there, doesn't necessarily mean you can use it however you want, and that's the crux of the debate.
Other people have said that Anthropic bought the books later on, but I haven't found any official records for that. Where would I find that?
Also, does anyone know which Anthropic models were NOT trained on the pirated books? I want to avoid such models.
https://storage.courtlistener.com/recap/gov.uscourts.cand.43....
"Similarly, different sets or “subsets” or “parts of” or “portions” of the collections sourced from Books3, LibGen, and PiLiMi were used to train different LLMs..." Page 5
"In sum, the copies of books pirated or purchased-and-destructively-scanned were placed into a central “research library” or “generalized data area,” sets or subsets were copied again to create training copies for data mixes, the training copies were successively copied to be cleaned, tokenized, and compressed into any given trained LLM, and once trained an LLM did not output through Claude to the public any further copies." Page 7
The phrase "Finally, once Anthropic decided a copy of a pirated or scanned book in the library would not be used for training at all or ever again, Anthropic still retained that work as a “hard resource” for other uses or future uses" implies to me Anthropic excluded certain books from training, not that they excluded all the pirated books from training.
Or what if they're not even distributing the model itself, but rather only the outputs of the LLM (i.e., a closed-source LLM like Anthropic's)?
I am genuinely curious as to whether there is some gray area that might be exploited by AI companies, as I am pretty sure they don't want to pay $1.5B yet still want to exploit the works of authors (let's call a spade a spade).
We really are getting at some metaphysical/philosophical questions, and maybe we will one day arrive at a question that just can't be answered (I think this is pretty close, right?). Then AI companies could do things freely without being held accountable, since sure, you could take them to court, but how would you come to a decision...?
Another question, though: let's say the NYT v. OpenAI case is ongoing. While they are litigating, could OpenAI continue doing the same thing?
The EU has copyright exemptions for AI training. You don't need to respect opt-outs if you are doing research.
South Korea and Japan have some exemptions too, I think?
Singapore has very strong copyright exemptions for AI training. You can completely ignore opt-outs legally, even if doing it commercially.
Just search for "TDM laws globally".
At least if you're a regular citizen.
1. A Settlement Fund of at least $1.5 Billion: Anthropic has agreed to pay a minimum of $1.5 billion into a non-reversionary fund for the class members. With an estimated 500,000 copyrighted works in the class, this would amount to an approximate gross payment of $3,000 per work (rough arithmetic sketched after this list). If the final list of works exceeds 500,000, Anthropic will add $3,000 for each additional work.
2. Destruction of Datasets: Anthropic has committed to destroying the datasets it acquired from LibGen and PiLiMi, subject to any legal preservation requirements.
3. Limited Release of Claims: The settlement releases Anthropic only from past claims of infringement related to the works on the official "Works List" up to August 25, 2025. It does not cover any potential future infringements or any claims, past or future, related to infringing outputs generated by Anthropic's AI models.
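For scale, term 1's arithmetic is easy to sanity-check. A minimal sketch in Python, using only the figures from the motion:

    # Settlement arithmetic, using the figures quoted from the motion
    fund_floor = 1_500_000_000  # minimum non-reversionary fund, USD
    works_estimate = 500_000    # estimated copyrighted works in the class

    print(fund_floor / works_estimate)  # 3000.0 USD gross per work

    # Term 1's top-up rule: $3,000 for each work beyond the estimate
    def fund_size(n_works):
        return fund_floor + max(0, n_works - works_estimate) * 3_000

    print(fund_size(550_000))  # 1_650_000_000

Note that the $3,000 is gross, before attorneys' fees and administration costs come out.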
Edit: I'll get ratio'd for this, but it's the exact same thing Google did in its lawsuit with Epic. They delayed while the public and courts focused on Apple (oooh, EVIL Apple); Apple lost, and Google settled at a disadvantage before there was a legal judgment that couldn't be challenged later.
Indeed, it is not only the payout but also the destruction of the datasets. Although the article does quote:
> “Anthropic says it did not even use these pirated works,” he said. “If some other generative A.I. company took data from pirated source and used it to train on and commercialized it, the potential liability is enormous. It will shake the industry — no doubt in my mind.”
Even if true, I wonder how many cases we will see in the near future.
A settlement means the claimants no longer have a claim, which means if they're also part of, say, the New York Times-affiliated lawsuit, they have to withdraw. A neat way of kneecapping a countrywide decision that LLM training on copyrighted material is subject to punitive measures, don't you think?
In my experience and training at a fintech corp: accepting a settlement in any suit weakens your defense, but it prevents a judgment, and it bars future claims for the same claims from the same claimants (a la double jeopardy). So, again, at minimum this prevents an actual judgment, which likely would have been positive for the NYT (and adjacent) cases.
> In my experience and training at a fintech corp: accepting a settlement in any suit weakens your defense, but it prevents a judgment, and it bars future claims for the same claims from the same claimants (a la double jeopardy). So, again, at minimum this prevents an actual judgment, which likely would have been positive for the NYT (and adjacent) cases.
Okay? I'm an IP litigator, and you clearly have no idea what you're talking about. The only thing left to try in this case was the book-library piracy. Alsup's fair use decision is just as relevant: it is not mooted by the settlement and will be cited by anyone who thinks it's favorable to them.
And they actually went and did that afterwards. They just pirated them first.
Bootstrapping in the startup world refers to starting a startup using only personal resources instead of using investors. Anthropic definitely had investors.
Also, do we know if the newer models were trained without the pirated books?
https://storage.courtlistener.com/recap/gov.uscourts.cand.43...
> Also, do we know if the newer models were trained without the pirated books?
I'm pretty sure we do but I couldn't swear to it or quickly locate a source.
Among several places where judge mentions Anthropic buying legit copies of books it pirated, probably this sentence is most relevant: "That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages."
But the document does not say Anthropic bought EVERY book it pirated. Other sections of the document also don't explicitly say that every pirated book was later purchased.
I stopped using Claude when this case came to light. If the newer Claude models don't use pirated books, I can resume using it.
When you say, "I'm pretty sure we do...", do you mean that pirated books were used, or were they not used?
Yeah, I wouldn't make this exact claim either. For instance it's probably safe to assume that the pirate datasets contain some books that are out of circulation and which Anthropic happened not to get a used copy of.
They did happen to get every book published by any of the lead plaintiffs, though, which is a point towards them probably having pretty good coverage. And it does seem to have been an attempt to purchase "all" the books, for reasonable approximate definitions of "all".
> When you say, "I'm pretty sure we do...", do you mean that pirated books were used, or were they not used?
I'm pretty sure pirated books were not used, but not certain, and I really don't remember when/why I formed that opinion.
It looks like you'll be able to search this site if the settlement is approved:
> https://www.anthropiccopyrightsettlement.com/
If your work is there, you qualify for a slice of the settlement. If not, you're outta luck.
This site references Meta, but the training corpus probably has some overlap? Maybe?
https://www.theatlantic.com/technology/archive/2025/03/searc...
I was under the impression they had downloaded millions of books.
There's piracy, and then there's making available to the public a model that can regurgitate copyrighted works or emulate them. The latter is still unsettled.
Is there a way to make your content on the web "licensed" in a way where it is only free for human consumption?
I.e., effectively making the use of AI crawlers piracy, and thus subject to the same kind of penalties as here?
That curl script you use to automate some task could become infringing.
At this point, we do need some laws regulating excessive scraping. We can't have the internet grind to a halt because everyone is trying to drain it of information.
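In the meantime, the closest thing to an opt-out is asking crawlers nicely. A minimal robots.txt sketch (GPTBot, ClaudeBot, and CCBot are the published crawler user agents for OpenAI, Anthropic, and Common Crawl; note that robots.txt is advisory, not a license, so by itself it creates nothing like the piracy-style liability asked about above):

    User-agent: GPTBot
    Disallow: /

    User-agent: ClaudeBot
    Disallow: /

    User-agent: CCBot
    Disallow: /

Whether ignoring such a signal should carry legal weight is exactly the kind of thing new scraping laws would have to decide.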
The purpose of the copyright protections is to promote "sciences and useful arts," and the public utility of allowing academia to investigate all works(1) exceeds the benefits of letting authors declare their works unponderable to the academic community.
(1) And yet, textbooks are copyrighted and the copyright is honored; I'm not sure why the academic fair-use exception doesn't allow scholars to just copy around textbooks without paying their authors.
I'm not sure to what extent you can specify damages like these in a contract, ask the lawyer who is writing it.
If you put a “contract” on your website that users click through without paying you or exchanging value with you and then you try to collect damages from them according to your contract, it’s not going to get you anywhere.
The consideration you received was a promise to refrain from using those documents to train AI.
I'm not a lawyer, but by my understanding of contract law consideration is trivially fulfilled here.
https://www.anthropic.com/news/anthropic-raises-series-f-at-...
A judge making a ruling based on his opinion of how transformative a technology will be doesn't inspire confidence. There's an equivocation on the word "transformative" here: not just transformative in the fair use sense, but transformative as in world-changing, impactful, revolutionary. The latter shouldn't matter in a case like this.
> Companies and individuals who willfully infringe on copyright can face significantly higher damages — up to $150,000 per work
Settling for 2% is a steal.
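That 2% is just the settlement's per-work payment over the willful-infringement cap quoted above; a quick sketch:

    # Settlement per work vs. the statutory cap for willful infringement
    settlement_per_work = 3_000  # USD, from the settlement terms
    willful_cap = 150_000        # USD per work, quoted above

    print(settlement_per_work / willful_cap)  # 0.02, i.e. 2%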
> “In June, the District Court issued a landmark ruling on A.I. development and copyright law, finding that Anthropic’s approach to training A.I. models constitutes fair use,” Aparna Sridhar, Anthropic’s deputy general counsel, said in a statement.
This is the highest-order bit, not the $1.5B in settlement. Anthropic's guilty of pirating.
I feel it is insane that authors do not receive some sort of standard compensation for each training use. Say, a few hundred to a few thousand dollars, depending on the complexity of their work.
I might entertain the argument comparing it to a human once it is a full legal person and cutting power to it or deleting it is treated as murder. Before that, it is just bullshit.
And fundamentally, the reason for copyright to exist is to support creators and encourage them to create more. In a world where massively funded companies can freely exploit their work, and in many cases fully substitute for it, that principle has failed.
> Ai is not a human with limited time
AI is also bound by time, physics, and limited capacity. It does certain things better or faster than us; it fails miserably at certain things we don't even think of as complex (like opening a door).
> And it is also owned by a company not a legal person.
For the purposes of the law, companies and persons are relatively equivalent; regardless of the merits, that is how it is.
> In a world where massively funded companies can freely exploit their work, and in many cases fully substitute for it, that principle has failed.
They paid for the books after getting caught, and the other companies are paying for their copyrighted training materials.
Are they paying reasonable compensation? Say, like streaming services, movie theatres, and radio and TV stations do? As a whole, their model is much closer to those than to individuals buying books, CDs, or DVDs...
You might even consider a Theatrical License or a Public Performance License, which are paid even if you have memorized the piece...
LLMs are just bad technology in that they require such a massive amount of input that the authors cannot be adequately compensated for it. And I fully believe they should be, and at a lot more than the price of a single copy of their work under the entirely ill-fitting first-sale doctrine.
Depends on how you do it. Clearly, reading the book word for word is different from making a podcast about your interpretation of the book.
You can follow the case here: https://www.courtlistener.com/docket/69058235/bartz-v-anthro...
You can see the motion for settlement (what the news article is about) here: https://storage.courtlistener.com/recap/gov.uscourts.cand.43...
The lawyers suing Anthropic here will probably walk away with several hundred million dollars - they have won the lottery.
If they managed to extract twice as much money from Anthropic for the class, they'd walk away with probably twice as much... but winning the lottery twice isn't actually much better than winning the lottery once. Meanwhile, $4,500 is a lot more than $2,250 (the latter is a reasonable estimate of how much you'll get per work after the lawyers' cut; rough arithmetic below). This risks the lawyers settling for less than is in their clients' best interests so that they can reliably get rich.
Personally (not a lawyer or anything) I think this settlement seems very fair, and I expect the court will approve it. But there's definitely been plenty of class actions in the past where lawyers really did screw over the class and (try to) settle for less than they should have to avoid risking going to trial.
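Rough numbers behind the $2,250 vs. $4,500 comparison, as a sketch; the 25% contingency fee is an assumption for illustration, not a figure from the filing:

    # Hypothetical net payout per work under an assumed 25% lawyers' cut
    fee_fraction = 0.25  # assumption; typical for class actions, not from the filing

    for gross in (3_000, 6_000):  # actual settlement vs. a doubled one
        print(gross, '->', gross * (1 - fee_fraction))
    # 3000 -> 2250.0
    # 6000 -> 4500.0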
Imagine going to 500k publishers to buy the rights individually. $3k per book is way cheaper. The copyright system is turning into a data marketplace in front of our eyes.
The main cost of doing this would be the time - even if you bought up all the available scanning capacity it would probably take months. In the meantime your competition who just torrented everything would have more high-quality training data than you. There are probably also a fair number of books in libgen which are out of print and difficult to find used.
It's less than 1% of Anthropic's valuation, a valuation utterly dependent on all the hoovering up of others' copyrighted works.
AFAICT, if this settlement signals that the typical AI foundation model company's massive-scale commercial theft doesn't result in judgments that wipe out the company (and its execs), then we have confirmation that it's a free-for-all for all the other AI gold rush companies.
Then making deals to license rights, in sell-it-to-us-or-we'll-just-take-it-anyway deals, becomes only a routine and optional corporate cost-reduction exercise, not anything the execs will lose sleep over if it's inconvenient.
The settlement is real money though. Valuation is imaginary.
Writers were the true “foundational” piece of LLMs, anyway.
If someone breaks into my house and steals my valuables, without my consent, then giving me stock in their burglary business isn't much of a deterrent to them and other burglars.
Deterrence/prevention is my real goal, not the possibly of a token settlement from whatever bastard rips me off.
We need the analogue of laws and police, or the analogue of the homeowner with a shotgun.
I understand that intentional copyright infringement is a crime in the US; you just need to convince the DOJ to prosecute Anthropic for it...
TBH I'm just going to plow all that money back into Anthropic... might as well cut out the middleman.
You can search LibGen by author to see if your work is included. I believe this would make you a member of the class: https://www.theatlantic.com/technology/archive/2025/03/searc...
If you are a member of the class (or think you are) you can submit your contact information to the plaintiff's attorneys here: https://www.anthropiccopyrightsettlement.com/
I suspected my work was in the dataset and it looks like it is! I reached out via the form.
Also, passing the cost on to consumers of generated content, since companies would now need to pay royalties on the back end, should likely increase the cost of generating slop and hopefully push back against that trend.
This shouldn't just be books, but all written content, like scholarly journals and essays, news articles and blogs, etc.
I realize this is just wishful thinking, but there's got to be some nugget of aspirational desire to pay it forward.
Give them this order: "I want to buy all your books as EPUBs."
Pay and fetch the stuff
That's all
That's why Anthropic had to scan physical books.
OpenAI and Google will follow soon now that the precedent has been set, and will likely pay more.
It will be a net win for Anthropic.
Because everyone is expecting AGI now and it's not happening with our current tech.
Taken right from the VC's handbook.
And if AI companies want recent stuff, they need to pay the owners.
However, the West wants to infinitely enrich the lucky old people and companies who benefited from the lax regulations at the start of the 20th century. Their people chose not to let the current generations acquire equivalent wealth, at least not without the old hags getting their cut too.
https://www.econlib.org/library/Columns/y2003/Lessigcopyrigh...
Lessig: Not for this length of time, no. Copyright shouldn’t be anywhere close to what it is right now. In my book I proposed a system where you’d have to renew after every five years and you get a maximum term of 75 years. I thought that was pretty radical at the time. The Economist, after the Eldred decision, came out with a proposal—let’s go back to 14 years, renewable to 28 years. Nobody needs more than 14 years to earn the return back from whatever they produced.
For many reasons I switched to writing using a Creative Commons license using Lulu, LeanPub, and my own web site for distribution. This has been a win for me economically, it feels good to add to the commons, and it is fun.
So the data/copyright issue that you might be worried about is actually completely solved already! Anthropic is just paying a settlement here for the illegal pirating that they did way in the past. Anthropic is allowed to train on books that they legally acquire.
And sure, Chinese AI companies could probably scrape from LibGen just like Anthropic did without getting in hot water, and potentially access a bit more data that way for cheap, but it doesn't really seem like the buying/scanning process really costs that much in the grand scheme of things. And Anthropic likely already has legally acquired most of the useful texts on LibGen and scanned them into its internal library anyways.
(Furthermore, the scanning setup might actually give Anthropic an advantage, as they're able to digitize more niche texts that might be hard to find outside of print form)
Western companies will be fine but sharing data in ways that would be illegal in the US does help other companies outside the US.
They're paying much more than the actual damages because US copyright law comes with statutory damages for infringement of registered works on top of actual damages, between $200 and $150,000 per work. And the two sides negotiated this as a fair settlement to reduce the risk of an unfavourable outcome.
Meanwhile, it's not alleged that they redistributed the books in any form except as the output of LLMs.
This looks to be almost entirely a settlement for pirating the books. It does also cover the act of training the LLMs on the books, but since the district court already found that to be fair use it's unlikely to have been a major factor in the amount.
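For scale, a sketch of the statutory exposure the settlement avoids, using the $200-$150,000 per-work range cited above and the 500,000-work class estimate:

    # Statutory damages exposure across the class
    works = 500_000
    floor_per_work, cap_per_work = 200, 150_000  # USD, range cited above

    print(works * floor_per_work)  # 100_000_000     -> $0.1B best case
    print(works * cap_per_work)    # 75_000_000_000  -> $75B worst case
    # The $1.5B settlement sits near the bottom of that range.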
That's a weird way for Anthropic to announce they're going out of business.
By extension, if the big publishers are getting $3000 per article, that could be a fairly significant windfall.
Kinda like how patents will state the human “inventor” but Apple or whichever corp is assigned the rights.
https://en.wikipedia.org/wiki/Hansi_K%C3%BCrsch
I'm not sure if he even knows, but that is almost certainly his tracks they trained on.
Anthropic went back and bought->scanned->destroyed physical copies of them afterward... but they pirated them first, and that's what this settlement is about.
The judge also said:
> “The training use was a fair use,” he wrote. “The technology at issue was among the most transformative many of us will see in our lifetimes.”
So you don't need to pay $3,000 per book you train on unless you pirate them.
For post-training, other data sources (like human feedback and/or examples) are way more expensive than books.
Same racket the media cartels and patent trolls have been forcing for 40-50 years.
Sounds harsh, if true: it would make Claude practical only for hobby projects, basically, where you keep the results entirely to yourself (be it information, a product using Claude, or a product made by using Claude). Difficult to believe; I hope I heard it wrong.
Unless, of course, the transformation malfunctioned and you got the good old verbatim source, with many examples compiled in similar lawsuits.
> When each LLM was put into a public-facing version of Claude, it was complemented by other software that filtered user inputs to the LLM and filtered outputs from the LLM back to the user. As a result, Authors do not allege that any infringing copy of their works was or would ever be provided to users by the Claude service.
(from Bartz v. Anthropic in the Northern District of California)
We are entering a world filled with corporate mafias that are above the law (because the damages they can be made to pay are insignificant to them). These mafias will grip the world, providing the essential services that make up the future world. The state will become much weaker, as policymakers can be bought by lobbying and punishments can be offset by VC funding.
It is all part of the playbook.
They pay out (relative) chump change as a penalty for explicitly pirating a bunch of ebooks, and in return they get a ruling that they can train on copyrighted works forever, for the purchase price of the book (not the price that would be needed to secure the rights!)
I'd be curious to hear from a legal professional...
It would take time, sure, to compile the lists and make bulk orders, but wouldn't it be cheaper in the end than the settlement?
That could push the industry toward consolidation: fewer independent experiments, more centralized R&D inside big tech. I feel this might slow the pace of unexpected innovation and increase dependence on incumbents.
This definitely raises the question: how do we balance fair compensation for creators with keeping the door open for innovation?
Based on history this is not a possibility but a certainty.
The larger players, who grew because of limited regulation, will start supporting stricter regulation and compliance structures in order to raise the barrier to entry, with the excuse of "Oh, we learned our lesson, you are right." The hypocrisy is crazy, but it makes sense from a capitalistic perspective.
The European, and especially German, approach of regulating pre-emptively might be fairer, but apparently it also stifles innovation, as we can observe: almost no significant players from Europe or Germany.
In March, they were worth $61.5 billion
In six months they've created $120 billion in value. That's almost 700 million dollars per day. Avoiding being slowed down by even a few days is worth a billion dollar payout when you are on this trajectory. This lawsuit, and any lawsuit AI model companies are likely to get, will be a rounding error at the end of the fiscal year.
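The per-day figure follows directly from the two numbers in this thread; a sketch, assuming roughly 183 days between the valuations:

    # Implied value creation per day, from the figures above
    march_valuation = 61_500_000_000   # USD, March
    gain = 120_000_000_000             # USD created over ~6 months
    days = 183                         # assumption: ~six months

    print((march_valuation + gain) / 1e9)  # ~181.5, implied current valuation, $B
    print(gain / days / 1e6)               # ~656 million USD per day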
They know that superintelligent AI is far larger than money, and even so, the money they'll make on the way there is hefty enough for copyright law to not be an issue.
https://www.youtube.com/watch?v=sdtBgB7iS8c
Somehow, excuses like "we torrented it, but we configured low seeding," "the temptation was too strong because there was money to be made," "we tried getting a license, but then ignored it," and more ludicrous excuses actually worked.
Internal Meta emails seemed to show that people knew it was a blatant breach of copyright, and yet Meta won the case.
I guess there are tiers of laws even between billionaire companies.
What a formidable moat against newcomers, definitely worth the price!
But then, companies in the countries with the freedom to add everything to the training dataset would have to distribute the weights for free in IP-walled countries (because the weights would be plainly "illegal" and would be "blocked" over there, unless free as in free beer, I guess); basically, only the DeepSeek approach could work.
If powerful LLM hardware becomes somewhat affordable (look at Nvidia's massive push on LLM-specific hardware), "local" companies could run those foreign-trained LLM models at reasonable speed, but "here".
Try to do that. There is no easy way to delete your account. You need to reach out to their support via email. Incredibly obnoxious dark pattern. I hate OpenAI, but everything with Anthropic also smells fishy.
We need more and better players. I hope that xAI will give them all some good competition, but I have my doubts.
Anthropic has made AI safety a central pillar of their ethos and has shared a lot of information about what they're doing to responsibly train models... Personally, I found a lot of corporate-speak on this topic from OpenAI, but very little information.
I haven't had this in a while, but I always hate it when I'm blocked by Cloudflare/Datadome/etc.
It reminds me of the theoretically public beaches that are blocked off by privately owned land.
If you point a camera at an ebook reader, with a little motor to tap "next" on the screen, that's still easier than scanning physical books.
The reason companies aren't using ebooks is that all the publishers and ebook companies make you click through a license stating that "this book is for personal use" (paraphrased).
Because it turns out that nobody in the whole safety cult cares a whit for the human mind, the human experience, human art. Maybe for something they call "human values" in some abstract thought experiment, but never for any human decency. No, the human mind is just ones and zeros, just like a computer, no soul and no spark, to people in the cult. The cult thinks that an LLM reading a book is just the same mechanically as a human reading it.
Your brain is just emergence, your honor. Fair use. Blah blah Dennett Hofstadter Yudkowsky.
Do you feel safe?
But investors, please give us billions of dollars worth of that imaginary social agreement. We need it so that we can build the inevitable future. We're gonna do it more ethically than those guys over there.
What a swindle.
That's what you sound like, to people not in the safety cult.
Also, if there is a software library with an annoying Stallman-style license, can one use an LLM to generate a compatible library in the public domain or under a commercial license? So that nobody needs to respect software licenses anymore? Can we also generate a free Photoshop, Linux kernel, and Windows this way?
Even mighty AI with billions must kneel to the copyright industry. We are forever doomed. Human culture will never be free from the grasp of rent-seeking.
Seriously, how will this money propagate to the authors (if at all) or will it just stay with the publishers?
It’s not precedent-setting, but surely it’ll have an impact.
https://www.tomshardware.com/tech-industry/artificial-intell...
>During a deposition, a founder of Anthropic, Ben Mann, testified that he also downloaded the Library Genesis data set when he was working for OpenAI in 2019 and assumed this was “fair use” of the material.
Per the NYT article, Anthropic started buying physical books in bulk and scanning them for their training data, and they assert that no pirated materials were ever used in public models. I wonder if OpenAI can say the same.
I would not be surprised if investors made their last round of funding contingent on settling this matter out of court precisely to ensure no precedents are set.