See The Gutenberg Galaxy (book) (1962)
McLuhan's Wake (documentary movie, narrated by Laurie Anderson) (2002)
Re Wake: Listen to the accompanying full interviews with McLuhan's colleagues from which the documentary is drawn.
The legal judgement in the case of Anthropic may answer your question, although with the caveat that I'm not a lawyer, that I have no legal training, and that I may be misreading what looks like plain language but which has an importantly different meaning in law.
The judgement is here: https://cases.justia.com/federal/district-courts/california/...
To quote parts of the section "overall analysis" (page 30):
The copies used to train specific LLMs were justified as a fair use. Every factor but the nature of the copyrighted work favors this result. The technology at issue was among the most transformative many of us will see in our lifetimes.
… The downloaded pirated copies used to build a central library were not justified by a fair use. Every factor points against fair use. Anthropic employees said copies of works (pirated ones, too) would be retained “forever” for “general purpose” even after Anthropic determined they would never be used for training LLMs. A separate justification was required for each use. None is even offered here except for Anthropic’s pocketbook and convenience.
In a way, this seems to be a repeat of the "The 'L' in 'ML' is 'learning'" argument:You are not allowed to use the photocopier in the library to make a copy of the entire book. If your local library is anything like the ones I remember back in the UK, there's even a sign right next to the photocopier telling you this.
You are in fact allowed to go to a public library, learn things from the books within, and apply that knowledge without paying anything to any copyright holder. If/once you buy a book, likewise, because once it's been bought you don't owe the copyright holder anything for having learned something. This is the point of a library, of education, and indeed of copyright: the word is literally the right to make a copy, as in giving authors control over who has the right to make a copy, this is not the right to an eternal rent from what is learned by reading a copy.
(If you then over-train a model so it does print verbatim copies, this is bad for both legal and technical reasons: legal, because it's a copy; technical, because using a neural net to do a lossy compression of documents is a terrible waste of resources, which is just like humans in exactly the way that nobody has any interest in reproducing in silicon).
chasing0entropy•1h ago
aebtebeten•1h ago