Politicians try to crack as few eggs as possible, telling us they are our friends, and we believe them. Now then, some do more good than bad, and some do more bad than good. On the other hand, something that is _good for me_ is _bad for you_, and vice versa. Politicians are just the means to move the needle juuuuuuust a little bit, to show a change but never make a drastic one. A drastic change costs them re-election, and re-election is the bread and butter of politicians (yes, I am over-over-simplifying, but this is human history and a lot will be left out of a comment).
If a savant with perfect recall remembers a text perfectly and rearranges it to create a marginally new text, he'd be sued for breach of copyright.
Only large corporations get away with it.
> If an AI reads 100 scientific papers and churns out a new one, it is plagiarism.
That is a specific claim that is being directly addressed and pretty clearly qualifies as "good faith".
If you draw a Venn diagram of plagiarism and copyright violations, there's a big intersection. For example, if I take your paper, scratch off your name, make some minor tweaks, and submit it, I'm guilty of both plagiarism and copyright violation.
"To steal ideas from one person is plagiarism; to steal from many is research."
Any suits would be based on the degree to which the marginally new copy was fair use. You wouldn't be able to sue the savant for reading and remembering the text.
Using AI to create marginally new copies of copyrighted work is ALREADY a violation. We don't need a dramatic expansion of copyright law that says that just giving the savant the book to read is a copyright violation.
Plagiarism and copyright are two entirely different things. Plagiarism is about citations and intellectual integrity. Copyright is about protecting economic interests, has nothing to do with intellectual integrity, and isn't resolved by citing the original work. In fact, most contexts where you could be accused of plagiarism are ones like reporting, criticism, education, or research, where the goals make fair use arguments much easier.
More like a speed-reader who retains a schema-level grasp of what they’ve read.
AIs don’t have perfect recall.
https://www.copyright.gov/ai/Copyright-and-Artificial-Intell...
The average copyright holder would like you to think that the law only allows use of their works in ways that they specifically permit, i.e. that which is not explicitly permitted is forbidden.
But the law is largely the reverse; it only denies use of copyrighted works in certain ways. That which is not specifically forbidden is permitted.
And for the future, here's one heuristic: if there is a profound violation of the law anywhere that (relatively speaking) is ignored or severely downplayed, it is likely that interested parties have arrived at an understanding. Or in other words, a conspiracy.
[1] There are tons of legal arguments on both sides, but for me it is enough to ask: if this is not illegal and is totally fair use (maybe even because, oh no, look at what China's doing, etc.), why did they have to resort to and foster piracy in order to obtain this?
European here, but why do you think this is so clear-cut? There are other jurisdictions where training on copyrighted data has already been allowed by statute or case law (Germany and Japan). Why do you need a conspiracy in the US?
AFAICT, US copyright law deals with direct reproductions of a copyrighted piece of content (and also carves out some leeway for direct reproduction, like fair use). I think we can all agree by now that LLMs don't fully reproduce "letter perfect" content, right? What then is the "spirit" of the law that you think was broken here? Isn't this the definition of "transformative work"?
Also of note is the other big case involving books: Google processed mountains of books, was sued, and was allowed to continue. How is scanning and indexing tons of books different from scanning and "training" an LLM?
Contrast that with AI companies:
They don't necessarily want to assert fair use, the results aren't necessarily publicly accessible, the work used isn't cited, users aren't directed to typical sales channels, and many common usages do meaningfully reduce the market for the original content (e.g. AI summaries for paywalled pages).
It's not obvious to me as a non-lawyer that these situations are analogous, even if there's some superficial similarity.
https://drewdevault.com/2020/08/24/Alice-in-Wonderland.html
https://drewdevault.com/2021/12/23/Sustainable-creativity-po...
If an artist produces a work, they should have the rights to that work. If I self-publish a novel and then Penguin decides that novel is really good and they want to publish it, without copyright they'd just do that, totally swamping me with their clout and punishing me for ever putting the work out. That's a bad thing.
That would indeed be nice, but as the article says, that's usually not the case. The rights holder and the author are almost never the same entity in commercial artistic endeavors. I know I'm not the rights holder for my erroneously-considered-art work (software).
> If I self-publish a novel and then Penguin decides that novel is really good and they want to publish it, without copyright they'd just do that, totally swamping me with their clout and punishing me for ever putting the work out. That's a bad thing.
Why? You created influential art and its influence was spread. Is that not the point of (good) art?
There's definitely problems with corporatization of ownership of these things, I won't disagree.
> Why? You created influential art and its influence was spread. Is that not the point of (good) art?
Why do we expect artists to be selfless? Do you think Stephen King is still writing only because he loves the art? You don't simply make software because you love it, right? Should people not be able to make money off their effort?
In our current society, that means they need some means of making money from their work. Copyright, at least in theory, exists to incentivize the creation of art by protecting an artist's ability to monetize it.
If you abolish copyright today, under our current economic framework, what will happen is that people create less art, because it goes from a (semi-)viable career to being completely worthless to pursue. It's simply not a feasible option unless you fundamentally restructure society (which is a different argument entirely).
The thing that'd set apart these companies are the services + quality of their work.
There are two reasons why it's a problem. The first reason is that any such abstraction is leaky, and those leaks are ripe for abuse. For example, in case of copyright on information, we made it behave like physical property for the consumers, but not for the producers (who still only need to expend resources to create a single work from scratch, and then duplicate it for free while still selling each copy for $$$). This means that selling information is much more lucrative than selling physical things, which is a big reason why our economy is so distorted towards the former now - just look at what the most profitable corporations on the market do.
The second reason is that it artificially entrenches capitalism by enmeshing large parts of the economy into those mechanics, even if they aren't naturally a good fit. This then gets used as an argument to prop up the whole arrangement - "we can't change this, it would break too much!".
And that's not even touching the spurious lawsuits about musical similarity. That's what musicians call a genre...
It makes some sense for a very short term literal right to reproduction of a singular work, but any time the concept of derivative works comes into play, it's just a bizarrely dystopian suppression of art, under the supposition that art is commercial activity rather than an innate part of humanity.
I mean, owning an idea is kinda gross, I agree. I also personally think that owning land is kinda gross. But we live in a capitalist society right now. If we allow AI companies to train LLMs on copyrighted works without paying for that access, we are choosing to reward these companies instead of the humans who created the data upon which these companies are utterly reliant for said LLMs. Sam Altman, Elon Musk, and all the other tech CEOs will benefit in place of all of the artists I love and admire.
That, to me, sucks.
This is addressed in the second article I linked.
Consider how many books exist on how to care for trees. Each one of them has similar ideas, but the way those ideas are expressed differ. Copyright protects the content of the book; it doesn’t protect the ideas of how to care for trees.
Assuming you agree with the idea of inheritance, which is another topic, it is unfair to deny inheritance of intellectual property. For example, if your father built a house, it will be yours when he dies; it won't become public property. So why would a book your father wrote just before he died become public domain the moment he dies? It is unfair to those who are doing intellectual work, especially older people.
If you want short copyright, it would make more sense to make it 20 years, human or corporate, like patents.
Comparing intellectual property to real or physical property makes no sense. Intellectual property is different because it is non-exclusive. If you are living in your father’s house, no one else can be living there. If I am reading your father's book, that has nothing to do with whether anyone else can read the book.
Copyright is about control. If you know a song and you sing it to yourself, somebody overhears it and starts humming it, they have not deprived you of the ability to still know and sing that song. You can make economic arguments, of deprived profit and financial incentives, and that's fine; I'm not arguing against copyright here (I am not a fan of copyright, it's just not my point at the moment), I'm just saying that inheritance does not naturally apply to copyright, because data and ideas are not scarce, finite goods. They are goods that feasibly everybody in the world can inherit rapidly without lessening the amount that any individual person gets.
If real goods could be freely and easily copied the way data can, we might be having some very interesting debates about the logic and morality of inheriting your parents' house and depriving other people of having a copy.
If we enter a world where anyone can create a new Mario game and thousands of them are released on the public web, it would be impossible for the rights holders to do anything, and going after individuals doing it for fun would be a bad PR move.
Bad PR? The entire copyright enforcement industry has had bad PR pretty much since easy copying enabled grassroots piracy, i.e. since before computers even. It never stopped them. What are you going to do about it? Vote? But all the mainstream parties are on board with the copyright lobby.
* https://www.copyright.gov/ai/Copyright-and-Artificial-Intell...
Yes please.
Delete it for everyone, not just these ridiculous autocrats. It's only helping them in the first place!
1. Criticizes a highly useful technology
2. Matches a potentially-outdated, strict interpretation of copyright law
My opinion: I think using copyrighted data to train models for sure seems classically illegal. Despite that, humans can read a book, get inspiration, and write a new book without being litigated against. When I look at the litany of derivative fantasy novels, it's obvious they're not all fully independent works.
Since AI is and will continue to be so useful and transformative, I think we just need to acknowledge that our laws did not accommodate this use case, and change them.
Humans get sued for this all the time. There is such a thing as, charitably, being too inspired.
https://en.wikipedia.org/wiki/List_of_songs_subject_to_plagi...
Plus, all art is derivative in some sense, it's almost always just a matter of degree.
The hold US companies have on the world will be dead too.
I also suspect that media piracy will be labelled as the only reason we need copyright, an existing agency will be bolstered to address this concern and then twisted into a censorship bureau.
The fatal flaw in your reasoning: machines aren't humans. You can't reason that a machine has rights from the fact a human has them. Otherwise it's murder to recycle a car.
Copyright only comes into play on publication. It's only concerned about publication of the models and publication of works. The machine itself doesn't have agency to publish anything at this point.
abstracting LLMs from their operators and owners, their possible (and probable) ends, and the territories they trample upon is nothing short of eye-popping to me. how utterly negligent and disrespectful of fellow people must one be at heart to give any credence to such arguments?
AI is fine as long as the work it generates is substantially new and transformative. If it breaks and starts spitting out other peoples work verbatim (or nearly verbatim) there is a problem.
Yes, I'm aware that machines aren't people and can't be "inspired", but if the functional results are the same the law should be the same. Vaguely defined ideas like your soul or "inspiration" aren't real. The output is real, measurable, and quantifiable and that's how it should be judged.
I believe cover song licensing is available mechanically; you don't need permission, you just need to follow the procedures including sending the licensing fees to a rights clearing house. Music has a lot of mechanical licenses and clearing houses, as opposed to other categories of works.
That doesn't make piracy legal, even though I get a lot of use out of it.
Also, a person isn't a computer so the "but I can read a book and get inspired" argument is complete nonsense.
What we do know though is that LLMs, similar to humans, do not directly copy information into their "storage". LLMs, like humans, are pretty lossy with their recall.
Compare this to something like a search indexed database, where the recall of information given to it is perfect.
I understand people who create IP of any sort being upset that software might be able to recreate their IP, or stuff adjacent to it, without permission. It could be upsetting. But I don't understand how people jump to "Copyright Violation" for the act of reading, or even downloading in bulk. Copyright controls, and has always controlled, the creation and distribution of a work. Embedded in the very nature of the copyright notice is the concept that the work will be read.
Reading and summarizing have only ever been controlled in Western countries via state-secrets-type acts or, alternatively, non-disclosure agreements between parties. It's just way, way past reality to claim that we have existing laws covering AI training that ingests information. Not only do we not, such rules would seem insane if you substituted the word "human" for "AI" in most of these conversations.
"People should not be allowed to read the book I distributed online if I don't want them to."
"People should not be allowed to write Harry Potter fanfic in my writing style."
"People should not be allowed to get formal art training that involves going to museums and painting copies of famous paintings."
We just will not get to a sensible societal place if the dialogue around these issues sets such a low bar for understanding the mechanics and the societal tradeoffs we've made so far, and for discussing where we might want to go and what would be best.
You're still not gonna be allowed to commercially publish "Hairy Plotter and the Philosophizer's Rock".
the internet demands it.
the people demand free mega upload for everybody, why? because we can (we seem to NOT want to, but that should be a politically solvable problem)
In the meantime, I will continue to dislike copyright regardless of the parties involved.
https://chatgptiseatingtheworld.com/2025/05/12/opinion-why-t...
Pre-publication reports aren't unusual. https://www.federalregister.gov/public-inspection/current
https://www.federalregister.gov/reader-aids/using-federalreg...
> The Federal Register Act requires that the Office of the Federal Register (we) file documents for public inspection at our office in Washington, DC at least one business day before publication in the Federal Register.
You can find lots of people talking about training, and you can find lots (way more) of people talking about AI training being a violation of copyright, but you can't find anyone talking about both.
It's less clear whether taking vast amounts of copyrighted material and using it to generate other things rises to the level of copyright violation or not. It's the kind of thing that people would have prevented if it had occurred to them, by writing terms of use that explicitly forbid it. (Which probably means that the Web becomes a much smaller place.)
Your comment seems to suggest that writers and artists have absolutely no conceivable stake in products derived from their work, and that it's purely a misunderstanding on their part. But I'm both a computer scientist and an artist and I don't see how you could reach that conclusion.
Also Big Tech: We added 300,000,000 users' worth of GTM because we trained on the 10 specific anime movies of Studio Ghibli and are selling their style.
andy99•2h ago
kklisura•2h ago
They acknowledge the issue is before courts:
> These issues are the subject of intense debate. Dozens of lawsuits are pending in the United States, focusing on the application of copyright’s fair use doctrine. Legislators around the world have proposed or enacted laws regarding the use of copyrighted works in AI training, whether to remove barriers or impose restrictions
Why did they write the finding? I assume it's because it's their responsibility:
> Pursuant to the Register of Copyrights’ statutory responsibility to “[c]onduct studies” and “[a]dvise Congress on national and international issues relating to copyright,”...
All excerpts are from https://www.copyright.gov/ai/Copyright-and-Artificial-Intell...
_heimdall•2h ago
Sure, the courts may find it's out of their jurisdiction, but they should act as they see fit and let the courts settle that later.
bgwalter•2h ago
Why couldn't a copyright office advise Congress/the Senate to enact a law that forbids copyrighted material from being used in AI training? This is literally the politicians' job.
9283409232•1h ago