If a savant with perfect recall remembers a text and rearranges it to create a marginally new text, he'd be sued for breach of copyright.
Only large corporations get away with it.
> If an AI reads 100 scientific papers and churns out a new one, it is plagiarism.
That is a specific claim that is being directly addressed and pretty clearly qualifies as "good faith".
???
Did you not literally comment the following?
>A new research paper is obviously materially different from "rearranging that text to create a marginally new text".
What did you mean by that, if that's not your claim?
If you draw a Venn Diagram of plagiarism and copyright violations, there's a big intersection. For example: if I take your paper, scratch off your name, make some minor tweaks, and submit it; I'm guilty of both plagiarism and copyright violation.
"To steal ideas from one person is plagiarism; to steal from many is research."
And honestly there is truth to it. Some people (at work, in real life, wherever) might come off as very intelligent, but the moment they say "oh I just read that relevant fact on reddit/twitter/news site 5 minutes ago" you realize they are just like you, repeating relevant information they consumed recently.
Any suits would be based on the degree the marginally new copy was fair use. You wouldn't be able to sue the savant for reading and remembering the text.
Using AI to create marginally new copies of copyrighted work is ALREADY a violation. We don't need a dramatic expansion of copyright law that says that just giving the savant the book to read is a copyright violation.
Plagiarism and copyright are two entirely different things. Plagiarism is about citations and intellectual integrity. Copyright is about protecting economic interests, has nothing to do with intellectual integrity, and isn't resolved by citing the original work. In fact, most of the contexts where you would be accused of plagiarism (reporting, criticism, education, research) are exactly the contexts that make fair use arguments much easier.
More like a speed-reader who retains a schema-level grasp of what they’ve read.
AIs don't have perfect recall.
What about loosely memorizing the gist of a copyrighted text? Is that a breach or fair use? What if a machine does something similar?
This falls under a rather murky area of the law that is not well defined.
https://www.copyright.gov/ai/Copyright-and-Artificial-Intell...
The average copyright holder would like you to think that the law only allows use of their works in ways that they specifically permit, i.e. that which is not explicitly permitted is forbidden.
But the law is largely the reverse; it only denies use of copyright works in certain ways. That which is not specifically forbidden is permitted.
And for the future, here's one heuristic: if there is a profound violation of the law anywhere that (relatively speaking) is ignored or severely downplayed, it is likely that interested parties have arrived at an understanding. Or in other words, a conspiracy.
[1] There are tons of legal arguments on both sides, but for me it is enough to ask: if this is not illegal and is totally fair use (maybe even because, oh no look at what China's doing, etc.), why did they have to resort to & foster piracy in order to obtain this?
European here, but why do you think this is so clear cut? There are other jurisdictions where training on copyrighted data has already been allowed by law/caselaw (Germany and Japan). Why do you need a conspiracy in the US?
AFAICT US copyright law deals with direct reproductions of a copyrighted piece of content (and also carves out some leeway for direct reproduction, like fair use). I think we can all agree by now that LLMs don't fully reproduce "letter perfect" content, right? What then is the "spirit" of the law that you think was broken here? Isn't this the definition of "transformative work"?
Of note is also the other big case involving books: the one where Google was allowed to process mountains of books. They were sued and allowed to continue. How is scanning & indexing tons of books different from scanning & "training" an LLM?
Contrast that with AI companies:
They don't necessarily want to assert fair use, the results aren't necessarily publicly accessible, the work used isn't cited, users aren't directed to typical sales channels, and many common usages do meaningfully reduce the market for the original content (e.g. AI summaries for paywalled pages).
It's not obvious to me as a non-lawyer that these situations are analogous, even if there's some superficial similarity.
To begin with, this very case of Perlmutter getting fired after her office's report is interesting enough, but let's keep it aside. [0]
First, plenty of lobbying has been afoot, pushing DC to allow training on this data to continue. No intention to stop or change course. [1]
Next, when regulatory attempts were in fact made to act against this open theft, those proposed rules were conveniently watered down by Google, Microsoft, Meta, OpenAI and the US government lobbying against the copyright & other provisions. [2]
If you still think, "so what? maybe by strict legal interpretation it's still fair use" -- then explain why OpenAI is selectively signing deals with the likes of Conde Nast if they truly believe this to be the case. [3]
Lastly, when did you last see any US entity or person face no punitive action whatsoever despite illegally downloading (and uploading) millions of books & journal articles; do you remember Aaron Swartz? [4]
You might not agree with my assessment of 'conspiracy', but are you denying there is even an alignment of incentives contrary to the spirit of the law?
[0] https://www.reuters.com/legal/government/trump-fires-head-us...
[1] https://techcrunch.com/2025/03/13/openai-calls-for-u-s-gover...
[2] https://www.euronews.com/next/2025/04/30/big-tech-watered-do...
[3] https://www.reuters.com/technology/openai-signs-deal-with-co...
[4] https://cybernews.com/tech/meta-leeched-82-terabytes-of-pira...
https://drewdevault.com/2020/08/24/Alice-in-Wonderland.html
https://drewdevault.com/2021/12/23/Sustainable-creativity-po...
If an artist produces a work they should have the rights to that work. If I self-publish a novel and then Penguin decides that novel is really good and they want to publish it, without copyright they'd just do that, totally swamping me with their clout and punishing me for ever putting the work out. That's a bad thing.
That would indeed be nice, but as the article says, that's usually not the case. The rights holder and the author are almost never the same entity in commercial artistic endeavors. I know I'm not the rights holder for my erroneously-considered-art work (software).
> If I self-publish a novel and then Penguin decides that novel is really good and they want to publish it, without copyright they'd just do that, totally swamping me with their clout and punishing me for ever putting the work out. That's a bad thing.
Why? You created influential art and its influence was spread. Is that not the point of (good) art?
There are definitely problems with the corporatization of ownership of these things, I won't disagree.
> Why? You created influential art and its influence was spread. Is that not the point of (good) art?
Why do we expect artists to be selfless? Do you think Stephen King is still writing only because he loves the art? You don't simply make software because you love it, right? Should people not be able to make money off their effort?
I can't speak for Stephen but I absolutely do. I program for fun all the time.
> Should people not be able to make money off their effort?
Is anyone arguing otherwise?
Maybe selling books? Maybe other jobs? The same way that they made money for thousands of years before copyright, really. Books and other arts did exist before copyright!
> and why would someone pay them, if their work is free to be copied at will?
I don't think it's really a matter of if people will pay them. If their art is good, of course people will pay them. People feel good about paying for an original piece of art.
The question is really more about if people will be able to get obscenely rich over being the original creator of some piece of art, to which the answer is it would indeed be less likely.
We didn't have modern novelists a thousand years ago. We didn't have mass production until ~500 years ago, and copyright came in in the 1700s. We didn't have mass-produced pulp fiction like we do today until the 20th century. There is little copyright-less historical precedent to refer to here; even if we carve out the few hundred years between the printing press and copyright, it's not as though everyone was mass consuming novels, the literacy rate was abysmal. I wonder what artist yearns for the 1650s.
> If their art is good, of course people will pay them.
You say this as if it were a fact, but that's not axiomatic. Once the first copy is in the wild it's fair game for anyone to copy it as they will. Who is paying them? Should the artists return to the days of needing a wealthy patron? Is patreon the answer to all of our problems?
> Maybe selling books?
But how? To who? A publishing house isn't going to pick them up, knowing that any other publishing house can start selling the same book the minute it proves popular, and if you're self-publishing and starting to make good numbers then the publishing houses can eat you alive.
> The question is really more about if people will be able to get obscenely rich over being the original creator of some piece of art, to which the answer is it would indeed be less likely.
No, the question is if ordinary people could make a living off their novels without copyright. It's very hard today, but not impossible. Without copyright it wouldn't be.
In our current society, that means they need some sort of means to make money from their work. Copyright, at least in theory, exists to incentivize the creation of art by protecting an artist's ability to monetize it.
If you abolish copyright today, under our current economic framework, what will happen is that people create less art because it goes from a (semi-)viable career to just being completely worthless to pursue. It's simply not a feasible option unless you fundamentally restructure society (which is a different argument entirely.)
The second one is the "just solve capitalism and we can abolish copyright entirely" argument which is... a total non-starter. Yes, in an idealized utopia, we don't need capitalism or copyright and people can do things just because they want to and society provides for the artist just because humans all value art just that much. It's a fun utopic ideal, but there's many steps between the current state of the world and "you can abolish the idea of copyright", and we aren't even close to that state yet.
The thing that'd set these companies apart is the services + quality of their work.
There are two reasons why it's a problem. The first reason is that any such abstraction is leaky, and those leaks are ripe for abuse. For example, in case of copyright on information, we made it behave like physical property for the consumers, but not for the producers (who still only need to expend resources to create a single work from scratch, and then duplicate it for free while still selling each copy for $$$). This means that selling information is much more lucrative than selling physical things, which is a big reason why our economy is so distorted towards the former now - just look at what the most profitable corporations on the market do.
The second reason is that it artificially entrenches capitalism by enmeshing large parts of the economy into those mechanics, even if they aren't naturally a good fit. This then gets used as an argument to prop up the whole arrangement - "we can't change this, it would break too much!".
I feel like you're shoving all information under the same label. The most profitable corporations are trading in information that isn't subject to copyright, and it's facts - how you drive, what you eat, where you live. It's newly generated ideas. Maybe it is in how the data is sorted, but they aren't copyrighting that either.
If we're going to overthrow artificial entrenchments of capitalism, I feel like there's better places to start than a lot of copyright. Does it need changes? Absolutely, there's certainly exploitation, but I still don't see "get rid of copyright entirely" as being a good approach. Weirdly, it's one of the places that people are arguing for that. Sometimes the criminal justice system convicts the wrong person, and there should be reform. It's also often criticized as a measure of control for capitalistic oligarchs. Should step one be getting rid of the legal system entirely?
The current illegality of the piracy website prevents them from offering a service as nice as Steam. It has to be a sketchy torrent hub that changes URLs every few months. If it was as easy as changing the url to freesteampowered.com or installing an extension inside the steam launcher, the whole "piracy is a service issue" argument loses all relevance. The industry would become unsustainable without DRM (which would be technically legal to crack, but also more incentivized to make harder to crack).
People would just delete the malware (DRM) out of the source code that is no longer restricted by copyright.
If your argument is that copyright is good because it discourages DRM, I think you have a very evidently weak argument.
Steam is the classic example of how this is effective. You compete with pirates by offering what they can't: a reliable, convenient service. DRM becomes more of a hindrance than a benefit in this situation.
Allowing pirates to offer reliable convenient pirate websites that are "so easy a normie can do it" would be a disaster for all the creative industries. You would need to radically change the rest of society to prevent a total collapse of people making money off art.
And that's not even touching the spurious lawsuits about musical similarity. That's what musicians call a genre...
It makes some sense for a very short term literal right to reproduction of a singular work, but any time the concept of derivative works comes into play, it's just a bizarrely dystopian suppression of art, under the supposition that art is commercial activity rather than an innate part of humanity.
I mean, owning an idea is kinda gross, I agree. I also personally think that owning land is kinda gross. But we live in a capitalist society right now. If we allow AI companies to train LLMs on copyrighted works without paying for that access, we are choosing to reward these companies instead of the humans who created the data upon which these companies are utterly reliant for said LLMs. Sam Altman, Elon Musk, and all the other tech CEOs will benefit in place of all of the artists I love and admire.
That, to me, sucks.
This is addressed in the second article I linked.
There is fair use, but fair use is an affirmative defense to infringing copyright. By claiming fair use you are simultaneously admitting infringement. The idea that you have to defend your own private expression of ideas based on other ideas is still wrong in my view.
This is exactly wrong. You can copy all of Harry Potter into your journal as many times as you want legally (creating copies) so long as you do not distribute it.
"copyright law assigns a set of exclusive rights to authors: to make and sell copies of their works, to create derivative works, and to perform or display their works publicly"
"The owner of a copyright has the exclusive right to do and authorize others to do the following: To reproduce the work in copies or phonorecords;To prepare derivative works based upon the work;"
"Commonly, this involves someone creating or distributing"
https://www.copyright.gov/what-is-copyright/
"U.S. copyright law provides copyright owners with the following exclusive rights: Reproduce the work in copies or phonorecords. Prepare derivative works based upon the work."
https://internationaloffice.berkeley.edu/students/intellectu...
"Copyright infringement occurs when a work is reproduced, distributed, displayed, performed or altered without the creator’s permission."
There are endless legitimate sources for this. Copyright protects many things, not just distribution. It very clearly disallows the unauthorized creation and reproduction of copies of copyrighted works.
> If we allow AI companies to train LLMs on copyrighted works without paying for that access, we are choosing to reward these companies instead of the humans who created the data upon which these companies are utterly reliant for said LLMs.
It's interesting how much parallel there is here to the idea that company owners reap the rewards of their employees' labor while doing no additional work themselves. The fruits of labor should go to the individuals who labor, I 100% agree.
Consider how many books exist on how to care for trees. Each one of them has similar ideas, but the way those ideas are expressed differ. Copyright protects the content of the book; it doesn’t protect the ideas of how to care for trees.
I understand what you're saying, but the way you're framing it isn't what I really have a problem with. I still don't agree with the idea that I can't make my own physical copies of Harry Potter books, identical word for word. I think people can choose to buy the physical books from the original publisher because they want to support them or like the idea that it's the "true" physical copy. And I'm going to push back on that a million times less than the concept of things like Moana comic books. But still, it's infringing copyright for me to make Moana comic books in my own home, in private, never showing them to anyone. And that's ridiculous.
Moana and Moana 2 are both animated movies that have already been made. They're not just figments of one's imagination.
> If I made a Moana comic book, with an entirely original storyline and original art and it was all drawn in my own style and not using 3D assets similar to their movies, that is violating copyright
It might be, or it might not. Copyright protects the creation of derivative works (17 USC 101, 17 USC 103, 17 USC 106), but it's the copyright holder's burden to persuade the court that the allegedly infringing work with the character Moana in it is derivative of their protected work.
Ask yourself the question: what is the value of Moana to you in this hypothetical? What if you used a different name for the character and the character had a different backstory and personality?
> I still don't agree with the idea that I can't make my own physical copies of Harry Potter books
You might think differently if you had sunk thousands of hours into creating a new novel and creative work was your primary form of income.
> But still, it's infringing copyright for me to make Moana comic books in my own home, in private, and never showing them to anyone.
It seems unlikely that Disney would go after you for that. Kids do it all the time.
It’s very unlikely that she would (or even could) have devoted herself to writing fiction in her free time as a passion project without hope of monetary reward and without any way to live from her writing for the ten years it took to finish the Potter series.
And even if she had somehow managed, you’d never hear about it, because without publishers to act as gatekeepers it’d have been lost in the mountains of fanfic and whatever other slop amateur writers upload to the internet.
> its popularity is indicative of its quality, even if it doesn't match the standards of a literature PhD for "good writing"
This is a false dichotomy. Literature PhDs are not the only people out there who enjoy high-quality literature more than light entertainment, and anyway, you seem to be admitting that there's a type of fiction that doesn't exist unpaid, so isn't this just proving my point correct?
All that said, even if I accept for the sake of argument that the existence of popular free genre fiction would be enough to prove your point (because, in fairness to you, we were originally talking about Harry Potter, which is as genre as it gets)... I went looking, and there are at most a few sporadic examples. A few minutes of research suggest that some books by Cory Doctorow are among the most popular ones. Also, The Martian by Andy Weir used to be freely available, but isn't anymore as far as I can find.
Sorry, but Cory Doctorow and (formerly) Andy Weir represent a pretty small body of work compared to the entire canon of paid novels, so I'm going to have to call BS on your claim unless you provide some examples of your own.
Assuming you agree with the idea of inheritance, which is another topic, then it is unfair to deny inheritance of intellectual property. For example, if your father has built a house, it will be yours when he dies; it won't become a public house. So why would a book your father wrote just before he died become public domain the moment he dies? It is unfair to those who are doing intellectual work, especially older people.
If you want short copyright, it would make more sense to make it 20 years, human or corporate, like patents.
Comparing intellectual property to real or physical property makes no sense. Intellectual property is different because it is non-exclusive. If you are living in your father's house, no one else can be living there. If I am reading your father's book, that has nothing to do with whether anyone else can read the book.
If you consider it right to get value from the work of your family, and you consider intellectual work (such as writing a book) to be valuable, then as an inheritor, you should get value from it. And since the way we give value to intellectual work is through copyright, inheritors should inherit copyright.
If you think that copyright should not exceed lifetime, then the logical consequences would be one of:
- inheritance should be abolished
- intellectual work is less valuable than other forms of work
- intellectual property / copyright is not how intellectual work should be rewarded
There are arguments for abolishing inheritance, it is after all one of the greatest sources of inequality. Essentially, it means 100% inheritance tax in addition to all the work going into the public domain. Problematic in practice.
For the value of intellectual work, well, hard to argue against it on Hacker News without being a massive hypocrite.
And there are alternatives to copyright (i.e. artificial scarcity) for compensating intellectual work like there are alternatives to capitalism. Unfortunately, it often turns out poorly in practice. One suggestion is to have some kind of tax that is fairly distributed between authors in exchange for having their work in the public domain. Problem is: define "fairly".
Note that I am not saying that copyright should last long; you can make copyright 20 years, human or corporate, inheritable. Simple, gets works into the public domain sooner, fairer to older authors, already works for patents. Why insist on "lifetime"?
Copyright is about control. If you know a song and you sing it to yourself, somebody overhears it and starts humming it, they have not deprived you of the ability to still know and sing that song. You can make economic arguments, of deprived profit and financial incentives, and that's fine; I'm not arguing against copyright here (I am not a fan of copyright, it's just not my point at the moment), I'm just saying that inheritance does not naturally apply to copyright, because data and ideas are not scarce, finite goods. They are goods that feasibly everybody in the world can inherit rapidly without lessening the amount that any individual person gets.
If real goods could be freely and easily copied the way data can, we might be having some very interesting debates about the logic and morality of inheriting your parents' house and depriving other people of having a copy.
If we enter a world where anyone can create a new Mario game and there are thousands of them released on the public web, it would be impossible for the rights holders to do anything, and it would be a bad PR move to go after individuals doing it for fun.
Bad PR? The entire copyright enforcement industry has had bad PR pretty much since easy copying enabled grassroots piracy - i.e. since before computers even. It never stopped them. What are you going to do about it? Vote? But all the mainstream parties are onboard with the copyright lobby.
The fact that copyright law is easy to violate and hard to enforce doesn't stop Nintendo from burning millions of dollars on legal fees to engage in life-ruining enforcement actions against randos making fangames.
"Democratization" with respect to copyright law would be changing the law to put Mario in the public domain, either by:
- Reducing term lengths to make Mario literally public domain. It's unclear whether or not such an act would survive the Takings Clause of the US Constitution. Perhaps you could get around that by just saying you can't enforce copyrights older than 20 years even though they nominally exist. Which brings us to...
- Adding legal exceptions to copyright to protect fans making fan games. Unlikely, since in the US we have common law, which means our exceptions have to be legislated from the judicial bench, and judges are extremely leery of 'fair use' arguments that basically say 'it is very inconvenient for me to get permission to use the thing'.
- Creating some kind of social copyright system that "just handles" royalty payments. This is probably the most literal interpretation of 'democratize'. I know of few extant systems for this, though - like, technically ASCAP is this, but NOBODY would ever hold up ASCAP as an example of how to do licensing right. Furthermore without legal backing, Nintendo can just hold out and retain traditional "my way or the highway" licensing rights.
- Outright abolishing copyright and telling artists to fend for themselves. This is the kind of solution that would herald either a total system collapse or extreme authoritarianism. It's like the local furniture guy selling sofas at 99% off because the Mafia is liquidating his gambling debts. Sure, I like free shit, but I also know that furniture guy is getting a pair of cement shoes tonight.
None of these are what AI companies talk about. Adding an exception just for AI training isn't democratizing IP, because you can't democratize AI training. AI is hideously memory-hungry and the accelerators you need to make it work are also expensive. I'm not even factoring in the power budget. They want to replace IP with something worse. The world they want is one where there are three to five foundation models, all owned and controlled by huge tech megacorps, and anyone who doesn't agree with them gets cut off.
* https://www.copyright.gov/ai/Copyright-and-Artificial-Intell...
Yes please.
Delete it for everyone, not just these ridiculous autocrats. It's only helping them in the first place!
1. Criticizes a highly useful technology
2. Matches a potentially-outdated, strict interpretation of copyright law
My opinion: I think using copyrighted data to train models for sure seems classically illegal. Despite that, humans can read a book, get inspiration, and write a new book without being litigated against. When I look at the litany of derivative fantasy novels, it's obvious they're not all fully independent works.
Since AI is and will continue to be so useful and transformative, I think we just need to acknowledge that our laws did not accommodate this use-case, and then we should change them.
Humans get litigated against for this all the time. There is such a thing as, charitably, being too inspired.
https://en.wikipedia.org/wiki/List_of_songs_subject_to_plagi...
Plus, all art is derivative in some sense, it's almost always just a matter of degree.
Yes, that's why we judge on a case by case basis. The line is blurry.
I think when you're storing copies of such assets in your database that you're well past the line, though.
The hold US companies have on the world will be dead too.
I also suspect that media piracy will be labelled as the only reason we need copyright, an existing agency will be bolstered to address this concern and then twisted into a censorship bureau.
AI is fine as long as the work it generates is substantially new and transformative. If it breaks and starts spitting out other peoples work verbatim (or nearly verbatim) there is a problem.
Yes, I'm aware that machines aren't people and can't be "inspired", but if the functional results are the same the law should be the same. Vaguely defined ideas like your soul or "inspiration" aren't real. The output is real, measurable, and quantifiable and that's how it should be judged.
I believe cover song licensing is available mechanically; you don't need permission, you just need to follow the procedures including sending the licensing fees to a rights clearing house. Music has a lot of mechanical licenses and clearing houses, as opposed to other categories of works.
Those procedures are how you ask for permission. As you say, it usually involves a fee but doesn't have to.
Compulsory licenses are interesting, aren't they? It just feels wrong. If Metallica doesn't want me to butcher their songs, why should they be forced to allow it?
As a consumer, it would be amazing if there were compulsory licenses for film and TV; then we wouldn't have to subscribe to 70 different services to get to the things we want to see. And there would likely be services that spring up to redistribute media where the rightsholders aren't able to or don't care to; it might be pulled from VHS that fans recorded off of TV in the old days, but at least it'd be something.
Society doesn't need to measure my mind, they need to measure the output of it. If I behave like a conscious being, I am a conscious being. Alternatively you might phrase it such that "Anything that claims to be conscious must be assumed to be conscious."
It's the only answer to the p-zombie problem that makes sense. None of this is new, philosophers have been debating it for ages. See: https://en.wikipedia.org/wiki/Philosophical_zombie
However, for copyright purposes we can make it even simpler. If the work is new, it's not covered by the original copyright. If it is substantially the same, it is. Forget the arguments about the ghost in the machine and the philosophical mumbo-jumbo. It's the output that matters.
Your radical behaviourism seems an advantage to you when you want to delete one disfavoured part of copyright law, but I assure you, it isn't in your interest. It doesn't universalise well at all. You do not want to be defined by how you happen to verbalise anything, unmoored from your intention, goals, and so on.
The law, and society, imparts much to you that is never measured and much that is unmeasurable. What can be measured is, at least, extremely ambiguous with respect to those mental states which are being attributed. Because we do not attribute mental states by what people say -- this plays very little role (consider what a mess this would make of watching movies). And none of course in the large number of animals which share relevant mental states.
Nothing of relevance is measured by an LLM's output. It is highly unambiguous: the LLM has no mental states, and thus is irrelevant to the law, morality, society and everything else.
It's an obscene sort of self-injury to assume that whatever kind of radical behaviourism is necessary to hype the LLM is the right sort. Hype for LLMs does not lead to a credible theory of minds.
I don't mean to say that they literally have to speak the words by using their meat to make the air vibrate. Just that, presuming it has some physical means, it be capable (and willing) to express it in some way.
> It's an obscene sort of self-injury to assume that whatever kind of radical behaviourism is necessary to hype the LLM is the right sort.
I appreciate why you might feel that way. However, I feel it's far worse to pretend we have some undetectable magic within us that allows us to perceive the "realness" of other people's consciousness by other than physical means.
Fundamentally, you seem to be arguing that something with outputs identical to a human is not human (or even human like), and should not be viewed within the same framework. Do you see how dangerous an idea that is? It is only a short hop from "Humans are different than robots, because of subjective magic" to "Humans are different than <insert race you don't like>, because of subjective magic."
Why is that? Seems all logic gets thrown out the window when invoking AI around here. References are given. If the user publishes the output without attribution, NOW you have a problem. People are being so rabid and unreasonable here. Totally bat shit.
I didn't mean to imply that the AI can't quote Shakespeare in context, just that it shouldn't try to pass off Shakespeare as its own or plagiarize huge swathes of the source text.
> People are being so rabid and unreasonable here.
People here are more reasonable than average. Wait until mainstream society starts to really feel the impact of all this.
That doesn't make piracy legal, even though I get a lot of use out of it.
Also, a person isn't a computer so the "but I can read a book and get inspired" argument is complete nonsense.
What we do know though is that LLMs, similar to humans, do not directly copy information into their "storage". LLMs, like humans, are pretty lossy with their recall.
Compare this to something like a search indexed database, where the recall of information given to it is perfect.
Another way would be to train an internal model directly on published works, use that model to generate a sanitized corpus of rewritten/reformatted data about the works still under copyright, then use the sanitized corpus to train a final model. For example, the sanitized corpus might describe the Harry Potter books in minute detail but not contain a single sentence taken from the originals. Models trained that way wouldn't be able to reproduce excerpts from Harry Potter books even if the models were distributed as open weights.
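To make that concrete, here's a minimal sketch of the two-stage pipeline in Python. Everything in it is a placeholder (train_model, paraphrase, the toy corpus are made up for illustration); it only shows the data flow, not any real training code:

    # Hypothetical two-stage pipeline: the released model never sees the originals.
    def train_model(corpus):
        # Stand-in for an actual training run; just records what it was given.
        return {"training_data": list(corpus)}

    def paraphrase(model, work):
        # Stand-in for prompting the internal model for a detailed description
        # of the work that copies no sentence verbatim.
        return "detailed, non-verbatim description of: " + work[:20]

    originals = ["full text of copyrighted book A", "full text of copyrighted book B"]

    # Stage 1: an internal model is trained directly on the original works.
    internal_model = train_model(originals)

    # Stage 2: that model generates a sanitized corpus with no original sentences.
    sanitized = [paraphrase(internal_model, w) for w in originals]

    # Stage 3: only the sanitized corpus trains the model that actually ships,
    # so even open weights couldn't regurgitate excerpts of the originals.
    released_model = train_model(sanitized)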
I understand people who create IP of any sort being upset that software might be able to recreate their IP, or stuff adjacent to it, without permission. It could be upsetting. But I don't understand how people jump to "Copyright Violation" for the act of reading, or even downloading in bulk. Copyright controls, and has always controlled, the creation and distribution of a work. Even the nature of the copyright notice embeds the assumption that the work will be read.
Reading and summarizing have only ever been controlled in western countries via State's secrets type acts, or alternately, non-disclosure agreements between parties. It's just way, way past reality to claim that we have existing laws to cover AI training ingesting information. Not only do we not, such rules would seem insane if you substitute the word human for "AI" in most of these conversations.
"People should not be allowed to read the book I distributed online if I don't want them to."
"People should not be allowed to write Harry Potter fanfic in my writing style."
"People should not be allowed to get formal art training that involves going to museums and painting copies of famous paintings."
We just will not get to a sensible societal place if the dialogue around these issues has such a low bar for understanding the mechanics and the societal tradeoffs we've made so far, and isn't able to discuss where we might want to go and what would be best.
The article specifically talks about the creation and distribution of a work. Creation and distribution of a work alone is not a copyright violation. However, if genAI takes in input from something you don't own and outputs something, that output could be considered a copyright violation.
Let's make this clear; genAI is not a copyright issue by itself. However, gen AI becomes an issue when you are using as your source stuff you don't have the copyright or license to. So context here is important. If you see people jumping to copyright violation, it's not out of reading alone.
> "People should not be allowed to read the book I distributed online if I don't want them to."
This is already done. It's been done for decades. See any case where content is locked behind an account. Only select people can view the content. The license to use the site limits who or what can use things.
So it's odd you would use "insane" to describe this.
> "People should not be allowed to write Harry Potter fanfic in my writing style."
Yeah, fan fiction is generally not legal. However, there are some cases where fair use covers it. Most cases of fan fiction are allowed because the author allows it. But no, generally, fan fiction is illegal. This is well known in the fan fiction community. Obviously, if you don't distribute it, that's fine. But we aren't talking about non-distribution cases here.
> "People should not be allowed to get formal art training that involves going to museums and painting copies of famous paintings."
Same with fan fiction. If you replicate a copyrighted piece of art, if you distribute it, that's illegal. If you simply do it for practice, that's fine. But no, if you go around replicating a painting and distribute it, that's illegal.
Of course, technically speaking, none of this is what gen AI models are doing.
> We just will not get to a sensible societal place if the dialogue around these issues has such a low bar for understanding the mechanics
I agree. Personifying gen AI is useless. We should stick to the technical aspects of what it's doing, rather than trying to pretend it's doing human things when it's 100% not doing that in any capacity. I mean, that's fine for the layman, but anyone with any ounce of technical skill knows that's not true.
Which is a clear failure of the copyright system. Millions of people are expanding our cultural artifacts with their own additions, but all of it is illegal, because they haven't waited another 100 years.
People are interested in these pieces of culture, but they're not going to remain interested in them forever. At least not interested enough to make their own contributions.
Absolute horse shit. I can start a 1-900 answer line and use any reference I want to answer your question.
I agree, what followed was.
> I can start a 1-900 answer line and use any reference I want to answer your question
Yeah, that's not what we are talking about. If you think it was, you should probably do some more research on the topic.
My proposal is that it's a Luddite kneejerk reaction to things people don't understand and don't like. They sense and fear change. For instance, here you say it's an issue when AI uses something as a source that you don't have copyright to. Allow me to update your sentence: "Every paper every scientist or academic wrote that references any copyrighted work becomes an issue." What you said just isn't true. Copyright refers to the right to copy a work.
Distribution: Sure. License your content however you want. That said, in the US a license prohibiting you from READING something just wouldn't be possible. You can limit distribution, copying, etc. This is how journalists can write about sneak previews or leaked information or misfiled court documents released when they should be under seal. The leaking (i.e. the distribution) might violate a contract or a license, but the reading thereof is really not something US law or common law thinks it has a right to control, except in the case of the state classifying secrets. As well, here we have people saying "my song in 1983 that I put out on the radio, I don't want AI listening to that song." Did your license in 1983 prohibit computers from processing your song? Does that mean digital radio can't send it out? Essentially that ship has sailed, full stop, without new legislation.
On my last points, I think you're missing my point. Fan fiction is legal if you're not trying to profit from it. It is almost impossible to perfectly copy a painting, although some people are pretty good at it. I think it's perfectly legal to paint a super close copy of, say, Starry Night, and sell it as "Starry Night by Jason Lotito." In any event, the discourse right now claims it's wrong for AI to look at and learn from paintings and photographs.
Your proposal is moving goal posts.
> Allow me to update your sentence: "Every paper every scientist or academic wrote that references any copyrighted work becomes an issue".
No, I never said that. Fair Use exists.
> Fan fiction is legal if you're not trying to profit from it.
No, it's not.[1] You can make arguments that it should be, but, no.
[1] https://jipel.law.nyu.edu/is-fanfiction-legal/
> I think you're missing my point
I think you got called out, and you are now trying to reframe your original comment so it comes across as having accounted for the things you were called out on.
You think you know what you are talking about, but you don't. And you rely on that misplaced confidence to argue the way you do.
Of course, if you start your thought by dismissing anybody who doesn't share your position as not sane, it's easy to see how you could fail to capture any of that.
[1] https://arstechnica.com/tech-policy/2025/05/judge-on-metas-a...
You're still not gonna be allowed to commercially publish "Hairy Plotter and the Philosophizer's Rock".
The law is supposed to be impartial. So if the answer is different, then it's not really a law problem we're talking about.
[0] https://en.wikipedia.org/wiki/Dowling_v._United_States_(1985...
I agree. If you can pay the judge, the congress or the president, it is definitely not stealing. It is (the best) democracy (money can buy). /s
If we can agree that the taking away of your time is theft (wage theft, to be precise), then we who rely on intellect in our careers should be able to agree that the taking of our ideas is also theft.
>moved to the Ninth Circuit Court of Appeals, where he argued that the goods he was distributing were not "stolen, converted or taken by fraud", according to the language of 18 U.S.C. 2314 - the interstate transportation statute under which he was convicted. The court disagreed, affirming the original decision and upholding the conviction. Dowling then took the case to the Supreme Court, which sided with his argument and reversed the convictions.
This just tells me that the definition is highly contentious. Having the Supreme Court reverse a federal appellate ruling already shows misalignment.
If we end up saying it is not illegal, then I demand, that it will not be illegal for everyone. No double standards please. Let us all launder copyrighted material this way, labeling it "AI".
Processing IP without a license AND offering it as a model for money doesn't seem like an unknown use-case to me.
Corporations are not humans. (It's ridiculous that they have some legal protections in the US like humans, but that's a different issue). AI is also not human. AI is also not a chipmunk.
Why the comparison?
Huh? If you agree that "learning from copyrighted works to make new ones" has traditionally not been considered infringement, then can you elaborate on why you think it fundamentally changes when you do it with bots? That would, if anything, seem to be a reversal of classic copyright jurisprudence. Up until 2022, pretty much everyone agreed that "learning from copyrighted works to make new ones" is exactly how it's supposed to work, and would be horrified at the idea of having to separately license that.
Sure, some fundamental dynamic might change when you do it with bots, but you need to make that case in an enforceable, operationalized way.
Go back to the roots of copyright and the answers should be obvious. According to the US constitution, copyright exists "To promote the Progress of Science and useful Arts" and according to the EU, "Copyright ensures that authors, composers, artists, film makers and other creators receive recognition, payment and protection for their works. It rewards creativity and stimulates investment in the creative sector."
If I publish a book and tech companies are allowed to copy it, use it for "training", and later regurgitate the knowledge contained within to their customers then those people have no reason to buy my book. It is a market substitute even though it might not be considered such under our current copyright law. If that is allowed to happen then investment will stop and these books simply won't get written anymore.
Humans are also very useful and transformative.
As a private person I no longer feel incentivised to create new content online because I think that all I create will eventually be stolen from me...
the internet demands it.
The people demand free Megaupload for everybody. Why? Because we can (we seem to NOT want to, but that should be a politically solvable problem).
In the meantime, I will continue to dislike copyright regardless of the parties involved.
Either force AI companies to compensate the artists they're being "inspired" by, or let people torrent a copywashed Toy Story 5.
IP maximalism is requiring DRM tech in every computer and media-capable device that won't play anything without checking into a central server and also making it illegal to reverse or break that DRM. IP maximalism is extending the current bonkers time interval of copyright (over 100 years) to forever. If AI concerns manage to get this down to a reasonable, modern timeframe it'll be awesome.
Record companies in the 90s tied the noose around their own necks, which is just as well because they're very useless now except for supporting geriatric bands. They should have started selling mp3s for 99 cents in 1997 and maybe they would have made a couple of dollars before their slide into irrelevance.
The specific thing people don't want, which a few weirdos keep pushing, is AI-generated stuff passed off as new creative material. It's fine for fun and games, but no one wants a streaming service of AI-generated music, even if you can't tell it's AI generated. And the minute you think you have that cracked - that an AI can create music/art as good as a human and that humans can't tell, the humans will start making bad music/art in rebellion, and it'll be the cool new thing, and the armies of 10Kw GPUs will be wasting their energy on stuff an 1Mhz 8-bit machine could do in the 80s.
Maybe the government should set up a fund to pay all the copyright holders whose works were used to train the AI models. And if it's a pain to track down the rights holders, I'll play a tiny violin.
I find the shift of some right wing politicians and companies from "TPB and megaupload are criminals and its owners must be extradited from foreign countries!" to "Information wants to be free!" much more illuminating.
Instead of the understanding that copyrights and patents are temporary state-granted monopolies meant to benefit society they are instead framed as real perpetual property rights. This framing fuels support for draconian laws and obscures the real purpose of these laws: to promote innovation and knowledge sharing and not to create eternal corporate fiefdoms.
The general public has been lectured for decades about how piracy is morally wrong, but as soon as startups and corporations are in it for profit, everybody looks away?
As for the zeitgeist, I'm not sure anything has materially changed. Recently, creators have been very upset over Silicon Valley AI companies ingesting their output. Is this really reflective of "general internet sentiment"? Would those same people have supported abolition of copyright in the past? I doubt it.
https://chatgptiseatingtheworld.com/2025/05/12/opinion-why-t...
Pre-publication reports aren't unusual. https://www.federalregister.gov/public-inspection/current
https://www.federalregister.gov/reader-aids/using-federalreg...
> The Federal Register Act requires that the Office of the Federal Register (we) file documents for public inspection at our office in Washington, DC at least one business day before publication in the Federal Register.
You can find lots of people talking about training, and you can find lots (way more) of people talking about AI training being a violation of copyright, but you can't find anyone talking about both.
Edit: Let me just clarify that I am talking about training, not inference (output).
While AIs don't reproduce things verbatim like pirates, I can see how they really undermine the market, especially for non-fiction books. If people can get the facts without buying the original book, there's much less incentive for the original author to do the hard research and writing.
It's less clear whether taking vast amounts of copyrighted material and using it to generate other things rises to the level of copyright violation or not. It's the kind of thing that people would have prevented if it had occurred to them, by writing terms of use that explicitly forbid it. (Which probably means that the Web becomes a much smaller place.)
Your comment seems to suggest that writers and artists have absolutely no conceivable stake in products derived from their work, and that it's purely a misunderstanding on their part. But I'm both a computer scientist and an artist and I don't see how you could reach that conclusion. If my work is not relevant then leave it out.
Is that a problem with the tool, or the person using it? A photocopier can copy an entire book verbatim. Should that be illegal? Or is it the problem that the "training" process can produce a model that has the ability to reproduce copyrighted work? If so, what implication does that hold for human learning? Many people can recite an entire song's lyrics from scratch, and reproducing an entire song's lyrics verbatim is probably enough to be considered copyright infringement. Does that mean the process of a human listening to music counts as copyright infringement?
If I were to take an image, and compress it or encrypt it, and then show you the data file, you would not be able to see the original copyrighted material anywhere in the data.
But if you had the right computer program, you could use it to regenerate the original image flawlessly.
I think most people would easily agree that distributing the encrypted file without permission is still a distribution of a copyrighted work and against the law.
What if you used _lossy_ compression, and can merely reproduce a poor quality JPEG of the original image? I think still copyright infringement, right?
Would it matter if you distributed it with an executable that only rendered the image non-deterministically? Maybe one out of 10 times? Or if the command to reproduce it was undocumented?
Okay, so now we have AI. We can ignore the algorithm entirely and how it works, because it's not relevant. There is a large amount of data that it operates on, the weights of the model and so on. You _can_, with the correct prompts, sometimes generate a copy of a copyrighted work, to some degree of fidelity or another.
I do not think it is meaningfully different from the simpler example, just with a lot of extra steps.
I think, legally, it's pretty clear that it is illegally distributing copyrighted material without permission. I think calling it an "ai" just needlessly anthropomorphizes everything. It's a computer program that distributes copyrighted work without permission. It doesn't matter if it's the primary purpose or not.
I think probably there needs to be some kind of new law to fix this situation, but under the current law as it exists, it seems to me to be clearly illegal.
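For what it's worth, the lossy-compression analogy above is easy to demo (a rough sketch using Pillow; the file names are just placeholders):

    # The stored bytes look nothing like the original bitmap, yet the right
    # program (a JPEG decoder) regenerates a recognizable, degraded copy.
    from PIL import Image

    original = Image.open("original.png").convert("RGB")  # placeholder input file
    original.save("lossy.jpg", quality=10)                # heavy lossy compression

    with open("lossy.jpg", "rb") as f:
        print(f.read(16))   # opaque-looking bytes, no visible trace of the image

    Image.open("lossy.jpg").show()  # decoding reproduces a poor-quality copy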
Suppose we accept all of the above. What does that hold for human learning?
I'm not talking about learning. I'm talking about the complete reproduction of a copyrighted work. It doesn't matter how it happens.
In that case I don't think there's anything controversial here? Nobody thinks that if you ask AI to reproduce something verbatim, that you should get a pass because it's AI. All the controversy in this thread seems to be around the training process and whether that breaks copyright laws.
LLMs seek to be a for-profit replacement for a variety of paid sources. They say "hey, you can get the same thing as Service X for less money with us!"
That's a problem, regardless of how you go about it. It's probably fine if I watch a movie with my friends, who cares. But distributing it over the internet for free is a different issue.
>LLMs seek to be a for-profit replacement for a variety of paid sources. They say "hey, you can get the same thing as Service X for less money with us!"
What's an LLM supposed to be a substitute for? Are people using them to generate entire books or news articles, rather than buying a book or an issue of the new york times? Same goes for movies. No one is substituting marvel movies with sora video.
Yes.
> No one is substituting marvel movies with sora video.
Yeah because sora kind of sucks. It's great technology, but turns out text is just a little bit easier to generate than 3D videos.
Once sora gets good, you bet your ass they will.
Like Napster et al, their data sets make copies of hundreds of GB of copyrighted works without authors' permission. Ex: The Pile, Common Crawl, RefinedWeb, GitHub pages. Many copyrighted works on the Internet also have strict terms of use. Some have copyright licenses that say personal use only or non-commercial use.
So, like many prior cases, just posting what isn't yours on Hugging Face is already infringement. Copying it from HF to your training cluster is also infringement. It's already illegal until we get laws like Singapore's that allow training on copyrighted works. Even those have a weakness in the access requirement, which might require following terms of use or licenses in the sources.
Only safe routes are public domain, permissive code, and explicit licenses from copyright holders (or those with sub-license permissions).
So, what do you think about the argument that making copies of copyrighted works violates copyright law? That these data sets are themselves copyright violations?
AI is capable of reproducing copyrighted works (motte), therefore training on copyrighted works is illegal (bailey).
Humans are capable of reproducing copyrighted works illegally, but we allow them to train on copyrighted material legally.
Perhaps measures should be taken to prevent illegal reproduction, but if that's impossible, or too onerous, there should be utilitarian considerations.
Then the crux becomes a debate over utility, which often becomes a religious debate.
Those extra steps are meaningfully different. In your description, a casual observer could compare the two JPEGs and recognize the inferior copy. However, AI has become so advanced that such detection is becoming impossible. It is clearly voodoo.
You can try and argue that a compression algorithm is some kind of copy of the training data, but that’s an untested legal theory.
Most artists can readily violate copyright; that doesn't mean we block them from seeing copyrighted works.
This can only be referring to training; the models themselves are a rounding error in size compared to their training sets.
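As a rough back-of-the-envelope illustration (approximate, illustrative numbers): a 70-billion-parameter model stored as 16-bit weights is about 140 GB, while a training set of roughly 15 trillion tokens is on the order of tens of terabytes of raw text, so the weights amount to well under 1% of what they were trained on.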
However, when an LLM does the same, people now want it to be illegal. It seems pretty straightforward to apply existing copyright law to LLMs in the same way we apply it to humans. If the actual text they generate is substantially similar enough to source material that it would constitute a copyright violation had a human produced it, then it should be illegal. Otherwise it should not.
edit: and in fact it's not even whether an LLM reproduces text, it's whether someone subsequently publishes that text. The person publishing that text should be the one taking the legal hit.
The AI companies will likely be arguing that they don’t need a license, so any terms of use in the license are irrelevant.
You can probably find a good number of expert programmer + patent lawyers. And presumably some of those osmose enough copyright knowledge from their coworkers to give a knowledgeable answer.
At the end of the day though, the intersection of both doesn't matter. The lawyers win, so what really matters is who has the pulse on how the Fed Circuit will rule on this
Also in this specific case from the article, it's irrelevant?
"When a model is deployed for purposes such as analysis or research… the outputs are unlikely to substitute for expressive works used in training. But making commercial use of vast troves of copyrighted works to produce expressive content that competes with them in existing markets, especially where this is accomplished through illegal access, goes beyond established fair use boundaries."
There are college kids with bigger "copyright collections" than that...
Disk size is irrelevant. If you lossy-compress a copyrighted bitmap image down to a small JPEG and then sell the JPEG, it's still copyright infringement.
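To make the analogy concrete, here's a minimal sketch (assuming Pillow is installed and a hypothetical local file "original.bmp" exists) showing how aggressive lossy compression shrinks a file to a tiny fraction of its size while the output stays recognizably the same image:

    # Minimal sketch: lossy compression shrinks the file dramatically,
    # yet the result is still recognizably the same copyrighted image.
    # "original.bmp" is a hypothetical local file used for illustration.
    import os
    from PIL import Image

    img = Image.open("original.bmp")
    img.convert("RGB").save("copy.jpg", "JPEG", quality=30)  # aggressive lossy compression

    print(f"bitmap: {os.path.getsize('original.bmp'):,} bytes")
    print(f"jpeg:   {os.path.getsize('copy.jpg'):,} bytes")
    # The JPEG may be only a percent or two of the bitmap's size;
    # disk size alone says nothing about whether the work was copied.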
Some try to make the argument of "but that's what humans do and it's allowed", but that's not a real argument, as it has not been proven, nor is it easy to prove, that machine learning equates to human reasoning. In the absence of evidence, the law assumes NO.
Also Big Tech: We added 300,000,000 users' worth of GTM because we trained on the 10 specific anime movies of Studio Ghibli and are selling their style.
(Raises $10 billion based on estimated worth of the resulting models.)
"We can't share the GPT4 prettaining data or weights because they're trade secrets that generate over a billion in revenue for us."
I'll believe they're worth nothing when (a) nobody is buying AI models or (b) AI companies stop using the copyrighted works to train models they sell. So far, it looks like they're lying about the worth of the training data.
If it's illegal to know the entire contents of a book, then it's arbitrary to what degree you're allowed to codify that knowledge into symbols.
If judges are permitted to rule here it is not about reproduction of commercial goods but about control of humanity's collective understanding.
That is not to say that we shouldn't do the right thing regardless, but I do think there is a feeling of "who is going to rule the world in the future?" that underlies governmental decision-making on how much to regulate AI.
I'm not sure at all what China will do. I find it likely that they'll forbid AI at least for minors so that they do not become less intelligent.
Military applications are another matter that are not really related to these copyright issues.
Yeah..
you tax it if the "chatgpt" is foreign.
what if they route through third countries?
This isn't some new phenomenon. We do indeed seize assets from buyers if the seller stole them.
Even (especially?) the military is a dumpster fire but it's at least very good at doing what it exists to do.
I mean, name 2 things anyone owns that aren't dumpster fires?
A long time ago, industrial engineers used to say, "Even Toyota has recalls."
Something being a dumpster fire is so common nowadays that you really need a better reason to argue in support of a given entity's ownership. (Or even non-ownership for that matter.)
That said, there are plenty of successful government actions across the world, where Europe or Japan probably have a good advantage with solid public services. Think streets, healthcare, energy infrastructure, water infrastructure, rail, ...
The difference here is that we have people like yourself: those who have zero faith in our government and as such act as double agents or saboteurs. When people such as yourself gain power in the legislature, they "starve the beast", meaning they purposefully deconstruct sections of our government so that they have justification for their ideological belief that our government doesn't work.
You guys work backwards. The foregone conclusion is that government programs never work, and then you develop convoluted strategies to prove that.
1. The National Weather Service. Crown jewel and very effective at predicting the weather and forecasting life threatening events.
2. IRS, generally very good at collecting revenue.
3. National Interagency Fire Center / US Forest Service tactical fire suppression
4. NTSB/US Chemicals Safety Board - Both highly regarded.
5. Medicare - Basically clung to with talons by seniors, revealed preference is that they love it.
6. DOE National Labs
7. NIH (spicy pick)
8. Highway System
There are valid critiques of all of these but I don’t think any of them could be universally categorized as a complete dumpster fire.
2) state parks are pretty rad.
Even saying the military is a dumpster fire isn't accurate. The military has led trillions of dollars' worth of extraction for the wealthy and elite across the globe.
In no sane world can you call the ability to protect GLOBAL shipping lanes a failure. That one service alone has probably paid for itself thousands of times.
We aren't even talking about things like public education (high school education used to be privatized and something only the elites enjoyed 100 years ago; yes, public high school education isn't even 100 years old) or libraries or public parks.
---
I really don't understand this "gobermint iz bad" meme you see in tech circles.
I get so much more out of my taxes compared to equivalent corporate bills that it's laughable.
Government is made up of people, and the last 50 years have mostly been the government giving money and establishing programs for the small cohorts that have been hoarding all the wealth. Somehow this is never framed as a problem with government, however.
I also never understand the arguments from these types, because if you think the government is bad then you should want it to be better. Better mostly means having more money to redistribute and more personnel to run programs, but it's never about these things. It's always attacking the government to make it worse at the expense of the people.
Library of Congress
National Park Service
U.S. Geological Survey (USGS)
NASA
Smithsonian Institution
Centers for Disease Control and Prevention (CDC)
Social Security Administration (SSA)
Federal Aviation Administration (FAA) air traffic control
U.S. Postal Service (USPS)
Nothing. You don't even need the LLC. I don't think anyone got prosecuted for only downloading. All prosecutions were for distribution. Note that if you're torrenting, even if you stop the moment it's finished (and thus never goes to "seeding"), you're still uploading, and would count as distribution for the purposes of copyright law.
>Meta allegedly tried to conceal the seeding by not using Facebook servers while downloading the dataset to "avoid" the "risk" of anyone "tracing back the seeder/downloader" from Facebook servers
Sounds like they used a VPN, set the upload speed to 1kb/s and stopped after the download is done. If the average Joe copied that setup there's 0% chance he'd get sued, so I don't really see a double standard here. If anything, Meta might get additional scrutiny because they're big enough of a target that rights holders will go through the effort of suing them.
Citation needed. RIAA used to just watch torrents and sent cease and desists to everyone who connected, whether for a minute or for months. It was very much a dragnet, and I highly doubt there was any nuance of "but Your Honor, I only seeded 1MB back so it's all good".
Or C) large corporations (and the wealthy) do whatever they want while you still get extortion letters because your kid torrented a movie.
They really do get to have their cake and eat it too, and I don't see any end to it.
"have their cake and eat it too" allegations only work if you're talking about the same entity. The copyright maximalist corporations (ie. publishers) aren't the same as the permissive ones (ie. AI companies). Making such characterizations make as much sense as saying "citizens don't get to eat their cake and eat it too", when referring to the fact that citizens are anti-AI, but freely pirate movies.
Can you link to the exact comments he made? My impression was that he was upset at the fact that they broke the T&C of OpenAI, and DeepSeek's claim of being much cheaper to train than OpenAI didn't factor in the fact that it required OpenAI's model to bootstrap the training process. Neither of those directly contradicts the claim that training is copyright infringement.
Musicians remain subject to abuse by the recording industry; they're making pennies on each dollar you spend on buying CDs^W^W streaming services. I used to say, don't buy that; go to a concert, buy beer, buy merch, support directly. Nowadays live shows are being swallowed whole through exclusivity deals (both for artists and venues). I used to say, support your favourite artist on Bandcamp, Patreon, etc. But most of these new middlemen are ready for their turn to squeeze.
And now on top of all that, these artists' work is being swallowed whole by yet another machine, disregarding what was left of their rights.
What else do you do? Go busking?
In the end this all comes down to needing the people to care enough.
As did Disney, apparently.
>what use is regulation if you can just buy it?
I don't like it either, but it still comes down to the same issues. We vote in people who can be bought and don't make a scandal out of it when it happens. The first step to fixing that corruption is to make congress afraid of being ousted if discovered. With today's communication structure, that's easier than ever.
But if the people don't care, we see the obvious victor.
For national security reasons I'm perfectly fine with giving LLMs unfettered access to various academic publications, scientific and technical information, that sort of thing. I'm a little more on the fence about proprietary code, but I have a hard time believing there isn't enough code out there already for LLMs to ingest.
Otherwise though, what is an LLM with unfettered access to copyrighted material better at than one that merely has unfettered access to scientific/technical information plus licensed copyrighted material? I would suppose that besides maybe being a more creative writer, the former is far more capable of reproducing copyrighted works.
In effect, the LLM with unfettered access is a more capable plagiarism machine than the other, not necessarily more intelligent, and otherwise doesn't really add much value. What do we have to gain from condoning it?
I think the argument I'm making is a little easier to see in the case of image and video models. The model that has unfettered access to copyrighted material is more capable, sure, but more capable of what? Capable of making images? Capable of reproducing Mario and Luigi in an infinite number of funny scenarios? What do we have to gain from that? What reason do we have for not banning such models outright? Not like we're really missing out on any critical security or economic advantages here.
If I'm learning about kinematics maybe it would be more effective to have comparisons to Superman flying faster than a speeding bullet and no amount of dry textbooks and academic papers will make up for the lack of such a comparison.
This is especially relevant when we're talking about science-fiction which has served as the inspiration for many of the leading edge technologies that we use including stuff like LLMs and AI.
A reasonable compromise then is that you can train an AI on Wikipedia, more-or-less. An AI trained this way will have a robust understanding of Superman, enough that it can communicate through metaphor, but it won't have the training data necessary to create a ton of infringing content about Superman (well, it won't be able to create good infringing content anyway. It'll probably have access to a lot of plot summaries but nothing that would help it make a particularly interesting Superman comic or video).
To me it seems like encyclopedias use copyrighted pop culture in a way that constitutes fair use, and so training on them seems fine as long as they consent to it.
Oh really? They didn't have any problem coming after people who installed copyrighted Windows. BSA. But now Microsoft turns a blind eye because it suits them.
It's annoying to see the current pushback against China focusing so much on inconsequential matters with so much nonsense mixed in, because I do think we do need to push back against China on some things.
Interesting, but everyone is mining copyrighted works to train AI models.
https://www.theguardian.com/technology/2012/sep/11/minnesota... [1]
>The RIAA accused her of downloading and distributing more than 1,700 music files on file-sharing site KaZaA
Emphasis mine. I think most people would agree that whatever AI companies are doing with training AI models is different than sending verbatim copies to random people on the internet.
I think most artists who had their works "trained by AI" without compensation would disagree with you.
Yes. An artist's style can and sometimes is their IP.
The US Supreme Court disagrees, the right of publicity and intellectual property law are explicitly linked.
> The broadcast of a performer’s entire act may undercut the economic value of that performance in a manner analogous to the infringement of a copyright or patent. — Justice White
Again, show me an example where an artist's style was used for copyright infringement in court. Can you produce even one example?
All right of publicity laws are intellectual property laws but not all intellectual property laws are right of publicity laws.
All copyright laws are intellectual property laws but not all intellectual property laws are copyright laws.
Right of publicity laws are intellectual property laws because the right of publicity is intellectual property. I don't know how else to articulate this over the internet; maybe it's time to consult an AI?
This article is literally about the copyright office finding AI companies violating copyright law by training their models on copyrighted material. I'm not even sure what you're arguing about anymore.
My opinion on the matter at hand is this: Artists who complain about GenAI use the hypothetical that you mentioned, where if you can accurately recreate a copyrighted work through specific model usage, then any distribution of the model is a copyright violation. That's why, according to the argument, fair use does not apply.
The real problem with that is that there's a mismatch between the fair use analysis and the actual use at issue. The complaining artists want the fair use inquiry to focus on the damage to the potential market for works in their particular style. That's where the harm is, according to them. However, what they use to even get to that stage is the copyright infringement allegation that I described earlier: that the models contain their works in a fixed manner from which they can be derived without permission.
Not to mention that this position puts the malicious use of the models for outright copyright infringement at the output level above the entire class of new works that can be created with them. It's effectively saying "because these models can technically be used in an infringing way, they infringe our copyright, and any creative potential these models could enable is insignificant in comparison to that simple fact. Of course, that's not the actual real problem, which is that they output completely new works that compete with our originals, even when they aren't derivatives of, nor substantially similar to, any individual copyrighted work".
Here's a very good article outlining my position in a more articulate way: https://andymasley.substack.com/p/a-defense-of-ai-art
[1] used purely as an example
Ok, how about training AI on leaked Windows source code ?
She was training RI (real intelligence). Is she now relevant? Or does she have to be rich and pay some senators to be relevant?
Source: https://futurism.com/the-byte/facebook-trained-ai-pirated-bo...
I honestly can't see how this directly addresses fair use; it's an odd, sweeping statement. It implies that inventing something that borrows a little from many different copyrighted items is somehow not fair use. If it were one-for-one, yes, but it's not; it's basically saying creativity is not fair use. And if it's not saying that, and instead refers to competition in the existing market, then it's making a statement about the public good, not fair use. That's basically a matter for legislators and what the purpose of copyright is.
They acknowledge the issue is before courts:
> These issues are the subject of intense debate. Dozens of lawsuits are pending in the United States, focusing on the application of copyright’s fair use doctrine. Legislators around the world have proposed or enacted laws regarding the use of copyrighted works in AI training, whether to remove barriers or impose restrictions
Why did they write the finding? I assume it's because it's their responsibility:
> Pursuant to the Register of Copyrights’ statutory responsibility to “[c]onduct studies” and “[a]dvise Congress on national and international issues relating to copyright,”...
All excerpts are from https://www.copyright.gov/ai/Copyright-and-Artificial-Intell...
Sure, the courts may find it's out of their jurisdiction, but they should act as they see fit and let the courts settle that later.
Why couldn't a copyright office advise Congress/the Senate to enact a law that forbids copyrighted material from being used in AI training? This is literally the politicians' job.