https://authorsguild.org/advocacy/artificial-intelligence/wh...
One is a large volume of pirated content used to train models.
Another is models reproducing copyrighted materials when given prompts.
In other words, there's the input issue and the output issue, and the two are separate.
You hear people say that a trained model can't be a violation because humans can recite poetry, etc. But a transformer model is not human, and, in a philosophically and economically important way, human brains can't be copied and scaled.
Also worth noting that if a person performs a copyrighted work from memory - a poem, a play, a piece of music - that's a copyright violation. "I didn't copy it, I memorized it" isn't the get-out-of-jail-free card some people think it is.
The means of reproduction are immaterial; what matters is whether a specific use is permitted or not. That a reproduction of a work is found to be infringing in one context doesn't mean it is always infringing in all contexts; conversely, that a reproduction is considered fair use doesn't mean all uses of that reproduction will be considered fair.
Perhaps a section on what the differences are might be helpful. For example, what role does style play in a summary? I don't think the Wikipedia summary is in the style of George R. R. Martin.
Not a lawyer, but the answer seems to obviously be that one is a commercial reproduction and the other is not. It seems like it would be a tougher question if the synopsis appeared in a set of Encyclopedia Britannica or something.
AI is clearly reproducing work for commercial purposes, i.e., reselling it in a new format. LLMs are compression engines. If I compress a movie into another format and sell DVDs of it, that's a pretty obvious violation of copyright law. If I publish every 24th frame of a movie in an illustrated book, that's also a clear violation, even if I blur things or change the color scheme.
If I describe to someone, for free, what happened in a movie, I don't see how that's a violation. The premise here seems wrong.
Something else: Even a single condensation sold for profit only creates one new copyright itself. LLMs wash the material so that they can generate endless new copyrighted material that's derivative of the original. Doesn't that obliterate the idea of any copyright at all?
For further reading, see: https://en.wikipedia.org/wiki/Fair_use#U.S._fair_use_factors
First it was Library Genesis and Z-Library, when Meta torrented 70TB of books and then pulled up the ladder behind it. More recently it was Anna's Archive, which Google and others are coming for, plus weird behavior around some other torrent sites. Now Wikipedia is being used as a tool to defend LLMs breaking any semblance of copyright "law" unpunished.
All these actions will have very bad repercussions once the bubble bursts; there will be a lot of explaining to do.
> Judge Stein’s order doesn’t resolve the authors’ claims, not by a long shot. And he was careful to point out that he was only considering the plausibility of the infringement allegation and not any potential fair use defenses. Nonetheless, I think this is a troubling decision that sets the bar on substantial similarity far too low.
Personally, I'm not worried.
OpenAI's outputs are an algorithm compressing text.
A JPEG thumbnail of an image is smaller but, copyright-wise, identical to the original.
An OpenAI summary is a mechanically generated smaller version, so no new creative copyright has a chance to enter in.
Additionally, if human summaries aren't copyright infringement, you could train LLMs on things such as Wikipedia's summaries. They would still be able to output "mechanical" summaries. Are those legal?
Also, there is a fair-use gray area. Unlike Wikipedia, ClosedAI is a for-profit trying to make money from this stuff, and people using the generated text do so for profit.
People usually say contemporary media sucks because of commercial pressures, but those commercial pressures and conditions wouldn't exist without the expansion of copyright.
Yes, giant studios are struggling to introduce new ideas like 1993's Jurassic Park. But that doesn't mean Shane Carruth (of Primer fame) can't. And he could have if Jurassic Park had been released any time between 1790 and 1900.
Our stilted media landscape is directly downstream of prior legislation expanding copyright.
Expanding copyright even further, so that text or art that merely looks stylistically similar to another work counts as infringing, will in the long run give Disney's lawyers the power to punish folks for making content that looks anything like Disney's many, many, many IP assets.
Even though Steamboat Willie has entered the public domain, Disney has been going after folks using the IP: https://mickeyblog.com/2025/07/17/disney-is-suing-a-hong-kon...
The "infringement" in this case was a diamond encrusted Steamboat Willie style Mickey pendant.
Questionable taste aside, I think it's good for society if people are able to make diamond encrusted miniature sculptures of characters from a 1928 movie in 2025. But Disney clearly disagrees.
Disney (and other giant corps) will use every tool in their belt to go after anyone who comes close to their money makers. There has been a long history of tension between artists and media corps. But that's water under the bridge now. AI art is apparently so bad that artists are willing to hand them the keys to their castle.
Nor should they.
Exactly. I always thought it was hilarious that, ever since LLMs and image generators like Stable Diffusion came online a few years ago, HN suddenly seemed to shift from the hacker ethos of moving fast, breaking things, and using whatever you could for your goals, to being an intense copyright hawk, all because computers could now "learn."
This made me wonder about an alternate future timeline where IP law is eventually so broad and media megacorporations are so large that almost any permutation of ideas, concepts or characters could be claimed by one of these companies as theirs, based on some combination of stylistic similarities and using a concept similar to what they have in their endless stash of IP. I wonder what a world like that would look like. Would all expression be suppressed and reduced to the non-law-abiding fringes and the few remaining exceptions? Would the media companies mercifully carve out a thin slice of non-offensive, corporate-friendly, narrow ideas that could be used by anyone, putting them in control of how we express ourselves? Or would IP violation become so common that paying an "IP tax" be completely streamlined and normalized?
The worst thing is that none of this seems like the insane ramblings that it would've probably been several decades ago. Considering the incentives of companies like Disney, IP lawyers and pro-copyright lawmakers, this could be a future we get to after a long while.
throwaway-0001•1h ago
Nonprofit just means there are no dividends for owners; they can still pay themselves huge salaries. So "nonprofit" is actually a very bad name.
It should be called a non-dividend company.
CGamesPlay•52m ago
https://www.cnbc.com/2025/10/28/open-ai-for-profit-microsoft...
hxtk•17m ago
One of them is the purpose or character of the use, including whether the use is of a commercial nature or is for nonprofit educational purposes.
o11c•50m ago
As a reminder, the 4 factors of "fair use" in the United States:
1. the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
2. the nature of the copyrighted work;
3. the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
4. the effect of the use upon the potential market for or value of the copyrighted work.
txrx0000•25m ago
There would be no problem if they open-sourced everything including the model weights. That was their original mission which they have abandoned.