That's... refreshingly normal? Surely something most people acting in good faith can get behind.
LLMs are not persons, not even legal ones (which itself is a massive hack causing massive issues such as using corporate finances for political gain).
A human has moral value a text model does not. A human has limitations in both time and memory available, a model of text does not. I don't see why comparisons to humans have any relevance. Just because a human can do something does not mean machines run by corporations should be able to do it en-masse.
The rules of copyright allow humans to do certain things because:
- Learning enriches the human.
- Once a human consumes information, he can't willingly forget it.
- It is impossible to prove how much a human-created intellectual work is based on others.
With LLMs:
- Training (let's not anthropomorphize: lossily-compressing input data by detecting and extracting patterns) enriches only the corporation which owns it.
- It's perfectly possible to create a model based only on content with specific licenses or only public domain.
- It's possible to trace every single output byte to quantifiable influences from every single input byte. It's just not an interesting line of inquiry for the corporations benefiting from the legal gray area.
Surely the person doing so would be responsible for doing so, but are they doing anything wrong?
You're perfectly at liberty to relicense public domain code if you wish.
The only thing you can't do is enforce the new license against people who obtain the code independently - either from the same source you did, or from a different source that doesn't carry your license.
If I use public domain code in a project under a license, the whole work remains under the license, but not the public domain code.
I'm not sure what the hullabaloo is about.
If your license allows others to take the code and redistribute it with extra conditions, your code can be imported into the kernel. AFAIK there are parts of the kernel that are BSD-licensed.
LLM-creation ("training") involves detecting/compressing patterns of the input. Inference generates statistically probable based on similarities of patterns to those found in the "training" input. Computers don't learn or have ideas, they always operate on representations, it's nothing more than any other mechanical transformation. It should not erase copyright any more than synonym substitution.
How can you guarantee that will happen when AI has been trained a world full of multiple licenses and even closed source material without permission of the copyright owners...I confirmed that with several AI's just now.
The whole use it but if it behaves as expected, it’s your fault is a ridiculous stance.
You can’t say “you can do this thing that we know will cause problems that you have no way to mitigate, but if it does we’re not liable”. The infringement was a foreseeable consequence of the policy.
I'm not talking about maintainability or reliability. I'm talking about legal culpability.
Am I being too pedantic if I point out that it is quite possible for code to be compatible with GPL-2.0 and other licenses at the same time? Or is this a term that is well understood?
https://spdx.org/licenses/GPL-2.0-only.html It's a specific GPL license (as opposed to GPL 2.0-later)
LLMs are lossily-compressed models of code and other text (often mass-scraped despite explicit non-consent) which has licenses almost always requiring attribution and very often other conditions. Just a few weeks ago a SOTA model was shown to reproduce non-trivial amounts of licensed code[0].
The idea of intelligence being emergent from compression is nothing new[1]. The trick here is giving up on completeness and accuracy in favor of a more probabilistic output which
1) reproduces patterns and interpolates between patterns of training data while not always being verbatim copies
2) serves as a heuristic when searching the solution-space which is further guided by deterministic tools such as compilers, linters, etc. - the models themselves quite often generate complete nonsense, including making up non-existent syntax in well-known mainstream languages such as C#.
I strongly object to anthropomorphising text transformers (e.g. "Assisted-by"). It encourages magical thinking even among people who understand how the models operate, let alone the general public.
Just like stealing fractional amounts of money[3] should not be legal, violating the licenses of the training data by reusing fractional amounts from each should not be legal either.
[0]: https://news.ycombinator.com/item?id=47356000
[1]: http://prize.hutter1.net/
[2]: https://en.wikipedia.org/wiki/ELIZA_effect
[3]: https://skeptics.stackexchange.com/questions/14925/has-a-pro...
I don't think this is anthropomorphising, especially considering they also include non-LLM tools in that "Assisted-by" section.
We're well past the Turing test now, whether these things are actually sentient or not is of no pragmatic importance if we can't distinguish their output from a sentient creature, especially when it comes to programming.
Humans for humans!
Don't let skynet win!!!
This is essentially like a retail store saying the supplier is responsible for eliminating all traces of THC from their hemp when they know that isn’t a reasonable request to make.
It’s a foreseeable consequence. You don’t get to grant yourself immunity from liability like this.
2. Infringement in closed source code isn’t as likely to be discovered
3. OpenAI and Anthropic enterprise agreements agree to indemnify (pay for damages essentially) companies for copyright issues.
bitwize•1h ago