bpodgursky•1h ago
The use of paywalled scientific articles to train AI is one place where I think we have to just draw the line and say: this has to be allowed, or US AI is simply going to get gutted and replaced by international competitors who have no respect for copyright law.
Sorry but this is just a competitive reality and the content matters A LOT. Sucks that Elsevier gambled badly on the scientific community putting up with overpriced subscriptions forever, but their concerns can't dictate national policy on this.
kmeisthax•54m ago
I agree, but only in the sense that I think any amount of copyright protection for scientific papers is absolutely absurd. The creativity involved in papers is minimal and a good chunk of that research is funded by the government, so paywalling it is criminally unethical.
Also, if we're going to bin the entire concept of copyright, can we at least be equal about it? I'd rather not live in a world where humans labor for the remnants of their culture in the content mines while clankers[0] feast on an endless stream of training data.
[0] Fake racial slur for robots or other AI systems.
zzo38computer•46m ago
I agree. I think that copyright should be abolished entirely, especially for scientific articles (if they are good quality scientific research then I think they would be too important to be copyrighted, in addition to the other stuff you mention), but also for anything else too.
Nevertheless I think there is another thing against the LLM training, which is that the scraping seems to be excessive (although it could be made less excessive; there are many ways to help with that) and I think it requires too much power (although I don't really know a lot about it).
These are two separate issues, though.
jruohonen•3m ago
> I think that copyright should be abolished entirely, especially for scientific articles
You know, it is really the CC-BY style that most science people care about. Same goes for MIT/BSD open source licenses, while GPL, I suppose, is on the side of CC-BY-SA.
bpodgursky•1h ago