Brian Keene, Abdi Nazemian and Stewart O'Nan said their works were part of a dataset of about 196,640 books that helped train NeMo to simulate ordinary written language, before being taken down in October "due to reported copyright infringement."
They are seeking unspecified damages for people in the United States whose copyrighted works helped train NeMo's so-called large language models in the last three years.
https://www.reuters.com/technology/nvidia-is-sued-by-authors...
1vuio0pswjnm7 • 3w ago
https://storage.courtlistener.com/recap/gov.uscourts.cand.42...
lioeters • 3w ago
> I am on the data strategy team at NVIDIA; we are exploring including Anna's Archive in pre-training data for our LLMs.
> We are figuring out internally whether we are willing to accept the risk of using this data, but would like to speak with your team to get a better understanding of LLM-related work you have done.