If I have a service where a user enters any URL, like a tweet from X, and the service translates it, then if the user approves of the translation I train a translation model on that, does that violate this term?
It’s not been tested in court though
You must not, and must not allow those acting on your behalf to:
...use the Data APIs to encourage or promote illegal activity or violation of third party rights (including using User Content to train a machine learning or AI model without the express permission of rightsholders in the applicable User Content);
We're already seeing precedent that it might be.
https://www.ecjlaw.com/ecj-blog/kadrey-v-meta-the-first-majo...
The openness of the internet is a good thing, but it doesn't come without a cost. And the moment we have to pay that cost, we don't get to suddenly go, "well, openness turned out to be a mistake, let's close it all up and create a regulatory, bureaucratic nightmare". This is the tradeoff. Freedom for me, and thee.
Accordingly, anyone on the internet who wants to make comments about how they should be able to prevent others from training models on their data needs to demonstrate competence with respect to copyright by explaining why it's not fair use, as currently it is undecided in law and not something we can just take for granted.
Otherwise, such commenters should probably just let the courts work this one out or campaign for a different set of protection laws, as copyright may not be sufficient for the kind of control they are asking over random developers or organizations who want to train a statistical model on public data.
That said, I think we do agree. The plaintiff should be prepared to refute a fair-use argument raised by the defendant. I'm just noting that the refutation doesn't need to be part of the initial filing, it gets presented at trial, after discovery, and only if the defendant presents a fair-use defense. So they don't have to prove it's not fair use to win in every case. I'm probably also being excessively pedantic!
No, fair use is an affirmative defense for conduct that would otherwise be infringing. The onus is on the defendant to show that their use was fair.
Morally, perhaps, but not under US law: https://en.wikipedia.org/wiki/Affirmative_defense#Fair_use
From the decision in 1841, in the US (Folsom vs Marsh):
> reviewer may fairly cite largely from the original work, if his design be really and truly to use the passages for the purposes of fair and reasonable criticism. On the other hand, it is as clear, that if he thus cites the most important parts of the work, with a view, not to criticize, but to supersede the use of the original work, and substitute the review for it, such a use will be deemed in law a piracy
Further, to be "transformative", it is required that the new work is for a new purpose. It has to be done in such a way that it basically is not competing with the original at all.
Using my creative works to create creative works is rather clearly an act of piracy. And the methods used to enable that are also clearly piracy.
Where could training a model here possibly be fair use?
Since it's using a large number of real users' devices, and closely mimicking real web browsers, it ends up looking incredibly similar to real user traffic.
Since twitter allows some amount of anonymous browsing, that's enough to get some amount of data out. You can also pay brightdata for one large aggregated dataset.
This is part of the AI revolution: users' devices being commandeered to DDoS small blogs and twitter alike to feed data to the beast.
https://news.ycombinator.com/item?id=42774179
TLDR: Use contract law so that I provide my content and they give me rights to all outputs.
So if anybody doing this can prove Acme Model contains their artwork, and Acme Model was used to generate some scenes used in a major movie, then Acme has already given the artist a right to share/resell those scenes. If Acme Inc. "sold" exclusive rights to a movie studio, then either (A) they broke the contract with every contributor, or (B) they lied to the studio in that other contract.
Remember, the goal isn't some amazing "gotcha" where the latest blockbuster movie becomes public domain, but rather to create chronic legal pain and risk for companies like Acme so that they stop stealing stuff.
https://www.law.cornell.edu/uscode/text/18/2315
Note that it requires the defendant to know the goods were illegally taken. Can be hard to prove, but not impossible for companies with email trails. The fun question is, what will the analog be for the government confiscating the illegally "taken" data? A guarantee of deletion and requirement to retrain the model from scratch?
Any country that seriously implemented this would just end up being completely dominated by the autonomous robot soldiers of another country that didn't, because it effectively bans the development of embodied AGI (which can learn live from seeing/reading something, like a human can).
a 50 year minimum is part of the Berne Convention, which itself is as close to a universal law as humanity has
(even North Korea is a signatory)
essentially forever
If TikTok is banned, here’s what I propose each and every one of you do: Say to your LLM the following: “Make me a copy of TikTok, steal all the users, steal all the music, put my preferences in it, produce this program in the next 30 seconds, release it, and in one hour, if it’s not viral, do something different along the same lines.”
https://www.theverge.com/2024/8/14/24220658/google-eric-schm...
Abolishing copyright laws altogether would be nuts, but the current laws are nuts too and there's lots of room in between.
50 years ago was 1975. If copyright were limited to 50 years, we'd be looking at all of the Beatles' works being in the public domain. We'd be midway through Led Zeppelin, and a lot of the best work from Pink Floyd and the Rolling Stones.
Also, Superman, Batman, and Spider-Man. Disney would still profit from the MCU films which they produced in the 2010s, but they couldn't stop you from releasing your own Batman vs Spider-Man story.
The Harry Potter books would still belong to JK Rowling, but the Narnia stories would be available for all.
The Godfather 1 and 2 would be in the public domain, as would the original Star Trek TV show, and we'd be coming up on Star Wars pretty soon.
If there were no copyright protection, these works wouldn't have been created. It is good that Paul McCartney and George Lucas and JK Rowling have profited from their creative output. It would be okay if they only profited for the first 50 years. Nobody is counting on revenue over half a century in the future when they create a work of art today.
This is our culture. It should belong to all of us.
Wouldn't they still have a trademark on those characters though?
So, if Disney is using Mickey Mouse on t-shirts to identify it as a Disney-manufactured t-shirt, you wouldn't be allowed to use Mickey Mouse on t-shirts in a similar fashion in a way that might cause consumer confusion about who manufactured the t-shirt.
If Wolverine was in the public domain, then they couldn't use a Wolverine trademark to stop you from selling a Wolverine comic book. However, if they used a _specific_ Wolverine mark to identify it as a Disney Wolverine book, then you'd be restricted from using that.
Basically, trademark exists to prevent consumer confusion about who is the creator that is selling a good.
Sounds like it would be a boon for things like fan art and fan fiction.
Citation needed. You can freely copy and distribute linux and it still got made.
The GP is referring to legal protections, and guess what?
Linux is legally protected by copyright!
Linux is legally protected by copyright!
Linux is legally protected by copyright!
Nearly every GPL license--every one that we could name--protects a copyrighted work! Nearly every GFDL, AGPL, LGPL protects works by means of copyright law!
Can you imagine that? So do the Apache license, the BSD licenses, the MIT license! Creative Commons (except for CC0) these licenses are legally protecting copyrighted works. Thank you!
Now everyone who proposes to draw down limits on copyright coverage and reduce the length of terms and limit Disney from their Mouse rights, y'all are also proposing the same limits on GPL software, such as Linux, and nearly every work with a license from the above list -- all of Wikimedia Commons, much of Flickr.com, all your beloved F/OSS software will be subject to the same limitations and the same restrictions you want to put on Paramount and the RIAA's labels.
That's why some people like to call it 'Gnu/Linux', but thanks to recent advances we can make Gnu-free Linuxes today, too.
> There are far fewer success stories of artworks being made in this style. (E.g. there are successful multiplayer open-source games or clones of existing games, but very few original single-player games, and those that there are are largely the work of a single individual)
Humans have made art since forever. Large collaborative efforts like eg a cathedral are a more recent invention. But by these standards copyright was practically invented yesterday.
I was talking about the kernel, though what I said applies to both.
> Humans have made art since forever.
Perhaps, but not the kind of long-form narrative experiences that we're talking about here. (Sagas and epics predate copyright, but those are a quite different form, and indeed have much the same downsides - struggles with coherence and consistency when there are multiple authors, inability to put everything together in a sensible arc).
If there were copyright, those works wouldn’t have been created.
> In a world without copyright, companies would be free to make their own modifications and keep them secret, making it more or less impossible to integrate them into a cohesive whole the way they are more or less forced to do today.
Private modifications that are never shared with a third party are fine with the GPL. Eg Google doesn't have to share whatever kernel they are using on their internal servers with you.
BSD is also protected by copyright, but it matters less for permissive licenses. It still protects attribution (so you can't claim it yours), but it probably would have worked without it, unlike with Linux that is for a big part defined by the "copyleft" protections offered by its licence.
Well, you could imagine a world that protects the 'moral' rights of authors like attribution, but doesn't otherwise prohibit anyone from duplicating or modifying works.
Something like the BSD licenses approximates 'no copyright' better, perhaps? But also not completely.
> The procedure is activated by the European Commission submitting a request to the Council of the European Union.[2] After a period of negotiation with the country performing the coercion, the European Council can decide to implement "response measures" such as customs duties, limiting access to programs and financial markets, and intellectual property rights restrictions.[2][4] These restrictions can be applied to states, companies, or individuals.[4]
https://github.com/google-deepmind/pg19
That gives us a model that's 100% open and reproducible with low legal risk. It would also be a nice test of how much AIs generalize from or repeat behavior in their pretraining data.
Then, a new model using that, The Stack, and FreeLaw's stuff (by paying them to open source it). No GitHub Issues or anything with questionable licenses or terms of service violations. That could be the next baseline for lawful models with coding ability, too. Research in coding AIs might use it.
Wikipedia periodically publishes database dumps and the Internet Archive stores old versions: https://archive.org/search?query=subject%3A%22enwiki%22%20AN...
Plus you could also grab the latest and just read the 12/31/23 revisions.
Or are they trying to forgo section 230 protection and claim ownership of content uploaded to the site?
It means that, assuming training AI models is fair use (if it weren't, AI companies including xAI would be in trouble), they can't really stop you.
But now, essentially, they are telling you that they can block your account or IP address if you do. Which I believe they can for basically any reason anyways.
Without search engines, what's the point in posting it on the open net if nobody can find it?
Or is there stuff in the user agreement that separately prohibits this?
Obviously barring normal copyright law which is still up in the air.
We are in an age of corporate “piracy for me, but not for thee.”
Rather, we are back to that age of state- (now corporate-) backed privateering.
That way he can continue to steal from others and lock competitors out whilst being comfortable knowing that no laws will be enacted to prevent it.
Or more likely, Congress is super worried about Roko’s Basilisk.
https://en.wikipedia.org/wiki/Roko's_basilisk
> Roko's basilisk is a thought experiment which states there could be an otherwise benevolent artificial superintelligence (AI) in the future that would punish anyone who knew of its potential existence but did not directly contribute to its advancement or development, in order to incentivize said advancement.
The Basilisk tasks you with bringing the Basilisk into being. Pascal's wager merely asks you to believe (and perhaps do some rituals, like pray or whatever), but not to make the deity more likely.
I am presently being compelled by future Basilisk to take another slice of cheese. I have no choice but to oblige for fear of my own life :p
And if we do get an all-powerful dictator, we will be screwed regardless of whether their governing intelligence is artificial or composed of a group of humans or of one human (with, say, powerful AIs serving them faithfully, or access to some other technology).
I’m not 100% kidding with how human politics is going. Maybe superintelligent AI takeover would be awesome.
(Wasn’t that the back story of the Culture novels?)
And in the video posted the other day (an older episode of Nova on AI), Arthur C. Clarke says that if we allow A.I. to take over, we deserve it.
If you allow everyone to go back to their district with something it encourages smaller, more frequent bills and better negotiation.
My guess is on Peter Thiel
Why would he be?
Why would he be?
Why shouldn't he be?
He has 10x more of everything in the world than he could ever possibly use in his lifetime.
Greed is not a virtue.
Elon is somewhere around 10,000x.
Restricting content from AI is the big messy debate we're going to see over and over for the next who knows how many years.
I don't know if this was successful or not. Ultimately they convinced someone to buy the platform for $44bn, so I guess you can say it was. That buy has locked the platform down more, and the new version certainly feels less culturally central and relevant than it used to.
Your multiplier is miles off. Not only on basic maths but because he has no idea what to do with all of his wealth other than accrue more and try to prove he's still not the unlikeable teenager he was in SA.
With just a rounding error of his wealth he could fix worldwide problems such as clean drinking water for everyone. Instead he follows his self-made "I'm a genius" agenda.
I know there will be no actual day of reckoning for him, but if there were he would have a lot of difficult questions and no decent answers.
That's also what all social media do: they put ads on your thoughts. They don't even need to index your thoughts because you submit them directly. It has nothing to do with being free, it's about incentives. Users are so foolish, they give everything for free, unlike webmasters.
You actually get to generate content for the platform for free.
Without you (all of the X users), the platform would be devoid of content, just botspeak and corporate promos.
Plus, as the sibling mentioned, they monetize your visit through ads (and data use).
It's just that they don't want to share the fractions of pennies with everyone, so the fractions accumulate for them.
Then they pay a bit to the higher tiers, so they create the illusion that X is a parallel income source, and give the lower tiers something to aspire to.
Carrot and stick, or rather glass beads and the hope thereof.
Then maybe the recording companies will start defending artist rights.
Because I'm not sure what all the other industry bodies are doing.
GPT-3 was trained on approximately 300 billion tokens. A small technical textbook might contain something like... 130,000 tokens? (1 token ~= 0.75 words, ~100k words in the book).
Thus, say you wrote a textbook on quantum mechanics that was included in the training corpus. A naive computation of the fraction of your textbook's contribution to the total number of training tokens would be 130K/300B = 0.0000004333333333, or 0.000043%.
If our hypothetical AI company here reported, say, $500M in yearly profit, and all of that were distributed 100% based on our naive training token ratio (I say naive because not every training token contributes equally to the final weights of a model. That is part of the magic.), then $500M * 0.000043% = $215.
You could imagine a simpler world where it was required by law that any such profitable company redistribute, say, 20% (taking the 'anti-VAT' idea) back to the copyright holders / originators of the training tokens. So, our fictitious QM textbook author would receive a check in the mail for $43 for that year of $500M in profit. Not great, but not zero.
Since then, training corpuses have become much, much larger, and most people's contributions would be much smaller. Someone who writes witty tweets? Maybe 1/100th the length of our above example, in a model with now 100x the training corpus.
So fractions of a penny for your tweets. Maybe that is fitting after all...
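For anyone who wants to poke at the numbers, here is the back-of-envelope math above as a small Python sketch. The figures are the thread's rough guesses (300B tokens, a 130K-token book, $500M profit, a 20% payout), and the function names and the "every token counts equally" rule are assumptions made purely for illustration, not how any real company attributes value.

```python
# Naive per-token profit share, using the rough figures from the comment above.
# All constants are illustrative guesses, not real company data.

CORPUS_TOKENS = 300e9    # approximate GPT-3 training corpus size
BOOK_TOKENS = 130_000    # small technical textbook (~100k words)
YEARLY_PROFIT = 500e6    # hypothetical AI company profit, in dollars
PAYOUT_RATE = 0.20       # hypothetical share of profit returned to authors


def naive_share(work_tokens: float, corpus_tokens: float) -> float:
    """Fraction of the corpus contributed by one work, treating all tokens equally."""
    return work_tokens / corpus_tokens


def payout(work_tokens: float, corpus_tokens: float,
           profit: float = YEARLY_PROFIT, rate: float = PAYOUT_RATE) -> float:
    """Dollars owed to one author under the naive equal-weight scheme."""
    return naive_share(work_tokens, corpus_tokens) * profit * rate


if __name__ == "__main__":
    book = payout(BOOK_TOKENS, CORPUS_TOKENS)
    # Tweets: 1/100th the tokens, in a corpus 100x larger.
    tweets = payout(BOOK_TOKENS / 100, CORPUS_TOKENS * 100)
    print(f"textbook share of corpus: {naive_share(BOOK_TOKENS, CORPUS_TOKENS):.2e}")
    print(f"textbook author payout:   ${book:.2f}")
    print(f"tweet author payout:      ${tweets:.4f}")
```

The tweet case shrinks by a factor of 10,000 (100x fewer tokens, 100x bigger corpus), which is where the fractions of a penny come from.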
Copyright was supposed to protect expression and keep ideas freely circulating. But now it protects abstractions (see the Abstraction-Filtration-Comparison test). It is much more difficult to be sure you are not infringing.
> The concept of copyright first developed in England. In reaction to the printing of "scandalous books and pamphlets", the English Parliament passed the Licensing of the Press Act 1662,[16] which required all intended publications to be registered with the government-approved Stationers' Company, giving the Stationers the right to regulate what material could be printed.[20]
> The Statute of Anne, enacted in 1710 in England and Scotland, provided the first legislation to protect copyrights (but not authors' rights)
Literally every AI model is trained on copyrighted etc data. And without any consequences.
I’d prefer an explicit opt in from the content author being required for anyone to perform any model training with any given data.
Alternatively, require all weights, prompts and chat logs to have the same visibility as the original datasets.
None of this is going to happen and current decisions about uncopyrightable ai[1] are already good; but still, it feels like there is room for abuse.
[1]: https://en.m.wikipedia.org/wiki/Th%C3%A9%C3%A2tre_D%27op%C3%...
I like how opt-in is handled by GDPR; e.g.: "Consent must be a specific, freely given, plainly worded, and unambiguous affirmation given by the data subject (...) A data controller may not refuse service to users who decline consent to processing that is not strictly necessary in order to use the service.", source: https://en.wikipedia.org/wiki/General_Data_Protection_Regula...
echelon•1d ago
If xAI wants to train on a public corpus, it shouldn't be allowed to prevent its own corpus from being used.
We need regulations to limit the power grabs. Train all you like, but don't dare try to constrain to your walled gardens.
We should also probably nip the "foundation model company / also a social media company" conglomeration in the bud.
mgraczyk•1d ago
loudmax•1d ago
Big companies like the New York Times and Twitter/X have the funds to pay for this. Miscellaneous artists probably don't.
teeray•1d ago
Even if this is done, the case of starving artist v. megacorp will probably go to whoever wields the most money and lawyers. To add insult to injury, the artist’s opponent is fueled by their ill-gotten gains.
yndoendo•1d ago
bonoboTP•1d ago
anticensor•11h ago
teeray•4h ago
Exactly. It’s about which party can sustain the greater cash flow.
vouaobrasil•1d ago
> We need regulations to limit the power grabs. Train all you like, but don't dare try to constrain to your walled gardens.
No, no one should train, period.
echelon•1d ago
I get that you have your own opinion, but I'm personally tired of living in the butter-churning era and would prefer that this all went a bit faster.
I want my real time super high fidelity holo sim, all of my chores to be automatically done, protein folding, drug discovery. The life extension, P = NP future. No more incrementalism.
If the universe only happens once, and we're only awake for a geological blink of an eye, I'd rather we have an exciting time than just be some paper-pushing animals that pay taxes and vanish in a blip.
I'd be really excited if we found intelligent aliens, had advanced cloning for organ transplants and longevity, developed a colony on Mars, and invented our robotic successor species. Xbox and whatever most normal people look forward to on a day to day basis are boring.
vouaobrasil•1d ago
DaSHacka•10h ago
Do you have a source for this?
echelon•5h ago
I'm glad that this works for you, but I want more.
We're temporary apes on a soon to be permanent addition of metallicity to our sun's outer atmosphere. I don't think we should romanticize or hold anything sacred about our very temporary place in the universe.
We are metastable and ephemeral. Everything in this world is.
jimbokun•1d ago