X changes its terms to bar training of AI models using its content

https://techcrunch.com/2025/06/05/x-changes-its-terms-to-bar-training-of-ai-models-using-its-content/

177•bundie•1d ago

Comments

echelon•1d ago

If an artist or author can't do this, social media shouldn't be able to do it either.

If xAI wants to train on the public corpus, it shouldn't be allowed to prevent its own corpus from being used.

We need regulations to limit the power grabs. Train all you like, but don't dare try to constrain to your walled gardens.

We should also probably nip the "foundation model company / also a social media company" conglomeration in the bud.

mgraczyk•1d ago

Artists can do this, and they do

loudmax•1d ago

Yes, but do artists have the ability to actually monitor and enforce this? You have to have the capacity and the wherewithal to test these models to even know that your data is being ingested into AI.

Big companies like the New York Times and Twitter/X have the funds to pay for this. Miscellaneous artists probably don't.

teeray•1d ago

> If an artist or author can't do this, social media shouldn't be able to do it either.

Even if this is done, the case of starving artist v. megacorp will probably go to whoever wields the most money and lawyers. To add insult to injury, the artist’s opponent is fueled by their ill-gotten gains.

yndoendo•1d ago

This depends on the country. In the USA, yes, with its draconian methods. In countries like the UK, the loser of the suit pays all the costs, so UK lawyers have no problem taking cases from low-wealth clients that they know they will win. The UK allows for David vs. Goliath, and for David to win. The US uplifts Goliath as a god.

bonoboTP•1d ago

Also in many countries legal costs are just generally lower than in the US.

anticensor•11h ago

However, the loser-pays vs. both-parties-pay rule isn't uniform across all lawsuit types, even in America or England. Adding to that, even in loser pays regimes, both parties have to pay upfront and then the winner is refunded the costs.

teeray•4h ago

> both parties have to pay upfront and then the winner is refunded the costs.

Exactly. It’s about which party can sustain the greater cash flow.

vouaobrasil•1d ago

Social media should do it to set a legal precedent.

> We need regulations to limit the power grabs. Train all you like, but don't dare try to constrain to your walled gardens.

No, no one should train, period.

echelon•1d ago

> No, no one should train, period.

I get that you have your own opinion, but I'm personally tired of living in the butter-churning era and would prefer that this all went a bit faster.

I want my real time super high fidelity holo sim, all of my chores to be automatically done, protein folding, drug discovery. The life extension, P = NP future. No more incrementalism.

If the universe only happens once, and we're only awake for a geological blink of an eye, I'd rather we have an exciting time than just be some paper-pushing animals that pay taxes and vanish in a blip.

I'd be really excited if we found intelligent aliens, had advanced cloning for organ transplants and longevity, developed a colony on Mars, and invented our robotic successor species. Xbox and whatever most normal people look forward to on a day to day basis are boring.

vouaobrasil•1d ago

There is already a beautiful, exciting world out there full of animals and plants and we don't need AI or some computer crap to experience it. The problem is, creating all this AI and advanced technology is directly crushing that world.

DaSHacka•10h ago

> The problem is, creating all this AI and advanced technology is directly crushing that world.

Do you have a source for this?

echelon•5h ago

> There is already a beautiful, exciting world out there full of animals and plants and we don't need AI or some computer crap to experience it.

I'm glad that this works for you, but I want more.

We're temporary apes on a soon to be permanent addition of metallicity to our sun's outer atmosphere. I don't think we should romanticize or hold anything sacred about our very temporary place in the universe.

We are metastable and ephemeral. Everything in this world is.

jimbokun•1d ago

If social media can do this, an artist or author should be able to do it, too.

delichon•1d ago

> “You shall not and you shall not attempt to (or allow others to) […] use the X API or X Content to fine-tune or train a foundation or frontier model,” it reads.

If I have a service where a user enters any URL, like a tweet from X, and the service translates it, and then, if the user approves of the translation, I train a translation model on it, does that violate this term?

yandie•1d ago

Per my experience with GenAI legal teams, that’s a no go.

It’s not been tested in court though

dyauspitr•18h ago

If you don’t want an LLM to view it don’t put it on the public internet.

matwood•1d ago

Weird this just happened. I assumed all sites with any sort of content changed their terms soon after ChatGPT hit the scene.

nailer•1d ago

Yep, from https://the-decoder.com/reddit-ends-its-role-as-a-free-ai-tr... :

You must not, and must not allow those acting on your behalf to:

...use the Data APIs to encourage or promote illegal activity or violation of third party rights (including using User Content to train a machine learning or AI model without the express permission of rightsholders in the applicable User Content);

soulofmischief•1d ago

In my eyes that is considered fair use, and I think the courts will come to agree unless they are financially incentivized to look the other way and thus create a moat for existing players at the expense of newcomers.

michaelcampbell•1d ago

"its content" indeed.

blibble•1d ago

wish I could change my terms to bar training of AI models on my content

vouaobrasil•1d ago

Same here! It should be a default. Unfortunately, the very openness of the internet is now working against us.

soulofmischief•1d ago

Why should it be a default? Can you prove that training a model on data you wrote is not fair use?

We're already seeing precedent that it might be.

https://www.ecjlaw.com/ecj-blog/kadrey-v-meta-the-first-majo...

The openness of the internet is a good thing, but it doesn't come without a cost. And the moment we have to pay that cost, we don't get to suddenly go, "well, openness turned out to be a mistake, let's close it all up and create a regulatory, bureaucratic nightmare". This is the tradeoff. Freedom for me, and thee.

baseballdork•1d ago

The burden is on the user to show that it is fair use, no? Not everyone else's responsibility to prove that it's _not_ fair use.

soulofmischief•1d ago

It is definitely the responsibility of anyone suing someone who trained a model on copyrighted data to prove that it isn't fair use, they have to show how it violated law, and while it's in the best interest of those organizations to make things easier for the court by showing why it is fair use, they are technically innocent until proven guilty.

Accordingly, anyone on the internet who wants to make comments about how they should be able to prevent others from training models on their data needs to demonstrate competence with respect to copyright by explaining why it's not fair use, as currently it is undecided in law and not something we can just take for granted.

Otherwise, such commenters should probably just let the courts work this one out or campaign for a different set of protection laws, as copyright may not be sufficient for the kind of control they are asking over random developers or organizations who want to train a statistical model on public data.

SAI_Peregrinus•1d ago

You've got it backwards. It's on the defendant to prove that their use is fair. The plaintiff has to prove that they actually own the copyright, and that it covers the work they're claiming was infringed, and may try to refute any fair-use arguments the defense raises, but if the defense doesn't raise any then the use won't be found fair.

soulofmischief•1d ago

It's true that the process is copyright strike/lawsuit -> appeal, but like I said, it's in their best interests to just prove that it's fair use because otherwise the judge might not properly consider all facts, only hear one side of the story and thus make a bad judgement about whether or not it is fair use. If anything, I'm just being pedantic, but we do ultimately agree here I think.

SAI_Peregrinus•8h ago

Well, lawsuits have multiple stages. First the plaintiff files the suit, and serves notice to the defendant(s) that the suit has been filed. Then there's a period where both sides gather evidence (discovery), then there's a trial where they present their evidence & arguments to the court. Each side gets time to respond to the arguments made by the opposing party. Then a verdict is chosen, and any penalties are decided by the court. So there's not really any chance the judge only hears one side of the story.

That said, I think we do agree. The plaintiff should be prepared to refute a fair-use argument raised by the defendant. I'm just noting that the refutation doesn't need to be part of the initial filing, it gets presented at trial, after discovery, and only if the defendant presents a fair-use defense. So they don't have to prove it's not fair use to win in every case. I'm probably also being excessively pedantic!

lmm•17h ago

> It is definitely the responsibility of anyone suing someone who trained a model on copyrighted data to prove that it isn't fair use, they have to show how it violated law, and while it's in the best interest of those organizations to make things easier for the court by showing why it is fair use, they are technically innocent until proven guilty.

No, fair use is an affirmative defense for conduct that would otherwise be infringing. The onus is on the defendant to show that their use was fair.

petesergeant•15h ago

> It is definitely the responsibility of anyone suing someone who trained a model on copyrighted data to prove that it isn't fair use

Morally, perhaps, but not under US law: https://en.wikipedia.org/wiki/Affirmative_defense#Fair_use

shakna•15h ago

Yeah, I don't think downloading my paid-for books, from an illegal sharing site, to scrape and make use of, is in any way fair use.

From the 1841 US decision in Folsom v. Marsh:

> reviewer may fairly cite largely from the original work, if his design be really and truly to use the passages for the purposes of fair and reasonable criticism. On the other hand, it is as clear, that if he thus cites the most important parts of the work, with a view, not to criticize, but to supersede the use of the original work, and substitute the review for it, such a use will be deemed in law a piracy

Further, to be "transformative", it is required that the new work is for a new purpose. It has to be done in such a way that it basically is not competing with the original at all.

Using my creative works to create creative works is rather clearly an act of piracy. And the methods used to make that possible are also clearly piracy.

How could training a model here possibly be fair use?

unstablediffusi•1d ago

if that is any consolation, no one gives a shit about xitter's ToS either. it will continue to be scraped by every major player.

Capricorn2481•15h ago

How exactly is it being scraped? My understanding is Twitter and LinkedIn are both huge pains in the ass to scrape right now.

TheDong•8h ago

There are a number of companies out there, like "brightdata", that pay a small amount to app developers to install a native "sdk". That SDK mimics a browser and makes requests as if the user's device is doing it.

Since it's using a large number of real users' devices, and closely mimicking real web browsers, it ends up looking incredibly similar to real user traffic.

Since twitter allows some amount of anonymous browsing, that's enough to get some amount of data out. You can also pay brightdata for one large aggregated dataset.

https://bright-sdk.com/

This is part of the AI revolution: users' devices being commandeered to DDoS small blogs and Twitter alike to feed data to the beast.

eru•11h ago

You can just not use Twitter?

Terr_•4h ago

I've been wondering if there's some way to put something into legally-defensible clickwrap around one's own content to deter or annoy misuse.

https://news.ycombinator.com/item?id=42774179

TLDR: Use contract law so that I provide my content and they give me rights to all outputs.

So if anybody doing this can prove Acme Model contains their artwork, and Acme Model was used to generate some scenes used in a major movie, then Acme has already given the artist a right to share/resell those scenes. If Acme Inc. "sold" exclusive rights to a movie studio, then either (A) they broke the contract with every contributor, or (B) they lied to the studio in that other contract.

Remember, the goal isn't some amazing "gotcha" where the latest blockbuster movie becomes public domain, but rather to create chronic legal pain and risk for companies like Acme so that they stop stealing stuff.

vouaobrasil•1d ago

There needs to be a worldwide standard, such as an HTML tag, that says "no training". And a few countries need to make it a punishable offense to violate the tag. The punishment should be exceptionally severe, not just a fine. For example: any company that violates the tag should be completely barred from operating, forever.

twostorytower•1d ago

It needs to be incorporated into the robots.txt standard.
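As a rough illustration of how a robots.txt-based opt-out could work: robots.txt already lets a site address crawlers by user-agent token, so a "no training" rule could be a per-agent `Disallow`. The sketch below uses Python's standard `urllib.robotparser`; the token `ExampleTrainingBot` is hypothetical, and, as with any robots.txt rule, honoring it is voluntary on the crawler's side.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "ExampleTrainingBot" stands in for an AI-training
# crawler's user-agent token; every other agent may crawl everything.
ROBOTS_TXT = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/post/123"
    print(may_fetch("ExampleTrainingBot", url))  # False: the training bot is blocked
    print(may_fetch("SomeSearchBot", url))       # True: falls through to the "*" rule
```

Nothing in the parser enforces the rule; a crawler that never calls something like `can_fetch` is unaffected, which is the objection raised in the replies.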

anigbrowl•1d ago

That will just lead to situations where one company scrapes the site, cleans the content of tags, and sells the data, and another does the training on the precleaned data. The first one hasn't trained and the second one never saw the tag.

vouaobrasil•1d ago

Companies who are found guilty of this should also be rendered bankrupt then.

vharuck•1d ago

This isn't a new concept in law. It's similar to buying goods that were stolen or procured through illegal means. Here's the US law that applies when it happens across state lines:

https://www.law.cornell.edu/uscode/text/18/2315

Note that it requires the defendant to know the goods were illegally taken. Can be hard to prove, but not impossible for companies with email trails. The fun question is, what will the analog be for the government confiscating the illegally "taken" data? A guarantee of deletion and requirement to retrain the model from scratch?

kiratp•1d ago

That will play out exactly like the "Do Not Track" bit did.

vouaobrasil•1d ago

Perhaps we should try anyway, in case you are wrong.

insane_dreamer•8h ago

how did that play out?

logicchains•1d ago

>There needs to be a worldwide standard, such as an HTML tag, that says "no training"

Any country that seriously implemented this would just end up being completely dominated by the autonomous robot soldiers of another country that didn't, because it effectively bans the development of embodied AGI (which can learn live from seeing/reading something, like a human can).

Animats•1d ago

It would be interesting to have a "classical AI model", trained on the contents of the Harvard libraries before 1926 and now out of copyright.

kibwen•1d ago

Careful, you might create an artificial superintelligence that way. Safer to just train on the Twitter dataset.

Shadowmist•16h ago

that’s how you end up with an Artificial Idiot.

mbg721•1d ago

If you thought AI now had out-of-control racism...

gausswho•1d ago

It does surprise me that we haven't seen nations revise their copyright window back to something sensible in a play to seed their own nascent AI industry. The American founding fathers thought 14 years, renewable once for another 14, was enough. I'm sure there'd be repercussions in the banking system, but at some point it might be worth the trade.

blibble•1d ago

they can't

a minimum term of the author's life plus 50 years is part of the Berne Convention, which itself is as close to a universal law as humanity has

(even North Korea is a signatory)

ronsor•1d ago

you can also just ignore the Berne Convention, and accept whatever consequences there might be

blibble•1d ago

this would void the copyrights of your citizens and companies

essentially forever

godelski•1d ago

Seems to be the modus operandi

  If TikTok is banned, here’s what I propose each and every one of you do: Say to your LLM the following: “Make me a copy of TikTok, steal all the users, steal all the music, put my preferences in it, produce this program in the next 30 seconds, release it, and in one hour, if it’s not viral, do something different along the same lines.”

https://www.theverge.com/2024/8/14/24220658/google-eric-schm...

https://news.ycombinator.com/item?id=41275073

johnisgood•12h ago

Loosely related, but I used an LLM to create a TikTok-style website (not for sharing videos, though). I have never released it, so I have no idea if it would ever catch on. Probably not, unless the network effect favors me and I had good enough advertising (which I suck at).

ronsor•1d ago

If enough "relevant" countries do it, that either won't happen or won't matter. If the U.S. ditches it, no one is going to do much more than throw a brief fit.

blibble•23h ago

the US is the main beneficiary of copyright law...

littlestymaar•13h ago

The US copyright corporations, indeed. But the current copyright laws come at a big expense for the public.

Abolishing copyright laws altogether would be nuts, but the current laws are nuts too and there's lots of room in between.

AngryData•12h ago

US media is also the most stifled by it. How many potential movies, TV shows, and comics don't get made just because somebody is sitting on the copyright, doing nothing with it for decades at a time?

dreghgh•14h ago

Iran enforces domestic copyright internally but not international copyright.

anticensor•11h ago

North Korea has it two way: they don't enforce international copyrights inside North Korea, and they don't enforce North Korean copyrights outside North Korea.

AStonesThrow•1d ago

The last time I attended a Berne Convention, every panel was just overrun with Trekkies, especially Klingons, in the hotel lounges too. And the autograph lines were interminably long, and the vendors were trying to sell us their Public Domain stuff. It was nothing like San Diego Comic-Con!

babypuncher•1d ago

50 year copyright terms would still be a big improvement over the current state of US copyright law. That would make the first Star Wars public domain in just 2 years.

gausswho•1d ago

would there be repercussions if a country hewed to the 50 year minimum?

loudmax•1d ago

The current US copyright duration is the life of the author plus 70 years. This is absolutely bonkers. 50 years from publication would be a significant improvement.

50 years ago was 1975. If copyright were limited to 50 years, we'd be looking at all of the Beatles' works being in the public domain. We'd be midway through Led Zeppelin, and a lot of the best work from Pink Floyd and the Rolling Stones.

Also, Superman, Batman, and Spider-Man. Disney would still profit from the MCU films which they produced in the 2010's, but they couldn't stop you from releasing your own Batman vs Spider-Man story.

The Harry Potter books would still belong to JK Rowling, but the Narnia stories would be available for all.

The Godfather 1 and 2 would be in the public domain, as would the original Star Trek TV show, and we'd be coming up on Star Wars pretty soon.

If there were no copyright protection, these works wouldn't have been created. It is good that Paul McCartney and George Lucas and JK Rowling have profited from their creative output. It would be okay if they only profited for the first 50 years. Nobody is counting on revenue over half a century in the future when they create a work of art today.

This is our culture. It should belong to all of us.
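The term arithmetic in this comment can be made concrete with a small sketch. It assumes, as US law does for current terms (17 U.S.C. § 305), that a term runs to the end of the calendar year in which it would otherwise expire, so the work enters the public domain on January 1 of the next year. The function names are hypothetical, and the 50-years-from-publication rule is the proposal under discussion, not current law.

```python
def public_domain_year_life_plus(death_year: int, term: int = 70) -> int:
    """Year a work enters the public domain under a life-plus-`term` rule.

    The term runs through December 31 of death_year + term, so the work
    is free on January 1 of the following year.
    """
    return death_year + term + 1


def public_domain_year_from_publication(pub_year: int, term: int = 50) -> int:
    """Year a work enters the public domain under a fixed term from publication."""
    return pub_year + term + 1


if __name__ == "__main__":
    # Star Wars (1977) under the proposed 50-years-from-publication rule.
    print(public_domain_year_from_publication(1977))  # 2028
    # A work by an author who died in 1963, under a life-plus-70 rule.
    print(public_domain_year_life_plus(1963))  # 2034
```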

jfim•1d ago

> Disney would still profit from the MCU films which they produced in the 2010's, but they couldn't stop you from releasing your own Batman vs Spider-Man story.

Wouldn't they still have a trademark on those characters though?

ncallaway•17h ago

The trademark on characters is related to selling goods, if the character is used as a way of identifying an authentic seller.

So, if Disney is using Mickey Mouse on t-shirts to identify it as a Disney manufactured t-shirt, you wouldn't be allowed to use Mickey Mouse on t-shirts in a similar fashion in a way that might cause consumer confusion about who manufactured the t-shirt.

If Wolverine was in the public domain, then they couldn't use a Wolverine trademark to stop you from selling a Wolverine comic book. However, if they used a _specific_ Wolverine mark to identify it as a Disney Wolverine book, then you'd be restricted from using that.

Basically, trademark exists to prevent consumer confusion about who is the creator that is selling a good.

jfim•2h ago

I see, so in that hypothetical world, one could make a spider guy comic book that looks suspiciously like another, but not label it "The Amazing Spiderman (r)"?

Sounds like it would be a boon for things like fan art and fan fiction.

tpxl•17h ago

> If there were no copyright protection, these works wouldn't have been created.

Citation needed. You can freely copy and distribute Linux and it still got made.

AStonesThrow•17h ago

The GP wasn't referring to DRM or DMCA type "copyright protection" as the phrase is typically used. Nobody in this thread has mentioned any of that.

The GP is referring to legal protections, and guess what?

Linux is legally protected by copyright!

Nearly every GPL license (every one that we could name) protects a copyrighted work! Nearly every GFDL, AGPL, and LGPL license protects works by means of copyright law!

Can you imagine that? So do the Apache license, the BSD licenses, and the MIT license! Creative Commons licenses (except CC0) legally protect copyrighted works, too. Thank you!

Now everyone who proposes to draw down limits on copyright coverage and reduce the length of terms and limit Disney from their Mouse rights, y'all are also proposing the same limits on GPL software, such as Linux, and nearly every work with a license from the above list -- all of Wikimedia Commons, much of Flickr.com, all your beloved F/OSS software will be subject to the same limitations and the same restrictions you want to put on Paramount and the RIAA's labels.

bornfreddy•16h ago

Yeah, I think most of us are fine with a 50-year-old Linux kernel being released into the public domain.

pastage•17h ago

Linux has used the GPL to its advantage. That cannot exist without copyright. (There are two camps in copyright discussions: improve it, e.g. CC, or destroy it.)

lmm•17h ago

Linux is generally a functional tool, and struggles with overall coherence. There are far fewer success stories of artworks being made in this style. (E.g. there are successful multiplayer open-source games or clones of existing games, but very few original single-player games, and those that there are are largely the work of a single individual)

eru•11h ago

Linux is both a kernel (which is under GPL), and an operating system, whose other components are under a variety of licenses (and you can mix and match which components you want).

That's why some people like to call it 'Gnu/Linux', but thanks to recent advances we can make Gnu-free Linuxes today, too.

> There are far fewer success stories of artworks being made in this style. (E.g. there are successful multiplayer open-source games or clones of existing games, but very few original single-player games, and those that there are are largely the work of a single individual)

Humans have made art since forever. Large collaborative efforts like eg a cathedral are a more recent invention. But by these standards copyright was practically invented yesterday.

lmm•8h ago

> Linux is both a kernel (which is under GPL), and an operating system

I was talking about the kernel, though what I said applies to both.

> Humans have made art since forever.

Perhaps, but not the kind of long-form narrative experiences that we're talking about here. (Sagas and epics predate copyright, but those are a quite different form, and indeed have much the same downsides - struggles with coherence and consistency when there are multiple authors, inability to put everything together in a sensible arc).

mattkevan•15h ago

Most of the classic Disney films are based on public domain stories.

If there were copyright, those works wouldn’t have been created.

dehrmann•5h ago

I hadn't put this together, but "The Great Mouse Detective" is a riff on Sherlock Holmes, but that didn't enter the public domain until much later. Would it have been better if it used the character and not just the general vibe?

simiones•13h ago

I think Linus Torvalds has been very explicit that he believes the GPL has been critical to the success of Linux - specifically, the copyright-enforced obligation to contribute back any modifications you make. In a world without copyright, companies would be free to make their own modifications and keep them secret, making it more or less impossible to integrate them into a cohesive whole the way they are more or less forced to do today.

eru•11h ago

GPL only forces you to contribute back a modification you make and publish.

> In a world without copyright, companies would be free to make their own modifications and keep them secret, making it more or less impossible to integrate them into a cohesive whole the way they are more or less forced to do today.

Private modifications that are never shared with a third party are fine with the GPL. Eg Google doesn't have to share whatever kernel they are using on their internal servers with you.

dehrmann•6h ago

The things that hold modifications back from being upstreamed are getting consensus with the OSS developers and sometimes the internal legal team. Occasionally, the issue is proprietary information. Usually, it's the time commitment or upstream not being interested. What companies don't like doing is maintaining patches on a fork, and that's enough incentive to give back on its own.

GuB-42•12h ago

If you want to make that point, BSD is probably a better example. Linux is protected by copyright; that's what makes copyleft licenses like the GPL possible.

BSD is also protected by copyright, but it matters less for permissive licenses. It still protects attribution (so you can't claim it as yours), but it probably would have worked without it, unlike Linux, which is largely defined by the "copyleft" protections offered by its license.

eru•11h ago

> It still protects attribution (so you can't claim it as yours), but it probably would have worked without it, [...]

Well, you could imagine a world that protects the 'moral' rights of authors like attribution, but doesn't otherwise prohibit anyone from duplicating or modifying works.

GuB-42•11h ago

I don't know about the US, but under French "droits d'auteur", moral rights are treated differently from exploitation rights. In particular, they cannot be waived, they cannot be sold, and there is no "work-for-hire". For example, even as an employee, every line of code you write will be yours until you die, and nothing can change that. You may not be allowed to do anything with it (for example because the exploitation rights go to your employer), but it is still yours.

eru•12h ago

Linux is under the GPL, which explicitly needs copyright to work.

Something like the BSD licenses approximates 'no copyright' better, perhaps? But also not completely.

Teever•18h ago

Europe has recently introduced a law[0] that allows it to suspend IP protections as a punitive response to coercive economic actions by bad actors.

> The procedure is activated by the European Commission submitting a request to the Council of the European Union.[2] After a period of negotiation with the country performing the coercion, the European Council can decide to implement "response measures" such as customs duties, limiting access to programs and financial markets, and intellectual property rights restrictions.[2][4] These restrictions can be applied to states, companies, or individuals.[4]

[0] https://en.wikipedia.org/wiki/Anti-Coercion_Instrument

littlestymaar•15h ago

The Berne Convention on copyright is an international convention, like the Treaty of Versailles or the Paris Agreement; it could meet the same fate.

MattGaiser•1d ago

Why would it matter? Copyright has been irrelevant so far.

eru•12h ago

What's the connection with the banking system?

nickpsecurity•1d ago

I wish someone would update PG19 and use it for a 7-30B+ parameter model:

https://github.com/google-deepmind/pg19

That gives us a model that's 100% open and reproducible with low legal risk. It would also be a nice test of how much AIs generalize from, or repeat behavior in, their pretraining data.

Then, a new model using that, The Stack, and FreeLaw's stuff (by paying them to open source it). No GitHub Issues or anything with questionable licenses or terms of service violations. That could be the next baseline for lawful models with coding ability, too. Research in coding AIs might use it.

murph-almighty•1d ago

I've similarly wondered if I could get a pre-2024 Wikipedia if just for the "fact based" flavor LLM

malinens•18h ago

What happened to wikipedia in 2024?

landl0rd•17h ago

Do you think Wikipedia starting in '24 was polluted by AI slop? This is certainly possible, I'm just not aware of it happening.

Wikipedia periodically publishes database dumps and the Internet Archive stores old versions: https://archive.org/search?query=subject%3A%22enwiki%22%20AN...

Plus you could also grab the latest and just read the 12/31/23 revisions.
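Picking "the 12/31/23 revisions" amounts to choosing, for each page, the newest revision with a timestamp before the start of 2024. A minimal sketch, assuming the (timestamp, text) pairs for a page have already been extracted from a full-history dump (the current-articles dumps keep only the latest revision) and that timestamps use the ISO 8601 form found in MediaWiki XML dumps, such as `2023-12-31T18:04:11Z`. `Revision`, `parse_timestamp`, and `latest_before` are hypothetical names.

```python
from datetime import datetime, timezone
from typing import Iterable, NamedTuple, Optional


class Revision(NamedTuple):
    timestamp: datetime
    text: str


# Start of 2024 in UTC; only revisions strictly before this instant count.
CUTOFF = datetime(2024, 1, 1, tzinfo=timezone.utc)


def parse_timestamp(raw: str) -> datetime:
    """Parse a dump timestamp such as '2023-12-31T18:04:11Z' as UTC."""
    return datetime.strptime(raw, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def latest_before(revisions: Iterable[Revision], cutoff: datetime = CUTOFF) -> Optional[Revision]:
    """Return the newest revision strictly before `cutoff`, or None if none qualify."""
    eligible = [rev for rev in revisions if rev.timestamp < cutoff]
    return max(eligible, key=lambda rev: rev.timestamp, default=None)


if __name__ == "__main__":
    history = [
        Revision(parse_timestamp("2023-06-01T12:00:00Z"), "older text"),
        Revision(parse_timestamp("2023-12-31T18:04:11Z"), "last 2023 text"),
        Revision(parse_timestamp("2024-02-10T09:30:00Z"), "2024 edit"),
    ]
    print(latest_before(history).text)  # last 2023 text
```

Pages created after the cutoff come back as `None`, so they drop out of the snapshot entirely.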

thrawa8387336•7h ago

It was already slop, let's not pretend it is significantly different today.

carlio•1d ago

It'd look like this: https://www.smbc-comics.com/comic/copyright

add-sub-mul-div•1d ago

How useful is low-quality content like Youtube comments and tweets anyway? Is it a common/important use case to generate tweet-length, tweet-quality content? Are most use cases of generating tweet-type content spam/fraud? Would a model be better off if it was unable to perform those use cases?

redox99•1d ago

Even if SNR is low, there is some information that only exists on X, or at least is the primary source. Just look at how many submissions on HN are X posts.

add-sub-mul-div•1d ago

Before Musk bought it Twitter was broadly disliked here and there were regularly calls in the comments to disallow submissions from there. Given how it's degraded in completely non-partisan ways (blocking of alternative clients, features removed from free tier, paid subscription tiers below $40/month still have ads, proliferation of spam from paid placement bots in comments) I can't understand how positive sentiment comes from a place other than virtue signaling alignment with Musk and his values.

lesuorac•1d ago

Who's training an AI on the "Tweet" button text?

Or are they trying to forgo Section 230 protection and claim ownership of content uploaded to the site?

lambertsimnel•15h ago

Perhaps they want the prohibition on using the site content for AI training to be considered based on something other than their ownership of it, like bandwidth usage or users' rights

HenryBemis•13h ago

They will get paid to share our (your) data and they will use the money for infra and new yachts.

lambertsimnel•13h ago

Indeed, but I'm speculating that they do that without owning the data or even claiming to. That's consistent with the article, but I haven't read the other relevant documents. Maybe they have a license to use the data. Maybe the license allows or requires them to try to restrict others' AI training, regardless of their non-ownership of it. Maybe that serves multiple purposes, in which case they could point to whichever shows them in the best light.

GuB-42•12h ago

These are just terms of service, not copyright.

It means that, assuming training AI models is fair use (if it weren't, AI companies including xAI would be in trouble), they can't really stop you.

But now, essentially, they are telling you that they can block your account or IP address if you do. Which I believe they can for basically any reason anyways.

grugagag•9h ago

How would they know you’re training some LLM though?

ronsor•1d ago

I'm not sure how this will work as crawlers don't read or accept ToS.
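Crawlers don't read a site's terms of service, but well-behaved ones do consult robots.txt, which can single out specific user agents. A minimal sketch using Python's standard `urllib.robotparser`, assuming a hypothetical robots.txt (not X's actual file) that blocks `GPTBot` as an example AI crawler while allowing everyone else; `allowed` is a helper name made up for this illustration. Honoring robots.txt is voluntary, so this shows what a compliant crawler would decide, not what any crawler actually does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only: block one AI crawler,
# allow every other user agent.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "Googlebot"):
        print(agent, allowed(agent, "https://example.com/some/post"))
```

This also illustrates the point about search engines: a site can keep search crawlers in while shutting specific AI crawlers out, but only crawlers that choose to check the file are affected.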

MoonGhost•1d ago

It will not, as long as search engines have access. That means Google, and OpenAI through Microsoft's Bing, at the very least.

Without search engines, what's the point of posting it on the open net if nobody can find it?

voidUpdate•13h ago

This refers to the API, which you would have to manually attach a bot to so that it could scrape things

cameldrv•1d ago

Naturally I'm sure Grok reads the terms of service on every website it scrapes and doesn't use content from sites that prohibit it.

kyle-rb•1d ago

I've never signed up for the X developer program, so I'm not bound by these terms. But I did download an archive of my data last week. Do I have implicit permission to use that data (~150k liked tweets) to train AI models?

Or is there stuff in the user agreement that separately prohibits this?

Obviously barring normal copyright law which is still up in the air.

josefritzishere•1d ago

If you live in the EU, GDPR dictates that you own your data generally speaking. If you're in the US it varies by state if you have any rights at all.

MoonGhost•1d ago

If you own your face that doesn't mean nobody can take a picture on the street.

archagon•1d ago

Oh, that must be nice. And what should I do as a blogger to get the same privilege for my content?

We are in an age of corporate “piracy for me, but not for thee.”

MonkeyClub•16h ago

> We are in an age of corporate “piracy for me, but not for thee.”

Rather, we are back to that age of state- (now corporate-) backed privateering.

zombot•17h ago

Right, stealing training data from others is OK, having it stolen from you is not. What else is new?

keyle•16h ago

New logo every couple of years and Bob's your uncle.

threeseed•16h ago

Almost certainly the easter egg found in the Trump "Big Beautiful Bill" which prevents states from enacting AI regulations also came from Musk.

That way he can continue to steal from others and lock competitors out whilst being comfortable knowing that no laws will be enacted to prevent it.

mgoetzke•16h ago

Why do you think he is so evil but all the others are benign?

littlestymaar•15h ago

None of them are benign. He's the only one to have been in a government office though, and he's also batshit crazy, which makes him even more dangerous than the other oligarchs.

HenryBemis•13h ago

He is not "batshit crazy", or maybe he is. But he is making the next generation of ICBMs for the US government, sorry.. he is making super-duper rockets that will definitely take people to Mars and his companies/creations will be the very first tech ever to _not_ be used for war and death!!! (he wrote while laughing). So that settles it (all).

labster•14h ago

Yep, Musk saying he’s going to fund primary campaigns against congressmembers who vote for the Big Beautiful Bill is all just a brilliant bit of reverse psychology.

Or more likely, Congress is super worried about Roko’s Basilisk.

tetris11•14h ago

That's a wild reference!

https://en.wikipedia.org/wiki/Roko's_basilisk

> Roko's basilisk is a thought experiment which states there could be an otherwise benevolent artificial superintelligence (AI) in the future that would punish anyone who knew of its potential existence but did not directly contribute to its advancement or development, in order to incentivize said advancement.

stuaxo•13h ago

And some of the CEOs of LLM companies seem to believe in it, and that "AGI" will come from their LLM work - both of which are utterly insane points of view.

BoxOfRain•13h ago

It's Pascal's Wager with a sci-fi reskin, and all the objections that go along with that.

eru•12h ago

Roko's Basilisk is very, very similar to Pascal's wager, but it has an extra wrinkle:

The Basilisk tasks you with bringing the Basilisk into being. Pascal's wager merely asks you to believe (and perhaps do some rituals, like praying), but not to make the deity more likely.

yubblegum•11h ago

No it is not. Pascal was not making an objective argument for why someone should believe. He was making an argument for why he believed (based on personal religious experiences that he had had).

numpad0•9h ago

To me, the Wager sounds like a pure philosophical joke, and the Basilisk sounds like a typical cult murder justification. It's not falsifiable, and it explains anything post facto. "xyz was tail of the Basilisk" can pseudo-rationalize anything you want.

I am presently being compelled by future Basilisk to take another slice of cheese. I have no choice but to oblige for fear of my own life :p

ilyagr•13h ago

An intelligence that reasons this way would be, in human terms, batshit insane and completely immoral. So, it seems unlikely that many or maybe any humans would experience it as "otherwise benign" if it had power over their lives.

And if we do get an all-powerful dictator, we will be screwed regardless of whether their governing intelligence is artificial or composed of a group of humans or of one human (with, say, powerful AIs serving them faithfully, or access to some other technology).

api•10h ago

Basilisk / Skynet 2028

I’m not 100% kidding with how human politics is going. Maybe superintelligent AI takeover would be awesome.

(Wasn’t that the back story of the Culture novels?)

JKCalhoun•10h ago

It was more or less the story from the "Colossus" trilogy.

And in the video posted the other day (an older episode of Nova on AI), Arthur C. Clarke says that if we allow A.I. to take over, we deserve it.

api•10h ago

We really need a one bill one topic amendment. We are going to get to where there is one bill a year that nobody reads and everything else by executive order, at which point congress is just for show.

threeseed•9h ago

And this may sound ridiculous/odd but you need to bring back pork-barrelling i.e. earmarks.

If you allow everyone to go back to their district with something it encourages smaller, more frequent bills and better negotiation.

NekkoDroid•9h ago

> Almost certainly the easter egg found in the Trump "Big Beautiful Bill" which prevents states from enacting AI regulations also came from Musk.

My guess is on Peter Thiel

ivape•12h ago

X/Twitter has become extremely prohibitive with just about everything since Elon took over. Their API pricing was antagonistic toward even indie developers. Elon is not a generous guy.

newsbinator•12h ago

> Elon is not a generous guy

Why would he be?

ivape•12h ago

It's kind of a "life arc" that gets fulfilled when you've done it all and have all the money in the world, and reach a certain age. It's a very traditional arc for a humane human being.

thomasanders0n•6h ago

He still has a couple decades to go with his companies I would say.

reaperducer•11h ago

> Elon is not a generous guy

> Why would he be?

Why shouldn't he be?

He has 10x more of everything in the world than he could ever possibly use in his lifetime.

Greed is not a virtue.

MarcelOlsz•10h ago

My uncle has 10x more of everything in the world than he could ever possibly use in his lifetime. A lake house, a main house, a few boats and cars.

Elon is somewhere around 10,000x.

Barracoon•7h ago

The median American net worth is $192,700. Elon’s net worth is $393.4 billion, so if I’m doing math right he’s about 204,000,000x more

MarcelOlsz•1h ago

I think you might be an order of magnitude off.
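For reference, redoing the division with the two figures quoted above (taken as given, not independently verified) gives roughly 2.04 million, so the stated 204,000,000x is about two orders of magnitude too high rather than one. A quick sketch of the arithmetic:

```python
# Figures as quoted in the comment above; not independently verified.
MEDIAN_US_NET_WORTH = 192_700  # USD
MUSK_NET_WORTH = 393.4e9       # USD

# How many median American net worths fit into Musk's net worth.
ratio = MUSK_NET_WORTH / MEDIAN_US_NET_WORTH

# How far the 204,000,000x figure is from the computed ratio.
overstatement = 204_000_000 / ratio

print(f"ratio: {ratio:,.0f}x, 204M figure is {overstatement:.0f}x too high")
```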

threetonesun•10h ago

When Twitter became X, they switched to basically the same limits Instagram has. I don't think this is a particular failing of Elon's, even though he might have many.

Restricting content from AI is the big messy debate we're going to see over and over for the next who knows how many years.

matthewdgreen•9h ago

Twitter's strategy was to keep the platform very open and inviting, in order to make it relevant. This included having a relatively unrestricted API compared to other platforms.

I don't know if this was successful or not. Ultimately they convinced someone to buy the platform for $44bn, so I guess you could say it was. That purchase has locked the platform down more, and the new version certainly feels less culturally central and relevant than it used to.

djaychela•10h ago

> He has 10x more of everything in the world than he could ever possibly use in his lifetime.

Your multiplier is miles off. Not only on basic maths but because he has no idea what to do with all of his wealth other than accrue more and try to prove he's still not the unlikeable teenager he was in SA.

With a rounding error on his wealth he could fix worldwide problems such as clean drinking water for everyone. Instead he follows his self-made "I'm a genius" agenda.

I know there will be no actual day of reckoning for him, but if there were he would have a lot of difficult questions and no decent answers.

ryeats•6h ago

Not to justify anything he does or does not do, but this is clearly not the case, since he had to take out loans against equity in his other companies to buy Twitter.

notsosureja1•10h ago

Because it feels warm and fuzzy to be kind and empathic. Being hateful and greedy and letting avarice rule over your worldview is incredibly sad. But who am I to say.

foobarchu•6h ago

Maybe something to do with having built his fortune off the back of taxpayer subsidies?

seydor•17h ago

VAT for content should be a thing. Ultimately all users should be getting paid

guywithahat•17h ago

So I get to use the platform for free, but I also get paid to post on the platform? I'm not sure that makes sense. Like I hate to take the side of big tech, but they can't literally be paying users to use their platform. Just use something else, there are a million social media sites

seydor•16h ago

Google indexes your website for free, and it will pay you to put ads in it.

That's also what all social media do: they put ads on your thoughts. They don't even need to index your thoughts, because you submit them directly. It has nothing to do with being free; it's about incentives. Users are so foolish, they give everything away for free, unlike webmasters.

Reason077•16h ago

You don’t use the platform for free, unless you’re using an ad blocker. But that’s also, probably, against the TOS?

MonkeyClub•16h ago

> I get to use the platform for free

You actually get to generate content for the platform for free.

Without you (all of the X users), the platform would be devoid of content, just botspeak and corporate promos.

Plus, as the sibling mentioned, they monetize your visit through ads (and data use).

jaoane•15h ago

Most posts are ignored and are an absolute loss to the company. Which is why platforms like Twitter only allow you to make money from posting once you reach a certain threshold.

MonkeyClub•10h ago

They're not an "absolute loss" since they cost bytes to store, and raise engagement and data metrics.

It's just that they don't want to share the fractions of pennies with everyone, so the fractions accumulate for them.

Then they pay a bit to the higher tiers, so they create the illusion that X is a parallel income source, and gives the lower tiers something to aspire to.

Carrot and stick, or rather glass beads and the hope thereof.

threeseed•16h ago

We really need LLMs for music to become more advanced.

Then maybe the recording companies will start defending artist rights.

Because I'm not sure what all the other industry bodies are doing.

mk_stjames•15h ago

I wanted to do some quick math on this idea. Suppose we trained a vanilla transformer model from scratch, as was done for GPT-2/GPT-3: the number of input tokens seen is known perfectly, as are the sources of those training tokens (since then, everyone has either kept quiet about their sources after the Books3 fiasco, or has been fine-tuning on top of previous models, which makes this calculation more difficult).

GPT-3 was trained on approximately 300 billion tokens. A small technical textbook might contain something like 130,000 tokens (1 token ≈ 0.75 words, ~100k words in the book).

Thus, say you wrote a textbook on quantum mechanics that was included in the training corpus. A naive computation of your textbook's share of the total training tokens would be 130K/300B ≈ 0.000000433, or about 0.000043%.

If our hypothetical AI company reported, say, $500M in yearly profit, and all of that were distributed 100% according to this naive token ratio (naive because not every training token contributes equally to the final weights of a model; that is part of the magic), then $500M * 0.000043% = $215.

You could imagine a simpler world where any such profitable company was required by law to redistribute, say, 20% (taking the 'anti-VAT' idea) back to the copyright holders / originators of the training tokens. Our fictitious QM textbook author would then receive a check in the mail for $43 for that year of $500M in profit. Not great, but not zero.

Training corpora have since grown much, much larger, and most people's contributions would be much smaller. Someone who writes witty tweets? Maybe 1/100th the length of the example above, in a model trained on 100x the corpus.

So fractions of a penny for your tweets. Maybe that is fitting after all...
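The naive calculation above can be sketched as a couple of functions. Assumptions, all taken from the comment rather than from any real payout scheme: every training token counts equally, the profit is $500M/year, the redistribution rate is 20%, and a "tweet-sized" contribution is 1/100th of the 130K-token textbook in a corpus 100x the size of GPT-3's. The function names are made up for this illustration. With the share left unrounded, the textbook figures come out to about $217 and $43.

```python
def naive_share(contributed_tokens: float, corpus_tokens: float) -> float:
    """Fraction of the training corpus from one source, weighting every token equally."""
    return contributed_tokens / corpus_tokens


def yearly_payout(contributed_tokens: float, corpus_tokens: float,
                  profit: float, redistribution_rate: float = 1.0) -> float:
    """Naive payout: profit * redistribution rate * token share."""
    return profit * redistribution_rate * naive_share(contributed_tokens, corpus_tokens)


GPT3_TOKENS = 300e9       # ~300B training tokens
TEXTBOOK_TOKENS = 130_000  # ~100k words at ~0.75 words/token
PROFIT = 500e6            # hypothetical yearly profit, USD

textbook_all = yearly_payout(TEXTBOOK_TOKENS, GPT3_TOKENS, PROFIT)        # 100% redistributed
textbook_20 = yearly_payout(TEXTBOOK_TOKENS, GPT3_TOKENS, PROFIT, 0.20)   # 20% redistributed
# Tweet-sized contribution: 1/100th the tokens, in a corpus 100x larger.
tweets_20 = yearly_payout(TEXTBOOK_TOKENS / 100, GPT3_TOKENS * 100, PROFIT, 0.20)

print(f"textbook, 100%: ${textbook_all:.2f}")
print(f"textbook, 20%:  ${textbook_20:.2f}")
print(f"tweets, 20%:    ${tweets_20:.4f}")
```

The tweet case lands at under half a cent per year, which is where the "fractions of a penny" comes from.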

seydor•12h ago

the payment would probably be based on the usage of that source in generating LLM output for the LLM user. This would probably require training a parallel network that connects LLM network nodes to sources. Then the activation of those nodes could be a surrogate for the contribution of the source
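Assuming per-source contribution scores could be obtained somehow (the hard part, via something like the parallel network described above, which is not shown here), the payment step itself is just a proportional split. A minimal sketch; the source names, scores, and one-cent payment are all hypothetical, and `split_by_attribution` is a name invented for this illustration:

```python
from typing import Dict, Mapping


def split_by_attribution(payment: float, scores: Mapping[str, float]) -> Dict[str, float]:
    """Divide a payment among sources in proportion to non-negative attribution scores."""
    total = sum(scores.values())
    if total <= 0:
        # No attributable contribution: nobody gets paid for this output.
        return {source: 0.0 for source in scores}
    return {source: payment * score / total for source, score in scores.items()}


# Hypothetical scores for a single generated answer; not real measurements.
scores = {"qm_textbook": 0.6, "blog_post": 0.3, "tweet": 0.1}
shares = split_by_attribution(0.01, scores)  # one cent of revenue for this answer
print(shares)
```

Normalizing by the total means the scores don't have to sum to 1; only their relative sizes matter.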

visarga•15h ago

Copyright is not going well. The rights of millions of people are trampled by companies, both the content we post on social networks and our private AI chats. Our voice doesn't matter.

Copyright was supposed to protect expression and keep ideas freely circulating. But now it protects abstractions (see the Abstraction-Filtration-Comparison test). It is much more difficult to be sure you are not infringing.

eviks•13h ago

It seems like it was supposed to do the exact opposite per cursory wiki reading:

> The concept of copyright first developed in England. In reaction to the printing of "scandalous books and pamphlets", the English Parliament passed the Licensing of the Press Act 1662,[16] which required all intended publications to be registered with the government-approved Stationers' Company, giving the Stationers the right to regulate what material could be printed.[20]

> The Statute of Anne, enacted in 1710 in England and Scotland, provided the first legislation to protect copyrights (but not authors' rights)

pergadad•12h ago

Copyright has nothing to do with free expression but was intended to protect the interests of publishers. When the printing press arrived basically any popular book or booklet was quickly copied by others. This meant the original publisher (and sometimes the author, but usually they were paid one-off) saw nothing of the profit.

risyachka•15h ago

Good luck with that. Pretty sure at this point no one cares.

Literally every AI model is trained on copyrighted etc data. And without any consequences.

lcnmrn•15h ago

I allow all robots and even provide a sitemap on Subreply, a social network I created.

petesergeant•15h ago

The only story here is that it took 2 months for them to do this after being "bought" by xAI.

thih9•14h ago

I think the rules should be stricter.

I’d prefer an explicit opt in from the content author being required for anyone to perform any model training with any given data.

Alternatively, require all weights, prompts and chat logs to have the same visibility as the original datasets.

None of this is going to happen, and current decisions about uncopyrightable AI[1] are already good; but still, it feels like there is room for abuse.

[1]: https://en.m.wikipedia.org/wiki/Th%C3%A9%C3%A2tre_D%27op%C3%...

eru•12h ago

Well, you explicitly opt-in to Twitter ToS whenever you post anything there.

thih9•10h ago

This is not opt-in how I understand it. When there is no alternative, or the alternative is not using a service, I'd call it a hard requirement instead.

I like how opt-in is handled by GDPR; e.g.: "Consent must be a specific, freely given, plainly worded, and unambiguous affirmation given by the data subject (...) A data controller may not refuse service to users who decline consent to processing that is not strictly necessary in order to use the service.", source: https://en.wikipedia.org/wiki/General_Data_Protection_Regula...

bamboozled•13h ago

This guy is just painful

foldr•12h ago

This could lead to a precipitous increase in the performance of the AI models.

xiaoyu2006•11h ago

As if anyone will follow.

mrweasel•10h ago

Isn't that like half of X's business model, selling data to other companies? Right now no one is as data-hungry as AI companies, so it seems strange to cut them off. I can understand wanting to charge a premium for access if it's for AI, but straight up saying no seems like a strange business move.

SilverBirch•10h ago

How much do you think Musk values X being a viable independent business vs. using it to accelerate xAI? I would expect Musk assigns approximately zero value to the first and puts 100% of the value on the second. So it makes total sense to exploit the fact that X and xAI are the same company.

mrweasel•9h ago

That's a good point. Other than Meta, X (AI) is the only AI company that "generates" its own training data, and we haven't really seen Musk trying to increase X's revenue or trying to run it cheaper.

narrator•10h ago

Elon mentioned that the earlier rate limiting was for preventing training the real-time AI propaganda deathstar, and to avoid X becoming bot hell, which is an ongoing problem. This move is probably for similar reasons.

https://x.com/elonmusk/status/1675187969420828672

like_any_other•8h ago

In contrast, I'm glad ISPs allow "their" content to be used so permissively.

Hizonner•7h ago

By "its content", X of course means your content.

nly•3h ago

Except xAI which will no doubt get permission at some point.
