It was the kick in the pants I needed to cancel my subscription.
I'm looking at
> "When you use the Assistant by Kagi, your data is never used to train AI models (not by us or by the LLM providers), and no account information is shared with the LLM providers. By default, threads are deleted after 24 hours of inactivity. This behavior can be adjusted in the settings."
https://help.kagi.com/kagi/ai/assistant.html#privacy
And trying to reconcile those claims with the instant thread. Anthropic is listed as one of their back-end providers. Is that data retained for five years on Anthropic's end, or 24 hours? Is that data used for training Anthropic models, or has Anthropic agreed in writing not to, for Kagi clients?
As if barely two 9s of uptime wasn't enough.
Grabbing users during startup with the less privacy-focused option preselected isn't being "very transparent".
They could have forced the user to make a choice, or defaulted to not training on their content, but instead they just can't help themselves.
Implicit consent is not transparent and should be illegal in all situations. I can't tell you that unless you opt out, you have agreed to let me rent your apartment.
You can say the analogy isn't straightforwardly comparable, but the overall idea is the same. If we enter a contract for me to fix your broken windows, I cannot extend it with implicit consent to do anything else I see fit in the house.
The fact that there's no law mandating opt-in only for data retention consent (or any anti-consumer "feature") is maddening at times
Opt-out leads to very high adoption and is the immoral choice.
Guess which one companies adopt when not forced through legislation?
Except not:
> The interface design has drawn criticism from privacy advocates, as the large black "Accept" button is prominently displayed while the opt-out toggle appears in smaller text beneath. The toggle defaults to "On," meaning users who quickly click "Accept" without reading the details will automatically consent to data training.
Definitely happened to me, as it was late and I was being lazy.
“If you do not choose to provide your data for model training, you’ll continue with our existing 30-day data retention period.“
From the support page: https://privacy.anthropic.com/en/articles/10023548-how-long-...
“If you choose not to allow us to use your chats and coding sessions to improve Claude, your chats will be retained in our back-end storage systems for up to 30 days.”
I have to admit, I've used it a bit over the last few days and still reactivated my Claude Pro subscription today, so... let's say it's OK for casual stuff? Also useful for casual coding questions. So if you care about it, it's an option.
Export data
Shared chats
Location metadata
Review and update terms and conditions
I'm in the EU, maybe that's helping me?
It's part of the update
No one cares about anything else, but they have lots of superfluous text and they are calling it "help us get better", blah blah; it's "help us earn more money and potentially sell or leak your extremely private info", so they are lying.
Considering cancelling my subscription right this moment.
I hope the EU at least considers banning or extreme-fining companies trying to retroactively use people's extremely private data like this; it's completely over the line.
I'd love to live in a society where laws could effectively regulate these things. I would also like a Pony.
It's only utopian because it's become so incredibly bad.
We shouldn't expect less, and we shouldn't push guilt or responsibility onto the consumer; we should push for more. That is, unless you actively want your neighbour, your mom, and 95% of the population to be in constant trouble with absolutely everything from tech to food safety, chemicals, or healthcare. Most people aren't rich engineers like on this forum, and I don't want to research for 5 hours every time I buy something because some absolute psychopaths have removed all regulation and sensible defaults so someone can party on a yacht.
Nitpicking: “opt in by default” doesn’t exist, it’s either “opt in”, or “opt out”; this is “opt out”. By definition an “opt out” setting is selected by default.
You can say that you want to opt out. What Anthropic will decide to do with your declaration is a different question.
(and as diggan said, the web isn't the only source they use anyway. who knows what they're buying from data brokers.)
I realize there's a whole legal quagmire here involved with intellectual "property" and what counts as "derivative work", but that's a whole separate (and dubiously useful) part of the law.
If you can use all of the content of stack overflow to create a “derivative work” that replaces stack overflow, and causes it to lose tons of revenue, is it really a derivative work?
I’m pretty sure solution sites like Chegg don’t include the actual questions for that reason. The solutions to the questions are derivative, but the questions aren’t.
AI companies will get bailed out like the auto industry was - they won't be hurt at all.
It’s quite clear. It’s easy to opt out. They’re making everyone go through it.
It doesn’t reach your threshold of having everyone sign a contract or something, but then again no other online service makes people sign contracts.
> should be considered a serious criminal offense.
On what grounds? They’re showing people the terms. It’s clear enough. People have to accept the terms. We’ve all been accepting terms for software and signing up for things online for decades.
The learning metric won’t be you; it will be some global shitty metric that will make the service mediocre over time.
Data gathered for training still has to be used in training, i.e. a new model that, presumably, takes months to develop and train.
Not to mention your drop-in-the-bucket contribution will have next to no influence in the next model. It won't catch things specific to YOUR workflow, just common stuff across many users.
That was true when the tech leadership was an open question and it seemed like any one of the big players could make a breakthrough at any moment that would propel them to the top. Nowadays it has petered out and the market is all about sustainable user growth. In that sense Anthropic is pretty overvalued, at least if you think that OpenAI's valuation is legit. And if you think OpenAI is overvalued, then Anthropic would be a no-go zone as an investor.
And the underrated comparison was more about the fact that I couldn't believe Scale AI's questionable acquisition by Facebook. I still remember the conversation my brother and I were having: why doesn't Facebook pay 2x or 3x the price of Anthropic and buy Anthropic itself instead of Scale AI?
Well, I think the answer my brother gave was that Meta could buy it, but Anthropic is just not selling.
As the years go by, I'm finding myself being able to rely on those less and less, because every time I do, I eventually get disappointed by them working against their user base.
I don't think we should be so quick to dismiss the holes LLMs are filling as unnecessary. The only things "necessary" are food, water, and shelter, by some measures.
For me this has been a pretty fundamental shift: before, I either had to figure out another way so I could move on, or had to spend weeks writing one function after learning the needed math; now it can take me 10-30 minutes to nail it perfectly.
Seriously, the idea that we need this is a joke. People need it to pretend they can do their job. The rest of us enjoy having quick help from it. And we have done without it for a very long time already.
The solution is to break up monopolies....
That doesn't matter when their revenue per user is as high as it is.
They're at $5B ARR and rapidly growing.
Once they admitted they are going to have to take money from folks who chop up journalists that made them feel sad, they proved the current per-token LLM business model doesn't work. They haven't pulled the ads lever yet, but the writing is on the wall.
Which means, sadly, only businesses with other revenue streams like M$, the Google, or Amazon can really afford it long term. I was rooting for Anthropic, but it doesn't look good.
Anthropic probably has 80% of AI coding model market share. That's a trillion dollar market.
Merely selling data is extremely low value compared to also having a monopoly on the surface where it's monetized, in a very high-engagement decisioning space.
I feel like you don’t understand the fundamental mechanics of the ad world. Ultimately, the big 4 own such immense decision surface area that it may be a while before any AI model company can create a product that gets there.
https://www.anthropic.com/news/updates-to-our-consumer-terms
Meta downloaded copyrighted content and trained their models on it, OpenAI did the same.
Uber developed Greyball to cheat the officials and break the law.
Tesla deletes accident data and reports to the authorities they don't have it.
So forgive me I have zero trust in whatever these companies say.
None. And even if it's the nicest goody two shoes company in the history of capitalism, the NSA will have your data and then there'll be a breach and then Russian cyber criminals will have it too.
At this point I'm with you on the zero trust: we should be shouting loud and clear to everyone that if you put data into a web browser or app, that data will at some point be sold for profit without any say-so from you.
I don't own a car and only take public transit or bike. I fill my transit card with cash. I buy food in cash from the farmers' morning market. My TV isn't connected to the Internet; it's connected to a Raspberry Pi, which is connected to my home lab running Jellyfin and YouTube-archiving software. I de-Googled and use an old used phone and FOSS apps.
It's all happened so gradually I didn't even realize how far I'd gone!
If you don’t take companies at their word, you need to be consistent about it.
Where did these companies claim they didn’t do this?
Even websites can be covered by copyright. It has always been known that they trained on copyrighted content. The output is considered transformative, and therefore it’s not illegal.
If your threat model is to unconditionally not trust the companies, what they're saying is irrelevant. Which is fair enough, you probably should not be using a service you don't trust at all. But there's not much of a discussion to be had when you can just assert that everything they say is a lie.
> Meta downloaded copyrighted content and trained their models on it, OpenAI did the same.
> Uber developed Greyball to cheat the officials and break the law.
These seem like randomly chosen generic grievances, not examples of companies making promises in their privacy policy (or similar) and breaking them. Am I missing some connection?
1. Anthropic reverses privacy stance, will train on Claude chats
3. Gun Maker Sig Sauer Citing National Security to Keep Documents from Public
4. Tesla said it didn't have key data in a fatal crash. Then a hacker found it
6. Meta might be secretly scanning your phone's camera roll
7. If you have a Claude account, they're going to train on your data moving forward
8. Ask HN: The government of my country blocked VPN access. What should I use?
It has always been like this. Sites like Reddit, HN, and Digg and Boing Boing (when they were more popular) have always had a lot of stories under the category of online rights, privacy, and anger at big companies.
> If you’re an existing user, you have until September 28, 2025 to accept the updated Consumer Terms and make your decision. If you choose to accept the new policies now, they will go into effect immediately. These updates will apply only to new or resumed chats and coding sessions. After September 28, you’ll need to make your selection on the model training setting in order to continue using Claude. You can change your choice in your Privacy Settings at any time.
It doesn’t clearly say whether it applies to all the prompts from the past.
https://www.anthropic.com/news/updates-to-our-consumer-terms
> Previous chats with no additional activity will not be used for model training.
It's an AI company, why wouldn't they use the most precious data they have?
After providing consent, the setting will be turned on by default. [1]
[1]: https://support.docusign.com/s/document-item?language=en_US&...
To make it respect user privacy, I would use this data to train preference models, and those preference models would be used to finetune the base model. So the base model never sees particular user data; instead, it learns to spot good and bad approaches from feedback experience. It might also be an answer to "who would write new things online if AI can just replicate it?": the experience of human-AI work can be recycled directly through the AI model. Maybe it will speed up progress, amplifying both exploration of problems and exploitation of good ideas.
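For what it's worth, a minimal sketch of that idea, assuming PyTorch and a Bradley-Terry-style pairwise loss. Everything here (names included) is a hypothetical illustration, not any lab's actual pipeline:

```python
import torch
import torch.nn.functional as F

# Hypothetical preference model: scores a response embedding.
# In practice it would share a backbone with the base LLM.
class PreferenceModel(torch.nn.Module):
    def __init__(self, dim: int = 768):
        super().__init__()
        self.score = torch.nn.Linear(dim, 1)

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        return self.score(emb).squeeze(-1)

def preference_loss(model, emb_preferred, emb_rejected):
    # Bradley-Terry pairwise loss: push the preferred response's score
    # above the rejected one. Only this derived feedback signal is kept;
    # raw user text never has to enter the base model's training set.
    return -F.logsigmoid(model(emb_preferred) - model(emb_rejected)).mean()

# Toy usage with random embeddings standing in for real feedback pairs.
model = PreferenceModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
emb_preferred, emb_rejected = torch.randn(32, 768), torch.randn(32, 768)

opt.zero_grad()
loss = preference_loss(model, emb_preferred, emb_rejected)
loss.backward()
opt.step()
```

The trained preference model would then score candidate generations during finetuning, so the base model only ever sees its own outputs plus a scalar signal, never the user's actual conversations.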
Considering OpenAI has 700M users, and worldwide there are probably over 1B users, they generate probably over 1 trillion tokens per day (on the order of a thousand tokens per user per day). Those tokens collect in two places: in chat logs, for new models, and in human brains. We ingest a trillion AI tokens a day, changing how we think and work.
Looks like there is an opt out option. Curious about the EU users - would that be off by default (so: opt in)?
But unlike the hundreds of data brokers that also want your data, they have an existing operational funnel of your data already, which you voluntarily give them every day. All they need is a dark-pattern ToS change and to manage the minor PR issue. People will forget about this in a week.
If they had done this in a more measured way, they might have been able to separate human from AI content, such as by doing legal deals with publishers.
However they couldn't wait to just take it all to be first and now the well is poisoned for everyone.
I've seen zero evidence anything of the sort is occurring, or that, if it is, it's due to what you claim. I'd be highly interested in research suggesting both or either is occurring, however.
If I'm not paying for something, I presume this is the kind of thing that's happening, so this isn't newsworthy to me. Is it also applicable to paid personal and paid corporate accounts?
For example Anthropic have an Anthropic Console that they appear to consider quite distinct from Claude.ai. Do these share a privacy policy and related settings? How do either of these fit in with the named plans like Pro and Max? What are you actually paying for when you give them money for the various different things they charge for? Is all API use under their Commercial Terms even if it's a personal account that is otherwise under the Consumer Terms? Why isn't all of this obvious and transparent to users?
OpenAI don't seem to be any better. I only just learned from this HN discussion that they train on personal account conversations. As someone privacy-conscious who has used ChatGPT - even if only a few times for experiments - I find the fact that this wasn't very clearly stated up front to be extremely disturbing. If I'd known about it I would certainly have switched off the relevant setting immediately.
I get that these organisations have form for training on whatever they can get their hands on whether dubiously legal or not. But training on users' personal conversations or code feels like something that should require a very clear and explicit opt-in. In fact I don't see how they can legally not have that first in places like the EU and UK that have significant data protection legislation.
I am wondering: how would you use a chat transcript for training? Unless it is massive, possibly private codebases that are constantly getting piped into Claude Code right now. In that case, that would make sense.
But more important (to me) is storing 5 years' worth of other companies' IP. That just seems wildly risky for all parties, unless I really don't understand how Claude Code works.
And this should only be for free users; paid users should never have to think about this.
The cognitive load to remember to opt out every new chat should not rest on the user.
"Anthropic also reported discovering North Korean operatives using Claude to fraudulently obtain remote employment positions at Fortune 500 technology companies, leveraging the AI to pass technical interviews and maintain positions despite lacking basic coding skills."
Note that in this version the North Koreans lack basic coding skills, which took me by surprise. Generally they are assumed to be highly competent.
The original (https://www.anthropic.com/news/detecting-countering-misuse-a...) is completely different:
"Our Threat Intelligence report discusses several recent examples of Claude being misused, including a large-scale extortion operation using Claude Code, a fraudulent employment scheme from North Korea, and the sale of AI-generated ransomware by a cybercriminal with only basic coding skills. We also cover the steps we’ve taken to detect and counter these abuses."
This is what people are using for web search. I'm not targeting Perplexity specifically, Google "AI" summaries are just as bad.
UPDATE: The original pdf says something different again (https://www-cdn.anthropic.com/b2a76c6f6992465c09a6f2fce282f6...):
"The most striking finding is the actors’ complete dependency on AI to function in technical roles. These operators do not appear to be able to write code, debug problems, or even communicate professionally without Claude’s assistance. Yet they’re successfully maintaining employment at Fortune 500 companies (according to public reporting) passing technical interviews, and delivering work that satisfies their employers. This represents a new paradigm where technical competence is simulated rather than possessed."
This should be distributed among managers so that they finally get the truth about "AI".
To put it in perspective: Google won't even give you an option to opt out.
If you pay for Gemini as a private user and not as a corporation, you are fair game for Google.
Now, neither option is good. But one is still much worse.
There's no such thing as a free lunch, but even when I am a paying customer my data is taken as gratuity and used (+ spread around!) in extremely opaque ways. I am tired of it. Honestly, I'm just getting tired of the internet.
I think it is only a matter of time before they start reselling this data as exfiltrated IP to whoever is interested.
I'm not arguing on the facts of the modal design, I don't remember either way, I just don't remember it being confusing.
Unless I was in some A/B test?
I have a really hard time thinking that Google, Microsoft, Meta, etc. would _not_ train on whatever people enter (willingly or not) into the system.
The silver lining is that what most people enter in a chat box is _utter crap_.
So, training on that would make the "Artificial Intelligence" system less and less intelligent, unless the devs find a way to automagically sort clever things from stupid things, in which case I want to buy _that_ product (a toy sketch of such a filter follows below).
In the long run, LLM devs are going to have to either:
* refrain from getting high on their own supply, and find a way to tag AI-generated content
* or sort the bs from the truth, probably reinventing "trust in gatekeepers and favoring sources of truth with a track record" and copying social pressure, etc., until we have a "Pulitzer Prize" and "Academy Awards" for the most relevant AI sources with a higher sticker price, to separate them from cheap slop.
That, or "2+2=7 because DeepChatGrokmini said so, and if you don't agree you're a terrorist, and if our AI math breaks your rocket it's your fault."
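To make "automagically sort clever things from stupid things" concrete: the usual approach is a quality classifier run over candidate training data. A toy sketch, assuming scikit-learn; the examples and labels are made up, and a real filter would be far stronger:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus: 1 = keep for training, 0 = discard as slop.
texts = [
    "Proof sketch: induct on n, using the triangle inequality.",
    "asdf tell me a joke lol",
    "Here is a minimal reproduction of the race condition.",
    "ur wrong 2+2=7 the chatbot said so",
]
labels = [1, 0, 1, 0]

# TF-IDF features plus logistic regression as the quality filter.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

candidate = "Attached is a failing test case and a proposed fix."
keep_probability = clf.predict_proba([candidate])[0][1]
print(f"keep probability: {keep_probability:.2f}")  # discard below some threshold
```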
TBH, I'd love to have a model which was specifically trained on conversations I had with an earlier iteration. That would make it adapt to me and be less frustrating. Right now I'm relying only on instruction files to somewhat tune a model to my needs.
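That wish is roughly supervised fine-tuning on your own transcripts. A minimal, hypothetical sketch of preparing such a dataset; the export format and file name are assumptions, but chat-format JSONL is the shape most fine-tuning APIs accept:

```python
import json

# Hypothetical export: a list of {"role": ..., "content": ...} turns
# per conversation, as most chat apps' data exports provide.
conversations = [
    [
        {"role": "user", "content": "Refactor this loop into a comprehension."},
        {"role": "assistant", "content": "items = [f(x) for x in xs if x]"},
    ],
]

# Write chat-format JSONL, one conversation per line, ready to hand
# to a supervised fine-tuning job.
with open("my_chats.jsonl", "w") as f:
    for turns in conversations:
        f.write(json.dumps({"messages": turns}) + "\n")
```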
I hear the sound of a million lawsuits in Europe concerning GDPR violations.
An Effective Altruism ethos provides moral/ethical cover for trampling individual privacy and property rights. Consider their recent decision to provide services for military projects.
As others have pointed out, Claude was trained using data expressly forbidden for commercial reuse.
The only feedback Anthropic will heed is financial, and the impact must be large enough to destroy their investors' willingness to cover the losses. This type of financial feedback can come from three places: termination of a large fraction of their B2B contracts; software devs organizing a persistent mass migration to an open-source model for software development; or a mass filing of data deletion requests from California and EU residents and corporations that repeats every week. The first two are unlikely to happen in the next 3 months.
Actually, up until a few months ago I swore I just couldn't use these hosted models (I regularly use local inference, but like most people's, my local hardware yields only so much quality). Tech companies, nay, many companies, will lie and cheat to squeeze out whatever they can. That includes reneging on promises.
With data privacy specifically I always take the default stance that they are collecting from me. In order for me to use their product it has to be /exceedingly/ good to be worth the trade off.
Turns out that Claude Code is just that damn good. I started using it for my own personal project. But the impetus was the culmination of months questioning what kind of data I'd be okay with giving up to a hosted model.
What I'm trying to say is that this announcement doesn't bother me that much because I already went on my own philosophical odyssey to prepare for this breach of trust to occur.
In this aspect, it would've been great to give us an incentive – a discount, a donation on our behalf, plant a percent of a tree or just beg / ask nicely, explain what's in it for us.
Regarding privacy, our conversations are saved anyway, so if there were a breach this wouldn't make much of a difference, would it?
My reasoning: I use AI for development work (Claude Code), and better models = fewer wasted tokens = less compute = less environmental impact. This isn't a privacy issue for work context.
I regularly run concurrent AI tasks for planning, coding, testing - easily hundreds of requests per session. If training on that interaction data helps future models be more efficient and accurate, everyone wins.
The real problem isn't privacy invasion - it's AI velocity dumping cognitive tax on human reviewers. I'd rather have models that learned from real usage patterns and got better at being precise on the first try, instead of confidently verbose slop that wastes reviewer time.
The bubble is deflating/popping. The MIT study has really dampened the excitement around AI.
The scenario that concerns me is that Claude learns unpublished research ideas from me as we chat and code. Claude then suggests these same ideas to someone else, who legitimately believes this is now their work.
Clearly commercial accounts use AI to assist in developing intellectual product, and privacy is mandatory. The same can apply to individuals.
> Claude then suggests these same ideas to someone else, who legitimately believes this is now their work.
Won't this mean that Claude assisted you with someone else's work? Sure, it's not from a "chat", but Claude doesn't really know anything other than its training data.
You can no more own knowledge or information than you can own the number 2.
Pulling up the ladder behind you :-)
I genuinely want to know and would like to have a productive conversation. I would like to identify what made people trust them and not realise they're the same as every other company.
An in-app notification pop-up will alert you to the change. You can opt out in the pop-up.
I was able to opt out right now by going to the Privacy section of Settings.
It doesn’t take effect until September 28th. The app will apparently prompt people to review the new terms and make a decision before then.
Only applies to new or resumed sessions if you do review the new terms and don’t turn it off. The angry comments about collecting data from customers and then later using it without permission are not correct. You would have to accept the new terms and resume an old session for it to be used.
Does not apply to API use, 3rd party services, or products like Claude Gov or Claude for Education.
Changing the link to the actual source instead of this perplexity.ai link would be far more helpful.
I would strongly argue that API clients should NEVER be opted in for these sorts of things, and it should be like this industry wide.
> They do not apply to services under our Commercial Terms, including Claude for Work, Claude Gov, Claude for Education, or API use, including via third parties such as Amazon Bedrock and Google Cloud’s Vertex AI.
I’ll edit my comment above to include this too
The in-app notification that I got was a pop up which contained some buttons and some images. There was no text. Just in case it was some dark mode issue I checked the DOM and I couldn't find any text there either. I just clicked outside the modal and it went away. I assumed it was some announcement about some new feature and ignored it.
I did end up seeing the news yesterday on Reddit (I'm having issues getting the research tool to actually be used; tried to see if there were some recent changes), but it's unlikely that I was the only one who experienced the modal issue, and if those people didn't follow the tech news they could easily miss the change.
Also it seems that this data retention/training does not apply to the API.
I think both Anthropic and OpenAI do not train on enterprise data, so an enterprise account maybe.
https://help.mistral.ai/en/articles/347617-do-you-use-my-use...
I use both Mac and Windows (work / leisure), and on both boxes a weird dialog appeared with no text at all.
I can confirm the dark-pattern switch (as in dark gray / light gray status).
In fact, I haven’t agreed to it yet, and was able to close the popup and continue using Claude. They also made it extremely clear how to opt out, providing the switch right in the popup and reminding me it’s also in settings.
When I eventually do have to agree to the ToS changes, I’ll probably just stay opted out.
I'll use Claude with my employer's Copilot account, but I wasn't putting anything personal there anyway.
Time to learn how to do local models...
Collecting user data should be a liability, not an enormously profitable endeavor.
As some other user put it: "big corp changes policy and breaks promises, how shocking"
This type of behavior has a penalty, and that penalty is trust. You lost it.
I think Claude saw that OpenAI was reaping too much benefit from this so they decided to do it too.
That's why the usual ethos in places like HN, of treating any doubt about government actions as lowbrow paranoid conspiracy-theory stuff, is so exasperating for those of us who came from either the former Soviet bloc or third-world nations.
Well, probably easier than you think. Given that it looks like Palantir is able to control the software and hardware of the newfangled detention centers with impunity, how difficult do you think it is for them to disappear someone without any accountability?
It is precisely the blurring of the line between gov and private companies that aid in subverting the rule of law in many instances.
[0] https://thegrayzone.com/2025/06/18/palantir-execs-appointed-...
But the question was "why trust a company and not the government?"
So even now it's still "could maybe do harm" versus "already controls an army of masked men who are undeniably active in doing harm."
The post you were replying to simply said the behavior of this administration made them care more about this issue, not that they trusted companies more than the government. That statement is not even implied in any way in the comment you responded to.
The fact is, whereas in the past it would be expected that the government could regulate the brutal and illegal overreaches of private companies, giving military rank to private-company execs makes that even less likely. The original comment is alluding to a simpler point: a government that gives blank checks to private companies in military and security matters is much worse than one that doesn't.
I'll still take an increasingly stacked US federal court that still has to pay lip service to the constitution over private arbitration hired by the company accountable only to their whims.
What you mentioned has been repeatedly ruled unconstitutional, but the administration is ignoring the courts.
There's tradeoffs. The government, at least, has to abide by the constitution. Companies don't have to abide by jack shit.
That means infinite censorship, searches and seizures, discrimination, you name it.
We have SOME protections. Very few, but they're there. But if Uber was charging black people 50 cents more on average because their pricing model has some biases baked in, would anyone do anything?
Corporate surveillance is government surveillance. Always has been.
Apple/FBI story in question: https://apnews.com/general-news-c8469b05ac1b4092b7690d36f340...
On the other hand, what Apple did is a tangible thing with a tangible result.
This gives them better optics for now, but there is no law that says they can't change.
Their business model is being an "accessible luxury brand with the privacy guarantee of Switzerland as the laws allow". So, as another argument, they have to do this.
It seems strange to not be able to grasp the difference in kind here.
What happens when the same company locks away all your book drafts because an algorithm deemed that you're plotting something against someone?
Both are real events, BTW.
The government forces me to do business with them; if I don't pay them tens (and others hundreds) of thousands of dollars every year they will send people with guns to imprison me and eventually other people with guns to seize my property.
Me willingly giving Google some data and them capriciously deciding to not always give it back doesn't seem anything like the same to me. (It doesn't mean I like what Google's doing, but they have nowhere near the power of the group that legally owns and uses tanks.)
A company "applied what the law said" and refused to admit that they made a mistake and overreached, which is behavior generally attributed to governments.
So, I think you missed the effects of this little binary flag on their life.
[0]: https://www.theguardian.com/technology/2022/aug/22/google-cs...
What?! Google locked them out of Google. I'm sure they can still get search, email, and cloud services from many other providers.
The government can lock you away in a way that is far more impactful and much closer to "life stopped; locked out of everything" than "you can't have the data you gave us back".
Why do you think the military and police outsource fucking everything to the private sector? Because there are no rules there.
Wanna make the brown people killer 5000 drone? Sure, go ahead. Wanna make a facial crime recognition system that treats all black faces as essentially the same? Sure, go ahead. Wanna run mass censorship and propaganda campaigns? Sure, go ahead.
The private sector does not abide by the constitution.
Look, stamping out a protest and rolling tanks is hard. It's gonna get on the news, it's gonna be challenged in court, the constitution exists; it's just a whole thing.
Just ask Meta to do it. Probably more effective anyway.
The part that irks me is that this includes people who are literally paying for the service.
https://www.anthropic.com/news/updates-to-our-consumer-terms
These bastard companies pirated the world's data, then they train on our personal data. But they have the gall to say we can't save their model's inputs and outputs and distill their models.
We need a Galoob vs. Nintendo [1], Sony vs. Universal [2], or whatever that TiVo case was (I don't think it was TiVo vs. EchoStar). A case that establishes anyone can scrape and distill models.
[1] https://en.wikipedia.org/wiki/Lewis_Galoob_Toys,_Inc._v._Nin....
[2] https://en.wikipedia.org/wiki/Sony_Corp._of_America_v._Unive....
While those developers are not well paid (usually around 30-40 USD/hour, no benefits), you need a lot of them, so it is a big temptation to also create as many synthetic data sets as you can from your more capable competitor.
Given that AI companies have this jihad-like zeal to achieve their goals no matter what (like, fuck copyright, fuck the environment, etc., etc.), it would be naive to believe they don't at least try to do it.
And even if they don't do it directly, their outsourced developers will do it indirectly by using AI to help with their tasks.
$40/hour for a full-time job would put you just over the median household income for the US.
I suspect this provides quite a good living for their family and the devs doing the work feel like they’re well-paid.