My most generous guess here is that the NYT is accusing OpenAI of deleting infringing user chats themselves, because the implication that someone would delete their history due to fear of copyright infringement is completely stupid.
But OpenAI are desperately trying to spin it so that the logs should not be allowed into evidence.
As a citizen of an EU country, I do not view trampling on my rights, directly violating my country's laws and reneging on a published privacy policy (all of which OpenAI is being forced into in this case by keeping the data) to be "par for the course".
If OpenAI have already been violating your rights, by putting private information into the logs, then your beef is with them, not the courts for preserving data.
> Accordingly, OpenAI is NOW DIRECTED to preserve and segregate all output log data that would otherwise be deleted on a going forward basis until further order of the Court (in essence, the output log data that OpenAI has been destroying), whether such data might be deleted at a user’s request or because of “numerous privacy laws and regulations” that might require OpenAI to do so.
https://storage.courtlistener.com/recap/gov.uscourts.nysd.64...
This is important. OpenAI was already deleting the data (as per their policies); now they can't.
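To make concrete what "now they can't" looks like in engineering terms, here is a minimal sketch of a retention job under a litigation hold. Everything in it (the names, the 30-day window, the structure) is hypothetical and is not based on anything OpenAI has described; it just illustrates how a preservation order turns deletions into copies to a segregated store.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# All names and values here are hypothetical illustrations,
# not a description of OpenAI's actual systems.
RETENTION = timedelta(days=30)

@dataclass
class LogRecord:
    user_id: str
    created_at: datetime
    deletion_requested: bool  # user asked for this chat/log to be deleted
    body: str

def run_retention_pass(records, litigation_hold, segregated_store):
    """Return the records that stay in the live store.

    Without a hold: expired records and user-requested deletions are dropped.
    With a hold: those same records are moved to a segregated store instead
    of being destroyed ("preserve and segregate", in the order's words).
    """
    now = datetime.now(timezone.utc)
    kept = []
    for rec in records:
        expired = (now - rec.created_at) > RETENTION
        if rec.deletion_requested or expired:
            if litigation_hold:
                segregated_store.append(rec)  # preserved, not deleted
            # otherwise the record is simply dropped here
        else:
            kept.append(rec)
    return kept
```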
To add insult to injury, the paragraph you quoted also gives a fat middle finger to GDPR.
a) GDPR has an exemption for court orders, which this is. (All court orders are a legitimate reason.)
b) GDPR says you can't log private data without a lawful basis, which is what they were doing. (You can only process it on grounds like legitimate interest, contractual obligation, legal compliance, or consent.)
Again - this is logs, this is not your average user data. This is not their database. This is not the historical chats of their users. This is just... The... Access... Logs.
> Processing shall be lawful only if and to the extent that at least one of the following applies:
> ...
> processing is necessary for the performance of a task carried out in the public interest or in the exercise of official authority vested in the controller;
It's not going to stop the rise of LLMs. But one should expect it to cause a lot of very strange news in the next couple of years (lawful leaks [i.e. "discovery"], unlawful leaks, unintended leaks, etc.).
The justice system (pretty much anywhere) is amenable to being incentivized. It looks like the NYT has found the right judge (take that how you will).
Ordinary people expect stuff that they don't actively share with others to stay private. Rightly so! It's the ad industry that got it wrong, not the People.
Having worked at one of those companies (and having quit that job, disillusioned by a lot of things), I can say there is still so much mainstream misinformation about this. Yes, data is often used for tracking and training, but in aggregate form. Sensitive data is anonymized/de-identified. The leading research on these techniques is also coming out of these companies, btw.
There are layers and layers of policy and permission safeguards before you're allowed to access user data directly as an engineer. And if/when someone tries to exploit the legitimate pathways to touch user data (say customer support), they get promptly fired.
But it's much easier to believe that FAANG is some great monolithic evil, out to surveil you personally for some vague benefit that never gets specified. All the legitimate, concrete monetary benefits (e.g. ad targeting and training ML models) can be had just as well with aggregate data, but privacy FUD doesn't want to hear that.
Meanwhile, stupid legislation and the ability of courts and law enforcement to subpoena any data they want, whenever they want, keeps data on their servers longer than they'd like. Yet people prefer to blame the "Evil Tech Cartel" instead of the multiple branches of their government that want to read their texts and GPS logs.
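For what it's worth, the "aggregate data" claim above can be made concrete with a toy sketch. This is purely illustrative (the event shape and the threshold of 5 are invented), but it shows the basic idea: roll individual events up into group counts and suppress any group below a minimum size, so downstream ad-targeting or training jobs never see per-user rows.

```python
from collections import Counter

# Toy events: (user_id, interest_category). Purely invented data.
events = [
    ("u1", "travel"), ("u2", "travel"), ("u3", "travel"),
    ("u4", "travel"), ("u5", "travel"), ("u6", "finance"),
]

MIN_GROUP_SIZE = 5  # arbitrary suppression threshold for illustration

def aggregate(events, k=MIN_GROUP_SIZE):
    """Count events per category and drop any group smaller than k.

    Downstream consumers see only these counts, never the individual
    (user, category) rows.
    """
    counts = Counter(category for _, category in events)
    return {cat: n for cat, n in counts.items() if n >= k}

print(aggregate(events))  # {'travel': 5} -- the lone 'finance' user is suppressed
```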
There aren't that many possibilities for how geolocation data vendors get access to high-precision location data on millions of people. A publicly traded company that generates revenue from targeted ads can never be fully trusted to behave. A social network that optimizes for time spent looking at ads will never really care about its users' well-being. Algorithmic feeds are responsible for a widening social divide and loneliness. Highly detailed behavioral analysis can hurt people even when aggregated, for example when they get less favorable insurance terms based on their spending habits. Data that can be used to increase revenue will not be left untouched just to keep the moral high ground. Sensitive information shared with an LLM that ends up in training data today might have dangerous consequences tomorrow; there is no way to know yet.
This isn't even about the proper handling of individual pieces of data, but about the higher-order effects of handing control over both the world's information and the attention of its inhabitants to shareholder-backed mega-corporations. There are perverse incentives at play here, and anyone engaging in this game carries responsibility for the outcome.
In a world where cellphones have all sorts of radio antennas on at all times, there are more ways than you'd think.
> A publicly traded company that generates revenue from targeted ads can never be fully trusted to behave. A social network that optimizes for time spent looking at ads will never really care about its users' well-being. Algorithmic feeds are responsible for a widening social divide and loneliness.
I'm really not interested in debating dogmatic philosophy about how cynical one should be in the world. The entire point of my comment was that cynicism induces FUD that's not necessarily backed by direct evidence. One can come up with all sorts of different theories to explain what's happening in the world. Just because they sound somewhat consistent on the surface doesn't mean they're true. That's just inverted inference.
I do agree with you that there are bad incentives in play here, but if we don't want them to be exploited, and actually care about privacy, we should convince our effing legislators to plug the loopholes and enshrine online privacy in actual law, instead of companies being able to write whatever they want in their Terms of Service. And then create mechanisms to enforce said legislation, instead of moralizing the actions of a company as some sort of monolithic (un)ethical entity.
I think humanizing and moralizing the actions of large companies is a gigantic waste of time. Not only does it accomplish nothing, it gives us (the affected party) a distraction from focusing our efforts on the representatives we elected who aren't doing their job. Maybe it's representative of where we feel we can make change.
And whilst you say there's so much protection... We have countless examples of where it's been done. [1]
The only real way to be safe with data is... To not have it in the first place. (Which, bonus, often means governments can't compel you to keep it.)
[0] https://digitalcommons.law.uw.edu/wjlta/vol5/iss1/3/
[1] https://www.forbes.com/sites/simonchandler/2019/09/04/resear...
The point I was making wasn't that De-id is a solved problem, or that your data is "safe" with FAANG companies. The point was more about the malice that's attributed to them as a blanket measure, in comments such as these:
> (especially all the FAANG people that curiously always stay silent in threads like this one), has worked very hard to make everyone believe that online privacy is a thing, while working even harder to undermine that at every possible step.
There are many people and execs at these companies who are unscrupulous. But there are also many parts of them that are trying to work on doing things the "somewhat right" way when handling user data.
De-id and anonymization are hard problems. But there's a lot of concrete evidence, to me, that many people in the FAANG world are at least trying to make progress on them (sinking billions of dollars of engineering and research resources into it), instead of just blatantly chasing the bag, which they totally could.
I absolutely believe that there are people at those companies trying to rein in the corporate behemoth so it doesn't squash its own legs. However, it looks like they're... Losing that particular battle.
The corporations still haven't learnt to respect individuals - they're just resources. [4]
Until a corporation acknowledges that safety comes with... Simply not spying on everyone... The risk in trusting them isn't going to be one that people want to take. Yes. These are hard problems. So don't make them a problem you have to face.
[0] https://www.cnbc.com/2018/04/05/facebook-building-8-explored...
[1] https://www.cnbc.com/2018/03/21/facebook-cambridge-analytica...
[2] https://firewalltimes.com/tiktok-data-breach-timeline/
[3] https://www.drive.com.au/news/tesla-shared-private-camera-re...
[4] https://www.theverge.com/meta/694685/meta-ai-camera-roll
>It's not going to stop the rise of LLMs.
Disney might, though.
I think few actually want to "stop the rise of LLMs", though. I personally just want the 3 C's to be followed: credit, consent, compensation. If it costs a billion dollars to compensate all willing parties to train on their data: good. If someone doesn't want their data trained on, no matter how big the paycheck: also good.
I don't know why that's such a hot take (well, that's rhetorical; I've had many a comment here unironically wanting to end copyright as a concept). That's how every other media company has had to do things.
It's not the public reading the information I'm concerned about, it's every data-hungry corporation that manages to file a lawsuit.
The courts put a lot of trust in lawyers: they'll redact the sensitive information from me and you, but take the view that lawyers are "officers of the court" and get to make copies of anything they convince the court to drag into evidence. But those officers of the court actually work for the same data-harvesting companies and have minimal oversight regarding what they share with them.
I mean, the site's name is Hacker News after all, even though so many of the "hackers" here are confessing their love for Intellectual Property and Copyright law, with everybody chanting the well-known slogan "Information wants to be proprietary!".
Allowing LLMs the freedom to snap up everything overwhelmingly hurts the smaller players who would have created that competition. It further entrenches all the biggest players.
Given how blatantly "copyright" has been (and still is) abused by multibillion dollar corporations (with Disney being the most notorious) it's no surprise that there will be a counter-movement forming.
Complete abolition is of course a pretty radical proposal, but I think pretty much everyone here agrees that both the patent and copyright situations warrant a complete overhaul.
Until then all your "nuclear bomb sized" chats are effectively the same as the dinner bill for Sam courting Huang to get more of those GPUs.
I fully expect these chats from discovery to be leaked, stolen, or show up in the form of analytics in an NYT exposé.
* not sure what the right word is
>I hope for everyone’s sake that OpenAI’s lawyers
>I hope for everyone’s sake that OpenAI
https://news.ycombinator.com/item?id=43628278
How much conversation data have you shared with OpenAI to date?
The current NYT is about as far away from that past as you can get. These days they would be writing column after column insisting the whistleblower should be locked up for life as a domestic terrorist.
Given the choice between not logging chats and violating either EU or US law, it seems pretty clear what the vibe is at OpenAI and in the valley these days. (I'm no expert on GDPR as it applies to this order either, though.)
Interesting. Does this imply that OpenAI needs to distinguish between users in the EU, who absolutely have a right to have personal information deleted (like, really, actually deleted), and users in the US?
I don't think people trust OpenAI nor NYT. But if you did trust OpenAI with your sensitive data, NYT isn't going to be more nefarious with it than OpenAI already is.
I just cancelled my NYT subscription because of their actions, detailing the reason for doing so. It’s a very small action, but the best I can do right now.
aucisson_masque•4h ago
I guess people could switch to one of the many ChatGPT competitors that isn't being forced to give away your personal chats.
Don't even know what the Times is trying to achieve now; the cat is out of the bag with LLMs. Even if a judge ruled that ChatGPT and others must pay royalties to the Times for each request, what about the ones running locally on Joe's computer, or in countries that don't care about American justice at all?
bilekas•4h ago
In fact, I see it being hard for OpenAI to defend basically saying "Well yes, it's standard practice to hand over any potential evidence, but no, we're not doing that".
As for the deleted data, I wonder if legally there are obligations NOT to delete data?
Tadpole9181•36m ago
It is an absurd breach of user privacy, at the scale of tens of millions of Americans, that goes well beyond what is reasonable for a civil copyright lawsuit.
This sets the precedent that if Gmail gets sued for privacy, now all our emails are leaked. Telecom companies? All of our text messages. Cloud image storage? Whoops, gotta hand us every photo any American has taken with a Samsung phone! After all, it might have content that infringes on my copyright!
hotep99•4h ago
louthy•3h ago
Are they? Are you speculating or do you know something we don’t?
It seems that if the NYT want to know whether ChatGPT has been producing, verbatim, copyrighted material that they own, and to understand the scale of the abuse, then they would need to see the logs.
People shouldn't be surprised that a company that flouts the law (regardless of what they think of those laws) for its own financial gain might end up with its collected data in a legal discovery. It's evidence.
johnnyanmac•4h ago
You don't think that's a victory in and of itself for a business?
Also, you don't need to worry about drug users if you take out the dealer. The users will eventually dry up.
bux93•3h ago
Well, maybe it's business as usual now. A lot of things that were previously considered obvious overreach by corporations and governments are now depicted as "business as usual" in the US.