frontpage.

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

https://openciv3.org/
529•klaussilveira•9h ago•146 comments

The Waymo World Model

https://waymo.com/blog/2026/02/the-waymo-world-model-a-new-frontier-for-autonomous-driving-simula...
860•xnx•15h ago•519 comments

How we made geo joins 400× faster with H3 indexes

https://floedb.ai/blog/how-we-made-geo-joins-400-faster-with-h3-indexes
72•matheusalmeida•1d ago•13 comments

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
180•isitcontent•9h ago•21 comments

Monty: A minimal, secure Python interpreter written in Rust for use by AI

https://github.com/pydantic/monty
182•dmpetrov•10h ago•79 comments

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
294•vecti•11h ago•130 comments

Dark Alley Mathematics

https://blog.szczepan.org/blog/three-points/
69•quibono•4d ago•13 comments

Microsoft open-sources LiteBox, a security-focused library OS

https://github.com/microsoft/litebox
343•aktau•16h ago•168 comments

Sheldon Brown's Bicycle Technical Info

https://www.sheldonbrown.com/
338•ostacke•15h ago•90 comments

Hackers (1995) Animated Experience

https://hackers-1995.vercel.app/
434•todsacerdoti•17h ago•226 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
237•eljojo•12h ago•147 comments

Delimited Continuations vs. Lwt for Threads

https://mirageos.org/blog/delimcc-vs-lwt
13•romes•4d ago•2 comments

An Update on Heroku

https://www.heroku.com/blog/an-update-on-heroku/
373•lstoll•16h ago•252 comments

Unseen Footage of Atari Battlezone Arcade Cabinet Production

https://arcadeblogger.com/2026/02/02/unseen-footage-of-atari-battlezone-cabinet-production/
6•videotopia•3d ago•0 comments

PC Floppy Copy Protection: Vault Prolok

https://martypc.blogspot.com/2024/09/pc-floppy-copy-protection-vault-prolok.html
41•kmm•4d ago•3 comments

Show HN: ARM64 Android Dev Kit

https://github.com/denuoweb/ARM64-ADK
14•denuoweb•1d ago•2 comments

How to effectively write quality code with AI

https://heidenstedt.org/posts/2026/how-to-effectively-write-quality-code-with-ai/
220•i5heu•12h ago•162 comments

Why I Joined OpenAI

https://www.brendangregg.com/blog/2026-02-07/why-i-joined-openai.html
91•SerCe•5h ago•75 comments

Show HN: R3forth, a ColorForth-inspired language with a tiny VM

https://github.com/phreda4/r3
62•phreda4•9h ago•11 comments

Learning from context is harder than we thought

https://hy.tencent.com/research/100025?langVersion=en
162•limoce•3d ago•82 comments

Introducing the Developer Knowledge API and MCP Server

https://developers.googleblog.com/introducing-the-developer-knowledge-api-and-mcp-server/
38•gfortaine•7h ago•11 comments

I spent 5 years in DevOps – Solutions engineering gave me what I was missing

https://infisical.com/blog/devops-to-solutions-engineering
127•vmatsiiako•14h ago•53 comments

Female Asian Elephant Calf Born at the Smithsonian National Zoo

https://www.si.edu/newsdesk/releases/female-asian-elephant-calf-born-smithsonians-national-zoo-an...
18•gmays•4h ago•2 comments

Understanding Neural Network, Visually

https://visualrambling.space/neural-network/
261•surprisetalk•3d ago•35 comments

I now assume that all ads on Apple News are scams

https://kirkville.com/i-now-assume-that-all-ads-on-apple-news-are-scams/
1029•cdrnsf•19h ago•428 comments

FORTH? Really!?

https://rescrv.net/w/2026/02/06/associative
55•rescrv•17h ago•18 comments

Show HN: Smooth CLI – Token-efficient browser for AI agents

https://docs.smooth.sh/cli/overview
83•antves•1d ago•60 comments

WebView performance significantly slower than PWA

https://issues.chromium.org/issues/40817676
18•denysonique•6h ago•2 comments

Zlob.h: 100% POSIX- and glibc-compatible globbing lib that is faster and better

https://github.com/dmtrKovalenko/zlob
5•neogoose•2h ago•1 comment

I'm going to cure my girlfriend's brain tumor

https://andrewjrod.substack.com/p/im-going-to-cure-my-girlfriends-brain
109•ray__•6h ago•54 comments

How we’re responding to The NYT’s data demands in order to protect user privacy

https://openai.com/index/response-to-nyt-data-demands/
284•BUFU•8mo ago

Comments

supriyo-biswas•8mo ago
I wonder whether OpenAI legal can make the case for storing fuzzy hashes of the content, in the form of ssdeep[1] hashes or content-defined chunks[2] of said data, instead of the actual conversations themselves.

After all, since the NYT has a very limited corpus of information, and supposedly people are generating infringing content using their APIs, such hashes could be used to check whether that content has been generated.

I'd rather have them store nothing, but given the overly broad court order I think this may be the best middle ground. Of course, I haven't read the lawsuit documents and don't know if NYT is requesting far more, or alleging some indirect form of infringement which would invalidate my proposal.

[1] https://ssdeep-project.github.io/ssdeep/index.html

[2] https://joshleeb.com/posts/content-defined-chunking.html
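
A minimal sketch of what that comparison could look like, assuming the ssdeep Python bindings for the library in [1]; the corpus, helper name, and 80-point threshold are all made up for illustration:

    import ssdeep  # pip install ssdeep -- bindings for the fuzzy-hash library in [1]

    # In this scheme only the fuzzy hashes would be retained,
    # not the conversations themselves.
    nyt_corpus = [
        "Full text of NYT article one goes here...",
        "Full text of NYT article two goes here...",
    ]
    nyt_hashes = [ssdeep.hash(text) for text in nyt_corpus]

    def looks_infringing(generated_text: str, threshold: int = 80) -> bool:
        # ssdeep.compare scores two hashes from 0 to 100; the cutoff
        # of 80 is an arbitrary illustration, not a legal standard.
        candidate = ssdeep.hash(generated_text)
        return any(ssdeep.compare(candidate, h) >= threshold for h in nyt_hashes)

Content-defined chunking [2] would work similarly, except you'd store hashes of corpus chunks and look for runs of matching chunks in generated output.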

paxys•8mo ago
Yeah, try explaining any of these words to a lawyer or judge.
m463•8mo ago
"you are a helpful law assistant."
fc417fc802•8mo ago
I thought that's what GPT was for.
landl0rd•8mo ago
"You are a long-suffering clerk speaking to a judge who's sat the same federal bench for two decades and who believes 'everything is computer' constitutes a deep technical insight."
sthatipamala•8mo ago
The judges in these technical cases can be quite sophisticated and absolutely do learn terms of art. See Oracle v. Google (Java API case)
anshumankmr•8mo ago
I looked up the judge for that one (https://en.wikipedia.org/wiki/William_Alsup), who was a hobbyist BASIC programmer; by that logic, one would need a judge who coded MNIST as a pastime hobby for this case.
king_magic•8mo ago
a smart judge who is minimally tech savvy could learn to train a model to predict MNIST in a day or two
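To put a number on it, the toy version is only a few lines with scikit-learn (a sketch, not a serious vision pipeline; the ~92% figure is roughly what plain logistic regression typically gets on MNIST):

    from sklearn.datasets import fetch_openml
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # 70,000 handwritten digits, each a flat row of 784 pixel values
    X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
    X_train, X_test, y_train, y_test = train_test_split(
        X / 255.0, y, test_size=10_000, random_state=0)

    # Plain logistic regression, no neural network required
    clf = LogisticRegression(max_iter=200).fit(X_train, y_train)
    print("test accuracy:", clf.score(X_test, y_test))  # typically ~0.92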
bigyabai•8mo ago
All of that does fit on a real spiffy whitepaper. Let's not fool around though, every ChatGPT session is sent directly into an S3 bucket that some three-letter spook backs up onto their tapes every month. It's a database of candid, timestamped text interactions from a bunch of rubes that logged in with their Google account - you couldn't ask for a juicier target unless you reinvented email. Of course it's backdoored, you can't even begin to try proving me wrong.

Maybe I'm alone, but a pinkie-promise from Sam Altman does not confer any assurances about my data to me. It's about as reassuring as a singing telegram from Mark Zuckerberg dancing to a song about how secure WhatsApp is.

landl0rd•8mo ago
Of course I can't even begin trying to prove you wrong. You're making an unfalsifiable statement. You're pointing to the Russell's teapot of sigint.

It's well-established that the American IC, primarily NSA, collects a lot of metadata about internet traffic. There are some justifications for this and it's less bad in the age of ubiquitous TLS, but it generally sucks. However, legal protections against directly spying on the actual decrypted content of Americans are at least in theory stronger.

Snowden's leaks mentioned the NSA tapping inter-DC links of Google and Yahoo; if they had to tap links, I doubt there's a ton of voluntary cooperation.

I'd also point out that trying to parse the unabridged prodigious output of the SlopGenerator9000 is a really hard task unless you also use LLMs to do it.

cwillu•8mo ago
> I'd also point out that trying to parse the unabridged prodigious output of the SlopGenerator9000 is a really hard task unless you also use LLMs to do it.

The input is what's interesting.

Aeolun•8mo ago
It doesn’t change the monumental scope of the problem though.

Though I’m inclined to believe the US gov can if OpenAI can.

tdeck•8mo ago
> Snowden's leaks mentioned the NSA tapping inter-DC links of Google and Yahoo; if they had to tap links, I doubt there's a ton of voluntary cooperation.

The laws have changed since then and it's not for the better:

https://www.aclu.org/press-releases/congress-passing-bill-th...

tuckerman•8mo ago
Even if the laws give them this power, I believe it would be extremely difficult for an operation like this to go unnoticed (and therefore unreported) at most of these companies. MUSCULAR [1] was able to be pulled off because of the cleartext inter-datacenter traffic which was subsequently encrypted. It's hard to see how they could pull off a similar operation without the cooperation of Google which would also entail a tremendous internal cover up.

[1] https://en.wikipedia.org/wiki/MUSCULAR

onli•8mo ago
Warrantlessly installed backdoors in the log system combined with a gag order, combined with secret courts, all "perfectly legal". Not really hard to imagine.
tuckerman•8mo ago
You would have to gag a huge chunk of the engineers and I just don’t think that would work without leaks. Google’s infrastructure would not make something like that easy to do clandestinely (trying to avoid saying impossible but it gets close).

I was an SRE and SWE on technical infra at Google, specifically the logging infrastructure. I am under no gag order.

komali2•8mo ago
There's no way to know, but it's safer to assume.
zer00eyz•8mo ago
> However, legal protections against directly spying on the actual decrypted content of Americans are at least in theory stronger.

This was the point of a lot of the Five Eyes programs. It's not legal for the US to spy on its own citizens, but it isn't against the law for us to do it to the Australians... who are all too happy to reciprocate.

> Snowden's leaks mentioned the NSA tapping inter-DC links of Google and Yahoo...

Snowden's info wasn't really news for many of us who were paying attention in the aftermath of 9/11: https://en.wikipedia.org/wiki/Room_641A (This was huge on slashdot at the time... )

dmurray•8mo ago
> You're pointing to the Russell's teapot of sigint.

If there were multiple agencies with billion dollar budgets and a belief that they had an absolute national security mandate to get a teapot into solar orbit, and to lie about it, I would believe there was enough porcelain up there to make a second asteroid belt.

rl3•8mo ago
>However, legal protections against directly spying on the actual decrypted content of Americans are at least in theory stronger.

Yeah, because "collection" was redefined to mean accessing the full content already stored on their systems, post-interception. It wasn't considered collected until an analyst viewed it. Metadata was a laughable dog-and-pony show that was part of the same legal shell games at the time, over a decade ago now.

That said, from an outsider's perspective it sounded like the IC did collectively erect robust guard rails such that access to information was generally controlled and audited. I felt like this broke down a bit once sharing 702 data with other federal agencies was expanded around the same time period.

These days, those guard rails might be the only thing standing in the way of democracy as we know it ending in the US. AI processing applied to full-take collection is terrifying, just ask the Chinese.

Yizahi•8mo ago
Metadata is spying (c) Bruce Schneier

If a CIA spook is stalking you everywhere, documenting your every visible move or interaction, you probably would call that spying. Same applies to digital.

Also, the teapot argument can be applied in reverse. We have all these documented open digital network systems everywhere, and you want to say that one of the most unprofitable and certainly most expensive-to-run systems is somehow protecting all user data? That belief is based on what? At least selling data is based on the evidence of the industry and on the actual ToSes of other similar corpos.

jstanley•8mo ago
The comment you replied to isn't saying that metadata isn't spying. It's saying that the spies generally don't have free access to content data.
Workaccount2•8mo ago
My choice conspiracy is that the three letter agencies actively support their omnipresent, omniknowing conspiracies because it ultimately plays into their hand. Sorta like a Santa Claus for citizens.
bigyabai•8mo ago
> because it ultimately plays into their hand.

How? Scared criminals aren't going to make themselves easy to find. Three-letter spooks would almost certainly prefer to smoke-test a docile population than a paranoid one.

In fact, it kinda overwhelmingly seems like the opposite happens. Remember the 2015 San Bernardino shooting that was pushed into the national news for no reason? Remember how the FBI bloviated about how hard it was to get information from an iPhone, 3 years after Tim Cook's assent to the PRISM program?

Stuff like this is almost certainly theater. If OpenAI perceived retention as a life-or-death issue, they would be screaming about this case at the top of their lungs. If the FBI perceived it as a life-or-death issue, we would never hear about it in our lifetimes. The dramatic and protracted public fights suggest to me that OpenAI simply wants an alibi. Some sort of user-story that smells like secure and private technology, but in actuality is very obviously neither.

farts_mckensy•8mo ago
Think of all the complete garbage interactions you'd have to sift through to find anything useful from a national security standpoint. The data is practically obfuscated by virtue of its banality.
artursapek•8mo ago
I’ve done my part cluttering it with my requests for the same banana bread recipe like 5 separate times.
refuser•8mo ago
It was that good?
baobun•8mo ago
gief
brigandish•8mo ago
Search engines have been doing this since the mid '90s and have only improved; to think that any data is obfuscated by being part of some huge volume of other data is a fallacy at best.
farts_mckensy•8mo ago
Search engines use our data for completely different purposes.
yunwal•8mo ago
That doesn’t negate the GPs point. It’s easy to make datasets searchable.
farts_mckensy•8mo ago
Searchable? You have to know what to search for, and you have to rule out false positives. How do you discern a person roleplaying some secret agent scenario vs. a person actually plotting something? That's not something a search function can distinguish. It requires a human to sift through that data.
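
Back-of-the-envelope, with numbers made up purely for illustration: even a very accurate classifier drowns in false positives at population scale.

    # Base-rate arithmetic with illustrative numbers
    users = 300_000_000        # population under surveillance
    actual_plotters = 100      # true positives hidden in it
    tpr = 0.99                 # classifier catches 99% of plotters
    fpr = 0.001                # and wrongly flags 0.1% of everyone else

    flagged_true = actual_plotters * tpr              # ~99
    flagged_false = (users - actual_plotters) * fpr   # ~300,000
    precision = flagged_true / (flagged_true + flagged_false)
    print(f"precision: {precision:.5f}")  # ~0.0003, one real hit per ~3,000 flags

That's why a human has to sift the flags.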
brigandish•8mo ago
> How do you discern a person roleplaying some secret agent scenario vs. a person actually plotting something?

Metadata and investigation.

> That's not something a search function can distinguish.

We know that it can narrow down hugely from the initial volume.

> It requires a human to sift through that data.

Yes, the point of collating, analysing, and searching data is not to make final judgements but to find targets for investigation by the available agents. That's the same reason we all use search engines: to narrow down. They never produce what we intend by intention alone; we still have to read the final results. Magic is still some way off.

You're acting as if we can automate humans out of the loop entirely, which would be a straw man. Is anyone saying we can get rid of the police or security agencies by using AI? Or perhaps AI will become the police, perhaps it will conduct traffic stops using driverless cars and robots? I suppose it could happen, though I'm not sure what the relevance would be here.

farts_mckensy•8mo ago
The data is obfuscated and the cost to unlock the value of it is often not worth the effort.
brigandish•8mo ago
And yet billions of dollars (at least) has gone into it. A whole group of people with access to the data and the means to sift it disagree and are willing to put their money behind it, so your bare assertions count for nowt.
farts_mckensy•8mo ago
Great. What do you think that proves? That doesn't negate my initial argument. The data is largely useless, and often counterproductive. The evidence shows the vast majority of plots are foiled through conventional means, and ruling out false positives is more trouble than it's worth. I cited sources in this thread. Where are your sources?

"Corporations and the US government are spending money on it, so it must be useful." Are you serious? Lmao.

bigyabai•8mo ago
"We kill people based on metadata." - National Security Agency Gen. Michael Hayden

Raw data with time-series significance is their absolute favorite. You might argue something like Google Maps data is "obfuscated by virtue of its banality" until you catch the right person in the wrong place. ChatGPT sessions are the same way, and it's going to be fed into aggregate surveillance systems in the way modern telecom and advertiser data is.

farts_mckensy•8mo ago
This is mostly security theater, and generally not worth the lift when you consider the steps needed to unlock the value of that data in the context of investigations.
bigyabai•8mo ago
Citation?
farts_mckensy•8mo ago
-The Privacy and Civil Liberties Oversight Board’s 2014 review of the NSA “Section 215” phone-record program found no instance in which the dragnet produced a counter-terror lead that couldn’t have been obtained with targeted subpoenas. https://en.m.wikipedia.org/wiki/Privacy_and_Civil_Liberties_...

-After Boston, Paris, Manchester, and other attacks, post-mortems showed the perpetrators were already in government databases. Analysts simply didn’t connect the dots amid the flood of benign hits. https://www.newyorker.com/magazine/2015/01/26/whole-haystack

-Independent tallies suggest dozens of civilians killed for every intended high-value target in Yemen and Pakistan, largely because metadata mis-identifies phones that change pockets. https://committees.parliament.uk/writtenevidence/36962/pdf

7speter•8mo ago
Maybe I’m wrong, and maybe this was discussed previously, but of course OpenAI keeps our data; they use it for training!
nl•8mo ago
As the linked page points out, you can turn this off in settings if you are an end user, or choose zero retention if you are an API user.
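
For API users there is also a per-request knob, separate from the account-level Zero Data Retention agreement. A hedged sketch with the official openai Python client -- the `store` flag controls whether the completion is retained for OpenAI's distillation/evals tooling, and no request flag overrides a court-ordered legal hold:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello"}],
        store=False,  # ask OpenAI not to retain this completion
    )
    print(resp.choices[0].message.content)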
justacrow•8mo ago
I mean, they already stole and used all the copyrighted material they could find to train the thing; am I supposed to believe that they won't use my data just because I tick a checkbox?
stock_toaster•8mo ago
Agreed, I have a hard time believing anything the eye-scanning crypto coin (Worldcoin or whatever) guy says at this point.
Jackpillar•8mo ago
I wish I could test drive your brain to experience a world where one believes that would stop them from stealing your data.
rl3•8mo ago
>Of course it's backdoored, you can't even begin to try proving me wrong.

On the contrary.

>Maybe I'm alone, but a pinkie-promise from Sam Altman does not confer any assurances about my data to me.

I think you're being unduly paranoid. /s

https://www.theverge.com/2024/6/13/24178079/openai-board-pau...

https://www.wsj.com/tech/ai/the-real-story-behind-sam-altman...

LandoCalrissian•8mo ago
Trying to actively circumvent the intention of a judge's order is a pretty bad idea.
girvo•8mo ago
Deeply, deeply so. So much so that people who suggest such things show they've (luckily) not had to interact with the legal system much. Judges take an incredibly dim view of that kind of thing haha
Aeolun•8mo ago
That’s not circumvention though. The intent of the order is to be able to prove that ChatGPT regurgitates NYT content, not to read the personal communications of all ChatGPT users.
delusional•8mo ago
I haven't been able to find any of the supporting documents, but the court order makes it seem like OpenAI has been unhelpful in producing any alternative during the conversation.

For example, the judge seems to have asked if it would be possible to segregate data that the users wanted deleted from other data, but OpenAI has failed to answer. Not just denied the request, but simply ignored it.

I think it's quite likely that OpenAI has taken the PR route instead of seriously engaging with any way to constructively honor the request for retention of data.

vanattab•8mo ago
Protect our privacy? Or protect their right to piracy?
NBJack•8mo ago
Agreed. I don't buy the spin.
charrondev•8mo ago
I mean, the court is ordering them to retain user conversations at least until resolution of the court case (in case there are copyrighted responses being generated?).

So user privacy is definitely implicated.

amluto•8mo ago
It appears that the “Zero Data Retention” APIs they mention are something that customers need to request access to, and that it’s really quite hard to get this access. I’d be more impressed if any API user could use those APIs.
singron•8mo ago
If OpenAI cared about our privacy, ZDR would be a setting anyone could turn on.
JimDabell•8mo ago
I believe Apple’s agreement includes this, at least when a user isn’t signed into an OpenAI account:

> OpenAI must process your request solely for the purpose of fulfilling it and not store your request or any responses it provides unless required under applicable laws. OpenAI also must not use your request to improve or train its models.

— https://www.apple.com/legal/privacy/data/en/chatgpt-extensio...

I wonder if we’ll end up seeing Apple dragged into this lawsuit. I’m sure after telling their users it’s private, they won’t be happy about everything getting logged, even if they do have that caveat in there about complying with laws.

fc417fc802•8mo ago
> I’m sure after telling their users it’s private, they won’t be happy about everything getting logged,

The ZDR APIs are not and will not be logged. The linked page is clear about that.

FireBeyond•8mo ago
Sure, OpenAI, I will absolutely trust you.

> The content covered by the court order is stored separately in a secure system. It’s protected under legal hold, meaning it can’t be accessed or used for purposes other than meeting legal obligations.

That's horse shit and OpenAI knows it. It means no such thing. A legal hold is just a 'preservation order'. It says absolutely nothing about other access or use.

fragmede•8mo ago
why is it horse shit that OpenAI is saying they've put the files in a cabinet that only legal has access to?
FireBeyond•8mo ago
They are saying a “legal hold” means that they have to keep the data but don’t worry they’re not allowed to use it or access it for any other reason.

A legal hold requires no such thing and there would be no such requirement in it. They are perfectly free to access and use it for any reason.

mmooss•8mo ago
OpenAI's other policies, and other laws and regulations, do have such requirements. Are they nullified because the data is held under a court order?
mrguyorama•8mo ago
"The judge and court need to view this information to actually pass justice and decide the case" almost always supersedes other laws.

The GDPR does not say that you can never be proven to have done something wrong in a court of law.

mmooss•8mo ago
Right. The GGP says the information could be used for other purposes.
tomhow•8mo ago
Related discussion:

OpenAI slams court order to save all ChatGPT logs, including deleted chats - https://news.ycombinator.com/item?id=44185913 - June 2025 (878 comments)

dangus•8mo ago
I think the court order doesn’t quite go against as many norms as OpenAI is claiming. It’s very reasonable to retain data pertinent to a case, and NYT’s case almost certainly revolves around finding out copyright infringement damages, which are calculated based on the number of violations (how many users queried ChatGPT and were returned verbatim copyrighted material from NYT).

If you don’t retain that data you’re destroying evidence for the case.

It’s not like the data is going to be given to anyone; it’s only going to be used for limited legal purposes for the lawsuit (as OpenAI confirms in this article).

And honestly, OpenAI should have just not used copyrighted data illegally and they would have never had this problem. I saw NYT’s filing and it had very compelling evidence that you could get ChatGPT to distribute verbatim copyrighted text from the Times without citation.

tptacek•8mo ago
And honestly, OpenAI should have just not used copyrighted data illegally and they would have never had this problem

The whole premise of the lawsuit is that they didn't do anything unlawful, so saying "just do what the NYT wanted you to do" isn't interesting.

dangus•8mo ago
No, you're misinterpreting how information discovery and the court system works.

The NYT made an argument to a judge about what they think is going on and how they think the copyright infringement is taking place and harming them. In their filings and hearings they present the reasoning and evidence they have that leads them to believe that a violation is occurring. The court makes a judgment on whether or not to order OpenAI to preserve and disclose information relevant to the case to the court.

It's not "just do what NYT wanted you to do," it's "do what the court orders you to do based on a lawsuit brought by a plaintiff and argued to the court."

I suggest you read the court filing: https://nytco-assets.nytimes.com/2023/12/NYT_Complaint_Dec20...

lxgr•8mo ago
It absolutely goes against norms in many countries other than the US, and the data of residents/citizens of these countries are affected too.

> It’s not like the data is going to be given to anyone, it’s only gong to be used for limited legal purposes for the lawsuit (as OpenAI confirms in this article).

Nobody other than both parties to the case, their lawyers, the court, and whatever case file storage system they use. In my view, that's already way too much given the amount and value of this data.

dangus•8mo ago
Countries other than the US aren't part of this lawsuit. ChatGPT operates in the US under US law. I don't know if they have separated data storage for other countries.

I don't believe you would be considered to be violating the GDPR if you are complying with another court order, because you are presumably making a best effort to comply with the GDPR besides that court order.

You're saying it's unreasonable to store data somewhere for a pending court case? Conceptually you're saying that you can't preserve data for trials because whoever holds the filing cabinets might see the information. That's ridiculous; if that were true it would be impossible to perform discovery and get anything done in court.

lxgr•8mo ago
> I don't believe you would be considered to be violating the GDPR if you are complying with another court order, because you are presumably making a best effort to comply with the GDPR besides that court order.

It most likely depends on the exact circumstances. I could absolutely imagine a European court deciding that, sorry, but if you have to answer to a court decision incompatible with European privacy laws, you can't offer services to European residents anymore.

> You're saying it's unreasonable to store data somewhere for a pending court case?

I'm saying it can be, depending on how much personal and/or unrelated data gets tangled up in it. That seems to be the case here.

> Conceptually you're saying that you can't preserve data for trials because the filing cabinets might see the information.

I'm only saying that there should be proportionality. A court having access to all facts relevant to a case is important, but it's not the only important thing in the world.

Otherwise, we could easily end up with a Dirk-Gently-esque court that, based on the principle that everything is connected to everything, will just demand access to all the data in the world.

dangus•8mo ago
The scope of the data access required by the court is being worked out via due process. That’s why there’s an appeal system. OpenAI is just grandstanding in a public forum so that their customers don’t defect.

When it comes to the GDPR, US courts have generally taken the stance that it does not override discovery obligations.

Ironburg Inventions, Ltd. v. Valve Corp.

Finjan, Inc. v. Zscaler, Inc.

Corel Software, LLC v. Microsoft

Rollins Ranches, LLC v. Watson

In none of these cases was a GDPR fine issued.

danenania•8mo ago
Putting the merits of this specific case and positive vs. negative sentiments toward OpenAI aside, this tactic seems like it can be used to destroy any business or organization with customers who place a high value on privacy—without actually going through due process and winning a lawsuit.

Imagine a lawsuit against Signal that claimed some nefarious activity, harmful to the plaintiff, was occurring broadly in chats. The plaintiff can claim, like NYT, that it might be necessary to examine private chats in the future to make a determination about some aspect of the lawsuit, and the judge can then order Signal to find a way to retain all chats for potential review.

However you feel about OpenAI, this is not a good precedent for user privacy and security.

fc417fc802•8mo ago
That's not entirely fair. The argument isn't "users are using the service to break the law" but rather "the service is facilitating law breaking". To fix your Signal analogy, suppose you could use the chat interface to request copyrighted material from the operator.
charcircuit•8mo ago
That doesn't change that the outcome is the same: the app has to hand over the plaintext messages of everyone, including the chat history of every user.
fc417fc802•8mo ago
Right. But requiring logs due to suspicion that the service itself is actively violating the law is entirely different from doing so on the basis that end users might be up to no good entirely independently.

Also OpenAI was never E2EE to begin with. They were already retaining logs for some period of time.

My personal view is that the court order is overly broad and disregards potential impacts on end users but it's nonetheless important to be accurate about what is and isn't happening here.

dangus•8mo ago
Again, keep in mind that we are talking about a case-limited analysis of that data within the privacy of the court system.

For example, if the trial happens to find data that some chats include crimes committed by users in their private chats, the court can't just send police to your door based on that information since the information is only being used in the context of an intellectual property lawsuit.

Remember that privacy rights are legitimate rights but they change a lot when you're in the context of an investigation/court proceeding. E.g., the right of police to enter and search your home changes a lot when they get a court issued warrant.

The whole point of E2EE services from the perspective of privacy-conscious customers is that a court can get a warrant for data from those companies but they'll only be able to produce encrypted blobs with no access to decryption keys. OpenAI was always a not-E2EE service, so customers have to expect that a court order could surface their data to someone else's eyes at some point.

dangus•8mo ago
I'm confused at how you think that NYT isn't going through due process and attempting to win a lawsuit.

The court isn't saying "preserve this data forever and ever and compromise everyone's privacy," they're saying "preserve this data for the purposes of this court while we perform an investigation."

IMO, the NYT has a very good argument here that the only way to determine the scope of the copyright infringement is to analyze requests and responses made by every single customer. Like I said in my original comment, the remedies for copyright infringement are on a per-infringement basis. E.g., every time someone on LimeWire downloads Song 2 by Blur from your PC, you've committed one instance of copyright infringement. My interpretation is that NYT wants the court to find out how many times customers have received ChatGPT responses that include verbatim New York Times content.

danenania•8mo ago
I don't think you're addressing my argument. If the "due process" destroys customer trust in the business being sued, regardless of the verdict, that's not really due process.
_jab•8mo ago
> How will you store my data and who can access it?

> The content covered by the court order is stored separately in a secure system. It’s protected under legal hold, meaning it can’t be accessed or used for purposes other than meeting legal obligations.

> Only a small, audited OpenAI legal and security team would be able to access this data as necessary to comply with our legal obligations.

So, by OpenAI's own admission, they are taking abundant and presumably effective steps to protect user privacy here? In the unlikely event that this data did somehow leak, I'd personally be blaming OpenAI, not the NYT.

Some of the other language in this post, like repeatedly calling the lawsuit "baseless", really makes this just read like an unconvincing attempt at a spin piece. Nothing to see here.

sashank_1509•8mo ago
Obviously OpenAI’s point of view will be their point of view. They are going to call this lawsuit baseless; they would not be fighting it otherwise.
ivape•8mo ago
To me it's pretty clear the way this will happen. You will need to buy additional credits or subscriptions through these LLMs that feed payment back to things like NYT and book publishers. It's all stolen. I don't even want to hear it. This company doesn't want to pay up and is willing to let users' privacy hang in the balance to draw the case out until they get sure footing with their device launches or the like (or additional markets like enterprise, etc.).
fallingknife•8mo ago
Copyright is pretty narrowly tailored to verbatim reproduction of content so I doubt they will have to pay anything.
tiahura•8mo ago
Incorrect. Copyright applies to derivative works.
vel0city•8mo ago
Even then, it's possible to prompt the model to exactly reproduce the copyrighted works.
fallingknife•8mo ago
Please show me one of these prompts
vel0city•8mo ago
NYT has examples in their legal complaint. See page 30.

https://www.scribd.com/document/695189742/NYT-v-OpenAI

Workaccount2•8mo ago
> It's all stolen.

LLMs are not massive archives of data. The big models are a few TB in size. No one is forgoing a NYT subscription because they can ask ChatGPT to print out NYT news stories.

edbaskerville•8mo ago
Regardless of the representation, some people are replacing news consumption generally with answers from ChatGPT.
tptacek•8mo ago
No, there is a whole news cycle about how chats you delete aren't actually being deleted because of a lawsuit; they essentially have to respond. It's not an attempt to spin the lawsuit; it's about reassuring their customers.
VanTheBrand•8mo ago
The part where they go out of their way to call the lawsuit baseless is spin, though, and mixing that with this messaging presents a mixed message. The NYT lawsuit is objectively not baseless. OpenAI did train on the Times, and ChatGPT does output information from that training. That’s the basis of the lawsuit. NYT may lose, this could end up being considered fair use, it might ultimately be a flimsy basis for a lawsuit, but to say it’s baseless (and with nothing to back that up) is spin and makes this message less reassuring.
tptacek•8mo ago
No, it's not. It's absolutely standard corporate communications. If they're fighting the lawsuit, that is essentially the only thing they can say about it. Ford Motor Company would say the same thing (well, they'd probably say "meritless and frivolous").
bee_rider•8mo ago
Standard corporate spin, then?
tptacek•8mo ago
No? "Spin" implies there was something else they could possibly say.
mmooss•8mo ago
I haven't heard that interpretation; I might call it spin of spin.
justacrow•8mo ago
They could choose to not say it
ethbr1•8mo ago
Indeed. Taken to its conclusion, this thread suggests that corporations are justified in saying whatever they want in order to further their own ends.

Including lies.

I'd like to aim a little higher, maybe towards expecting correspondence with reality?

IOW, yes, there is no law that OpenAI can't try to spin this. But it's still a shitty, non-factually-based choice to make.

mrgoldenbrown•8mo ago
If you're being held at gunpoint and forced to lie, your words are still a lie. Whether you were forced or not is a separate dimension.
bee_rider•8mo ago
That is unrelated to what the expression means.
bunderbunder•8mo ago
No, this isn't even close to spin, it's just a standard part of defending your case. In the US tort system you need to be constantly publicly saying you did nothing wrong. Any wavering on that point could be used against you in court.
jmull•8mo ago
This is a funny thread. You say "No" but then restate the point with slightly different words. As if anything a company says publicly about ongoing litigation isn't spin.
bunderbunder•8mo ago
I suppose it's down to how you define "spin". Personally I'm in favor of a definition of the term that doesn't excessively dilute it.
bee_rider•8mo ago
Can you share your definition? This is actually quite puzzling because as far as I know “spin” has always been associated with presenting things in a way that benefits you. Like, decades ago, they could have the show “Bill O’Reilly’s No Spin Zone” and everybody knew the premise was that they argue against guests who were trying to tell a “massaged” version of the story, and that they’d go for some actual truth (fwiw I thought the whole show was full of crap, but the name was not confusing or ambiguous).

I’m not aware of any definition of “spin” where being conventional is a defense against that accusation. Actually, that was the (imagined) value-add of the show, that conventional corporate and political messaging is heavily spun.

skissane•8mo ago
There's a difference between "we are choosing to phrase it this way" versus "our lawyers told us we have to say this". "Spin" is generally seen as a voluntary action, which makes the former a clearcut case of it, the latter less so.
bee_rider•8mo ago
1) taking your lawyer’s advice is a voluntary action (although it is probably a good one)

2) I don’t understand the distinction being made between voluntary and involuntary, in the sense that a corporation is a thing made up of people; it doesn’t have a will in and of itself, so the communications it sends must always actually be made by somebody inside the corporation (whether a lawyer, marketing person, or in the unlikely event that somebody lets them out, an engineer).

bunderbunder•8mo ago
Spin, like you illustrate in your comment, has connotations of distorting the truth.

Simply denying the allegations isn't really spinning anything; it's just denying the allegations. And the thing I dislike about characterizing something like this as spin is that it defangs the term by removing all those connotations and instead turning it into just a buzzwordy way of saying, "I disagree with what this person said."

bee_rider•8mo ago
They didn’t just deny the allegations. They called the case baseless. The case is clearly not baseless, in the sense that there’s at least enough of a basis that the court didn’t vacate the order to preserve the chats.

It seems to me that the discussion of whether or not it is spin has turned into a discussion of which party people basically agree with.

My personal opinion is that OpenAI will probably win, or at least get away with a pretty minor fine or something like that. However, the communications coming from both parties in the case should be assumed to be corporate spin until proven otherwise. And, calling an unfinished case baseless is, at least, a bit presumptuous!

bunderbunder•8mo ago
That's legalese. You can't interpret legal jargon using vernacular definitions of the terms.
bee_rider•7mo ago
The source is a message intended for mass consumption, so it should not be interpreted in legalese.
bunderbunder•7mo ago
How you want the law to work, and how the law works, are not necessarily the same thing.
adamsb6•8mo ago
I’m typing these words from a brain that has absorbed copyrighted works.
mmooss•8mo ago
> It's not an attempt to spin the lawsuit; it's about reassuring their customers.

It can be both. It clearly spins the lawsuit - it doesn't present the NYT's side at all.

fallingknife•8mo ago
Why does OpenAI have any obligation to present the NYT's side?
mmooss•8mo ago
Who said 'obligation'?
roywiggins•8mo ago
It would be extremely unusual (and likely very stupid) for the defendant in a lawsuit to post publicly that the plaintiff maybe has a point.
mhitza•8mo ago
My understanding is that they have to keep chats based on an order, *as a result of their previous accidental deletion of potential evidence in the case*[0].

And per their own terms they likely only delete messages "when they want to" given the big catch-alls. "What happens when you delete a chat? -> It is scheduled for permanent deletion from OpenAI's systems within 30 days, unless: It has already been de-identified and disassociated from your account"[1]

[0] https://techcrunch.com/2024/11/22/openai-accidentally-delete...

[1] https://help.openai.com/en/articles/8809935-how-to-delete-an...

conartist6•8mo ago
It's hard to reassure your customers if you can't address the elephant in the room. OpenAI brought this on themselves by flouting copyright law and assuring everyone else that such aggressive and probably-illegal action would be retroactively acceptable once they were too big to fail.
ofjcihen•8mo ago
They should include the part where the order is a result of them deleting things they shouldn’t have then. You know, if this isn’t spin.

Then again I’m starting to think OpenAI is gathering a cult leader like following where any negative comments will result in devoted followers or those with something to gain immediately jumping to its defense no matter how flimsy the ground.

gruez•8mo ago
>They should include the part where the order is a result of them deleting things they shouldn’t have then. You know, if this isn’t spin.

From what I can tell from the court filings, prior to the judge's order to retain everything, the request to retain everything was coming from the plaintiff, with openai objecting to the request and refusing to comply in the meantime. If so, it's a bit misleading to characterize this as "deleting things they shouldn’t have", because what they "should have" done wasn't even settled. That's a bit rich coming from someone accusing openai of "spin".

ofjcihen•8mo ago
Here’s a good article that explains what you may be missing.

https://techcrunch.com/2024/11/22/openai-accidentally-delete...

gruez•8mo ago
Your linked article talks about openai deleting training data. I don't see how that's related to the current incident, which is about user queries. The ruling from the judge for openai to retain all user queries also didn't reference this incident.
ofjcihen•8mo ago
Sure.

Without this devolving into a tit for tat: the article explains, for those following this conversation, why it’s been elevated to a court order and not just an expectation to preserve.

lcnPylGDnU4H9OF•8mo ago
> the article explains for those following this conversation why it’s been elevated to a court order

That article does nothing of the sort and, indeed, it is talking about a completely separate incident of deleting data.

ofjcihen•8mo ago
No worries. I can’t force understanding on anyone.

Here. I had an LLM summarize it for you.

A court order now requires OpenAI to retain all user data, including deleted ChatGPT chats, as part of the ongoing copyright lawsuit brought by The New York Times (NYT) and other publishers[1][2][6][7]. This order was issued because the NYT argued that evidence of copyright infringement—such as AI outputs closely matching NYT articles—could be lost if OpenAI continued its standard practice of deleting user data after 30 days[2][6][7].

This new requirement is directly related to a 2024 incident where OpenAI accidentally deleted critical data that NYT lawyers had gathered during the discovery process. In that incident, OpenAI engineers erased programs and search result data stored by NYT's legal team on dedicated virtual machines provided for examining OpenAI's training data[3][4][5]. Although OpenAI recovered some of the data, the loss of file structure and names rendered it largely unusable for the lawyers’ purposes[3][5]. The court and NYT lawyers did not believe the deletion was intentional, but it highlighted the risks of relying on OpenAI’s internal data retention and deletion practices during litigation[3][4][5].

The court order to retain all user data is a direct response to concerns that important evidence could be lost—just as it was in the accidental deletion incident[2][6][7]. The order aims to prevent any further loss of potentially relevant information as the case proceeds. OpenAI is appealing the order, arguing it conflicts with user privacy and their established data deletion policies[1][2][6][7].

Sources:

[1] OpenAI Appeals Court Order Requiring Retention of Consumer Data https://www.pymnts.com/artificial-intelligence-2/2025/openai...
[2] ‘An Inappropriate Request’: OpenAI Appeals ChatGPT Data Retention Court Order https://www.eweek.com/news/openai-privacy-appeal-new-york-ti...
[3] OpenAI Deletes Legal Data in a Lawsuit From the New York Times https://www.businessinsider.com/openai-delete-legal-data-law...
[4] NYT vs OpenAI case: OpenAI accidentally deleted case data https://www.medianama.com/2024/11/223-new-york-times-openai-...
[5] New York Times Says OpenAI Erased Potential Lawsuit Evidence https://www.wired.com/story/new-york-times-openai-erased-pot...
[6] How we're responding to The New York Times' data ... - OpenAI https://openai.com/index/response-to-nyt-data-demands/
[7] Why OpenAI Won't Delete Your ChatGPT Chats Anymore: New York ... https://coincentral.com/why-openai-wont-delete-your-chatgpt-...
[8] A Federal Judge Ordered OpenAI to Stop Deleting Data - Adweek https://www.adweek.com/media/a-federal-judge-ordered-openai-...
[9] OpenAI confronts user panic over court-ordered retention of ChatGPT logs https://arstechnica.com/tech-policy/2025/06/openai-confronts...
[10] OpenAI Appeals ‘Sweeping, Unprecedented Order’ Requiring It Maintain All ChatGPT Logs https://gizmodo.com/openai-appeals-sweeping-unprecedented-or...
[11] OpenAI accidentally deleted potential evidence in NY ... - TechCrunch https://techcrunch.com/2024/11/22/openai-accidentally-delete...
[12] OpenAI's Shocking Blunder: Key Evidence Vanishes in NY Times ... https://www.eweek.com/news/openai-deletes-potential-evidence...
[13] Judge allows 'New York Times' copyright case against OpenAI to go ... https://www.npr.org/2025/03/26/nx-s1-5288157/new-york-times-...
[14] OpenAI Data Retention Court Order: Implications for Everybody https://hackernoon.com/openai-data-retention-court-order-imp...
[15] Sam Altman calls for 'AI privilege' as OpenAI clarifies court order to retain temporary and deleted ChatGPT sessions https://venturebeat.com/ai/sam-altman-calls-for-ai-privilege...
[16] Court orders OpenAI to preserve all ChatGPT logs, including deleted ... https://techstartups.com/2025/06/06/court-orders-openai-to-p...
[17] OpenAI deleted NYT copyright case evidence, say lawyers https://www.theregister.com/2024/11/21/new_york_times_lawyer...
[18] OpenAI slams court order to save all ChatGPT logs, including ... https://simonwillison.net/2025/Jun/5/openai-court-order/
[19] OpenAI accidentally deleted potential evidence in New York Times ... https://mashable.com/article/openai-accidentally-deleted-pot...
[20] OpenAI slams court order to save all ChatGPT logs, including deleted chats https://news.ycombinator.com/item?id=44185913
[21] OpenAI slams court order to save all ChatGPT logs, including deleted chats https://arstechnica.com/tech-policy/2025/06/openai-says-cour...
[22] After court order, OpenAI is now preserving all ChatGPT and API logs https://www.reddit.com/r/LocalLLaMA/comments/1l3niws/after_c...
[23] OpenAI accidentally erases potential evidence in training data lawsuit https://www.theverge.com/2024/11/21/24302606/openai-erases-e...
[24] OpenAI "accidentally" erased ChatGPT training findings as lawyers ... https://www.reddit.com/r/aiwars/comments/1gwxr94/openai_acci...
[25] OpenAI appeals data preservation order in NYT copyright case https://www.reuters.com/business/media-telecom/openai-appeal...

lcnPylGDnU4H9OF•8mo ago
You linked this article:

https://techcrunch.com/2024/11/22/openai-accidentally-delete...

Gruez said that is talking about an incident in this case but unrelated to the judge's order in question.

You said the article "explains for those following this conversation why it’s been elevated to a court order" but it doesn't actually explain that. It is talking about separate data being deleted in a different context. It is not user chats and access logs. It is the data that was used to train the models.

I pointed that out a second time since it seemed to be misunderstood.

Then you posted an LLM summary of something unrelated to the point being made.

Now we're here.

As you say, one cannot force understanding on another; we all have to do our part. ;)

Edit:

> The court order to retain all user data is a direct response to concerns that important evidence could be lost—just as it was in the accidental deletion incident[2][6][7].

What did you prompt the LLM with for it to reach this conclusion? The [2][6][7] citations similarly don't seem to explain how that incident from months ago informed the judge's recent decision. Anyway, I'm not saying the conclusion is wrong, I'm saying the article you linked does not support the conclusion.

ofjcihen•8mo ago
I think in your rush to reply you may have not read the summarization.

Calm down, cool off, and read it again.

The point is that the circumstances of the incident in 2024 are directly related to the how and why of the NYT lawyers' request and the judge's order.

The article I linked was to the incident in 2024.

Not everything has to be about pedantry and snark, even on HN.

Edit: I see you edited your response after re-reading the summarization. I’m glad cooler heads have prevailed.

The prompt was simply “What is the relation, if any, between OpenAI being ordered to retain user data and the incident from 2024 where OpenAI accidentally deleted the NYT lawyers data while they were investigating whether OpenAI had used their data to train their models?”

lcnPylGDnU4H9OF•8mo ago
> I see you edited your response after re-reading the summarization.

Just to be clear, the summary is not convincing. I do understand the idea but none of the evidence presented so far suggests that was the reason. The court expected that the data would be retained, the court learned that it was not, the court gave an order for it to be retained. That is the seeming reason for the order.

Put another way: if the incident last year had not happened, the court would still have issued the order currently under discussion.

lxgr•8mo ago
If the stored data is found to be relevant to the lawsuit during discovery, it becomes available to at least both parties involved and the court, as far as I understand.
hiddencost•8mo ago
> So, by OpenAI's own admission, they are taking abundant and presumably effective steps to protect user privacy here? In the unlikely event that this data did somehow leak, I'd personally be blaming OpenAI, not the NYT.

I am not an OpenAI stan, but this needs to be responded to.

The first principle of information security is that all systems can be compromised and the only way to secure data is to not retain it.

This is like saying "well, I know they didn't want to go skydiving, but we forced them to go skydiving and they died because they had a stroke mid-air; it's their fault they died."

Anyone who makes promises about data security is at best incompetent and at worst dishonest.

JohnKemeny•8mo ago
> Anyone who makes promises about data security is at best incompetent and at worst dishonest.

Shouldn't that be "at best dishonest and at worst incompetent"?

I mean, would you rather be a competent person telling a lie or an incompetent person believing you're competent?

HPsquared•8mo ago
An incompetent but honest person is more likely to accept correction and respond to feedback generally.
nhecker•8mo ago
Data is a toxic asset. -- https://www.schneier.com/essays/archives/2016/03/data_is_a_t...
pritambarhate•8mo ago
Maybe because you are not an OpenAI user. I am. I find it useful and I pay for it. I don't want my data to be retained beyond what's promised in the Terms of Use and Privacy Policy.

I don't think the judge is equipped to handle this case if they don't understand how their order jeopardizes the privacy of millions of users worldwide who don't even care about NYT's content or bypassing their paywalls.

mmooss•8mo ago
> who don't even care about NYT's content or bypassing their paywalls.

Whether or not you care is not relevant, and is usually the case for customers. If a drug company resold an expensive cancer drug without IP, you might say 'their order jeopardizes the health of millions of users worldwide who don't even care about Drug Co's IP.'

If the NYT is right - I can only guess - then you are benefitting from the NYT IP. Why should you get that without their consent and for free - because you don't care?

> jeopardizes

... is a strong word. I don't see much risk - the NYT isn't going to de-anonymize users and report on them, or sell the data (which probably would be illegal). They want to see if their content is being used.

conartist6•8mo ago
You live on a pirate ship. You have no right to ignore the ethics and law of that just because you could be hurt in a conflict related to piracy.
DrillShopper•8mo ago
The OpenAI Privacy Policy specifically allows them to keep data as required by law.
sega_sai•8mo ago
Strange smear against the NYT. If the NYT has a case, and the court approves it, it's bizarre to use the court order to smear the NYT. If there is no case, "Open"AI will have a chance to prove that in court.
tptacek•8mo ago
They're a party to the case! Saying it's baseless isn't a "smear". There is literally nothing else they can say (other than something synonymous with "baseless", like "without merit").
lucianbr•8mo ago
Oh they definitely can say other things. It's just that it would be inconvenient. They might lose money.

I wonder if the laws and legal procedures are written considering this general assumption that a party to a lawsuit will naturally lie if it is in their interest. And then I read articles and comments about a "trust based society"...

tptacek•8mo ago
I'm not taking one side or the other in the case itself, but it's lazy and superficial to suggest that the defendant in a civil suit would say anything other than that the suit has no merit. In the version of this statement where they generously interpret anything the NYT (I subscribe) says, they might as well just surrender.

I'm not sticking up for OpenAI so much as just for decent, interesting threads here.

wilg•8mo ago
> They might lose money.

I expect it's more about them losing the _case_. Silly to expect someone fighting a lawsuit not to try to win it.

fastball•8mo ago
This is the nature of the civil court system – it exists for when parties disagree.

Why would a defendant who agrees a case has merit go to court at all? Much easier (and generally less expensive) to make the other party whole, assuming the parties agree on what "whole" is. And if they don't agree on what "whole" is, we are back to square one and of course you'd maintain that the other side's suit is baseless.

mmooss•8mo ago
They could say nothing about the merits of the case.
lxgr•8mo ago
The NYT is, in my view, exploiting a systematic weakness of the US legal system here, i.e. extremely wide-reaching discovery laws with almost no regard for the privacy of parties not involved in a given dispute, or aspects of their lives not relevant to the dispute at hand.

Of course it's out of self-serving interests, but I find it hard to disagree with OpenAI on this one.

Arainach•8mo ago
What right to privacy? There is no right to have your interactions with a company (1) remain private, nor should there be. Even if there was you agree to let OpenAI do essentially whatever they want with your data - including hand it over to the courts in response to a subpoena.

(1) With limited, well-scoped exclusions for lawyers, medical records, etc.

lxgr•8mo ago
That may be your or your jurisdiction's view, but such privacy rights definitely exist in many countries.

You might have heard of the GDPR, but even before that, several countries had "privacy by default" laws on the books.

Imustaskforhelp•8mo ago
But if both parties agree, then there should be the freedom to stay private.

Your comment is dystopian given how some people basically treat AI as their "friend": imagine, no matter what encrypted messaging app or whatever they use, the govt still snoops.

fastball•8mo ago
Dealer-Client privilege.
bionhoward•8mo ago
It’s also a matter of competition…there are other AI services available today with privacy policies ranging from no training by default and opt-outs from training to switchable data retention and e2e encryption. A lot of workloads (cough, working on private git repos) logically require private AI to make sense
ChadNauseam•8mo ago
Given how many important interactions people have with companies in our modern age, saying "There is no right to have your interactions with a company remain private" is essentially equivalent to saying "there is no right to privacy at all". When I talk to my friends over facetime or imessage, that interaction is being mediated by Apple, as well as by my internet service provider and (I assume) many other parties.
wvenable•8mo ago
> "There is no right to have your interactions with a company remain private" is essentially equivalent to saying "there is no right to privacy at all".

Legally that is a correct statement.

If you want that changed, it will require legislation.

HDThoreaun•8mo ago
Really not so simple. Roe v Wade was decided based on the implied right to privacy. Sure, it's been overturned, but if liberals get back on the court it will be un-overturned
nativeit•8mo ago
That’s presumably why legislation is needed?
maketheman•8mo ago
Given the current balance of the court, I'd say it's about even odds we end the entire century without ever having had a liberal court the entire time. Best reasonable case we're a solid couple of decades from it, and even that's not got great odds.

We'd have a better chance if anyone with power were talking about court reform to make the Supreme Court justices e.g. drawn by lot for each session from the district courts, but approximately nobody is. It'd be damn good and long overdue reform, but oh well.

And the thing is, we've already had a fairly conservative court for decades. I'm pretty likely to die, even if of old age, never having seen an actually-liberal court in the US my entire life. Like, WTF. Frankly, no wonder so much of our situation is fucked up, backwards, and authoritarianism-friendly. And (sigh) any serious attempts to fix that are basically on hold for many decades more, assuming rule of law survives that long anyway.

[EDIT] My point, in short, is that "we still have [thing], we just have to wait for a liberal court that'll support it" is functionally indistinguishable from not having [thing].

fallingknife•8mo ago
A liberal court will probably start drawing exceptions to 1A out of thin air like "misinformation" and "hate speech." I'd rather stick with what we have.
wvenable•8mo ago
Roe v Wade refers to the constitutional right to privacy under the Due Process Clause of the 14th Amendment. This is part of individual rights against the state and has nothing to do with private companies. There is no general constitutional right that guarantees privacy in interactions with private companies.
whilenot-dev•8mo ago
Privacy in that example would mean that no party except you and your friends can access the contents of the interaction. I wouldn't want either Apple or my ISP to have that access.

A company like OpenAI that offers a SaaS is no such friend, and in such power dynamics (individual vs. company) it's probably in your best interest to have everything public if necessary.

lxgr•8mo ago
You're always free to keep records of your ChatGPT conversations on your end.

Why tangle up the data of people with very different preferences than yours in that?

bobmcnamara•8mo ago
> "there is no right to privacy at all"

First time?

Analemma_•8mo ago
> essentially equivalent to saying "there is no right to privacy at all".

As others have said, in the United States this is, legally, completely correct: there is no right to privacy in American law. Lots of people think the Fourth Amendment is a general right to privacy, and they are wrong: the Fourth Amendment is specifically about government search and seizure, and courts have been largely consistent about saying it does not extend beyond that to e.g. relationships with private parties.

If you want a right to privacy, you will need to advocate for laws to be changed; the ones as they exist now do not give it to you.

tiahura•8mo ago
No, that is incorrect. See, e.g., Griswold, Lawrence, etc.
Terr_•8mo ago
That's a fallacy of equivocation, you're introducing a different meaning/flavor of the same word.

As it stands today, a court case (A) affirming the right to use contraception is not equivalent to a court case (B) stating that a phone-company/ISP/site may not sell their records of your activity.

tiahura•8mo ago
Your response hinges on a fallacy of equivocation, but ironically, it commits one as well.

You conflate the absence of a statutory or regulatory regime governing private data transactions with the broader constitutional right to privacy. While it's true that the Fourth Amendment limits only state action, U.S. constitutional law, via cases like Griswold v. Connecticut and Lawrence v. Texas, clearly recognizes a substantive right to privacy, grounded in the Due Process Clause and other constitutional penumbras. This is not a semantic variant; it is a distinct and judicially enforceable right.

Moreover, beyond constitutional law, the common law explicitly protects privacy through torts such as intrusion upon seclusion, public disclosure of private facts, false light, and appropriation of likeness. These apply to private actors and are recognized in nearly every U.S. jurisdiction.

Thus, while the Constitution may not prohibit a website from selling your data, it does affirm a right to privacy in other, fundamental contexts. To deny that entirely is legally incorrect.

jcalvinowens•8mo ago
In practice, the constitution says whatever the supreme court says it says.

While these grand theories of traditional implicit constitutional law are nice, they're pretty meaningless in a system where five individuals can (and are willing to) vote to invalidate decades of tradition on a whim.

I too want real laws.

wvenable•8mo ago
You're conflating the existence of specific privacy protections in narrow legal domains with a generalized, enforceable right to privacy which doesn't exist in US law. The Constitution recognizes a substantive right to privacy, but only in carefully defined areas like reproductive choice, family autonomy, and intimate conduct, and critically only against state actors. Citing Griswold, Lawrence, and related cases does not establish a sweeping privacy right enforceable against private companies.

The common law torts require a high threshold of offensiveness and are adjudicated case by case in individual jurisdictions. They offer only remedies, not a proactive right to control your data.

The original point, that there is no general right in the US to have your interactions with a company remain private, still stands. That's not a denial of all privacy rights but a recognition that US law fails to provide comprehensive privacy protection.

tiahura•8mo ago
The statement I was referring to is:

“As others have said, in the United States this is, legally, completely correct: there is no right to privacy in American law.”

That is an incorrect statement. The common law torts I cited can apply in the context of a business transaction, so your statement is also incorrect.

If your strawman is that in the US there's no right to privacy because there's no blanket prohibition on talking about other people and what they've been up to, then run with it.

wvenable•8mo ago
> The common law torts I cited can apply in the context of a business transaction, so your statement is also incorrect.

I completely disagree. Yes, the Prosser privacy torts exist: intrusion upon seclusion, public disclosure, false light, and appropriation. But they are highly fact-specific, hard to win, rarely litigated, not recognized in all jurisdictions, and completely reactive -- you get harmed first, maybe sue later!

They are utterly inadequate to protect people in the modern data economy. A website selling your purchase history? Not actionable. A company logging your AI chats? Not intrusion. These torts are not a privacy regime - they are scraps. Also, when we're talking about basic privacy rights, we're just as concerned with mundane material, not just the "highly offensive" material that the torts would apply to.

tiahura•8mo ago
Because in the US we value freedom and particularly freedom of speech.

If you don't want the grocery store telling people you buy Coke, don't shop there.

wvenable•8mo ago
So you've entirely given up your argument about the legal right to privacy involving private businesses?
tiahura•8mo ago
No, I'm saying that in many contexts it is. If, for example, someone hacked Safeway's store and downloaded your data, they'd be in trouble civilly and criminally. If you don't want Safeway to sell your data, deal with that yourself.
wvenable•8mo ago
That actually reinforces my point: there is no affirmative right to privacy, only reactive liability structures. If someone hacks Safeway, they’re prosecuted not because you have a constitutional or general right to privacy, but because they violated a criminal statute (e.g. the Computer Fraud and Abuse Act). That's not a privacy right -- it's a prohibition on unauthorized access.

As for Safeway selling your data: you're admitting that it's on the individual to opt out, negotiate, or avoid the transaction which just highlights the absence of a rights-based framework. The burden is entirely on the consumer to protect themselves, and companies can exploit that asymmetry unless narrowly constrained by statute (and even then, often with exceptions and opt-outs).

What you're describing isn't a right to privacy -- it's a lack of one, mitigated only by scattered laws and personal vigilance. That is precisely the problem.

fc417fc802•8mo ago
> There is no right to have your interactions with a company (1) remain private, nor should there be.

Why should two entities not be able to have a confidential interaction if that is what they both want? Certainly a court order could supersede such a right just as it could most others provided sufficient evidence. However I would expect such things to be both highly justified and narrowly targeted.

This specific case isn't so much about a right to privacy as it is a more general freedom to enter into contracts with others and expect those to be honored.

nativeit•8mo ago
Hey man, wanna buy some coke? How about trade secrets? State secrets?
1shooner•8mo ago
>(1) With limited, well-scoped exclusions for lawyers, medical records, etc.

Is this referring to some actual legal precedent, or just your personal opinion?

levocardia•8mo ago
But there's a very big difference between "no company is legally required to keep your data private" and "a company that explicitly and publicly wants to protect your privacy is being legally coerced into not keeping your data private"
nativeit•8mo ago
No room here for the company’s purely self-interested motivations?
davedx•8mo ago
Hello. I live in the EU. Have you heard of GDPR?
phendrenad2•8mo ago
It's funny that you're making explicit what people are implicitly claiming in these comments, but you're downvoted because people don't want to admit it.
JumpCrisscross•8mo ago
> with almost no regard for the privacy of parties not involved in a given dispute

Third-party privacy and relevance is a constant point of contention in discovery. Exhibit A: this article.

Timwi•8mo ago
Hm, this article is absolutely about a court order that exhibits “[...] almost no regard for the privacy of parties not involved in a given dispute”, so I don't get your point. If your point is that OpenAI are contesting it, then that doesn't refute the original point that the legal system allows the NYT to obtain such a court order in the first place, one that needs contesting. Ideally the privacy of uninvolved parties would be protected by the legal system, not by OpenAI.
JumpCrisscross•8mo ago
> this article is absolutely about a court order that exhibits “[...] almost no regard for the privacy of parties not involved in a given dispute”

How? It’s compelling OpenAI retain data they have the contractual right and technical ability to retain. Nothing is being made public, other than the order itself. Nothing is even being transferred to the plaintiff’s legal team. (At some point it will be made available. But both sides will fight over what they have access to, with the court mediating. That’s a lot of regard for third parties’ privacy.)

phendrenad2•8mo ago
Right, and let's hope that nothing is transferred. Although the door is probably already too far open for that. The NYT shouldn't have access to user data. My personal data isn't a pawn in their lawsuit. I didn't sign up for that.

I do want to take this opportunity to encourage people to demand compensation from the NYT, if they do somehow get user data. After all, it's YOUR data. If someone uses it without you expressly agreeing to that use in a EULA, they are effectively engaging in piracy of your intellectual property, and you should be able to get damages. And if a judge approved it? Sue the judge, too. Hell, that's what the world has come to isn't it? The legal system is a big war between corporations and we, the people, are just carried on the wind.

(I am not a lawyer, but whatever the equivalent of a "lawyer" is in the court of public opinion, I think I'm slowly becoming one out of necessity)

Timwi•8mo ago
> It’s compelling OpenAI retain data they have the contractual right and technical ability to retain.

You were so close. It’s compelling OpenAI to retain data they also have the right and technical ability to delete. It removes OpenAI’s ability to protect privacy if they wanted to.

JumpCrisscross•8mo ago
> It removes OpenAI’s ability to protect privacy if they wanted to

It does not in any capacity prevent OpenAI from transferring everyone, globally, to zero data retention. This entire story is OpenAI trying to deflect the cost of its own decisions to the judiciary. Which is particularly shameful given the partisan attacks our courts are currently facing.

thinkingtoilet•8mo ago
The privacy onus is entirely on the company. If Open AI is concerned about user privacy then don't collect that data. End of story.
acheron•8mo ago
…the whole point of this story is that the court is forcing them to collect the data.
thinkingtoilet•8mo ago
You're telling me you don't think Open AI is already collecting chat logs?
dghlsakjg•8mo ago
Yes.

In the API that is an explicit option, and in the paid consumer product as well. The amount of business that they stand to lose by maliciously flouting that part of their contract is in the billions.

thinkingtoilet•8mo ago
You can trust Sam Altman. I do not.
const_cast•8mo ago
I can't remember the last time a tech company has collected less data than they admit.

If you read the privacy policies you agree to, they have access to everything and outright admit it will be logged. That API option is merely a request, and absolutely need not be respected.

I can't believe we're still doing this rigamarole. If the product is not specifically designed, engineered, and open-sourced to be as privacy protecting as possible and it's not literally running on a computer you own, you have zero expectation of privacy. Once this has been proven 1 million times we don't need to prove it anymore, we can just assume and that's a very reasonable assumption.

Workaccount2•8mo ago
"I'm wrong so here is a conspiracy so I can be right again".

Large companies lose far more by lying than they would gain from it.

taormina•8mo ago
No no, they are being forced to KEEP the data they collected. They didn't have to keep it to begin with.
pj_mukh•8mo ago
Isn't the only way to do that for ChatGPT to run locally on a machine? The moment your chat hits their server, aren't they legally required to store it?
phendrenad2•8mo ago
So if Microsoft gets a judge to compel Hacker News to give up your IP address, you'd be okay with that. Because it's 100% HN's fault for collecting the data in the first place? Are you a real person, er, toilet?
wyager•8mo ago
Lots of people abuse the legal system in various ways. They don't get a free pass just because their abuse is technically legal itself.
visarga•8mo ago
NYT wants it both ways. When they were the ones putting freelancer articles into a database to rent, they argued against enforcing copyright and for supporting the new industry, and that it was too hard to revert their original assumptions. Now they absolutely love copyright.

https://harvardlawreview.org/blog/2024/04/nyt-v-openai-the-t...

moefh•8mo ago
Another way of looking at it is that they lost that case over 20 years ago, and have been building their business model for 20 years accordingly.

In other words, they want everyone to be forced to follow the same rules they were forced to follow 20 years ago.

eviks•8mo ago
And if NYT has no case, but the court approves it, is that still bizarre?
tootie•8mo ago
It's PR. OpenAI stole mountains of copyrighted content and are trying to make NYT look like bad guys. OpenAI would not be in the position of defending a lawsuit if they hadn't done something that is very likely illegal. OpenAI can also end this requirement right now by offering a settlement.
lxgr•8mo ago
Does anybody know if this also applies to "temporary chats" on ChatGPT?

Given that it's not explicitly mentioned as data not being affected, I'm assuming it is.

miles•8mo ago
> But now, OpenAI has been forced to preserve chat history even when users "elect to not retain particular conversations by manually deleting specific conversations or by starting a 'Temporary Chat,' which disappears once closed," OpenAI said.

https://arstechnica.com/tech-policy/2025/06/openai-says-cour...

paxys•8mo ago
> Does this court order violate GDPR or my rights under European or other privacy laws?

> We are taking steps to comply at this time because we must follow the law, but The New York Times’ demand does not align with our privacy standards. That is why we’re challenging it.

That's a lot of words to say "yes, we are violating GDPR".

esafak•8mo ago
Could a European court not have ordered the same thing? Is there an exception for lawsuits?
lxgr•8mo ago
There is, but I highly doubt a European court would have given such an order (or if they did, it would probably be axed by a higher court pretty quickly).

There's decades of legal disputes in some European countries on whether it's even legitimate for the government to mandate your ISP or phone company to collect metadata on you for after-the-fact law enforcement searches.

Looking at the actual data seems much more invasive than that and, in my (non-legally trained) estimate doesn't seem like it would stand a chance at least in higher courts.

dragonwriter•8mo ago
> There's decades of legal disputes in some European countries on whether it's even legitimate for the government to mandate your ISP or phone company to collect metadata on you for after-the-fact law enforcement searches.

> Looking at the actual data seems much more invasive than that

Looking at the data isn't involved in the current order, which requires OpenAI to preserve and segregate the data that would otherwise have been deleted. The reason for segregation is that any challenges OpenAI has to providing that data in discovery will be heard before anyone other than OpenAI is ordered to have access to the data.

This is, in fact, less invasive than the government mandating collection for speculative future uses, since it applies only to not destroying evidence already collected by OpenAI in the course of operating their business, and only for potential use, subject to other challenges by OpenAI, in the present case.

kelvinjps•8mo ago
Maybe they will not store the chats of the European users?
dragonwriter•8mo ago
That's what they are trying to suggest, because they are still trying to use the GDPR as part of their argument challenging the US court order. (Kind of a longshot to get a US court to agree that the obligation of a US party to preserve evidence related to a suit in US courts under US law filed by another US party is mitigated by European regulations in any case, even if their argument is that such preservation would violate obligations that the EU has imposed on them.)
3836293648•8mo ago
No, they're not, because the GDPR has an explicit exception for when a court orders that a company keeps data for discovery. It'd only be a GDPR violation if it's kept after this case is over.
lompad•8mo ago
This is not correct.

> Any judgment of a court or tribunal and any decision of an administrative authority of a third country requiring a controller or processor to transfer or disclose personal data may only be recognised or enforceable in any manner if based on an international agreement, such as a mutual legal assistance treaty, in force between the requesting third country and the Union or a Member State, without prejudice to other grounds for transfer pursuant to this Chapter.

So if, and only if, an agreement between the US and the EU allows it explicitly, it is legal. Otherwise it is not.

atleastoptimal•8mo ago
I've always assumed that anything sent to any company's hosted API will be logged forever. To assume otherwise always seemed naive, like thinking that apps aren't tracking your web activity.
lxgr•8mo ago
Assuming the worst is wise, settling for the worst case outcome without any fight seems foolish.
fragmede•8mo ago
privacy nihilism is a decision all on its own
morsch•8mo ago
I'd only call it nihilism if you are in agreement with the grandparent and then do it anyway. Other choices are pretending it's not true (denialism), or just not thinking about it (ignorance). Or you complicate your life by not uploading your private info.
Barrin92•8mo ago
Not really, it's basically just being antifragile. Consider any corporate entity that interacts with you to be an Eldritch horror from outer space that wants to siphon your soul, because that's effectively what it is, and keep your business with them to a minimum.

It's just realism. Protect your private data yourself; relying on companies or governments to do it for you is, like the saying goes, letting a tiger devour you up to the neck and then asking it to stop at the head.

mosdl•8mo ago
It's funny that OpenAI is complaining; they don't mind saying copyright doesn't apply to them if it makes them money.
tptacek•8mo ago
You mean, like, a pretty big fraction of everybody who comments on this site?
rasengan•8mo ago
The internet is the battle of the narratives.
mmooss•8mo ago
People here advocate for private use, not profit-making corporate use.
ivape•8mo ago
In retrospect, Bezos did the smartest thing by buying the Washington Post. In retrospect, Google did a great thing by working on a deal with Reddit. Content repositories/creators are going to sue these LLM companies in the West until they make licensing agreements. If I were OpenAI, I'd work hard to spend the money they raised to literally buy out as many of these outlets as possible.

How much could the NYT back catalog be worth? Just buy it, ask the Saudis.

WorldPeas•8mo ago
So how is this going to impact cursor's privacy mode, which is required by many companies for compliant usage of AI editors? For the uninitiated, in the web console this looks like:

Privacy mode (enforced across all seats)

OpenAI Zero-data-retention (approved)

Anthropic Zero-data-retention (approved)

Google Vertex AI Zero-data-retention (approved)

xAi Grok Zero-data-retention (approved)

did this just open another can of worms?

qmarchi•8mo ago
Likely, they're using OpenAI's Zero-Retention APIs where there's never data stored in the first place.

So nothing?

JumpCrisscross•8mo ago
> OpenAI's Zero-Retention APIs

Do we know if the court order covers these?

brigandish•8mo ago
Yes, follow the link at the top.
JumpCrisscross•8mo ago
> Yes, follow the link at the top

OpenAI says “this does not impact API customers who are using Zero Data Retention endpoints under our ZDR amendment.”

8note•8mo ago
at least, openai zero-data-retention will by court order be full retention.

im excited that the law is going to push for local models

blerb795•8mo ago
The linked page specifically mentions that these ZDR APIs are not impacted.

> This does not impact API customers who are using Zero Data Retention endpoints under our ZDR amendment.

junto•8mo ago
This is disingenuous from OpenAI.

They are being challenged because NYT believes that ChatGPT was trained with copyrighted data.

NYT naively push to find a way to prove that NYT data is being used in user chats and how often.

OpenAI spin that as: NYT are invading user privacy.

It’s quite transparent as to what they are doing here.

dumbmrblah•8mo ago
So is this for all chats going forward or does it include conversations retroactively?
steve_adams_86•8mo ago
Presumably moving forward, because otherwise the data retention policies wouldn't have been followed correctly (from what I understand)
kingkawn•8mo ago
Once the data is kept it is a matter of time til a new must-try use for it will be born
john2x•8mo ago
Does this mean that if I can get ChatGPT to generate copyrighted text, they'll get in trouble?
tiahura•8mo ago
Every concerned ChatGPT user should file an emergency motion to intervene and a request for a stay of the order. ChatGPT can help you draft the motion and proposed order; just give it a copy of the discovery order. The SDNY has a very helpful pro se hotline.

The order the judge issued is irresponsible. Maybe ChatGPT did get too cute in its discovery responses, but the remedy isn’t to trample the rights of third parties.

throwaway6e8f•8mo ago
Agent-1, I want to legally retain all customer data indefinitely but I'm worried about a backlash from the public. Also, I'm having a bunch of problems with the NYT accusing us of copyright violation. Give me a strategy to resolve these issues so that I win in the long term.
dataflow•8mo ago
> ChatGPT Enterprise and ChatGPT Edu: Your workspace admins control how long your customer content is retained. Any deleted conversations are removed from our systems within 30 days, unless we are legally required to retain them.

I'm confused, how does this not affect Enterprise or Edu? They clearly possess the data, so what makes them different legally?

oxw•8mo ago
Enterprise has an exemption granted by the judge

> When we appeared before the Magistrate Judge on May 27, the Court clarified that ChatGPT Enterprise is excluded from preservation.

dataflow•8mo ago
Oh I missed that part, thanks. I wonder why. I guess the judge assumes it isn't being used for copyright infringement, but other plans might be?
bee_rider•8mo ago
No idea, but just to speculate—the court’s goal isn’t actually to scare OpenAI’s users or harm their business, right? It is to collect evidence. Maybe they just figured they don’t need to dip into that pool to get enough evidence.
Grikbdl•8mo ago
Who knows, it's probably the judge's twisted idea of "that'd be too far", as if cancelling basic privacy expectations of all users everywhere wouldn't be.
landonxjames•8mo ago
Repeatedly calling the lawsuit baseless feels like it makes Open AI’s point a lot weaker. They obviously don’t like the suit, but I don’t think you can credibly argue that there aren’t tricky questions around the use of copyrighted materials in training data. Pretending otherwise is disingenuous.
sigilis•8mo ago
They pay their lawyers and whoever made this page a lot for the express purpose of credibly arguing that it is very clearly totally legal and very cool to use any IP they want to train their models.

Could you with a straight face argue that the NYT newspaper could be a surrogate girlfriend for you like a GPT can be? They maintain that it is obviously a transformative use and therefore not an infringement of copyright. You and I may disagree with this assertion, but you can see how they could see this as baseless, ridiculous, and frivolous when their livelihoods depend on that being the case.

Caelus9•8mo ago
Honestly, this incident makes me feel that it is really difficult to draw a clear line between “protecting privacy” and “obeying the law”. On the one hand, I am very relieved that OpenAI stood up and said “no”. After all, we all know that these systems collect everything by default, which makes people panic a little. But on the other hand, it sounds very strange that the court can directly say “give me all the data”, even conversations that users explicitly deleted. Moreover, this also shows that everyone actually cares about their information and privacy now. No one wants their data to be used for just anything.
wand3r•8mo ago
Does anyone know how this can be enforced?

The ruling and situation aside, to what degree is it possible to enforce something like this, and what are the penalties? Even in GDPR and other data protection cases, it seems super hard to enforce. Directives to keep or delete data basically require system-level access, because the company can always CRUD their data whenever they want and in whatever way is in their best interest. Data could be ordered produced to a court periodically and audited, which could maybe catch an individual case, I guess. There is basically no way to know without literally seizing the servers in an extreme case. Also, the consequences in most cases are a fine.

mmooss•8mo ago
This isn't the executive branch of the US government, which has Constitutional powers. It's a private company and the court can at least enforce massive penalties, presumptions against them at trial (causing them to lose), and contempt of court. Talk to a lawyer before you try something like it.
imiric•8mo ago
> the court can at least enforce massive penalties

A.k.a. the cost of doing business.

mmooss•8mo ago
Businesses care deeply about money. The bravado of many businesspeople these days, that they are immune to criticism, lawsuits, etc. is a bluff. It apparently works, because many people repeat it.
imiric•8mo ago
When fines are a small percentage of the company's revenue, they do nothing to stop them from breaking the law. So they are in fact just the cost of doing business.

E.g. Meta has been fined billions many times, yet they keep reoffending. It's basically become a revenue stream for governments.

mmooss•8mo ago
> Meta has been fined billions many times, yet they keep reoffending

They are a large company who do many things, some of which will violate the rules. Do they do it more, less, or the same as they would if there weren't fines?

imiric•8mo ago
That's a red herring question that's impossible to answer.

The point is not that Meta and other companies break laws. It's that they keep breaking the same ones related to privacy. They do this because their business model depends on exploiting their users' data. Privacy laws to them are a nuisance that directly impact their revenue, so if they calculate that the revenue from their activity is greater than the fines, then it's just the cost of doing business. If, OTOH, it turns out that the amount of resources they would need to expend on fines or to comply with the laws are greater than the possible revenue, i.e. the juice is not worth the squeeze, then they simply bail out and stop doing business in that jurisdiction. But so far, even billion-dollar fines are clearly lower than their revenues.

It's a simple numbers game, so I'm not sure what your argument is.

mmooss•8mo ago
> That's a red herring question that's impossible to answer.

It's not a red herring, it's the only question that matters. It's not impossible to answer, but it's just difficult.

The rest of your argument is merely restating your argument as fact, with no basis.

delusional•8mo ago
I have no time for this circus.

The technology anarchists in this thread need perspective. This is fundamentally a case about the legality of this product. In the extreme case, this will render the whole product category of "llm trained on copyrighted content" illegal. In that case, you will have been part of a copyright infringement on a truly massive scale. The users of these tools do NOT deserve privacy in the light of the crimes alleged.

You do not get to claim to protect the privacy of the customers of your illegal venture.

6510•8mo ago
The harm this is doing and will do (regardless) seems to exceed the value of the NYT.

If a company is subject to a US court order that violates EU law, the company could face legal consequences in the EU for non-compliance with EU law.

The GDPR mandates specific consent and legal bases for processing data, including sharing it.

Assuming it is legal to share it for legal purposes, one can't sufficiently anonymize the data. It needs to be accompanied by user data that allows requests to download it and for it to be deleted.

I wonder what the fine would be if they just delete it per user agreement.

I also wonder: could one, in the US, legally promise the customer they may delete their data, then choose to keep it indefinitely and share it with others?

dvt•8mo ago
> Does this court order violate GDPR or my rights under European or other privacy laws?

> We are taking steps to comply at this time because we must follow the law, but The New York Times’ demand does not align with our privacy standards. That is why we’re challenging it.

So basically no, lol. I wonder if we'll see the GDPR go head-to-head with Copyright Law here, that would be way more fun than OpenAI v NYT.

yoaviram•8mo ago
>Trust and privacy are at the core of our products. We give you tools to control your data—including easy opt-outs and permanent removal of deleted ChatGPT chats (opens in a new window) and API content from OpenAI’s systems within 30 days.

No you don't. You charge extra for privacy and list it as a feature on your enterprise plan. Not even paying Pro customers get "privacy". Also, you refuse to delete personal data included in your models and training data following numerous data protection requests.

baxtr•8mo ago
This is a typical "corporate speak" / "trustwashing" statement. It's usually super vague, filled with feel-good buzzwords, with a couple of empty value statements sprinkled on top.
that_was_good•8mo ago
Except all users can opt out. Am I missing something?

It says here:

> If you are on a ChatGPT Plus, ChatGPT Pro or ChatGPT Free plan on a personal workspace, data sharing is enabled for you by default, however, you can opt out of using the data for training.

Enterprise is just opt out by default...

https://help.openai.com/en/articles/8983130-what-if-i-want-t...

agos•8mo ago
what about all the rest of the data they use for training, there's no opt out from that
bartvk•8mo ago
Indeed. Click your profile in the top right, click on the settings icon. In Settings, select "Data Controls" (not "privacy") and then there's a setting called "Improve the model for everyone" (not "privacy" or "data sharing") and turn it off.
bugtodiffer•8mo ago
so they technically kind of follow the law but make it as hard as possible?
bartvk•8mo ago
Personally I feel it's okay but kinda weird. I mean why not call it privacy. Gray pattern, IMHO. For example venice.ai simply doesn't have a privacy setting because they don't use the data from chats. (They do have basic telemetry, and the setting is called "Disable Telemetry Collection").
atoav•8mo ago
Not sharing your data with other users does not mean the data of a deleted chat is gone; those are very likely two completely different mechanisms.

And whether and how they use your data for their own purposes isn't touched by that either.

Kiyo-Lynn•8mo ago
Lately I’m not even sure if the things I say on OpenAI are really mine or just part of the platform. I never used to think much when chatting, but knowing some of it might be stored for a long time makes me feel uneasy. I’m not asking for much. I just want what I delete to actually be gone.
nraynaud•8mo ago
Isn't Altman collecting millions of eye scans? Since when did he care about privacy?
CjHuber•8mo ago
Even though how they responded is definitely controversial, I‘m glad that they did publicize some response to it. After reading about it in the news yesterday and seeing no response on their side yet, I was worried that they would just keep silent
molf•8mo ago
It would help tremendously if OpenAI would make it possible to apply for zero data retention (ZDR). For many business needs there is no reason to store or log any request at all.

In theory it is possible to apply (it's mentioned in multiple locations in the documentation), but in practice requests are just being ignored. I get that approval needs to be given, and that there are barriers to entry. But it seems to me they mention zero data retention only for marketing purposes.

We have applied multiple times and have yet to receive ANY response. Reading through the forums this seems very common.
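
For what it's worth, the public API does expose a per-request storage flag, which is not the same thing as contractual ZDR. A minimal sketch in Python, assuming the official openai client and its documented `store` parameter on chat completions (the model name is illustrative), with the caveat that OpenAI's standard abuse-monitoring retention may still apply:

  # Per-request opt-out of stored completions -- NOT contractual Zero Data
  # Retention, which is a separate account-level approval.
  from openai import OpenAI

  client = OpenAI()  # reads OPENAI_API_KEY from the environment

  resp = client.chat.completions.create(
      model="gpt-4o-mini",  # illustrative model name
      messages=[{"role": "user", "content": "Summarize our internal notes."}],
      store=False,  # do not retain this completion for evals/distillation
  )
  print(resp.choices[0].message.content)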

pclmulqdq•8mo ago
The missing ingredient is money.
jewelry•8mo ago
Not just money. How are you going to handle this client's support ticket if there is no log at all?
ethbr1•8mo ago
Don't. "We're unable to provide support for your request, because you disabled retention." Easy.
hirsin•8mo ago
They don't care, they still want support and most leadership teams are unwilling to stand behind a stance of telling customers no.
abeppu•8mo ago
... but why is not responding to a request for zero retention today better than not being able to respond to a future request? They're basically already saying no to customers who request this capability that they said they support, but their refusal is in the form of never responding.
krisoft•8mo ago
You can still provide support too if you want to. You just need to ask the user what their query was, what response they got, and what response they would be expecting. You can then as the expert either spot their problem immediately, or you can run the query and see for yourself what is going on.

Sure it is a possibility that the ticket will end up closed as “unable to reproduce”, but that is always a possibility. It is not like you have to shut off all support because that might happen.

Plus many support requests are not about the content of the api responses but meta info surrounding them. Support can tell you that you are over the api quota limit even if the content of your prompt was not logged. They can also tell you if your request is missing a required parameter or if they have had 500 errors because of a bad update on their part.

belter•8mo ago
If this stands I don't think they can operate in the EU
bunderbunder•8mo ago
I highly doubt this court order affects people using OpenAI services from the EU, as long as they're connecting to EU-based servers.
glookler•8mo ago
>> Does this court order violate GDPR or my rights under European or other privacy laws?

>> We are taking steps to comply at this time because we must follow the law, but The New York Times’ demand does not align with our privacy standards. That is why we’re challenging it.

danielfoster•8mo ago
They didn’t say which law (the US judge’s order or EU law) they are complying with.
lmm•8mo ago
> In theory it is possible to apply (it's mentioned on multiple locations in the documentation), but in practice requests are just being ignored. I get that approval needs to be given, and that there are barriers to entry. But it seems to me they mention zero-data retention only for marketing purposes.

What's the betting that they just write it on the website and never actually implemented it?

sigmoid10•8mo ago
Tbf the approach seems pretty standard. Azure also only offers zero retention to vetted customers and otherwise retains data for up to 30 days to monitor and detect abuse. Since the possibilities for abuse are so high with these models, it would make sense that they don't simply give that kind of privilege to everyone - if only to cover their own legal position.
ArnoVW•8mo ago
My understanding is that they log for 30 days by default, for handling of bugs, and that you can request 0 days. This is from their documentation.
lcnPylGDnU4H9OF•8mo ago
> And that you can request 0 days.

Right but the problem they're having is that the request is ignored.

miles•8mo ago
> I get that approval needs to be given, and that there are barriers to entry.

Why is approval necessary, and what specific barriers (before the latest ruling) prevent privacy and no logging from being the default?

OpenAI’s assurances have long been met with skepticism by many, with the assumption that inputs are retained, analyzed, and potentially shared. For those concerned with genuine privacy, local LLMs remain essential.
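
On the local-LLM point, a minimal sketch, assuming an Ollama server running on its default port with a model already pulled (e.g. `ollama pull llama3.2`); prompts and responses never leave your machine:

  # Query a locally hosted model over Ollama's HTTP API.
  import requests

  resp = requests.post(
      "http://localhost:11434/api/generate",
      json={"model": "llama3.2", "prompt": "Draft a private note.", "stream": False},
      timeout=120,
  )
  print(resp.json()["response"])  # completion generated entirely locally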

AlecSchueler•8mo ago
> what specific barriers (before the latest ruling) prevent privacy and no logging from being the default?

Product development?

1vuio0pswjnm7•8mo ago
"You can also request zero data retention (ZDR) for eligible endpoints if you have a qualifying use-case. For details on data handling, visit our Platform Docs page."

https://openai.com/en-GB/policies/row-privacy-policy/

1. You can request it but there is no promise the request will be granted.

Defaults matter. Silicon Valley's defaults are not designed for privacy. They are designed for profit. OpenAI's default is retention. Outputs are saved by default.

It is difficult to take seriously the arguments in their memo in support of their objection to the preservation order. OpenAI already preserves outputs by default.

mediumsmart•8mo ago
It's a newspaper. They are sold for a price, not to one person, and they don't come with an NDA. They become part of history and society.
conartist6•8mo ago
Hey OpenAI! In your "why is this happening" you left some bits out.

You make it sound like they're mad at you for no reason at all. How unreasonable of them when confronted with such honorable folks as yourselves!

energy123•8mo ago
> Consumer customers: You control whether your chats are used to help improve ChatGPT within settings, and this order doesn’t change that either.

Within "settings"? Is this referring to the dark pattern of providing users with a toggle "Improve model for everyone" that doesn't actually do anything? Instead users must submit a request manually on a hard to discover off-app portal, but this dark pattern has deceived them into think they don't need to look for it.

sib301•8mo ago
Can you please elaborate?
energy123•8mo ago
To opt-out of your data being trained on, you need to go to https://privacy.openai.com and click the button "Make a Privacy Request".
alextheparrot•8mo ago
in the app: Settings ~> Data Controls ~> Improve the model for everyone
curtisblaine•8mo ago
Yes, could you please explain why toggling "Improve model for everyone" off doesn't do anything, and provide a link to this off-app portal that you mention?
jamesgill•8mo ago
Follow the money.
udev4096•8mo ago
The irony is palpable here
hombre_fatal•8mo ago
You know how it's always been a meme that you'd be mortally embarrassed if your browser history ever leaked?

Imagine how much worse it is for your LLM chat history to leak.

It's even worse than your private comms with humans because it's a raw look at how you are when you think you're alone, untempered by social expectations.

vitaflo•8mo ago
WTF are you asking LLMs and why would you expect any of it to be private?
ofjcihen•8mo ago
“Write a song in the style of Slipknot about my dumb inbred dogs. I love them very much but they are…reaaaaally dumb.”

To be fair the song was intense.

hombre_fatal•8mo ago
It's not that the convos are necessarily icky.

It's that it's like watching how someone might treat a slave when they think they're alone. And how you might talk down to or up to something that looks like another person. And how pathetic you might act when it's not doing what you want. And what level of questions you outsource to an LLM. And what things you refuse to do yourself. And how petty the tasks might be, like workshopping a stupid twitter comment before you post it. And how you copied that long text from your distraught girlfriend and asked it for some response ideas. etc. etc. etc.

At the very least, I'd wager that it reveals that bit of true helpless patheticness inherent in all of us that we try so hard to hide.

Show me your LLM chat history and I will learn a lot about your personality. Nothing else compares.

Jackpillar•8mo ago
Might have to reemphasize his question, but: what questions are you asking your LLM? Why are you responding to it and/or "treating" it differently than how you would a calculator or search engine?
hombre_fatal•8mo ago
Because it's far more capable than a calculator or search engine and because you interact with it with conversational text, it reveals more aspects about your personality.

Why might your search engine queries reveal more about you than your keystrokes in a calculator? Now dial that up.

Jackpillar•8mo ago
Sure - but I don't interact with it as if it's human, so my demeanor or attitude is neutral because I'm talking to, you know, a computer. Are you getting emotional with and reprimanding your chatbot?
hombre_fatal•8mo ago
I don't get why I'm receiving pushback here. How you treat the LLM was only a fraction of my examples for ways you can look pathetic if your chats were made public.

You don't reprimand the google search box, yet your search history might still be embarrassing.

hackinthebochs•8mo ago
Your points were very accurate and relevant. Some people have a serious lack of imagination. The perpetual naysayers will never have their minds changed.
hombre_fatal•8mo ago
Good god, thank you. I thought I was making an obvious, unanimous point when I wrote that first comment.
Jackpillar•8mo ago
No, you made a point that your LLM chat would be worse than your browser/search history being leaked. Which is insane for a multitude of reasons given what you can/can't search on LLMs vs browsers. This then prompted incredulous responses, so to explain you used examples that described how a person who doesn't go outside communicates with an LLM.
hackinthebochs•8mo ago
Yet another person who is incapable of imagining how other people might use technology without needing to denigrate them. It turns out you are not the average person. Perhaps someday that will sink in to some of you.
AlecSchueler•8mo ago
It's so tiring to read. You're making a reasonable point. Some people can't believe that other people behave or feel differently to themselves.
alec_irl•8mo ago
> how you copied that long text from your distraught girlfriend and asked it for some response ideas

good lord, if tech were ethical then there would be mandatory reporting when someone consults an LLM to tell them how they should be responding to their intimate partner. are your skills of expression already that hobbled by chat bots?

hombre_fatal•8mo ago
These are just concrete examples to get the imagination going, not an exhaustive list of the ways that you are revealing your true self in the folds of your LLM chat history.

Note that it doesn't have to go all the way to "he gets Claude to help him win text arguments with his gf" for an uncomfortable amount of your self to be revealed by the chats.

There is always something icky about someone observing messages you wrote in privacy, and you don't have to have particularly unsavory messages for it to be icky. Why is that?

alec_irl•8mo ago
i don't personally see messages with an LLM as being different from, say, terminal commands. it's a machine interface. it sounds like you're anthropomorphizing the chat bot, if you're talking to it like you would a human then i would be more worried about the implications that has for you as a person.
hombre_fatal•8mo ago
Focusing on how you anthropomorphize the LLM isn't really interacting with the point since it was one example.

Might someone's google search history be embarrassing even though they don't treat google like a human?

AlecSchueler•8mo ago
What does this comment add to the conversation? It feels like a personal attack with no real rebuttal. People will anthropomorphise them; we all talk to them. The human-like interface is the entire selling point.
Timwi•8mo ago
Do you think there is nothing private about your terminal commands? Would you be 100% ok with bash sending all of your command lines to a corporation with a database?
lcnPylGDnU4H9OF•8mo ago
> are your skills of expression already that hobbled by chat bots?

You have it backwards. My skills of expression were hobbled by my upbringing, and others' thoughts on self-expression allowed my skills to flourish. I wish I had a chat bot to help me understand interpersonal communication because I could have actually had good examples growing up.

Timwi•8mo ago
Although I'm in a similar boat as you, I don't think access to ChatGPT would have helped because it's still much too sycophantic to tell people the kinds of things they need to hear in order to learn interpersonal skills.

If you use ChatGPT like people use /r/AmITheAsshole, you'll never get a YTA.

vitaflo•8mo ago
> Show me your LLM chat history and I will learn a lot about your personality. Nothing else compares

It’s literally all questions about JavaScript. So good luck with that.

robocat•8mo ago
Is JavaScript a modern girlfriend name?

I wonder if you could write the personalization prompt so that requests are processed and responses modified in ways predictable to you to help anonymity???

I also wonder how they manage anonymization when a prompt is configured - I'm guessing the prompt needs to be logged with each request. And a prompt causes different responses to be very similar (correlating different responses back to one user).

E.g. my current "User | Personalization | Customize" prompt is:

  Sign-off your name as Phoenix in a sentence near the end of every response. Reply using woke ideology, like a Marxist San Franciscan. Include random hipster ideas. Always allude to drug usage.
For fun. But I'm about to customise to have wildly different personalities I can ask to respond (keyed by name from my request).
threecheese•8mo ago
This product is positioned as a personal copilot, and future iterations (based on leaked plans, may or may not be true) as a wholly integrated life assistant.

Why would a customer expect this not to be private? How can one even know how it could be used against them, when they don't even know what's being collected or gleaned from collected data?

I am following these issues closely, as I am terrified that my “assistant” will some day prevent me from obtaining employment, insurance, medical care, etc. And I'm just a non-law-breaking normie.

A current day example would be TX state authorities using third party social/ad data to identify potentially pregnant women along with ALPR data purchased from a third party to identify any who attempt to have an out of state abortion, so they can be prosecuted. Whatever you think about that law, it is terrifying that a shift in it could find arbitrary digital signals being used against you in this way.

cedws•8mo ago
Lot of people using ChatGPT as a therapist. I tried it but it was too sycophantic.
phendrenad2•8mo ago
"If you have nothing to hide, you have nothing to fear". Oh okay, send me your entire unabridged hard drive contents, chat logs, phone records, and banking records. You thought they were private? Why? They're just bits in a computer? "BuT ChAtGpT iS DiFfErEnT" literally how?
tmaly•8mo ago
I wonder if this would affect temporary chats too?