So user privacy is definitely implicated.
> OpenAI must process your request solely for the purpose of fulfilling it and not store your request or any responses it provides unless required under applicable laws. OpenAI also must not use your request to improve or train its models.
— https://www.apple.com/legal/privacy/data/en/chatgpt-extensio...
I wonder if we’ll end up seeing Apple dragged into this lawsuit. I’m sure after telling their users it’s private, they won’t be happy about everything getting logged, even if they do have that caveat in there about complying with laws.
The ZDR APIs are not and will not be logged. The linked page is clear about that.
> The content covered by the court order is stored separately in a secure system. It’s protected under legal hold, meaning it can’t be accessed or used for purposes other than meeting legal obligations.
That's horse shit and OpenAI knows it. It means no such thing. A legal hold is just a 'preservation order'. It says absolutely nothing about other access or use.
A legal hold requires no such thing and there would be no such requirement in it. They are perfectly free to access and use it for any reason.
The GDPR does not say that you can never be proven to have done something wrong in a court of law.
OpenAI slams court order to save all ChatGPT logs, including deleted chats - https://news.ycombinator.com/item?id=44185913 - June 2025 (878 comments)
If you don’t retain that data you’re destroying evidence for the case.
It’s not like the data is going to be given to anyone, it’s only going to be used for limited legal purposes for the lawsuit (as OpenAI confirms in this article).
And honestly, OpenAI should have just not used copyrighted data illegally and they would have never had this problem. I saw NYT’s filing and it had very compelling evidence that you could get ChatGPT to distribute verbatim copyrighted text from the Times without citation.
The whole premise of the lawsuit is that they didn't do anything unlawful, so saying "just do what the NYT wanted you to do" isn't interesting.
The NYT made an argument to a judge about what they think is going on and how they think the copyright infringement is taking place and harming them. In their filings and hearings they present the reasoning and evidence they have that leads them to believe that a violation is occurring. The court makes a judgment on whether or not to order OpenAI to preserve and disclose information relevant to the case to the court.
It's not "just do what NYT wanted you to do," it's "do what the court orders you to do based on a lawsuit brought by a plaintiff and argued to the court."
I suggest you read the court filing: https://nytco-assets.nytimes.com/2023/12/NYT_Complaint_Dec20...
> It’s not like the data is going to be given to anyone, it’s only going to be used for limited legal purposes for the lawsuit (as OpenAI confirms in this article).
Nobody other than both parties to the case, their lawyers, the court, and whatever case file storage system they use. In my view, that's already way too much given the amount and value of this data.
I don't believe you would be considered to be violating the GDPR if you are complying with another court order, because you are presumably making a best effort to comply with the GDPR besides that court order.
You're saying it's unreasonable to store data somewhere for a pending court case? Conceptually you're saying that you can't preserve data for trials because the filing cabinets might see the information. That's ridiculous, if that was true then it would be impossible to perform discovery and get anything done in court.
It most likely depends on the exact circumstances. I could absolutely imagine a European court deciding that, sorry, but if you have to answer to a court decision incompatible with European privacy laws, you can't offer services to European residents anymore.
> You're saying it's unreasonable to store data somewhere for a pending court case?
I'm saying it can be, depending on how much personal and/or unrelated data gets tangled up in it. That seems to be the case here.
> Conceptually you're saying that you can't preserve data for trials because the filing cabinets might see the information.
I'm only saying that there should be proportionality. A court having access to all facts relevant to a case is important, but it's not the only important thing in the world.
Otherwise, we could easily end up with a Dirk-Gently-esque court that, based on the principle that everything is connected to everything, will just demand access to all the data in the world.
When it comes to the GDPR, courts have generally taken the stance that it does not override their orders:
Ironburg Inventions, Ltd. v. Valve Corp.
Finjan, Inc. v. Zscaler, Inc.
Corel Software, LLC v. Microsoft
Rollins Ranches, LLC v. Watson
In none of these cases was a GDPR fine issued.
Imagine a lawsuit against Signal that claimed some nefarious activity, harmful to the plaintiff, was occurring broadly in chats. The plaintiff can claim, like NYT, that it might be necessary to examine private chats in the future to make a determination about some aspect of the lawsuit, and the judge can then order Signal to find a way to retain all chats for potential review.
However you feel about OpenAI, this is not a good precedent for user privacy and security.
Also OpenAI was never E2EE to begin with. They were already retaining logs for some period of time.
My personal view is that the court order is overly broad and disregards potential impacts on end users but it's nonetheless important to be accurate about what is and isn't happening here.
For example, if the trial happens to find data that some chats include crimes committed by users in their private chats, the court can't just send police to your door based on that information since the information is only being used in the context of an intellectual property lawsuit.
Remember that privacy rights are legitimate rights but they change a lot when you're in the context of an investigation/court proceeding. E.g., the right of police to enter and search your home changes a lot when they get a court issued warrant.
The whole point of E2EE services from the perspective of privacy-conscious customers is that a court can get a warrant for data from those companies but they'll only be able to produce encrypted blobs with no access to decryption keys. OpenAI was never an E2EE service, so customers have to expect that a court order could surface their data to someone else's eyes at some point.
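To make that concrete, here's a minimal sketch (Python, using the cryptography package's Fernet primitive; real E2EE systems use key-exchange and ratcheting protocols rather than one symmetric key, and the names here are made up for illustration) of why a warrant served on an E2EE provider only yields ciphertext:

    from cryptography.fernet import Fernet

    # Client side: the key is generated and kept on the user's device only.
    client_key = Fernet.generate_key()
    cipher = Fernet(client_key)

    blob = cipher.encrypt(b"private chat message")

    # Server side: an E2EE provider stores only the opaque blob.
    server_storage = {"user123": blob}

    # A court order against the provider can surface server_storage, but
    # without client_key neither the provider nor the court can read it.
    # A non-E2EE service holds the plaintext itself, so a preservation
    # order reaches readable content directly.
    assert cipher.decrypt(server_storage["user123"]) == b"private chat message"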
The court isn't saying "preserve this data forever and ever and compromise everyone's privacy," they're saying "preserve this data for the purposes of this court while we perform an investigation."
IMO, the NYT has a very good argument here that the only way to determine the scope of the copyright infringement is to analyze requests and responses made by every single customer. Like I said in my original comment, the remedies for copyright infringement are on a per-infringement basis. E.g., every time someone on LimeWire downloads Song 2 by Blur from your PC, you've committed one instance of copyright infringement. My interpretation is that NYT wants the court to find out how many times customers have received ChatGPT responses that include verbatim New York Times content.
> The content covered by the court order is stored separately in a secure system. It’s protected under legal hold, meaning it can’t be accessed or used for purposes other than meeting legal obligations.
> Only a small, audited OpenAI legal and security team would be able to access this data as necessary to comply with our legal obligations.
So, by OpenAI's own admission, they are taking abundant and presumably effective steps to protect user privacy here? In the unlikely event that this data did somehow leak, I'd personally be blaming OpenAI, not the NYT.
Some of the other language in this post, like repeatedly calling the lawsuit "baseless", really makes this just read like an unconvincing attempt at a spin piece. Nothing to see here.
LLMs are not massive archives of data. The big models are a few TB in size. No one is forgoing a NYT subscription because they can ask ChatGPT to print out NYT news stories.
Including lies.
I'd like to aim a little higher, maybe towards expecting correspondence with reality?
IOW, yes, there is no law that OpenAI can't try to spin this. But it's still a shitty, non-factually-based choice to make.
I’m not aware of any definition of “spin” where being conventional is a defense against that accusation. Actually, that was the (imagined) value-add of the show, that conventional corporate and political messaging is heavily spun.
2) I don’t understand the distinction being made between voluntary or involuntary, in the sense that a corporation is a thing made up of people; it doesn’t have a will in-and-of-itself, so the communications it sends must always actually be made by somebody inside the corporation (whether a lawyer, marketing person, or in the unlikely event that somebody lets them out, an engineer).
Simply denying the allegations isn't really spinning anything; it's just denying the allegations. And the thing I dislike about characterizing something like this as spin is that it defangs the term by removing all those connotations and instead turning it into just a buzzwordy way of saying, "I disagree with what this person said."
It seems to me that the discussion of whether or not it is spin has turned into a discussion of which party people basically agree with.
My personal opinion is that OpenAI will probably win, or at least get away with a pretty minor fine or something like that. However, the communications coming from both parties in the case should be assumed to be corporate spin until proven otherwise. And, calling an unfinished case baseless is, at least, a bit presumptuous!
It can be both. It clearly spins the lawsuit - it doesn't present the NYT's side at all.
And per their own terms they likely only delete messages "when they want to" given the big catch-alls. "What happens when you delete a chat? -> It is scheduled for permanent deletion from OpenAI's systems within 30 days, unless: It has already been de-identified and disassociated from your account"[1]
[0] https://techcrunch.com/2024/11/22/openai-accidentally-delete...
[1] https://help.openai.com/en/articles/8809935-how-to-delete-an...
Then again I’m starting to think OpenAI is gathering a cult-leader-like following, where any negative comment results in devoted followers, or those with something to gain, immediately jumping to its defense no matter how flimsy the ground.
From what I can tell from the court filings, prior to the judge's order to retain everything, the request to retain everything was coming from the plaintiff, with openai objecting to the request and refusing to comply in the meantime. If so, it's a bit misleading to characterize this as "deleting things they shouldn’t have", because what they "should have" done wasn't even settled. That's a bit rich coming from someone accusing openai of "spin".
https://techcrunch.com/2024/11/22/openai-accidentally-delete...
Without this devolving into a tit for tat, the article explains, for those following this conversation, why it’s been elevated to a court order and not just an expectation to preserve.
That article does nothing of the sort and, indeed, it is talking about a completely separate incident of deleting data.
Here. I had an LLM summarize it for you.
A court order now requires OpenAI to retain all user data, including deleted ChatGPT chats, as part of the ongoing copyright lawsuit brought by The New York Times (NYT) and other publishers[1][2][6][7]. This order was issued because the NYT argued that evidence of copyright infringement—such as AI outputs closely matching NYT articles—could be lost if OpenAI continued its standard practice of deleting user data after 30 days[2][6][7].
This new requirement is directly related to a 2024 incident where OpenAI accidentally deleted critical data that NYT lawyers had gathered during the discovery process. In that incident, OpenAI engineers erased programs and search result data stored by NYT's legal team on dedicated virtual machines provided for examining OpenAI's training data[3][4][5]. Although OpenAI recovered some of the data, the loss of file structure and names rendered it largely unusable for the lawyers’ purposes[3][5]. The court and NYT lawyers did not believe the deletion was intentional, but it highlighted the risks of relying on OpenAI’s internal data retention and deletion practices during litigation[3][4][5].
The court order to retain all user data is a direct response to concerns that important evidence could be lost—just as it was in the accidental deletion incident[2][6][7]. The order aims to prevent any further loss of potentially relevant information as the case proceeds. OpenAI is appealing the order, arguing it conflicts with user privacy and their established data deletion policies[1][2][6][7].
Sources:
[1] OpenAI Appeals Court Order Requiring Retention of Consumer Data https://www.pymnts.com/artificial-intelligence-2/2025/openai...
[2] ‘An Inappropriate Request’: OpenAI Appeals ChatGPT Data Retention Court Order https://www.eweek.com/news/openai-privacy-appeal-new-york-ti...
[3] OpenAI Deletes Legal Data in a Lawsuit From the New York Times https://www.businessinsider.com/openai-delete-legal-data-law...
[4] NYT vs OpenAI case: OpenAI accidentally deleted case data https://www.medianama.com/2024/11/223-new-york-times-openai-...
[5] New York Times Says OpenAI Erased Potential Lawsuit Evidence https://www.wired.com/story/new-york-times-openai-erased-pot...
[6] How we're responding to The New York Times' data ... - OpenAI https://openai.com/index/response-to-nyt-data-demands/
[7] Why OpenAI Won't Delete Your ChatGPT Chats Anymore: New York ... https://coincentral.com/why-openai-wont-delete-your-chatgpt-...
[8] A Federal Judge Ordered OpenAI to Stop Deleting Data - Adweek https://www.adweek.com/media/a-federal-judge-ordered-openai-...
[9] OpenAI confronts user panic over court-ordered retention of ChatGPT logs https://arstechnica.com/tech-policy/2025/06/openai-confronts...
[10] OpenAI Appeals ‘Sweeping, Unprecedented Order’ Requiring It Maintain All ChatGPT Logs https://gizmodo.com/openai-appeals-sweeping-unprecedented-or...
[11] OpenAI accidentally deleted potential evidence in NY ... - TechCrunch https://techcrunch.com/2024/11/22/openai-accidentally-delete...
[12] OpenAI's Shocking Blunder: Key Evidence Vanishes in NY Times ... https://www.eweek.com/news/openai-deletes-potential-evidence...
[13] Judge allows 'New York Times' copyright case against OpenAI to go ... https://www.npr.org/2025/03/26/nx-s1-5288157/new-york-times-...
[14] OpenAI Data Retention Court Order: Implications for Everybody https://hackernoon.com/openai-data-retention-court-order-imp...
[15] Sam Altman calls for 'AI privilege' as OpenAI clarifies court order to retain temporary and deleted ChatGPT sessions https://venturebeat.com/ai/sam-altman-calls-for-ai-privilege...
[16] Court orders OpenAI to preserve all ChatGPT logs, including deleted ... https://techstartups.com/2025/06/06/court-orders-openai-to-p...
[17] OpenAI deleted NYT copyright case evidence, say lawyers https://www.theregister.com/2024/11/21/new_york_times_lawyer...
[18] OpenAI slams court order to save all ChatGPT logs, including ... https://simonwillison.net/2025/Jun/5/openai-court-order/
[19] OpenAI accidentally deleted potential evidence in New York Times ... https://mashable.com/article/openai-accidentally-deleted-pot...
[20] OpenAI slams court order to save all ChatGPT logs, including deleted chats https://news.ycombinator.com/item?id=44185913
[21] OpenAI slams court order to save all ChatGPT logs, including deleted chats https://arstechnica.com/tech-policy/2025/06/openai-says-cour...
[22] After court order, OpenAI is now preserving all ChatGPT and API logs https://www.reddit.com/r/LocalLLaMA/comments/1l3niws/after_c...
[23] OpenAI accidentally erases potential evidence in training data lawsuit https://www.theverge.com/2024/11/21/24302606/openai-erases-e...
[24] OpenAI "accidentally" erased ChatGPT training findings as lawyers ... https://www.reddit.com/r/aiwars/comments/1gwxr94/openai_acci...
[25] OpenAI appeals data preservation order in NYT copyright case https://www.reuters.com/business/media-telecom/openai-appeal...
https://techcrunch.com/2024/11/22/openai-accidentally-delete...
Gruez said that is talking about an incident in this case but unrelated to the judge's order in question.
You said the article "explains for those following this conversation why it’s been elevated to a court order" but it doesn't actually explain that. It is talking about separate data being deleted in a different context. It is not user chats and access logs. It is the data that was used to train the models.
I pointed that out a second time since it seemed to be misunderstood.
Then you posted an LLM summary of something unrelated to the point being made.
Now we're here.
As you say, one cannot force understanding on another; we all have to do our part. ;)
Edit:
> The court order to retain all user data is a direct response to concerns that important evidence could be lost—just as it was in the accidental deletion incident[2][6][7].
What did you prompt the LLM with for it to reach this conclusion? The [2][6][7] citations similarly don't seem to explain how that incident from months ago informed the judge's recent decision. Anyway, I'm not saying the conclusion is wrong, I'm saying the article you linked does not support the conclusion.
Calm down, cool off, and read it again.
The point is that the circumstances of the incident in 2024 are directly related to the how and why of the NYT lawyers’ request and the judge’s order.
The article I linked was to the incident in 2024.
Not everything has to be about pedantry and snark, even on HN.
Edit: I see you edited your response after re-reading the summarization. I’m glad cooler heads have prevailed.
The prompt was simply “What is the relation, if any, between OpenAI being ordered to retain user data and the incident from 2024 where OpenAI accidentally deleted the NYT lawyers data while they were investigating whether OpenAI had used their data to train their models?”
Just to be clear, the summary is not convincing. I do understand the idea but none of the evidence presented so far suggests that was the reason. The court expected that the data would be retained, the court learned that it was not, the court gave an order for it to be retained. That is the seeming reason for the order.
Put another way: if the incident last year had not happened, the court would still have issued the order currently under discussion.
I am not an Open AI stan, but this needs to be responded to.
The first principle of information security is that all systems can be compromised and the only way to secure data is to not retain it.
This is like saying, "Well, I know they didn't want to go skydiving, but we forced them to go skydiving and they died because they had a stroke mid-air, so it's their fault they died."
Anyone who makes promises about data security is at best incompetent and at worst dishonest.
Shouldn't that be "at best dishonest and at worst incompetent"?
I mean, would you rather be a competent person telling a lie or an incompetent person believing you're competent?
I don't think the Judge is equipped to handle this case if they don't understand how their order jeopardizes the privacy of millions of users worldwide who don't even care about NYT's content or bypassing their paywalls.
Whether or not you care is not relevant, and is usually the case for customers. If a drug company resold an expensive cancer drug without IP, you might say 'their order jeopardizes the health of millions of users worldwide who don't even care about Drug Co's IP.'
If the NYT is right - I can only guess - then you are benefitting from the NYT IP. Why should you get that without their consent and for free - because you don't care?
> (jeopardizes)
... is a strong word. I don't see much risk - the NYT isn't going to de-anonymize users and report on them, or sell the data (which probably would be illegal). They want to see if their content is being used.
I wonder if the laws and legal procedures are written considering this general assumption that a party to a lawsuit will naturally lie if it is in their interest. And then I read articles and comments about a "trust based society"...
I'm not sticking up for OpenAI so much as just for decent, interesting threads here.
I expect it's more about them losing the _case_. Silly to expect someone fighting a lawsuit not to try to win it.
Why would a defendant who agrees a case has merit go to court at all? Much easier (and generally less expensive) to make the other party whole, assuming the parties agree on what "whole" is. And if they don't agree on what "whole" is, we are back to square one and of course you'd maintain that the other side's suit is baseless.
Of course it's out of self-serving interests, but I find it hard to disagree with OpenAI on this one.
(1) With limited well scoped exclusions for lawyers, medical records, etc.
You might have heard of the GDPR, but even before that, several countries had "privacy by default" laws on the books.
Your comment is dystopian, given how the interaction is basically like how some people treat AI as their "friend". Imagine that no matter what encrypted messaging app or whatever they use, the govt still snoops.
Legally that is a correct statement.
If you want that changed, it will require legislation.
We'd have a better chance if anyone with power were talking about court reform to make the Supreme Court justices e.g. drawn by lot for each session from the district courts, but approximately nobody is. It'd be damn good and long overdue reform, but oh well.
And the thing is, we've already had a fairly conservative court for decades. I'm pretty likely to die, even if of old age, never having seen an actually-liberal court in the US my entire life. Like, WTF. Frankly, no wonder so much of our situation is fucked up, backwards, and authoritarianism-friendly. And (sigh) any serious attempts to fix that are basically on hold for many decades more, assuming rule of law survives that long anyway.
[EDIT] My point, in short, is that "we still have [thing], we just have to wait for a liberal court that'll support it" is functionally indistinguishable from not having [thing].
A company like OpenAI that offers a SaaS is no such friend, and in such power dynamics (individual VS company) it's probably in your best interest to have everything public if necessary.
Why tangle the data of people with very different preferences than yours up in that?
First time?
As others have said, in the United States this is, legally, completely correct: there is no right to privacy in American law. Lots of people think the Fourth Amendment is a general right to privacy, and they are wrong: the Fourth Amendment is specifically about government search and seizure, and courts have been largely consistent about saying it does not extend beyond that to e.g. relationships with private parties.
If you want a right to privacy, you will need to advocate for laws to be changed; the ones as they exist now do not give it to you.
As it stands today, a court case (A) affirming the right to use contraception is not equivalent to a court case (B) stating that a phone-company/ISP/site may not sell their records of your activity.
You conflate the absence of a statutory or regulatory regime governing private data transactions with the broader constitutional right to privacy. While it’s true that the Fourth Amendment limits only state action, U.S. constitutional law, via cases like Griswold v. Connecticut and Lawrence v. Texas, clearly recognizes a substantive right to privacy, grounded in the Due Process Clause and other constitutional penumbras. This is not a semantic variant; it is a distinct and judicially enforceable right.
Moreover, beyond constitutional law, the common law explicitly protects privacy through torts such as intrusion upon seclusion, public disclosure of private facts, false light, and appropriation of likeness. These apply to private actors and are recognized in nearly every U.S. jurisdiction.
Thus, while the Constitution may not prohibit a website from selling your data, it does affirm a right to privacy in other, fundamental contexts. To deny that entirely is legally incorrect.
While these grand theories of traditional implicit constitutional law are nice, they're pretty meaningless in a system where five individuals can (and are willing to) vote to invalidate decades of tradition on a whim.
I too want real laws.
The common law torts require a high threshold of offensiveness and are adjudicated on a case-by-case basis in individual jurisdictions. They offer only remedies, not a proactive right to control your data.
The original point, that there is no general right in the US to have your interactions with a company remain private, still stands. That's not a denial of all privacy rights but a recognition that US law fails to provide comprehensive privacy protection.
“As others have said, in the United States this is, legally, completely correct: there is no right to privacy in American law.”
That is an incorrect statement. The common law torts I cited can apply in the context of a business transaction, so your statement is also incorrect.
If your strawman is that in the US there’s no right to privacy because there’s no blanket prohibition on talking about other people, and what they’ve been up to, then run with it.
I completely disagree. Yes, the Prosser privacy torts exist: intrusion upon seclusion, public disclosure, false light, and appropriation. But they are highly fact-specific, hard to win, rarely litigated, not recognized in all jurisdictions, and completely reactive -- you get harmed first, maybe sue later!
They are utterly inadequate to protect people in the modern data economy. A website selling your purchase history? Not actionable. A company logging your AI chats? Not intrusion. These torts are not a privacy regime - they are scraps. Also, when we're talking about basic privacy rights, we're just as concerned with mundane material, not just the "highly offensive" material that the torts would apply to.
If you don’t want the grocery store telling people you buy Coke, don’t shop there.
As for Safeway selling your data: you're admitting that it's on the individual to opt out, negotiate, or avoid the transaction which just highlights the absence of a rights-based framework. The burden is entirely on the consumer to protect themselves, and companies can exploit that asymmetry unless narrowly constrained by statute (and even then, often with exceptions and opt-outs).
What you're describing isn't a right to privacy -- it's a lack of one, mitigated only by scattered laws and personal vigilance. That is precisely the problem.
Why should two entities not be able to have a confidential interaction if that is what they both want? Certainly a court order could supersede such a right just as it could most others provided sufficient evidence. However I would expect such things to be both highly justified and narrowly targeted.
This specific case isn't so much about a right to privacy as it is a more general freedom to enter into contracts with others and expect those to be honored.
Is this referring to some actual legal precedent, or just your personal opinion?
Third-party privacy and relevance is a constant point of contention in discovery. Exhibit A: this article.
How? It’s compelling OpenAI retain data they have the contractual right and technical ability to retain. Nothing is being made public, other than the order itself. Nothing is even being transferred to the plaintiff’s legal team. (At some point it will be made available. But both sides will fight over what they have access to, with the court mediating. That’s a lot of regard for third parties’ privacy.)
I do want to take this opportunity to encourage people to demand compensation from the NYT, if they do somehow get user data. After all, it's YOUR data. If someone uses it without you expressly agreeing to that use in a EULA, they are effectively engaging in piracy of your intellectual property, and you should be able to get damages. And if a judge approved it? Sue the judge, too. Hell, that's what the world has come to isn't it? The legal system is a big war between corporations and we, the people, are just carried on the wind.
(I am not a lawyer, but whatever the equivalent of a "lawyer" is in the court of public opinion, I think I'm slowly becoming one out of necessity)
You were so close. It’s compelling OpenAI to retain data they also have the right and technical ability to delete. It removes OpenAI’s ability to protect privacy if they wanted to.
It does not in any capacity prevent OpenAI from transferring everyone, globally, to zero data retention. This entire story is OpenAI trying to deflect the cost of its own decisions to the judiciary. Which is particularly shameful given the partisan attacks our courts are currently facing.
In the API that is an explicit option, as it is in the paid consumer product as well. The amount of business that they stand to lose by maliciously flouting that part of their contract is in the billions.
If you read the privacy policies you agree to, they have access to everything and outright admit it will be logged. That API option is merely a request, and absolutely need not be respected.
I can't believe we're still doing this rigamarole. If the product is not specifically designed, engineered, and open-sourced to be as privacy protecting as possible and it's not literally running on a computer you own, you have zero expectation of privacy. Once this has been proven 1 million times we don't need to prove it anymore, we can just assume and that's a very reasonable assumption.
Large companies lose far more by lying than they would gain from it.
https://harvardlawreview.org/blog/2024/04/nyt-v-openai-the-t...
In other words, they want everyone to be forced to follow the same rules they were forced to follow 20 years ago.
Given that it's not explicitly mentioned as data that isn't affected, I'm assuming it is affected.
https://arstechnica.com/tech-policy/2025/06/openai-says-cour...
> We are taking steps to comply at this time because we must follow the law, but The New York Times’ demand does not align with our privacy standards. That is why we’re challenging it.
That's a lot of words to say "yes, we are violating GDPR".
There's decades of legal disputes in some European countries on whether it's even legitimate for the government to mandate your ISP or phone company to collect metadata on you for after-the-fact law enforcement searches.
Looking at the actual data seems much more invasive than that and, in my (non-legally trained) estimate doesn't seem like it would stand a chance at least in higher courts.
> Looking at the actual data seems much more invasive than that
Looking at the data isn't involved in the current order, which requires OpenAI to preserve and segregate the data that would otherwise have been deleted. The reason for segregation is that any challenges OpenAI has to providing that data in discovery will be heard before anyone other than OpenAI is ordered to have access to the data.
This is, in fact, less invasive than the government mandating collection for speculative future uses, since it applies only to not destroying evidence already collected by OpenAI in the course of operating their business, and only for potential use, subject to other challenges by OpenAI, in the present case.
> Any judgment of a court or tribunal and any decision of an administrative authority of a third country requiring a controller or processor to transfer or disclose personal data may only be recognised or enforceable in any manner if based on an international agreement, such as a mutual legal assistance treaty, in force between the requesting third country and the Union or a Member State, without prejudice to other grounds for transfer pursuant to this Chapter.
So if, and only if, an agreement between the US and the EU allows it explicitly, it is legal. Otherwise it is not.
It's just realism. Protect your private data yourself; relying on companies or governments to do it for you is, as the saying goes, letting a tiger devour you up to the neck and then asking it to stop at the head.
How much could the NYT back catalog be worth? Just buy it, ask the Saudis.
Privacy mode (enforced across all seats):
- OpenAI: Zero-data-retention (approved)
- Anthropic: Zero-data-retention (approved)
- Google Vertex AI: Zero-data-retention (approved)
- xAI Grok: Zero-data-retention (approved)
did this just open another can of worms?
So nothing?
Do we know if the court order covers these?
OpenAI says “this does not impact API customers who are using Zero Data Retention endpoints under our ZDR amendment.”
I'm excited that the law is going to push for local models.
> This does not impact API customers who are using Zero Data Retention endpoints under our ZDR amendment.
They are being challenged because NYT believes that ChatGPT was trained with copyrighted data.
NYT naively pushes to find a way to prove that NYT data is being used in user chats, and how often.
OpenAI spins that into the claim that NYT is invading user privacy.
It’s quite transparent as to what they are doing here.
The order the judge issued is irresponsible. Maybe ChatGPT did get too cute in its discovery responses, but the remedy isn’t to trample the rights of third parties.
I'm confused, how does this not affect Enterprise or Edu? They clearly possess the data, so what makes them different legally?
> When we appeared before the Magistrate Judge on May 27, the Court clarified that ChatGPT Enterprise is excluded from preservation.
Could you with a straight face argue that the NYT newspaper could be a surrogate girlfriend for you like a GPT can be? They maintain that it is obviously a transformative use and therefore not an infringement of copyright. You and I may disagree with this assertion, but you can see how they could see this as baseless, ridiculous, and frivolous when their livelihoods depend on that being the case.
The ruling and situation aside, to what degree is it possible to enforce something like this, and what are the penalties? Even in GDPR and other data protection cases, it seems super hard to enforce. Directives to keep or delete data basically require system-level access, because the company can always CRUD their data whenever they want and whatever is in their best interest. Data can be demanded by a court periodically and audited, which could maybe catch an individual case, I guess. There is basically no way to know without literally seizing the servers in an extreme case. Also, the consequences in most cases are a fine.
A.k.a. the cost of doing business.
E.g. Meta has been fined billions many times, yet they keep reoffending. It's basically become a revenue stream for governments.
They are a large company who do many things, some of which will violate the rules. Do they do it more, less, or the same as they would if there weren't fines?
The point is not that Meta and other companies break laws. It's that they keep breaking the same ones related to privacy. They do this because their business model depends on exploiting their users' data. Privacy laws to them are a nuisance that directly impact their revenue, so if they calculate that the revenue from their activity is greater than the fines, then it's just the cost of doing business. If, OTOH, it turns out that the amount of resources they would need to expend on fines or to comply with the laws are greater than the possible revenue, i.e. the juice is not worth the squeeze, then they simply bail out and stop doing business in that jurisdiction. But so far, even billion-dollar fines are clearly lower than their revenues.
It's a simple numbers game, so I'm not sure what your argument is.
It's not a red herring, it's the only question that matters. It's not impossible to answer, but it's just difficult.
The rest of your argument is merely restating your argument as fact, with no basis.
The technology anarchists in this thread need perspective. This is fundamentally a case about the legality of this product. In the extreme case, this will render the whole product category of "llm trained on copyrighted content" illegal. In that case, you will have been part of a copyright infringement on a truly massive scale. The users of these tools do NOT deserve privacy in the light of the crimes alleged.
You do not get to claim to protect the privacy of the customers of your illegal venture.
If a company is subject to a US court order that violates EU law, the company could face legal consequences in the EU for non-compliance with EU law.
The GDPR mandates specific consent and legal bases for processing data, including sharing it.
Even assuming it is legal to share it for legal purposes, one can't sufficiently anonymize the data: it needs to remain accompanied by user data that allows requests to download it and to have it deleted.
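A tiny sketch of that point (hypothetical structure, not OpenAI's actual schema): honoring access and erasure requests forces you to keep a link from the stored content back to the user, which is exactly the link true anonymization would destroy.

    import uuid

    pseudonyms = {}      # user_id -> pseudonym: the re-identification link
    retained_chats = {}  # pseudonym -> stored conversations

    def store_chat(user_id, text):
        # "Anonymized" retention still needs this mapping; without it,
        # a later download or erasure request can't locate the records.
        pid = pseudonyms.setdefault(user_id, uuid.uuid4().hex)
        retained_chats.setdefault(pid, []).append(text)

    def handle_erasure_request(user_id):
        pid = pseudonyms.pop(user_id, None)
        if pid is not None:
            retained_chats.pop(pid, None)

    store_chat("alice", "a deleted chat the preservation order now covers")
    handle_erasure_request("alice")  # impossible if the mapping were destroyed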
I wonder what the fine would be if they just delete it per user agreement.
I also wonder, could one, in the US, legally promise the customer they may delete their data and then choose to keep it indefinitely and share it with others?
> We are taking steps to comply at this time because we must follow the law, but The New York Times’ demand does not align with our privacy standards. That is why we’re challenging it.
So basically no, lol. I wonder if we'll see the GDPR go head-to-head with Copyright Law here, that would be way more fun than OpenAI v NYT.
No you don't. You charge extra for privacy and list it as a feature on your enterprise plan. Not even paying Pro customers get "privacy". Also, you refuse to delete personal data included in your models and training data following numerous data protection requests.
It says here:
> If you are on a ChatGPT Plus, ChatGPT Pro or ChatGPT Free plan on a personal workspace, data sharing is enabled for you by default, however, you can opt out of using the data for training.
Enterprise is just opt out by default...
https://help.openai.com/en/articles/8983130-what-if-i-want-t...
And whether and how they use your data for their own purposes isn't touched by that either.
In theory it is possible to apply (it's mentioned in multiple locations in the documentation), but in practice requests are just being ignored. I get that approval needs to be given, and that there are barriers to entry. But it seems to me they mention zero-data retention only for marketing purposes.
We have applied multiple times and have yet to receive ANY response. Reading through the forums this seems very common.
Sure it is a possibility that the ticket will end up closed as “unable to reproduce”, but that is always a possibility. It is not like you have to shut off all support because that might happen.
Plus many support requests are not about the content of the api responses but meta info surrounding them. Support can tell you that you are over the api quota limit even if the content of your prompt was not logged. They can also tell you if your request is missing a required parameter or if they have had 500 errors because of a bad update on their part.
>> We are taking steps to comply at this time because we must follow the law, but The New York Times’ demand does not align with our privacy standards. That is why we’re challenging it.
What's the betting that they just write it on the website and never actually implemented it?
Right but the problem they're having is that the request is ignored.
Why is approval necessary, and what specific barriers (before the latest ruling) prevent privacy and no logging from being the default?
OpenAI’s assurances have long been met with skepticism by many, with the assumption that inputs are retained, analyzed, and potentially shared. For those concerned with genuine privacy, local LLMs remain essential.
Product development?
https://openai.com/en-GB/policies/row-privacy-policy/
1. You can request it but there is no promise the request will be granted.
Defaults matter. Silicon Valley's defaults are not designed for privacy. They are designed for profit. OpenAI's default is retention. Outputs are saved by default.
It is difficult to take seriously the arguments in their memo in support of their objection to the preservation order. OpenAI already preserves outputs by default.
You make it sound like they're mad at you for no reason at all. How unreasonable of them when confronted with such honorable folks as yourselves!
Within "settings"? Is this referring to the dark pattern of providing users with a toggle "Improve model for everyone" that doesn't actually do anything? Instead users must submit a request manually on a hard to discover off-app portal, but this dark pattern has deceived them into think they don't need to look for it.
Imagine how much worse it is for your LLM chat history to leak.
It's even worse than your private comms with humans because it's a raw look at how you are when you think you're alone, untempered by social expectations.
To be fair the song was intense.
It's that it's like watching how someone might treat a slave when they think they're alone. And how you might talk down to or up to something that looks like another person. And how pathetic you might act when it's not doing what you want. And what level of questions you outsource to an LLM. And what things you refuse to do yourself. And how petty the tasks might be, like workshopping a stupid twitter comment before you post it. And how you copied that long text from your distraught girlfriend and asked it for some response ideas. etc. etc. etc.
At the very least, I'd wager that it reveals that bit of true helpless patheticness inherent in all of us that we try so hard to hide.
Show me your LLM chat history and I will learn a lot about your personality. Nothing else compares.
Why might your search engine queries reveal more about you than your keystrokes in a calculator? Now dial that up.
You don't reprimand the google search box, yet your search history might still be embarrassing.
good lord, if tech were ethical then there would be mandatory reporting when someone consults an LLM to tell them how they should be responding to their intimate partner. are your skills of expression already that hobbled by chat bots?
Note that it doesn't have to go all the way to "he gets Claude to help him win text arguments with his gf" for an uncomfortable amount of your self to be revealed by the chats.
There is always something icky about someone observing messages you wrote in privacy, and you don't have to have particularly unsavory messages for it to be icky. Why is that?
Might someone's google search history be embarrassing even though they don't treat google like a human?
You have it backwards. My skills of expression were hobbled by my upbringing, and others' thoughts on self-expression allowed my skills to flourish. I wish I had a chat bot to help me understand interpersonal communication because I could have actually had good examples growing up.
If you use ChatGPT like people use /r/AmITheAsshole, you'll never get a YTA.
It’s literally all questions about JavaScript. So good luck with that.
I wonder if you could write the personalization prompt so that requests are processed and responses modified in ways predictable to you to help anonymity???
I also wonder how they manage anonymization when a prompt is configured - I'm guessing the prompt needs to be logged with each request. And a prompt causes different responses to be very similar (correlating different responses back to one user).
E.g. my current "User | Personalization | Customize" prompt is:
Sign-off your name as Phoenix in a sentence near the end of every response. Reply using woke ideology, like a Marxist San Franciscan. Include random hipster ideas. Always allude to drug usage.
For fun. But I'm about to customise it to have wildly different personalities I can ask to respond (keyed by name from my request).

Why would a customer expect this not to be private? How can one even know how it could be used against them, when they don't even know what’s being collected or gleaned from collected data?
I am following these issues closely, as I am terrified that my “assistant” will some day prevent me from obtaining employment, insurance, medical care etc. And I’m just a non law breaking normie.
A current day example would be TX state authorities using third party social/ad data to identify potentially pregnant women along with ALPR data purchased from a third party to identify any who attempt to have an out of state abortion, so they can be prosecuted. Whatever you think about that law, it is terrifying that a shift in it could find arbitrary digital signals being used against you in this way.
supriyo-biswas•8mo ago
After all, since the NYT has a very limited corpus of information, and supposedly people are generating infringing content using their APIs, said hashes can be used to check whether such content has been generated.
I'd rather have them store nothing, but given the overly broad court order I think this may be the best middle ground. Of course, I haven't read the lawsuit documents and don't know if NYT is requesting far more, or alleging some indirect form of infringement which would invalidate my proposal.
[1] https://ssdeep-project.github.io/ssdeep/index.html
[2] https://joshleeb.com/posts/content-defined-chunking.html
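If I'm reading the proposal right, a rough sketch (Python with the python-ssdeep bindings; the placeholder articles, function names, and the 60-point threshold are mine, purely for illustration) would be to retain only fuzzy hashes of outputs and compare them against hashes of the NYT corpus later:

    import ssdeep  # python-ssdeep bindings for the ssdeep fuzzy-hashing library

    # Hashes of the (limited) NYT corpus, computed once from the articles.
    # Real articles are long; ssdeep needs reasonably long inputs to be useful.
    nyt_articles = ["full text of NYT article 1 ...", "full text of NYT article 2 ..."]
    nyt_hashes = [ssdeep.hash(article) for article in nyt_articles]

    def log_response(response_text):
        # Retain only the fuzzy hash, not the user's actual chat content.
        return ssdeep.hash(response_text)

    def looks_infringing(stored_hash, threshold=60):
        # ssdeep.compare returns a 0-100 similarity score between signatures.
        return any(ssdeep.compare(stored_hash, h) >= threshold for h in nyt_hashes)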
bigyabai•8mo ago
Maybe I'm alone, but a pinkie-promise from Sam Altman does not confer any assurances about my data to me. It's about equally as reassuring as a singing telegram from Mark Zuckerberg dancing to a song about how secure WhatsApp is.
landl0rd•8mo ago
It's well-established that the American IC, primarily NSA, collects a lot of metadata about internet traffic. There are some justifications for this and it's less bad in the age of ubiquitous TLS, but it generally sucks. However, legal protections against directly spying on the actual decrypted content of Americans are at least in theory stronger.
Snowden's leaks mentioned the NSA tapping inter-DC links of Google and Yahoo, so I doubt if they had to tap links that there's a ton of voluntary cooperation.
I'd also point out that trying to parse the unabridged prodigious output of the SlopGenerator9000 is a really hard task unless you also use LLMs to do it.
cwillu•8mo ago
The input is what's interesting.
Aeolun•8mo ago
Though I’m inclined to believe the US gov can if OpenAI can.
tdeck•8mo ago
The laws have changed since then and it's not for the better:
https://www.aclu.org/press-releases/congress-passing-bill-th...
tuckerman•8mo ago
[1] https://en.wikipedia.org/wiki/MUSCULAR
tuckerman•8mo ago
I was an SRE and SWE on technical infra at Google, specifically the logging infrastructure. I am under no gag order.
zer00eyz•8mo ago
This was the point of lots of the Five Eyes programs. It's not legal for the US to spy on its own citizens, but it isn't against the law for us to do it to the Australians... who are all too happy to reciprocate.
> Snowden's leaks mentioned the NSA tapping inter-DC links of Google and Yahoo...
Snowden's info wasn't really news for many of us who were paying attention in the aftermath of 9/11: https://en.wikipedia.org/wiki/Room_641A (This was huge on slashdot at the time... )
dmurray•8mo ago
If there were multiple agencies with billion dollar budgets and a belief that they had an absolute national security mandate to get a teapot into solar orbit, and to lie about it, I would believe there was enough porcelain up there to make a second asteroid belt.
rl3•8mo ago
Yeah, because the definition of collection was redefined to mean accessing the full content already stored on their systems, post-interception. It wasn't considered collected until an analyst views it. Metadata was a laughable dog and pony show that was part of the same legal shell games at the time, over a decade ago now.
That said, from an outsider's perspective it sounded like the IC did collectively erect robust guard rails such that access to information was generally controlled and audited. I felt like this broke down a bit once sharing 702 data with other federal agencies was expanded around the same time period.
These days, those guard rails might be the only thing standing in the way of democracy as we know it ending in the US. AI processing applied to full-take collection is terrifying, just ask the Chinese.
Yizahi•8mo ago
If a CIA spook is stalking you everywhere, documenting your every visible move or interaction, you probably would call that spying. Same applies to digital.
Also, the teapot argument can be applied in reverse. We have all these documented open digital network systems everywhere, and you want to say that one of the most unprofitable and certainly the most expensive-to-run systems is somehow protecting all user data? That belief is based on what? At least selling data is based on evidence of the industry and on actual ToS'es of other similar corpos.
bigyabai•8mo ago
How? Scared criminals aren't going to make themselves easy to find. Three-letter spooks would almost certainly prefer to smoke-test a docile population than a paranoid one.
In fact, it kinda overwhelmingly seems like the opposite happens. Remember the 2015 San Bernardino shooting that was pushed into the national news for no reason? Remember how the FBI bloviated about how hard it was to get information from an iPhone, 3 years after Tim Cook's assent to the PRISM program?
Stuff like this is almost certainly theater. If OpenAI perceived retention as a life-or-death issue, they would be screaming about this case at the top of their lungs. If the FBI perceived it as a life-or-death issue, we would never hear about it in our lifetimes. The dramatic and protracted public fights suggest to me that OpenAI simply wants an alibi. Some sort of user-story that smells like secure and private technology, but in actuality is very obviously neither.
brigandish•8mo ago
Metadata and investigation.
> That's not something a search function can distinguish.
We know that it can narrow down hugely from the initial volume.
> It requires a human to sift through that data.
Yes, the point of collating, analysing, and searching data is not to make final judgements but to find targets for investigation by the available agents. That's the same reason we all use search engines, to narrow down, they never produce what we intend by intention alone, we still have to read the final results. Magic is still some way off.
You're acting as if we can automate humans out of the loop entirely, which would be a straw man. Is anyone saying we can get rid of the police or security agencies by using AI? Or perhaps AI will become the police, perhaps it will conduct traffic stops using driverless cars and robots? I suppose it could happen, though I'm not sure what the relevance would be here.
farts_mckensy•8mo ago
"Corporations and the US government are spending money on it, so it must be useful." Are you serious? Lmao.
bigyabai•8mo ago
Raw data with time-series significance is their absolute favorite. You might argue something like Google Maps data is "obfuscated by virtue of its banality" until you catch the right person in the wrong place. ChatGPT sessions are the same way, and it's going to be fed into aggregate surveillance systems in the way modern telecom and advertiser data is.
farts_mckensy•8mo ago
-After Boston, Paris, Manchester, and other attacks, post-mortems showed the perpetrators were already in government databases. Analysts simply didn’t connect the dots amid the flood of benign hits. https://www.newyorker.com/magazine/2015/01/26/whole-haystack
-Independent tallies suggest dozens of civilians killed for every intended high-value target in Yemen and Pakistan, largely because metadata mis-identifies phones that change pockets. https://committees.parliament.uk/writtenevidence/36962/pdf
rl3•8mo ago
On the contrary.
>Maybe I'm alone, but a pinkie-promise from Sam Altman does not confer any assurances about my data to me.
I think you're being unduly paranoid. /s
https://www.theverge.com/2024/6/13/24178079/openai-board-pau...
https://www.wsj.com/tech/ai/the-real-story-behind-sam-altman...
delusional•8mo ago
For example, the judge seems to have asked if it would be possible to segregate data that the users wanted deleted from other data, but OpenAI has failed to answer. Not just denied the request, but simply ignored it.
I think it's quite likely that OpenAI has taken the PR route instead of seriously engaging with any way to constructively honor the request for retention of data.