So user privacy is definitely implicated.
> OpenAI must process your request solely for the purpose of fulfilling it and not store your request or any responses it provides unless required under applicable laws. OpenAI also must not use your request to improve or train its models.
— https://www.apple.com/legal/privacy/data/en/chatgpt-extensio...
I wonder if we’ll end up seeing Apple dragged into this lawsuit. I’m sure after telling their users it’s private, they won’t be happy about everything getting logged, even if they do have that caveat in there about complying with laws.
The ZDR APIs are not and will not be logged. The linked page is clear about that.
> The content covered by the court order is stored separately in a secure system. It’s protected under legal hold, meaning it can’t be accessed or used for purposes other than meeting legal obligations.
That's horse shit and OpenAI knows it. It means no such thing. A legal hold is just a 'preservation order'. It says absolutely nothing about other access or use.
A legal hold requires no such thing and there would be no such requirement in it. They are perfectly free to access and use it for any reason.
The GDPR does not say that you can never be proven to have done something wrong in a court of law.
OpenAI slams court order to save all ChatGPT logs, including deleted chats - https://news.ycombinator.com/item?id=44185913 - June 2025 (878 comments)
If you don’t retain that data you’re destroying evidence for the case.
It’s not like the data is going to be given to anyone, it’s only going to be used for limited legal purposes for the lawsuit (as OpenAI confirms in this article).
And honestly, OpenAI should have just not used copyrighted data illegally and they would have never had this problem. I saw NYT’s filing and it had very compelling evidence that you could get ChatGPT to distribute verbatim copyrighted text from the Times without citation.
The whole premise of the lawsuit is that they didn't do anything unlawful, so saying "just do what the NYT wanted you to do" isn't interesting.
The NYT made an argument to a judge about what they think is going on and how they think the copyright infringement is taking place and harming them. In their filings and hearings they present the reasoning and evidence they have that leads them to believe that a violation is occurring. The court makes a judgment on whether or not to order OpenAI to preserve and disclose information relevant to the case to the court.
It's not "just do what NYT wanted you to do," it's "do what the court orders you to do based on a lawsuit brought by a plaintiff and argued to the court."
I suggest you read the court filing: https://nytco-assets.nytimes.com/2023/12/NYT_Complaint_Dec20...
> It’s not like the data is going to be given to anyone, it’s only going to be used for limited legal purposes for the lawsuit (as OpenAI confirms in this article).
Nobody other than both parties to the case, their lawyers, the court, and whatever case file storage system they use. In my view, that's already way too much given the amount and value of this data.
I don't believe you would be considered to be violating the GDPR if you are complying with another court order, because you are presumably making a best effort to comply with the GDPR besides that court order.
You're saying it's unreasonable to store data somewhere for a pending court case? Conceptually you're saying that you can't preserve data for trials because the filing cabinets might see the information. That's ridiculous, if that was true then it would be impossible to perform discovery and get anything done in court.
It most likely depends on the exact circumstances. I could absolutely imagine a European court deciding that, sorry, but if you have to answer to a court decision incompatible with European privacy laws, you can't offer services to European residents anymore.
> You're saying it's unreasonable to store data somewhere for a pending court case?
I'm saying it can be, depending on how much personal and/or unrelated data gets tangled up in it. That seems to be the case here.
> Conceptually you're saying that you can't preserve data for trials because the filing cabinets might see the information.
I'm only saying that there should be proportionality. A court having access to all facts relevant to a case is important, but it's not the only important thing in the world.
Otherwise, we could easily end up with a Dirk-Gently-esque court that, based on the principle that everything is connected to everything, will just demand access to all the data in the world.
When it comes to the GDPR, courts have generally taken the stance that it does not override their orders:
Ironburg Inventions, Ltd. v. Valve Corp.
Finjan, Inc. v. Zscaler, Inc.
Corel Software, LLC v. Microsoft
Rollins Ranches, LLC v. Watson
In none of these cases was a GDPR fine issued.
Imagine a lawsuit against Signal that claimed some nefarious activity, harmful to the plaintiff, was occurring broadly in chats. The plaintiff can claim, like NYT, that it might be necessary to examine private chats in the future to make a determination about some aspect of the lawsuit, and the judge can then order Signal to find a way to retain all chats for potential review.
However you feel about OpenAI, this is not a good precedent for user privacy and security.
Also OpenAI was never E2EE to begin with. They were already retaining logs for some period of time.
My personal view is that the court order is overly broad and disregards potential impacts on end users but it's nonetheless important to be accurate about what is and isn't happening here.
For example, if the trial happens to find data that some chats include crimes committed by users in their private chats, the court can't just send police to your door based on that information since the information is only being used in the context of an intellectual property lawsuit.
Remember that privacy rights are legitimate rights but they change a lot when you're in the context of an investigation/court proceeding. E.g., the right of police to enter and search your home changes a lot when they get a court issued warrant.
The whole point of E2EE services from the perspective of privacy-conscious customers is that a court can get a warrant for data from those companies but they'll only be able to produce encrypted blobs with no access to decryption keys. OpenAI was never an E2EE service, so customers have to expect that a court order could surface their data to someone else's eyes at some point.
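To make that concrete, here's a minimal sketch (Python, using the cryptography package's Fernet primitive; real E2EE systems use key-exchange and ratcheting protocols rather than one symmetric key, and the names here are made up for illustration) of why a warrant served on an E2EE provider only yields ciphertext:

    from cryptography.fernet import Fernet

    # Client side: the key is generated and kept on the user's device only.
    client_key = Fernet.generate_key()
    cipher = Fernet(client_key)

    blob = cipher.encrypt(b"private chat message")

    # Server side: an E2EE provider stores only the opaque blob.
    server_storage = {"user123": blob}

    # A court order against the provider can surface server_storage, but
    # without client_key neither the provider nor the court can read it.
    # A non-E2EE service holds the plaintext itself, so a preservation
    # order reaches readable content directly.
    assert cipher.decrypt(server_storage["user123"]) == b"private chat message"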
The court isn't saying "preserve this data forever and ever and compromise everyone's privacy," they're saying "preserve this data for the purposes of this court while we perform an investigation."
IMO, the NYT has a very good argument here that the only way to determine the scope of the copyright infringement is to analyze requests and responses made by every single customer. Like I said in my original comment, the remedies for copyright infringement are on a per-infringement basis. E.g., every time someone on LimeWire downloads Song 2 by Blur from your PC, you've committed one instance of copyright infringement. My interpretation is that NYT wants the court to find out how many times customers have received ChatGPT responses that include verbatim New York Times content.
> The content covered by the court order is stored separately in a secure system. It’s protected under legal hold, meaning it can’t be accessed or used for purposes other than meeting legal obligations.
> Only a small, audited OpenAI legal and security team would be able to access this data as necessary to comply with our legal obligations.
So, by OpenAI's own admission, they are taking abundant and presumably effective steps to protect user privacy here? In the unlikely event that this data did somehow leak, I'd personally be blaming OpenAI, not the NYT.
Some of the other language in this post, like repeatedly calling the lawsuit "baseless", really makes this just read like an unconvincing attempt at a spin piece. Nothing to see here.
LLMs are not massive archives of data. The big models are a few TB in size. No one is forgoing a NYT subscription because they can ask ChatGPT to print out NYT news stories.
Including lies.
I'd like to aim a little higher, maybe towards expecting correspondence with reality?
IOW, yes, there is no law that OpenAI can't try to spin this. But it's still a shitty, non-factually-based choice to make.
I’m not aware of any definition of “spin” where being conventional is a defense against that accusation. Actually, that was the (imagined) value-add of the show, that conventional corporate and political messaging is heavily spun.
2) I don’t understand the distinction being made between voluntary or involuntary, in the sense that a corporation is a thing made up of people; it doesn’t have a will in-and-of-itself, so the communications it sends must always actually be made by somebody inside the corporation (whether a lawyer, marketing person, or in the unlikely event that somebody lets them out, an engineer).
Simply denying the allegations isn't really spinning anything; it's just denying the allegations. And the thing I dislike about characterizing something like this as spin is that it defangs the term by removing all those connotations and instead turning it into just a buzzwordy way of saying, "I disagree with what this person said."
It seems to me that the discussion of whether or not it is spin has turned into a discussion of which party people basically agree with.
My personal opinion is that OpenAI will probably win, or at least get away with a pretty minor fine or something like that. However, the communications coming from both parties in the case should be assumed to be corporate spin until proven otherwise. And, calling an unfinished case baseless is, at least, a bit presumptuous!
It can be both. It clearly spins the lawsuit - it doesn't present the NYT's side at all.
And per their own terms they likely only delete messages "when they want to" given the big catch-alls. "What happens when you delete a chat? -> It is scheduled for permanent deletion from OpenAI's systems within 30 days, unless: It has already been de-identified and disassociated from your account"[1]
[0] https://techcrunch.com/2024/11/22/openai-accidentally-delete...
[1] https://help.openai.com/en/articles/8809935-how-to-delete-an...
Then again I’m starting to think OpenAI is gathering a cult-leader-like following, where any negative comment results in devoted followers, or those with something to gain, immediately jumping to its defense no matter how flimsy the ground.
From what I can tell from the court filings, prior to the judge's order to retain everything, the request to retain everything was coming from the plaintiff, with openai objecting to the request and refusing to comply in the meantime. If so, it's a bit misleading to characterize this as "deleting things they shouldn’t have", because what they "should have" done wasn't even settled. That's a bit rich coming from someone accusing openai of "spin".
https://techcrunch.com/2024/11/22/openai-accidentally-delete...
Without this devolving into a tit for tat, the article explains, for those following this conversation, why it’s been elevated to a court order and not just an expectation to preserve.
That article does nothing of the sort and, indeed, it is talking about a completely separate incident of deleting data.
Here. I had an LLM summarize it for you.
A court order now requires OpenAI to retain all user data, including deleted ChatGPT chats, as part of the ongoing copyright lawsuit brought by The New York Times (NYT) and other publishers[1][2][6][7]. This order was issued because the NYT argued that evidence of copyright infringement—such as AI outputs closely matching NYT articles—could be lost if OpenAI continued its standard practice of deleting user data after 30 days[2][6][7].
This new requirement is directly related to a 2024 incident where OpenAI accidentally deleted critical data that NYT lawyers had gathered during the discovery process. In that incident, OpenAI engineers erased programs and search result data stored by NYT's legal team on dedicated virtual machines provided for examining OpenAI's training data[3][4][5]. Although OpenAI recovered some of the data, the loss of file structure and names rendered it largely unusable for the lawyers’ purposes[3][5]. The court and NYT lawyers did not believe the deletion was intentional, but it highlighted the risks of relying on OpenAI’s internal data retention and deletion practices during litigation[3][4][5].
The court order to retain all user data is a direct response to concerns that important evidence could be lost—just as it was in the accidental deletion incident[2][6][7]. The order aims to prevent any further loss of potentially relevant information as the case proceeds. OpenAI is appealing the order, arguing it conflicts with user privacy and their established data deletion policies[1][2][6][7].
Sources:
[1] OpenAI Appeals Court Order Requiring Retention of Consumer Data https://www.pymnts.com/artificial-intelligence-2/2025/openai...
[2] ‘An Inappropriate Request’: OpenAI Appeals ChatGPT Data Retention Court Order https://www.eweek.com/news/openai-privacy-appeal-new-york-ti...
[3] OpenAI Deletes Legal Data in a Lawsuit From the New York Times https://www.businessinsider.com/openai-delete-legal-data-law...
[4] NYT vs OpenAI case: OpenAI accidentally deleted case data https://www.medianama.com/2024/11/223-new-york-times-openai-...
[5] New York Times Says OpenAI Erased Potential Lawsuit Evidence https://www.wired.com/story/new-york-times-openai-erased-pot...
[6] How we're responding to The New York Times' data ... - OpenAI https://openai.com/index/response-to-nyt-data-demands/
[7] Why OpenAI Won't Delete Your ChatGPT Chats Anymore: New York ... https://coincentral.com/why-openai-wont-delete-your-chatgpt-...
[8] A Federal Judge Ordered OpenAI to Stop Deleting Data - Adweek https://www.adweek.com/media/a-federal-judge-ordered-openai-...
[9] OpenAI confronts user panic over court-ordered retention of ChatGPT logs https://arstechnica.com/tech-policy/2025/06/openai-confronts...
[10] OpenAI Appeals ‘Sweeping, Unprecedented Order’ Requiring It Maintain All ChatGPT Logs https://gizmodo.com/openai-appeals-sweeping-unprecedented-or...
[11] OpenAI accidentally deleted potential evidence in NY ... - TechCrunch https://techcrunch.com/2024/11/22/openai-accidentally-delete...
[12] OpenAI's Shocking Blunder: Key Evidence Vanishes in NY Times ... https://www.eweek.com/news/openai-deletes-potential-evidence...
[13] Judge allows 'New York Times' copyright case against OpenAI to go ... https://www.npr.org/2025/03/26/nx-s1-5288157/new-york-times-...
[14] OpenAI Data Retention Court Order: Implications for Everybody https://hackernoon.com/openai-data-retention-court-order-imp...
[15] Sam Altman calls for 'AI privilege' as OpenAI clarifies court order to retain temporary and deleted ChatGPT sessions https://venturebeat.com/ai/sam-altman-calls-for-ai-privilege...
[16] Court orders OpenAI to preserve all ChatGPT logs, including deleted ... https://techstartups.com/2025/06/06/court-orders-openai-to-p...
[17] OpenAI deleted NYT copyright case evidence, say lawyers https://www.theregister.com/2024/11/21/new_york_times_lawyer...
[18] OpenAI slams court order to save all ChatGPT logs, including ... https://simonwillison.net/2025/Jun/5/openai-court-order/
[19] OpenAI accidentally deleted potential evidence in New York Times ... https://mashable.com/article/openai-accidentally-deleted-pot...
[20] OpenAI slams court order to save all ChatGPT logs, including deleted chats https://news.ycombinator.com/item?id=44185913
[21] OpenAI slams court order to save all ChatGPT logs, including deleted chats https://arstechnica.com/tech-policy/2025/06/openai-says-cour...
[22] After court order, OpenAI is now preserving all ChatGPT and API logs https://www.reddit.com/r/LocalLLaMA/comments/1l3niws/after_c...
[23] OpenAI accidentally erases potential evidence in training data lawsuit https://www.theverge.com/2024/11/21/24302606/openai-erases-e...
[24] OpenAI "accidentally" erased ChatGPT training findings as lawyers ... https://www.reddit.com/r/aiwars/comments/1gwxr94/openai_acci...
[25] OpenAI appeals data preservation order in NYT copyright case https://www.reuters.com/business/media-telecom/openai-appeal...
https://techcrunch.com/2024/11/22/openai-accidentally-delete...
Gruez said that is talking about an incident in this case but unrelated to the judge's order in question.
You said the article "explains for those following this conversation why it’s been elevated to a court order" but it doesn't actually explain that. It is talking about separate data being deleted in a different context. It is not user chats and access logs. It is the data that was used to train the models.
I pointed that out a second time since it seemed to be misunderstood.
Then you posted an LLM summary of something unrelated to the point being made.
Now we're here.
As you say, one cannot force understanding on another; we all have to do our part. ;)
Edit:
> The court order to retain all user data is a direct response to concerns that important evidence could be lost—just as it was in the accidental deletion incident[2][6][7].
What did you prompt the LLM with for it to reach this conclusion? The [2][6][7] citations similarly don't seem to explain how that incident from months ago informed the judge's recent decision. Anyway, I'm not saying the conclusion is wrong, I'm saying the article you linked does not support the conclusion.
Calm down, cool off, and read it again.
The point is that the circumstances of the incident in 2024 are directly related to the how and why of the NYT lawyers’ request and the judge’s order.
The article I linked was to the incident in 2024.
Not everything has to be about pedantry and snark, even on HN.
Edit: I see you edited your response after re-reading the summarization. I’m glad cooler heads have prevailed.
The prompt was simply “What is the relation, if any, between OpenAI being ordered to retain user data and the incident from 2024 where OpenAI accidentally deleted the NYT lawyers data while they were investigating whether OpenAI had used their data to train their models?”
Just to be clear, the summary is not convincing. I do understand the idea but none of the evidence presented so far suggests that was the reason. The court expected that the data would be retained, the court learned that it was not, the court gave an order for it to be retained. That is the seeming reason for the order.
Put another way: if the incident last year had not happened, the court would still have issued the order currently under discussion.
I am not an Open AI stan, but this needs to be responded to.
The first principle of information security is that all systems can be compromised and the only way to secure data is to not retain it.
This is like saying, "Well, I know they didn't want to go skydiving, but we forced them to go skydiving and they died because they had a stroke mid-air, so it's their fault they died."
Anyone who makes promises about data security is at best incompetent and at worst dishonest.
Shouldn't that be "at best dishonest and at worst incompetent"?
I mean, would you rather be a competent person telling a lie or an incompetent person believing you're competent?
I don't think the Judge is equipped to handle this case if they don't understand how their order jeopardizes the privacy of millions of users worldwide who don't even care about NYT's content or bypassing their paywalls.
Whether or not you care is not relevant, and is usually the case for customers. If a drug company resold an expensive cancer drug without IP, you might say 'their order jeopardizes the health of millions of users worldwide who don't even care about Drug Co's IP.'
If the NYT is right - I can only guess - then you are benefitting from the NYT IP. Why should you get that without their consent and for free - because you don't care?
> (jeopardizes)
... is a strong word. I don't see much risk - the NYT isn't going to de-anonymize users and report on them, or sell the data (which probably would be illegal). They want to see if their content is being used.
I wonder if the laws and legal procedures are written considering this general assumption that a party to a lawsuit will naturally lie if it is in their interest. And then I read articles and comments about a "trust based society"...
I'm not sticking up for OpenAI so much as just for decent, interesting threads here.
I expect it's more about them losing the _case_. Silly to expect someone fighting a lawsuit not to try to win it.
Why would a defendant who agrees a case has merit go to court at all? Much easier (and generally less expensive) to make the other party whole, assuming the parties agree on what "whole" is. And if they don't agree on what "whole" is, we are back to square one and of course you'd maintain that the other side's suit is baseless.
Of course it's out of self-serving interests, but I find it hard to disagree with OpenAI on this one.
(1) With limited well scoped exclusions for lawyers, medical records, etc.
You might have heard of the GDPR, but even before that, several countries had "privacy by default" laws on the books.
Your comment is dystopian, given how the interaction is basically like how some people treat AI as their "friend". Imagine that no matter what encrypted messaging app or whatever they use, the govt still snoops.
Legally that is a correct statement.
If you want that changed, it will require legislation.
We'd have a better chance if anyone with power were talking about court reform to make the Supreme Court justices e.g. drawn by lot for each session from the district courts, but approximately nobody is. It'd be damn good and long overdue reform, but oh well.
And the thing is, we've already had a fairly conservative court for decades. I'm pretty likely to die, even if of old age, never having seen an actually-liberal court in the US my entire life. Like, WTF. Frankly, no wonder so much of our situation is fucked up, backwards, and authoritarianism-friendly. And (sigh) any serious attempts to fix that are basically on hold for many decades more, assuming rule of law survives that long anyway.
[EDIT] My point, in short, is that "we still have [thing], we just have to wait for a liberal court that'll support it" is functionally indistinguishable from not having [thing].
A company like OpenAI that offers a SaaS is no such friend, and in such power dynamics (individual VS company) it's probably in your best interest to have everything public if necessary.
Why tangle the data of people with very different preferences than yours up in that?
First time?
As others have said, in the United States this is, legally, completely correct: there is no right to privacy in American law. Lots of people think the Fourth Amendment is a general right to privacy, and they are wrong: the Fourth Amendment is specifically about government search and seizure, and courts have been largely consistent about saying it does not extend beyond that to e.g. relationships with private parties.
If you want a right to privacy, you will need to advocate for laws to be changed; the ones as they exist now do not give it to you.
As it stands today, a court case (A) affirming the right to use contraception is not equivalent to a court case (B) stating that a phone-company/ISP/site may not sell their records of your activity.
You conflate the absence of a statutory or regulatory regime governing private data transactions with the broader constitutional right to privacy. While it’s true that the Fourth Amendment limits only state action, U.S. constitutional law, via cases like Griswold v. Connecticut and Lawrence v. Texas, clearly recognizes a substantive right to privacy, grounded in the Due Process Clause and other constitutional penumbras. This is not a semantic variant; it is a distinct and judicially enforceable right.
Moreover, beyond constitutional law, the common law explicitly protects privacy through torts such as intrusion upon seclusion, public disclosure of private facts, false light, and appropriation of likeness. These apply to private actors and are recognized in nearly every U.S. jurisdiction.
Thus, while the Constitution may not prohibit a website from selling your data, it does affirm a right to privacy in other, fundamental contexts. To deny that entirely is legally incorrect.
While these grand theories of traditional implicit constitutional law are nice, they're pretty meaningless in a system where five individuals can (and are willing to) vote to invalidate decades of tradition on a whim.
I too want real laws.
The common law torts require a high threshold of offensiveness and are adjudicated on a case-by-case basis in individual jurisdictions. They offer only remedies, not a proactive right to control your data.
The original point, that there is no general right in the US to have your interactions with a company remain private, still stands. That's not a denial of all privacy rights but a recognition that US law fails to provide comprehensive privacy protection.
“As others have said, in the United States this is, legally, completely correct: there is no right to privacy in American law.”
That is an incorrect statement. The common law torts I cited can apply in the context of a business transaction, so your statement is also incorrect.
If your strawman is that in the US there’s no right to privacy because there’s no blanket prohibition on talking about other people, and what they’ve been up to, then run with it.
I completely disagree. Yes, the Prosser privacy torts exist: intrusion upon seclusion, public disclosure, false light, and appropriation. But they are highly fact-specific, hard to win, rarely litigated, not recognized in all jurisdictions, and completely reactive -- you get harmed first, maybe sue later!
They are utterly inadequate to protect people in the modern data economy. A website selling your purchase history? Not actionable. A company logging your AI chats? Not intrusion. These torts are not a privacy regime - they are scraps. Also, when we're talking about basic privacy rights, we're just as concerned with mundane material, not just the "highly offensive" material that the torts would apply to.
If you don’t want the grocery store telling people you buy Coke, don’t shop there.
As for Safeway selling your data: you're admitting that it's on the individual to opt out, negotiate, or avoid the transaction which just highlights the absence of a rights-based framework. The burden is entirely on the consumer to protect themselves, and companies can exploit that asymmetry unless narrowly constrained by statute (and even then, often with exceptions and opt-outs).
What you're describing isn't a right to privacy -- it's a lack of one, mitigated only by scattered laws and personal vigilance. That is precisely the problem.
Why should two entities not be able to have a confidential interaction if that is what they both want? Certainly a court order could supersede such a right just as it could most others provided sufficient evidence. However I would expect such things to be both highly justified and narrowly targeted.
This specific case isn't so much about a right to privacy as it is a more general freedom to enter into contracts with others and expect those to be honored.
Is this referring to some actual legal precedent, or just your personal opinion?
Third-party privacy and relevance is a constant point of contention in discovery. Exhibit A: this article.
How? It’s compelling OpenAI retain data they have the contractual right and technical ability to retain. Nothing is being made public, other than the order itself. Nothing is even being transferred to the plaintiff’s legal team. (At some point it will be made available. But both sides will fight over what they have access to, with the court mediating. That’s a lot of regard for third parties’ privacy.)
I do want to take this opportunity to encourage people to demand compensation from the NYT, if they do somehow get user data. After all, it's YOUR data. If someone uses it without you expressly agreeing to that use in a EULA, they are effectively engaging in piracy of your intellectual property, and you should be able to get damages. And if a judge approved it? Sue the judge, too. Hell, that's what the world has come to isn't it? The legal system is a big war between corporations and we, the people, are just carried on the wind.
(I am not a lawyer, but whatever the equivalent of a "lawyer" is in the court of public opinion, I think I'm slowly becoming one out of necessity)
You were so close. It’s compelling OpenAI to retain data they also have the right and technical ability to delete. It removes OpenAI’s ability to protect privacy if they wanted to.
It does not in any capacity prevent OpenAI from transferring everyone, globally, to zero data retention. This entire story is OpenAI trying to deflect the cost of its own decisions to the judiciary. Which is particularly shameful given the partisan attacks our courts are currently facing.
In the API that is an explicit option, as it is in the paid consumer product as well. The amount of business that they stand to lose by maliciously flouting that part of their contract is in the billions.
If you read the privacy policies you agree to, they have access to everything and outright admit it will be logged. That API option is merely a request, and absolutely need not be respected.
I can't believe we're still doing this rigamarole. If the product is not specifically designed, engineered, and open-sourced to be as privacy protecting as possible and it's not literally running on a computer you own, you have zero expectation of privacy. Once this has been proven 1 million times we don't need to prove it anymore, we can just assume and that's a very reasonable assumption.
Large companies lose far more by lying than they would gain from it.
https://harvardlawreview.org/blog/2024/04/nyt-v-openai-the-t...
In other words, they want everyone to be forced to follow the same rules they were forced to follow 20 years ago.
Given that it's not explicitly mentioned as data that isn't affected, I'm assuming it is affected.
https://arstechnica.com/tech-policy/2025/06/openai-says-cour...
> We are taking steps to comply at this time because we must follow the law, but The New York Times’ demand does not align with our privacy standards. That is why we’re challenging it.
That's a lot of words to say "yes, we are violating GDPR".
There's decades of legal disputes in some European countries on whether it's even legitimate for the government to mandate your ISP or phone company to collect metadata on you for after-the-fact law enforcement searches.
Looking at the actual data seems much more invasive than that and, in my (non-legally trained) estimate doesn't seem like it would stand a chance at least in higher courts.
> Looking at the actual data seems much more invasive than that
Looking at the data isn't involved in the current order, which requires OpenAI to preserve and segregate the data that would otherwise have been deleted. The reason for segregation is that any challenges OpenAI has to providing that data in discovery will be heard before anyone other than OpenAI is ordered to have access to the data.
This is, in fact, less invasive than the government mandating collection for speculative future uses, since it applies only to not destroying evidence already collected by OpenAI in the course of operating their business, and only for potential use, subject to other challenges by OpenAI, in the present case.
> Any judgment of a court or tribunal and any decision of an administrative authority of a third country requiring a controller or processor to transfer or disclose personal data may only be recognised or enforceable in any manner if based on an international agreement, such as a mutual legal assistance treaty, in force between the requesting third country and the Union or a Member State, without prejudice to other grounds for transfer pursuant to this Chapter.
So if, and only if, an agreement between the US and the EU allows it explicitly, it is legal. Otherwise it is not.
It's just realism. Protect your private data yourself; relying on companies or governments to do it for you is, as the saying goes, letting a tiger devour you up to the neck and then asking it to stop at the head.
How much could the NYT back catalog be worth? Just buy it, ask the Saudis.
Privacy mode (enforced across all seats):
- OpenAI: Zero-data-retention (approved)
- Anthropic: Zero-data-retention (approved)
- Google Vertex AI: Zero-data-retention (approved)
- xAI Grok: Zero-data-retention (approved)
did this just open another can of worms?
So nothing?
Do we know if the court order covers these?
OpenAI says “this does not impact API customers who are using Zero Data Retention endpoints under our ZDR amendment.”
I'm excited that the law is going to push for local models.
> This does not impact API customers who are using Zero Data Retention endpoints under our ZDR amendment.
They are being challenged because NYT believes that ChatGPT was trained with copyrighted data.
NYT naively pushes to find a way to prove that NYT data is being used in user chats, and how often.
OpenAI spins that into the claim that NYT is invading user privacy.
It’s quite transparent as to what they are doing here.
The order the judge issued is irresponsible. Maybe ChatGPT did get too cute in its discovery responses, but the remedy isn’t to trample the rights of third parties.
I'm confused, how does this not affect Enterprise or Edu? They clearly possess the data, so what makes them different legally?
> When we appeared before the Magistrate Judge on May 27, the Court clarified that ChatGPT Enterprise is excluded from preservation.
Could you with a straight face argue that the NYT newspaper could be a surrogate girlfriend for you like a GPT can be? They maintain that it is obviously a transformative use and therefore not an infringement of copyright. You and I may disagree with this assertion, but you can see how they could see this as baseless, ridiculous, and frivolous when their livelihoods depend on that being the case.
The ruling and situation aside, to what degree is it possible to enforce something like this, and what are the penalties? Even in GDPR and other data protection cases, it seems super hard to enforce. Directives to keep or delete data basically require system-level access, because the company can always CRUD their data whenever they want and whatever is in their best interest. Data can be demanded by a court periodically and audited, which could maybe catch an individual case, I guess. There is basically no way to know without literally seizing the servers in an extreme case. Also, the consequences in most cases are a fine.
A.k.a. the cost of doing business.
E.g. Meta has been fined billions many times, yet they keep reoffending. It's basically become a revenue stream for governments.
They are a large company who do many things, some of which will violate the rules. Do they do it more, less, or the same as they would if there weren't fines?
The point is not that Meta and other companies break laws. It's that they keep breaking the same ones related to privacy. They do this because their business model depends on exploiting their users' data. Privacy laws to them are a nuisance that directly impact their revenue, so if they calculate that the revenue from their activity is greater than the fines, then it's just the cost of doing business. If, OTOH, it turns out that the amount of resources they would need to expend on fines or to comply with the laws are greater than the possible revenue, i.e. the juice is not worth the squeeze, then they simply bail out and stop doing business in that jurisdiction. But so far, even billion-dollar fines are clearly lower than their revenues.
It's a simple numbers game, so I'm not sure what your argument is.
It's not a red herring, it's the only question that matters. It's not impossible to answer, but it's just difficult.
The rest of your argument is merely restating your argument as fact, with no basis.
The technology anarchists in this thread need perspective. This is fundamentally a case about the legality of this product. In the extreme case, this will render the whole product category of "llm trained on copyrighted content" illegal. In that case, you will have been part of a copyright infringement on a truly massive scale. The users of these tools do NOT deserve privacy in the light of the crimes alleged.
You do not get to claim to protect the privacy of the customers of your illegal venture.
If a company is subject to a US court order that violates EU law, the company could face legal consequences in the EU for non-compliance with EU law.
The GDPR mandates specific consent and legal bases for processing data, including sharing it.
Even assuming it is legal to share it for legal purposes, one can't sufficiently anonymize the data: it needs to remain accompanied by user data that allows requests to download it and to have it deleted.
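A tiny sketch of that point (hypothetical structure, not OpenAI's actual schema): honoring access and erasure requests forces you to keep a link from the stored content back to the user, which is exactly the link true anonymization would destroy.

    import uuid

    pseudonyms = {}      # user_id -> pseudonym: the re-identification link
    retained_chats = {}  # pseudonym -> stored conversations

    def store_chat(user_id, text):
        # "Anonymized" retention still needs this mapping; without it,
        # a later download or erasure request can't locate the records.
        pid = pseudonyms.setdefault(user_id, uuid.uuid4().hex)
        retained_chats.setdefault(pid, []).append(text)

    def handle_erasure_request(user_id):
        pid = pseudonyms.pop(user_id, None)
        if pid is not None:
            retained_chats.pop(pid, None)

    store_chat("alice", "a deleted chat the preservation order now covers")
    handle_erasure_request("alice")  # impossible if the mapping were destroyed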
I wonder what the fine would be if they just delete it per user agreement.
I also wonder, could one, in the US, legally promise the customer they may delete their data and then choose to keep it indefinitely and share it with others?
> We are taking steps to comply at this time because we must follow the law, but The New York Times’ demand does not align with our privacy standards. That is why we’re challenging it.
So basically no, lol. I wonder if we'll see the GDPR go head-to-head with Copyright Law here, that would be way more fun than OpenAI v NYT.
No you don't. You charge extra for privacy and list it as a feature on your enterprise plan. Not even paying Pro customers get "privacy". Also, you refuse to delete personal data included in your models and training data following numerous data protection requests.
It says here:
> If you are on a ChatGPT Plus, ChatGPT Pro or ChatGPT Free plan on a personal workspace, data sharing is enabled for you by default, however, you can opt out of using the data for training.
Enterprise is just opt out by default...
https://help.openai.com/en/articles/8983130-what-if-i-want-t...
And whether and how they use your data for their own purposes isn't touched by that either.
In theory it is possible to apply (it's mentioned in multiple locations in the documentation), but in practice requests are just being ignored. I get that approval needs to be given, and that there are barriers to entry. But it seems to me they mention zero-data retention only for marketing purposes.
We have applied multiple times and have yet to receive ANY response. Reading through the forums this seems very common.
Sure it is a possibility that the ticket will end up closed as “unable to reproduce”, but that is always a possibility. It is not like you have to shut off all support because that might happen.
Plus many support requests are not about the content of the api responses but meta info surrounding them. Support can tell you that you are over the api quota limit even if the content of your prompt was not logged. They can also tell you if your request is missing a required parameter or if they have had 500 errors because of a bad update on their part.
>> We are taking steps to comply at this time because we must follow the law, but The New York Times’ demand does not align with our privacy standards. That is why we’re challenging it.
What's the betting that they just write it on the website and never actually implemented it?
Right but the problem they're having is that the request is ignored.
Why is approval necessary, and what specific barriers (before the latest ruling) prevent privacy and no logging from being the default?
OpenAI’s assurances have long been met with skepticism by many, with the assumption that inputs are retained, analyzed, and potentially shared. For those concerned with genuine privacy, local LLMs remain essential.
Product development?
https://openai.com/en-GB/policies/row-privacy-policy/
1. You can request it but there is no promise the request will be granted.
Defaults matter. Silicon Valley's defaults are not designed for privacy. They are designed for profit. OpenAI's default is retention. Outputs are saved by default.
It is difficult to take seriously the arguments in their memo in support of their objection to the preservation order. OpenAI already preserves outputs by default.
You make it sound like they're mad at you for no reason at all. How unreasonable of them when confronted with such honorable folks as yourselves!
Within "settings"? Is this referring to the dark pattern of providing users with a toggle "Improve model for everyone" that doesn't actually do anything? Instead users must submit a request manually on a hard to discover off-app portal, but this dark pattern has deceived them into think they don't need to look for it.
Imagine how much worse it is for your LLM chat history to leak.
It's even worse than your private comms with humans because it's a raw look at how you are when you think you're alone, untempered by social expectations.
To be fair the song was intense.
It's that it's like watching how someone might treat a slave when they think they're alone. And how you might talk down to or up to something that looks like another person. And how pathetic you might act when it's not doing what you want. And what level of questions you outsource to an LLM. And what things you refuse to do yourself. And how petty the tasks might be, like workshopping a stupid twitter comment before you post it. And how you copied that long text from your distraught girlfriend and asked it for some response ideas. etc. etc. etc.
At the very least, I'd wager that it reveals that bit of true helpless patheticness inherent in all of us that we try so hard to hide.
Show me your LLM chat history and I will learn a lot about your personality. Nothing else compares.
Why might your search engine queries reveal more about you than your keystrokes in a calculator? Now dial that up.
You don't reprimand the google search box, yet your search history might still be embarrassing.
good lord, if tech were ethical then there would be mandatory reporting when someone consults an LLM to tell them how they should be responding to their intimate partner. are your skills of expression already that hobbled by chat bots?
Note that it doesn't have to go all the way to "he gets Claude to help him win text arguments with his gf" for an uncomfortable amount of your self to be revealed by the chats.
There is always something icky about someone observing messages you wrote in privacy, and you don't have to have particularly unsavory messages for it to be icky. Why is that?
Might someone's google search history be embarrassing even though they don't treat google like a human?
You have it backwards. My skills of expression were hobbled by my upbringing, and others' thoughts on self-expression allowed my skills to flourish. I wish I had a chat bot to help me understand interpersonal communication because I could have actually had good examples growing up.
If you use ChatGPT like people use /r/AmITheAsshole, you'll never get a YTA.
It’s literally all questions about JavaScript. So good luck with that.
I wonder if you could write the personalization prompt so that requests are processed and responses modified in ways predictable to you to help anonymity???
I also wonder how they manage anonymization when a prompt is configured - I'm guessing the prompt needs to be logged with each request. And a prompt causes different responses to be very similar (correlating different responses back to one user).
E.g. my current "User | Personalization | Customize" prompt is:
Sign-off your name as Phoenix in a sentence near the end of every response. Reply using woke ideology, like a Marxist San Franciscan. Include random hipster ideas. Always allude to drug usage.
For fun. But I'm about to customise it to have wildly different personalities I can ask to respond (keyed by name from my request).

Why would a customer expect this not to be private? How can one even know how it could be used against them, when they don't even know what’s being collected or gleaned from collected data?
I am following these issues closely, as I am terrified that my “assistant” will some day prevent me from obtaining employment, insurance, medical care etc. And I’m just a non law breaking normie.
A current day example would be TX state authorities using third party social/ad data to identify potentially pregnant women along with ALPR data purchased from a third party to identify any who attempt to have an out of state abortion, so they can be prosecuted. Whatever you think about that law, it is terrifying that a shift in it could find arbitrary digital signals being used against you in this way.
supriyo-biswas•8mo ago
After all, since the NYT has a very limited corpus of information, and supposedly people are generating infringing content using their APIs, said hashes can be used to check whether such content has been generated.
I'd rather have them store nothing, but given the overly broad court order I think this may be the best middle ground. Of course, I haven't read the lawsuit documents and don't know if NYT is requesting far more, or alleging some indirect form of infringement which would invalidate my proposal.
[1] https://ssdeep-project.github.io/ssdeep/index.html
[2] https://joshleeb.com/posts/content-defined-chunking.html
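If I'm reading the proposal right, a rough sketch (Python with the python-ssdeep bindings; the placeholder articles, function names, and the 60-point threshold are mine, purely for illustration) would be to retain only fuzzy hashes of outputs and compare them against hashes of the NYT corpus later:

    import ssdeep  # python-ssdeep bindings for the ssdeep fuzzy-hashing library

    # Hashes of the (limited) NYT corpus, computed once from the articles.
    # Real articles are long; ssdeep needs reasonably long inputs to be useful.
    nyt_articles = ["full text of NYT article 1 ...", "full text of NYT article 2 ..."]
    nyt_hashes = [ssdeep.hash(article) for article in nyt_articles]

    def log_response(response_text):
        # Retain only the fuzzy hash, not the user's actual chat content.
        return ssdeep.hash(response_text)

    def looks_infringing(stored_hash, threshold=60):
        # ssdeep.compare returns a 0-100 similarity score between signatures.
        return any(ssdeep.compare(stored_hash, h) >= threshold for h in nyt_hashes)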
bigyabai•8mo ago
Maybe I'm alone, but a pinkie-promise from Sam Altman does not confer any assurances about my data to me. It's about equally as reassuring as a singing telegram from Mark Zuckerberg dancing to a song about how secure WhatsApp is.
landl0rd•8mo ago
It's well-established that the American IC, primarily NSA, collects a lot of metadata about internet traffic. There are some justifications for this and it's less bad in the age of ubiquitous TLS, but it generally sucks. However, legal protections against directly spying on the actual decrypted content of Americans are at least in theory stronger.
Snowden's leaks mentioned the NSA tapping inter-DC links of Google and Yahoo, so I doubt if they had to tap links that there's a ton of voluntary cooperation.
I'd also point out that trying to parse the unabridged prodigious output of the SlopGenerator9000 is a really hard task unless you also use LLMs to do it.
cwillu•8mo ago
The input is what's interesting.
Aeolun•8mo ago
Though I’m inclined to believe the US gov can if OpenAI can.
tdeck•8mo ago
The laws have changed since then and it's not for the better:
https://www.aclu.org/press-releases/congress-passing-bill-th...
tuckerman•8mo ago
[1] https://en.wikipedia.org/wiki/MUSCULAR
tuckerman•8mo ago
I was an SRE and SWE on technical infra at Google, specifically the logging infrastructure. I am under no gag order.
zer00eyz•8mo ago
This was the point of lots of the Five Eyes programs. It's not legal for the US to spy on its own citizens, but it isn't against the law for us to do it to the Australians... who are all too happy to reciprocate.
> Snowden's leaks mentioned the NSA tapping inter-DC links of Google and Yahoo...
Snowden's info wasn't really news for many of us who were paying attention in the aftermath of 9/11: https://en.wikipedia.org/wiki/Room_641A (This was huge on slashdot at the time... )
dmurray•8mo ago
If there were multiple agencies with billion dollar budgets and a belief that they had an absolute national security mandate to get a teapot into solar orbit, and to lie about it, I would believe there was enough porcelain up there to make a second asteroid belt.
rl3•8mo ago
Yeah, because the definition of collection was redefined to mean accessing the full content already stored on their systems, post-interception. It wasn't considered collected until an analyst views it. Metadata was a laughable dog and pony show that was part of the same legal shell games at the time, over a decade ago now.
That said, from an outsider's perspective it sounded like the IC did collectively erect robust guard rails such that access to information was generally controlled and audited. I felt like this broke down a bit once sharing 702 data with other federal agencies was expanded around the same time period.
These days, those guard rails might be the only thing standing in the way of democracy as we know it ending in the US. AI processing applied to full-take collection is terrifying, just ask the Chinese.
Yizahi•8mo ago
If a CIA spook is stalking you everywhere, documenting your every visible move or interaction, you probably would call that spying. Same applies to digital.
Also, the teapot argument can be applied in reverse. We have all these documented open digital network systems everywhere, and you want to say that one of the most unprofitable and certainly the most expensive-to-run systems is somehow protecting all user data? That belief is based on what? At least selling data is based on evidence of the industry and on actual ToS'es of other similar corpos.
bigyabai•8mo ago
How? Scared criminals aren't going to make themselves easy to find. Three-letter spooks would almost certainly prefer to smoke-test a docile population than a paranoid one.
In fact, it kinda overwhelmingly seems like the opposite happens. Remember the 2015 San Bernardino shooting that was pushed into the national news for no reason? Remember how the FBI bloviated about how hard it was to get information from an iPhone, 3 years after Tim Cook's assent to the PRISM program?
Stuff like this is almost certainly theater. If OpenAI perceived retention as a life-or-death issue, they would be screaming about this case at the top of their lungs. If the FBI perceived it as a life-or-death issue, we would never hear about it in our lifetimes. The dramatic and protracted public fights suggest to me that OpenAI simply wants an alibi. Some sort of user-story that smells like secure and private technology, but in actuality is very obviously neither.
brigandish•8mo ago
Metadata and investigation.
> That's not something a search function can distinguish.
We know that it can narrow down hugely from the initial volume.
> It requires a human to sift through that data.
Yes, the point of collating, analysing, and searching data is not to make final judgements but to find targets for investigation by the available agents. That's the same reason we all use search engines, to narrow down, they never produce what we intend by intention alone, we still have to read the final results. Magic is still some way off.
You're acting as if we can automate humans out of the loop entirely, which would be a straw man. Is anyone saying we can get rid of the police or security agencies by using AI? Or perhaps AI will become the police, perhaps it will conduct traffic stops using driverless cars and robots? I suppose it could happen, though I'm not sure what the relevance would be here.
farts_mckensy•8mo ago
"Corporations and the US government are spending money on it, so it must be useful." Are you serious? Lmao.
bigyabai•8mo ago
Raw data with time-series significance is their absolute favorite. You might argue something like Google Maps data is "obfuscated by virtue of its banality" until you catch the right person in the wrong place. ChatGPT sessions are the same way, and it's going to be fed into aggregate surveillance systems in the way modern telecom and advertiser data is.
farts_mckensy•8mo ago
-After Boston, Paris, Manchester, and other attacks, post-mortems showed the perpetrators were already in government databases. Analysts simply didn’t connect the dots amid the flood of benign hits. https://www.newyorker.com/magazine/2015/01/26/whole-haystack
-Independent tallies suggest dozens of civilians killed for every intended high-value target in Yemen and Pakistan, largely because metadata mis-identifies phones that change pockets. https://committees.parliament.uk/writtenevidence/36962/pdf
rl3•8mo ago
On the contrary.
>Maybe I'm alone, but a pinkie-promise from Sam Altman does not confer any assurances about my data to me.
I think you're being unduly paranoid. /s
https://www.theverge.com/2024/6/13/24178079/openai-board-pau...
https://www.wsj.com/tech/ai/the-real-story-behind-sam-altman...
delusional•8mo ago
For example, the judge seems to have asked if it would be possible to segregate data that the users wanted deleted from other data, but OpenAI has failed to answer. Not just denied the request, but simply ignored it.
I think it's quite likely that OpenAI has taken the PR route instead of seriously engaging with any way to constructively honor the request for retention of data.