OpenAI may be trying to paint themselves as the goody-two-shoes here, but they're not.
> We may use Personal Data for the following purposes: [...] To comply with legal obligations and to protect the rights, privacy, safety, or property of our users, OpenAI, or third parties.
OpenAI outright says it will give your conversations to people like lawyers.
If you thought they wouldn't give it out to third parties, you not only haven't read OpenAI's privacy policy, you haven't read any privacy policy from a big tech company (because all of them are basically maximalist: "your privacy is important, we'll share your data only with us and with people we deem worthy of it, which turns out to be everybody").
> The New York Times is demanding that we turn over 20 million of your private ChatGPT conversations. They claim they might find examples of you using ChatGPT to try to get around their paywall.

The Constitution is clear that the purpose of intellectual property is to promote progress. I feel that OpenAI is on the right side of that, and this is not IP theft as long as they aren't reproducing others' work in a non-transformative way.
Training the AI is clearly transformative (and lossy to boot). Giving the AI the ability to scrape and paraphrase others' work is less clear, and both sides have valid arguments. I don't envy the judges who must make that call.
But conversations people thought they were having with OpenAI in private are now going to be scoured by the New York Times' lawyers. I'm aware of the third party doctrine and that if you put something online it can never be actually private. But I think this also runs counter to people's expectations when they're using the product.
In copyright cases, typically you need to show some kind of harm. This case is unusual because the New York Times can't point to any harm, so they have to trawl through private conversations OpenAI's customers have had with their service to see if they can find any.
It's quite literally a fishing expedition.
As a fan and DAU of both OpenAI and the NYT, this is just a weird discovery demand, and there should be another pathway for these two to move forward in this case (NYT getting some semblance of understanding, OAI protecting end-user privacy).
...had never been private in the first place.
Not only is the data used for refining the models; OpenAI has also shariah-policed plenty of people for generating erotica.
That framing is rhetorically brilliant if you think about it. I will use that more. Chat Sharia Law for Chat Control. Mass Sharia Surveillance from Flock, etc.
Also, you need to understand that for huge corps like OpenAI, lying in your ToS will do orders of magnitude more damage to your brand than whatever you would gain by training on <1% more user chats. So no, they are not lying when they say they don't train on private chats.
I was quite disappointed by the comments I saw on the thread at that time.
NYTimes has produced credible evidence that OpenAI is simply stealing and republishing their content. The question they have to answer is "to what extent has this happened?"
That's a question they fundamentally cannot answer without these chat logs.
That's what discovery, especially in a copyright case, is about.
Think about it this way: let's say this were a bookstore selling illegal copies of books. A very reasonable discovery request would be "Show me your sales logs." The whole log needs to be produced; otherwise you can't really trust that it's the real log.
That's what NYTimes lawyers are after. They want the chat logs so they can do their own searches to find NYTimes text within the responses. They can't know how often that's happened and OpenAI has an obvious incentive to simply say "Oh that never happened".
And the reason this evidence is relevant is that it will directly feed into how much money NYT and OpenAI ultimately settle for. If this never happened, the amount will be low; if it happened a lot, the amount will be high. And if it goes to trial, it will be used in the damages portion, assuming NYT wins.
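To make "do their own searches" concrete, here is a minimal sketch, purely hypothetical, of the kind of verbatim-overlap check a reviewer could run over produced chat logs. Nothing in the filings describes the plaintiffs' actual tooling; the function names and the 12-word window are assumptions for illustration only.

```python
# Hypothetical sketch: flag chat responses that share long verbatim
# word sequences with a known article text. Not the NYT's actual method.

def ngrams(text: str, n: int = 12) -> set[str]:
    """Return the set of all n-word sequences in the text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlapping_passages(article_text: str, response_text: str, n: int = 12) -> set[str]:
    """Return the n-word sequences that appear verbatim in both texts."""
    return ngrams(article_text, n) & ngrams(response_text, n)

# Usage: iterate over the produced chat logs and record every response whose
# overlap with any known article exceeds some agreed threshold.
```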
The user has no right to privacy. It's the same as how any internet service can be (and has been) compelled to produce private messages.
This is nonsense. I’ve personally been involved in these things, and fought to protect user privacy at all levels and never lost.
The correct term for this is a prima facie right.
You do have a right to privacy (arguably) but it is outweighed by the interest of enforcing the rights of others under copyright law.
Similarly, liberty is a prima facie right; you can be arrested for committing a crime.
The trouble with this logic is that NYT already made that argument and lost it as applied to the original discovery scope of 1.4 billion records. The question now is about a lower scope, the means of review, and the proposed processes for anonymization.
They have a right to some form of discovery, but not to a blank-check extrapolation that sidesteps the legitimate privacy issues raised both in OpenAI's statement and throughout this thread.
This wasn't solid enough for summary judgment, and it seems the labs have largely figured out how to stop the models from doing this. So it looks like NYT wants to comb all user chats rather than pay a team of people tens of thousands of dollars a day to try to coax articles out of ChatGPT-5.
NYT is suing for statutory copyright infringement. That means you only need to demonstrate the infringement itself, since the infringement alone is considered harm; actual harm only matters if you're suing for actual damages.
This case really comes down to the very unsolved question of whether or not AI training and regurgitation is copyright infringement and, if so, whether it's fair use. The actual ways the AI is being used are thus very relevant to the case, and totally within the bounds of discovery. Of course, OpenAI has also been engaging in this lawsuit with unclean hands from the start (see some of their earlier discovery-dispute fuckery), and they're one of the companies with the strongest "the law doesn't apply to us because we're AI and big tech" swagger.
When Altman says "They claim they might find examples of you using ChatGPT to try to get around their paywall." he is blatantly misrepresenting the case.
https://smithhopen.com/2025/07/17/nyt-v-openai-microsoft-ai-...
"The lawsuit focuses on using copyrighted material for AI training. The NYT says OpenAI and Microsoft copied vast amounts of its content. They did this to build generative AI tools. These tools can output near-exact copies of NYT articles. Therefore, the NYT argues this breaks copyright laws. It also hurts journalism by skipping paywalls and cutting traffic to original sites. The complaint shows examples where ChatGPT mimics NYT stories closely. This could lead to money loss and harm from AI errors, called hallucinations."
This has nothing to do with the users, it has everything to do with OpenAI profiting off of pirated copyrighted material.
Also, Altman is getting scared because the NY Times proved to the judge that ChatGPT copied many articles:
"2025 brings big steps in the case. On March 26, 2025, Judge Sidney Stein rejected most of OpenAI’s dismissal motion. This lets the NYT’s main copyright claims go ahead. The judge pointed to “many” examples of ChatGPT copying NYT articles. He found them enough to continue. This ruling dropped some side claims, like unfair competition. But it kept direct and contributory infringement, plus DMCA breaches."
This is so transparently icky. "Oh woe is us! We're being sued and we're looking out for YOU the user, who is definitely not the product. We are just a 'lil 'ol (near) trillion-dollar business trying to protect you!"
Come ON.
Look, I don't actually know who's in the right in the OAI vs. NYT dispute, and frankly I personally lean more toward the side that says you are allowed to train models on the world's information as long as you consume it legally and don't violate copyright.
But this transparent attempt to get user sympathy under insanely disingenuous pretenses is just absurd.
Both companies are clearly wrong here. There is a small part of me that kinda wants OpenAI to lose this, just so maybe it will be a wake-up call to people putting far too personal information into these services? Am I too hopeful here that people will learn anything...
Fundamentally I agree with what they are saying, though; I just don't find it genuine in the slightest coming from them.
Also their framing of the NYT intent makes me strongly distrust anything they say. Sit down with a third party interviewer who asks challenging questions, and I'll pay attention.
…”as does any culpability for poisoning yourself, suicide, and anything else we clearly enabled but don’t want to be blamed for!”
Edit: honestly, I'm surprised I left out the bit where they just indiscriminately scraped everything they could online to train these models. The stones to say "your data belongs to you" while they clearly feel entitled to our data is unbelievably absurd.
Should Walmart be "culpable" for selling rope that someone hanged themselves with? Should Google be "culpable" for returning results about how to commit suicide?
An overly simplistic claim only deserves an overly simplistic response.
That's simply a function of the fact it's a controversial news organization running a dragnet on private communications to a technology platform.
"Great cases, like hard cases, make bad law."
And what if they, for example, find evidence of some other thing, such as:
1. Something useful for a story, maybe they follow up in parallel. Know who to interview and what to ask?
2. A crime.
3. An ongoing crime.
4. Something else they can sue someone else for.
5. Top secret information
2. That sounds useful.
3. That sounds useful.
4. That sounds useful.
5. That sounds useful.
Are these supposed to be examples of things that shouldn't be found out about? This has to be the worst pro-privacy argument I've ever seen on the internet. "Privacy is good because they will find out about our crimes"
In direct contrast: I fully agree with OpenAI here. We can have a more nuanced opinion than 'piracy to train AI is bad, therefore refusing to share chats is bad', which sounds absurd but is genuinely the logic one of the other comments follows.
Privacy is paramount. People _trust_ that their chats are private: they ask sensitive questions, ones to do with intensely personal or private or confidential things. For that to be broken -- for a company to force users to have their private data accessed -- is vile.
The tech community has largely stood against this kind of thing when it's been invasive scanning of private messages, tracking user data, etc. I hope we can collectively be better (I'm using ethical terms for a reason) than the other replies show. We don't have to support OpenAI's actions in order to oppose the NYT's actions.
IMO we can have multiple views over multiple companies and actions. And the sort of discussions I value here on HN are ones where people share insight, thought, show some amount of deeper thinking. I wanted to challenge for that with my comment.
_If_ we agree the NYT even has a reason to examine chats -- and I think even that should be where the conversation is -- I agree that there should be other ways to achieve it without violating privacy.
The tech community has been doing the scanning and tracking.
Probably because they have a lot to hide, a lot to lose, and no interest in fair play.
Theoretically, they could prove their tools aren't being used to do anything wrong, but practically, we all know they can't, because they are actually in the wrong (in both the moral and, IMO though IANAL, the legal sense). They know it, we know it; the only problem is breaking the ridiculous walled garden that stops the courts from 'knowing' it.
You don't have to think that OpenAI is good to think there's a legitimate issue over exposing data to a third party for discovery. One could see the Times discovering something in private conversations outside the scope of the case, but through their own interpretation of journalistic necessity, believe it's something they're obligated to publish.
Part of OpenAI holding up their side of the bargain on user data, to the extent they do, is that they don't roll over like a beaten dog to accommodate unconditional discovery requests.
It's OpenAI's data, there is a protective order in the case and OpenAI already agreed to anonymize it all.
>Part of OpenAI holding up their side of the bargain on user data, to the extent they do, is that they don't roll over like a beaten dog to accommodate unconditional discovery requests.
lol... what?
I thought they did? The warning currently says
>This chat won't appear in history, use or update ChatGPT's memory, or be used to train our models. For safety purposes, we may keep a copy of this chat for up to 30 days.
But AFAIK it was this way before the lawsuit as well.
The dodgy thing is that they don't warn users that all chats, including temporary ones, are now "Bcc: NYT".
https://www.schneier.com/blog/archives/2025/06/what-llms-kno...
At some point they'll monetize these dossiers.
What protection does user data typically have during legal discovery in a civil suit like this, where the defendant is a service provider but relevant evidence is likely present in user data?
Does a judge have to weigh users' expectation of privacy against the request? Do terms of service come into play here (who actually owns the data, and what privacy guarantees does the company make)?
I'm assuming in this case that the request itself isn't overly broad and seems like a legitimate use of the discovery process.
I think the first sentence was enough for me; no need to read more. The narrative is clear: we are the brains, and no one can stop us.
OpenAI is right here. The NYT needs to prove their case another way.
What are you referencing here?
The NYT is certainly open to criticism along many fronts, but I don't have the slightest idea what you mean in claiming it promotes authoritarianism.
Under those circumstances, why wouldn't NYT have a case? I advise everybody who employs some sort of DRM or online system that limits access to ask for every chat that every one of these companies has ever had with anyone. Why are they the only people who get to break copyright and hacking laws? Why are they the only people who get to have private conversations?
I might also check if any LLMs have ever endorsed terrorist points of view (or banned political parties) during a chat, because even though those points of view may be correct (depending on the organization), endorsing them may be illegal and make you subject to sanctions or arrest. If people can't just speak, certainly corporate LLMs shouldn't be able to.
- Is it part of a slow process of eroding public expectations of data privacy while blaming it on an external actor?
- Is it to undermine trust in traditional media, in an effort to increase dependence on AI companies as a source of truth?
- Is it something else I'm not seeing?
I'm guessing it's all three of these?
[1] Those emails that came up in the suit with Elon Musk, followed by his eventual complete takeover of OpenAI, and the elaborate process of getting himself installed as chairman of the Reddit board to get the original founders back in control are prominent examples.
As might any plaintiff. NYT might be the first of many, and the lawsuits may not be limited to copyright claims.
Why has OpenAI collected and stored 20 million conversations?
What is the purpose of OpenAI storing millions of private conversations?
By contrast, the purpose of NYT's request is both clear and limited.
The documents requested are not being made public by the plaintiffs. The documents will presumably be redacted to protect any confidential information before being produced to the plaintiffs; the documents can only be used by the plaintiffs for the purpose of the litigation against OpenAI; and, unlike OpenAI, who has collected and stored these conversations for all time, the plaintiffs are prohibited from retaining copies of the documents after the litigation is concluded.
The privacy issue here has been created by OpenAI for their own commercial benefit.
It is not even clear what this benefit, if any, will be, as OpenAI continues to search for a "business model".
Wanton data collection.
Of course the Times wants more evidence that the content OpenAI allegedly stole is ending up in things OpenAI is selling.
This isn't even a hyperbole. It's literally the same thing.
Is this a joke? We all know people do this. There is no "might" in it. They WILL find it.
OpenAI is trying to make it look like this is a breach of users' privacy, when the reality is that it's operating like a pirate website, and if it were investigated, that would be proven.
Remember, a corporation is generally an object owned by some people. Do you trust "unspecified future group of people" with your privacy? You can't. The best we can do is understand the information architecture and act accordingly.