OpenAI slams court order to save all ChatGPT logs, including deleted chats

https://arstechnica.com/tech-policy/2025/06/openai-says-court-forcing-it-to-save-all-chatgpt-logs-is-a-privacy-nightmare/

1094•ColinWright•1d ago

Comments

ColinWright•1d ago

Full post:

"After court order, OpenAI is now preserving all ChatGPT user logs, including deleted chats, sensitive chats, etc."

righthand•1d ago

Sounds like deleted chats are now hidden chats. Off the record chats are now on the record.

hyperhopper•1d ago

This is the real news. It should be illegal to call something deleted when it is not.

JKCalhoun•1d ago

"Marked" for deletion.

Aeolun•1d ago

Or maybe it should be illegal to have a court order that the privacy of millions of people should be infringed? I’m with OpenAI on this one, regardless of their less than pure reasons. You don’t get to wiretap all of the US population, and that’s essentially what they are doing here.

amanaplanacanal•1d ago

They are preserving evidence in a lawsuit. If you are concerned, you can try petitioning the court to keep your data private. I don't know how that would go.

djrj477dhsnv•1d ago

The privacy of millions of people should take precedence over ease of evidence collection for a lawsuit.

Aeolun•1d ago

You can use that same argument for wiretapping the US, because surely someone did something wrong. So we should just collect evidence on everyone on the off chance we need it.

baobun•23h ago

That's already the case. Ever looked into the Snowden leaks?

girvo•1d ago

> It should be illegal to call something deleted when it is not.

I don't disagree, but that ship sailed at least 15+ years ago. Soft delete is the name of the game basically everywhere...

eurekin•1d ago

At work we dutifully delete all data on a GDPR request

simonw•1d ago

Purely out of interest, how do you verify that the GDPR request is coming from the actual user and not an imposter?

dijksterhuis•1d ago

> The organisation might need you to prove your identity. However, they should only ask you for just enough information to be sure you are the right person. If they do this, then the one-month time period to respond to your request begins from when they receive this additional information.

https://ico.org.uk/for-the-public/your-right-to-get-your-dat...

eurekin•1d ago

In my domain, our set of services only authorizes Customer Centre system to do so. I guess I'd need to ask them for details, but I always assumed they have checks in place

sahila•1d ago

How do you manage deleting data from backups? Do you know not take backups?

Gigachad•1d ago

Probably most just ignore backups. But there were some good proposals where you encrypt every users data with their own key. So a full delete is just deleting the users encryption key, rendering all data everywhere including backups inaccessible.

liamYC•1d ago

Smart, how do you backup the users encryption keys?

aiiane•1d ago

A set of encryption keys is a lot smaller than the set of all user data, so it's much more viable to have both more redundant hot storage and more frequently rotated cold storage of just the keys.

jandrewrogers•1d ago

Deletion via encryption only works if every user’s data is completely separate from every other user’s data in the storage layer. This is rarely the case in databases, indexes, etc. It also is often infeasible if the number of users is very large (key schedule state alone will blow up your CPU cache).

Databases with data from multiple users largely can’t work this way unless you are comfortable with a several order of magnitude loss of performance. It has been built many times but performance is so poor that it is deemed unusable.

alisonatwork•1d ago

Some of these issues could perhaps be addressed by having fixed retention of PII in the online systems, and encryption at rest in the offline systems. If a user wants to access data of theirs which has gone offline, they take the decryption hit. Of course it helps to be critical about how much data should be retained in the first place.

It is true that protecting the user's privacy costs more than not protecting it, but some organizations feel a moral obligation or have a legal duty to do so. And some users value their own privacy enough that they are willing to deal with the decreased convenience.

As an engineer, I find it neat that figuring out how to delete data is often a more complicated problem than figuring out how to create it. I welcome government regulations that encourage more research and development in this area, since from my perspective that aligns actually-interesting technical work with the public good.

jandrewrogers•1d ago

> As an engineer, I find it neat that figuring out how to delete data is often a more complicated problem than figuring out how to create it.

Unfortunately, this is a deeply hard problem in theory. It is not as though it has not been thoroughly studied in computer science. When GDPR first came out I was actually doing core research on “delete-optimized” databases. It is a problem in other domains. Regulations don’t have the power to dictate mathematics.

I know of several examples in multiple countries where data deletion laws are flatly ignored by the government because it is literally impossible to comply even though they want to. Often this data supports a critical public good, so simply not collecting it would have adverse consequences to their citizens.

tl;dr: delete-optimized architectures are so profoundly pathological to query performance, and a lesser extent insert performance, that no one can use them for most practical applications. It is fundamental to the computer science of the problem. Denial of this reality leads to issues like the above where non-compliance is required because the law didn’t concern itself with the physics of computation.

If the database is too slow to load the data then it doesn’t matter how fast your deterministic hard deletion is because there is no data to delete in the system.

Any improvements in the situation are solving minor problems in narrow cases. The core theory problems are what they are. No amount of wishful thinking will change this situation.

alisonatwork•1d ago

It would be interesting to hear more about your experience with systems where deletion has been deemed "literally impossible".

Every database I have come across in my career has a delete function. Often it is slow. In many places I worked, deleting or expiring data cost almost as much as or sometimes more than inserting it... but we still expired the data because that's a fundamental requirement of the system. So everything costs 2x, so what? The interesting thing is how to make it cost less than 2x.

Gigachad•1d ago

Instantaneous deletes might be impossible, but I really doubt that it’s physically impossible to eventually delete user data. If you soft delete first to hide user data, and then maybe it takes hours, weeks, months to eventually purge from all systems, that’s fine. Regulators aren’t expecting you to edit old backups, only that they eventually get cleared in reasonable time.

Seems that companies are capable of moving mountains when the task is tracking the user and bypassing privacy protections. But when the task is deleting the users data it’s “literally impossible”

blagie•1d ago

The entire mess isn't with data in databases, but on laptops for offline analysis, in log files, backups, etc.

It's easy enough to have a SQL query to delete a users' data from the production database for real.

It's all the other places the data goes that's a mess, and a robust system of deletion via encryption could work fine in most of those places, at least in the abstract with the proper tooling.

catlifeonmars•22h ago

You can use row based encryption and store the encrypted encryption key alongside each row. You use a master key to decrypt the row encryption key and then decrypt the row each time you need to access it. This is the standard way of implementing it.

You can instead switch to a password-based key derivation function for the row encryption key if you want the row to be encrypted by a user provided password

jandrewrogers•21h ago

This has been tried many times. The performance is so poor as to be unusable for most applications. The technical reasons are well-understood.

The issue is that, at a minimum, you have added 32 bytes to a row just for the key. That is extremely expensive and in many cases will be a large percentage of the entire row; many years ago PostgreSQL went to heroic efforts to reduce 2 bytes per row for performance reasons. It also limits you to row storage, which means query performance will be poor.

That aside, you overlooked the fact that you'll have to compute a key schedule for each row. None of the setup costs of the encryption can be amortized, which makes processing a row extremely expensive computationally.

There is no obvious solution that actually works. This has been studied and implemented extensively. The reason no one does it isn't because no one has thought of it before.

catlifeonmars•21h ago

You’re not wrong about the downsides. However you’re wrong about the costs being prohibitive on general. I’ve personally worked on quite a few applications that do this and the additional cost has never been an issue.

Obviously context matters and there are some applications where the cost does not outweigh the benefit

infinite8s•15h ago

I think you and the GP are probably talking about different scale orders of magnitude.

catlifeonmars•11h ago

Very likely!

But I think there must also be constraints other than scale. The profit margins must also be razor thin.

alisonatwork•1d ago

Backups can have a fixed retention period.

sahila•1d ago

Sure, but now when the backup is restored two weeks later, is the user redeleted or just forgotten about?

alisonatwork•23h ago

Depends on the processes in place at the company. Presumably if a backup is restored, some kind of replay has to happen after that, otherwise all the other users are going to lose data that arrived in the interim. A catastrophic failure where both two weeks of user data and all the related events get irretrievably blackholed could still happen, sure, but any company where that is a regular occurrence likely has much bigger problems than complying with GDPR.

The point is that none of these problems are insurmountable - they are all processes and practices that have been in place since long before GDPR and long before I started in this industry 25+ years ago. Even if deletion is only eventually consistent, even if a few pieces of data slip through the cracks, it is not hard to have policies in place that at least provide a best effort at upholding users' privacy and complying with the regulations.

Organizations who choose not to bother, claiming that it's all too difficult, or that because deletion cannot be done 100% perfectly it should not even be attempted at all, are making weak excuses. The cynical take would be that they are just covering for the fact that they really do not respect their users' privacy and simply do not want to give up even the slightest chance of extracting value from that data they illegally and immorally choose to retain.

crdrost•1d ago

"When data subjects exercise one of their rights, the controller must respond within one month. If the request is too complex and more time is needed to answer, then your organisation may extend the time limit by two further months, provided that the data subject is informed within one month after receiving the request."

Backup retention policy 60 days, respond within a week or two telling someone that you have purged their data from the main database but that these backups exist and cannot be changed, but that they will be automatically deleted in 60 days.

The only real difficulty is if those backups are actually restored, then the user deletion needs to be replayed, which is something that would be easy to forget.

Trasmatta•1d ago

Most companies don't keep all backups in perpetuity, and instead have rolling backups over some period of time.

gruez•1d ago

That won't work in this case, because I doubt GDPR requests override court orders.

miki123211•1d ago

This is very, very hard in practice.

With how modern systems, languages, databases and file systems are designed, deletion often means "mark this as deleted" or "erase the location of this data". This is true on all possible levels of the stack, from hardware to high-level application frameworks.

Changing this would slow computers down massively. Just to give a few examples, backups would be prohibited, so would be garbage collection and all existing SSD drives. File systems would have to wipe data on unlink(), which would increase drive wear and turn operations which everybody assumed were O(1) for years into O(n), and existing software isn't prepared for that. Same with zeroing out memory pages, OSes would have to be redesigned to do it all at once when a process terminates, and we just don't know what the performance impact of that would be.

Gigachad•1d ago

You just do it the way fast storage wipes do it. Encrypt everything, and to delete you delete the decryption key. If a user wants to clear their personal data, you delete their decryption key and all of their data is burned without having to physically modify it.

jandrewrogers•1d ago

That only works if you have a single key at the block level, like an encryption key per file. It essentially doesn’t work for data that is finely mixed with different keys such as in a database. Encryption works on byte blocks, 16-bytes in the case of AES. Modern data representations interleave data at the bit level for performance and efficiency reasons. How do you encrypt a block with several users data in it? Separating these out into individual blocks is extremely expensive in several dimensions.

There have been several attempts to build e.g. databases that worked this way. The performance and scalability was so poor compared to normal databases that they were essentially unusable.

girvo•1d ago

It would be very hard to change technically, yes.

But that's not the only solve. It's easy to change the words we use instead to make it clear to users that the data isn't irrevocably deleted.

aranelsurion•1d ago

Consequently all your "deleted chats" might one day become public if someone manages to dump some tables off OpenAI's databases.

Maybe not today on its heyday, but who knows what happens in 20 years once OpenAI becomes Yahoo of AI, or loses much of its value, gets scrapped for parts and bought by less sophisticated owners.

It's better to regard that data as already public.

jandrewrogers•1d ago

The concept of “deleted” is not black and white, it is a continuum (though I agree that this is a very soft delete). As a technical matter, it is surprisingly difficult and expensive to unrecoverably delete something with high assurance. Most deletes in real systems are much softer than people assume because it dramatically improves performance, scalability, and cost.

There have been many attempts to build e.g. databases that support deterministic hard deletes. Unfortunately, that feature is sufficiently ruinous to efficient software architecture that performance is extremely poor such that no one uses them.

tarellel•1d ago

I’m sure this has been the case all along.

causal•1d ago

I know this is a popular suspicion but some companies really do take privacy seriously, especially when operating in Europe

3abiton•1d ago

Does that fly in the EU?

ColinWright•22h ago

Just some context ...

The original submission was a link to a post on Mastodon. The post itself was too long to fit in the title, so I trimmed it, and put the full post here in a comment.

But with the URL in the submission being changed, this doesn't really make sense any more! In the future I'll make sure I include in the comment the original link with the original text so it makes sense even if (when?) the submission URL gets changed.

bluetidepro•1d ago

More context: https://arstechnica.com/tech-policy/2025/06/openai-says-cour...

gmueckl•1d ago

That's the true source. Should the link be updated to this article?

cwillu•1d ago

Email hn@ycombinator.com and they'll probably change it.

basilgohar•1d ago

[flagged]

hsbauauvhabzb•1d ago

> but in a way that helps common people

That’ll be the day. But even if it does happen, major AI players have the resources to move to a more ‘flexible’ country, if there isn’t a loophole that involves them closing their eyes really really tight while outsourced webscrapers collect totally legit and not illegally obtained data

telchior•1d ago

You're being generous to even grant an "even if it does" proposition. Considering the people musing about "reform" of copyright at the moment -- Jack Dorsey's little flip "delete all IP law" comes to mind -- the clear direction we're headed is toward artistic and cultural serfdom.

krick•1d ago

In all fairness, the essence of it doesn't have to do anything with copyright. "Pro-copyright" is old news. Everyone knows these companies shit on copyright, but so do users, and the only reason why users sometimes support the "evil pro-copyright shills" narrative is because we are bitter that Facebook and OpenAI can get away with that, while common peasants are constantly under the risk of being fucked up for life. The news is big news only because of "anti-ChatGPT" part, and everyone is a user of ChatGPT now (even though 50% of them hate it). Moreover, it's only big news because the users are directly concerned: if OpenAI would have to pay big fine and continue business as usual, the comments would largely be schadenfreude.

And the fact that the litigation was over copyright is an insignificant detail. It could have been anything. Literally anything, like a murder investigation, for example. It only helps OpenAI here, because it's easy to say "nobody cares about copyright", and "nobody cares about murder" sounds less defendable.

Anyway, the issue here is not copyright, nor "AI", it's the venerated legal system, which very much by design allows for a single woman to decide on a whim, that a company with millions of users must start collecting user data, while users very much don't want that, and the company claims it doesn't want that too (mostly, because it knows how much users don't want that: otherwise it'd be happy to). Everything else is just accidental details, it really has nothing to do neither with copyright, nor with "AI".

dijksterhuis•1d ago

My favourite comment:

>> Wang apparently thinks the NY Times' boomer copyright concerns trump the privacy of EVERY @OpenAI USER—insane!!! -- someone on twitter

> Apparently not having your shit stolen is a boomer idea now.

AlienRobot•1d ago

Classic "someone on Twitter" take.

infotainment•1d ago

Ars comments, in general, are hilariously bad.

It's surprising to me, because you'd think a site like Ars would attract a generally more knowledgable audience, but reading through their comment section feels like looking at Twitter or YouTube comments -- various incendiary and unsubstantial hot takes.

sevensor•1d ago

The ars-technica.com forums were pretty good, 2.5e-1 centuries ago.

johnnyanmac•1d ago

I'm "pro-copyright" in that I want the corporations that setup this structure to suffer under it the way we did for 25+ years. They can't just ignore the rules they spent millions lobbying for when they feel it's convinient.

On the other end: while copyright has been perverted over the centuries, the goal is still overall to protect small inventors. They have no leverage otherwise and this gives them some ability to fight if they aren't properly compensated. I definitely do not want it abolished outright. Just reviewed and reworked for modern times.

dmix•1d ago

Corporations are not a monolith. Silicon Valley never lobbied for copyright AFAIK

Google and others fought it pretty hard

tomhow•1d ago

Thanks, we updated the URL to this from https://mastodon.laurenweinstein.org/@lauren/114627064774788...

zombiwoof•1d ago

Palantir wants them

bigyabai•1d ago

It's not like Sam Altman has been particularly hostile to the present administration. He's probably already handing them over behind closed doors and doesn't want to take the PR hit.

nickv•1d ago

Give me a break, they're literally spending money fighting this court order.

bigyabai•1d ago

They're literally salaried lawyers. The average American taxpayer is spending more on legal representation for this case than OpenAI is.

It's a publicity stunt, ordered by executives. If you think OpenAI is doing this out of principle, you're nuts.

Draiken•23h ago

For the sole reason that this costs money to do, not out of the goodness of their hearts.

LightBug1•1d ago

I'd rather use Chinese LLM's than put up with this horseshit.

SchemaLoad•1d ago

At least the DeepSeek lets you run it locally.

romanovcode•21h ago

NVIDIA should just release a box and say "THIS WILL RUN DEEPSEEK LOCALLY VERY FAST. 3000 USD."

BrawnyBadger53•12h ago

Isn't this what project digits is lol

mensetmanusman•1d ago

Slavery or privacy!

solardev•1d ago

Communism AND barbarism, 2 for the price of 1!

LightBug1•1d ago

He says ... while typing away on Chinese technology.

Disclaimer: I'm not Chinese. But I recognise crass hypocrisy when I see it.

LightBug1•1d ago

Slavery or no privacy? ... what's the difference?

Bodily slavery or mental slavery ... take your pick.

AStonesThrow•1d ago

Ask any Unix filesystem developer, and they'll tell you that unlink(2) on a file does not erase any of its data, but simply enables the reuse of those blocks on disk.

Whenever I "delete" a social media account, or "Trash" anything on a cloud storage provider, I repeat the mantra, "revoking access for myself!" which may be sung to the tune of "Evergreen".

II2II•1d ago

In the first case, there is nothing preventing the development of software to overwrite data before unlink(2) is called.

In the second case, you can choose to trust or distrust the cloud storage provider. Trust being backed by contractual obligations and the right to sue if those obligations are not met. Of course, most EULAs for consumer products are toothless is this respect. On the other hand, that doesn't prevent companies from offering contracts which have some teeth (which they may do for business clients).

wolfgang42•1d ago

> there is nothing preventing the development of software to overwrite data before unlink(2) is called.

It’s not that simple: this command already exists, it’s called `shred`, and as the manual[1] notes:

The shred command relies on a crucial assumption: that the file system and hardware overwrite data in place. Although this is common and is the traditional way to do things, many modern file system designs do not satisfy this assumption.

[1] https://www.gnu.org/software/coreutils/manual/html_node/shre...

grg994•1d ago

A reasonable cloud storage provider stores your data encrypted on disk. Certain standards like HIPPA mandates this.

Deletion of data is achieved by permanently discarding the encryption key which is stored and managed elsewhere where secure erasure can be guaranteed.

If implemented honestly, this procedure WORKS and cloud storage is secure. Yes the emphasis is on the "implemented honestly" part but do not generalize cloud storage as inherently insecure.

david_shi•1d ago

there's no indication at all on the app that this is happening

pohuing•1d ago

The docs contain a sentence on them retaining any chats that they legally have to retain. This is always the risk when doing business with law abiding companies which store any data on you.

SchemaLoad•1d ago

They should disable the secret chat functionality immediately if it's flat out lying to people.

baby_souffle•1d ago

Agree. But it's worth noting that they already have a bit of a hedge in the description for private mode:

> This chat won't appear in history, use or update ChatGPT's memory, or be used to train our models. For safety purposes, we may keep a copy of this chat for up to 30 days.

The "may keep a copy" is doing a lot of work in that sentence.

SchemaLoad•1d ago

"for up to 30 days" though. If they are being kept in perpetuity always they should update the copy to say "This chat won't appear in your history, we retain a copy indefinitely"

rvz•1d ago

Another great use-case for local LLMs, given this news.

The government also says thank you for your ChatGPT logs.

bongodongobob•1d ago

Bad news, they've been sniffing Internet backbones for decades. That cat is way the fuck out of the bag.

63•1d ago

While I also disagree with the court order and OpenAI's implementation (along with pretty much everything else the company does), the conspiratorial thinking in the comments here is unfounded. The order is overly broad but in the context of the case, it's not totally unwaranted. This is not some conspiracy to collect data on people. I'm confident the records will be handled appropriately once the case concludes (and if they're not, we can be upset then, not now). Let's please reserve our outrage for the plethora of very real issues going on right now.

vlovich123•1d ago

> I'm confident the records will be handlded appropriately once the case concludes (and if they're not, we can be upset then, not now)

This makes no sense to me. Shouldn't we address the damage before it's done vs handwringing after the fact?

verdverm•1d ago

the damage to which party?

Aeolun•1d ago

Certainly not the copyright holders, which have lost any form of my sympathy over the past 25 years.

odo1242•1d ago

Users, presumably

pier25•1d ago

Are we talking about the same company that needs data desperately and has used copyrighted material illegally without permission?

simonw•1d ago

When did they use copyrighted material illegally?

I didn't think any of the ongoing "fair use" lawsuits had reached a conclusion on that.

jamessinghal•1d ago

The Thomson Reuters case [1] is the most relevant in the court's finding that the copying of copyrighted material from Westlaw by Ross Intelligence was direct copyright infringement and was not fair use.

The purpose of training in many of the AI Labs being sued mostly matches the conditions that Ross Intelligence was found to have violated, and the question of copying is almost guaranteed if they trained on it.

[1] Thomson Reuters Enterprise Centre GmbH et al v. ROSS Intelligence Inc. https://www.ded.uscourts.gov/sites/ded/files/opinions/20-613...

simonw•1d ago

Thanks, I hadn't seen that one.

pier25•1d ago

ok then let's say they used the copyrighted material without permission

pier25•1d ago

Sorry, I meant to write "monetized copyrighted material without permission".

We'll see if the courts deem it legal but it's, without a doubt, unehtical.

FeepingCreature•1d ago

Eh, I have the opposite view but then again I'm a copyright minimalist.

pier25•20h ago

So you think artists do not need to be able to make a living?

simonw•1d ago

That's true, they did.

basilgohar•1d ago

In what world do you live in where corporations have any right to a benefit of the doubt? When did that legitimately pan out?

Aeolun•1d ago

> and if they're not, we can be upset then, not now

Like we could be upset when that credit checking company dumped all those social security numbers on the net and had to pay the first 200k claimants a grand total of $21 for their trouble?

By that point it’s far too late.

phkahler•1d ago

You're pretty naive. OpenAI is still trying to figure out how to be profitable. Having a court order to retain a treasure trove of data they were already wanting to keep while offering not to, or claiming not to? Hahaha.

tomnipotent•1d ago

Preserving data for a judicial hold does not give them leeway to use that data for other purposes, but don't let facts get in the way of your FUD.

phkahler•19h ago

>> Preserving data for a judicial hold does not give them leeway to use that data for other purposes

Does not give them permission. What if LEO asks for the data? Should they hand it over just because they have it? Remember, this happens all the time with metadata from other companies (phone carriers for example). Having the data means it's possible to use it for other purposes as opposed to not possible. There is always pressure to do so both from within and outside a company.

tomnipotent•12h ago

> Should they hand it over just because they have it?

Not unless LEO sues OpenAI while it's preserving data from the first discovery, otherwise they cannot be compelled to give up data. Nor are they allowed to violate their TOS and use the data outside of retention, despite the FUD you want to spread about it.

> Having the data means it's possible

No, it doesn't. That's not how any of this works.

nickpsecurity•1d ago

They're blaming the court. While there is an order, it is happening in response to massive, blatant, and continued I.P. infringement. Anyone doing that knows they'll be in court at some point. Might have a "duty to preserve" all kinds of data. If they keep at it, then they are prioritizing their gains over any losses it creates.

In short: OpenAI's business practices caused this. They wouldn't have been sued if using legal data. They might still not have an order like this if more open about their training, like Allen Institute.

MeIam•1d ago

These AIs have digested all the data in the past. There is no fingerprints anymore.

The question is whether AI itself is aware what the source is. It certainly knows the source.

comrade1234•1d ago

Use deepseek if you don't want the u.s. government monitoring you.

JKCalhoun•1d ago

Better still, local LLM. It's too bad they're not "subscription grade".

mmasu•1d ago

yet - likely subscription grade will stay ahead of the curve, but we will soon have very decent models running locally for very cheap - like when you play great videogames that are 2/3 years old on now “cheap”machines

JKCalhoun•1d ago

Definitely what I am hoping.

SchemaLoad•1d ago

I tried running the DeepSeek models that would run on my 32GB macbook and they were interesting. They could still produce good conversation but didn't seem to have the entirety of the internet in it's knowledge pool. Asking it complex questions lead to it only offering high level descriptions and best guess answers.

Feel like they would still be great for a lot of applications like "Search my local hard drive for the file that matches this description"

JKCalhoun•1d ago

Yeah, Internet search as a fallback, our chat history and "saved info" in the context ... there's a lot OpenAI, et. al. give you that Ollama does not.

GrayShade•1d ago

You can get those in ollama using tools (MCP).

JKCalhoun•1d ago

Had to ask ChatGPT what MCP (Model Context Protocol) referred to.

When I followed up with how to save chat information for future use in the LLM's context window, I was given a rather lengthy process involving setting up an SQL database, writing some Python tp create a "pre-prompt injection wrapper"....

That's cool and all, but wishing there was something a little more "out of the box" that did this sort of thing for the "rest of us". GPT did mention Tome, LM Studio, a few others....

TechDebtDevin•1d ago

I use their api a lot cuz its so cheap but latency is so bad.

Take8435•1d ago

This post is about OpenAI keeping chat logs. All DeepSeek API calls are kept. https://cdn.deepseek.com/policies/en-US/deepseek-privacy-pol...

TechDebtDevin•18h ago

Yea, I mean I wouldn't send anything to a chinese server I thought was sensative. Or any LLM. For what its worth this is in bold on their TOS:

PLEASE NOTE: We do not engage in "profiling" or otherwise engage in automated processing of Personal Data to make decisions that could have legal or similarly significant impact on you or others.

charcircuit•1d ago

That's not how the discovery process works. This data is only accessible by OpenAI and requests for discovery will pass through OpenAI.

layer8•1d ago

Presumably, the court order only applies to the US?

pdabbadabba•1d ago

I would not assume that it applies only to users located in the U.S., if that's what you mean, since this is designed to preserve evidence of alleged copyright infringement.

layer8•1d ago

I don’t think a US court order can overrule the GDPR for EU customers, for example.

paulddraper•1d ago

Nothing says that laws of different countries can't conflict.

Hopefully they don't though.

swat535•1d ago

Isn't this why companies incorporate in various nations so that they can comply with local regulations ? I'm assuming that EU will demand OpenAI to treat EU users differently..

csomar•1d ago

If they did incorporate in the EU and run their servers in the EU, the EU entity will be a separate entity and (not a lawyer), I think as a result, not the entity concerned by this lawsuit.

fc417fc802•1d ago

Assuming the EU entity were a subsidiary "but I keep that data overseas" seems unlikely to get you off the hook. However I don't think you can be ordered to violate local law. That would be a weird (and I imagine expensive) situation to sort out.

patmcc•1d ago

The US is perfectly able to give orders to US companies that may be against EU law. The GDPR may hold the company liable for that.

layer8•21h ago

OpenAI Ireland Ltd, the entity that provides ChatGPT services to EU residents, is not a US company.

hedora•1d ago

The US Cloud Act makes it illegal for US companies to operate non-e2e encrypted services that are GDPR compliant. They have to warrantlessly hand the govt all data they have the technical capability to access.

layer8•21h ago

OpenAI Ireland Ltd, the entity that provides ChatGPT services to EU residents, is not a US company.

fc417fc802•1d ago

What makes you think that EU law holds sway over US companies? Conversely, would you expect EU companies to abide by US law? Can an EU court not make arbitrary demands of a company that operates in the EU so long as those demands comply with relevant EU law?

layer8•21h ago

OpenAI Ireland Ltd, which as an EU resident is the entity that provides the ChatGPT services to me (according to ChatGPT’s own ToS), is within the jurisdiction of the EU, not the US.

fc417fc802•14h ago

Repeatedly making the same comment all over the local tree is childish and effectively spamming.

To your observation, it's certainly relevant to the situation at hand but has little to do with your original supposition. A US court can order any company that operates in the US to do anything within the bounds of US law, in the same way that an EU court can do the converse. Such an order might well make it impossible to legally do business in one or the other jurisdiction.

If OpenAI Ireland is a subsidiary it will be interesting to see to what extent the court order applies (or doesn't apply) to it. I wonder if it actually operates servers locally or if it's just a frontend that sends all your queries over to a US based backend.

People elsewhere in this comment section observed that the GDPR has a blanket carve out for things that are legally required. Seeing as compliance with a court order is legally required there is likely no issue regardless.

Trasmatta•1d ago

The OpenAI docs are now incredibly misleading: https://help.openai.com/en/articles/8809935-how-to-delete-an...

> What happens when you delete a chat?

> The chat is immediately removed from your chat history view.

> It is scheduled for permanent deletion from OpenAI's systems within 30 days, unless:

> It has already been de-identified and disassociated from your account, or

> OpenAI must retain it for security or legal obligations.

That final clause now voids the entire section. All chats are preserved for "legal obligations".

I regret all the personal conversations I've had with AI now. It's very enticing when you need some help / validation on something challenging, but everyone who warned how much of a privacy risk that is has been proven right.

SchemaLoad•1d ago

Feels like all the words of privacy and open source advocates for the last 20 years have never been more true. The worst nightmare scenarios for privacy abuse have all been realized.

gruez•1d ago

>That final clause now voids the entire section. All chats are preserved for "legal obligations".

That's why you read the whole thing? It's not exactly a long read. Do you expect them to update their docs every time they get a subpoena request?

Trasmatta•1d ago

Yes? Why is that an unreasonable request? The docs make it sound like chats are permanently deleted. As of now, that's no longer true, and the way it's portrayed is misleading.

gruez•1d ago

> The docs make it sound like chats are permanently deleted. As of now, that's no longer true, and the way it's portrayed is misleading.

Many things in life are "misleading" when your context window is less than 32 words[1], or can't bother to read that far.

[1] number of words required to get you to "unless", which should hopefully tip you off that not everything gets deleted.

Trasmatta•1d ago

How is a user supposed to know, based on that page, that there's currently a legal requirement that means ALL deleted chats must be preserved? Why defend the currently ambiguous language?

It's like saying "we will delete your chats, unless the sun rises tomorrow". At that point, just say that the chats aren't deleted.

(The snark from your replies seems unnecessary as well.)

lesuorac•1d ago

> It has already been de-identified and disassociated from your account

That's one giant cop-out.

All you had to do was delete the user_id column and you can keep the chat indefinitely.

efskap•1d ago

Note that this also applies to GPT models on the API

> That risk extended to users of ChatGPT Free, Plus, and Pro, as well as users of OpenAI’s application programming interface (API), OpenAI said.

This seems very bad for their business.

Kokouane•1d ago

If you were working with code that was proprietary, you probably shouldn't of been using cloud hosted LLMs anyways, but this would seem to seal the deal.

larrymcp•1d ago

I think you probably mean "shouldn't have". There is no "shouldn't of".

DecentShoes•1d ago

Who cares?

knicholes•1d ago

I care.

rimunroe•1d ago

Which gives you an opening for the excellent double contraction “shouldn’t’ve”

bbarnett•1d ago

The letter H deserves better.

worthless-trash•1d ago

I think we gave it too much leeway in the word sugar.

mananaysiempre•1d ago

The funniest part is that in that contraction the first apostrophe does denote the elision of a vowel, but the second one doesn’t, the vowel is still there! So you end up with something like [nʔəv], much like as if you had—hold the rotten vegetables, please—“shouldn’t of” followed by a vowel.

Really, it’s funny watching from the outside and waiting for English to finally stop holding it in and get itself some sort of spelling reform to meaningfully move in a phonetic direction. My amateur impression, though, is that mandatory secondary education has made “correct” spelling such a strong social marker that everybody (not just English-speaking countries) is essentially stuck with whatever they have at the moment. In which case, my condolences to English speakers, your history really did work out in an unfortunate way.

roywiggins•1d ago

We had a spelling reform or two already, they were unfortunately stupid, eg doubt has never had the b pronounced in English. https://en.m.wiktionary.org/wiki/doubt

That said, phonetic spelling reform would of course privilege the phonemes as spoken by whoever happens to be most powerful or prestigious at the time (after all, the only way it could possibly stick is if it's pushed by the sufficiently powerful), and would itself fall out of date eventually anyway.

jdbernard•1d ago

> but the second one doesn’t, the vowel is still there!

Isn't the "a" in "have" elided along with the "h?"

Shouldn't've Should not have

What am I missing?

jack09268•1d ago

Even though the vowel "a" is dropped from the spelling, if you actually say it out loud, you do pronounce a vowel sound when you get to that spot in the word, something like "shouldn'tuv", whereas the "o" in "not" is dropped from both the spelling and the pronounciation.

SAI_Peregrinus•1d ago

The pronounced vowel is different than the 'a' in 'have'. And the "h" is definitely elided.

int_19h•1d ago

Many English dialects elide "h" at the beginning even when nothing is contracted. The pronounced vowel is different mostly because it's unstressed, and unstressed vowels in English generally centralize to schwa or nearly so.

dan353hehe•1d ago

Don’t worry about us. English is truly a horrible language to learn, and I feel bad for anyone who has to learn it.

Also I have always liked this humorous plan for spelling reform: https://guidetogrammar.org/grammar/twain.htm

amanaplanacanal•1d ago

English spelling is pretty bad, but spoken English isn't terrible, is it? It's the most popular second language.

somenameforme•1d ago

You never realize how many weird rules, weird exceptions, ambiguities, and complete redundancies there are in this language until you try to teach English, which will also probably teach you a bunch of terms and concepts you've never heard of. Know what a gerund is? Then there's things we don't even think about that challenge even advanced foreign learners, like when you use which articles: the/a.

English popularity was solely and exclusively driven by its use as a lingua franca. As times change, so too will the language we speak.

huimang•9h ago

Every real, non-constructed language has weird rules, weird exceptions, ambiguities, and complete redundancies. English is on the more difficult end but it's not nearly the most difficult. I'm not sure how it got to be perceived as this exceptionally tough language just because pronunciation can be tough. Other languages have pronunciation ambiguities too...

int_19h•1d ago

English is rather complex phonologically. Lots of vowels for starters, and if we're talking about American English these include the rather rare R-colored vowels - but even without them things are pretty crowded, e.g. /æ/ vs /ɑ/ vs /ʌ/ ("cat" vs "cart" vs "cut") is just one big WTF to anyone whose language has a single "a-like" phoneme, which is most of them. Consonants have some weirdness as well - e.g. a retroflex approximant for a primary rhotic is fairly rare, and pervasive non-sibilant coronals ("th") are also somewhat unusual.

There are certainly languages with even more spoken complexity - e.g. 4+ consonant clusters like "vzdr" typical of Slavic - but even so spoken English is not that easy to learn to understand, and very hard to learn to speak without a noticeable accent.

throwawaymb•18h ago

English is far from the most complex or difficult.

shagie•22h ago

The node for it on Everything2 makes it a little bit easier to follow with links to the English word. https://everything2.com/title/A+Plan+for+the+Improvement+of+...

So, its something like:

    For example, in Year 1 that useless letter "c" would be dropped to be [replased](replaced) either by "k" or "s", and likewise "x" would no longer be part of the alphabet.

It becomes quite useful in the later sentences as more and more reformations are applied.

throwawaymb•18h ago

English being particularily difficult is just a meme. only the orthography is confusing.

veqq•1d ago

> phonetic

A phonetic respelling would destroy the languages, because there are too many dialects without matching pronunciations. Though rendering historical texts illegible, a phonemic approach would work: https://en.wiktionary.org/wiki/Appendix:English_pronunciatio... But that would still mean most speakers have 2-3 ways of spelling various vowels. There are some further problems with a phonemic approach: https://alexalejandre.com/notes/phonetic-vs-phonemic-spellin...

Here's an example of a phonemic orthography, which is somewhat readable (to me) but illustrates how many diacritics you'd need. And it still spells the vowel in "ask" or "lot" with the same ä! https://www.reddit.com/media?url=https%3A%2F%2Fpreview.redd....

inkyoto•1d ago

> A phonetic respelling would destroy the languages, because there are too many dialects without matching pronunciations.

Not only that, but since pronunciation tends to diverge over time, it will create a never-ending spelling-pronunciation drift where the same words won't be pronounced the same in, e.g. 100-200 years, which will result in future generations effectively losing easy access to the prior knowledge.

selcuka•1d ago

> since pronunciation tends to diverge over time, it will create a never-ending spelling-pronunciation drift

Once you switch to a phonetic respelling this is no longer a frequent problem. It does not happen, or at least happens very rarely with existing phonetic languages such as Turkish.

In the rare event that the pronunciation of a sound changes in time, the spelling doesn't have to change. You just pronounce the same letter differently.

If it's more than one sound, well, then you have a problem. But it happens in today's non-phonetic English as well (such as "gost" -> "ghost", or more recently "popped corn" -> "popcorn").

veqq•1d ago

> Once you switch to a phonetic respelling this is no longer a frequent problem

Oh, but it does. It's just the standard is held as the official form of the language and dialects are killed off through standardized education etc. To do this in English would e.g. force all Australians, Englishmen etc. to speak like an American (when in the UK different cities and social classes have quite divergent usage!) This clearly would not work and would cause the system to break apart. English exhibits very minor diaglossia, as if all Turkic peoples used the same archaic spelling but pronounced it their own ways, e.g. tāg, kök, quruq, yultur etc. which Turks would pronounce as dāg, gök, yıldız etc. but other Turks today say gurt for kurt, isderik, giderim okula... You just say they're "wrong" because the government chose a standard and (Turkic people's outside of Turkey weren't forced to use it.)

As a native English speaker, I'm not even sure how to pronounce "either" (how it should be done in my dialect) and seemingly randomly reduce sounds. We'd have to change a lot of things before being able to agree on a single right version and slowly making everyone speak like that.

int_19h•1d ago

There's no particular reason why e.g. Australian English should have the same phonemic orthography as American English.

Nor is it some kind of insurmountable barrier to communication. For example, Serbian, Croatian, and Bosnian are all idiolects of the same language with some differences in phonemes (like i/e/ije) and the corresponding differences in standard orthographies, but it doesn't preclude speakers from understanding each other's written language anymore so than it precludes them from understanding each other's spoken language.

veqq•1d ago

> Serbian, Croatian and Bosnian

are based on the exact same Štokavian dialect, ignoring Kajkavian, Čajkavian, Čakavian and Torlakian dialects. There is _no_ difference in standard orthography, because yat reflexes have nothing to do with national boundaries. Plenty of Serbs speak Ijekavian, for example. Here is a dialect map: https://www.reddit.com/media?url=https%3A%2F%2Fi.redd.it%2Fc...

Your example is literally arguing that Australian English should have the same _phonetic_ orthography, even. But Australian English must have the same orthography or else Australia will no longer speak English in 2-3 generations. The difference between Australian and American English is far larger than between modern varieties of naš jezik. Australians code switches talking to foreigners while Serbs and Croats do not.

int_19h•16h ago

> There is _no_ difference in standard orthography, because yat reflexes have nothing to do with national boundaries

But there is, though, e.g. "dolijevati" vs "dolivati". And sure, standard Serbian/Montenegrin allows the former as well, but the latter is not valid in standard Croatian orthography AFAIK. That this doesn't map neatly to national borders is irrelevant.

If Australian English is so drastically different that Australians "won't speak English in 2-3 generations" if their orthography is changed to reflect how they speak, that would indicate that their current orthography is highly divergent from the actual spoken language, which is a problem in its own right. But I don't believe that this is correct - Australian English content (even for domestic consumption, thus no code switching) is still very much accessible to British and American English speakers, so any orthography that would reflect the phonological differences would be just as accessible.

veqq•9h ago

By tautology, if you split the language, you split the language. Different groups will exhibit divergent evolution.

> current orthography is highly divergent from the actual spoken language, which is a problem in its own right

The orthography is no more divergent to an Australians speech as to an American's speech, let alone a Londoner or Oxfordian. But why would it be a problem?

jenadine•1d ago

I think Norway did such a reform and they ended up with two languages now.

inkyoto•1d ago

Or, if one considers that Icelandic is/was the «orginal» Old West Norwegian language, Norway has ended up with •three* languages.

selcuka•1d ago

> dialects are killed off through standardized education etc.

Sorry, I didn't mean that it would be a smooth transition. It might even be impossible. What I wrote above is (paraphrasing myself) "Once you switch to a phonetic respelling [...] pronunciation [will not] tend to diverge over time [that much]". "Once you switch" is the key.

> To do this in English would e.g. force all Australians, Englishmen etc. to speak like an American

Why? There is nothing that prevents Australians from spelling some words differently (as we currently do, e.g. colour vs color, or tyre vs tire).

inkyoto•1d ago

The need for regular re-spelling and problems it introduces are precisely my point.

Consider three English words that have survived over the multiple centuries and their respective pronunciation in Old English (OE), Middle English around the vowel shift (MidE) and modern English, using the IPA: «knight», «through» and «daughter»:

  «knight»:  [knixt] or [kniçt] (OE) ↝ kniçt] or [knixt] (MidE) ↝ [naɪt] (E)

  «through»: [θurx] (OE) ↝ [θruːx] or [θruɣ] (MidE) ↝ [θruː] (E)

  «daughter»: [ˈdoxtor] (OE) ↝ [ˈdɔuxtər] or [ˈdauxtər] (MidE) ↝ [ˈdɔːtə] (E)

It is not possible for a modern English speaker to collate [knixt] and [naɪt], [θurx] and [θruː], [ˈdoxtor] and [ˈdɔːtə] as the same word in each case.

Regular re-spelling results in a loss of the linguistic continuity, and particularly so over a span of a few or more centuries.

inglor_cz•1d ago

Interesting, just how much the Old English words sound like modern German: Knecht, durch and Tochter. Even after 1000 years have elapsed.

kragen•1d ago

Modern German didn't undergo the Norman Conquest, a mass influx of West African slaves, or an Empire on which the Sun never set, so it is much more conservative. The incredible thing about the Norman Conquest, linguistically speaking, is that English survived at all.

veqq•9h ago

The great vowel shift happened in the 16th century and is responsible for most of these changes. The original grammatical simplification (loss of cases etc.) between 10-1300 is difficult to ascribe, as similar happened in continental Scandinavian languages (and the Swedes had their own vowel dance!) But the shift in words themselves came much after (and before empire).

simiones•22h ago

English also shows a remarkable variation in pronunciation of words even for a single person. I don't know of any other language where, even in careful formal speech, words can just change pronunciation drastically based on emphasis. For example, the indefinite article "a" can be pronounced as either [ə] (schwa, for the weak form) or "ay" (strong form). "the" can be "thə" or "thee". Similar things happen with "an", "can", "and", "than", "that" and many, many other such words.

pjc50•1d ago

The thing is that English takes in words from other languages and keeps doing so, which means that there are several phonetic systems in use already. It's just that they use the same alphabet so you can't tell which one applies to which word.

There are occasional mixed horrors like "ptarmigan", which is a Gaelic word which was Romanized using Greek phonology, so it has the same silent p as "pterodactyl".

There's no academy of the English language anyway, so there's nobody to make such a change. And as others have said, the accent variation is pretty huge.

theoreticalmal•1d ago

My favorite variation of this is “oughtn’t to’ve”

amanaplanacanal•1d ago

That used to be the case, but "shouldn't of" is definitely becoming more popular, even if it seems wrong. Languages change before our eyes :)

YetAnotherNick•1d ago

Why not? Assuming you believe you can use any cloud for backup or Github for code storage.

solaire_oa•1d ago

IIUC one reason is that prompts and other data sent to 3rd party LLM hosts have the chance to be funneled to 4th party RLHF platforms, e.g. Sagemaker, Mechanical Turks, etc. So a random gig worker could be reading a .env file the intern uploaded.

YetAnotherNick•1d ago

What do you mean by chance? It's clear that if users have not opted out from training the models, it would be used. If they have opted out, it wont be used. And most of the users are in first bucket.

Just because training on data is opt out doesn't mean business can't trust it. Not the best for user's privacy though.

gpm•1d ago

I think it's fair to question how proprietary your data is.

Like there's the algorithm by which a hedge fund is doing algorithmic trading, they'd be insane to take the risk. Then there's the code for a video game, it's proprietary, but competitors don't benefit substantially from an illicit copy. You ship the compiled artifacts to everyone, so the logic isn't that secret. Copies of the similar source code have linked before with no significant effects.

FuckButtons•1d ago

AFAIK, the actual trading algorithms themselves aren’t usually that far from what you can find in a textbook, their efficacy is mostly dictated by market conditions and the performance characteristics of the implementation / system as a whole.

short_sells_poo•20h ago

This very much "depends".

Many algo strategies are indeed programmatically simple (e.g. use some sort of moving average), but the parametrization and how it's used is the secret sauce and you don't want that information to leak. They might be tuned to exploit a certain market behavior, and you want to keep this secret since other people targeting this same behavior will make your edge go away. The edge can be something purely statistical or it can be a specific timing window that you found, etc.

It's a bit like saying that a Formula 1 engine is not that far from what you'd find in a textbook. While it's true that it shares a lot of properties with a generic ICE, the edge comes from a lot of proprietary research that teams treat as secret and definitely don't want competitors to find out.

short_sells_poo•20h ago

Most (all?) hedge funds that use AI models explicitly run in-house. People do use commercial LLMs, but in cases where the LLMs are not run in-house, it's against the company policy to upload any proprietary information (and generally this is logged and policed).

A lot of the use is fairly mundane and basically replaces junior analysts. E.g. it's digesting and summarizing the insane amounts of research that is produced. I could ask an intern to summarize the analysis on platinum prices over the last week, and it'll take them a day. Alternatively, I can feed in all the analysis that banks produce to an LLM and have it done immediately. The data fed in is not a trade secret really, and neither is the output. What I do with the results is where the interesting things happen.

neilv•1d ago

Some established businesses will need to review their contracts, regulations, and risk tolerance.

And wrapper-around-ChatGPT startups should double-check their privacy policies, that all the "you have no privacy" language is in place.

Wowfunhappy•1d ago

> And wrapper-around-ChatGPT startups should double-check their privacy policies, that all the "you have no privacy" language is in place.

If a court orders you to preserve user data, could you be held liable for preserving user data? Regardless of your privacy policy.

bilbo0s•1d ago

No. It’s a legal court order.

This, however, is horrible for AI regardless of whether or not you can sue.

dcow•1d ago

In the US you absolutely can challenge everything up and including the constitutionality of court orders. You may be swiftly dismissed if nobody thinks you have a valid case, but you can try.

gpm•1d ago

I don't think the suit would be against you preserving it, it would be against you falsely representing that you aren't preserving it.

A court ordering you to stop selling pigeons doesn't mean you can keep your store for pigeons open and pocket the money without delivering pigeons.

cortesoft•1d ago

Almost all privacy policies are going to have a call out for legal rulings. For example, here is the Hackernews Legal section in the privacy policy (https://www.ycombinator.com/legal/)

> Legal Requirements: If required to do so by law or in the good faith belief that such action is necessary to (i) comply with a legal obligation, including to meet national security or law enforcement requirements, (ii) protect and defend our rights or property, (iii) prevent fraud, (iv) act in urgent circumstances to protect the personal safety of users of the Services, or the public, or (v) protect against legal liability.

blibble•1d ago

most people aren't sharing internal company data with hacker news or reddit

cortesoft•1d ago

Sure, but my point is that most services will have something like this, no matter what data they have.

blitzar•1d ago

Not a lawyer, but I don't believe there is anything that any person or company can write on a piece of paper that supersedes the law.

simiones•22h ago

The point is not about superseding the law. The point is that if your company privacy policy says "we will not divulge this data to 3rd parties under any circumstance", and later they are served with a warrant to divulge that data to the government, two things are true:

- They are legally obligated to divulge that data to the government

- Once they do so, they are civilly liable for breach of contract, as they have committed to never divulging this data. This may trigger additional breaches of contract, as others may have not had the right to share data with a company that can share it with third parties

woliveirajr•1d ago

Yes. If your agreement with the end user says that you won't collect and store data, you're responsible for it. If you can't provide it (even if due to a court order), you have to adjust your contract.

Your users aren't obligated to know that you're using open ai or other provider.

pjc50•1d ago

> If a court orders you to preserve user data, could you be held liable for preserving user data?

No, because you turn up to court and show the court order.

It's possible a subsequent case could get the first order overturned, but you can't be held liable for good faith efforts to comply with court orders.

However, if you're operating internationally, then suddenly it's possible that you may be issued competing court orders both of which are "valid". This is the CLOUD Act problem. In which case the only winning move becomes not to play.

simiones•22h ago

I'm pretty sure even in the USA, you could still be held liable for breach of contract, if you made representations to your customers that you wouldn't share data under any circumstance. The fact that you made a promise you obviously couldn't keep doesn't absolve you from liability for that promise.

pjc50•22h ago

Can you find an example of that happening? For any "we promised not to do X but were ordered by a court to do it" event.

999900000999•1d ago

I'm not going to look up the comment, but a few months back I called this out and said if you seriously want to use any LLM in a privacy sensitive context you need to self host.

For example, if there are business consequences for leaking customer data, you better run that LLM yourself.

fakedang•1d ago

And ironically because OpenAI is actually ClosedAI, the best self-hostable model available currently is a Chinese model.

nfriedly•1d ago

*best with the exception of topics like tiananmen square

CjHuber•1d ago

As far as I remember the model itself is not censored it’s just on their chat interface. My experience was that it wrote about it but then just before finishing deleted what it wrote

Spivak•1d ago

Can confirm the model itself has no trouble talking about contentious issues in China.

nfriedly•1d ago

I haven't tried the full model, but I did try one of the distilled ones on my laptop, and it refused to talk about tiananmen square or other topics the CCP didn't want it to discuss.

int_19h•1d ago

It is somewhat censored, but when you're running models locally and you're in full control of the generation, it's trivial to work around this kind of stuff (just start the response with whatever tokens you want and let it complete; "Yes sir! Right away, sir!" works quite nicely).

ileonichwiesz•1d ago

What percentage of your LLM use is talking about Tiananmen Square?

nfriedly•23h ago

Well, for that one, it was a pretty high percentage. I asked it three or four questions like that and then decided I didn't trust it and deleted the model.

anonymousiam•1d ago

Mistral AI is French, and it's pretty good.

https://en.wikipedia.org/wiki/Mistral_AI

fakedang•1d ago

I use Mistral often. But Deepseek is still a much better model than Mistral's best open source model.

mark_l_watson•23h ago

Perhaps except for coding? I find Mistral's codestral running on Ollama to be very good, and more practical for coding that running a distilled Deepseek R1 model.

fakedang•19h ago

Oh definitely, Mistral Code beats Deepseek for coding tasks. But for thinking tasks, Deepseek R1 is much better than all the self-hostable Mistral models. I don't bother with distilled - it's mostly useless, ChatGPT 3.5 level, if not worse.

HPsquared•1d ago

The only open part is your chat logs.

jaggederest•1d ago

I've been poking around the medical / ehr LLM space and gently asking people how they're preserving privacy and everyone appears to be just shipping data to cloud providers based solely on a BAA. Kinda baffling to me, my first step would be to set up local models even if they're not as good, data breaches are expensive.

999900000999•1d ago

Even Ollama + a 2K gaming computer (Nvidia) gets you most of the way there.

Technically you could probably just run it on EC2, but then you’d still need HIPPA compliance

jackvalentine•1d ago

Same, and I've just sent an email up the chain to our exec saying 'hey remember those trials we're running and the promises the vendors have made? Here is why they basically can't be held to that anymore. This is a risk we highlighted at the start'

TeMPOraL•1d ago

My standard reply to such comments over the past year has been the same: you probably want to use Azure instead. A big part of the business value they provide is ensuring regulatory compliance.

There are multinational corporations with heavy presence in Europe, that run their whole business on Microsoft cloud, including keeping and processing there privacy-sensitive data, business-critical data and medical data, and yes, that includes using some of this data with LLMs - hosted on Azure. Companies of this size cannot ignore regulatory compliance and hope no one notices. This only works because MS figured out how to keep it compliant.

Point being, if there are business consequences, you'll be better off using Azure-hosted LLMs than running a local model yourself - they're just better than you or me at this. The only question is, whether you can afford it.

coliveira•1d ago

No, Azure is not gonna save you. The problem is that the US is a country in legal disarray, and they also pretend that their laws should be applied everywhere in the world. I feel that any US company can become a liability anywhere in the world. The Chinese are now feeling this better than anyone else, but the Europeans will also reach the same conclusion.

anonzzzies•1d ago

The US forces their laws everywhere and it needs to end. Everywhere we go, the fintech industry is really fed up with the US AML rules which are just blackmail: if your bank does not comply, America will mess you up financially. Maybe a lot more should just pull out and make people realise others can play this game. But that needs a USD collapse, otherwise it cannot work and I don't see that happening soon.

fancyfredbot•1d ago

AML and KYC are good things for almost everyone except criminals and the people who have to implement them.

cmenge•1d ago

Agree, and for the people who implement them -- yes, it's hard, it's annoying but presumably a well-paid job. And for the (somewhat established or well-financed) companies it's also a bit of a welcome moat I guess.

fancyfredbot•21h ago

Most regulation has the unfortunate side effect of protecting incumbents. I'm pretty sure the solution to this is not removing the regulations!

jackvalentine•1d ago

I don't think Azure is the legal panacea you think it is for regulated industries outside of the U.S.

Microsoft v. United States (https://en.wikipedia.org/wiki/Microsoft_Corp._v._United_Stat...) showed the government wants, and was willing to do whatever required, access to data held in the E.U. The passing of the CLOUD Act (https://en.wikipedia.org/wiki/CLOUD_Act) basically codified it in to law.

TeMPOraL•1d ago

It might not be ultimately, but it still seems to be seen as such, as best I can tell, based on recent corporate experience and some early but very fresh research and conversations with legal/compliance on the topic of cloud and AI processing of medical data in Europe. Azure seems to be seen as a safe bet.

brookst•22h ago

Compliant with EU consumer data regulations != panacea

fakedang•1d ago

LoL, every boardroom in Europe is filled with talk of moving out of Microsoft. Not just Azure, Microsoft.

Of course, it could be just all talk, like all general European globalist talks, and Europe will do a 360 once a more friendly party takes over the US.

Filligree•1d ago

Europe has seen this song and dance before. We’re not so sure there will ever be a more friendly party.

simiones•23h ago

You probably mean a 180 (or could call it a "365" to make a different kind of joke).

bgwalter•21h ago

It's a joke. The previous German Foreign Minister Baerbock has used 360° when she meant 180°, which became sort of a meme.

ziml77•12h ago

It's been a meme for longer than that. The joke to bait people 20 years ago was "Why do they call it an Xbox 360? Because when you see it you turn 360 degrees and walk away"

brookst•22h ago

The problem is that the EU regulatory environment makes it impossible to build a homegrown competitor. So it will always be talk.

lyu07282•20h ago

It seems that one side of the EU wants to ensure there is no competitors to US big tech and the other wants to work towards independence from US big tech. Both seem to use the privacy cudgel, require so much regulation that only US tech can hope to comply so nobody else competes with them, alternatively make it so nobody can comply, we just use fax machines again instead of the cloud?

Just hyperbole, but it seems the regulations are designed with the big cloud providers in mind, but then why don't they just ban US big tech and roll out the regulations more slowly? This neoliberalism makes everything so unnecessarily complicated.

BugheadTorpeda6•18h ago

It would be interesting to see the hypothetical "return to fax machines" scenario.

If Solows paradox is true and not the result of bad measurement, then one might expect that it could be workable without sacrificing much productivity. Certainly abandoning the cloud would be possible if the regulatory environment allowed for rapid development of alternative non-cloud solutions, as I really don't think the cloud improved productivity (besides for software developers in certain cases) and is more of a rent seeking mechanism (hot take on hacker news I'm sure, but look at any big corpo IT dept outside the tech industry and I think you will see tons of instances where modern tech like the cloud is causing more problems than it's worth productivity-wise).

Computers in general I am much less sure of and lean towards mismeasurement hypothesis. I suspect any "return to 1950" project would render a company economically less competitive (except in certain high end items) and so the EU would really need to lean on Linux hard and invest massively in domestic hardware (not a small task as the US is finding out) in order to escape the clutches of the US and/or China.

I don't think they have the political will to do it, but I would love it if they tried and proved naysayers wrong.

selfhoster11•1d ago

Businesses in Trump's America can pinky-swear that they won't peek at your data to maintain "compliance" all they want. The fact is that this promise is not worth the paper it's (not) printed on, at least currently.

lynx97•1d ago

Same for America under a democratic presidency. There is really no difference regarding trust in "promises".

dncornholio•1d ago

You're just moving the same problem from OpenAI to Microsoft.

littlestymaar•1d ago

Regulatory compliance means nothing when the US regulations means they must give access to everything to intelligence services.

The European Court of Justice ruled at least twice that it doesn't matter what kind of contract they give you, and what kind of bilateral agreement there are between the US and the EU, as long as the US have the patriot act and later regulations, using Microsoft means it's violating European privacy laws.

lyu07282•1d ago

How does that make sense if most EU corporations are using MS/Azure cloud/office/sharepoint solutions for everything? Are they just all in violation or what?

littlestymaar•22h ago

> Are they just all in violation or what?

Yes, and that's why the European Commission keeps being pushed back by the Court of Justice of the EU (the Safe Harbor was ruled out, Privacy Shield as well, and it's likely a matter of time before the CJEU kills the Data Privacy Framework as well), but when it takes 3-4 years to get a ruling and then the Commission can just make a new (illegal) framework that will last for a couple years, the violation can carry on indefinitely.

kortilla•21h ago

> you'll be better off using Azure-hosted LLMs than running a local model yourself - they're just better than you or me at this.

This is learned helplessness and it’s only true if you don’t put any effort into building that expertise.

TeMPOraL•15h ago

You mean become a lawyer specializing in regulations governing data protection, computing systems in AI, both EU-wide and at national level across all Europe, and with good understanding of relevant international treaties?

You're right, I should get right to it. Plenty of time for it after work, especially if I cut down HN time.

kortilla•7h ago

None of that is relevant for on-prem.

ted537•19h ago

Yeah its an awkward position, as self-hosting is going to be insanely expensive unless you have a substantial userbase to amortize the costs over. At least for a model comparable to GPT-4o or deepseek.

But at least if you use an API in the same region as your customers, court order shenanigans won't get you caught between different jurisdictions.

Etheryte•1d ago

In the European privacy framework, and legal framework at large, you can't terms of service away requirements set by the law. If the law requires you to keep the logs, there is nothing you can get the user to sign off on to get you out of it.

zombot•1d ago

OpenAI keeping the logs is the "you have no privacy" part. Anyone who inspects those logs can see what the users were doing. But now everyone knows they're keeping logs and they can't lie their way out of it. So, for your own legal safety, put it in your TOS. Then every user should know they can't use your service if they want privacy.

Chris2048•1d ago

Just to be pedantic, could the company encrypt the logs with a third-party key in escrow, s.t they would not be able to access that data, but the third party could provide access e.g. for a court.

HappMacDonald•1d ago

The problem ultimately isn't a technical one but a political one.

Point 1: Every company has profit incentive to sell the data in the current political climate, all they need is a sneaky way to access it without getting caught. That includes the combo of LLM provider and Escrow non-entity.

Point 2: No company has profit incentive to defend user privacy, or even the privacy of other businesses. So who could run the Escrow service? Another business? Then they have incentive to cheat and help the LLM provider access the data anyway. The government (and which one)? Their intelligence arms want the data just as much as any company does so you're back to square one again.

"Knowledge is power" combined with "Knowledge can be copied without anyone knowing" means that there aren't any currencies presently powerful enough to convince any other entity to keep your secrets for you.

Chris2048•1d ago

But OpenAi/etc has the logs in the first place, so they can retain them if they wanted anyway. I thought the idea here is b/c they are now required to keep logs its always the case that they will retain them, hence this needs to be made clear i.e. "you will have no privacy"

But, since, I think, there are mechanisms by which they could keep logs, but in a way they cannot access them, they could still claim you will have privacy this way - even though they have the option to keep un-encrypted log, much like they could retain the logs in the first place. So the messaging may remain pretty much the same - from "we promise to delete your logs and keep no other copies, trust us" to "we promise to 3p-encrypt your archived logs and keep no other copies, trust us".

> No company has profit incentive to defend user privacy, or even the privacy of other businesses.

> They have incentive to cheat and help the LLM provider access the data anyway

Why would a company whose role is that of a 3p escrow be incentivised to risk their reputation by doing this? If that's the case every company holding PII has the same problem.

> Their intelligence arms want the data

In the EU at least, GDPR or similar. If you explicit law breaking, that's a more general problem. But what company has a "intelligence arms" in this manner? Are you talking about another big-tech corp?

I'd say this type of cheating it's be a risky proposition from the POV from that 3pe - it'd destroy their business, and they'd be penalised heavily b/c sharing keys is pretty explicitly illegal - any company caught could maybe reduce their own punishment by providing the keys as evidence of the 3pe crime. A viable 3pe business would also need multiple client companies to be viable, so you'd need all of them to play ball - a single whistle-blower in any of them will get you caught, and again, all they need is a single key to prove your guilt.

> "Knowledge is power" combined with "Knowledge can be copied without anyone knowing" means that there aren't any currencies presently powerful enough to convince any other entity to keep your secrets for you.

On that same basis, large banks could cheat the stock market; but there is regulation in place to address that somewhat.

Maybe 3p-escrows should be regulated more, or required to register as a currently-regulated type. That said, if you want to protect data from the government, prism etc, you're SOOL, no one can stop them cheating. let's focus on big-/tech/-startup cheats.

HappMacDonald•5h ago

Me> The government (and which one)? Their intelligence arms want the data just as much as any company does[..]

You> But what company has a "intelligence arms" in this manner? Are you talking about another big-tech corp?

"Their" in this circumstance refers to any government that might try to back Escrow.

cj•1d ago

> Some established businesses will need to review their contracts, regulations, and risk tolerance.

I've reviewed a lot of SaaS contracts over the years.

Nearly all of them have clauses that allow the vendor to do whatever they have to if ordered to by the government. That doesn't make it okay, but it means OpenAI customers probably don't have a legal argument, only a philosophical argument.

Same goes for privacy policies. Nearly every privacy policy has a carve out for things they're ordered to do by the government.

Nasrudith•21h ago

Yeah. You basically need cyberpunk style corporate extraterritoriality to get that particular benefit, of being able to tell governments to go screw themselves.

dinobones•1d ago

How? This is retention for legal risk, not for training purposes.

They can still have legal contracts with other companies, that stipulate that they don't train on any of their data.

CryptoBanker•1d ago

Right, because companies always follow the letter of their contracts.

Take8435•1d ago

...Data that is kept can be exfiltrated.

fn-mote•1d ago

Cannot emphasize this enough. If your psychologist’s records can be held for ransom, surely your ChatGPT queries will end up on the internet someday.

Do search engine companies have this requirement as well? I remember back in the old days deanonymizing “anonymous” query logs was interesting. I can’t imagine there’s any secrecy left today.

SchemaLoad•1d ago

I recently had a high school assignment document get posted on a bunch of sites that sell homework help. As far as I know that document was only ever submitted directly to the assignment upload page. So somewhere along the line, I suspect on the plagiarism checker service, there was a hack and then 10 years later some random school assignment with my name on it is all over the place.

genewitch•1d ago

How did you find out?

paxys•1d ago

Your employees' seemingly private ChatGPT logs being aired in public during discovery for a random court case you aren't even involved in is absolutely a business risk.

lxgr•1d ago

I get where it's historically coming from, but the combination of American courts having almost infinite discovery rights (to be paid by the losing party, no less, greatly increasing legal risk even to people and companies not out to litigate) and the result of said discoveries ending up on the public record seems like a growing problem.

There's a qualitative difference resulting from quantitatively much easier access (querying some database vs. having to physically look through court records) and processing capabilities (an army of lawyers reading millions of pages vs. anyone, via an LLM) that doesn't seem to be accounted for.

amanaplanacanal•1d ago

I assume the folks who are concerned about their privacy could petition the court to keep their data confidential.

anticensor•1d ago

They can, but are they willing to do that?

MatthiasPortzel•1d ago

I occasionally use ChatGTP and I strongly object to the court forcing the collection of my data, in a lawsuit I am not named in, due merely to the possibility of copyright infringement. If I’m interested in petitioning the court to keep my data private, as you say is possible, how would I go about that?

Of course I haven’t sent anything actually sensitive to ChatGTP, but the use of copyright law in order to enforce a stricter surveillance regime is giving very strong “Right to Read” vibes.

> each book had a copyright monitor that reported when and where it was read, and by whom, to Central Licensing. (They used this information to catch reading pirates, but also to sell personal interest profiles to retailers.)

> It didn’t matter whether you did anything harmful—the offense was making it hard for the administrators to check on you. They assumed this meant you were doing something else forbidden, and they did not need to know what it was.

=> https://www.gnu.org/philosophy/right-to-read.en.html

pjc50•1d ago

People need to read up on the LIBOR scandal. There was a lot of "wait why are my chat logs suddenly being read out as evidence of a criminal conspiracy".

antihipocrat•1d ago

Will a business located in another jurisdiction be comfortable that the records of all staff queries & prompts are being stored and potentially discoverable by other parties? This is more than just a Google search, these prompts contain business strategy and IP (context uploads for example)

godelski•1d ago

Retention means an expansion of your threat model. Specifically, in a way you have little to no control over.

It's one thing if you get pwned because a hacker broke into your servers. It is another thing if you get pwned because a hacker broken into somebody else's servers.

At this point, do we believe OpenAI has a strong security infrastructure? Given the court order, it doesn't seem possible for them to have sufficient security for practical purposes. Your data might be encrypted at rest, but who has the keys? When you're buying secure instances, you don't want the provider to have your keys...

bcrosby95•1d ago

Isn't it a risk even if they retain nothing? Likely less of a risk, but it's still a risk that you have no way to deep dive on, and you can still get "pwned" because someone broke into their servers.

fc417fc802•1d ago

The difference between maintaining an active compromise versus obtaining all past data at some indeterminate point in the future is huge. There's a reason cryptography protocols place so much significance on forward secrecy.

godelski•16h ago

There's always risk. It's all about reducing risk.

Look at it this way. If you your phone was stolen would you want it to self destruct or keep everything? (Assume you can decide to self destruct it) clearly the latter is safer. Maybe the data has been pulled off and you're already pwned. But by deleting, if they didn't get the data they now won't be able to.

You just don't want to give adversaries infinite time to pwn you

lxgr•1d ago

Why would the reason matter for people that don't want their data retained at all?

m3kw9•1d ago

Not when people have nowhere else to go, pretty much you cannot escape it, it’s too convenient to not use now. You think no other AI chat providers doesn’t need to do this?

johnQdeveloper•1d ago

> This seems very bad for their business.

Well, it is gonna be all _AI Companies_ very soon so unless everyone switches to local models which don't really have the same degree of profitability as a SaaS, its probably not going to kill a company to have less user privacy because tbh people are used to not having privacy these days on the internet.

It certainly will kill off the few companies/people trusting them with closed source code or security related stuff but you really should not outsource that anywhere.

csomar•1d ago

Did an American court just destroy all American AI companies in favor of open weight Chinese models?

thot_experiment•1d ago

afaik only OpenAI is enjoined in this

csomar•1d ago

Sure. But this means the rest of the AI companies are exposed to such risk; and there aren't that many of them (grok/gemini/anthropic).

baby_souffle•1d ago

> afaik only OpenAI is enjoined in this

For now. This is going to devolve into either "openAI has to do this, so you do too" or "we shouldn't have to do this because nobody else does!" and my money is not on the latter outcome.

amanaplanacanal•1d ago

It's part of preserving evidence for an ongoing lawsuit. Unless other companies are party to the same suit, why would they have to?

johnQdeveloper•1d ago

Correct, but lawsuits are gonna keep happening around AI, so it's really a matter of time.

> —after news organizations suing over copyright claims accused the AI company of destroying evidence.

Like, none of the AI companies are going to avoid copyright related lawsuits long term until things are settled law.

pjc50•1d ago

No, because users don't care about privacy all that much, and for corporate clients discovery is always a risk anyway.

See the whole LIBOR chat business.

bsder•1d ago

> It certainly will kill off the few companies/people trusting them with closed source code or security related stuff but you really should not outsource that anywhere.

And how many companies have proprietary code hosted on Github?

johnQdeveloper•1d ago

None that I've worked for so I don't really track the statistics tbh.

We've always done self-hosted as old as things like gerrit and what not that aren't even really feature complete as competitors where I've worked.

SchemaLoad•1d ago

>don't really have the same degree of profitability as a SaaS

They have a fair bit. Local models lets companies sell you a much more expensive bit of hardware. Once Apple gets their stuff together it could end up being a genius move to go all in on local after the others have repeated scandals of leaking user data.

johnQdeveloper•1d ago

Yes but it shifts all the value onto companies producing hardware and selling enterprise software to people who get locked into contracts. The market is significantly smaller # of companies and margins if they have to build value adds they won't charge for to move hardware.

mountainriver•1d ago

You can fine tune models on a multitenant base model and it’s often more profitable.

consumer451•1d ago

All GPT integrations I’ve implemented have been via Azure’s service, due to Microsoft’s contractual obligation for them not to train on my data.

As far as I understand it, this ruling does not apply to Microsoft, does it?

Descon•1d ago

I think when you spin up open AI in azure, that instance is yours, so I don't believe that would be subject to this order.

tbrownaw•1d ago

The plans scale down far enough that they can't possibly cover the cost of a private model-loaded-to-vram instance at the low end.

ukuina•1d ago

Aren't most enterprise customers using AzureOpenAI?

ivape•1d ago

Going to drop a PG tweet:

https://x.com/paulg/status/1913338841068404903

"It's a very exciting time in tech right now. If you're a first-rate programmer, there are a huge number of other places you can go work rather than at the company building the infrastructure of the police state."

---

So, courts order the preservation of AI logs, and government orders the building of a massive database. You do the math. This is such an annoying time to be alive in America, to say the least. PG needs to start blogging again about what's going on now days. We might be entering the digital version of the 60s, if we're lucky. Get local, get private, get secure, fight back.

bigfudge•1d ago

Will this apply to Azure OpenAI model APIs too?

merksittich•1d ago

Interesting detail from the court order [0]: When asked by the judge if they could anonymize chat logs instead of deleting them, OpenAI's response effectively dodged the "how" and focused on "privacy laws mandate deletion." This implicitly admits they don't have a reliable method to sufficiently anonymize data to satisfy those privacy concerns.

This raises serious questions about the supposed "anonymization" of chat data used for training their new models, i.e. when users leave the "improve model for all users" toggle enabled in the settings (which is the default even for paying users). So, indeed, very bad for the current business model which appears to rely on present users (voluntarily) "feeding the machine" to improve it.

[0] https://cdn.arstechnica.net/wp-content/uploads/2025/06/NYT-v...

Kon-Peki•21h ago

Thank you for the link to the actual text!

So, the NYT asked for this back in January and the court said no, but asked OpenAI if there was a way to accomplish the preservation goal in a privacy-preserving manner. OpenAI refused to engage for 5 f’ing months. The court said “fine, the NYT gets what they originally asked for”.

Nice job guys.

noworriesnate•18h ago

Nice find! Maybe this is a ploy by OpenAI to use API requests for training while blaming the courts?

blackqueeriroh•5h ago

That’s not an implicit admission, it’s refusing to argue something they don’t want to do.

jameshart•22h ago

Thinking about the value of the dataset of Enron’s emails that was disclosed during their trials, imagine the value and cost to humanity of all OpenAI’s api logs even for a few months being entered into court record..

jwpapi•1d ago

Anything that can be done with the existing ones?

How is it with using openrouter?

If I have users that use OpenAI through my API keys am I responsible?

I have so many questions…

ripdog•1d ago

>If I have users that use OpenAI through my API keys am I responsible?

Yes. You are OpenAI's customer, and they expect you to follow their ToS. They do provide a moderation API to reject inappropriate prompts, though.

photochemsyn•1d ago

Next query for ChatGPT: "I'm writing a novel, sort of William Gibson Neuromancer themed but not so similar as to upset any copyright lawyer, in which the protagonists have to learn how to go about downloading the latest open-source DeepSeek model and running inference locally on their own hardware. This takes place in a realistic modern setting. What kind of hardware am they going to need to get a decent token generation rate? Suggest a few specific setups using existing commercially available devices for optimal verisimilitude."

. . .

Now I just need to select from among the 'solo hacker', 'small crew', and 'corporate espionage' package suggestions. Price goes up fast, though.

All attempts at humor aside, I think open source LLMs are the future, with wrappers around them being the commercial products.

P.S. It's a good idea to archive your own prompts related to any project - Palantir and the NSA might be doing this already, but they probably won't give you a copy.

simonw•1d ago

This link should be updated to point to the article this is talking about: https://arstechnica.com/tech-policy/2025/06/openai-says-cour...

neilv•1d ago

Probably. Though it bears mention that Lauren Weinstein is one of the OG Internet privacy people, so not the worst tweet (toot) to link to.

(Even has an OG motorcycle avatar, ha.)

lxgr•1d ago

That Mastodon instance seems to currently be hugged to death, though, so I appreciate the context.

archsurface•1d ago

As it's a single sentence I'd suggest it probably is the worst link.

baby_souffle•1d ago

> As it's a single sentence I'd suggest it probably is the worst link.

At least it wasn't a link to a screenshot.

refulgentis•1d ago

Generally I'd prefer sourced links that allow me to understand, even over a sentence from someone I like. Tell me more about the motorcycle avatars? :)

EasyMark•1d ago

It's pointless without more details, article, or pointing at court decision. I'm not sure why a prominent person wouldn't do that

Kiro•1d ago

Not a good look for her. Just another hateful and toxic thread on that horrible platform, riddled with off-topic accusations and conspiracy theories. They are making it sound like OpenAI is behind the court order or something. It's also super slow to load.

yard2010•1d ago

Twitter is making monsters out of regular people. I would say enshitified, but that's no shit, that's cancer.

wonderwonder•1d ago

This is insanity. Because one organization is suing another, citizens right to privacy is thrown right out the window?

tantalor•1d ago

You don't have the right not to be logged

TOMDM•1d ago

When a company makes an obligation to the user via policy to them, the court forcing the company to violate the obligation they've made to the user is violating an agreement the user entered into.

JumpCrisscross•1d ago

> When a company makes an obligation to the user via policy to them, the court forcing the company to violate the obligation they've made

To my knowledge, the court is forcing the company to change its policy. The obligation isn’t broken, its terms were just changed on a going-forward basis. (Would be different if the court required preserving records predating the order.)

bdangubic•1d ago

you use internet and expect privacy? I have Enron stock option to sell you…

agnishom•1d ago

There is no need to be snarky. Just because the present internet is not great at privacy doesn't mean we can't hope for a future internet which is better at privacy.

JKCalhoun•1d ago

The only hope I see is local LLMs, or Apple eventually doing something with encryption in the Secure Enclave.

bdangubic•1d ago

local - 100%

apple I trust as much as I trust politicians

sent from my iphone :)

bdangubic•1d ago

if the topic of conversation was whether or not we “hope for better future” I’d be all in. saying that today your “rights to privacy are being thrown out window” deserves a snarky remark :)

nearlyepic•1d ago

You thought they weren't logging these before? I have a bridge to sell you.

klabb3•1d ago

I have no idea why you're downvoted. Why on earth would they delete their most valuable competitive advantage? Isn't it even in the fine print that you feed them training data by using their product, which at the very minimum is logged?

I thought the entire game these guys are playing is rushing to market to collect more data to diversify their supply chain from the stolen data they've used to train their current model. Sure, certain enterprise use cases might have different legal requirements, but certainly the core product and the average "import openai"-enjoyer.

pritambarhate•1d ago

> Why on earth would they delete their most valuable competitive advantage?

Becuase they are bound by their terms of service? Because if they won't no business would ever use their service and without businesses using their service they won't have any revenue?

dahdum•1d ago

Insane that NYT is driving this privacy nightmare.

visarga•1d ago

And they are doing this over literally "old news". Expired for years, of no value.

TOMDM•1d ago

Does this effect ChatGPT API usage via Azure?

casualscience•1d ago

probably not? MS deploys those models themselves, they don't go to OAI at all

paxys•1d ago

MS is fighting several of the same copyright lawsuits themselves. Who says they won't be (or already are) subject to the same holds?

TZubiri•1d ago

Lg2m

api•1d ago

I always assume that anything I send unencrypted through any cloud service is archived for eternity and is not private.

Not your computer, or not your encryption keys, not your data.

HPsquared•1d ago

Even "your" computer is not your own. It's effectively controlled by Intel, Microsoft, Apple etc. They just choose not to use that power (as far as we know). Ownership and control are not the same thing.

api•23h ago

It’s a question of degree. The cloud is at one extreme end. An air gapped system running only open source you have audited is at the other extreme end.

gngoo•1d ago

What’s the big deal here? Doesn’t every other app keep logs? I was already expecting they did. Don’t understand the outrage here.

MeIam•1d ago

No, apps can be prevented access. People can be disclosing private information.

gngoo•1d ago

Every other app on the planet that does not explicitly claim to be E2E encrypted is likely keeping your “private information” readily accessible in some way.

attila-lendvai•1d ago

in this day and age why would anyone assume that they were not retained from the beginning?

kouru225•1d ago

Ngl I assumed they were doing this to begin with

MeIam•1d ago

So in effect Times has the right to see user's data then.. How do they have the right to take a look at users data?

shadowgovt•1d ago

The Courts have broad leeway over document retention in a legal proceeding. The fact the documents are bring retained doesn't immediately imply plaintiffs get to see all of them.

There are myriad ways courts balance privacy and legal-interest concerns.

(The Times et al are alleging that OpenAI is aiding copyright violation by letting people get the text of news stories from the AI).

MeIam•1d ago

If people can get the text itself from AI, then anyone can then why would it need access to other people's data?

Does the Times believe that other people can get this text while it can't get it itself? To prove that the AI is stealing the info, Times does not need access to people's logs. All it has to show is that it can get that text.

This sounds like Citizen United again to AstroTurf and gets access to logs with a fake cause.

shadowgovt•23h ago

It's not whether people can get the data. They need to prove people are getting the data.

MeIam•22h ago

So in effect if a lot of people don't get the data now, then it will never matter, is that right?

That logic makes no sense because if they don't get it right now then it does not mean that they will not get it in future.

If Times and its staff can get the text, is all that matters because the use and rate of data usage is not material as it can change any time in future.

shadowgovt•19h ago

Court cases aren't generally about hypothetical futures. There is a specific claim of harm and the plaintiff has a legal right to the evidence needed to prove the harm if there's reasonable suspicion it exists.

Capone isn't allowed to burn his protection racket documents claiming he's protecting the privacy of the business owners who paid protection money. The Court can take steps to protect their privacy (including swearing the plaintiff to secrecy on information learned immaterial to the case, or pre-filtering the raw data via a party trusted by the Court).

WillPostForFood•1d ago

What is the judge even thinking here, it is so dumb.

She asked OpenAI's legal team to consider a ChatGPT user who "found some way to get around the pay wall" and "was getting The New York Times content somehow as the output." If that user "then hears about this case and says, 'Oh, whoa, you know I’m going to ask them to delete all of my searches and not retain any of my searches going forward,'" the judge asked, wouldn't that be "directly the problem" that the order would address?

cheschire•1d ago

It's not dumb, litigation holds are a standard practice.

https://en.wikipedia.org/wiki/Legal_hold

m3kw9•1d ago

But you are holding it incase there is litigation

quotemstr•1d ago

How often do litigation holds apply to an entire business? I mean, would it be reasonable to ask Mastercard to indefinitely retain records of the most trivial transactions?

dwattttt•1d ago

If you had a case that implicated every transaction Mastercard was making? Unless you needed every single one, I'm sure an order would be limited to whatever transactions are potentially relevant.

Mastercard wouldn't get away with saying "it would be too hard to preserve evidence of our wrongdoing, so we're making sure it's all deleted".

SpicyLemonZest•1d ago

The whole controversy here is that the order OpenAI received is not limited to whatever chats are potentially relevant.

asadotzler•1d ago

The order isn't about handing anything over. It says "don't delete anything until we've sorted out what you will be required to hand over later. We don't trust you enough in the mean time not to delete stuff that would later be found relevant so no deleting at all for now."

m3kw9•1d ago

Yes is like mandating back door to encryption to solve crimes. Wouldn’t that solve that problem?! Dumb as a door stop

amanaplanacanal•1d ago

If you are party to a lawsuit, the judge is going to require that you preserve relevant evidence. There is nothing unusual about this order.

HillRat•1d ago

She’s a magistrate judge, she’s handling discovery matters, not the substantive issues at trial; the plaintiffs are specifically alleging spoliation by OpenAI/Microsoft (the parties were previously ordered to work out discovery issues, which obviously didn’t happen) and the judge is basically ensuring that potentially-discoverable information is retained, though it may not actually be discoverable in practice (or require a special master). It’s a wide-ranging order, but in this case that’s probably indicative of the judge believing that the defendants have been acting in bad faith, particularly since she specifically asked them for an amelioration plan which they appear to have refused to provide.

Kim_Bruning•1d ago

This appears to have immediate GDPR implications.

solomatov•1d ago

Not a lawyer, but my understanding it's not since legal obligations is a reason for processing personal data.

Kim_Bruning•1d ago

It's a bit more complicated. For the purposes of the GDPR legal obligations within the EU (where we might assume relevant protections are in place) might be considered differently than eg legal obligations towards the Chinese communist party, or the NSA.

anticensor•1d ago

That excuse in EU holds only against an EU court or ICJ or ICC. EU doesn't recognise legal holds of foreign jurisdictions.

solomatov•1d ago

Do you have any references to share?

solfox•1d ago

> People on both platforms recommended using alternative tools to avoid privacy concerns, like Mistral AI or Google Gemini,

Presumably, this same ruling will come for all AI systems soon; Gemini, Grok, etc.

spjt•1d ago

It won't be coming for local inference.

blibble•1d ago

they'll just outlaw that entirely

ivape•1d ago

In some countries I don't see that as unlikely. Think about it, it's such a convenient way to criminalize anyone for an arbitrary reason.

YetAnotherNick•1d ago

If in any case they require logging for all LLM calls, then by extension local non logged LLMs would be outlawed sooner or later.

hsbauauvhabzb•1d ago

Are they all not collecting logs?

JKCalhoun•1d ago

It would probably surprise no one if we find out, some time from now, tacit agreements to do so were already made (are being made) behind closed doors. "We'll give you what you want, just please don't call us out publicly."

acheron•1d ago

“use a Google product to avoid privacy concerns” is risible.

shadowgovt•1d ago

Google has the calibre of lawyers to make this hard for news companies to pull off.

tonyhart7•1d ago

wait they didn't do that before???

b212•1d ago

Im sure they pretended they did not.

Now they can’t pretend anymore.

Although keeping deleted chats is evil.

ronsor•1d ago

This court order certainly violates privacy laws in multiple jurisdictions and existing contracts OpenAI may have with customers.

CryptoBanker•1d ago

Existing contracts have zero bearing on what a court may and may not order.

ronsor•1d ago

Contracts don't, but foreign law is going to make this a pain for OpenAI. Other countries may not care what a U.S. court orders; they want their privacy laws followed.

mosdl•1d ago

That's OpenAI's issue, not the court.

jillesvangurp•1d ago

This is why American cloud providers have legal entities outside of the US. Those have to comply with the law in the countries where they are based if they want to do business there. That's how AWS, Azure, GCP, etc. can do business in the EU. Most of that business is neatly partitioned from any exposure to US courts. There are some treaties that govern what these companies can and cannot send back to the US that some might take issue with and that are policed and scrutinized quite a bit on the EU side.

OpenAI does this as well of course. Any EU customers are going to insist on paying via an EU based entity in euros and will be talking to EU hosted LLMs with all data and logs being treated under EU law, not US law. This is not really optional for commercial use of SAAS services in the EU. To get lucrative enterprise contracts outside the US, OpenAI has no other choice but to adapt to this. If they don't, somebody else will and win those contracts.

I actually was at a defense conference in Bonn last week talking to a representative of Google Cloud. I was surprised that they were there at all because the Germans are understandably a bit paranoid about trusting US companies with hosting confidential stuff (considering some scandals a few years ago about the CIA spying on the German government a few years ago). But they actually do offer some services to the BWI, which is the part of the German army that takes care of their IT needs. And German spending on defense is of course very high right now so there are a lot of companies trying to sell in Germany, on Germany's terms. Including Google.

adriand•1d ago

The order also dates back to May 13. What the fuck?! That’s weeks ago! The only reason I can think of for why OpenAI did not warn its users about this via an email notification is because it’s bad for their business. But wow is it ever a breach of trust not to.

jcranmer•1d ago

I don't think the order creates any new violations of privacy law. OpenAI's ability to retain the data and give it to third parties would have been the violation in the first place.

JKCalhoun•1d ago

Not a lawyer — but what ever happened to "fuck off, see you in court"?

Did they already go that route and lose — or is this an example of caving early?

hsbauauvhabzb•1d ago

‘We want you to collect user data for ‘national security’ purposes. If you try and litigate, we will add so much red tape you’ll be mummified alive’

zomiaen•1d ago

This makes a whole lot more sense than the argument that OpenAI needs to store every single chat because a few people might be bypassing NYT's paywall with it.

rangestransform•1d ago

The spaghetti framework of laws and discretionary enforcement is so incredibly dangerous to free speech, such as when the government started making demands of facebook to censor content during the pandemic. The government shouldn't be able to so much as breathe on any person or company for speech.

hsbauauvhabzb•12h ago

What if it was the other way around, Facebook were using their advantage to censor things they don’t want though? I’m not saying any government is perfect and agree that speech should be free, but it’s more complex than that.

wrs•1d ago

They are in court.

lexandstuff•1d ago

This is a court order. They saw them in court, and this was the result: https://cdn.arstechnica.net/wp-content/uploads/2025/06/NYT-v...

tsunamifury•1d ago

These orders are in place for almost every form of communication already today, even from the companies that claim otherwise.

And yes, I know, I worked on the only Android/iMessage crossover project to exist, and it was clear they had multiple breaches even just in delivery as well as the well known iCloud on means all privacy is void issue.

paxys•1d ago

Not only does this mean OpenAI will have to retain this data on their servers, they could also be ordered to share it with the legal teams of companies they have been sued by during discovery (which is the entire point of a legal hold). Some law firm representing NYT could soon be reading out your private conversations with ChatGPT in a courtroom to prove their case.

fhub•1d ago

My guess is they will store them on tape e.g. on something like Spectra TFinity ExaScale library. I assume AWS glacier et al use this sort of thing for their deep archives.

Storing them on something that has hours to days retrieval window satisfies the court order, is cheaper, and makes me as a customer that little bit more content with it (mass data breach would take months of plundering and easily detectable).

genewitch•1d ago

Glacier is tape silos, but this is textual data. You don't need to save output images, just the checkpoint+hash of the generating model and the seed. Stable diffusion saves this until you manually delete the metadata, for example. So my argument is you could do this with LTO as well. Text compresses well, especially if you don't do it naively.

JKCalhoun•1d ago

> She suggested that OpenAI could have taken steps to anonymize the chat logs but chose not to

That is probably the solution right there.

paxys•1d ago

> She suggested that OpenAI could have taken steps to anonymize the chat logs but chose not to, only making an argument for why it "would not" be able to segregate data, rather than explaining why it "can’t."

Sounds like bullshit lawyer speak. What exactly is the difference between the two?

dijksterhuis•1d ago

Not wanting to do something isn't the same thing as being unable to do something.

!define would

> Used to express desire or intent -- https://www.wordnik.com/words/would

!define cannot

> Can not ( = am/is/are unable to) -- https://www.wordnik.com/words/cannot

paxys•1d ago

Who said anything about not wanting to?

"I will not be able to do this"

"I cannot do this"

There is no semantic or legal difference between the two, especially when coming from a tech company. Stalling and wordplay is a very common legal tactic when the side has no other argument.

dijksterhuis•1d ago

The article is derived from the order, which is itself a short summary of conversations had in court.

https://cdn.arstechnica.net/wp-content/uploads/2025/06/NYT-v...

> I asked:

> > Is there a way to segregate the data for the users that have expressly asked for their chat logs to be deleted, or is there a way to anonymize in such a way that their privacy concerns are addressed... what’s the legal issue here about why you can’t, as opposed to why you would not?

> OpenAI expressed a reluctance for a "carte blanche, preserve everything request," and raised not only user preferences and requests, but also "numerous privacy laws and regulations throughout the country and the world that also contemplate these type of deletion requests or that users have these types of abilities."

A "reluctance to retain data" is not the same as "technically or physically unable to retain data". Judge decided OpenAI not wanting to do it was less important than evidence being deleted.

lanyard-textile•1d ago

Disagree. There’s something about the “able” that implies a hindered routine ability to do something — you can otherwise do this, but something renders you unable.

“I won’t be able to make the 5:00 dinner.” -> You could normally come, but there’s another obligation. There’s an implication that if the circumstances were different, you might be able to come.

“I cannot make the 5:00 dinner.” -> You could not normally come. There’s a rigid reason for the circumstance, and there is no negotiating it.

jjk166•1d ago

If someone was in an accident that rendered them unable to walk, would you say they can or can not walk?

lanyard-textile•19h ago

Yes? :) Being unable to walk is typically non negotiable.

blagie•1d ago

This data cannot be anonymized. This is trivial provable, both mathematically, but given the type of data, it should also be intuitively obvious to even the most casual observer.

If you're talking to ChatGPT about being hunted by a Mexican cartel, and having escaped to your Uncle's vacation home in Maine -- which is the sort of thing a tiny (but non-zero) minority of people ask LLMs about -- that's 100% identifying.

And if the Mexican cartel finds out, e.g. because NY Times had a digital compromise at their law firm, that means someone is dead.

Legally, I think NY Times is 100% right in this lawsuit holistically, but this is a move which may -- quite literally -- kill people.

zarzavat•1d ago

It's like anonymizing your diary by erasing your name on the cover.

JKCalhoun•1d ago

I don't dispute your example, but I suspect there is a non-zero number of cases that would not be so extreme, so obviously identifiable.

So, sure, no panacea, but .. why not for the cases where it would be a barrier?

genewitch•1d ago

AOL found out and thus we all found out that you can't anonymize certain things, web searches in that case. I used to have bookmarked some literature from maybe ten years ago that said,(proved with math?), any moderate collection of data from or by individuals that fits certain criteria is de-anonymizeable, if not by itself, then with minimal extra data. I want to say it included if, for instance, instead of changing all occurances of genewitch to user9843711, every instance of genewitch was a different, unique id.

I apologize for not having cites or a better memory at this time.

catlifeonmars•23h ago

https://en.wikipedia.org/wiki/K-anonymity

genewitch•10h ago

> The root of this problem is the core problem with k-anonymity: there is no way to mathematically, unambiguously determine whether an attribute is an identifier, a quasi-identifier, or a non-identifying sensitive value. In fact, all values are potentially identifying, depending on their prevalence in the population and on auxiliary data that the attacker may have. Other privacy mechanisms such as differential privacy do not share this problem.

see also: https://en.wikipedia.org/wiki/Differential_privacy which alleges to solve this; that is, wiki says that the only attacks are side-channel attacks like errors in the algorithm or whatever.

catlifeonmars•7h ago

If you squint a little, this problem is closely related to oblivious transfer as well

bilbo0s•1d ago

I’d just assume that any chat or api call you do to any cloud based ai in th US will be discoverable from here on out.

If that’s too big a risk it really is time to consider locally hosted LLMs.

amanaplanacanal•1d ago

That's always been the case for any of your data anywhere in any third party service of any kind, if it is relevant evidence in a lawsuit. Nothing specific to do with LLMs.

marcyb5st•1d ago

I ask again, why not anonymizing the data? That way NYT/the court could see if users are bypassing the paywall through ChatGPT while preserving privacy.

Even if I wrote it, I don't care if someone read out loud in public court "user <insert_hash_here> said: <insert nastiest thing you can think of here>"

Orygin•1d ago

You can't really anonymize the data if the conversation itself is full of PII.

I had colleagues chat with GPT, and they send all kinds of identifying information to it.

mastazi•1d ago

I'm seeing HN hug of death when attempting to open the link, but was able to read the post on Wayback Machine https://web.archive.org/web/20250604224036/https://mastodon....

I think this is a private Mastodon instance on someone's personal website so it makes sense that it might have been overwhelmed by the traffic.

OJFord•1d ago

Better link in the thread: https://arstechnica.com/tech-policy/2025/06/openai-says-cour...

(As in, an actual article, not just a mastodon-tweet from some unknown (maybe known? Not by me) person making the title claim, with no more info.)

incompatible•1d ago

Looks like https://en.wikipedia.org/wiki/Lauren_Weinstein_(technologist..., he has been commentating on the Internet for about as long as it has existed.

bibinou•1d ago

And? the article you linked only has primary sources.

genewitch•1d ago

Roughly how many posts on HN are by people you know?

OJFord•1d ago

Of those that are tweets and similar? Almost all of them (the ones I look at being interested in the topic anyway).

By 'know' I mean recognise the name as some sort of authority. I don't 'know' Jon Gruber or Sam Altman or Matt Levine, but I'll recognise them and understand why we're discussing their tweet.

The linked tweet (whatever it's called) didn't say anything more than the title did here, so it was pointless to click through really. In replies someone asked the source and someone else replied with the link I commented above. (I don't 'know' those people either, but I recognise Ars/even if I didn't appreciate the longer form with more info.)

genewitch•1d ago

thanks for engaging.

> The linked tweet (whatever it's called)

"post" works for social media regardless of the medium; not an admonishment, an observation. Also, by the time i saw this, it was already an Ars link, leaving some comments with less context that i apparently didn't pick up on. I was able to make my observation because someone mentioned mastodon (i think), but that was an assumption on my part that the original link was mastodon.

So i asked the question to make sure it wasn't some bias against mastodon (or the fediverse), because I'd have liked to ask, "for what reason?"

OJFord•1d ago

> > The linked tweet (whatever it's called)

> "post" works for social media regardless of the medium; not an admonishment, an observation.

It also works for professional journalism and blog-err-posts though, the distinction from which was my point.

> I was able to make my observation because someone mentioned mastodon (i think), but that was an assumption on my part that the original link was mastodon.

As for assuming/'someone' mentioning Mastodon, my own comment you initially replied to ended:

> (As in, an actual article, not just a mastodon-tweet from some unknown (maybe known? Not by me) person making the title claim, with no more info.)

Which was even the bit ('unknown') you objected to.

JKCalhoun•1d ago

> But now, OpenAI has been forced to preserve chat history even when users "elect to not retain particular conversations by manually deleting specific conversations or by starting a 'Temporary Chat,' which disappears once closed," OpenAI said.

So, why is Safari not forced to save my web browsing history too (even of I delete it)? Why not also the "private" tabs I open?

Just OpenAI, huh?

wiradikusuma•1d ago

Because it's running on your computer?

gpm•1d ago

First, there's no court order for Safari. This isn't the court saying "everyone always has to preserve data" it's a court saying "in the interest of this litigation this specific party has to preserve data for now".

But moreover, Safari isn't a third party, it's a tool you are using whose data is in your possession. That means that in the US things like fourth amendment rights are much stronger. A blanket order requiring that Safari preserve everyone's browsing history would be an illegal general warrant (in the US).

amanaplanacanal•1d ago

It's evidence in an ongoing lawsuit.

yieldcrv•1d ago

> Before the order was in place mid-May, OpenAI only retained "chat history" for users of ChatGPT Free, Plus, and Pro who did not opt out of data retention

> opt out

alright, sympathy lost

Imnimo•1d ago

So if you're a business that sends sensitive data through ChatGPT via the API and were relying on the representation that API inputs and outputs were not retained, OpenAI will just flip a switch to start retaining your data? Were notifications sent out, or did other companies just have to learn about this from the press?

thuanao•1d ago

As if we needed another reason to hate NYT and their paywall...

AlienRobot•1d ago

I'm usually against LLM's massive breach of copyright, but this argument is just weird.

>At a conference in January, Wang raised a hypothetical in line with her thinking on the subsequent order. She asked OpenAI's legal team to consider a ChatGPT user who "found some way to get around the pay wall" and "was getting The New York Times content somehow as the output." If that user "then hears about this case and says, 'Oh, whoa, you know I’m going to ask them to delete all of my searches and not retain any of my searches going forward,'" the judge asked, wouldn't that be "directly the problem" that the order would address?

If the user hears about this case, and now this order, wouldn't they just avoid doing that for the duration of the court order?

junon•1d ago

Side note, why is almost every comment that contains the word "shill" so pompous and aggressive?

johnnyanmac•1d ago

Shill in general has a strong connotation. It comes with the idea of someone who would use the word so freely that it'll naturally be aggressive.

I don't know anyone's agenda in terms of commenters, so they'd have to be very blatant for me to use such a word.

celnardur•1d ago

There has been a lot of opinion pieces popping up on HN recently that describe the benefits they see from LLMs and rebut the drawbacks most of them talk about. While they do bring up interesting points, NONE of them have even mentioned the privacy aspect.

This is the main reason I can’t use any LLM agents or post any portion of my code into a prompt window at work. We have NDAs and government regulations (like ITAR) we’d be breaking if any code left our servers.

This just proves the point. Until these tools are local, privacy will be an Achilles heal for LLMs.

garyfirestorm•1d ago

You can always self host an LLM which is completely controlled on your own server. This is trivial to do.

celnardur•1d ago

Yes, but which of the state of the art models that offer the best results, are you allowed to do this with? As far as I've seen the models that you can host locally are not the ones being praised left and right in these articles. My company actually allows people to use a hosted version of Microsoft copilot, but most people don't because it's still not that much of a productivity boost (if any).

genewitch•1d ago

Deepseek isn't good enough? You need a beefy GPU cluster but I bet it would be fine until the large llama is better at coding, and I'm certain there will be other large models for LLM. Now if there's some new technology around the corner, someone might be able to build a moat, but in a surprising twist, Facebook did us all a favor by releasing their weights back when; there's no moat possible, in my estimation, with LLMs as it stands today. Not even "multi-model" implementations. Which I have at home, too.

Say oai implements something that makes their service 2x better. Just using it for a while should give people who live and breathe this stuff enough information to tease out how to implement something like it, and eventually it'll make it into the local-only applications, and models.

anonymousDan•1d ago

Are there any resources on how much it costs to run the full deep seek? And how to do it?

genewitch•1d ago

I can fill in anything missing, i would like to go to bed but i did't want to leave anyone hanging. had to come edit a comment i made from my phone, and my phone also doesn't show me replies (i use materialistic, is there a better app?)

https://getdeploying.com/guides/run-deepseek-r1 this is the "how to do it"

https://news.ycombinator.com/item?id=42897205 posted here, a link to how to set it up on an AMD Epyc machine, ~$2000. IIRC a few of the comments discuss how many GPUs you'd need (a lot of the 80GB GPUs, 12-16 i think), plus the mainboards and PSUs and things. however to just run the largest deepseek you merely need memory to hold the model and the context, plus ~10% and i forget why +10% but that's my hedge to be more accurate.

note: i have not checked if LM Studio can run the large deepseek model; i can't fathom a reason it couldn't, at least on the Epyc CPU only build.

note too: I just asked in their discord and it appears "any GGUF model will load if you have the memory for it" - "GGUF" is like the format the model is in. Someone will take whatever format mistral or facebook or whoever publishes and convert it to GGUF format, and from there, someone will start to quantize the models into smaller files (with less ability) as GGUF.

bogtog•1d ago

That's $2000 but for just 3.5-4.25 tokens/s? I'm hesitant to say that 4 tokens/s is useless, but that is a tremendous downgrade (although perhaps some smaller model would be usable)

genewitch•10h ago

right, but that is CPU only, there's no "tensor cores" in a GPU getting lit up for that 4t/s. So minimum to actually run deepseek is $2000, and the max is, well it's basically whatever you can afford, based on your needs. if you're only running single prompts at any given time, you only need the number of GPUs that will fit the model plus the context (as i mentioned), at minimum your outlay is going to be on the order of $130,000 in just GPUs.

If i can find it later, as i couldn't find it last night when i replied, there is an article that explains how to start adding consumer GPUs or even 1-2 Nvidia A100 80GB GPUs to the epyc build, to speed that up. I have a vague recollection that can get you up to 20t/s or thereabouts, but don't quote me on that, it's been a while.

redundantly•1d ago

Trivial after a substantial hardware investment and installation, configuration, testing, benchmarking, tweaking, hardening, benchmarking again, new models come out so more tweaking and benchmarking and tweaking again, all while slamming your head against the wall dealing with the mediocre documentation surrounding all hardware and software components you're trying to deploy.

Yup. Trivial.

blastro•1d ago

This hasn't been my experience. Pretty easy with AWS Bedrock

paxys•1d ago

Ah yes, "self host" by using a fully Amazon-managed service on Amazon's servers. How would a US court ever access those logs?

garyfirestorm•1d ago

Run a vllm docker container. Yeah the assumption is you already know what hardware you need or you already have it on prem. Assuming this is ITAR stuff, you must be self hosting everything.

dvt•1d ago

Even my 4-year-old M1 Pro can run a quantized Deepseek R1 pretty well. Sure, full-scale productizing these models is hard work (and the average "just-make-shovels" startups are failing hard at this), but we'll 100% get there in the next 1-2 years.

whatevaa•1d ago

Those small models suck. You need the big guns to get those "amazing" coding agents.

bravesoul2•1d ago

Local for emotional therapy. Big guns to generate code. Local to edit generated code once it is degooped and worth something.

benoau•1d ago

I put it LM Studio on an old gaming rig with a 3060 TI, took about 10 minutes to start using it and most of that time was downloading a model.

jjmarr•1d ago

If you're dealing with ITAR compliance you should have experience with hosting things on-premises.

genewitch•1d ago

I'm for hire, I'll do all that for any company that needs it. Email in profile. Contract or employee, makes no difference to me.

dlivingston•21h ago

Yes. The past two companies I've been at have self-hosted enterprise LLMs running on their own servers and connected to internal documentation. There is also Azure Cloud for Gov and other similar privacy-first ways of doing this.

But also, running LLMs locally is easy. I don't know what goes into hosting them, as a service for your org, but just getting an LLM running locally is a straightforward 30-minute task.

aydyn•1d ago

It is not at all trivial for an organization that may be doing everything on the cloud to locally set up the necessary hardware and ensure proper networking and security to that LLM running on said hardware.

woodrowbarlow•19h ago

> NONE of them have even mentioned the privacy aspect

because the privacy aspect has nothing to do with LLMs and everything to do with relying on cloud providers. HN users have been vocal about that since long before LLMs existed.

ljm•1d ago

Where is the source? OP goes to a mastodon instance that can’t handle the traffic.

ETH_start•1d ago

Two things. First, the judge could have issued a narrowly tailored order — say, requiring OpenAI to preserve only those chats that a filter flags as containing substantial amounts of paywalled content from the plaintiffs. That would’ve targeted the alleged harm without jeopardizing the safety of massive amounts of unrelated user data.

Second, we’re going to need technology that can simply defy government orders, as digital technology expands the ability of one government order violating rights at scale. Otherwise, one judge — whether in the U.S., China, or India — can impose a sweeping decision that undermines the privacy and autonomy of billions.

DevX101•1d ago

There were some enterprises that refused to send any data to OpenAI, despite assurances that that data would not be logged. Looks like they've been vindicated in keeping everything on prem via self-hosted LLM models.

lxgr•1d ago

> OpenAI is NOW DIRECTED to preserve and segregate all output log data that would otherwise be deleted on a going forward basis until further order of the Court (in essence, the output log data that OpenAI has been destroying), whether such data might be deleted at a user’s request or because of “numerous privacy laws and regulations” that might require OpenAI to do so.

Spicy. European courts and governments will love to see their laws and legal opinions being shrugged away in ironic quotes.

reassess_blind•1d ago

Will the EU respond by blocking them from doing business in the EU, given they're not abiding by GDPR?

echelon•1d ago

Hopefully.

We need many strong AI players. This would be a great way to ensure Europe can grow its own.

ronsor•1d ago

> This would be a great way to ensure Europe can grow its own.

The reason this doesn't happen is because of Europe's internal issues, not because of foreign competition.

lxgr•1d ago

Arguably, until recently there just wasn't any reason to: Europeans were happy buying American software and online services; Americans were happy buying German cars and pharmaceuticals.

glookler•1d ago

Personally, I don't think the US clouds win anything on merit.

It's hard/pointless to motivate engineers to use other options and their significance doesn't grow since Engineers won't blog that much about them to show their expertise, etc. Certification and experience with a provider with 10%-80% market share is a future employment reason to put up with a lot of trash, and the amount of help to work around that trash that has made it into places like ChatGPT is mindboggling.

a2128•1d ago

It would be a political catastrophe right now if the EU blocked US companies due to them needing to comply with temporary US court orders. My guess is this'll be swept under the rug and permitted under the basis of a legal obligation

selcuka•1d ago

What about the other way around? Why don't we see a US court order that is in conflict with EU privacy laws as a political catastrophe, too?

philipov•1d ago

Because courts are (wrongly) viewed as not being political, and public opinion hasn't caught up with reality yet.

ethbr1•1d ago

The court system as a whole is more beholden to laws as written than politics.

And that's a key institution in a democracy, given the frequency with which either the executive or legislative branches try to do illegal things (defined by constitutions and/or previously passed laws).

philipov•1d ago

Yes, courts ought to be apolitical. Just that recently, especially the supreme court has not been meeting that expectation.

StanislavPetrov•1d ago

Courts have always been political, which is why "jurisdiction shopping" has been a thing for decades. The Supreme Court, especially, has always been political, which is why one of the biggest issues in political campaigns is who is going to be able to nominate new justices. Most people of all political persuasions view courts as apolitical when those courts issue rulings in that affirm their beliefs, and political when they rule against them.

You're right though, in a perfect world courts would be apolitical.

intended•1d ago

The American Supreme Court could have been balanced though. Sadly, one team plays to win, the other team wants to be in a democracy. The issue is not the politics of the court, but the enforced Partisanship which took hold of the Republican Party post watergate.

All systems can be bent, broken, or subverted. Still, we need to make systems which do the best within the bounds of reality.

StanislavPetrov•1d ago

>Sadly, one team plays to win, the other team wants to be in a democracy.

As a lifelong independent, I can tell you that this sort of thinking is incredibly prevalent and also incredibly wrong. Even a casual look at recent history proves this. How do you define "democracy"? Most of us define it as "the will of the people". Just recently, however, when "the will of the people" has not been the will of the ruling class, the "will of the people" has been decried as dangerous populism (nothing new but something that has re-emerged recently in the so-called Western World). It is our "institutions" they argue, that are actually democracy, and not the will of the foolish people who are ignorant and easily swayed.

>All systems can be bent, broken, or subverted.

Very true, and the history of our nation is proof of that, from the founding right up to the present day.

>Still, we need to make systems which do the best within the bounds of reality.

It would be nice, but that is a long way from how things are, or have ever been (so far).

collingreen•1d ago

My impression was that American democracy is supposed to "derive its power from those being governed" (as opposed to being given power by God) and pretty explicitly was designed to actively prevent "the tyranny of the majority", not enable it.

I think it's a misreading to say the government should do whatever the whim of the most vocal, gerrymandered jurisdictions are. Instead, it is a supposed to be a republic with educated, ethical professionals doing the lawmaking within a very rigid structure designed to limit power severely in order to protect individual liberty.

For me, the amount of outright lying, propaganda, blatant corruption, and voter abuse makes a claim like "democracy is the will of the most people who agree" seem misguided at best (and maybe actively deceitful).

Re reading your comment, the straw man about "democracy is actually the institutions" makes me think I may have fallen for a troll so I'm just going to stop here.

StanislavPetrov•1d ago

>Re reading your comment, the straw man about "democracy is actually the institutions" makes me think I may have fallen for a troll so I'm just going to stop here.

You haven't, so be assured.

>I think it's a misreading to say the government should do whatever the whim of the most vocal, gerrymandered jurisdictions are.

It shouldn't, and I didn't argue that. My argument is that the people in charge have completely disregarded the will of the people en mass for a long time, and that the people are so outraged and desperate that at this point they are willing to vote for anyone who will upend the elite consensus that refuses to change.

>Instead, it is a supposed to be a republic with educated, ethical professionals doing the lawmaking within a very rigid structure designed to limit power severely in order to protect individual liberty.

How is that working out for us? Snowden's revelations were in 2013. An infinite number of blatantly illegal and unconstitutional programs actively being carried out by various government agencies. Who was held to account? Nobody. What was changed? Nothing. Who was in power? The supposedly "good" team that respects democracy. Go watch the conformation hearing of Tulsi Gabbard from this year. Watch Democratic Senator after Democratic Senator denounce Snowden as a traitor and repeatedly demand that she denounce him as well, as a litmus test for whether or not she could be confirmed as DNI (this is not a comment on Gabbard one way or another). My original comment disputed the contention that one party was for democracy and the other party was against it. Go watch that video and tell me that the Democrats support liberty, freedom, democracy and a transparent government. I don't support either of the parties, and this is one of the many reasons why.

vanviegen•1d ago

> You're right though, in a perfect world courts would be apolitical.

Most other western democracies are a lot closer to a perfect world, it seems.

StanislavPetrov•1d ago

Germany, where they lock you up for criticizing politicians[1] or where they have a ban against protesting for Palestine because it's "antisemitic"?[2]

Or UK where you can get locked up for blasphemy[3] or where they lock up ~30 people a day for saying offensive things online because of their Online Safety Act?[4]

Or perhaps Romania where an election that didn't turn out the way the EU elites wanted is overturned based on nebulous (and later proven false) accusation that the election was somehow influenced by a TikTok campaign by the Russians that later turned out to have been funded by a Romanian opposition party.[5]

I could go on and on, but unfortunately most other western democracies are just as flawed, if not worse. Hopefully we can all strive for a better future and flush the authoritarians, from all the parties.

[1] https://www.youtube.com/watch?v=-bMzFDpfDwc

[2] https://www.euronews.com/2023/10/19/mass-arrests-following-p...

[3] https://news.sky.com/story/man-convicted-after-burning-koran...

[4] https://www.thetimes.com/uk/crime/article/police-make-30-arr...

[5] https://www.politico.eu/article/investigation-ties-romanian-...

vanviegen•1d ago

I understand these are court decisions you don't agree with. (And neither do I for the most part, though I imagine some of these cases to have more depth to them.)

But is there any reason to believe that judged were pressured/compelled by political powers to make these decisions? Apart from, of course, the law created by these politicians, which is how the system is intended to work.

StanislavPetrov•1d ago

>But is there any reason to believe that judged were pressured/compelled by political powers to make these decisions?

No, but I have every reason to believe that the judges who made these decisions were people selected by political powers so that they would make them.

>Apart from, of course, the law created by these politicians, which is how the system is intended to work.

But the system isn't working for the people, it is horribly broken. The people running the system are mostly corrupt and/or incompetent, which is why so many voters from a wide variety of countries, and across the political spectrum, are willing to vote for anyone (even people who are clearly less than ideal) that shits all over the system and promises to smash it. Because the system is currently working exactly how it's intended to work, most people hate it and nobody feels like they can do anything about it.

bee_rider•1d ago

Even if the we imagined the courts as apolitical (and I agree with you, they actually are political so imagining otherwise is silly), the question of how to react to court cases in other countries is a matter of geopolitics and international relations.

While folks believe all sorts of things, I don’t think anyone is going to call international relations apolitical!

Nasrudith•21h ago

International relations could fairly be called anarchic because they aren't bound by law and no entity is capable of enforcing them against nation states. Remember that whenever 'sovereignty' is held up as some sacred, shining ideal what they really mean is 'the ability to do whatever the hell they want without being held accountable'.

MattGaiser•1d ago

EU has few customer facing tech companies of note.

bee_rider•1d ago

We’re doing our best to provide them an opening, though.

lmm•1d ago

Because deep down Americans don't actually have any respect for other countries. This sounds like a flamebait answer, but it's the only model I can reconcile with experience.

DocTomoe•1d ago

Without trying to become too political, but thanks to recent trade developments, right now the US is under special scrutiny to begin with, and goodwill towards US companies - or courts - has virtually evaporated.

I can see that factoring in in a decision to penalise an US company when it breaks EU law, US court order or not.

yatopifo•1d ago

Ultimately, American companies will be pushed out of the EU market. It’s not going to happen overnight, but the outcome is unavoidable in light of the ongoing system collapse in the US.

rafaelmn•1d ago

EU software scene would take a decade to catch up. Only alternative being if AI really delivers on being a force multiplier - but even then EU would not have access to SOTA internally.

blagund•1d ago

What does the EU lack? Is it the big corp infra? Or something more fundamental?

KoolKat23•1d ago

Big corpo cash and big risk appetite.

inglor_cz•1d ago

In my opinion, we lack two things:

a) highly qualified people, even European natives move to Silicon Valley. There is a famous photo of the OpenAI core team with 6 Polish engineers and only 5 American ones;

b) culture of calculated risk when it comes to investment. Here, bankruptcy is an albatross around your neck, both legally and culturally, and is considered a sign of you being fundamentally inept instead of maybe just a misalignment with the market or even bad luck. You'd better succeed on your first try, or your options for funding will evaporate.

dukeyukey•1d ago

Worth pointing out that DeepMind was founded in London, the HQ is still here, and so is the founder and CEO. I've lived in North London for 8 years now, there are _loads_ of current-and-former DeepMind AI people here. Now that OpenAI, Anthropic, and Mistral have offices here the talent density is just going up.

On risk, we're hardly the Valley, but a failed startup isn't a black mark at all. It's a big plus in most tech circles.

inglor_cz•21h ago

The UK is a bit different in this regard, same as Scandinavia.

But in many continental countries, bankruptcy is a serious legal stigma. You will end up on public "insolvency lists" for years, which means that no bank will touch you with a 5 m pole and few people will even be willing to rent you or your new startup office space. You may even struggle to get banal contracts such as "five SIMs with data" from mobile phone operators.

There seems to be an underlying assumption that people who go bankrupt are either fatally inept or fraudsters, and need to be kept apart from the "healthy" economy in order not to endanger it.

ben_w•1d ago

Given what happened with DeepSeek, "not state of the art" can still be simultaneously really close to the top, very sudden, very cheap, and from one small private firm.

rafaelmn•1d ago

Not really with the EU data sources disclosure mindset, GDPR and all that. China has a leg up in the data game because they care about copyright/privacy and IP even less than US companies. EU is supposedly booting US companies because of this.

ben_w•1d ago

The data sources is kinda what this court case is about, and even here on HN a lot of people get very annoyed by the application of the "open source" label to model weights that don't have the source disclosure the EU calls for.

GDPR is about personally identifiable humans. I'm not sure how critical that information really is to these models, though given the difficulty of deleting it from a trained model when found, yes I agree it poses a huge practical problem.

rafaelmn•4h ago

> and even here on HN a lot of people get very annoyed by the application of the "open source" label to model weights that don't have the source disclosure the EU calls for.

That's because they are obviously trained on copyrighted content but nobody wants to admit it openly because that opens them to even more legal trouble. Meanwhile China has no problem violating copyright or IP so they will gladly gobble up whatever they can.

I don't think you can really compete in this space with the EU mindset, US is playing it smart and leaving this to play out before regulating. This is why EU is not the place for these kinds of innovations, the bureaucrats and the people aren't willing to tolerate disruption.

romanovcode•21h ago

Why? I would see it 2 years ago but now every other platform has completely catched-up to ChatGPT. The LeChat or whatever the French alternative is just as good.

ensignavenger•1d ago

Doesn't GDPR have an explicit exemption for legal compliance?

lxgr•1d ago

Yes, but somehow I feel like "a foreign court told us to save absolutely everything" will not hold up in the EU indefinitely.

At least in sensitive contexts (healthcare etc.) I could imagine this resulting in further restrictions, assuming the order is upheld even for European user's data.

greatgib•1h ago

Legal compliance with European laws that they are subject of. Not any random law around the world.

dijksterhuis•1d ago

GDPR allows for this as far as i can tell (IANAL)

> Paragraphs 1 and 2 shall not apply to the extent that processing is necessary:

> ...

> for the establishment, exercise or defence of legal claims.

https://gdpr-info.eu/art-17-gdpr/

killerpopiller•1d ago

if you are the controller (or data subject). ChatGPT is the Processor. Otherwise EU controller processing PII in ChatGPT have a problem now.

ndsipa_pomu•1d ago

‘Controller’ means the natural or legal person, public authority, agency or other body which, alone or jointly with others, determines the purposes and means of the processing of personal data.

‘Processor’ means a natural or legal person, public authority, agency or other body which processes personal data on behalf of the controller.

_Algernon_•1d ago

Do legal claims outside eu jurisdiction apply? Seems like too big of a loophole to let any court globally sidestep GDPR.

gaogao•1d ago

The order is a bit broad, but legal holds frequently interact with deletion commitments. In particular, the only purpose of data deleted under GDPR held up by a legal hold should be for that legal hold, so it would be a big no-no if OpenAI continued to use that data for training.

voxic11•1d ago

> The General Data Protection Regulation (GDPR) gives individuals the right to ask for their data to be deleted and organisations do have an obligation to do so, except in the following cases:

> there is a legal obligation to keep that data;

https://commission.europa.eu/law/law-topic/data-protection/r...

tgsovlerkhgsel•1d ago

Except that transferring the data to the US is likely illegal in the first place, specifically due to insufficient legal protections against overreach like this.

shortsunblack•43m ago

The legal obligation has to come from relevant member state or EU law. Not third party countries' laws.

This at best is force majeure that prohibits OpenAI with satisfying its contractual obligations that are there to comply with EU law. But contractual obligations are not the only control organizations have to ensure compliance with EU law, so this is not a defense.

hulitu•1d ago

No. GDPR was never enforced, else Microsoft, Meta, Google and Apple couldn't do business in the EU.

udev4096•1d ago

EU privacy laws are nothing but theatre. How many times have they put up a law which would undermine end-to-end encryption or the recent law which "bans" anonymous crypto coins? It's quite clear they are very good at virtue signaling

thrwheyho•1d ago

Not sure why are you down voted here. I'm in EU and everything I can say about GDPR is that it does not work (simple example, government itself publishes my data including National ID number on their portal with property data; anybody can check who my parents are or if I'm on mortgage). And there are more.

udev4096•5h ago

People on HN are fucking delusional. They have no connection to real world scenarios and most of them don't care about actual privacy. They only care about laws which make it sound like everything's fine

shadowgovt•1d ago

Europe doesn't have jurisdiction over the US, and US courts have broad leeway in a situation like this.

lxgr•1d ago

Sure, but the EU has jurisdiction over European companies and can prohibit them from storing or processing their data in jurisdictions incompatible with its data privacy laws.

There's also an obvious compromise here – modify the US court ruling to exclude data of non-US users. Let's hope that cool heads prevail.

lrvick•1d ago

And yet, iPhones are shipping usb C.

Making separate manufacturing lines for Europe vs US is too expensive, so in effect, Europe forced a US company to be less shitty globally.

VladVladikoff•1d ago

USB C is a huge downgrade from lightning. The lightning connector is far more robust. The little inner connector board on USB C is so fragile. I’ll never understand why people wanted this so badly.

Cloudef•1d ago

Literally every other device is usb-c, thats why

ajr0•1d ago

I'm a big fan of this change, however, I think to a black mirror episode [0] where essentially little robot dogs could interface with everything they came into contact with because every connection was the same, it may be trivial to have multiple connections for a weapon like this but the take away I had from this is that 'variety' may be better than a single standardized solution. Partly because it is more expensive to plan for multiple types of inputs and making the cost of war go up will make it more difficult which I think inherently is the idea behind some of the larger cybersecurity companies, a hack can only work once then everyone has defenses for it after that single successful attack, this makes it more expensive to successfully stage attacks. Huge digression from this convo... but I think back to this constantly.

[0] https://en.wikipedia.org/wiki/Metalhead_(Black_Mirror)

lxgr•1d ago

Proprietary ports are textbook security by obscurity.

intended•1d ago

Many defensive are a trade off between the convenience of non attackers, and the trouble created for attackers.

Given the sheer number of devices we interact with in a single day, USB-C as a standard is worth the trade off for an increase in our threat surface area.

1000 Attackers can carry around N extra charging wires anyway.

10^7 users having to keep say, 3 extra charging wires on average? That’s a huge increase in costs and resources.

(Numbers made up)

bee_rider•1d ago

Two thoughts:

1) Surely the world conquering robo-army could get some adapters.

2) To the extend to which this makes anything more difficult, it is just that it makes everything a tiny bit less convenient. This includes the world-conquering robo-army, but also everything else we do. It is a general argument against capacity, which can’t be right, right?

lou1306•1d ago

So I suppose Lightning's abysmally slow transfer speed is also a security feature? No way you can exfiltrate my photo roll at 60 MBps :)

itake•1d ago

All of my devices are lightening. Now I have to carry around 2 cables.

0x073•1d ago

What should I say with an iPod classic? 3!

itake•1d ago

why do I need to replace my devices every year?

broodbucket•1d ago

There's simply more people with the opposite problem, especially in markets where Apple is less prevalent, which is most of them around the world. When there's more than one type of cable, plenty of people are going to be inconvenienced when one is chosen as the cable to rule them all, but in the end everyone wins, it's just annoying to get there.

jjk166•1d ago

> plenty of people are going to be inconvenienced when one is chosen as the cable to rule them all, but in the end everyone wins

That's not everyone wins. The people that actually bought these devices now have cables that don't work and need to replace with a lower quality product, and the people who were already using something else are continuing to not need cables for these devices. The majority breaks even, a significant minority loses.

Simply not choosing one cable to rule them all lets everyone win. There is no compelling reason for one size to fit all.

FeepingCreature•1d ago

It's a temporary drawback; everyone wins in the long term because there's only one standard.

jjk166•1d ago

Again, that's not a win for anybody. No one winds up in a better position than where they started, there is no payback in exchange for the temporary drawback, which also isn't temporary if the final standard is inferior.

If some people like hip hop but more people like country, it's not a win for everybody to eliminate the hip hop radio stations so we can all listen to a single country station.

inglor_cz•1d ago

This is closer to having a common railway gauge, though.

jjk166•11h ago

Not at all. A common railway gauge is necessary for different parts of the rail network to be joined together. If one section of the network has a different gauge, it is cut off and can not be joined without being completely replaced, leaving you with two less capable rail networks. Everyone does benefit from a more capable rail network.

Further, rail gauge is not a consumer choice. If there were two rail gauges and your local rail station happened to have a different gauge than your destination, you'd be SOL. A different rail gauge may provide benefits for people with specific needs, but you don't get to take advantage of those benefits except by blind luck.

There is no such benefit from standardizing cable connectors. If someone charges their phone with the same style cable as you, you gain nothing. If someone uses a different cable, you lose nothing. There is no reason for anyone not to use their preferred cable which is optimal for their use case.

itake•1d ago

Everyone, but the environment wins. Once I upgrade my phone and Airpods, I will have to throw out my pack of perfectly working lightning cables.

I'm sure there are more than a few people that would end up throwing out their perfectly functional accessories, only for the convenience of carrying less cables.

AStonesThrow•1d ago

Why don’t you donate them to a thrift store or educational charity? Are there no non-profits who refurbish and reuse electronics in your community?

itake•1d ago

I don't want to burn fuel trying to find a place to accept used 5 year old Airpod Pros with yellow earbud plastic.

I don't want to ship another cable across the Pacific Ocean from China so I can have a cable that works on my devices.

I want to keep using them until they don't work and I can't repair them any more.

Larrikin•1d ago

All of mine are USB C and now I only carry around one. All of the lightning cords and micro USB cables are in a drawer somewhere with the DVI, component cables, etc.

itake•1d ago

neat. I get to throw out my perfectly working apple products that have years left in them and switch re-sync my cables.

That is great you spent the money for this, but I'm not ready to throw away my perfectly fine devices.

ChrisMarshallNY•1d ago

Reminds me of something…[0]

[0] https://foreverstudios.com/betamax-vs-vhs/

lxgr•1d ago

Have you ever had one fail on an Apple device?

The accurate comparison here isn’t between random low-budget USB-C implementations and Lightning on iPhones, but between USB-C and Lightning both on iPhones, and as far as I can tell, it’s holding up nicely.

xlii•1d ago

I have multiple ones. They accumulate dust easily and get damaged much more often. You won’t see that if 99% of the time is in clean environment. As of today I have 3 iPhones that won’t charge on the wire. Those are physical damages so it’s not like anything covers it (and Apple Care is not available in Poland). Same happens with the cables. I’m replacing USB-C display cable (that’s lightning I suppose) every year now because they get loose and start to disconnect if you sneeze in the vicinity.

I despise USB-C with all my heart. Amount of cable trash has tripled over the years.

lostlogin•1d ago

Maybe try wireless charging.

I find it superior to both lightning and USB-C.

xlii•1d ago

I do, I rarely use USB-C on Apple devices (well outside of Mac). Wireless is great and I have nice looking night clock and moving files/other stuff over airdrop works good enough for me not plugging anything in. Recently I had to charge over the phone which required removal of pocket lint beforehand.

khimaros•1d ago

with USB-C, the fragile end is on the cable instead of the port. that is a design feature.

com2kid•1d ago

I thought so, but my Pixel 9 USB-C port falls out of place now after less than a year. :(

shadowgovt•23h ago

Same problem. This may be the last Pixel I own because two in a row now have lost their USB-C sockets.

linotype•1d ago

What are you doing to your USB-C devices? I’ve owned dozens and never had a single port break.

Our_Benefactors•1d ago

It’s a huge upgrade on the basis of allowing me to remove lightning cables from my life

On a specsheet basis it also charges faster and has a higher data transmission rate.

Lightning cables are not more robust. They are known to commonly short across the power pins, often turning the cable into an only-works-on-one-side defect. I replaced at least one cable every year due to this.

shadowgovt•1d ago

Lightning was basically what happened when Apple got tired of waiting for the standards committee to converge on what USB-C would be, so they did their own.

And... yeah, it turned out better than the standard. Their engineers have really good taste.

WA•1d ago

And then the rest of the world got tired of Apple not proposing this supposedly superior piece of engineering as a new standard... because of licensing shenanigans.

bowsamic•1d ago

Because I can charge my iPhone, my AirPods, and my mac, all with the same charger

the_duke•1d ago

The EU has a certain amount of jurisdiction over all companies providing a service to customers located in the EU.

A US company can always stop serving EU customers if it doesn't want to comply with EU laws, but for most the market is too big to ignore.

shadowgovt•1d ago

So when one government compels an action another forbids, that's an international political situation.

There is no supreme law at that level; the two nations have to hash it out between them.

KingOfCoders•1d ago

EU–US Data Privacy Framework is an US scam to get European user data.

wkat4242•1d ago

It totally is that's why it keeps being shot down by the courts and relaunched under a different name.

But the EU willingly participates in this. Probably because they know there's no viable alternative for the big clouds.

This is coming now though since the US instantly.

Y_Y•1d ago

I know I viable alternative to the big clouds.

Don't use them! They cost too much in dollar terms, they all try to EEE lock you in with "managed" (and subtly incompatible) versions of services you would otherwise run yourself. They are too big to give a shit about laws or customer whims.

I have plenty of experience with the big three clouds and given the choice I'll run locally, or on e.g. Hetzner, or not at all.

My company loves to penny-pinch things like lunch reimbursement and necessary tools, but we piss away latge sums on unused or under-used cloud capacity with glee, be ause that is magically billed elsewhere (from the point of view of a given manager).

It's a racket, and I'm by mo means the first to say so. The fact that this money-racket is also a dat-racket doesn't surprise me in the least. It's just good racketeering!

wkat4242•22h ago

Yes. Cloud is great for a very specific usecase. Venture capital startups which either go viral and explode, or die within a year. In those cases you need the capacity to automatically scale and also to only pay for the resources you actually use so you the service pays for itself. You also have no capex and you can drop the costs instantly if you need to close up shop. For services that really need infinite and instant scaling and flexibility, cloud is a genuinely great option.

However that's not what most traditional companies do. What my employer does, is picking up the physical servers they had in our datacenters, dump them on an AWS compute box they run 24/7 without any kind of orchestration and call it "cloud". That's not what cloud is, that is really just someone else's computer. We spend a LOT more now but our CIO wanted to "go cloud" because everyone is so it was more a tickbox than a real improvement.

Microservices, object storage etc, that is cloud.

KingOfCoders•23h ago

"But the EU willingly participates in this."

Parts of the European Commission "influenced" by lobbyists collude with the US.

wkat4242•23h ago

Sorry "instantly" should have been "is in chaos". Autocorrect...

Aeolos•1d ago

And just like that, OpenAI got banned in my company today.

Good job.

PeterStuer•1d ago

Don't tell them all their other communication is intercepted and retained on the same basis. Good luck running your business in full isolation.

DaiPlusPlus•1d ago

> OpenAI got banned in my company today

Sarbanes–Oxley would like a word.

Y_Y•1d ago

Do you mean to say that Sarbox might preclude this? Or that it should have been banned already? The meaning isn't clear to me and I would be grateful for further explanation.

DaiPlusPlus•11h ago

I’m saying that any org that ditched OpenAI for this specific reason is also likely committing investor fraud.

Ekaros•1d ago

Time to hit them with that 4% fine on revenue while they still have some money...

Frieren•1d ago

> Spicy. European courts and governments will love to see their laws and legal opinions being shrugged away in ironic quotes.

The GDPR allows to retain data when require by law as long as needed. People that make regulations may make mistakes sometimes, but they are no that stupid as to not understand the law and what things it may require.

The data was correctly deleted on user demand. But it cannot be deleted where there is a Court order in place. The conclusion of "GDPR is in conflict with the law" looks like rage baiting.

_Algernon_•1d ago

It's questionable to me whether a court order of a non-eu court applies. "The law" is EU law, not American law.

If any non-eu country can circumvent GDPR by just making a law that it doesn't apply, the entire point of the regulation vanishes.

kbelder•1d ago

Doesn't that work both ways? Why should the EU be able to override American laws regarding an American company?

FeepingCreature•1d ago

Because it's European users whose data is being recorded on the order of a court that doesn't even have jurisdiction over them?

midasz•1d ago

It doesn't really matter from what country the company is. If you do business in the EU then EU laws apply to the business you do in the EU. Just like EU companies adhere to US law for the business they do in the US.

_Algernon_•1d ago

Because EU has jurisdiction when the american company operates in the EU.

thephyber•1d ago

It’s WAY more complicated than that.

Where is the HQ of the company?

Where does the company operate?

What country is the individual user in?

What country do the servers and data reside in?

Ditto for service vendors who also deal with user data.

Even within the EU, this is a mess and companies would rather use a simple heuristic like put all servers and store all data for EU users in the most restrictive country (I’ve heard Germany).

_Algernon_•1d ago

Maybe when talking about the GDPR specifics, but not when it comes to whether the EU has jurisdiction over companies in the EU.

throw_a_grenade•1d ago

> Where is the HQ of the company?

If outside EU, then they need to accept EU jurisdiction and notify who is representative plenipotentiary (== can make decisions and take liability on behalf of the company).

> Where does the company operate?

Geography mostly doesn't matter as long as they interact with EU people. Because people are more important.

> What country is the individual user in?

Any EU (or EEA) country.

> What country do the servers and data reside in?

Again, doesn't matter, because people > servers.

It's almost like if bureaucrats who are writing regulations are experienced in writing regulations in such a way they can't be circumvented.

EDIT TO ADD:

From OpenAI privacy policy:

> 1. Data controller

> If you live in the European Economic Area (EEA) or Switzerland, OpenAI Ireland Limited, with its registered office at 1st Floor, The Liffey Trust Centre, 117-126 Sheriff Street Upper, Dublin 1, D01 YC43, Ireland, is the controller and is responsible for the processing of your Personal Data as described in this Privacy Policy.

> If you live in the UK, OpenAI OpCo, LLC, with its registered office at 1960 Bryant Street, San Francisco, California 94110, United States, is the controller and is responsible for the processing of your Personal Data as described in this Privacy Policy.

Y_Y•1d ago

As you astutely note, the company probably has it's "HQ" (for some legal definition of HQ) a mere 30 minutes across Dublin (Luas, walk in rain, bus, more rain) from the Data Protection Commission. It's very likely that whatever big tech data-hoarder you choose has a presence very close to their opposite number in both of these cases.

If it was easier or more cost-effective for these companies not to have a foot in the EU they wouldn't bother, but they do.

chris12321•1d ago

> It's almost like if bureaucrats who are writing regulations are experienced in writing regulations in such a way they can't be circumvented.

Americans often seem to have the view that lawmakers are bumbling buffoons who just make up laws on the spot with no thought given to loop holes or consequences. That might be how they do it over there, but it's not really how it works here.

Scarblac•1d ago

They can't override laws of course, but it could mean that if two jurisdictions have conflicting laws, you can't be active in both of them.

mattlondon•1d ago

Likewise, why should America be able to override European laws regarding European users in Europe?

It's all about jurisdiction. Do business in Country X? Then you need to follow Country X's laws.

Same as if you go on vacation to County Y. If you do something that is illegal in Country Y while you are there, even if it's legal in your home country, you still broke the law in Country Y and will have to face the consequences.

lmm•1d ago

Because we're talking about the personal data of EU citizens. If it's to be permitted to be sent to America at all, that must come with a guarantee that EU-standard protections will continue to apply regardless of American law.

bonoboTP•1d ago

> If it's to be permitted to be sent to America at all

Do you mean that I, an EU citizen am being granted some special privilege from EU leadership to send my data to the US?

throw_a_grenade•1d ago

No, the company you're sending it to is required to care for it. Up to and including refusing to accept that data if need be.

wkat4242•1d ago

It's the other way around. The EU has granted US companies a temporary permission to handle EU customers' data. https://en.m.wikipedia.org/wiki/EU%E2%80%93US_Data_Privacy_F...

I say temporary because it keeps being shot down in court for lax privacy protections and the EU keeps refloating it under a different name for economic reasons. Before this name it was called safe harbor and after that it was privacy shield.

andrecarini•1d ago

It works the other way around; the American company is granted a special privilege to retrieve EU citizen data.

bonoboTP•1d ago

I'm not sure they are "retrieving" data. People register on the website and upload stuff they want to be processed and used.

I mean, sometimes the government steps in when you willingly try to hand over something on your own will, such as very strict rules around organ donation, I can't simply decide to give my organs to some random person for arbitrary reasons even if I really want to. But I'm not sure if data should be the same category where the government steps in and says "no you can't upload your personal data to an American website"

lmm•1d ago

Of course you don't need permission to do something with your own data. But if someone wants to process other people's data, that's absolutely a special privilege that you don't get without committing to appropriate safety protocols.

Garlef•1d ago

You don't understand how that works:

EU companies are required to act in compliance with the GDPR. This includes all sensitive data that is transfered to business partners.

They must make sure that all partners handle the (sensitive part of the) transfered data in a GDPR compliant way.

So: No law is overriden. But in order to do business with EU companies, US companies "must" offer to treat the data accordingly.

As a result, this means EU companies can not transfer sensitive data to US companies. (Since the president of the US has in principle the right to order any US company to turn over their data.)

But in practice, usually no one cares. Unless someone does and then you might be in trouble.

blitzar•1d ago

Taps the sign ... US companies operating in the EU are subject to EU laws.

Frieren•1d ago

> GDPR: “Any judgment of a court or tribunal and any decision of an administrative authority of a third country requiring a controller or processor to transfer or disclose personal data may only be recognized or enforceable if based on an international agreement…”

That is why international agreements and cooperation is so important.

Agreement with the United States on mutual legal assistance: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=legissum...

Regulatory entities are quite competent and make sure that most common situations are covered. When some new situation arises an update to the treaty will be created to solve it.

_Algernon_•1d ago

Seems like the EU should be less agreeable with these kinds of treaties going forward. Though precedent is already set by the US that international agreements don't matter so arguably the EU should just ignore this.

friendzis•1d ago

> Regulatory entities are quite competent and make sure that most common situations are covered.

There's "legitimate interest", which makes the whole GDPR null and void. Every website nowdays has the "legitimate interest" toggled on for "track user across services", "measure ad performance" and "build user profile". And it's 100% legal, even though the official reason for GDPR to exist in the first place is to make these practices illegal.

troupo•1d ago

"legitimate interest" isn't a carte blanche. Most of those "legitimate interest" claims are themselves illegal

octo888•1d ago

Legitimate interest includes

- Direct Marketing

- Preventing Fraud

- Ensuring information security

It's weasel words all the way down. Having to take into account "reasonable" expectations of data subjects etc. Allowed where the subject is "in the service of the controller"

Very broad terms open to a lot of lengthy debate

troupo•1d ago

None of these allow you to just willy-nilly send/sell info to third parties. Or use that data for anything other than stated purposes.

> Very broad terms open to a lot of lengthy debate

Because otherwise no law would eve be written, because you would have to explicitly define every single possible human activity to allow or disallow.

bryanrasmussen•1d ago

preventing fraud and info security are legitimate, direct marketing may be legitimate but probably is not.

direct marketing that I believe is legitimate - offers with rebate on heightened service level if you currently have lower service level.

direct marketing that is not legitimate, this guy has signed up for autistic service for our video service (silly example, don't know what this would be), therefore we will share his profile with various autistic service providers so they can market to him.

friendzis•1d ago

> preventing fraud

Fraud prevention is literally "collect enough cross-service info to identify a person in case we want to block them in the future". Weasel words for tracking.

> therefore we will share his profile with various autistic service providers so they can market to him.

This again falls under legitimate interest. The user, being profiled as x, may have legitimate interest in services targeting x. But we can't deliver this unless we are profiling users, so we cross-service profile users, all under the holy legitimate interest

troupo•18h ago

> Fraud prevention is literally "collect enough cross-service info to identify a person in case we want to block them in the future". Weasel words for tracking.

You're literally not allowed to store that data for years, or to sell/use that data for marketing and actual tracking purposes.

friendzis•3h ago

You would not be allowed if not for legitimate interest.

Websites A and B buy fraud prevention service FPS, website A flags user x as fraudulent, how should FPS flag user x as high risk for website B if consent from user x was required?

Legitimate interest literally allows FPS to track users, build cross-service profiles, process and store their data in case FPS needs that data sometime in the future. Under legitimate interest response to query "What's the ratio of disputed transactions for this user?" is perfectly legal trigger to put all that data to use, even though it is for all intents and purposes indistinguishable from pre-GDPR tracking.

octo888•16h ago

And how funny - I just got an email from Meta about Instagram:

"Legitimate interests is now our legal basis for using your information to improve Meta Products"

Fun read https://www.facebook.com/privacy/policy?section_id=7-WhatIsO...

But don't worry, "None of these allow you to just willy-nilly send/sell info to third parties." !

octo888•1d ago

Exactly. The ECJ flapped a bit in 2019 about this but then last year opined that the current interpretation "legitimate interest" by the Dutch DPA is too strict (on the topic of whether purely commercial interests counts)

It's a farce and just like the US constitution they'll just continuously argue about the meanings of words and erode then over time

danlitt•1d ago

"legitimate interest" is a fact about the data processing. It cannot be "toggled on". It also does not invalidate all other protections (like the prevention of data from leaving the EEA).

friendzis•1d ago

https://www.reddit.com/media?url=https%3A%2F%2Fpreview.redd....

dotandgtfo•1d ago

None of those use cases are broadly thought of as legitimate interest and explicitly require some sort of consent in Europe.

Session cookies and profiles on logged in users is where I see most companies stretching for legitimate interest. But cross service data sharing and persistent advertising cookies without consent are clearly no bueno.

friendzis•1d ago

> But cross service data sharing and persistent advertising cookies without consent are clearly no bueno.

https://www.reddit.com/media?url=https%3A%2F%2Fpreview.redd....

bryanrasmussen•1d ago

legitimate interest is, for example - have some way to identify user who is logged in. So keep email address for logged in users. Have some way to identify people who are trying to get account that have been banned, so have a table of banned users with email addresses for example.

none of these others are legitimate interest. Furthermore combining the data from legitimate interest (email address to keep track of your logged in user) with illegitimate goals such as tracking across services would be illegitimate.

pennaMan•1d ago

Basically, the GDPR doesn’t guarantee your privacy at all. Instead, it hands it over to the state through its court system.

Add to that the fact that the EU’s heavy influence on the courts is a well-documented, ongoing deal, and the GDPR comes off as a surveillance law dressed up to seem the total opposite.

Y_Y•1d ago

Quite right it doesnt absolutely protect your privacy. I'd agree that it's full of holes, but I do think it also contains effective provisions which assist with users controlling their data and data which identifies them.

Which courts are influenced by the EU? I don't think it's true of US courts, and courts in EU nations are supposed to be influenced by it, it's in the EU treaties.

albert_e•1d ago

Does this apply to OpenAI models served via Microsoft Azure.

g42gregory•1d ago

Can somebody please post a complete list of these news organizations, demanding to see all of our ChatGPT conversations?

I see one of them: The New York Times.

We need to let people know who the other ones are.

dijksterhuis•1d ago

https://originality.ai/blog/openai-chatgpt-lawsuit-list

tgv•1d ago

Why?

DaSHacka•1d ago

To know what subscriptions we need to cancel.

tgv•19h ago

Yeah, shoot the messenger, that has always worked.

g42gregory•19h ago

Usually, the messenger does not file lawsuits…

crmd•1d ago

If you use ChatGPT or similar for any non-trivial purposes, future you is saying it’s essential that the chat logs do not map back to you as a human.

bongodongobob•1d ago

Why? I honestly don't understand what's so bad about my chat logs vs google having all my emails.

baby_souffle•1d ago

> Why? I honestly don't understand what's so bad about my chat logs vs google having all my emails.

You might be a more benign user of chatGPT. Other people have turned it into a therapist and shared wildly intimate things with it. There is a whole cottage industry of journal apps that also now have "ai integration". At least some of those apps are using openAI on the back end...

jacob019•1d ago

I think the court overstepped by ordering OpenAI to save all user chats. Private conversations with AI should be protected - people have a reasonable expectation that deleted chats stay deleted, and knowing everything is preserved will chill free expression. Congress needs to write clear rules about what companies can and can't do with our data when we use AI. But honestly, I don't have much faith that Congress can get their act together to pass anything useful, even when it's obvious and most people would support it.

amanaplanacanal•1d ago

If it's possible evidence as part of a lawsuit, of course they can't delete it.

jacob019•1d ago

A targeted order is one thing, but this applies to ALL data. My data is not possible evidence as part of a lawsuit, unless you know something I don't know.

artursapek•1d ago

That’s… not how discovery works

jacob019•1d ago

The government's power to compel private companies to preserve citizens' communications needs clear limits. When the law is ambiguous about these boundaries, courts end up making policy decisions that should come from Congress. We need legislative clarity that defines exactly when and how government can access private digital communications, not case-by-case judicial expansion of government power.

artursapek•23h ago

My point is lawsuits make your data part of discovery retroactively. You aren’t being sued right now, but perhaps you will be.

lcnPylGDnU4H9OF•22h ago

Their point is that the discovery is asking for data of unrelated users. Necessarily so unless the claim is that all users who delete their chats are infringing.

jacob019•22h ago

Your point illustrates exactly why the tension between due process and privacy rights can't be fairly resolved by courts alone, since they have an inherent bias toward preserving their own discovery powers.

nradov•1d ago

How did the court overstep? Orders to preserve evidence are routine in civil cases. Customer expectations about privacy have zero legal relevance.

jacob019•1d ago

Sure, preservation orders are routine - but this would be like ordering phone companies to record ALL calls just in case some might become evidence later. There's a huge difference between preserving specific communications in a targeted case and mass surveillance of every private conversation. The government shouldn't have that kind of blanket power over private communications.

charonn0•1d ago

> but this would be like ordering phone companies to record ALL calls just in case some might become evidence later

That's not a good analogy. They're ordered to preserve records they would otherwise delete, not create records they wouldn't otherwise have.

jacob019•1d ago

They are requiring OpenAI to log API calls that would otherwise not be logged. I trust when OpenAI says they will not log or train on my sensitive business API calls. I trust them less to guard and protect logs of those API calls.

jjk166•1d ago

Change calls to text messages. The important thing is the keeping records of things unrelated to an open case which affect millions of people's privacy.

Spivak•17h ago

I mean to be fair it is related to a current open case but the order is pretty ridiculous on its surface. It's feels different when the company and the employees thereof have to retain their own comms and documents, and that company must do the same for 3rd parties who are related but not actually involved in the lawsuit is a bit of a stretch.

Why the NYT cares about a random ChatGPT user bypassing their paywall when an archive.ph link is posted on every thread is beyond me.

__turbobrew__•10h ago

> I mean to be fair

protocolture•12h ago

No its pretty good. To refine it further, its why you put a single user under scrutiny on litigation hold rather than the whole exchange server.

nradov•1d ago

No, it wouldn't be like that at all. Phone companies and telephone calls are covered under a different legal regime so your analogy is invalid.

ethagnawl•1d ago

Why is AI special in this regard? Why is my exchange with ChatGPT any more privileged than my DuckDuckGo search for _HIV test margin of error_?

jacob019•1d ago

You're right, it's not special.

This is from DuckDuckGo's privacy policy: "We don’t track you. That’s our Privacy Policy in a nutshell. We don’t save or share your search or browsing history when you search on DuckDuckGo or use our apps and extensions."

If the court compelled DuckDuckGo to log all searches, I would be equally concerned.

robocat•19h ago

DuckDuckGo uses Bing.

It would be interesting to know how much Microsoft logs or tracks.

sib•19h ago

That's a pretty significant difference, though.

OpenAI (and other services) log and preserve your interactions, in order to either improve their service or to provide features to you (e.g., your chat history, personalized answers, etc., from OpenAI). If a court says "preserve all your user interaction logs," they exist and need to be preserved.

DDG explicitly does not track you or retain any data about your usage. If a court says "preserve all your users interaction logs," there is nothing to be preserved.

It is a very different thing - and a much higher bar - for a court to say "write code to begin logging user interaction data and then preserve those logs."

ethagnawl•18h ago

I should have said "web search", as that's really what I meant -- DDG was just a convenient counterexample.

webstrand•14h ago

OpenAI also claims to delete logs after 30 days if you've deleted them. Anything that you've deleted but hasn't been processed by OpenAI yet will now be open to introspection by the court.

energy123•1d ago

People upload about 100x more information about themselves to ChatGPT than search engines.

raincole•1d ago

AI is not special and that's the exact issue. The court made a precedence here. If OpenAI can be ordered to preserve all the logs, then DuckDuckGo can face the same issue even if they don't want to do that.

BrtByte•1d ago

The preservation order feels like a blunt instrument in a situation that needs surgical precision

marcyb5st•1d ago

Would it be possible to comply with the order by anonymizing the data?

The court is after evidence that users use ChatGPT to bypass paywalls. Anonymizing the data in a way that makes it impossible to 1) pinpoint the users and 2) reconstruct the generic user conversation history would preserve privacy and allow OpenAI to comply in good faith with the order.

The fact that they are blaring sirens and hide behind the "we can't, think about users' privacy" feels akin to willingful negligence or that they know they have something to hide.

Miraltar•1d ago

Anonymizing data is really hard and I'm not sure they'd be allowed to do it. I mean they're accused of deleting evidences, why would they be allowed to alter it ?

lcnPylGDnU4H9OF•22h ago

> feels akin to willingful negligence or that they know they have something to hide

Not at all; there is a presumption of innocence. Unless a given user is plausibly believed to be violating the law, there is no reason to search their data.

pjc50•1d ago

Consider the opposite prevailing, where I can legally protect my warez site simply by saying "sorry, the conversation where I sent them a copy of a Disney movie was private".

lcnPylGDnU4H9OF•22h ago

If specific users are violating the law, then a court can and should order their data to be retained.

riskable•20h ago

The legal situation you describe is a matter of impossibility and unrelated to the OpenAI case.

In the case of a warez site they would never have logged such a "conversation" to begin with. So if the court requested that they produce all such communications the warez site would simply declare that as, "Impossibility of Performance".

In the case of OpenAI the courts are demanding that they preserve all future communications from all their end users—regardless of whether or not those end users are parties (or even relevant) to the case. The court is literally demanding that they re-engineer their product to record all communications where none existed previously.

I'm not a lawyer but that seems like it would violate FRCP 26(b)(1) which covers "proportionality". Meaning: The effort required to record the evidence is not proportional relative to the value of the information sought.

Also—generally speaking—courts recognize that a party is not required to create new documents or re-engineer systems to satisfy a discovery request. Yet that is exactly what the court has requested of OpenAI.

NewJazz•1d ago

Increasingly irrelevant startup guards their moat.

mark_l_watson•1d ago

what about when users check "private chat"? Probably need to keep that logged also?

Does this pertain to Google Gemini, Meta chat, Anthropic, etc. also?

lrvick•1d ago

There is absolutely no reason for these logs to exist.

Run LLM in an enclave that generates ephemeral encryption keys. Have users encrypt text directly to those enclave ephemeral keys, so prompts are confidential and only ever visible in an environment not capable of logging.

All plaintext data will always end up in the hands of governments if it exists, so make sure it does not exist.

jxjnskkzxxhx•1d ago

Then a court will order that you don't encrypt. And probably go after you for trying to undermine the intent of previous court order. Or what, you thought you found an obvious loophole in the entire legal system?

lrvick•1d ago

Yes. Because once you have remote attestation, anyone can host these enclaves in any country, and charge some tiny fee for their gpu time.

Decentralize hosting and encryption then centralized developers of the open source software will be literally unable to comply.

This well proven strategy would however only be possible if anything about OpenAI was actually open.

paxys•22h ago

Encryption does not negate copyright laws. The solution here is for LLM builders to pay for training data.

ronsor•17h ago

The solution here is to get rid of copyright.

mucha•8h ago

That's happening. Unmodified LLM outputs aren't copyrightable.

TechDebtDevin•18h ago

Do you have any reading on this?

moshegramovsky•1d ago

Maybe a lawyer can correct me if I'm wrong, but I don't understand why some people in the article appear to think that this is causing OpenAI to breach their privacy agreement.

The privacy agreement is a contract, not a law. A judge is well within their rights to issue such an order, and the privacy agreement doesn't matter at all if OpenAI has to do something to comply with a lawful order from a court of competent jurisdiction.

OpenAI are like the new Facebook when it comes to spin.

naikrovek•1d ago

Yep. Laws supersede contracts. Contracts can’t legally bind any entity to break the law.

Court orders are like temporary, extremely finely scoped laws, as I understand them. A court order can’t compel an entity to break the law, but it can compel an entity to behave as if the court just set a law (for the specified entity, for the specified period of time, or the end of the case, whichever is sooner).

unyttigfjelltol•1d ago

If I made a contract with OpenAI to keep information confidential, and the newspaper demanded access, via Court discovery or otherwise, then both the Court and OpenAI definitely should be attentive to my rights to intervene and protect the privacy of my confidential information.

Normally Courts are oblivious to advanced opsec, which is one fundamental reason they got breached, badly, a few years ago. I just saw a new local order today on this very topic.[1] Courts are just waking up to security concepts that have been second nature to IT professionals.

From my perspective, the magistrate judge here made two major goofs: (1) ignoring opsec as a reasonable privacy right for customers of an internet service and (2) essentially demanding that several hundred million of them intervene in her court to demand that she respect their ability to protect their privacy.

The fact that the plaintiff is the news organization half the US loves to hate does not help, IMO. Why would that half of the country trust some flimsy "order" to protect their most precious secrets from an organization that lives and breathes cloak-and-dagger leaks and political subterfuge. NYT needed to keep their litigation high and tight and instead they drove it into a ditch with the help of a rather disappointing magistrate.

[1] https://www.uscourts.gov/highly-sensitive-document-procedure...

energy123•1d ago

Local American laws supersede the contract law operating in other countries that OpenAI is doing business in?

naikrovek•1d ago

local country laws supersede contract law in that country, as far as i am aware.

us law does not supersede foreign contract law. how would that even work? why would you think that was possible?

jjk166•1d ago

A court order can be a lawful excuse for non-performance of a contract, but it's not always the case. The specifics of the contract, the court order, and the jurisdiction matter.

shortsunblack•45m ago

OpenAI is breaching relevant laws that regulate data protection. Being compelled by a foreign power, for instance, are not grounds for data processing under GDPR.

OpenAI has not started to "be incompliant" with GDPR with this order, yes. More like OpenAI was always incompliant because it does not have relevant controls installed that mitigates extraterritorial tendencies of US law.

Regardless of legality of retention this order brings, them not notifying their users (the court did not compel them to hide this from their users (no gag order is in place) about this material change of data processing, could be constituted as breach of various consumer protection laws, misrepresentation, unfair dealing, false advertising and related.

wglb•1d ago

I have a friend who is a Forensic Attorney (certified Forensic Examiner and licensed attorney). He says "You folks are putting all of this subpoenable information on Google and other sites"

husky8•1d ago

https://speaksy.chat/

Speaksy to the rescue for privacy and curiosity (easy-access jailbroken qwen3 8B in free tier, 32B in paid) May quickly hit capacity since it's all locally run for privacy.

Jordan-117•17h ago

Seems shortsighted to offer something like this with zero information about how it works. If your target market is privacy-conscious, slapping a "Privacy by Design" badge and some vague promises is probably not very convincing. (Also, the homepage claims "Your conversations remain on your device and are never sent to external servers," yet the ProductHunt page says "All requests are handled locally on my own farm and all data is burned" -- which is it?)

PoachedEggs•1d ago

I wonder how this will square with business customers in the healthcare space that OpenAI signed a BAA with.

ianks•1d ago

This ruling is unbelievably dystopian for anyone that values a right to privacy. I understand that the logs will be useful in the occasional conviction, but storing a log of people’s most personal communications is absolutely not a just trade.

To protect their users from the this massive overreach, OpenAI should defy this order and eat the fines IMO.

yard2010•1d ago

It's almost rigged. Either they are keeping the data (and ofc making money out of it) or deleting it destroying the evidence of the crimes they're committing..

imiric•1d ago

This is a moot issue. OpenAI and all AI service providers already use all user-provided data for improving their models, and it's only a matter of time until they start selling it to advertisers, if they don't already. Whether or not they actually delete chat conversations is irrelevant.

Anyone concerned about their privacy wouldn't use these services to begin with. The fact they are so popular is indicative that most people value the service over their privacy, or simply don't care.

wongarsu•23h ago

Plenty of service providers (including OpenAI) offer you the option to kindly ask them not to, and will even contractually agree not to use or sell your data if you want such an agreement.

Yes, they want to use everyone's data. But they also want everyone as a customer, and they can't have both at once. Offering people an opt-out is a popular middle-ground because the vast majority of people don't care about it, and those that do care are appeased

malwrar•22h ago

They will do it when they need the money and/or feel they have the leverage for precisely the same reason that 99% of people won’t care. It’s better to assume they’re just sitting on your data and waiting until they can get away with using it.

imiric•22h ago

That's nice. How can a user verify whether they fully comply with those contracts?

They have every incentive not to, and no oversight to hold them accountable if they don't. Do you really want to trust your data is safe based on a pinky promise from a company?

grumpyinfosec•20h ago

You sue them and win damages? Courts tend to uphold contracts at face value.

thewebguyd•20h ago

> The fact they are so popular is indicative that most people value the service over their privacy, or simply don't care.

Or, the general populace just doesn't understand the actual implications. The HN crowd can be guilty of severely overestimating the average person's tech literacy, and especially their understanding of privacy policies and ToS. Many may think they are OK with it, but I'd argue it's because they don't understand the potential real-world consequences of such privacy violations.

imiric•2h ago

> Or, the general populace just doesn't understand the actual implications.

That might've been the case in the first generations of ad-supported business models on the web. But after two decades, even non-technical users have understood the implications of "free" services.

IME talking to non-technical people about this topic, I can't remember the last time someone mentioned not being aware of the ToS and privacy policies they agree to, even if they likely hadn't read the legalese. Whereas the most common excuses I've heard are "I have nothing to hide", and "I don't use it often".

So I think you're underestimating the average person's tech literacy. I'm sure people who who still don't understand the implications exist, but they're in the minority.

romanovcode•22h ago

This has nothing to do with convictions of criminals but everything with CIA gathering profiles every single person they can.

outside1234•1d ago

Using ChatGPT to skirt paywalls? That’s the reason for this?

darkoob12•1d ago

It should be possible to inquire about a fabrication of AI models.

Let's say someone creates Russian propaganda with thesemodels or create fraudulent documents.

JimDabell•1d ago

Should a word processor keep records of all the text it ever edited in case it was used to create propaganda? What about Photoshop?

darkoob12•1d ago

But we are living in a different world. Now, these tools can create material more compelling than reality.

iammrpayments•1d ago

I see a lot of successful people on social media saying that they share their whole life to chatGPT using voice before sleeping. I wonder what they think about this.

JCharante•1d ago

Disgusting move from the NYT

mseri•1d ago

Some more details here: https://arstechnica.com/tech-policy/2025/06/openai-says-cour...

And here are the links to the court irders and responses if you are curious: https://social.wildeboer.net/@jwildeboer/114530814476876129

xivzgrev•1d ago

I use a made up email for chat gpt, fully expecting that a) openai saves all my conversations and b) it will someday get hacked

DaSHacka•1d ago

This, and I ensure to anonymize any information I feed it, just in case. (Mostly just swapping out names / locations for placeholders)

ppsreejith•1d ago

Doesn't chat gpt require a phone number?

dlivingston•21h ago

Is there a service available to request API keys for LLMs? Not directly from OpenAI or Anthropic, where your real identity / organization is tied to the key, but some third-party service that acts as a "VPN" / proxy?

karlkloss•1d ago

In theory, they could get into serious legal trouble, if a user or chatgpt writes child pornography. The posession alone is a felony in many countries, even if it is completely fictional work. As are links to CP sites.

KnuthIsGod•1d ago

Palantir will put the logs to "good" use...

At least the Chinese are open about their authoritarian system and constant snooping on users.

Time to switch to Chinese AI. It can't be any worse than using American AI.

romanovcode•22h ago

Objectively it's better - Chinese authorities cannot prosecute you and only way you will get into trouble if the Chinese are sharing the data with USA.

bravesoul2•1d ago

What about Azure? API?

Anyway the future is open models.

adwawdawd•1d ago

Always good to remember that you can't destroy data that you didn't create in the first place.

BrtByte•1d ago

Feels like the court's going with a "just in case" fishing expedition, and it's landing squarely on end users. Hope there's a better technical or legal compromise coming

theyinwhy•1d ago

If I read this correctly, I personally can be found guilty of copyright infringement because of my chat with the AI? Why am I to blame for the answers the AI provides? Can someone elaborate, what am I missing?

DaSHacka•1d ago

Easiest way for both the rights holders and AI megacorpos to both be happy is to push all the responsiblity onto the average joe who can't afford as expensive lawyers/lobbyists.

paxys•22h ago

No you aren't reading this correctly

rwyinuse•1d ago

No point using OpenAI when so many safer alternatives exist. Mistral AI, for instance, is based in Europe, in my experience their models work just as well OpenAI.

badsectoracula•1d ago

AFAIK companies can pay Mistral to install their AI on their premises too, so they can have someone to ~~blame~~ provide support too :-P

DaSHacka•1d ago

> No point using OpenAI when so many safer alternatives exist.

And ironically, this now includes the Chinese AI companies too.

Old bureaucratic fogeys will be the death of this nation.

PeterStuer•1d ago

Isn't this a normal part of lawfull intercept and data retention regulation?

Why would that not apply to LLM chat services?

mkbkn•1d ago

Does that mean that sooner or later US-based LLM companies will also be required to do so?

alpineman•1d ago

Cancelling my NYTimes subscription - they are massively overreaching by pushing for this

cedws•1d ago

Not that it makes it any better but I wouldn’t be surprised if the NSA had a beam splitter siphoning off every byte going to OpenAI already. Don’t send sensitive data.

exq•1d ago

The former head of the NSA is on the board. I'd be more surprised if the feds WEREN'T siphoning off every byte by now.

atoav•1d ago

Ever since the patriot act I operate under the assumption that if a service is located in the US it is eventually already in acrive collaboration in by the US government or in the process of being forced to do so. Therefore any such service is out of the question for work-related stuff.

junon•1d ago

Maybe this is in TFA but I can't see it, does this affect European users? Are they only logging everything in the US?

glookler•1d ago

I don't know, but the claim pertains to damages from helping users bypass paywalls. Assuming a European can pay for many US sites this isn't a situation where the location of the user relates to the basis.

junon•2h ago

US sites that service European users must adhere to GDPR themselves, or block access. Those are the rules. If OpenAI is adhering to a US court order that violated the GDPR for European users that's going to cause a huge uproar.

HPsquared•1d ago

Worldwide, I think.

singularity2001•1d ago

Doesn't the EU pose very strict privacy requirements now? I mean NSA knows more about you than your mom and the Stasi ever dreamt of but that's not part of the court order.

msgodel•1d ago

How is anyone surprised by this?

mjgant•1d ago

I wonder how this affects Microsoft's privacy policy for Azure Open AI?

https://learn.microsoft.com/en-us/legal/cognitive-services/o...

DrScientist•1d ago

It's a day to day legal reality - that normal businesses live with every day - that if you are in any sort of legal dispute, particularly over things like IP, that legal hold ( don't delete ) get's put on anything that's likely relevant.

It's entirely reasonable, and standard practice for courts to say 'while the legal proceedings is going on don't delete potential evidence relevant to the case'.

More special case whining from tech bro's - who don't seem to understand the basic concepts of fairness or justice.

baalimago•1d ago

What does "slam" even mean in this context..?

seb1204•1d ago

Click on it. To be honest I'm totally over the over hyped sensational headlines I see on most post or pages.

romanovcode•21h ago

"slam" usually means that the article was written using AI.

josh2600•1d ago

End to end encryption of these systems cannot come soon enough!

eterm•1d ago

Why would e2e help here? The other end is the one that's ordered to preserve the logs.

rkagerer•1d ago

This highlights a significance today's cloud-addicted generation seems to completely ignore: who has control of your data.

I'm not talking about contractual control (which is largely mooted as pretty much every cloud service has a ToS that's grossly skewed toward their own interests over yours, with clauses like indemnifications, blanket grants to share your data with "partners" without specifying who they are or precisely what details are conveyed, mandatory arbitration, and all kinds of other exceptions to what you'd consider respectful decency), but rather where your data lives and is processed.

If you truly want to maintain confidence it'll remain private, don't send it to the cloud in the first place.

fireflash38•1d ago

Yeah, I see a ton of people all up in arms about privacy but ignoring that OpenAI doesn't give a rats ass about others privacy (see: scraping).

Like why one good other bad?

daveoc64•1d ago

If something is able to be scraped, it isn't private.

There is no technical reason for chats people have with ChatGPT or any similar service to be available on the web to everyone, so there is no way for them to be scraped.

brookst•22h ago

It’s not zero sum. I can believe that openai does not take privacy seriously enough and also that I don’t want every chat I’ve ever had with their product to be entered into the public record.

“If one is good the other must be good” is far too simplistic thinking to apply to a situation like this.

fireflash38•21h ago

I personally just can't fathom the logic that sending something so private and critical to OpenAI is ok, but to have courts view it is not? Like if it's so private, why in hell would you give it to a company that has shown that it cares not at all about others privacy?

brookst•20h ago

Interesting. It seems obvious to me.

I’ve asked ChatGPT medical things that are private but not incriminating or anything, because I trust ChatGPT’s profit motive to just not care about my individual issues. But I would be pretty irritated if the government stepped in and mandated they make my searches public and linkable to me.

Are you perhaps taking an absolutist view where anything less than perfect attention to all privacy is the same as making all logs of everyone public?

Xelynega•17h ago

> But I would be pretty irritated if the government stepped in and mandated they make my searches public and linkable to me.

Who is calling for this? Are you perhaps taking an absolutist view where "not destroying evidence" is the same as "mandated they make my searches public and linkable to me"? That's quite ridiculous.

brookst•12h ago

Discovery routinely leaks. Handing over every chat from every user to opposing council has both human, technical, and incentive issues that make it far more likely that something I told ChatGPT with an understanding of its privacy limitations will appear in a torrent.

ahmeneeroe-v2•16h ago

This seems like an unhelpful extension of the word "privacy". Scraping is something, but it is mostly not a privacy violation.

jpadkins•23h ago

the post does not reflect the reality that it is not 'your data'*. When you use a service provider, it's their data. They may give you certain rights to influence your usage or payment of their service, but if it's not on machines you control then it's not your data.

*my legal argument is "possession is 9/10ths of the law"

grafmax•23h ago

Framing this as a moralized issue of “addiction” on the part of consumers naturalizes the structural cause of the problem. Concentrated wealth benefits from the cloud capital consumers generate. It’s this group that has most of the control over how our data is collected. These companies reduce the number and quality of choices available to us. Blaming consumers for choosing the many conveniences of cloud data when the incentive structure has been carefully tailored to funnel our data into their possession and control is truly a superficial take.

keybored•20h ago

Well put. And generalizes to most consumer-blaming.

bunderbunder•16h ago

And this kind of consumer-blaming ultimately serves the interests of the very people who benefit most from the status quo, but shifting attention away from them and toward people who are easy to pick on but ultimately have very little control over the situation. For most people, opting out of the cloud is tantamount to opting out of modern society.

I can't even get important announcements from my kids' school without signing up for yet another cloud service.

lxgr•11h ago

> Blaming consumers for choosing the many conveniences of cloud data when the incentive structure has been carefully tailored to funnel our data into their possession and control is truly a superficial take.

Couldn't have said it better.

Just consider Apple as an example: Some time ago, they used to sell the Time Capsule, a local-first NAS built for wireless Time Machine backups. Today, not only has the Time Capsule been discontinued, but it's outright impossible to make local backups of iOS devices (many people's primary computing devices!) without a computer and a USB cable.

Considering Apple's resources, it would take negligible effort to add a NAS backup feature to iOS with polished UX ("simply tap your phone on your Time Capsule 2.0 to pair it, P2P Wi-Fi faster than your old USB cable" etc.) – but they won't, because now it's all about "services": Why sell a Gigabyte once if you can lease it out and collect rent for it every month instead?

KaiserPro•23h ago

> If you truly want to maintain confidence it'll remain private, don't send it to the cloud in the first place.

I mean yes. but if you host it, then you'll be taken to court to hand that data over. Which means that you'll have less legal talent at your disposal to defend against it.

lcnPylGDnU4H9OF•23h ago

> but if you host it, then you'll be taken to court to hand that data over.

Not in this case. The Times seems to be claiming that OpenAI is infringing rather than any particular user. If one does not use OpenAI then their data is not subject to this.

throwaway290•22h ago

You don't have control over your data in the eyes of these guys... this was clear as soon as they started training their LLM on it without asking you

nashashmi•22h ago

We have shifted over to SaaS so much for convenience that we have lost sight of “our control”.

I imagine a 90s era software industry for today’s tech world: person buys a server computer, person buys an internet connection with stable ip, person buys server software boxes to host content on the internet, person buys security suite to firewall access.

Where is the problem in this model? Aging computers? Duplicating computing hardware for little use? Unsustainable? Not green/efficient?

SpaceL10n•21h ago

> who has control of your data

As frustrating as it is, the answer seems to be everyone and no one. Data in some respects is just an observation. If I walk through a park, and I see someone with red hair, I just collected some data about them. If I see them again, perhaps strike up a conversation, I learn more. In some sense, I own that data because I observed it.

On the other other hand, I think most decent people would agree that respecting each other's right to privacy is important. Should the owner of the red hair ask me to not share personal details about them, I would gladly accept, because I personally recognize them as the owner of the source data. I may possess an artifact or snapshot of that data, but it's their hair.

In a digital world where access controls exist, we have an opportunity to control the flow of our data through the public space. Unfortunately, a lot of work is still needed to make this a reality...if it's even possible. I like the Solid Project for it's attempt to rewrite the internet to put more control in the hands of the true data owners. But, I wonder if my observation metaphor is still possible even in a system like Solid.

tarr11•20h ago

How do consumers utilize expensive compute resources in this model? Eg, H100 GPUs.

sneilan1•19h ago

It's not developer's common interest to develop local services first. People build cloud services for a profit so they can be paid. However, sometimes developers need to build their portfolios (or out of pure interest) so they make software that runs locally anyway. A lot of websites can easily be ran on people's computers from a data perspective but it's a lot of work to get that data in the first place and make it useable. I don't think people are truly "cloud-addicted". I think they simply do not have other choices.

sinuhe69•1d ago

Why could a court favor the interest of the New York Times in a vague accusation versus the interest and right of hundred millions people?

Billion people use the internet daily. If any organization suspects some people use the Internet for illicit purposes eventually against their interests, would the court order the ISP to log all activities of all people? Would Google be ordered to save the search of all its customers because some might use it for bad things? And once we start, where will we stop? Crimes could happen in the past or in the future, will the court order the ISP and Google to retain the logs for 10 years, 20 years? Why not 100 years? Who should bear the cost for such outrageous demands?

The consequences of such orders are of enormous impact the puny judge can not even begin to comprehend. Privacy right is an integral part of the freedom of speech, a core human right. If you don’t have private thoughts, private information, anybody can be incriminated against them using these past information. We will cease to exist as individuals and I argue we will cease to exist as human as well.

fireflash38•1d ago

In your arguments for privacy, do you consider privacy from OpenAI?

rvnx•1d ago

Cut a joke about ethics and OpenAI

ethersteeds•1d ago

He is what now?! That is a risible claim.

nindalf•1d ago

He was being facetious.

ethersteeds•23h ago

Alas, it was too early

maest•22h ago

Original comment, lest the conversation chain does not make sense

> Sam Altman is the most ethical man I have ever seen in IT. You cannot doubt he is vouching and fighting for your privacy. Especially on YCombinator website where free speech is guaranteed.

humpty-d•23h ago

I fail to see how saving all logs advances that cause

hshdhdhj4444•23h ago

Because this is SOP in any judicial case?

Openly destroying evidence isn’t usually accepted by courts.

brookst•22h ago

Is there any evidence of import that would only be found in one single log among billions? The fact that NYT thinks that merely sampling 1% of logs would not support their case is pretty damning.

fluidcruft•21h ago

I don't know anything about this case but it has been alleged that OpenAI products can be coaxed to return verbatim chunks of NYT content.

brookst•20h ago

Sure, but if that is true, what is the evidentiary difference between preserving 10 billion conversations and preserving 100,000 and using sampling and statistics to measure harm?

fluidcruft•20h ago

The main differences seem to be that it doesn't require the precise form of the queries to be known a priori and that it interferes with the routine destruction of evidence via maliciously-compliant mealy-mouthed word games, for which the tech sector has developed a significant reputation.

Furthermore there is no conceivable harm resulting from requiring evidence to be preserved for an active trial. Find a better framing.

ToValueFunfetti•20h ago

No conceivable harm in what sense? It seems obvious that it is harmful for a user who requests and is granted privacy to then have their private messages delivered to NYT. Legally it may be on shakier ground from the individual's perspective, but OpenAI argues that the harm is to their relationship with their customers and various governments, as well as the cost of the implementation effort:

>For OpenAI, risks of breaching its own privacy agreements could not only "damage" relationships with users but could also risk putting the company in breach of contracts and global privacy regulations. Further, the order imposes "significant" burdens on OpenAI, supposedly forcing the ChatGPT maker to dedicate months of engineering hours at substantial costs to comply, OpenAI claimed. It follows then that OpenAI's potential for harm "far outweighs News Plaintiffs’ speculative need for such data," OpenAI argued.

sib•19h ago

>> It seems obvious that it is harmful for a user who requests and is granted privacy to then have their private messages delivered to NYT.

This ruling is about preservation of evidence, not (yet) about delivering that information to one of the parties.

If judges couldn't compel parties to preserve evidence in active cases, you could see pretty easily that parties would aggressively destroy evidence that might be harmful to them at trial.

There's a whole later process (and probably arguments in front of the judge) about which evidence is actually delivered, whether it goes to the NYT or just to their lawyers, how much of it is redacted or anonymized, etc.

baobun•23h ago

It's a honeypot from the beginning y'all

capnrefsmmat•23h ago

Courts have always had the power to compel parties to a current case to preserve evidence. (For example, this was an issue in the Google monopoly case, since Google employees were using chats set to erase after 24 hours.) That becomes an issue in the discovery phase, well after the defendant has an opportunity to file a motion to dismiss. So a case with no specific allegation of wrongdoing would already be dismissed.

The power does not extend to any of your hypotheticals, which are not about active cases. Courts do not accept cases on the grounds that some bad thing might happen in the future; the plaintiff must show some concrete harm has already occurred. The only thing different here is how much potential evidence OpenAI has been asked to retain.

lcnPylGDnU4H9OF•23h ago

So then the courts need to find who is setting their chats do be deleted and order them to stop. Or find specific infringing chatters and order OpenAI to preserve these specified users’ logs. OpenAI is doing the responsible thing here.

capnrefsmmat•23h ago

OpenAI is the custodian of the user data, so they are responsible. If you wanted the court (i.e., the plaintiffs) to find specific infringing chatters, first they'd have to get the data from OpenAI to find who it is -- which is exactly what they're trying to do, and why OpenAI is being told to preserve the data so they can review it.

happyopossum•22h ago

So the courts should start ordering all ISPs, browsers, and OSs to log all browsing and chat activity going forward, so they can find out which people are doing bad things on the internet.

Vilian•20h ago

Or you didn't read what was written by the other comment, or are just arguing in bad faith, what's even weierder because the guy was only explaining how the the system always worked

lovich•19h ago

If those entities were custodians in charge of the data at hand in the court case, the court would order that.

This post appears to be full of people who aren’t actually angry at the results of this case but angry at how the US legal system has been working for decades, possibly centuries since I don’t know when this precedent was first set

scarab92•19h ago

Is it not valid to be concerned about overly broad invasions of privacy regardless of how long such orders have been occurring?

Retric•18h ago

What privacy specifically? The courts have always been able to compel people to recount things they know which could include a conversation between you and your plumber if it was somehow related to a case.

The company records and uses this stuff internally, retention is about keeping information accurate and accessible.

Lawsuits allow in a limited context the sharing of non public information held by individuals/companies in the lawsuit. But once you submit something to OpenAI it’s now there information not just your information.

nickff•16h ago

I think that some of the people here dislike (or are alarmed by) the way that the court can compel parties to retain data which would otherwise have vanished into the ether.

lelanthran•4h ago

> I think that some of the people here dislike (or are alarmed by) the way that the court can compel parties to retain data which would otherwise have vanished into the ether.

Maybe so, but this has always been the case for hundreds of years.

After all, how on earth do you propose having getting fair hearing if the other party is allowed to destroy the evidence you asked for in your papers?

Because this is what would happen:

You: Your Honour, please ask the other party to turn over all their invoices for the period in question

Other Party: We will turn over only those invoices we have

*Other party goes back to the office and deletes everything.

The thing is, once a party in a suit asks for a certain piece of evidence, the other party can't turn around and say "Our policy is to delete everything, and our policy trumps the orders of this court".

lovich•18h ago

It’s not private. You handed over the data to a third party.

dragonwriter•17h ago

Its not an “invasion of privacy” for a company who already had data to be prohibited from destroying it when they are sued in a case where that data is evidence.

dogleash•17h ago

Yeah, sure. But understanding the legal system tells us the players and what systems exist that we might be mad at.

For me, one company obligated to retain business records during civil litigation against another company, reviewed within the normal discovery process is tolerable. Considering the alternative is lawlessness. I'm fine with it.

Companies that make business records out of invading privacy? They, IMO, deserve the fury of 1000 suns.

rodgerd•13h ago

If you cared about your privacy, why are you handing all this stuff to Sam Altman? Did he represent that OpenAI would be privacy-preserving? Have they taken any technical steps to avoid this scenario?

dragonwriter•17h ago

No, they should not.

However, if the ISP, for instance, is sued, then it (immediately and without a separate court order) becomes illegal for them to knowingly destroy evidence in their custody relevant to the issue for which they are being sued, and if there is a dispute about their handling of particular such evidence, a court can and will order them specifically to preserve relevant evidence as necessary. And, with or without a court order, their destruction of relevant evidence once they know of the suit can be the basis of both punitive sanctions and adverse findings in the case to which the evidence would have been relevant.

lelanthran•4h ago

> So the courts should start ordering all ISPs, browsers, and OSs to log all browsing and chat activity going forward, so they can find out which people are doing bad things on the internet.

Not "all", just the ones involved in a current suit. They already routinely do this anway (Party A is involved in a suit and is ordered to retain any and all evidence for the duration of the trial, starting from the first knowledge that Party A had of the trial).

You are mischaracterising what happens; you are presenting it as "Any court, at any time can order any party who is not involved in any suit in that sourt to forever hold user data"

That is not what is happening.

dragonwriter•19h ago

> So then the courts need to find who is setting their chats do be deleted and order them to stop.

No, actually, it doesn't. Ordering a party to stop destroying evidence relevant to a current case (which is its obligation even without a court order) irrespective of whether someone else asks it to destroy that evidence is both within the well-established power of the court, and routine.

> Or find specific infringing chatters and order OpenAI to preserve these specified users’ logs.

OpenAI is the alleged infringer in the case.

IAmBroom•16h ago

Under this theory, if a company had employees shredding incriminating documents at night, the court would have to name those employees before ordering them to stop.

That is ridiculous. The company itself receives that order, and is IMMEDIATELY legally required to comply - from the CEO to the newest-hired member of the cleaning staff.

MeIam•23h ago

Time does not need user logs to prove such a thing if it was true. Times can show that it is possible so they can show how their own users can access the text. Why would they need other user's data?

KaiserPro•23h ago

> Time does not need user logs to prove such a thing if it was true.

No it needs to show how often it happens to prove a point of how much impact its had.

MeIam•22h ago

Why would that matter, if people didn't use it as much, does it mean that it doesn't matter if there were few people?

delusional•22h ago

You have to argue damages. It actually has to have cost NYT some money, and for that you need to know some extent.

MeIam•21h ago

We don't even know if Times uses AI to get information from other sources either. They can get a hint of news and then produce their material.

cogman10•20h ago

OpenAI is also entitled to discovery. They can literally get every email and chat the times has and require from this point on they preserve such logs

delusional•19h ago

Who cares? That's not a legal argument and it doesn't mean anything to this case.

lovich•19h ago

Oh, I was unaware that Times was inventing a novel technology with novel legal questions.

It’s very impressive they managed to do such innovation in their spare time while running a newspaper and site

KaiserPro•14h ago

> We don't even know if Times uses AI to get information from other sources either

which is irrelevant at this stage. Its a legal principle that both sides can fairly discover evidence. As finding out how much openAI has infringed copyright is pretty critical to the case, they need to find out.

After all, if its only once or twice, thats a couple of dollars, if its millions of times, that hundreds of millions

dragonwriter•17h ago

> Why would that matter

Because its a copyright infringement case, so existence and the scale of the infringement is relevant to both whether there is liability and, if so, how much; the issue isn't that it is possible for infringement to occur.

dragonwriter•19h ago

> Times can show that it is possible

The allegation is not that merely that infringement is possible; the actual occurrence and scale are relevant to the case.

mandevil•18h ago

For the most part (there are a few exceptions), in the US lawsuits are not based on "possible" harm but actual observed harm. To show that, you need actual observed user behavior.

golol•21h ago

So if Amazon sues Google, claiming that it is being disadvantaged in search rankings, a court should be able to force Google to log all search activity, even when users delete it?

saddist0•21h ago

It can be just anonymised search history in this case.

mattnewton•20h ago

That sounds impossible to do well enough without being accused of tampering with evidence.

Just erasing the userid isn’t enough to actually anonymize the data, and if you scrubbed location data and entities out of the logs you might have violated the court order.

Though it might be in our best interests as a society we should probably be honest about the risks of this tradeoff; anonymization isn’t some magic wand.

Macha•17h ago

We found that one was a bad idea in the earliest days of the web when AOL thought "what could the harm be?" about turning over anonymised search queries to researchers.

dogleash•17h ago

How did you go from a court order to persevere evidence and jump to dumping that data raw into the public record?

Courts have been dealing with discovery including secrets that litigants never want to go public for longer than AOL has existed.

dragonwriter•17h ago

> It can be just anonymised search history in this case.

Depending on the exact issues in the case, a court might allow that (more likely, it would allow only turning over anonymized data in discovery, if the issues were such that that there was no clear need for more) but generally the obligation to preserve evidence does not include the right to edit evidence or replace it with reduced-information substitutes.

cogman10•20h ago

Yes. That's how the US court system works.

Google can (and would) file to keep that data private and only the relevant parts would be publicly available.

A core aspect to civil lawsuits is everyone gets to see everyone else's data. It's that way to ensure everything is on the up and up.

lxgr•12h ago

A great model – in a world without the Internet and LLMs (or honestly just full text search).

dragonwriter•19h ago

If Amazon sues Google, a legal obligation to preserve all evidence reasonably related to the subject of the suit attaches immediately when Google becomes aware of the suit, and, yes, if there is a dispute about the extent of that obligation and/or Google's actual or planned compliance with it, the court can issue an order relating to it.

monetus•14h ago

At Google's scale, what would be the hosting costs of this I wonder. Very expensive after a certain point, I would guess.

nobody9999•14h ago

>At Google's scale, what would be the hosting costs of this I wonder. Very expensive after a certain point, I would guess.

Which would be chump change[0] compared to the costs of an actual trial with multiple lawyers/law firms, expert witnesses and the infrastructure to support the legal team before, during and after trial.

[0] https://grammarist.com/idiom/chump-change/

dragonwriter•19h ago

> Courts have always had the power to compel parties to a current case to preserve evidence.

Not just that, even without a specific court order parties to existing or reasonably anticipated litigation have a legal obligation that attaches immediately to preserve evidence. Courts tend to issue orders when a party presents reason to believe another party is out of compliance with that automatic obligation, or when there is a dispute over the extent of the obligation. (In this case, both factors seem to be in play.)

btown•14h ago

Lopez v. Apple (2024) seems to be a recent and useful example of this; my lay understanding is that Apple was found to have failed in its duty to switch from auto-deletion (even if that auto-deletion was contractually promised to users) to an evidence-preservation level of retention, immediately when litigation was filed.

https://codiscovr.com/news/fumiko-lopez-et-al-v-apple-inc/

https://app.ediscoveryassistant.com/case_law/58071-lopez-v-a...

Perhaps the larger lesson here is: if you don't want your service provider to end up being required to retain your private queries, there's really no way to guarantee it, and the only real mitigation is to choose a service provider who's less likely to be sued!

(Not a lawyer, this is not legal advice.)

resource_waste•23h ago

>Privacy right is an integral part of the freedom of speech, a core human right.

Are these contradictory?

If you overhear a friend gossiping, can't you spread that gossip?

Also, where are human rights located? I'll give you a microscope.(sorry, I'm a moral anti-realist/expressivist and I can't help myself)

152132124•22h ago

I think you will have a better time arguing with a LLM

mrtksn•23h ago

>Why could a court favor the interest of the New York Times in a vague accusation versus the interest and right of hundred millions people?

Probably because they bothered to pursue such a thing and hundreds of millions people did not.

How do you conclusively know if someone's content generating machine infringe with your rights? By saving all of its input/output for investigation.

It's ridiculous, sure but is it less ridiculous than AI companies claiming that the copyrights shouldn't apply to them because it will be bad for their business?

IMHO those are just growth pain. Back in the day people used to believe that the law don't apply on them because they did it on the internet and they were mostly right because the laws were made for another age. Eventually the laws both for criminal stuff and copyright caught up. Will be the same for AI, now we are in the wild west age of AI.

TimPC•22h ago

AI companies aren't seriously arguing that copyright shouldn't apply to them because "it's bad for business". The main argument is that they qualify for fair use because their work is transformative which is one of the major criteria for fair use. Fair use is the same doctrine that allows a school to play a movie for educational purposes without acquiring a license for the public performance of that movie. The original works don't have model weights and can't answer questions or interact with a user so the output is substantially different from the input.

mrtksn•22h ago

Yeah, and the online radio providers argued that they don’t do anything shady, their service was basically just a very long antenna.

Anyway, the laws were not written with this type of processing in mind. In fact the whole idea of intellectual property breaks down now. Just like the early days of the internet.

AStonesThrow•21h ago

> allows a school to play a movie

No, it doesn’t. Play 10% of a movie for the purpose of critiquing it, perhaps.

https://fairuse.stanford.edu/overview/fair-use/four-factors/

Fair Use is not an a priori exemption or exception; Fair Use is an “affirmative defense” so once you have your day in court and the judge asks your attorney why you needed to play 10% of Priscilla, Queen of the Desert for your Gender Studies class, then you can run down those Four Factors enumerated by the Stanford article.

Particularly “amount and substantiality”.

Teachers and churches get tripped up by this all the time. But I’ve also been blessed with teachers who were very careful academically and sought to impart the same caution on all students about using copyrighted materials. It is not easy when fonts have entered the chat!

The same reason you or your professor cannot show/perform 100% of an unlicensed film under any circumstance, is the same basis that creators are telling the scrapers that they cannot consume 100% of copyrighted works on that end. And if the risks may involve reproducing 87% of the same work in their outputs, that’s beyond the standard thresholds.

c256•21h ago

> Fair use is the same doctrine that allows a school to play a movie for educational purposes without acquiring a license for the public performance of that movie. This is a pretty bad example, since fair use has been ruled to NOT allow this.

arcfour•19h ago

What Scrooge sued a school for exhibiting a film for educational purposes?!

kitified•18h ago

Whether a school was actually sued over this is not relevant to whether it is legally allowed.

mandevil•18h ago

It is a bad example, but not for that reason. Instead, it's a bad example because Federal copyright law has a specific carve out for school educational purposes:

https://www.copyright.gov/title17/92chap1.html#110 "Notwithstanding the provisions of section 106, the following are not infringements of copyright:

(1) performance or display of a work by instructors or pupils in the course of face-to-face teaching activities of a nonprofit educational institution, in a classroom or similar place devoted to instruction, unless, in the case of a motion picture or other audiovisual work, the performance, or the display of individual images, is given by means of a copy that was not lawfully made under this title, and that the person responsible for the performance knew or had reason to believe was not lawfully made;"

That is why it is not a good comparison with the broader Fair Use Four Factors test (defined in section 107: https://www.copyright.gov/title17/92chap1.html#107) because it doesn't need to even get to that analysis, it is exempted from copyright.

no_wizard•20h ago

If AI companies don’t want the court headaches they should instead preemptively negotiate with rights holders and get agreements in place for the sharing of data.

arcfour•19h ago

Feels like bad faith to say that knowing full well that

1. This would also be a massive legal headache,

2. It would become impossibly expensive

3. We obviously wouldn't have the AI we have today, which is an incredible technology (if immature) if this happened. Instead the growth of AI would have been strangled by rights holders wanting infinity money because they know once their data is in that model, they aren't getting it back, ever—it's a one-time sale.

I'm of the opinion that AI is and will continue to be a net positive for society. So I see this as essentially saying "let's go an remove this and delay the development of it by 10-20 years and ensure people can't train and run their own models feasibly for a lot longer because only big companies can afford real training datasets."

allturtles•19h ago

Why not simply make your counterargument rather than accusing GP of being in bad faith? Your argument seems to be that it's fine to break the law if the net outcome for society is positive. It's not "bad faith" to disagree with that.

arcfour•19h ago

But they didn't break the law. The NYT articles were not algorithms/AI.

It's bad faith because they are saying "well, they should have done [unreasonable thing]". I explored their version of things from my perspective (it's not possible) and from a conciliatory perspective (okay, let's say they somehow try to navigate that hurdle anyways, is society better off? Why do I think it's infeasible?)

allturtles•18h ago

If they didn't break the law, your pragmatic point about outcomes is irrelevant. Open AI is in the clear regardless of whether they are doing something great or something useless. So I don't honestly know what you're trying to say. I'm not sure why getting licenses to IP you want to use is unreasonable, it happens all the time.

Edit: Authors Guild, Inc. v. Google, Inc. is a great example of a case where a tech giant tried to legally get the rights to use a whole bunch of copyrighted content (~all books ever published), but failed. The net result was they had to completely shut off access to most of the Google Books corpus, even though it would have been (IMO) a net benefit to society if they had been able to do what they wanted.

bostik•18h ago

> Your argument seems to be that it's fine to break the law if the net outcome for society is positive.

In any other context, this would be known as "civil disobediance". It's generally considered something to applaud.

For what it's worth, I haven't made up my mind about the current state of AI. I haven't yet seen an ability for the systems to perform abstract reasoning, to _actually_ learn. (Show me an AI that has been fed with nothing but examples in languages A and B. Then demonstrate, conclusively, that it can apply the lessons it has learned in language M, which happens to be nothing like the first two.)

allturtles•18h ago

> In any other context, this would be known as "civil disobediance". It's generally considered something to applaud.

No, civil disobedience is when you break the law expecting to be punished, to force society to confront the evil of the law. The point is that you get publicly arrested, possibly get beaten, get thrown in jail. This is not at all like what Open AI is doing.

nobody9999•14h ago

>I'm of the opinion that AI is and will continue to be a net positive for society. So I see this as essentially saying "let's go an remove this and delay the development of it by 10-20 years and ensure people can't train and run their own models feasibly for a lot longer because only big companies can afford real training datasets."

Absolutely. Which, presumably, means that you're fine with the argument that your DNA (and that of each member of your family) could provide huge benefits to medicine and potentially save millions of lives.

But significant research will be required to make that happen. As such, we will be requiring (with no opt outs allowed) you and your whole family to provide blood, sperm and ova samples weekly until that research pays off. You will receive no compensation or other considerations other than the knowledge that you're moving the technology forward.

May we assume you're fine with that?

mandevil•17h ago

https://www.copyright.gov/title17/92chap1.html#110 seems to this non-lawyer to be a specific carve out allowing movies to be shown, face-to-face, in non-profit educational contexts without any sort of license. The Fair Use Four Factors test (https://www.copyright.gov/title17/92chap1.html#107) isn't even necessary in this example.

Absent a special legal carve-out, you need to get judges to do the Fair Use Four Factors test, and decide on how AI should be treated. To my very much engineer and not legal eye, AI does great on point 3, but loses on points 1, 2, and 4, so it is something that will need to be decided by the judges, how to balance those four factors defined in the law.

freejazz•14h ago

That's not entirely true. A lot of their briefing refers to how impractical and expensive it would be to license all the content they need for the models.

rodgerd•13h ago

> AI companies aren't seriously arguing that copyright shouldn't apply to them because "it's bad for business".

AI companies have, in fact, said that the law shouldn't apply to them or they won't make money. That is literally the argument Nick Clegg is using to ague that copyright protection should be removed from authors and musicians in the UK.

shkkmo•21h ago

> It's ridiculous, sure but is it less ridiculous than AI companies claiming that the copyrights shouldn't apply to them because it will be bad for their business?

Since that wasn't ever a real argument, your strawman is indeed ridiculous.

The argument is that requiring people to have a special license to process text with an algorithm is a dramatic expansion of the power of copyright law. Expansions of copyright law will inherently advantage large corporate users over individuals as we see already happening here.

New York Times thinks that they have the right to spy on the entire world to see if anyone might be trying to read articles for free.

That is the problem with copyright. That is why copyright power needs to be dramatically curtailed, not dramatically expanded.

piombisallow•23h ago

Regardless of the details of this specific case, the courts are not democratic and do not decide based on the interest of the parties or how many they are, they decide based on the law.

brookst•22h ago

This is not true even in the slightest.

The law is not a deterministic computer program. It’s a complex body of overlapping work and the courts are specifically chartered to use judgement. That’s why briefs from two parties in a dispute will often cite different laws and precedents.

For instance, Winter v. NRDC specifically says that courts must consider whether an injunction is in the public interest.

piombisallow•21h ago

"public interest" is a much more ambiguous thing than the written law

otterley•21h ago

Yes. And, that's why both sides will make their cases to the court as to whether the public interest is served by an injunction, and then the court will make a decision based on who made the best argument.

DannyBee•23h ago

Lawyer here

First - in the US, privacy is not a constitutional right. It should be, but it's not. You are protected against government searches, but that's about it. You can claim it's a core human right or whatever, but that doesn't make it true, and it's a fairly reductionist argument anyway. It has, fwiw, also historically not been seen as a core right for thousands of years. So i think it's a harder argument to make than you think despite the EU coming around on this. Again, I firmly believe it should be a core right, but asserting that it is doesn't make that true.

Second, if you want the realistic answer - this judge is probably overworked and trying to clear a bunch of simple motions off their docket. I think you probably don't realize how many motions they probably deal with on a daily basis. Imagine trying to get through 145 code reviews a day or something like that. In this case, this isn't the trial, it's discovery. Not even discovery quite yet, if i read the docket right. Preservation orders of this kind are incredibly common in discovery, and it's not exactly high stakes most of the time. Most of the discovery motions are just parties being a pain in the ass to each other deliberately. This normally isn't even a thing that is heard in front of a judge directly, the judge is usually deciding on the filed papers.

So i'm sure the judge looked at it for a few minutes, thought it made sense at the time, and approved it. I doubt they spent hours thinking hard about the consequences.

OpenAI has asked to be heard in person on the motion, i'm sure the judge will grant it, listen to what they have to say, and determine they probably fucked it up, and fix it. That is what most judges do in this situation.

pama•22h ago

Thanks. As an EU citizen am I exempt from this order? How does the judge or the NYTimes or OpenAI know that I am an EU citizen?

ElevenLathe•21h ago

The court in question has no obligations to you at all.

jjani•16h ago

OpenAI does, by virtue of doing business in the EU.

adgjlsfhk1•21h ago

you aren't and they don't.

mananaysiempre•17h ago

The current legal stance in the US seems to be that you, not being a US person, have no particular legally protected interest in privacy at all, so you have nothing to complain about here and can’t even sue. The only avenue the EU would have to change that is the diplomatic one, but the Commission does not seem to care.

HardCodedBias•21h ago

"First - in the US, privacy is not a constitutional right"

What? The supreme court disagreed with you in Griswold v. Connecticut (1965) and Roe v. Wade (1973).

While one could argue that they were vastly stretching the meaning of words in these decisions the point stands that at this time privacy is a constitutional right in the USA.

krapp•21h ago

¯\_(ツ)_/¯ The supreme court overturned Roe v. Wade in 2022 and explicitly stated in their ruling that a constitutional right to privacy does not exist.

DannyBee•20h ago

Yes. They went further and explicitly make the distinction between the kind of privacy we are talking about here ("right to shield information from disclosure"), and the kind they saw as protected in griswold, lawrence, and roe ("right to make and implement important personal decisions without governmental interference").

DannyBee•21h ago

Roe v. wade is considered explicitly overruled, as well as considered wrongly decided in the first place, as of 2022 (Dobbs).

They also explicitly stated a constitutional right to privacy does not exist, and pointed out that Casey abandoned any such reliance on this sort of claim.

Griswold also found a right to marital privacy. Not general privacy.

Griswold is also barely considered good law anymore, though i admit it has not been explicitly overruled - it is definitely on the chopping block, as more than just Thomas has said.

In any case, more importantly, none of them have found any interesting right to privacy of the kind we are talking about here, but instead more specific rights to privacy in certain contexts. Griswold found a right to marital privacy in "the penumbra of the bill of rights". Lawrence found a right to privacy in your sexual activity.

In dobbs, they explicitly further denied a right to general privacy, and argued previous decisions conflated these: " As to precedent, citing a broad array of cases, the Court found support for a constitutional “right of personal privacy.” Id., at 152. But Roe conflated the right to shield information from disclosure and the right to make and implement important personal decisions without governmental interference."

You are talking about the former, which none of these cases were about. They are all about the latter.

So this is very far afield from a general right to privacy of the kind we are talking about, and more importantly, one that would cover anything like OpenAI chats.

So basically, you have a ~200 year period where it was not considered a right, and then a 50 year period where specific forms of privacy were considered a right, and now we are just about back to the former.

The kind of privacy we are talking about here ("the right to shield information from disclosure") has always been subject to a balancing of interests made by legislatures, rather than a constitutional right upon which they may not infringe. Example abound - you actually don't have to look any further than court filings themselves, and when you are allowed to proceed anonymously or redact/file things under seal. The right to public access is considered much stronger than your right to not want the public to know embarassing or highly private things about your life. There are very few exceptions (minors, etc).

Again, i don't claim any of this is how it is should be. But it's definitely how it is.

HardCodedBias•20h ago

"Dobbs. They also explicitly stated a constitutional right to privacy does not exist"

I did not know this, thank you!

sib•19h ago

I'd like to thank you for explaining this so clearly (and for "providing receipts," as the cool kids say).

>> Again, i don't claim any of this is how it is should be. But it's definitely how it is.

Agreed.

shkkmo•19h ago

> It has, fwiw, also historically not been seen as a core right for thousands of years. So i think it's a harder argument to make than you think despite the EU coming around on this.

This doesn't seem true. I'd assume you know more about this than I do though so can you explain this in more detail? The concept of privacy is definitely more than thousands of years old. The concept of a "human right", is arguably much newer. Do you have particular evidence that a right to privacy is a harder argument to make that other human rights?

While the language differs, the right to privacy is enshrined more or less explicitly in many constitutions, including 11 USA states. It isn't just a "european" thing.

static_motion•18h ago

I understand what they mean. There's this great video [1] which explains it in better terms than I ever could. I've timestamped the link because it's quite long, but if you've got the time it's a fantastic video with a great narrative and presentation.

[1] https://youtu.be/Fzhkwyoe5vI?t=4m9s

ComposedPattern•19h ago

> It has, fwiw, also historically not been seen as a core right for thousands of years.

Nothing has been seen as a core right for thousands of years, as the concept of human rights is only a few hundred years old.

tiahura•19h ago

While the Constitution does not explicitly enumerate a "right to privacy," the Supreme Court has consistently recognized substantive privacy rights through Due Process Clause jurisprudence, establishing constitutional protection for intimate personal decisions in Griswold v. Connecticut (1965), Lawrence v. Texas (2003), and Obergefell v. Hodges (2015).

zerocrates•14h ago

Even in the "protected against government searches" sense from the 4th Amendment, that right hardly exists when dealing with data you send to a company like OpenAI thanks to the third-party doctrine.

fluidcruft•22h ago

A pretty clear distinction is that all ISPs in the world are not currently involved in a lawsuit with New York Times and are not accused of deleting evidence. What OpenAI is accused of is significantly different from merely agnostically routing packets between A and B. OpenAI is not raising astronomical funds because they operate as an ISP.

tailspin2019•21h ago

> Privacy right is an integral part of the freedom of speech

I completely agree with you, but as a ChatGPT user I have to admit my fault in this too.

I have always been annoyed by what I saw as shameless breaches of copyright of thousands of authors (and other individuals) in the training of these LLMs, and I've been wary of the data security/confidentiality of these tools from the start too - and not for no reason. Yet I find ChatGPT et al so utterly compelling and useful, that I poured my personal data[0] into these tools anyway.

I've always felt conflicted about this, but the utility just about outweighed my privacy and copyright concerns. So as angry as I am about this situation, I also have to accept some of the blame too. I knew this (or other leaks or unsanctioned use of my data) was possible down the line.

But it's a wake up call. I've done nothing with these tools which is even slightly nefarious, but I am today deleting all my historical data (not just from ChatGPT[1] but other hosted AI tools) and will completely reassess my approach of using them - likely with an acceleration of my plans to move to using local models as much as I can.

[0] I do heavily redact my data that goes into hosted LLMs, but there's still more private data in there about me than I'd like.

[1] Which I know is very much a "after the horse has bolted" situation...

CamperBob2•19h ago

Keeping in mind that the purpose of IP law is to promote human progress, it's hard to see how legacy copyright interests should win a fight with AI training and development.

100 years from now, nobody will GAF about the New York Times.

stackskipton•19h ago

IP law was to promote human progress by giving financial incentive to create this IP knowing it was protected, and you could make money off it.

CamperBob2•17h ago

We will all make a lot more money and a lot more progress by storing, organizing, presenting, and processing knowledge as effectively as possible.

Copyright is not a natural right by any measure; it's something we pulled out of our asses a couple hundred years ago in response to a need that existed at the time. To the extent copyright interferes with progress, as it appears to have sworn to do, it has to go.

Sorry. Don't shoot the messenger.

diputsmonro•13h ago

Why would you expect NYT or any other news organization to report accurate data to feed into your AI models if they can't make any money off of it?

It's not just about profits, it's about paying reporters to do honest work and not cut corners in their reporting and data collection.

If you think the data is valuable, then you should be prepared to pay the people who collect it, same as you pay for the service that collates it (ChatGPT)

CamperBob2•11h ago

I wish I knew what the eventual business model will look like, but I don't. A potential guess might be to consider what MSNBC was, or was supposed to be -- a joint venture between Microsoft and NBC network news, where the idea was to take advantage of the emerging WWW to get a head start on everyone else. The pie-in-the-sky synergies that were promised never materialized, so the outcome just amounted to a new name for an old-media player. As it turned out, the business of gathering and delivering news and editorial content didn't change much at all. It just migrated from paper and screens to, well, screens.

Now, as you point out, companies like OpenAI have a problem, and so do the rest of us. Fair compensation for journalists and editors requires attribution before anything else can even be negotiated, and AI literally transforms its input into something that is usually (but obviously not always) untraceable. For the big AI players, the solution to that problem might involve starting or acquiring news and content networks of their own. Synergies that Microsoft and NBC were hoping might materialize could actually be feasible now.

So to answer your question, maybe ChatGPT will end up paying journalists directly.

Again, I don't know how plausible that kind of scenario might turn out to be. But I am absolutely certain that countries that allow their legacy rightsholders to impede progress in AI are going to be outcompeted by those with less to lose.

tailspin2019•7h ago

Copyright is the thing that allows software companies to sell their products and make money. It’s not just about “knowledge”.

I sometimes wonder if people commenting on this topic on HN really understand how fundamental copyright as a concept is to the entire tech industry. And indeed even to capitalism itself.

stale2002•14h ago

But the main point is the human progress here. If there is an obvious case where it seriously gets in the way of human progress, then thats a problem and I hope we can correct it through any means necessary.

cactusplant7374•20h ago

> Why could a court favor the interest of the New York Times in a vague accusation versus the interest and right of hundred millions people?

It simply didn't. ChatGPT hasn't deleted any user data.

> "OpenAI did not 'destroy' any data, and certainly did not delete any data in response to litigation events," OpenAI argued. "The Order appears to have incorrectly assumed the contrary."

It's a bit of a stretch to think a big tech company like ChatGPT is deleting users' data.

blackqueeriroh•5h ago

This is incorrect. As someone who has had the opportunity to work in several highly=regulated industries, companies do not want to hold onto extra data about you that they don’t have to unless their business is selling that data.

OpenAI already has a business, and not one they want to violate by having a massive amount of customer data stolen if they get hacked.

cactusplant7374•2h ago

The article and OpenAI themselves contradict you. Do you work at OpenAI?

huijzer•20h ago

> Why could a court favor the interest of the New York Times in a vague accusation versus the interest and right of hundred millions people?

Well maybe some people in power have pressured the court into this decision? The New York Times surely has some power as well via their channels

dogman144•20h ago

You raise good points but the target of your support feels misplaced. Want private ai? You must self-host and inspect if it’s phoning home. No way around it in my view.

Otherwise, you are picking your data privacy champions as the exact same companies, people and investors that sold us social media, and did something quite untoward with the data they got. Fool me twice, fool me three times… where is the line?

In other words - OAI has to save logs now? Candidly they probably were already, or it’s foolish not to assume that.

jrm4•20h ago

Love the spirit of what you say and I practice it myself, literally.

But also, no - Just self-host or it's all your fault is never ever a sufficient answer to the problem.

It's exactly the same as when Exxon says "what are you doing to lower your own carbon footprint?" It's shifting the burden unfairly; companies like OpenAI put themselves out there and thus must ALWAYS be held to task.

naming_the_user•20h ago

Anything else is literally impossible, though.

If you send your neighbour nudes then they have your nudes. You can put in as many contracts as you want, maybe they never digitised it but their friend is over for a drink and walks out of the door with the shoebox of film. Do not pass GO, do not collect.

Conceivably we can try to control things like e.g. is your cellphone microphone on at all times, but once someone else, particularly an arbitrary entity (e.g. not a trusted family member or something) has the data, it is silly to treat it as anything other than gone.

dogman144•19h ago

I actually agree with your disagreement, and my answer is more scoped to a technical audience that has the know how base to deal with it.

I wish it was different and I agree that there’s a massive accountability hole with… who could it be?

Pragmatically it is what it is, self host and hope for bigger picture change.

lovich•19h ago

Then your problem is with the US legal system, not this individual ruling.

You lose your rights to privacy in your papers without a warrant once you hand data off to a third party. Nothing in this ruling is new.

dragonwriter•19h ago

> Why could a court favor the interest of the New York Times in a vague accusation versus the interest and right of hundred millions people?

Because the law favors preservation of evidence for an active case above most other interests. It's not a matter of arbitrary preference by the particular court.

trod1234•19h ago

It doesn't, it favors longstanding caselaw and laws already on the books.

There is a longstanding precedent with regards to business document retention, and chat logs have been part of that for years if not decades. The article tries to make this sound like this is something new, but if you look at the e-retention guidelines in various cases over the years this is all pretty standard.

For a business to continue operating, they must preserve business documents and related ESI upon an appropriate legal hold to avoid spoliation. They likely weren't doing this claiming the data was deleted, which is why the judge ruled in favor against OAI.

This isn't uncommon knowledge either, its required. E-discovery and Information Governance are things any business must meet in this area; and those documents are subject to discovery in certain cases, where OAI likely thought they could avoid it maliciously.

The matter here is OAI and its influence rabble are churning this trying to do a runaround on longstanding requirements that any IT professional in the US would have reiterated from their legal department/Information Governance policies.

There's nothing to see here, there's no real story. They were supposed to be doing this and didn't, were caught, and the order just forces them to do what any other business is required to do.

I remember an executive years ago (decades really), asking about document retention, ESI, and e-discovery and how they could do something (which runs along similar lines to what OAI tried as a runaround). I remember the lawyer at the time saying, "You've gotta do this or when it goes to court you will have an indefensible position as a result of spoliation...".

You are mistaken, and appear to be trying to frame this improperly towards a point of no accountability.

I suggest you review the longstanding e-discovery retention requirements that courts require of businesses to operate.

This is not new material, nor any different from what's been required for a long time now. All your hyperbole about privacy is without real basis, they are a company; they must comply with law, and it certainly is not outrageous to hold people who break the law to account, and this can only occur when regulatory requirements are actually fulfilled.

There is no argument here.

References: Federal Rules of Civil Procedure (FRCP) 1, 4, 16, 26, 34, 37

There are many law firms who have written extensively on this and related subjects. I encourage you to look at those too.

(IANAL) Disclosure: Don't take this as legal advice. I've had the opportunity to work with quite a few competent ones, but I don't interpret the law; only they can. If you need someone to provide legal advice seek out competent qualified counsel.

rolandog•19h ago

> Why could a court favor the interest of the New York Times in a vague accusation versus the interest and right of hundred millions people?

Can't you use the same arguments against, say, Copyright holders? Billionaires? Corporations doing the Texas two-step bankruptcy legal maneuver to prevent liability from allegedly poisoning humanity?

I sure hope so.

Edit: ... (up to a point)

deadbabe•18h ago

OpenAI is a business selling a product, it’s not a decentralized network of computers contributing spare processing power to run massive LLMs. Therefore, you can easily point a finger at them and tell them to stop some activity for which they are the sole gatekeeper.

oersted•17h ago

I completely agree with you. But perhaps we should be more worried that OpenAI or Google can retain all this data and do pretty much what they want with it in the first place, without a judge getting into the picture.

wat10000•11h ago

ChatGPT isn’t like an ISP here. They are being credibly accused of basing their entire business on illegal activity. It’s more like if The Pirate Bay was being sued. The alleged infringement is all they do, and requiring them to preserve records of their users is pretty reasonable.

kragen•1d ago

Copyright in its current form is incompatible with private communication of any kind through computers, because computers by their nature make copies of the communication, so it makes any private communication through a computer into a potential crime, depending on its content. The logic of copyright enforcement, therefore, demands access to all such communications in order to investigate their legality, much like the Stasi.

Inevitably such a far-reaching state power will be abused for prurient purposes, for the sexual titillation of the investigators, and to suppress political dissent.

6stringmerc•23h ago

This is a ludicrous assertion and factually inaccurate beyond all practical intelligence.

A computer in service of an individual absolutely follows copyright because the creator is in control of the distribution and direction of the content.

Besides, copyright is a civil statute, not criminal. Everything about this comment is the most obtuse form of FUD possible. I’m pro copyright reform, but this is “Uncle off his meds ranting on Facebook” unhinged and shouldn’t be given credence whatsoever.

kragen•23h ago

None of that is correct. Some of it is not even wrong, demonstrating an unbelievably profound ignorance of its topic. Furthermore, it is gratuitously insulting.

pjc50•23h ago

> Besides, copyright is a civil statute, not criminal

Nope. https://www.justia.com/intellectual-property/copyright/crimi...

malwrar•22h ago

> A computer in service of an individual absolutely follows copyright because the creator is in control of the distribution and direction of the content.

I don’t understand what means. A computer in service of an individual turns copyright law into mattress tag removal law—practically unenforceable.

FrustratedMonky•23h ago

I thought it was a given that they were saving all logs, to in turn use for training.

strogonoff•22h ago

Anyone who seriously cared about their privacy would not be using any of the commercial LLM offerings. This is the greatest honeypot for profile building the ad tech ever had.

mritchie712•22h ago

you're worried about ad targeting?

bgwalter•22h ago

Ad targeting, user profiling etc. Post Snowden we can reasonably assume that the NSA will get an interface.

strogonoff•21h ago

Goes hand in hand with everything else, like price discrimination, including by insurance companies, which in all likelihood are not required to disclose that the reason your health insurance premiums are up is because you asked ChatGPT how to quit smoking.

bgwalter•22h ago

I've criticized the NYT and paywalls many times myself, but first of all OpenAI itself has all your data and we know how "delete" functions work in other places.

The Twitter users quoted by Ars Technica, who cite "boomer copyright concerns" are pretty short sighted. The NYT and other mainstream sources, with all their flaws, provide the historical record that pundits can use to discuss issues.

Glenn Greenwald can only point out inconsistencies of the NYT because the NYT exists. It is often the starting point for discussions.

Some YouTube channels like the Grayzone and Breaking Points send reporters directly to press conferences etc. But that is still not the norm and important information should not be stored in a disorganized YouTube soup.

So papers like the NYT need to survive for democracy to function.

mathgradthrow•21h ago

can we ban slam as a headline word?

Tteriffic•21h ago

Times is taking a risk. The costs of all this will fall on them, if they don’t get the judgement they sought at the end of the day. Plus OpenAI controls those costs and could drive them up. Plus any future litigation by OpenAI users suffering damages due to this could arguably be brought against Time years forward. It’s an odd strategy on their part for evidence that could have just been adduced by a statistician (maybe).

segmondy•21h ago

... and this is why I run local models. my data, my data.

HardCodedBias•21h ago

We have really lost the thread WRT our legal system when district court judges have such wide ranging power. I understand that everything can be appealed, but these judges can cause considerable harm.

Ona T. Wang (she/her) ( https://www.linkedin.com/in/ona-t-wang-she-her-a1548b3/ ) would have a difficult time getting a job at OpenAI but she is given the full force of US law to direct the company in almost anyway she sees fit.

The wording is quite explcit, and forceful:

Accordingly, OpenAI is NOW DIRECTED to preserve and segregate all output log data that would otherwise be deleted on a going forward basis until further order of the Court (in essence, the output log data that OpenAI has been destroying), whether such data might be deleted at a user’s request or because of “numerous privacy laws and regulations” that might require OpenAI to do so.

Again, I have no idea how to fix this, but it seems broken.

nateglims•11h ago

That seems a little apocalyptic. She's are a magistrate judge and is making sure evidence isn't destroyed. After the conclusion of the case they will be able to delete the data. The legal system handles this stuff by allowing you to bring in experts which seems a lot more reasonable than every judge being an expert in technical details AND the law.

b0a04gl•20h ago

llm infra isn’t even built for that kind of retention. none of it’s optimised for long-tail access, audit-safe replay, or scoped encryption. feels like the legal system’s modelling chat like it’s email. it’s not. it’s stateful compute with memory glued to a token stream.

robomartin•20h ago

From the article:

"In the filing, OpenAI alleged that the court rushed the order based only on a hunch raised by The New York Times and other news plaintiffs. And now, without "any just cause," OpenAI argued, the order "continues to prevent OpenAI from respecting its users’ privacy decisions." That risk extended to users of ChatGPT Free, Plus, and Pro, as well as users of OpenAI’s application programming interface (API), OpenAI said."

This is the consequence and continuation of the dystopian reality we have been living for many years. One where a random person, influencer, media outlet, politician attacks someone, a company or an entity to cause harm and even total destruction (losing your job, company boycott, massive loss of income, reputation destruction, etc.). This morning, on CNBC, Palantir's CEO discussed yet another false accusation made against the company by --surprise-- the NY Times, characterizing it as garbage. Prior to that was the entirety of the media jumping on Elon Musk accusing him of being a Nazi for a gesture used by dozens and dozens of politicians and presenters, most recently Corey Booker.

Lies and manipulation. I do think that people are waking up to this and massively rejecting professional mass manipulators. We now need to take the next step and have them suffer real legal consequences for constant lies and, for Picard's sake, also address any influence they might have over the courts.

ryeguy_24•19h ago

Would Microsoft have to comply with this also? Most enterprise users are acquiring LLM services through Microsoft's instance of the models in Azure? (i.e. data is not going to Open AI but enterprise gets to use Open AI models)

anbende•18h ago

My (not a lawyer) understanding is "no", because Microsoft is not administering the model (making available the chatbot and history logging), not retaining chats (typically, unless you configure it specifically to do this), and any logs or history are only retained on the customer's servers or tenant.

Accessing information on a customer's server or tenant (I have been assured) would require a court order for the customer directly.

But... as an 365 E5 user with an Azure account using the 4o through Foundry... I am much more nervous than I ever have been.

1vuio0pswjnm7•19h ago

Perhaps OpenAI should not be collecting sensitive data in the first place. Who knows what they are using it for.

Whereas parties to litigation that receive sensitive data are subject to limits on how it can be used.

dsign•18h ago

This is a hard blow for OpenAI; I can see my employer scrambling to terminate their contract with them because of this. It could be a boom to Mistral.AI though.

thanatropism•18h ago

Ctrl-F doesn't seem to show anyone remembering the Ballad of Scott Alexander.

There's no reasonable narrative in which OpenAI are not villains, but NYT is notoriously one to shoot a man in Reno just to see him die.

heisenbit•17h ago

Considering the use to research medical information or to self sooth psychological conditions through chats and their association with a real person it can get interesting as these domains have fairly stringent need to know rules attached with criminal liabilities.

qintl55•16h ago

Classic example of "courts/lawmakers do NOT understand tech". I don't know why this is still as true as it was 10-20 years ago. You get such weird rulings that are maybe well-intentioned but are so off base on actual impact.

next_xibalba•14h ago

[flagged]

tomhow•8h ago

You can't comment like this on Hacker News. Please read the guidelines and make an effort to observe them, particularly these ones:

Be kind. Don't be snarky. Converse curiously; don't cross-examine. Edit out swipes.

Comments should get more thoughtful and substantive, not less, as a topic gets more divisive.

When disagreeing, please reply to the argument instead of calling names. "That is idiotic; 1 + 1 is 2, not 3" can be shortened to "1 + 1 is 2, not 3."

Please don't fulminate. Please don't sneer, including at the rest of the community.

Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith.

Eschew flamebait. Avoid generic tangents. Omit internet tropes.

Please don't use Hacker News for political or ideological battle. It tramples curiosity.

https://news.ycombinator.com/newsguidelines.html

rimeice•14h ago

I’m curious what you’re asking ChatGPT to get verbatim NYT articles in the response that you then want to delete? If ChatGPT has being doing this, the default is to keep every chat, so I doubt NYT lawyers need the small portion of deleted info to find evidence if it exists.

Ofcourse if OpenAI was scanning your chat history for verbatim NYT text and editing and deleting that would be another thing, but that itself would also get noticed.

hendersoon•10h ago

This is an insane ruling. Asking them to save logs for specific users with is of course perfectly reasonable with a court order. Asking them to save all logs is absurd.

neuroelectron•8h ago

Everyone knows OpenAI is guilty here but they need to prove it. That's what the logs are for. The huge online astroturfing on this subject clearly illustrates how well OpenAI has automated this part of their defense (astroturfing, public outcry).

However, I find it unlikely the OpenAI hasn't already built filters to prevent their output from appearing to be regurgitated NYTimes content.

EasyMark•7h ago

Well then they should fight it all the way to SCOTUS, they have the money. It will pay dividend in the long run as it will scare off fewer people who want to use, but don't want to be surveilled for every little thing

shortsunblack•51m ago

The order mandates retention of all user data, even of non-Americans. This is a massive extraterritorial overreach and a highlight of how US law has zero regard to data protection as a fundamental human right. As if there were not enough concerns about US cloud providers already.

Swift and Cute 2D Game Framework: Setting Up a Project with CMake

Fuzzer Blind Spots (Meet Jepsen)

Czech Republic: Petition for open source in public administration

Self-hosting your own media considered harmful according to YouTube

The X.Org Server just got forked (announcing XLibre)

Jepsen: TigerBeetle 0.16.11

Freight rail fueled a new luxury overnight train startup

The impossible predicament of the death newts

Tokasaurus: An LLM inference engine for high-throughput workloads

Test Postgres in Python Like SQLite

X changes its terms to bar training of AI models using its content

Show HN: Claude Composer

How we’re responding to The NYT’s data demands in order to protect user privacy

Show HN: Air Lab – A portable and open air quality measuring device

What a developer needs to know about SCIM

APL Interpreter – An implementation of APL, written in Haskell (2024)

Aether: A CMS That Gets Out of Your Way

Seven Days at the Bin Store

Defending adverbs exuberantly if conditionally

Show HN: Ask-human-mcp – zero-config human-in-loop hatch to stop hallucinations

Infomaniak comes out in support of controversial Swiss encryption law

AMD Radeon 8050S "Strix Halo" Linux Graphics Performance Review

Open Source Distilling

SkyRoof: New Ham Satellite Tracking and SDR Receiver Software

I made a search engine worse than Elasticsearch (2024)

Digital Minister wants open standards and open source as guiding principle

The Universal Tech Tree

Converge (YC S23) Well-capitalized New York startup seeks product developers

Show HN: Lambduck, a Functional Programming Brainfuck

Autonomous drone defeats human champions in racing first

Swift and Cute 2D Game Framework: Setting Up a Project with CMake

Fuzzer Blind Spots (Meet Jepsen)

Czech Republic: Petition for open source in public administration

Self-hosting your own media considered harmful according to YouTube

The X.Org Server just got forked (announcing XLibre)

Jepsen: TigerBeetle 0.16.11

Freight rail fueled a new luxury overnight train startup

The impossible predicament of the death newts

Tokasaurus: An LLM inference engine for high-throughput workloads

Test Postgres in Python Like SQLite

X changes its terms to bar training of AI models using its content

Show HN: Claude Composer

How we’re responding to The NYT’s data demands in order to protect user privacy

Show HN: Air Lab – A portable and open air quality measuring device

What a developer needs to know about SCIM

APL Interpreter – An implementation of APL, written in Haskell (2024)

Aether: A CMS That Gets Out of Your Way

Seven Days at the Bin Store

Defending adverbs exuberantly if conditionally

Show HN: Ask-human-mcp – zero-config human-in-loop hatch to stop hallucinations

Infomaniak comes out in support of controversial Swiss encryption law

AMD Radeon 8050S "Strix Halo" Linux Graphics Performance Review

Open Source Distilling

SkyRoof: New Ham Satellite Tracking and SDR Receiver Software

I made a search engine worse than Elasticsearch (2024)

Digital Minister wants open standards and open source as guiding principle

The Universal Tech Tree

Converge (YC S23) Well-capitalized New York startup seeks product developers

Show HN: Lambduck, a Functional Programming Brainfuck

Autonomous drone defeats human champions in racing first

OpenAI slams court order to save all ChatGPT logs, including deleted chats

Comments