That’ll be the day. But even if it does happen, major AI players have the resources to move to a more ‘flexible’ country, if there isn’t a loophole that involves them closing their eyes really really tight while outsourced webscrapers collect totally legit and not illegally obtained data
And the fact that the litigation was over copyright is an insignificant detail. It could have been anything. Literally anything, like a murder investigation, for example. It only helps OpenAI here, because it's easy to say "nobody cares about copyright", while "nobody cares about murder" sounds less defensible.
Anyway, the issue here is not copyright, nor "AI", it's the venerated legal system, which very much by design allows a single judge to decide, on a whim, that a company with millions of users must start collecting user data, while users very much don't want that, and the company claims it doesn't want that either (mostly because it knows how much users don't want that; otherwise it'd be happy to). Everything else is just accidental detail; it really has nothing to do with either copyright or "AI".
>> Wang apparently thinks the NY Times' boomer copyright concerns trump the privacy of EVERY @OpenAI USER—insane!!! -- someone on twitter
> Apparently not having your shit stolen is a boomer idea now.
It's surprising to me, because you'd think a site like Ars would attract a generally more knowledgeable audience, but reading through their comment section feels like looking at Twitter or YouTube comments -- various incendiary and unsubstantial hot takes.
On the other end: while copyright has been perverted over the centuries, the goal is still overall to protect small inventors. They have no leverage otherwise and this gives them some ability to fight if they aren't properly compensated. I definitely do not want it abolished outright. Just reviewed and reworked for modern times.
Google and others fought it pretty hard
It's a publicity stunt, ordered by executives. If you think OpenAI is doing this out of principle, you're nuts.
Disclaimer: I'm not Chinese. But I recognise crass hypocrisy when I see it.
Bodily slavery or mental slavery ... take your pick.
Whenever I "delete" a social media account, or "Trash" anything on a cloud storage provider, I repeat the mantra, "revoking access for myself!" which may be sung to the tune of "Evergreen".
In the second case, you can choose to trust or distrust the cloud storage provider. Trust being backed by contractual obligations and the right to sue if those obligations are not met. Of course, most EULAs for consumer products are toothless in this respect. On the other hand, that doesn't prevent companies from offering contracts which have some teeth (which they may do for business clients).
It’s not that simple: this command already exists, it’s called `shred`, and as the manual[1] notes:
The shred command relies on a crucial assumption: that the file system and hardware overwrite data in place. Although this is common and is the traditional way to do things, many modern file system designs do not satisfy this assumption.
[1] https://www.gnu.org/software/coreutils/manual/html_node/shre...
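As a rough illustration of that assumption, here is a sketch only (the function name and pass count are mine, not coreutils'): on copy-on-write or log-structured filesystems, and behind SSD wear-leveling, these writes may land in fresh blocks, leaving the old data recoverable.

```python
# Sketch of the in-place overwrite that `shred` assumes will reach the original
# blocks. On CoW/log-structured filesystems (btrfs, ZFS, etc.) or behind SSD
# wear-leveling, the writes may go to new blocks instead of the old ones.
import os

def overwrite_in_place(path: str, passes: int = 3) -> None:
    size = os.path.getsize(path)
    with open(path, "r+b") as f:          # open without truncating
        for _ in range(passes):
            f.seek(0)
            f.write(os.urandom(size))     # overwrite with random bytes
            f.flush()
            os.fsync(f.fileno())          # push each pass out to the device
    os.remove(path)
```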
Deletion of data is achieved by permanently discarding the encryption key which is stored and managed elsewhere where secure erasure can be guaranteed.
If implemented honestly, this procedure WORKS and cloud storage is secure. Yes the emphasis is on the "implemented honestly" part but do not generalize cloud storage as inherently insecure.
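A minimal sketch of that scheme (often called crypto-shredding), assuming a plain dict stands in for the separately managed key store where secure erasure is feasible; names are illustrative:

```python
# Crypto-shredding sketch: each object is encrypted under its own key, and the
# key lives in a store that supports guaranteed erasure (here, just a dict).
# Deleting the key renders the ciphertext on disk computationally unreadable.
from cryptography.fernet import Fernet

key_store: dict[str, bytes] = {}  # stand-in for an HSM / KMS

def store_encrypted(obj_id: str, plaintext: bytes) -> bytes:
    key = Fernet.generate_key()
    key_store[obj_id] = key
    return Fernet(key).encrypt(plaintext)  # ciphertext can live on any disk

def crypto_erase(obj_id: str) -> None:
    del key_store[obj_id]                  # "deletion" = destroying the key
```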
> This chat won't appear in history, use or update ChatGPT's memory, or be used to train our models. For safety purposes, we may keep a copy of this chat for up to 30 days.
The "may keep a copy" is doing a lot of work in that sentence.
The government also says thank you for your ChatGPT logs.
This makes no sense to me. Shouldn't we address the damage before it's done vs handwringing after the fact?
I didn't think any of the ongoing "fair use" lawsuits had reached a conclusion on that.
The purpose of training in many of the AI Labs being sued mostly matches the conditions that Ross Intelligence was found to have violated, and the question of copying is almost guaranteed if they trained on it.
[1] Thomson Reuters Enterprise Centre GmbH et al v. ROSS Intelligence Inc. https://www.ded.uscourts.gov/sites/ded/files/opinions/20-613...
We'll see if the courts deem it legal but it's, without a doubt, unethical.
Like we could be upset when that credit checking company dumped all those social security numbers on the net and had to pay the first 200k claimants a grand total of $21 for their trouble?
By that point it’s far too late.
Does not give them permission. What if LEO asks for the data? Should they hand it over just because they have it? Remember, this happens all the time with metadata from other companies (phone carriers for example). Having the data means it's possible to use it for other purposes as opposed to not possible. There is always pressure to do so both from within and outside a company.
Not unless LEO sues OpenAI while it's preserving data from the first discovery; otherwise they cannot be compelled to give up data. Nor are they allowed to violate their ToS and use the data outside of retention, despite the FUD you want to spread about it.
> Having the data means it's possible
No, it doesn't. That's not how any of this works.
In short: OpenAI's business practices caused this. They wouldn't have been sued if they had been using legally obtained data. They might still not have an order like this if they were more open about their training, like the Allen Institute.
The question is whether the AI itself is aware of what the source is. It certainly knows the source.
Feel like they would still be great for a lot of applications like "Search my local hard drive for the file that matches this description"
When I followed up with how to save chat information for future use in the LLM's context window, I was given a rather lengthy process involving setting up an SQL database, writing some Python to create a "pre-prompt injection wrapper"....
That's cool and all, but wishing there was something a little more "out of the box" that did this sort of thing for the "rest of us". GPT did mention Tome, LM Studio, a few others....
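For what it's worth, the core of that "pre-prompt injection wrapper" idea is small; here's a sketch (the table and function names are made up, not taken from any of those tools):

```python
# Sketch of persistent chat memory: store turns in SQLite and prepend the most
# recent ones to each new prompt before it is sent to the model.
import sqlite3

db = sqlite3.connect("chat_memory.db")
db.execute("CREATE TABLE IF NOT EXISTS turns (role TEXT, content TEXT)")

def remember(role: str, content: str) -> None:
    db.execute("INSERT INTO turns VALUES (?, ?)", (role, content))
    db.commit()

def build_prompt(user_message: str, window: int = 20) -> str:
    rows = db.execute(
        "SELECT role, content FROM turns ORDER BY rowid DESC LIMIT ?",
        (window,),
    ).fetchall()
    history = "\n".join(f"{role}: {content}" for role, content in reversed(rows))
    return f"{history}\nuser: {user_message}"
```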
PLEASE NOTE: We do not engage in "profiling" or otherwise engage in automated processing of Personal Data to make decisions that could have legal or similarly significant impact on you or others.
Hopefully they don't though.
To your observation, it's certainly relevant to the situation at hand but has little to do with your original supposition. A US court can order any company that operates in the US to do anything within the bounds of US law, in the same way that an EU court can do the converse. Such an order might well make it impossible to legally do business in one or the other jurisdiction.
If OpenAI Ireland is a subsidiary it will be interesting to see to what extent the court order applies (or doesn't apply) to it. I wonder if it actually operates servers locally or if it's just a frontend that sends all your queries over to a US based backend.
People elsewhere in this comment section observed that the GDPR has a blanket carve out for things that are legally required. Seeing as compliance with a court order is legally required there is likely no issue regardless.
> What happens when you delete a chat?
> The chat is immediately removed from your chat history view.
> It is scheduled for permanent deletion from OpenAI's systems within 30 days, unless:
> It has already been de-identified and disassociated from your account, or
> OpenAI must retain it for security or legal obligations.
That final clause now voids the entire section. All chats are preserved for "legal obligations".
I regret all the personal conversations I've had with AI now. It's very enticing when you need some help / validation on something challenging, but everyone who warned how much of a privacy risk that is has been proven right.
That's why you read the whole thing? It's not exactly a long read. Do you expect them to update their docs every time they get a subpoena request?
Many things in life are "misleading" when your context window is less than 32 words[1], or when you can't be bothered to read that far.
[1] number of words required to get you to "unless", which should hopefully tip you off that not everything gets deleted.
It's like saying "we will delete your chats, unless the sun rises tomorrow". At that point, just say that the chats aren't deleted.
(The snark from your replies seems unnecessary as well.)
That's one giant cop-out.
All you had to do was delete the user_id column and you can keep the chat indefinitely.
> That risk extended to users of ChatGPT Free, Plus, and Pro, as well as users of OpenAI’s application programming interface (API), OpenAI said.
This seems very bad for their business.
Really, it’s funny watching from the outside and waiting for English to finally stop holding it in and get itself some sort of spelling reform to meaningfully move in a phonetic direction. My amateur impression, though, is that mandatory secondary education has made “correct” spelling such a strong social marker that everybody (not just English-speaking countries) is essentially stuck with whatever they have at the moment. In which case, my condolences to English speakers, your history really did work out in an unfortunate way.
That said, phonetic spelling reform would of course privilege the phonemes as spoken by whoever happens to be most powerful or prestigious at the time (after all, the only way it could possibly stick is if it's pushed by the sufficiently powerful), and would itself fall out of date eventually anyway.
Isn't the "a" in "have" elided along with the "h?"
Shouldn't've Should not have
What am I missing?
Also I have always liked this humorous plan for spelling reform: https://guidetogrammar.org/grammar/twain.htm
English's popularity was solely and exclusively driven by its use as a lingua franca. As times change, so too will the language we speak.
There are certainly languages with even more spoken complexity - e.g. 4+ consonant clusters like "vzdr" typical of Slavic - but even so spoken English is not that easy to learn to understand, and very hard to learn to speak without a noticeable accent.
So, it's something like:
For example, in Year 1 that useless letter "c" would be dropped to be [replased](replaced) either by "k" or "s", and likewise "x" would no longer be part of the alphabet.
It becomes quite useful in the later sentences as more and more reformations are applied.

A phonetic respelling would destroy the language, because there are too many dialects without matching pronunciations. Though it would render historical texts illegible, a phonemic approach would work: https://en.wiktionary.org/wiki/Appendix:English_pronunciatio... But that would still mean most speakers have 2-3 ways of spelling various vowels. There are some further problems with a phonemic approach: https://alexalejandre.com/notes/phonetic-vs-phonemic-spellin...
Here's an example of a phonemic orthography, which is somewhat readable (to me) but illustrates how many diacritics you'd need. And it still spells the vowel in "ask" or "lot" with the same ä! https://www.reddit.com/media?url=https%3A%2F%2Fpreview.redd....
Not only that, but since pronunciation tends to diverge over time, it will create a never-ending spelling-pronunciation drift where the same words won't be pronounced the same in, e.g. 100-200 years, which will result in future generations effectively losing easy access to the prior knowledge.
Once you switch to a phonetic respelling, this is no longer a frequent problem. It does not happen, or at least happens very rarely, in languages that already have phonetic spelling, such as Turkish.
In the rare event that the pronunciation of a sound changes in time, the spelling doesn't have to change. You just pronounce the same letter differently.
If it's more than one sound, well, then you have a problem. But it happens in today's non-phonetic English as well (such as "gost" -> "ghost", or more recently "popped corn" -> "popcorn").
Oh, but it does. It's just that the standard is held up as the official form of the language and dialects are killed off through standardized education etc. To do this in English would e.g. force all Australians, Englishmen etc. to speak like an American (when in the UK different cities and social classes have quite divergent usage!). This clearly would not work and would cause the system to break apart. English exhibits very minor diglossia, as if all Turkic peoples used the same archaic spelling but pronounced it their own ways, e.g. tāg, kök, quruq, yultur etc., which Turks would pronounce as dāg, gök, yıldız etc., but other Turks today say gurt for kurt, isderik, giderim okula... You just say they're "wrong" because the government chose a standard (and Turkic peoples outside of Turkey weren't forced to use it).
As a native English speaker, I'm not even sure how to pronounce "either" (how it should be done in my dialect) and seemingly randomly reduce sounds. We'd have to change a lot of things before being able to agree on a single right version and slowly making everyone speak like that.
Nor is it some kind of insurmountable barrier to communication. For example, Serbian, Croatian, and Bosnian are all standardized variants of the same language with some differences in phonemes (like i/e/ije) and the corresponding differences in standard orthographies, but that doesn't preclude speakers from understanding each other's written language any more than it precludes them from understanding each other's spoken language.
are based on the exact same Štokavian dialect, ignoring the Kajkavian, Čakavian and Torlakian dialects. There is _no_ difference in standard orthography, because yat reflexes have nothing to do with national boundaries. Plenty of Serbs speak Ijekavian, for example. Here is a dialect map: https://www.reddit.com/media?url=https%3A%2F%2Fi.redd.it%2Fc...
Your example is literally arguing that Australian English should have the same _phonetic_ orthography, even. But Australian English must have the same orthography, or else Australia will no longer speak English in 2-3 generations. The difference between Australian and American English is far larger than between modern varieties of naš jezik. Australians code-switch when talking to foreigners, while Serbs and Croats do not.
But there is, though, e.g. "dolijevati" vs "dolivati". And sure, standard Serbian/Montenegrin allows the former as well, but the latter is not valid in standard Croatian orthography AFAIK. That this doesn't map neatly to national borders is irrelevant.
If Australian English is so drastically different that Australians "won't speak English in 2-3 generations" if their orthography is changed to reflect how they speak, that would indicate that their current orthography is highly divergent from the actual spoken language, which is a problem in its own right. But I don't believe that this is correct - Australian English content (even for domestic consumption, thus no code switching) is still very much accessible to British and American English speakers, so any orthography that would reflect the phonological differences would be just as accessible.
> current orthography is highly divergent from the actual spoken language, which is a problem in its own right
The orthography is no more divergent from an Australian's speech than from an American's, let alone a Londoner's or an Oxfordian's. But why would it be a problem?
Sorry, I didn't mean that it would be a smooth transition. It might even be impossible. What I wrote above is (paraphrasing myself) "Once you switch to a phonetic respelling [...] pronunciation [will not] tend to diverge over time [that much]". "Once you switch" is the key.
> To do this in English would e.g. force all Australians, Englishmen etc. to speak like an American
Why? There is nothing that prevents Australians from spelling some words differently (as we currently do, e.g. colour vs color, or tyre vs tire).
Consider three English words that have survived over the multiple centuries and their respective pronunciation in Old English (OE), Middle English around the vowel shift (MidE) and modern English, using the IPA: «knight», «through» and «daughter»:
«knight»: [knixt] or [kniçt] (OE) ↝ [kniçt] or [knixt] (MidE) ↝ [naɪt] (E)
«through»: [θurx] (OE) ↝ [θruːx] or [θruɣ] (MidE) ↝ [θruː] (E)
«daughter»: [ˈdoxtor] (OE) ↝ [ˈdɔuxtər] or [ˈdauxtər] (MidE) ↝ [ˈdɔːtə] (E)
It is not possible for a modern English speaker to collate [knixt] and [naɪt], [θurx] and [θruː], [ˈdoxtor] and [ˈdɔːtə] as the same word in each case.

Regular re-spelling results in a loss of linguistic continuity, and particularly so over a span of a few or more centuries.
There are occasional mixed horrors like "ptarmigan", which is a Gaelic word which was Romanized using Greek phonology, so it has the same silent p as "pterodactyl".
There's no academy of the English language anyway, so there's nobody to make such a change. And as others have said, the accent variation is pretty huge.
Just because training on data is opt-out doesn't mean businesses can't trust it. Not the best for users' privacy, though.
Like, there's the algorithm by which a hedge fund does algorithmic trading; they'd be insane to take the risk. Then there's the code for a video game: it's proprietary, but competitors don't benefit substantially from an illicit copy. You ship the compiled artifacts to everyone, so the logic isn't that secret. Copies of similar source code have leaked before with no significant effects.
Many algo strategies are indeed programmatically simple (e.g. use some sort of moving average), but the parametrization and how it's used is the secret sauce and you don't want that information to leak. They might be tuned to exploit a certain market behavior, and you want to keep this secret since other people targeting this same behavior will make your edge go away. The edge can be something purely statistical or it can be a specific timing window that you found, etc.
It's a bit like saying that a Formula 1 engine is not that far from what you'd find in a textbook. While it's true that it shares a lot of properties with a generic ICE, the edge comes from a lot of proprietary research that teams treat as secret and definitely don't want competitors to find out.
A lot of the use is fairly mundane and basically replaces junior analysts. E.g. it's digesting and summarizing the insane amounts of research that is produced. I could ask an intern to summarize the analysis on platinum prices over the last week, and it'll take them a day. Alternatively, I can feed in all the analysis that banks produce to an LLM and have it done immediately. The data fed in is not a trade secret really, and neither is the output. What I do with the results is where the interesting things happen.
And wrapper-around-ChatGPT startups should double-check their privacy policies, that all the "you have no privacy" language is in place.
If a court orders you to preserve user data, could you be held liable for preserving user data? Regardless of your privacy policy.
This, however, is horrible for AI regardless of whether or not you can sue.
A court ordering you to stop selling pigeons doesn't mean you can keep your store for pigeons open and pocket the money without delivering pigeons.
> Legal Requirements: If required to do so by law or in the good faith belief that such action is necessary to (i) comply with a legal obligation, including to meet national security or law enforcement requirements, (ii) protect and defend our rights or property, (iii) prevent fraud, (iv) act in urgent circumstances to protect the personal safety of users of the Services, or the public, or (v) protect against legal liability.
- They are legally obligated to divulge that data to the government
- Once they do so, they are civilly liable for breach of contract, as they have committed to never divulging this data. This may trigger additional breaches of contract, as others may have not had the right to share data with a company that can share it with third parties
Your users don't necessarily know that you're using OpenAI or some other provider.
No, because you turn up to court and show the court order.
It's possible a subsequent case could get the first order overturned, but you can't be held liable for good faith efforts to comply with court orders.
However, if you're operating internationally, then suddenly it's possible that you may be issued competing court orders both of which are "valid". This is the CLOUD Act problem. In which case the only winning move becomes not to play.
For example, if there are business consequences for leaking customer data, you better run that LLM yourself.
Technically you could probably just run it on EC2, but then you'd still need HIPAA compliance.
There are multinational corporations with a heavy presence in Europe that run their whole business on Microsoft's cloud, including keeping and processing privacy-sensitive data, business-critical data and medical data there, and yes, that includes using some of this data with LLMs hosted on Azure. Companies of this size cannot ignore regulatory compliance and hope no one notices. This only works because MS figured out how to keep it compliant.
Point being, if there are business consequences, you'll be better off using Azure-hosted LLMs than running a local model yourself - they're just better than you or me at this. The only question is, whether you can afford it.
Microsoft v. United States (https://en.wikipedia.org/wiki/Microsoft_Corp._v._United_Stat...) showed the government wants, and was willing to do whatever required, access to data held in the E.U. The passing of the CLOUD Act (https://en.wikipedia.org/wiki/CLOUD_Act) basically codified it in to law.
Of course, it could be just all talk, like all the general European globalist talk, and Europe will do a 180 once a friendlier party takes over in the US.
Just hyperbole, but it seems the regulations are designed with the big cloud providers in mind; if so, why don't they just ban US big tech and roll out the regulations more slowly? This neoliberalism makes everything so unnecessarily complicated.
If Solow's paradox is real and not the result of bad measurement, then one might expect that it could be workable without sacrificing much productivity. Certainly abandoning the cloud would be possible if the regulatory environment allowed for rapid development of alternative non-cloud solutions, as I really don't think the cloud improved productivity (besides for software developers in certain cases); it's more of a rent-seeking mechanism (a hot take on Hacker News, I'm sure, but look at any big corporate IT department outside the tech industry and I think you will see tons of instances where modern tech like the cloud is causing more problems than it's worth productivity-wise).
Computers in general I am much less sure of and lean towards mismeasurement hypothesis. I suspect any "return to 1950" project would render a company economically less competitive (except in certain high end items) and so the EU would really need to lean on Linux hard and invest massively in domestic hardware (not a small task as the US is finding out) in order to escape the clutches of the US and/or China.
I don't think they have the political will to do it, but I would love it if they tried and proved naysayers wrong.
The European Court of Justice ruled at least twice that it doesn't matter what kind of contract they give you, or what kind of bilateral agreements there are between the US and the EU: as long as the US has the Patriot Act and later regulations, using Microsoft means violating European privacy laws.
Yes, and that's why the European Commission keeps being pushed back by the Court of Justice of the EU (Safe Harbor was struck down, Privacy Shield as well, and it's likely only a matter of time before the CJEU kills the Data Privacy Framework too), but when it takes 3-4 years to get a ruling and the Commission can then just make a new (illegal) framework that will last for a couple of years, the violation can carry on indefinitely.
This is learned helplessness and it’s only true if you don’t put any effort into building that expertise.
You're right, I should get right to it. Plenty of time for it after work, especially if I cut down HN time.
But at least if you use an API in the same region as your customers, court order shenanigans won't get you caught between different jurisdictions.
Point 1: Every company has profit incentive to sell the data in the current political climate, all they need is a sneaky way to access it without getting caught. That includes the combo of LLM provider and Escrow non-entity.
Point 2: No company has profit incentive to defend user privacy, or even the privacy of other businesses. So who could run the Escrow service? Another business? Then they have incentive to cheat and help the LLM provider access the data anyway. The government (and which one)? Their intelligence arms want the data just as much as any company does so you're back to square one again.
"Knowledge is power" combined with "Knowledge can be copied without anyone knowing" means that there aren't any currencies presently powerful enough to convince any other entity to keep your secrets for you.
But since, I think, there are mechanisms by which they could keep logs in a way they cannot access them, they could still claim you will have privacy this way - even though they'd have the option of keeping unencrypted logs, much as they could have retained the logs in the first place. So the messaging may remain pretty much the same: from "we promise to delete your logs and keep no other copies, trust us" to "we promise to 3p-encrypt your archived logs and keep no other copies, trust us".
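To sketch what such a mechanism might look like (purely illustrative, using the python-cryptography library; key sizes and names are my assumptions): the provider keeps only the escrow's public key, so it can write archives it cannot read back, while only the escrow can unwrap them, e.g. under a court order.

```python
# Hybrid "3p-encrypt the archive" sketch: a fresh data key per log, wrapped
# with the escrow's RSA public key. The provider never holds the private key.
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

OAEP = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

escrow_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
escrow_public = escrow_private.public_key()   # only this lives at the provider

def archive(log: bytes) -> tuple[bytes, bytes]:
    data_key = Fernet.generate_key()
    wrapped_key = escrow_public.encrypt(data_key, OAEP)
    return wrapped_key, Fernet(data_key).encrypt(log)  # provider stores both

def escrow_reveal(wrapped_key: bytes, blob: bytes) -> bytes:
    # Only the escrow holder can unwrap the data key and read the log.
    data_key = escrow_private.decrypt(wrapped_key, OAEP)
    return Fernet(data_key).decrypt(blob)
```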
> No company has profit incentive to defend user privacy, or even the privacy of other businesses.
> They have incentive to cheat and help the LLM provider access the data anyway
Why would a company whose role is that of a 3p escrow be incentivised to risk their reputation by doing this? If that's the case every company holding PII has the same problem.
> Their intelligence arms want the data
In the EU at least, GDPR or similar. If you mean explicit law-breaking, that's a more general problem. But what company has an "intelligence arm" in this manner? Are you talking about another big-tech corp?
I'd say this type of cheating would be a risky proposition from the POV of that 3pe: it'd destroy their business, and they'd be penalised heavily, because sharing keys is pretty explicitly illegal; any client company caught could maybe reduce its own punishment by providing the keys as evidence of the 3pe's crime. A 3pe business would also need multiple client companies to be viable, so you'd need all of them to play ball; a single whistle-blower in any of them will get you caught, and again, all they need is a single key to prove your guilt.
> "Knowledge is power" combined with "Knowledge can be copied without anyone knowing" means that there aren't any currencies presently powerful enough to convince any other entity to keep your secrets for you.
On that same basis, large banks could cheat the stock market; but there is regulation in place to address that somewhat.
Maybe 3p-escrows should be regulated more, or required to register as a currently-regulated type of entity. That said, if you want to protect data from the government, PRISM etc., you're SOL; no one can stop them cheating. Let's focus on big-/tech/-startup cheats.
You> But what company has a "intelligence arms" in this manner? Are you talking about another big-tech corp?
"Their" in this circumstance refers to any government that might try to back Escrow.
I've reviewed a lot of SaaS contracts over the years.
Nearly all of them have clauses that allow the vendor to do whatever they have to if ordered to by the government. That doesn't make it okay, but it means OpenAI customers probably don't have a legal argument, only a philosophical argument.
Same goes for privacy policies. Nearly every privacy policy has a carve out for things they're ordered to do by the government.
They can still have legal contracts with other companies, that stipulate that they don't train on any of their data.
Do search engine companies have this requirement as well? I remember back in the old days deanonymizing “anonymous” query logs was interesting. I can’t imagine there’s any secrecy left today.
There's a qualitative difference resulting from quantitatively much easier access (querying some database vs. having to physically look through court records) and processing capabilities (an army of lawyers reading millions of pages vs. anyone, via an LLM) that doesn't seem to be accounted for.
Of course I haven’t sent anything actually sensitive to ChatGPT, but the use of copyright law in order to enforce a stricter surveillance regime is giving very strong “Right to Read” vibes.
> each book had a copyright monitor that reported when and where it was read, and by whom, to Central Licensing. (They used this information to catch reading pirates, but also to sell personal interest profiles to retailers.)
> It didn’t matter whether you did anything harmful—the offense was making it hard for the administrators to check on you. They assumed this meant you were doing something else forbidden, and they did not need to know what it was.
It's one thing if you get pwned because a hacker broke into your servers. It is another thing if you get pwned because a hacker broke into somebody else's servers.
At this point, do we believe OpenAI has a strong security infrastructure? Given the court order, it doesn't seem possible for them to have sufficient security for practical purposes. Your data might be encrypted at rest, but who has the keys? When you're buying secure instances, you don't want the provider to have your keys...
Look at it this way: if your phone was stolen, would you want it to self-destruct or keep everything? (Assume you can decide to self-destruct it.) Clearly the former is safer. Maybe the data has already been pulled off and you're already pwned. But by deleting, if they didn't get the data, they now won't be able to.
You just don't want to give adversaries infinite time to pwn you
Well, it is gonna be all _AI Companies_ very soon, so unless everyone switches to local models (which don't really have the same degree of profitability as SaaS), it's probably not going to kill a company to have less user privacy, because tbh people are used to not having privacy on the internet these days.
It certainly will kill off the few companies/people trusting them with closed source code or security related stuff but you really should not outsource that anywhere.
For now. This is going to devolve into either "openAI has to do this, so you do too" or "we shouldn't have to do this because nobody else does!" and my money is not on the latter outcome.
> —after news organizations suing over copyright claims accused the AI company of destroying evidence.
Like, none of the AI companies are going to avoid copyright related lawsuits long term until things are settled law.
See the whole LIBOR chat business.
And how many companies have proprietary code hosted on Github?
Where I've worked, we've always done self-hosted, with things as old as Gerrit and whatnot that aren't even really feature-complete compared to competitors.
They have a fair bit. Local models lets companies sell you a much more expensive bit of hardware. Once Apple gets their stuff together it could end up being a genius move to go all in on local after the others have repeated scandals of leaking user data.
As far as I understand it, this ruling does not apply to Microsoft, does it?
https://x.com/paulg/status/1913338841068404903
"It's a very exciting time in tech right now. If you're a first-rate programmer, there are a huge number of other places you can go work rather than at the company building the infrastructure of the police state."
---
So, courts order the preservation of AI logs, and the government orders the building of a massive database. You do the math. This is such an annoying time to be alive in America, to say the least. PG needs to start blogging again about what's going on nowadays. We might be entering the digital version of the 60s, if we're lucky. Get local, get private, get secure, fight back.
This raises serious questions about the supposed "anonymization" of chat data used for training their new models, i.e. when users leave the "improve model for all users" toggle enabled in the settings (which is the default even for paying users). So, indeed, very bad for the current business model which appears to rely on present users (voluntarily) "feeding the machine" to improve it.
[0] https://cdn.arstechnica.net/wp-content/uploads/2025/06/NYT-v...
So, the NYT asked for this back in January and the court said no, but asked OpenAI if there was a way to accomplish the preservation goal in a privacy-preserving manner. OpenAI refused to engage for 5 f’ing months. The court said “fine, the NYT gets what they originally asked for”.
Nice job guys.
How is it with using openrouter?
If I have users that use OpenAI through my API keys am I responsible?
I have so many questions…
Yes. You are OpenAI's customer, and they expect you to follow their ToS. They do provide a moderation API to reject inappropriate prompts, though.
. . .
Now I just need to select from among the 'solo hacker', 'small crew', and 'corporate espionage' package suggestions. Price goes up fast, though.
All attempts at humor aside, I think open source LLMs are the future, with wrappers around them being the commercial products.
P.S. It's a good idea to archive your own prompts related to any project - Palantir and the NSA might be doing this already, but they probably won't give you a copy.
(Even has an OG motorcycle avatar, ha.)
At least it wasn't a link to a screenshot.
To my knowledge, the court is forcing the company to change its policy. The obligation isn’t broken, its terms were just changed on a going-forward basis. (Would be different if the court required preserving records predating the order.)
apple I trust as much as I trust politicians
sent from my iphone :)
I thought the entire game these guys are playing is rushing to market to collect more data to diversify their supply chain from the stolen data they've used to train their current model. Sure, certain enterprise use cases might have different legal requirements, but certainly not the core product and the average "import openai"-enjoyer.
Because they are bound by their terms of service? Because if they weren't, no business would ever use their service, and without businesses using their service they won't have any revenue?
Not your computer, or not your encryption keys, not your data.
There are myriad ways courts balance privacy and legal-interest concerns.
(The Times et al are alleging that OpenAI is aiding copyright violation by letting people get the text of news stories from the AI).
Does the Times believe that other people can get this text while it can't get it itself? To prove that the AI is stealing the info, Times does not need access to people's logs. All it has to show is that it can get that text.
This sounds like Citizens United again: astroturfing to get access to logs with a fake cause.
That logic makes no sense, because even if they can't get it right now, that doesn't mean they won't be able to get it in the future.
Whether the Times and its staff can get the text is all that matters, because the rate of data usage is not material; it can change at any time in the future.
Capone isn't allowed to burn his protection racket documents claiming he's protecting the privacy of the business owners who paid protection money. The Court can take steps to protect their privacy (including swearing the plaintiff to secrecy on information learned immaterial to the case, or pre-filtering the raw data via a party trusted by the Court).
She asked OpenAI's legal team to consider a ChatGPT user who "found some way to get around the pay wall" and "was getting The New York Times content somehow as the output." If that user "then hears about this case and says, 'Oh, whoa, you know I’m going to ask them to delete all of my searches and not retain any of my searches going forward,'" the judge asked, wouldn't that be "directly the problem" that the order would address?
Mastercard wouldn't get away with saying "it would be too hard to preserve evidence of our wrongdoing, so we're making sure it's all deleted".
Presumably, this same ruling will come for all AI systems soon; Gemini, Grok, etc.
Now they can’t pretend anymore.
Although keeping deleted chats is evil.
OpenAI does this as well of course. Any EU customers are going to insist on paying via an EU based entity in euros and will be talking to EU hosted LLMs with all data and logs being treated under EU law, not US law. This is not really optional for commercial use of SAAS services in the EU. To get lucrative enterprise contracts outside the US, OpenAI has no other choice but to adapt to this. If they don't, somebody else will and win those contracts.
I actually was at a defense conference in Bonn last week, talking to a representative of Google Cloud. I was surprised that they were there at all, because the Germans are understandably a bit paranoid about trusting US companies with hosting confidential stuff (considering the scandals a few years ago about the CIA spying on the German government). But they actually do offer some services to the BWI, which is the part of the German army that takes care of its IT needs. And German spending on defense is of course very high right now, so there are a lot of companies trying to sell in Germany, on Germany's terms. Including Google.
Did they already go that route and lose — or is this an example of caving early?
And yes, I know; I worked on the only Android/iMessage crossover project to exist, and it was clear they had multiple breaches even just in delivery, as well as the well-known "iCloud on means all privacy is void" issue.
Storing them on something that has hours to days retrieval window satisfies the court order, is cheaper, and makes me as a customer that little bit more content with it (mass data breach would take months of plundering and easily detectable).
That is probably the solution right there.
Sounds like bullshit lawyer speak. What exactly is the difference between the two?
!define would
> Used to express desire or intent -- https://www.wordnik.com/words/would
!define cannot
> Can not ( = am/is/are unable to) -- https://www.wordnik.com/words/cannot
"I will not be able to do this"
"I cannot do this"
There is no semantic or legal difference between the two, especially when coming from a tech company. Stalling and wordplay is a very common legal tactic when the side has no other argument.
https://cdn.arstechnica.net/wp-content/uploads/2025/06/NYT-v...
> I asked:
> > Is there a way to segregate the data for the users that have expressly asked for their chat logs to be deleted, or is there a way to anonymize in such a way that their privacy concerns are addressed... what’s the legal issue here about why you can’t, as opposed to why you would not?
> OpenAI expressed a reluctance for a "carte blanche, preserve everything request," and raised not only user preferences and requests, but also "numerous privacy laws and regulations throughout the country and the world that also contemplate these type of deletion requests or that users have these types of abilities."
A "reluctance to retain data" is not the same as "technically or physically unable to retain data". Judge decided OpenAI not wanting to do it was less important than evidence being deleted.
“I won’t be able to make the 5:00 dinner.” -> You could normally come, but there’s another obligation. There’s an implication that if the circumstances were different, you might be able to come.
“I cannot make the 5:00 dinner.” -> You could not normally come. There’s a rigid reason for the circumstance, and there is no negotiating it.
If you're talking to ChatGPT about being hunted by a Mexican cartel, and having escaped to your Uncle's vacation home in Maine -- which is the sort of thing a tiny (but non-zero) minority of people ask LLMs about -- that's 100% identifying.
And if the Mexican cartel finds out, e.g. because NY Times had a digital compromise at their law firm, that means someone is dead.
Legally, I think NY Times is 100% right in this lawsuit holistically, but this is a move which may -- quite literally -- kill people.
So, sure, no panacea, but .. why not for the cases where it would be a barrier?
I apologize for not having cites or a better memory at this time.
see also: https://en.wikipedia.org/wiki/Differential_privacy which alleges to solve this; that is, wiki says that the only attacks are side-channel attacks like errors in the algorithm or whatever.
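For the curious, the basic mechanism is tiny; here's a sketch of the Laplace mechanism for a counting query (the epsilon value is illustrative, not a recommendation):

```python
# Laplace mechanism sketch: a counting query has sensitivity 1, so adding
# Laplace noise with scale 1/epsilon gives epsilon-differential privacy.
import numpy as np

def private_count(true_count: int, epsilon: float) -> float:
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

# Example: release roughly how many users asked about some topic without
# revealing whether any one individual is in the dataset.
print(private_count(1234, epsilon=0.5))
```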
If that’s too big a risk it really is time to consider locally hosted LLMs.
Even if I wrote it, I don't care if someone read out loud in public court "user <insert_hash_here> said: <insert nastiest thing you can think of here>"
I had colleagues chat with GPT, and they send all kinds of identifying information to it.
I think this is a private Mastodon instance on someone's personal website so it makes sense that it might have been overwhelmed by the traffic.
(As in, an actual article, not just a mastodon-tweet from some unknown (maybe known? Not by me) person making the title claim, with no more info.)
By 'know' I mean recognise the name as some sort of authority. I don't 'know' Jon Gruber or Sam Altman or Matt Levine, but I'll recognise them and understand why we're discussing their tweet.
The linked tweet (whatever it's called) didn't say anything more than the title did here, so it was pointless to click through really. In replies someone asked the source and someone else replied with the link I commented above. (I don't 'know' those people either, but I recognise Ars/even if I didn't appreciate the longer form with more info.)
> The linked tweet (whatever it's called)
"post" works for social media regardless of the medium; not an admonishment, an observation. Also, by the time i saw this, it was already an Ars link, leaving some comments with less context that i apparently didn't pick up on. I was able to make my observation because someone mentioned mastodon (i think), but that was an assumption on my part that the original link was mastodon.
So i asked the question to make sure it wasn't some bias against mastodon (or the fediverse), because I'd have liked to ask, "for what reason?"
> "post" works for social media regardless of the medium; not an admonishment, an observation.
It also works for professional journalism and blog-err-posts though, the distinction from which was my point.
> I was able to make my observation because someone mentioned mastodon (i think), but that was an assumption on my part that the original link was mastodon.
As for assuming/'someone' mentioning Mastodon, my own comment you initially replied to ended:
> (As in, an actual article, not just a mastodon-tweet from some unknown (maybe known? Not by me) person making the title claim, with no more info.)
Which was even the bit ('unknown') you objected to.
So, why is Safari not forced to save my web browsing history too (even if I delete it)? Why not also the "private" tabs I open?
Just OpenAI, huh?
But moreover, Safari isn't a third party, it's a tool you are using whose data is in your possession. That means that in the US things like fourth amendment rights are much stronger. A blanket order requiring that Safari preserve everyone's browsing history would be an illegal general warrant (in the US).
> opt out
alright, sympathy lost
>At a conference in January, Wang raised a hypothetical in line with her thinking on the subsequent order. She asked OpenAI's legal team to consider a ChatGPT user who "found some way to get around the pay wall" and "was getting The New York Times content somehow as the output." If that user "then hears about this case and says, 'Oh, whoa, you know I’m going to ask them to delete all of my searches and not retain any of my searches going forward,'" the judge asked, wouldn't that be "directly the problem" that the order would address?
If the user hears about this case, and now this order, wouldn't they just avoid doing that for the duration of the court order?
I don't know anyone's agenda in terms of commenters, so they'd have to be very blatant for me to use such a word.
This is the main reason I can’t use any LLM agents or post any portion of my code into a prompt window at work. We have NDAs and government regulations (like ITAR) we’d be breaking if any code left our servers.
This just proves the point. Until these tools are local, privacy will be an Achilles' heel for LLMs.
Say oai implements something that makes their service 2x better. Just using it for a while should give people who live and breathe this stuff enough information to tease out how to implement something like it, and eventually it'll make it into the local-only applications, and models.
https://getdeploying.com/guides/run-deepseek-r1 this is the "how to do it"
https://news.ycombinator.com/item?id=42897205 posted here, a link to how to set it up on an AMD Epyc machine, ~$2000. IIRC a few of the comments discuss how many GPUs you'd need (a lot of the 80GB GPUs, 12-16 I think), plus the mainboards and PSUs and things. However, to just run the largest DeepSeek you merely need memory to hold the model and the context, plus ~10% (I forget why +10%, but that's my hedge to be more accurate).
note: I have not checked whether LM Studio can run the large DeepSeek model; I can't fathom a reason it couldn't, at least on the Epyc CPU-only build.
note too: I just asked in their Discord and it appears "any GGUF model will load if you have the memory for it". "GGUF" is the format the model is in. Someone will take whatever format Mistral or Facebook or whoever publishes and convert it to GGUF, and from there, someone will start to quantize the models into smaller files (with less ability) as GGUF.
If I can find it later (I couldn't find it last night when I replied), there is an article that explains how to start adding consumer GPUs, or even 1-2 Nvidia A100 80GB GPUs, to the Epyc build to speed that up. I have a vague recollection that this can get you up to 20 t/s or thereabouts, but don't quote me on that; it's been a while.
Yup. Trivial.
But also, running LLMs locally is easy. I don't know what goes into hosting them, as a service for your org, but just getting an LLM running locally is a straightforward 30-minute task.
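For example, a minimal sketch with llama-cpp-python, which loads GGUF files directly (the model path here is a placeholder, not a recommendation):

```python
# Minimal local inference with llama-cpp-python; runs on CPU, just slowly.
from llama_cpp import Llama

llm = Llama(model_path="models/your-model.Q4_K_M.gguf", n_ctx=4096)
out = llm("Q: Why keep inference on-prem?\nA:", max_tokens=128)
print(out["choices"][0]["text"])
```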
because the privacy aspect has nothing to do with LLMs and everything to do with relying on cloud providers. HN users have been vocal about that since long before LLMs existed.
Second, we’re going to need technology that can simply defy government orders, as digital technology expands the ability of one government order violating rights at scale. Otherwise, one judge — whether in the U.S., China, or India — can impose a sweeping decision that undermines the privacy and autonomy of billions.
Spicy. European courts and governments will love to see their laws and legal opinions being shrugged away in ironic quotes.
We need many strong AI players. This would be a great way to ensure Europe can grow its own.
The reason this doesn't happen is because of Europe's internal issues, not because of foreign competition.
It's hard/pointless to motivate engineers to use other options, and those options' significance doesn't grow, since engineers won't blog that much about them to show their expertise, etc. Certification and experience with a provider with 10%-80% market share is a future-employment reason to put up with a lot of trash, and the amount of help for working around that trash that has made it into places like ChatGPT is mind-boggling.
And that's a key institution in a democracy, given the frequency with which either the executive or legislative branches try to do illegal things (defined by constitutions and/or previously passed laws).
You're right though, in a perfect world courts would be apolitical.
All systems can be bent, broken, or subverted. Still, we need to make systems which do the best within the bounds of reality.
As a lifelong independent, I can tell you that this sort of thinking is incredibly prevalent and also incredibly wrong. Even a casual look at recent history proves this. How do you define "democracy"? Most of us define it as "the will of the people". Just recently, however, when "the will of the people" has not been the will of the ruling class, the "will of the people" has been decried as dangerous populism (nothing new but something that has re-emerged recently in the so-called Western World). It is our "institutions" they argue, that are actually democracy, and not the will of the foolish people who are ignorant and easily swayed.
>All systems can be bent, broken, or subverted.
Very true, and the history of our nation is proof of that, from the founding right up to the present day.
>Still, we need to make systems which do the best within the bounds of reality.
It would be nice, but that is a long way from how things are, or have ever been (so far).
I think it's a misreading to say the government should do whatever the whim of the most vocal, gerrymandered jurisdictions is. Instead, it is supposed to be a republic with educated, ethical professionals doing the lawmaking within a very rigid structure designed to limit power severely in order to protect individual liberty.
For me, the amount of outright lying, propaganda, blatant corruption, and voter abuse makes a claim like "democracy is the will of the most people who agree" seem misguided at best (and maybe actively deceitful).
Re reading your comment, the straw man about "democracy is actually the institutions" makes me think I may have fallen for a troll so I'm just going to stop here.
You haven't, so be assured.
>I think it's a misreading to say the government should do whatever the whim of the most vocal, gerrymandered jurisdictions are.
It shouldn't, and I didn't argue that. My argument is that the people in charge have completely disregarded the will of the people en mass for a long time, and that the people are so outraged and desperate that at this point they are willing to vote for anyone who will upend the elite consensus that refuses to change.
>Instead, it is a supposed to be a republic with educated, ethical professionals doing the lawmaking within a very rigid structure designed to limit power severely in order to protect individual liberty.
How is that working out for us? Snowden's revelations were in 2013. An infinite number of blatantly illegal and unconstitutional programs actively being carried out by various government agencies. Who was held to account? Nobody. What was changed? Nothing. Who was in power? The supposedly "good" team that respects democracy. Go watch the confirmation hearing of Tulsi Gabbard from this year. Watch Democratic Senator after Democratic Senator denounce Snowden as a traitor and repeatedly demand that she denounce him as well, as a litmus test for whether or not she could be confirmed as DNI (this is not a comment on Gabbard one way or another). My original comment disputed the contention that one party was for democracy and the other party was against it. Go watch that video and tell me that the Democrats support liberty, freedom, democracy and a transparent government. I don't support either of the parties, and this is one of the many reasons why.
Most other western democracies are a lot closer to a perfect world, it seems.
Or the UK, where you can get locked up for blasphemy[3], or where they lock up ~30 people a day for saying offensive things online because of their Online Safety Act?[4]
Or perhaps Romania, where an election that didn't turn out the way the EU elites wanted was overturned based on a nebulous (and later proven false) accusation that the election had been influenced by a Russian TikTok campaign, which later turned out to have been funded by a Romanian opposition party.[5]
I could go on and on, but unfortunately most other western democracies are just as flawed, if not worse. Hopefully we can all strive for a better future and flush the authoritarians, from all the parties.
[1] https://www.youtube.com/watch?v=-bMzFDpfDwc
[2] https://www.euronews.com/2023/10/19/mass-arrests-following-p...
[3] https://news.sky.com/story/man-convicted-after-burning-koran...
[4] https://www.thetimes.com/uk/crime/article/police-make-30-arr...
[5] https://www.politico.eu/article/investigation-ties-romanian-...
But is there any reason to believe that judges were pressured/compelled by political powers to make these decisions? Apart from, of course, the law created by these politicians, which is how the system is intended to work.
No, but I have every reason to believe that the judges who made these decisions were people selected by political powers so that they would make them.
>Apart from, of course, the law created by these politicians, which is how the system is intended to work.
But the system isn't working for the people, it is horribly broken. The people running the system are mostly corrupt and/or incompetent, which is why so many voters from a wide variety of countries, and across the political spectrum, are willing to vote for anyone (even people who are clearly less than ideal) that shits all over the system and promises to smash it. Because the system is currently working exactly how it's intended to work, most people hate it and nobody feels like they can do anything about it.
While folks believe all sorts of things, I don’t think anyone is going to call international relations apolitical!
I can see that factoring into a decision to penalise a US company when it breaks EU law, US court order or not.
a) highly qualified people, even European natives, move to Silicon Valley. There is a famous photo of the OpenAI core team with 6 Polish engineers and only 5 American ones;
b) culture of calculated risk when it comes to investment. Here, bankruptcy is an albatross around your neck, both legally and culturally, and is considered a sign of you being fundamentally inept instead of maybe just a misalignment with the market or even bad luck. You'd better succeed on your first try, or your options for funding will evaporate.
On risk, we're hardly the Valley, but a failed startup isn't a black mark at all. It's a big plus in most tech circles.
But in many continental countries, bankruptcy is a serious legal stigma. You will end up on public "insolvency lists" for years, which means that no bank will touch you with a 5 m pole and few people will even be willing to rent you or your new startup office space. You may even struggle to get banal contracts such as "five SIMs with data" from mobile phone operators.
There seems to be an underlying assumption that people who go bankrupt are either fatally inept or fraudsters, and need to be kept apart from the "healthy" economy in order not to endanger it.
GDPR is about personally identifiable humans. I'm not sure how critical that information really is to these models, though given the difficulty of deleting it from a trained model when found, yes I agree it poses a huge practical problem.
That's because they are obviously trained on copyrighted content but nobody wants to admit it openly because that opens them to even more legal trouble. Meanwhile China has no problem violating copyright or IP so they will gladly gobble up whatever they can.
I don't think you can really compete in this space with the EU mindset, US is playing it smart and leaving this to play out before regulating. This is why EU is not the place for these kinds of innovations, the bureaucrats and the people aren't willing to tolerate disruption.
At least in sensitive contexts (healthcare etc.) I could imagine this resulting in further restrictions, assuming the order is upheld even for European user's data.
> Paragraphs 1 and 2 shall not apply to the extent that processing is necessary:
> ...
> for the establishment, exercise or defence of legal claims.
‘Processor’ means a natural or legal person, public authority, agency or other body which processes personal data on behalf of the controller.
> there is a legal obligation to keep that data;
https://commission.europa.eu/law/law-topic/data-protection/r...
There's also an obvious compromise here – modify the US court ruling to exclude data of non-US users. Let's hope that cool heads prevail.
Making separate manufacturing lines for Europe vs US is too expensive, so in effect, Europe forced a US company to be less shitty globally.
Given the sheer number of devices we interact with in a single day, USB-C as a standard is worth the trade off for an increase in our threat surface area.
1000 attackers can carry around N extra charging wires anyway.
10^7 users having to keep, say, 3 extra charging wires on average? That’s a huge increase in costs and resources.
(Numbers made up)
1) Surely the world conquering robo-army could get some adapters.
2) To the extent that this makes anything more difficult, it is just that it makes everything a tiny bit less convenient. This includes the world-conquering robo-army, but also everything else we do. It is a general argument against capacity, which can’t be right, right?
That's not "everyone wins". The people that actually bought these devices now have cables that don't work and need to replace them with a lower-quality product, and the people who were already using something else continue to not need cables for these devices. The majority breaks even; a significant minority loses.
Simply not choosing one cable to rule them all lets everyone win. There is no compelling reason for one size to fit all.
If some people like hip hop but more people like country, it's not a win for everybody to eliminate the hip hop radio stations so we can all listen to a single country station.
Further, rail gauge is not a consumer choice. If there were two rail gauges and your local rail station happened to have a different gauge than your destination, you'd be SOL. A different rail gauge may provide benefits for people with specific needs, but you don't get to take advantage of those benefits except by blind luck.
There is no such benefit from standardizing cable connectors. If someone charges their phone with the same style cable as you, you gain nothing. If someone uses a different cable, you lose nothing. There is no reason for anyone not to use their preferred cable which is optimal for their use case.
I'm sure there are more than a few people that would end up throwing out their perfectly functional accessories, only for the convenience of carrying fewer cables.
I don't want to ship another cable across the Pacific Ocean from China so I can have a cable that works on my devices.
I want to keep using them until they don't work and I can't repair them any more.
That is great you spent the money for this, but I'm not ready to throw away my perfectly fine devices.
The accurate comparison here isn’t between random low-budget USB-C implementations and Lightning on iPhones, but between USB-C and Lightning both on iPhones, and as far as I can tell, it’s holding up nicely.
I despise USB-C with all my heart. Amount of cable trash has tripled over the years.
I find it superior to both lightning and USB-C.
On a specsheet basis it also charges faster and has a higher data transmission rate.
Lightning cables are not more robust. They are known to commonly short across the power pins, often turning the cable into an only-works-on-one-side defect. I replaced at least one cable every year due to this.
And... yeah, it turned out better than the standard. Their engineers have really good taste.
A US company can always stop serving EU customers if it doesn't want to comply with EU laws, but for most the market is too big to ignore.
There is no supreme law at that level; the two nations have to hash it out between them.
But the EU willingly participates in this. Probably because they know there's no viable alternative for the big clouds.
This is coming now though since the US instantly.
Don't use them! They cost too much in dollar terms, they all try to EEE lock you in with "managed" (and subtly incompatible) versions of services you would otherwise run yourself. They are too big to give a shit about laws or customer whims.
I have plenty of experience with the big three clouds and given the choice I'll run locally, or on e.g. Hetzner, or not at all.
My company loves to penny-pinch things like lunch reimbursement and necessary tools, but we piss away large sums on unused or under-used cloud capacity with glee, because that is magically billed elsewhere (from the point of view of a given manager).
It's a racket, and I'm by no means the first to say so. The fact that this money-racket is also a data-racket doesn't surprise me in the least. It's just good racketeering!
However, that's not what most traditional companies do. What my employer did was pick up the physical servers they had in our datacenters, dump them onto AWS compute boxes they run 24/7 without any kind of orchestration, and call it "cloud". That's not what cloud is; that is really just someone else's computer. We spend a LOT more now, but our CIO wanted to "go cloud" because everyone else is, so it was more a tickbox than a real improvement.
Microservices, object storage etc, that is cloud.
Parts of the European Commission "influenced" by lobbyists collude with the US.
Good job.
Sarbanes–Oxley would like a word.
The GDPR allows data to be retained when required by law, for as long as needed. The people who write regulations may make mistakes sometimes, but they are not so stupid as to not understand the law and what it may require.
The data was correctly deleted on user demand. But it cannot be deleted where there is a Court order in place. The conclusion of "GDPR is in conflict with the law" looks like rage baiting.
If any non-eu country can circumvent GDPR by just making a law that it doesn't apply, the entire point of the regulation vanishes.
Where is the HQ of the company?
Where does the company operate?
What country is the individual user in?
What country do the servers and data reside in?
Ditto for service vendors who also deal with user data.
Even within the EU, this is a mess and companies would rather use a simple heuristic like put all servers and store all data for EU users in the most restrictive country (I’ve heard Germany).
If outside the EU, then they need to accept EU jurisdiction and designate a plenipotentiary representative (i.e., someone who can make decisions and take on liability on behalf of the company).
> Where does the company operate?
Geography mostly doesn't matter as long as they interact with EU people. Because people are more important.
> What country is the individual user in?
Any EU (or EEA) country.
> What country do the servers and data reside in?
Again, doesn't matter, because people > servers.
It's almost as if the bureaucrats who write regulations are experienced in writing them in such a way that they can't be circumvented.
EDIT TO ADD:
From OpenAI privacy policy:
> 1. Data controller
> If you live in the European Economic Area (EEA) or Switzerland, OpenAI Ireland Limited, with its registered office at 1st Floor, The Liffey Trust Centre, 117-126 Sheriff Street Upper, Dublin 1, D01 YC43, Ireland, is the controller and is responsible for the processing of your Personal Data as described in this Privacy Policy.
> If you live in the UK, OpenAI OpCo, LLC, with its registered office at 1960 Bryant Street, San Francisco, California 94110, United States, is the controller and is responsible for the processing of your Personal Data as described in this Privacy Policy.
If it was easier or more cost-effective for these companies not to have a foot in the EU they wouldn't bother, but they do.
Americans often seem to have the view that lawmakers are bumbling buffoons who just make up laws on the spot with no thought given to loopholes or consequences. That might be how they do it over there, but it's not really how it works here.
It's all about jurisdiction. Do business in Country X? Then you need to follow Country X's laws.
Same as if you go on vacation to Country Y. If you do something that is illegal in Country Y while you are there, even if it's legal in your home country, you still broke the law in Country Y and will have to face the consequences.
Do you mean that I, an EU citizen am being granted some special privilege from EU leadership to send my data to the US?
I say temporary because it keeps being shot down in court for lax privacy protections and the EU keeps refloating it under a different name for economic reasons. Before this name it was called safe harbor and after that it was privacy shield.
I mean, yes, sometimes the government steps in when you try to hand something over of your own free will, such as the very strict rules around organ donation: I can't simply decide to give my organs to some random person for arbitrary reasons, even if I really want to. But I'm not sure data belongs in the same category, where the government steps in and says "no, you can't upload your personal data to an American website".
EU companies are required to act in compliance with the GDPR. This includes all sensitive data that is transferred to business partners.
They must make sure that all partners handle the (sensitive part of the) transferred data in a GDPR-compliant way.
So: no law is overridden. But in order to do business with EU companies, US companies "must" offer to treat the data accordingly.
As a result, this means EU companies can not transfer sensitive data to US companies. (Since the president of the US has in principle the right to order any US company to turn over their data.)
But in practice, usually no one cares. Unless someone does and then you might be in trouble.
That is why international agreements and cooperation are so important.
Agreement with the United States on mutual legal assistance: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=legissum...
Regulatory entities are quite competent and make sure that most common situations are covered. When some new situation arises an update to the treaty will be created to solve it.
There's "legitimate interest", which makes the whole GDPR null and void. Every website nowdays has the "legitimate interest" toggled on for "track user across services", "measure ad performance" and "build user profile". And it's 100% legal, even though the official reason for GDPR to exist in the first place is to make these practices illegal.
- Direct Marketing
- Preventing Fraud
- Ensuring information security
It's weasel words all the way down. Having to take into account "reasonable" expectations of data subjects etc. Allowed where the subject is "in the service of the controller"
Very broad terms open to a lot of lengthy debate
> Very broad terms open to a lot of lengthy debate
Because otherwise no law would ever be written, because you would have to explicitly define every single possible human activity to allow or disallow.
Direct marketing that I believe is legitimate: offers of a rebate on a higher service level if you currently have a lower service level.
Direct marketing that is not legitimate: this guy has signed up for the autistic service tier of our video service (silly example, I don't know what this would be), therefore we will share his profile with various autistic service providers so they can market to him.
Fraud prevention is literally "collect enough cross-service info to identify a person in case we want to block them in the future". Weasel words for tracking.
> therefore we will share his profile with various autistic service providers so they can market to him.
This again falls under legitimate interest. The user, being profiled as x, may have legitimate interest in services targeting x. But we can't deliver this unless we are profiling users, so we cross-service profile users, all under the holy legitimate interest
You're literally not allowed to store that data for years, or to sell/use that data for marketing and actual tracking purposes.
Websites A and B buy fraud prevention service FPS, website A flags user x as fraudulent, how should FPS flag user x as high risk for website B if consent from user x was required?
Legitimate interest literally allows FPS to track users, build cross-service profiles, process and store their data in case FPS needs that data sometime in the future. Under legitimate interest response to query "What's the ratio of disputed transactions for this user?" is perfectly legal trigger to put all that data to use, even though it is for all intents and purposes indistinguishable from pre-GDPR tracking.
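To make that concrete, here is a toy sketch (every name and the salting scheme are made up purely for illustration, not how any real fraud-prevention service works) of how such a service ends up holding a cross-site profile keyed on one stable identifier:

    import hashlib
    from collections import defaultdict

    def derive_id(email: str, salt: str = "fps-internal-salt") -> str:
        # Pseudonymous but stable across customer sites, i.e. a de facto tracking key.
        return hashlib.sha256((salt + email.lower()).encode()).hexdigest()

    events = defaultdict(list)  # derived id -> list of (site, outcome)

    def report(site: str, email: str, outcome: str) -> None:
        events[derive_id(email)].append((site, outcome))

    def disputed_ratio(email: str) -> float:
        history = events[derive_id(email)]
        if not history:
            return 0.0
        return sum(outcome == "disputed" for _, outcome in history) / len(history)

    report("website_a", "x@example.com", "disputed")  # site A flags user x
    report("website_b", "x@example.com", "ok")
    print(disputed_ratio("x@example.com"))  # site B's risk query -> 0.5

The derived key is "pseudonymous", but because it stays stable across every site that reports to the service, it works exactly like a tracking identifier.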
"Legitimate interests is now our legal basis for using your information to improve Meta Products"
Fun read https://www.facebook.com/privacy/policy?section_id=7-WhatIsO...
But don't worry, "None of these allow you to just willy-nilly send/sell info to third parties." !
It's a farce, and just like the US constitution they'll just continuously argue about the meanings of words and erode them over time.
Session cookies and profiles on logged in users is where I see most companies stretching for legitimate interest. But cross service data sharing and persistent advertising cookies without consent are clearly no bueno.
https://www.reddit.com/media?url=https%3A%2F%2Fpreview.redd....
none of these others are legitimate interest. Furthermore combining the data from legitimate interest (email address to keep track of your logged in user) with illegitimate goals such as tracking across services would be illegitimate.
Add to that the fact that the EU’s heavy influence on the courts is a well-documented, ongoing deal, and the GDPR comes off as a surveillance law dressed up to seem the total opposite.
Which courts are influenced by the EU? I don't think it's true of US courts, and courts in EU nations are supposed to be influenced by it, it's in the EU treaties.
I see one of them: The New York Times.
We need to let people know who the other ones are.
You might be a more benign user of chatGPT. Other people have turned it into a therapist and shared wildly intimate things with it. There is a whole cottage industry of journal apps that also now have "ai integration". At least some of those apps are using openAI on the back end...
That's not a good analogy. They're ordered to preserve records they would otherwise delete, not create records they wouldn't otherwise have.
Why the NYT cares about a random ChatGPT user bypassing their paywall when an archive.ph link is posted on every thread is beyond me.
This is from DuckDuckGo's privacy policy: "We don’t track you. That’s our Privacy Policy in a nutshell. We don’t save or share your search or browsing history when you search on DuckDuckGo or use our apps and extensions."
If the court compelled DuckDuckGo to log all searches, I would be equally concerned.
It would be interesting to know how much Microsoft logs or tracks.
OpenAI (and other services) log and preserve your interactions, in order to either improve their service or to provide features to you (e.g., your chat history, personalized answers, etc., from OpenAI). If a court says "preserve all your user interaction logs," they exist and need to be preserved.
DDG explicitly does not track you or retain any data about your usage. If a court says "preserve all your users interaction logs," there is nothing to be preserved.
It is a very different thing - and a much higher bar - for a court to say "write code to begin logging user interaction data and then preserve those logs."
The court is after evidence that users use ChatGPT to bypass paywalls. Anonymizing the data in a way that makes it impossible to 1) pinpoint the users and 2) reconstruct the generic user conversation history would preserve privacy and allow OpenAI to comply in good faith with the order.
The fact that they are blaring sirens and hiding behind "we can't, think about users' privacy" feels akin to willful negligence, or suggests they know they have something to hide.
Not at all; there is a presumption of innocence. Unless a given user is plausibly believed to be violating the law, there is no reason to search their data.
In the case of a warez site they would never have logged such a "conversation" to begin with. So if the court requested that they produce all such communications the warez site would simply declare that as, "Impossibility of Performance".
In the case of OpenAI the courts are demanding that they preserve all future communications from all their end users—regardless of whether or not those end users are parties (or even relevant) to the case. The court is literally demanding that they re-engineer their product to record all communications where none existed previously.
I'm not a lawyer but that seems like it would violate FRCP 26(b)(1) which covers "proportionality". Meaning: The effort required to record the evidence is not proportional relative to the value of the information sought.
Also—generally speaking—courts recognize that a party is not required to create new documents or re-engineer systems to satisfy a discovery request. Yet that is exactly what the court has requested of OpenAI.
Does this pertain to Google Gemini, Meta chat, Anthropic, etc. also?
Run LLM in an enclave that generates ephemeral encryption keys. Have users encrypt text directly to those enclave ephemeral keys, so prompts are confidential and only ever visible in an environment not capable of logging.
All plaintext data will always end up in the hands of governments if it exists, so make sure it does not exist.
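For what it's worth, a minimal sketch of that encrypt-to-the-enclave idea, assuming a hypothetical enclave that publishes an ephemeral public key (nothing OpenAI actually offers; PyNaCl sealed boxes stand in for whatever attested channel a real enclave would use):

    # pip install pynacl
    from nacl.public import PrivateKey, SealedBox

    # Inside the enclave: generate an ephemeral keypair; only the public key leaves.
    enclave_key = PrivateKey.generate()
    published_pubkey = enclave_key.public_key

    # On the user's device: seal the prompt to the enclave's public key, so the
    # operator relaying it only ever sees ciphertext.
    prompt = b"a private question the operator should never be able to log"
    ciphertext = SealedBox(published_pubkey).encrypt(prompt)

    # Inside the enclave only: decrypt, run inference, and discard the key on
    # shutdown so nothing can be produced later.
    assert SealedBox(enclave_key).decrypt(ciphertext) == prompt

The operator relaying the ciphertext then has nothing meaningful to preserve; the hard part that remains is attesting that the enclave really discards its key and doesn't log anything after decryption.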
Decentralize hosting and encryption, and then the centralized developers of the open source software will be literally unable to comply.
This well proven strategy would however only be possible if anything about OpenAI was actually open.
The privacy agreement is a contract, not a law. A judge is well within their rights to issue such an order, and the privacy agreement doesn't matter at all if OpenAI has to do something to comply with a lawful order from a court of competent jurisdiction.
OpenAI are like the new Facebook when it comes to spin.
Court orders are like temporary, extremely finely scoped laws, as I understand them. A court order can’t compel an entity to break the law, but it can compel an entity to behave as if the court just set a law (for the specified entity, for the specified period of time, or the end of the case, whichever is sooner).
Normally Courts are oblivious to advanced opsec, which is one fundamental reason they got breached, badly, a few years ago. I just saw a new local order today on this very topic.[1] Courts are just waking up to security concepts that have been second nature to IT professionals.
From my perspective, the magistrate judge here made two major goofs: (1) ignoring opsec as a reasonable privacy right for customers of an internet service and (2) essentially demanding that several hundred million of them intervene in her court to demand that she respect their ability to protect their privacy.
The fact that the plaintiff is the news organization half the US loves to hate does not help, IMO. Why would that half of the country trust some flimsy "order" to protect their most precious secrets from an organization that lives and breathes cloak-and-dagger leaks and political subterfuge. NYT needed to keep their litigation high and tight and instead they drove it into a ditch with the help of a rather disappointing magistrate.
[1] https://www.uscourts.gov/highly-sensitive-document-procedure...
US law does not supersede foreign contract law. How would that even work? Why would you think that was possible?
Speaksy to the rescue for privacy and curiosity (easy-access jailbroken Qwen3 8B in the free tier, 32B in the paid one). It may quickly hit capacity since it's all locally run for privacy.
To protect their users from this massive overreach, OpenAI should defy this order and eat the fines IMO.
Anyone concerned about their privacy wouldn't use these services to begin with. The fact they are so popular is indicative that most people value the service over their privacy, or simply don't care.
Yes, they want to use everyone's data. But they also want everyone as a customer, and they can't have both at once. Offering people an opt-out is a popular middle-ground because the vast majority of people don't care about it, and those that do care are appeased
They have every incentive not to, and no oversight to hold them accountable if they don't. Do you really want to trust your data is safe based on a pinky promise from a company?
Or, the general populace just doesn't understand the actual implications. The HN crowd can be guilty of severely overestimating the average person's tech literacy, and especially their understanding of privacy policies and ToS. Many may think they are OK with it, but I'd argue it's because they don't understand the potential real-world consequences of such privacy violations.
That might've been the case in the first generations of ad-supported business models on the web. But after two decades, even non-technical users have understood the implications of "free" services.
IME talking to non-technical people about this topic, I can't remember the last time someone mentioned not being aware of the ToS and privacy policies they agree to, even if they likely hadn't read the legalese. Whereas the most common excuses I've heard are "I have nothing to hide", and "I don't use it often".
So I think you're underestimating the average person's tech literacy. I'm sure people who still don't understand the implications exist, but they're in the minority.
Let's say someone creates Russian propaganda with these models, or creates fraudulent documents.
And here are the links to the court orders and responses if you are curious: https://social.wildeboer.net/@jwildeboer/114530814476876129
At least the Chinese are open about their authoritarian system and constant snooping on users.
Time to switch to Chinese AI. It can't be any worse than using American AI.
Anyway the future is open models.
And ironically, this now includes the Chinese AI companies too.
Old bureaucratic fogeys will be the death of this nation.
Why would that not apply to LLM chat services?
https://learn.microsoft.com/en-us/legal/cognitive-services/o...
It's entirely reasonable, and standard practice for courts to say 'while the legal proceedings is going on don't delete potential evidence relevant to the case'.
More special-case whining from tech bros, who don't seem to understand the basic concepts of fairness or justice.
I'm not talking about contractual control (which is largely mooted as pretty much every cloud service has a ToS that's grossly skewed toward their own interests over yours, with clauses like indemnifications, blanket grants to share your data with "partners" without specifying who they are or precisely what details are conveyed, mandatory arbitration, and all kinds of other exceptions to what you'd consider respectful decency), but rather where your data lives and is processed.
If you truly want to maintain confidence it'll remain private, don't send it to the cloud in the first place.
Like, why is one good and the other bad?
There is no technical reason for chats people have with ChatGPT or any similar service to be available on the web to everyone, so there is no way for them to be scraped.
“If one is good the other must be good” is far too simplistic thinking to apply to a situation like this.
I’ve asked ChatGPT medical things that are private but not incriminating or anything, because I trust ChatGPT’s profit motive to just not care about my individual issues. But I would be pretty irritated if the government stepped in and mandated they make my searches public and linkable to me.
Are you perhaps taking an absolutist view where anything less than perfect attention to all privacy is the same as making all logs of everyone public?
Who is calling for this? Are you perhaps taking an absolutist view where "not destroying evidence" is the same as "mandated they make my searches public and linkable to me"? That's quite ridiculous.
*my legal argument is "possession is 9/10ths of the law"
I can't even get important announcements from my kids' school without signing up for yet another cloud service.
Couldn't have said it better.
Just consider Apple as an example: Some time ago, they used to sell the Time Capsule, a local-first NAS built for wireless Time Machine backups. Today, not only has the Time Capsule been discontinued, but it's outright impossible to make local backups of iOS devices (many people's primary computing devices!) without a computer and a USB cable.
Considering Apple's resources, it would take negligible effort to add a NAS backup feature to iOS with polished UX ("simply tap your phone on your Time Capsule 2.0 to pair it, P2P Wi-Fi faster than your old USB cable" etc.) – but they won't, because now it's all about "services": Why sell a Gigabyte once if you can lease it out and collect rent for it every month instead?
I mean yes. but if you host it, then you'll be taken to court to hand that data over. Which means that you'll have less legal talent at your disposal to defend against it.
Not in this case. The Times seems to be claiming that OpenAI is infringing rather than any particular user. If one does not use OpenAI then their data is not subject to this.
I imagine a 90s era software industry for today’s tech world: person buys a server computer, person buys an internet connection with stable ip, person buys server software boxes to host content on the internet, person buys security suite to firewall access.
Where is the problem in this model? Aging computers? Duplicating computing hardware for little use? Unsustainable? Not green/efficient?
As frustrating as it is, the answer seems to be everyone and no one. Data in some respects is just an observation. If I walk through a park, and I see someone with red hair, I just collected some data about them. If I see them again, perhaps strike up a conversation, I learn more. In some sense, I own that data because I observed it.
On the other hand, I think most decent people would agree that respecting each other's right to privacy is important. Should the owner of the red hair ask me to not share personal details about them, I would gladly accept, because I personally recognize them as the owner of the source data. I may possess an artifact or snapshot of that data, but it's their hair.
In a digital world where access controls exist, we have an opportunity to control the flow of our data through the public space. Unfortunately, a lot of work is still needed to make this a reality... if it's even possible. I like the Solid Project for its attempt to rewrite the internet to put more control in the hands of the true data owners. But I wonder if my observation metaphor is still possible even in a system like Solid.
Billions of people use the internet daily. If any organization suspects some people use the Internet for illicit purposes eventually against its interests, would the court order the ISP to log all activities of all people? Would Google be ordered to save the searches of all its customers because some might use it for bad things? And once we start, where will we stop? Crimes could have happened in the past or could happen in the future; will the court order the ISP and Google to retain the logs for 10 years, 20 years? Why not 100 years? Who should bear the cost of such outrageous demands?
The consequences of such orders are of an enormity the puny judge cannot even begin to comprehend. The right to privacy is an integral part of the freedom of speech, a core human right. If you don’t have private thoughts and private information, anybody can be incriminated using that past information. We will cease to exist as individuals, and I argue we will cease to exist as humans as well.
> Sam Altman is the most ethical man I have ever seen in IT. You cannot doubt he is vouching and fighting for your privacy. Especially on YCombinator website where free speech is guaranteed.
Openly destroying evidence isn’t usually accepted by courts.
Furthermore there is no conceivable harm resulting from requiring evidence to be preserved for an active trial. Find a better framing.
>For OpenAI, risks of breaching its own privacy agreements could not only "damage" relationships with users but could also risk putting the company in breach of contracts and global privacy regulations. Further, the order imposes "significant" burdens on OpenAI, supposedly forcing the ChatGPT maker to dedicate months of engineering hours at substantial costs to comply, OpenAI claimed. It follows then that OpenAI's potential for harm "far outweighs News Plaintiffs’ speculative need for such data," OpenAI argued.
This ruling is about preservation of evidence, not (yet) about delivering that information to one of the parties.
If judges couldn't compel parties to preserve evidence in active cases, you could see pretty easily that parties would aggressively destroy evidence that might be harmful to them at trial.
There's a whole later process (and probably arguments in front of the judge) about which evidence is actually delivered, whether it goes to the NYT or just to their lawyers, how much of it is redacted or anonymized, etc.
The power does not extend to any of your hypotheticals, which are not about active cases. Courts do not accept cases on the grounds that some bad thing might happen in the future; the plaintiff must show some concrete harm has already occurred. The only thing different here is how much potential evidence OpenAI has been asked to retain.
This post appears to be full of people who aren’t actually angry at the results of this case, but angry at how the US legal system has been working for decades, possibly centuries, since I don’t know when this precedent was first set.
The company records and uses this stuff internally, retention is about keeping information accurate and accessible.
Lawsuits allow, in a limited context, the sharing of non-public information held by individuals/companies in the lawsuit. But once you submit something to OpenAI, it’s now their information, not just your information.
Maybe so, but this has always been the case for hundreds of years.
After all, how on earth do you propose getting a fair hearing if the other party is allowed to destroy the evidence you asked for in your papers?
Because this is what would happen:
You: Your Honour, please ask the other party to turn over all their invoices for the period in question
Other Party: We will turn over only those invoices we have
*Other party goes back to the office and deletes everything.
The thing is, once a party in a suit asks for a certain piece of evidence, the other party can't turn around and say "Our policy is to delete everything, and our policy trumps the orders of this court".
For me, one company obligated to retain business records during civil litigation against another company, reviewed within the normal discovery process is tolerable. Considering the alternative is lawlessness. I'm fine with it.
Companies that make business records out of invading privacy? They, IMO, deserve the fury of 1000 suns.
However, if the ISP, for instance, is sued, then it (immediately and without a separate court order) becomes illegal for them to knowingly destroy evidence in their custody relevant to the issue for which they are being sued, and if there is a dispute about their handling of particular such evidence, a court can and will order them specifically to preserve relevant evidence as necessary. And, with or without a court order, their destruction of relevant evidence once they know of the suit can be the basis of both punitive sanctions and adverse findings in the case to which the evidence would have been relevant.
Not "all", just the ones involved in a current suit. They already routinely do this anway (Party A is involved in a suit and is ordered to retain any and all evidence for the duration of the trial, starting from the first knowledge that Party A had of the trial).
You are mischaracterising what happens; you are presenting it as "Any court, at any time, can order any party who is not involved in any suit in that court to forever hold user data".
That is not what is happening.
No, actually, it doesn't. Ordering a party to stop destroying evidence relevant to a current case (which is its obligation even without a court order) irrespective of whether someone else asks it to destroy that evidence is both within the well-established power of the court, and routine.
> Or find specific infringing chatters and order OpenAI to preserve these specified users’ logs.
OpenAI is the alleged infringer in the case.
That is ridiculous. The company itself receives that order, and is IMMEDIATELY legally required to comply - from the CEO to the newest-hired member of the cleaning staff.
No, it needs to show how often it happens to prove how much impact it's had.
It’s very impressive they managed to do such innovation in their spare time while running a newspaper and site
Which is irrelevant at this stage. It's a legal principle that both sides can fairly discover evidence. As finding out how much OpenAI has infringed copyright is pretty critical to the case, they need to find out.
After all, if it's only once or twice, that's a couple of dollars; if it's millions of times, that's hundreds of millions.
Because its a copyright infringement case, so existence and the scale of the infringement is relevant to both whether there is liability and, if so, how much; the issue isn't that it is possible for infringement to occur.
The allegation is not merely that infringement is possible; the actual occurrence and scale are relevant to the case.
Just erasing the userid isn’t enough to actually anonymize the data, and if you scrubbed location data and entities out of the logs you might have violated the court order.
Though it might be in our best interests as a society, we should probably be honest about the risks of this tradeoff; anonymization isn’t some magic wand.
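A rough sketch of the gap being described, using a made-up log record (not OpenAI's actual schema) that gets pseudonymized and naively scrubbed:

    import hashlib, hmac, re

    PSEUDONYM_KEY = b"held-by-a-separate-custodian"  # assumption: key escrowed outside the log store

    def pseudonymize(user_id: str) -> str:
        # Stable pseudonym: lets the parties count repeat behaviour without naming
        # users, but whoever holds the key can reverse the mapping.
        return hmac.new(PSEUDONYM_KEY, user_id.encode(), hashlib.sha256).hexdigest()[:16]

    EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
    PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

    def scrub(text: str) -> str:
        # Naive entity scrubbing; real prompts leak identity through far more than
        # emails and phone numbers (names, addresses, unique life details).
        return PHONE.sub("[PHONE]", EMAIL.sub("[EMAIL]", text))

    record = {"user_id": "u_12345", "prompt": "I'm Jane, call me at +1 415 555 0100"}
    preserved = {"user": pseudonymize(record["user_id"]), "prompt": scrub(record["prompt"])}
    print(preserved)  # "Jane" survives: the result is pseudonymous, not anonymous

The HMAC keeps counts consistent without naming users, but the free text still identifies "Jane", and scrubbing aggressively enough to remove that could arguably count as altering the very evidence the order says must be preserved.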
Courts have been dealing with discovery including secrets that litigants never want to go public for longer than AOL has existed.
Depending on the exact issues in the case, a court might allow that (more likely, it would allow only turning over anonymized data in discovery, if the issues were such that that there was no clear need for more) but generally the obligation to preserve evidence does not include the right to edit evidence or replace it with reduced-information substitutes.
Google can (and would) file to keep that data private and only the relevant parts would be publicly available.
A core aspect to civil lawsuits is everyone gets to see everyone else's data. It's that way to ensure everything is on the up and up.
Which would be chump change[0] compared to the costs of an actual trial with multiple lawyers/law firms, expert witnesses and the infrastructure to support the legal team before, during and after trial.
Not just that, even without a specific court order parties to existing or reasonably anticipated litigation have a legal obligation that attaches immediately to preserve evidence. Courts tend to issue orders when a party presents reason to believe another party is out of compliance with that automatic obligation, or when there is a dispute over the extent of the obligation. (In this case, both factors seem to be in play.)
https://codiscovr.com/news/fumiko-lopez-et-al-v-apple-inc/
https://app.ediscoveryassistant.com/case_law/58071-lopez-v-a...
Perhaps the larger lesson here is: if you don't want your service provider to end up being required to retain your private queries, there's really no way to guarantee it, and the only real mitigation is to choose a service provider who's less likely to be sued!
(Not a lawyer, this is not legal advice.)
Are these contradictory?
If you overhear a friend gossiping, can't you spread that gossip?
Also, where are human rights located? I'll give you a microscope.(sorry, I'm a moral anti-realist/expressivist and I can't help myself)
Probably because they bothered to pursue such a thing and hundreds of millions people did not.
How do you conclusively know whether someone's content-generating machine infringes on your rights? By saving all of its input/output for investigation.
It's ridiculous, sure but is it less ridiculous than AI companies claiming that the copyrights shouldn't apply to them because it will be bad for their business?
IMHO those are just growing pains. Back in the day people used to believe that the law didn't apply to them because they did it on the internet, and they were mostly right, because the laws were made for another age. Eventually the laws caught up, both for criminal stuff and for copyright. It will be the same for AI; right now we are in the wild west age of AI.
Anyway, the laws were not written with this type of processing in mind. In fact the whole idea of intellectual property breaks down now. Just like the early days of the internet.
No, it doesn’t. Play 10% of a movie for the purpose of critiquing it, perhaps.
https://fairuse.stanford.edu/overview/fair-use/four-factors/
Fair Use is not an a priori exemption or exception; Fair Use is an “affirmative defense” so once you have your day in court and the judge asks your attorney why you needed to play 10% of Priscilla, Queen of the Desert for your Gender Studies class, then you can run down those Four Factors enumerated by the Stanford article.
Particularly “amount and substantiality”.
Teachers and churches get tripped up by this all the time. But I’ve also been blessed with teachers who were very careful academically and sought to impart the same caution on all students about using copyrighted materials. It is not easy when fonts have entered the chat!
The same reason you or your professor cannot show/perform 100% of an unlicensed film under any circumstance, is the same basis that creators are telling the scrapers that they cannot consume 100% of copyrighted works on that end. And if the risks may involve reproducing 87% of the same work in their outputs, that’s beyond the standard thresholds.
https://www.copyright.gov/title17/92chap1.html#110 "Notwithstanding the provisions of section 106, the following are not infringements of copyright:
(1) performance or display of a work by instructors or pupils in the course of face-to-face teaching activities of a nonprofit educational institution, in a classroom or similar place devoted to instruction, unless, in the case of a motion picture or other audiovisual work, the performance, or the display of individual images, is given by means of a copy that was not lawfully made under this title, and that the person responsible for the performance knew or had reason to believe was not lawfully made;"
That is why it is not a good comparison with the broader Fair Use Four Factors test (defined in section 107: https://www.copyright.gov/title17/92chap1.html#107) because it doesn't need to even get to that analysis, it is exempted from copyright.
1. This would also be a massive legal headache,
2. It would become impossibly expensive
3. We obviously wouldn't have the AI we have today, which is an incredible technology (if immature) if this happened. Instead the growth of AI would have been strangled by rights holders wanting infinity money because they know once their data is in that model, they aren't getting it back, ever—it's a one-time sale.
I'm of the opinion that AI is and will continue to be a net positive for society. So I see this as essentially saying "let's go and remove this and delay its development by 10-20 years, and ensure people can't feasibly train and run their own models for a lot longer, because only big companies can afford real training datasets."
It's bad faith because they are saying "well, they should have done [unreasonable thing]". I explored their version of things from my perspective (it's not possible) and from a conciliatory perspective (okay, let's say they somehow try to navigate that hurdle anyways, is society better off? Why do I think it's infeasible?)
Edit: Authors Guild, Inc. v. Google, Inc. is a great example of a case where a tech giant tried to legally get the rights to use a whole bunch of copyrighted content (~all books ever published), but failed. The net result was they had to completely shut off access to most of the Google Books corpus, even though it would have been (IMO) a net benefit to society if they had been able to do what they wanted.
In any other context, this would be known as "civil disobedience". It's generally considered something to applaud.
For what it's worth, I haven't made up my mind about the current state of AI. I haven't yet seen an ability for the systems to perform abstract reasoning, to _actually_ learn. (Show me an AI that has been fed with nothing but examples in languages A and B. Then demonstrate, conclusively, that it can apply the lessons it has learned in language M, which happens to be nothing like the first two.)
No, civil disobedience is when you break the law expecting to be punished, to force society to confront the evil of the law. The point is that you get publicly arrested, possibly get beaten, get thrown in jail. This is not at all like what Open AI is doing.
Absolutely. Which, presumably, means that you're fine with the argument that your DNA (and that of each member of your family) could provide huge benefits to medicine and potentially save millions of lives.
But significant research will be required to make that happen. As such, we will be requiring (with no opt outs allowed) you and your whole family to provide blood, sperm and ova samples weekly until that research pays off. You will receive no compensation or other considerations other than the knowledge that you're moving the technology forward.
May we assume you're fine with that?
Absent a special legal carve-out, you need to get judges to do the Fair Use Four Factors test, and decide on how AI should be treated. To my very much engineer and not legal eye, AI does great on point 3, but loses on points 1, 2, and 4, so it is something that will need to be decided by the judges, how to balance those four factors defined in the law.
AI companies have, in fact, said that the law shouldn't apply to them or they won't make money. That is literally the argument Nick Clegg is using to argue that copyright protection should be removed from authors and musicians in the UK.
Since that wasn't ever a real argument, your strawman is indeed ridiculous.
The argument is that requiring people to have a special license to process text with an algorithm is a dramatic expansion of the power of copyright law. Expansions of copyright law will inherently advantage large corporate users over individuals as we see already happening here.
New York Times thinks that they have the right to spy on the entire world to see if anyone might be trying to read articles for free.
That is the problem with copyright. That is why copyright power needs to be dramatically curtailed, not dramatically expanded.
The law is not a deterministic computer program. It’s a complex body of overlapping work and the courts are specifically chartered to use judgement. That’s why briefs from two parties in a dispute will often cite different laws and precedents.
For instance, Winter v. NRDC specifically says that courts must consider whether an injunction is in the public interest.
First - in the US, privacy is not a constitutional right. It should be, but it's not. You are protected against government searches, but that's about it. You can claim it's a core human right or whatever, but that doesn't make it true, and it's a fairly reductionist argument anyway. It has, fwiw, also historically not been seen as a core right for thousands of years. So i think it's a harder argument to make than you think despite the EU coming around on this. Again, I firmly believe it should be a core right, but asserting that it is doesn't make that true.
Second, if you want the realistic answer - this judge is probably overworked and trying to clear a bunch of simple motions off their docket. I think you probably don't realize how many motions they probably deal with on a daily basis. Imagine trying to get through 145 code reviews a day or something like that. In this case, this isn't the trial, it's discovery. Not even discovery quite yet, if i read the docket right. Preservation orders of this kind are incredibly common in discovery, and it's not exactly high stakes most of the time. Most of the discovery motions are just parties being a pain in the ass to each other deliberately. This normally isn't even a thing that is heard in front of a judge directly, the judge is usually deciding on the filed papers.
So i'm sure the judge looked at it for a few minutes, thought it made sense at the time, and approved it. I doubt they spent hours thinking hard about the consequences.
OpenAI has asked to be heard in person on the motion, i'm sure the judge will grant it, listen to what they have to say, and determine they probably fucked it up, and fix it. That is what most judges do in this situation.
What? The supreme court disagreed with you in Griswold v. Connecticut (1965) and Roe v. Wade (1973).
While one could argue that they were vastly stretching the meaning of words in these decisions the point stands that at this time privacy is a constitutional right in the USA.
They also explicitly stated a constitutional right to privacy does not exist, and pointed out that Casey abandoned any such reliance on this sort of claim.
Griswold also found a right to marital privacy. Not general privacy.
Griswold is also barely considered good law anymore, though i admit it has not been explicitly overruled - it is definitely on the chopping block, as more than just Thomas has said.
In any case, more importantly, none of them have found any interesting right to privacy of the kind we are talking about here, but instead more specific rights to privacy in certain contexts. Griswold found a right to marital privacy in "the penumbra of the bill of rights". Lawrence found a right to privacy in your sexual activity.
In dobbs, they explicitly further denied a right to general privacy, and argued previous decisions conflated these: " As to precedent, citing a broad array of cases, the Court found support for a constitutional “right of personal privacy.” Id., at 152. But Roe conflated the right to shield information from disclosure and the right to make and implement important personal decisions without governmental interference."
You are talking about the former, which none of these cases were about. They are all about the latter.
So this is very far afield from a general right to privacy of the kind we are talking about, and more importantly, one that would cover anything like OpenAI chats.
So basically, you have a ~200 year period where it was not considered a right, and then a 50 year period where specific forms of privacy were considered a right, and now we are just about back to the former.
The kind of privacy we are talking about here ("the right to shield information from disclosure") has always been subject to a balancing of interests made by legislatures, rather than a constitutional right upon which they may not infringe. Examples abound - you actually don't have to look any further than court filings themselves, and when you are allowed to proceed anonymously or redact/file things under seal. The right to public access is considered much stronger than your right to not want the public to know embarrassing or highly private things about your life. There are very few exceptions (minors, etc).
Again, I don't claim any of this is how it should be. But it's definitely how it is.
I did not know this, thank you!
>> Again, I don't claim any of this is how it should be. But it's definitely how it is.
Agreed.
This doesn't seem true. I'd assume you know more about this than I do though so can you explain this in more detail? The concept of privacy is definitely more than thousands of years old. The concept of a "human right", is arguably much newer. Do you have particular evidence that a right to privacy is a harder argument to make that other human rights?
While the language differs, the right to privacy is enshrined more or less explicitly in many constitutions, including 11 USA states. It isn't just a "european" thing.
Nothing has been seen as a core right for thousands of years, as the concept of human rights is only a few hundred years old.
I completely agree with you, but as a ChatGPT user I have to admit my fault in this too.
I have always been annoyed by what I saw as shameless breaches of copyright of thousands of authors (and other individuals) in the training of these LLMs, and I've been wary of the data security/confidentiality of these tools from the start too - and not for no reason. Yet I find ChatGPT et al so utterly compelling and useful, that I poured my personal data[0] into these tools anyway.
I've always felt conflicted about this, but the utility just about outweighed my privacy and copyright concerns. So as angry as I am about this situation, I also have to accept some of the blame too. I knew this (or other leaks or unsanctioned use of my data) was possible down the line.
But it's a wake up call. I've done nothing with these tools which is even slightly nefarious, but I am today deleting all my historical data (not just from ChatGPT[1] but other hosted AI tools) and will completely reassess my approach of using them - likely with an acceleration of my plans to move to using local models as much as I can.
[0] I do heavily redact my data that goes into hosted LLMs, but there's still more private data in there about me than I'd like.
[1] Which I know is very much a "after the horse has bolted" situation...
100 years from now, nobody will GAF about the New York Times.
Copyright is not a natural right by any measure; it's something we pulled out of our asses a couple hundred years ago in response to a need that existed at the time. To the extent copyright interferes with progress, as it appears to have sworn to do, it has to go.
Sorry. Don't shoot the messenger.
It's not just about profits, it's about paying reporters to do honest work and not cut corners in their reporting and data collection.
If you think the data is valuable, then you should be prepared to pay the people who collect it, same as you pay for the service that collates it (ChatGPT)
Now, as you point out, companies like OpenAI have a problem, and so do the rest of us. Fair compensation for journalists and editors requires attribution before anything else can even be negotiated, and AI literally transforms its input into something that is usually (but obviously not always) untraceable. For the big AI players, the solution to that problem might involve starting or acquiring news and content networks of their own. Synergies that Microsoft and NBC were hoping might materialize could actually be feasible now.
So to answer your question, maybe ChatGPT will end up paying journalists directly.
Again, I don't know how plausible that kind of scenario might turn out to be. But I am absolutely certain that countries that allow their legacy rightsholders to impede progress in AI are going to be outcompeted by those with less to lose.
I sometimes wonder if people commenting on this topic on HN really understand how fundamental copyright as a concept is to the entire tech industry. And indeed even to capitalism itself.
It simply didn't. ChatGPT hasn't deleted any user data.
> "OpenAI did not 'destroy' any data, and certainly did not delete any data in response to litigation events," OpenAI argued. "The Order appears to have incorrectly assumed the contrary."
It's a bit of a stretch to think a big tech company like OpenAI is deleting users' data.
OpenAI already has a business, and not one they want to violate by having a massive amount of customer data stolen if they get hacked.
Well maybe some people in power have pressured the court into this decision? The New York Times surely has some power as well via their channels
Otherwise, you are picking your data privacy champions as the exact same companies, people and investors that sold us social media, and did something quite untoward with the data they got. Fool me twice, fool me three times… where is the line?
In other words - OAI has to save logs now? Candidly they probably were already, or it’s foolish not to assume that.
But also, no - Just self-host or it's all your fault is never ever a sufficient answer to the problem.
It's exactly the same as when Exxon says "what are you doing to lower your own carbon footprint?" It's shifting the burden unfairly; companies like OpenAI put themselves out there and thus must ALWAYS be held to task.
If you send your neighbour nudes then they have your nudes. You can put in as many contracts as you want, maybe they never digitised it but their friend is over for a drink and walks out of the door with the shoebox of film. Do not pass GO, do not collect.
Conceivably we can try to control things like e.g. is your cellphone microphone on at all times, but once someone else, particularly an arbitrary entity (e.g. not a trusted family member or something) has the data, it is silly to treat it as anything other than gone.
I wish it was different and I agree that there’s a massive accountability hole with… who could it be?
Pragmatically it is what it is, self host and hope for bigger picture change.
You lose your rights to privacy in your papers without a warrant once you hand data off to a third party. Nothing in this ruling is new.
Because the law favors preservation of evidence for an active case above most other interests. It's not a matter of arbitrary preference by the particular court.
There is a longstanding precedent with regards to business document retention, and chat logs have been part of that for years if not decades. The article tries to make this sound like this is something new, but if you look at the e-retention guidelines in various cases over the years this is all pretty standard.
For a business to continue operating, they must preserve business documents and related ESI under an appropriate legal hold to avoid spoliation. They likely weren't doing this, claiming the data was deleted, which is why the judge ruled against OAI.
This isn't uncommon knowledge either; it's required. E-discovery and Information Governance are things any business must meet in this area, and those documents are subject to discovery in certain cases, where OAI likely thought they could maliciously avoid it.
The matter here is OAI and its influence rabble are churning this trying to do a runaround on longstanding requirements that any IT professional in the US would have reiterated from their legal department/Information Governance policies.
There's nothing to see here, there's no real story. They were supposed to be doing this and didn't, were caught, and the order just forces them to do what any other business is required to do.
I remember an executive years ago (decades really), asking about document retention, ESI, and e-discovery and how they could do something (which runs along similar lines to what OAI tried as a runaround). I remember the lawyer at the time saying, "You've gotta do this or when it goes to court you will have an indefensible position as a result of spoliation...".
You are mistaken, and appear to be trying to frame this improperly towards a point of no accountability.
I suggest you review the longstanding e-discovery retention requirements that courts require of businesses to operate.
This is not new material, nor any different from what's been required for a long time now. All your hyperbole about privacy is without real basis, they are a company; they must comply with law, and it certainly is not outrageous to hold people who break the law to account, and this can only occur when regulatory requirements are actually fulfilled.
There is no argument here.
References: Federal Rules of Civil Procedure (FRCP) 1, 4, 16, 26, 34, 37
There are many law firms who have written extensively on this and related subjects. I encourage you to look at those too.
(IANAL) Disclosure: Don't take this as legal advice. I've had the opportunity to work with quite a few competent ones, but I don't interpret the law; only they can. If you need someone to provide legal advice seek out competent qualified counsel.
Can't you use the same arguments against, say, Copyright holders? Billionaires? Corporations doing the Texas two-step bankruptcy legal maneuver to prevent liability from allegedly poisoning humanity?
I sure hope so.
Edit: ... (up to a point)
Inevitably such a far-reaching state power will be abused for prurient purposes, for the sexual titillation of the investigators, and to suppress political dissent.
A computer in service of an individual absolutely follows copyright because the creator is in control of the distribution and direction of the content.
Besides, copyright is a civil statute, not criminal. Everything about this comment is the most obtuse form of FUD possible. I’m pro copyright reform, but this is “Uncle off his meds ranting on Facebook” unhinged and shouldn’t be given credence whatsoever.
Nope. https://www.justia.com/intellectual-property/copyright/crimi...
I don't understand what that means. A computer in service of an individual turns copyright law into mattress tag removal law: practically unenforceable.
The Twitter users quoted by Ars Technica, who cite "boomer copyright concerns" are pretty short sighted. The NYT and other mainstream sources, with all their flaws, provide the historical record that pundits can use to discuss issues.
Glenn Greenwald can only point out inconsistencies of the NYT because the NYT exists. It is often the starting point for discussions.
Some YouTube channels like the Grayzone and Breaking Points send reporters directly to press conferences etc. But that is still not the norm and important information should not be stored in a disorganized YouTube soup.
So papers like the NYT need to survive for democracy to function.
Ona T. Wang (she/her) ( https://www.linkedin.com/in/ona-t-wang-she-her-a1548b3/ ) would have a difficult time getting a job at OpenAI, but she is given the full force of US law to direct the company in almost any way she sees fit.
The wording is quite explicit, and forceful:
Accordingly, OpenAI is NOW DIRECTED to preserve and segregate all output log data that would otherwise be deleted on a going forward basis until further order of the Court (in essence, the output log data that OpenAI has been destroying), whether such data might be deleted at a user’s request or because of “numerous privacy laws and regulations” that might require OpenAI to do so.
Again, I have no idea how to fix this, but it seems broken.
"In the filing, OpenAI alleged that the court rushed the order based only on a hunch raised by The New York Times and other news plaintiffs. And now, without "any just cause," OpenAI argued, the order "continues to prevent OpenAI from respecting its users’ privacy decisions." That risk extended to users of ChatGPT Free, Plus, and Pro, as well as users of OpenAI’s application programming interface (API), OpenAI said."
This is the consequence and continuation of the dystopian reality we have been living for many years. One where a random person, influencer, media outlet, politician attacks someone, a company or an entity to cause harm and even total destruction (losing your job, company boycott, massive loss of income, reputation destruction, etc.). This morning, on CNBC, Palantir's CEO discussed yet another false accusation made against the company by --surprise-- the NY Times, characterizing it as garbage. Prior to that was the entirety of the media jumping on Elon Musk accusing him of being a Nazi for a gesture used by dozens and dozens of politicians and presenters, most recently Corey Booker.
Lies and manipulation. I do think that people are waking up to this and massively rejecting professional mass manipulators. We now need to take the next step and have them suffer real legal consequences for constant lies and, for Picard's sake, also address any influence they might have over the courts.
Accessing information on a customer's server or tenant (I have been assured) would require a court order for the customer directly.
But... as a 365 E5 user with an Azure account using 4o through Foundry... I am much more nervous than I have ever been.
Whereas parties to litigation that receive sensitive data are subject to limits on how it can be used.
There's no reasonable narrative in which OpenAI are not villains, but NYT is notoriously one to shoot a man in Reno just to see him die.
Be kind. Don't be snarky. Converse curiously; don't cross-examine. Edit out swipes.
Comments should get more thoughtful and substantive, not less, as a topic gets more divisive.
When disagreeing, please reply to the argument instead of calling names. "That is idiotic; 1 + 1 is 2, not 3" can be shortened to "1 + 1 is 2, not 3."
Please don't fulminate. Please don't sneer, including at the rest of the community.
Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith.
Eschew flamebait. Avoid generic tangents. Omit internet tropes.
Please don't use Hacker News for political or ideological battle. It tramples curiosity.
Of course, if OpenAI were scanning your chat history for verbatim NYT text and editing or deleting it, that would be another thing, but that itself would also get noticed.
However, I find it unlikely that OpenAI hasn't already built filters to prevent its output from appearing to be regurgitated NYTimes content.
ColinWright•1d ago
"After court order, OpenAI is now preserving all ChatGPT user logs, including deleted chats, sensitive chats, etc."
girvo•1d ago
I don't disagree, but that ship sailed at least 15+ years ago. Soft delete is the name of the game basically everywhere...
dijksterhuis•1d ago
https://ico.org.uk/for-the-public/your-right-to-get-your-dat...
jandrewrogers•1d ago
Databases holding data from multiple users largely can't work this way unless you are comfortable with a loss of performance of several orders of magnitude. It has been built many times, but performance is so poor that it is deemed unusable.
alisonatwork•1d ago
It is true that protecting the user's privacy costs more than not protecting it, but some organizations feel a moral obligation or have a legal duty to do so. And some users value their own privacy enough that they are willing to deal with the decreased convenience.
As an engineer, I find it neat that figuring out how to delete data is often a more complicated problem than figuring out how to create it. I welcome government regulations that encourage more research and development in this area, since from my perspective that aligns actually-interesting technical work with the public good.
jandrewrogers•1d ago
Unfortunately, this is a deeply hard problem in theory. It is not as though it has not been thoroughly studied in computer science. When GDPR first came out I was actually doing core research on “delete-optimized” databases. It is a problem in other domains. Regulations don’t have the power to dictate mathematics.
I know of several examples in multiple countries where data deletion laws are flatly ignored by the government because it is literally impossible to comply even though they want to. Often this data supports a critical public good, so simply not collecting it would have adverse consequences to their citizens.
tl;dr: delete-optimized architectures are so profoundly pathological for query performance, and to a lesser extent insert performance, that no one can use them for most practical applications. It is fundamental to the computer science of the problem. Denial of this reality leads to issues like the above, where non-compliance is required because the law didn't concern itself with the physics of computation.
If the database is too slow to load the data then it doesn’t matter how fast your deterministic hard deletion is because there is no data to delete in the system.
Any improvements in the situation are solving minor problems in narrow cases. The core theory problems are what they are. No amount of wishful thinking will change this situation.
alisonatwork•1d ago
Every database I have come across in my career has a delete function. Often it is slow. In many places I worked, deleting or expiring data cost almost as much as or sometimes more than inserting it... but we still expired the data because that's a fundamental requirement of the system. So everything costs 2x, so what? The interesting thing is how to make it cost less than 2x.
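To make the "cost less than 2x" part concrete: one common trick is to partition data by time and expire whole partitions at once instead of deleting individual rows. A toy Python/SQLite sketch of the idea follows; one table per day stands in for real partitioning, and all names are illustrative rather than anyone's actual system:

```python
import sqlite3
from datetime import date, timedelta

db = sqlite3.connect(":memory:")

def table_for(day: date) -> str:
    # One "partition" per day, e.g. events_2025_06_05
    return "events_" + day.isoformat().replace("-", "_")

def insert_event(day: date, payload: str) -> None:
    name = table_for(day)
    db.execute(f"CREATE TABLE IF NOT EXISTS {name} (payload TEXT)")
    db.execute(f"INSERT INTO {name} VALUES (?)", (payload,))

def expire_older_than(retention_days: int, today: date) -> None:
    # Dropping a whole table (or partition, in a real database) is one cheap
    # metadata operation per day of data, instead of scanning for expired rows.
    for offset in range(retention_days, retention_days + 365):
        db.execute(f"DROP TABLE IF EXISTS {table_for(today - timedelta(days=offset))}")
```

Dropping a partition is close to a metadata-only operation, which is why time-based expiry is usually much cheaper than per-row deletion.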
Gigachad•1d ago
Seems that companies are capable of moving mountains when the task is tracking the user and bypassing privacy protections. But when the task is deleting the users data it’s “literally impossible”
blagie•1d ago
It's easy enough to have a SQL query that deletes a user's data from the production database for real.
It's all the other places the data goes that's a mess, and a robust system of deletion via encryption could work fine in most of those places, at least in the abstract with the proper tooling.
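As a rough illustration of the "deletion via encryption" (crypto-shredding) idea: a minimal sketch assuming the `cryptography` package's Fernet API and a hypothetical per-user key table; this is not how any particular provider does it:

```python
import sqlite3
from cryptography.fernet import Fernet  # pip install cryptography

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE user_keys (user_id TEXT PRIMARY KEY, key BLOB);
    CREATE TABLE chats (user_id TEXT, ciphertext BLOB);
""")

def store_chat(user_id: str, text: str) -> None:
    row = db.execute("SELECT key FROM user_keys WHERE user_id = ?", (user_id,)).fetchone()
    if row is None:
        key = Fernet.generate_key()
        db.execute("INSERT INTO user_keys VALUES (?, ?)", (user_id, key))
    else:
        key = row[0]
    db.execute("INSERT INTO chats VALUES (?, ?)",
               (user_id, Fernet(key).encrypt(text.encode())))

def forget_user(user_id: str) -> None:
    # Ciphertext rows may linger in backups, replicas, and analytics extracts,
    # but without the key they are unreadable noise.
    db.execute("DELETE FROM user_keys WHERE user_id = ?", (user_id,))
```

The point is that copies of the ciphertext scattered through backups and downstream systems become unreadable once the key row is gone, which is what makes this workable "in most of those places" without chasing every copy.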
catlifeonmars•21h ago
You can instead switch to a password-based key derivation function for the row encryption key if you want the row to be encrypted under a user-provided password.
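Sketching that variant under the same assumption of the `cryptography` package (the KDF parameters are purely illustrative): the row key is derived from the user's password and a stored per-row salt, so there is no key to store or delete in the first place:

```python
import base64, os
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

def derive_row_key(password: str, salt: bytes) -> bytes:
    # The key is recomputed from the user's password on every access and never
    # stored; only the per-row salt sits next to the ciphertext.
    kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32, salt=salt, iterations=600_000)
    return base64.urlsafe_b64encode(kdf.derive(password.encode()))

salt = os.urandom(16)  # stored alongside the encrypted row
token = Fernet(derive_row_key("correct horse battery staple", salt)).encrypt(b"chat text")
```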
jandrewrogers•19h ago
The issue is that, at a minimum, you have added 32 bytes to a row just for the key. That is extremely expensive and in many cases will be a large percentage of the entire row; many years ago PostgreSQL went to heroic efforts just to shave 2 bytes of per-row overhead for performance reasons. It also limits you to row storage, which means query performance will be poor.
That aside, you overlooked the fact that you'll have to compute a key schedule for each row. None of the setup costs of the encryption can be amortized, which makes processing a row extremely expensive computationally.
There is no obvious solution that actually works. This has been studied and implemented extensively. The reason no one does it isn't because no one has thought of it before.
catlifeonmars•19h ago
Obviously context matters and there are some applications where the cost does not outweigh the benefit
catlifeonmars•9h ago
But I think there must also be constraints other than scale. The profit margins must also be razor thin.
alisonatwork•22h ago
The point is that none of these problems are insurmountable - they are all processes and practices that have been in place since long before GDPR and long before I started in this industry 25+ years ago. Even if deletion is only eventually consistent, even if a few pieces of data slip through the cracks, it is not hard to have policies in place that at least provide a best effort at upholding users' privacy and complying with the regulations.
Organizations who choose not to bother, claiming that it's all too difficult, or that because deletion cannot be done 100% perfectly it should not even be attempted at all, are making weak excuses. The cynical take would be that they are just covering for the fact that they really do not respect their users' privacy and simply do not want to give up even the slightest chance of extracting value from that data they illegally and immorally choose to retain.
crdrost•1d ago
Set a backup retention policy of 60 days; respond within a week or two telling the person that you have purged their data from the main database, that backups exist and cannot be changed, but that they will be automatically deleted within 60 days.
The only real difficulty is if those backups are actually restored: then the user deletions need to be replayed, which is something that would be easy to forget.
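A minimal sketch of that replay step, assuming a hypothetical `chats` table keyed by user_id and a deletion log kept outside the dataset being restored (otherwise restoring a backup would also roll back the log):

```python
import sqlite3, time

# Live application data (the thing that gets backed up and restored).
app_db = sqlite3.connect("app.db")
# Durable deletion log, stored separately from the backed-up dataset.
deletion_log = sqlite3.connect("deletion_log.db")
deletion_log.execute("CREATE TABLE IF NOT EXISTS deletions (user_id TEXT, requested_at REAL)")

def purge_user(user_id: str) -> None:
    # Record the request first, then delete from the live database.
    deletion_log.execute("INSERT INTO deletions VALUES (?, ?)", (user_id, time.time()))
    app_db.execute("DELETE FROM chats WHERE user_id = ?", (user_id,))

def replay_deletions(backup_taken_at: float) -> None:
    # Mandatory step in the restore runbook: re-apply every deletion requested
    # after the backup was taken, so restored rows don't resurrect purged data.
    rows = deletion_log.execute(
        "SELECT user_id FROM deletions WHERE requested_at > ?", (backup_taken_at,)
    ).fetchall()
    for (user_id,) in rows:
        app_db.execute("DELETE FROM chats WHERE user_id = ?", (user_id,))
```

Making the replay an explicit, scripted step of every restore is what keeps it from being the thing everyone forgets.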
miki123211•1d ago
With how modern systems, languages, databases and file systems are designed, deletion often means "mark this as deleted" or "erase the location of this data". This is true on all possible levels of the stack, from hardware to high-level application frameworks.
Changing this would slow computers down massively. Just to give a few examples: backups would be prohibited, and so would garbage collection and all existing SSD drives. File systems would have to wipe data on unlink(), which would increase drive wear and turn operations everyone has assumed were O(1) for years into O(n), and existing software isn't prepared for that. The same goes for zeroing out memory pages: OSes would have to be redesigned to do it all at once when a process terminates, and we just don't know what the performance impact of that would be.
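For the application-framework end of the stack, the usual "mark as deleted" pattern looks something like this (hypothetical schema, Python/SQLite purely for illustration):

```python
import sqlite3, time

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE chats (id INTEGER PRIMARY KEY, body TEXT, deleted_at REAL)")

def soft_delete(chat_id: int) -> None:
    # The row stays on disk (and in every replica and backup); queries just
    # agree to pretend it is gone.
    db.execute("UPDATE chats SET deleted_at = ? WHERE id = ?", (time.time(), chat_id))

def visible_chats():
    return db.execute("SELECT id, body FROM chats WHERE deleted_at IS NULL").fetchall()
```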
jandrewrogers•1d ago
There have been several attempts to build e.g. databases that worked this way. The performance and scalability was so poor compared to normal databases that they were essentially unusable.
girvo•1d ago
But that's not the only solve. It's easy to change the words we use instead to make it clear to users that the data isn't irrevocably deleted.
aranelsurion•1d ago
Maybe not today, in its heyday, but who knows what happens in 20 years, once OpenAI becomes the Yahoo of AI, or loses much of its value, gets scrapped for parts, and is bought by less sophisticated owners.
It's better to regard that data as already public.
jandrewrogers•1d ago
There have been many attempts to build e.g. databases that support deterministic hard deletes. Unfortunately, that feature is sufficiently ruinous to efficient software architecture that performance is extremely poor such that no one uses them.
ColinWright•20h ago
The original submission was a link to a post on Mastodon. The post itself was too long to fit in the title, so I trimmed it, and put the full post here in a comment.
But with the URL in the submission being changed, this doesn't really make sense any more! In the future I'll make sure I include in the comment the original link with the original text so it makes sense even if (when?) the submission URL gets changed.