Fully homomorphic encryption and the dawn of a private internet

449•barisozmen•7mo ago

Comments

dcow•7mo ago

Assuming speed gets solved as predicted, for an application like search, the provider would have to sync a new database of “vectors” to all clients every time the index updates. On top of that, these DBs are tens if not hundreds of GB huge.

blintz•7mo ago

I say this as a lover of FHE and the wonderful cryptography around it:

While it’s true that FHE schemes continue to get faster, they don’t really have hope of being comparable to plaintext speeds as long as they rely on bootstrapping. For deep, fundamental reasons, bootstrapping isn’t likely to ever be less than ~1000x overhead.

When folks realized they couldn’t speed up bootstrapping much more, they started talking about hardware acceleration, but it’s a tough sell at time when every last drop of compute is going into LLMs. What $/token cost increase would folks pay for computation under FHE? Unless it’s >1000x, it’s really pretty grim.

For anything like private LLM inference, confidential computing approaches are really the only feasible option. I don’t like trusting hardware, but it’s the best we’ve got!

ipnon•7mo ago

Don't you think there is a market for people who want services that have provable privacy even if it costs 1,000 times more? It's not as big a segment as Dropbox but I imagine it's there.

poly2it•7mo ago

???

For the equivalent of $500 in credit you could self host the entire thing!

haiku2077•7mo ago

You're not joking. If you're like most people and have only a few TiB of data in total, self hosting on a NAS or spare PC is very viable. There are even products for non-technical people to set this up (e.g. software bundled with a NAS). The main barrier is having an ISP with a sufficient level of service.

kube-system•7mo ago

Sure, hardware is cheap.

However if you actually follow the 3-2-1 rule with your backups, then you need to include a piece of real estate in your calculation as well, which ain’t cheap.

bcraven•7mo ago

I keep a small backup drive at my office which I bring home each month to copy my most sensitive documents and photos onto.

All my ripped media could be ripped again: I only actually have a couple of Tb of un-lose-able data.

adastra22•7mo ago

FHE is so much more expensive that it would still be cheaper.

palata•7mo ago

If you self-host your NAS, then your server has access to the data in clear to do fancy stuff, and you can make encrypted backups to any cloud you like, right?

haiku2077•7mo ago

Some people I know make a deal with a friend or relative to do cross backups to each others' homes. I use AWS Glacier as my archival backup, costs like 3 bucks a month for my data; you could make a copy onto two clouds if you like. There are tools to encrypt the backups transparently, like the rclone crypt backend.

dismalpedigree•7mo ago

I have true 3-2-1 backups on a server running proxmox with 32 cores, 96gb of ram, and 5TB of ssd disks (2TB usable for VMs). Cost me $1500 for the new server hardware 2 years ago. Runs in my basement and uses ~30w of power on average (roughly $2.50/mo). The only cloud part is the encrypted backups at backblaze which cost about $15/mo.

Its a huge savings over a cloud instance of comparable performance. The closest match on AWS is ~$1050/mo and I still have to back it up.

The only outage in 2 years was last week when there was a hardware failure of the primary ssd. I was back up and running within a few hours and had to leverage the full 3-2-1 backup depth, so I am confident it works.

If i was really desperate i could have deployed on a cloud machine temporarily while i got the hardware back online.

johnisgood•7mo ago

Only $1500? How much would this setup cost today?

haiku2077•7mo ago

Each 2TB of SSD is like $85, double it if you want local redundancy in your software RAID.

The rest is basically a nice custom PC minus a cool case and a high end GPU. A 9950X is $500, a 2x48GB kit is maybe $200. A few hundo more for a mobo, PSU and basic case.

dismalpedigree•7mo ago

Some quick checking on Newegg and I came up with this. https://newegg.io/64113a4 About $1200. I didn’t look into the power draw for this setup. Added bonus there is space for a GPU if you want to do some AI stuff.

dinosaurdynasty•7mo ago

You don't need homomorphic encryption for a backup, normal encryption suffices.

hoppp•7mo ago

But if you have a lot of data, self hosting is still cheaper.

Its always gonna be cheaper because you don't have the cloud provider's profit margin, which can be quite high.

ralferoo•7mo ago

It can be quite high, but it doesn't have to be. For instance, I have a 7TB storage server from Hosthatch that's $190 for 2 years. That's $7.92 per month, or £5.88 at today's exchange rates. That's under 20p per day.

Just on electricity costs alone, this is good value. My electricity costs are 22.86p/kWh which is pretty cheap for the UK. That means that if having that drive plugged in and available 24/7 uses more than 37W, it's more expensive to self host at home than rent the space via a server. Also, I've not needed to buy the drive or a NAS, nor do I have to worry about replacing hardware if it fails.

crtasm•7mo ago

Do they offer deals like that often? List price is "from $24/month" for 6TB (no further details provided without registering an account).

ralferoo•7mo ago

They tend to do promotions, typically only valid for 24h and only advertised on certain forums like LET, a couple of times per year - typically at least around their company anniversary date or Black Friday.

There are others too, e.g. Servarica who keep their Black Friday offers running all year round.

BolexNOLA•7mo ago

> There are others too, e.g. Servarica who keep their Black Friday offers running all year round.

I don’t understand the logic here so I’m going to assume I’m being obtuse. Doesn’t that just mean that’s their standard price? Why or how would you ever pay more?

ralferoo•7mo ago

Yeah, I kind of agree in the latter case. Black Friday deals often have lower priority support etc.

I guess with Servarica, they have their standard deals, but for Black Friday deals are generally thin margins, but still enough to cover costs. Typically every year, they have special deals that are a bit different to their previous offerings. As a result, some people prefer the previous deals, some prefer the new ones, so they keep them all going. It's a bit unusual. They've also got a few interesting deals, like start with N TB and it grows a bit every day. If you keep these more than about 3-4 years, these are probably better value for money, but I think you're paying too much in the first few years. It's interesting if your primary use case is incremental backups.

Hosthatch's deals are a bit different as they're usually preorders and at almost cost with basically minimal support, whereas they keep their normal stuff in stock and have higher support levels.

I should also add that I've not personally used Servarica, even though they look interesting - just because they only have a Canadian datacenter. I have 4 Hosthatch servers spread all over the globe so that I have more redundancy in my backups. I only buy them when they have deals, assuming I don't miss them as they're only for 24h.

haiku2077•7mo ago

For very large amounts of data, the cloud provider can hit economies of scale using tape drives ($$$$ to buy a tape drive yourself) or enterprise-class hard drives (very loud + high price of entry if you want redundancy + higher failure rate than other storage). That's why storing data in the slower storage classes in S3 and other object stores is so cheap compared to buying and replacing drives.

drcolly•7mo ago

The statements made in the linked description of this cannot be true, such as Google not being able to read what you sent them and not being able to read what they responded with.

Having privacy is a reasonable goal, but VPNs and SSL/TLS provide enough for most, and at some point your also just making yourself a target for someone with the power to undo your privacy and watch you more closely- why else would you go through the trouble unless you were to be hiding something? It’s the same story with Tor, VPN services, etc.- those can be compromised at will. Not to say you shouldn’t use them if you need to have some level of security functionally, but no one with adequate experience believes in absolute security.

NoImmatureAdHom•7mo ago

> The statements made in the linked description of this cannot be true, such as Google not being able to read what you sent them and not being able to read what they responded with.

The beautiful thing is: they are :-)

throwaway478484•7mo ago

If Google’s services can respond to queries, they must be able to read them.

If A uses a cereal box cipher and B has a cereal box cipher, B can can make sense of encoded messages A sends them, A can ask about the weather, and B can reply with an encoded response that A can decode and read. B is able to read A’s decoded query, and B knew what the weather was, and responded to A with that information.

Security is not magic.

eynsham•7mo ago

What do you think fully homomorphic encryption is, then?

NoImmatureAdHom•7mo ago

The thing that you find magical is not only actually possible but implemented and in use! What a day for you! Enjoy it, this is a rare event :-D

NoMoreNicksLeft•7mo ago

This is pointless, but I'll try anyway.

Yes, they can read both. But it's just gobbledygook to them. If you send them a "nonsense" query, they can reply with a "nonsense" response which is actually carefully computed to be something you can make sense of. But they can't make sense of it other than that it should be a relevant to the query you sent them.

mahmoudimus•7mo ago

there is, it's called governments. however this technology is so slow that using it in mission critical systems (think communication / coordinates during warfare) that it is not feasible IMO.

the parent post is right, confidential compute is really what we've got.

landl0rd•7mo ago

For LLM inference, the market that will pay $20,000 for what is now $20 is tiny.

oakwhiz•7mo ago

For most this would mean only specially treating a subset of all the sensitive data they have.

bawolff•7mo ago

If we are talking 1000x more latency, that is a pretty hard sell.

Something that normally takes 30 seconds now takes over 8 hours.

hoppp•7mo ago

Its like, python can be 400 times slower than C++, but people still use it.

bawolff•7mo ago

Yeah, because people use python when it doesn't matter and c++ when it does (including implicitly by calling modules that are backed by c implementations).

That is not an option with FHE. You have to go all in.

hoppp•7mo ago

Yes but with FHE it also depends on the use-case and how valuable the output is and who is processing it and decrypting the final output.

There are plenty of viable schemes like proxy re-encryption, where you operate on a symmetric key and not on a large blob of encrypted data.

Or financial applications where you are operating on a small set of integers, the speed is not an issue and the output is valuable enough to make it worth it.

It only becomes a problem when operating FHE on a large encrypted dataset to extract encrypted information. The data extracted will need to offset the costs. As long as companies don't care about privacy, this use-case is non-existent so its not a problem that its slow.

For military operations on the other hand, it might be worth the wait to run a long running process

reactordev•7mo ago

And people will use FHE where it matters and plaintext where it doesn’t…

klabb3•7mo ago

For compute, which is a small part of things computers do. Many things are I/O and network bound.

I’m not at all a fan of Python, but perf is the least of my concerns with it.

pxc•7mo ago

If Python devs/users had to actually use all pure Python libraries, no C bindings or Rust bindings, no RPC to binaries written in faster languages, it would get dropped for a ton of use cases, absolutely including its most prominent ones (machine learning, bioinformatics, numeric analysis, etc.).

cestith•7mo ago

It would probably especially include those before most others. The best thing about Python IMO is the FFI and the ecosystem built around it.

moffkalast•7mo ago

Or more like, something that normally takes 50ms like a http request, would take a minute.

taeric•7mo ago

Honestly, no? Unless you get everyone using said services, then a market that is only viable to people trying to hide bad behavior becomes the place you look for people doing bad things?

This is a large part of why you have to convince people to hide things even if "they have nothing to hide."

PeterisP•7mo ago

FHE solves privacy-from-compute-provider and doesn't affect any other privacy risks of the services. The trivial way to get privacy from the compute provider is to run that compute yourself - we delegate compute to cloud services for various reasonable efficiency and convenience reasons, but a 1000-fold less efficient cloud service usually isn't competitive with just getting a local device that can do that.

txdv•7mo ago

I get that there is a big LLM hype, but is there really no other application for FHE? Like for example trading algorithms (not the high speed once) that you can host on random servers knowing your stuff will be safe or something similar?

seanhunter•7mo ago

I speak as someone who used to build trading algorithms (not the high speed ones) for a living for several years, so knows that world pretty well. I highly doubt anyone who does that will host their stuff on random servers even if you had something like FHE. Why? Because it's not just the code that is confidential.

1) if you are a registered broker dealer you will just incur a massive amount of additional regulatory burden if you want to host this stuff in any sort of "random server"

2) Whoever you are, you need the pipe from your server to the exchange to be trustworthy, so no-one can MITM your connection and front-run your (client's) orders.

3) This is an industry where when people host servers in something like an exchange data center it's reasonably common to put them in a locked cage to ensure physical security. No-one is going to host on a server that could be physically compromised. Remember that big money is at stake and data center staff typically aren't well paid (compared to someone working for an IB or hedge fund), so social engineering would be very effective if someone wanted to compromise your servers.

4)Even if you are able to overcome #1 and are very confident about #2 and #3, even for slow market participants you need to have predictable latency in your execution or you will be eaten for breakfast by the fast players[1]. You won't want to be on a random server controlled by anyone else in case they suddenly do something that affects your latency.

[1] For example, we used to have quite slow execution ability compared with HFTs and people who were co-located at exchanges, so we used to introduce delays when we routed orders to multiple exchanges so the orders would arrive at their destinations at precisely the same time. Even though our execution latency was high, this meant no-one who was colocated at the exchange could see the order at one exchange and arb us at another exchange.

darkwater•7mo ago

But shouldn't proper FHE address most of these concerns? I mean, most of those extra measures are exactly because if you can physically access the server, it's game over. With FHE, if the code is trusted, even tampering with the hardware should not compromise the software.

seanhunter•7mo ago

How does FHE help with someone executing a process on the server that affects the latency of your trading algo? eg by sucking up the CPU resources you need to do FHE.

How does FHE help with the fact that regulators generally want single-tenant shared-nothing for registered broker/dealers? Have you tried to explain a technical mitigation like FHE to a financial regulator? I have, there are 2 standard responses:

1) (in the US) "We strongly prefer single-tenant shared nothing. I won't officially say whether or not we deem your technical mitigation of using FHE to be sufficient. If we think it's insufficient we may take regulatory action against you in the future. Us not taking action doesn't mean we think it's sufficient."

2) (in places like Switzerland) "We strongly prefer single-tenant shared nothing. I'm not sure I fully understand the technical mitigation of FHE you are putting in place, but I'm going to increase your regulatory capital reserves. Send us some more white papers describing the solution and we may not increase your capital reserves further".

Singapore is the only exception where you have a regulator who is tech-savvy and will give you a clear answer as to whether something or not is OK.

konstantinua00•7mo ago

why would latency matter if the trading we're talking about isn't high-speed?

seanhunter•7mo ago

I give a concrete example in the GP post but the reason is that the high-speed people can take advantage of you in certain circumstances if you don’t have extremely accurate timing of things like order placement.

As another example, imagine you are placing an options order on one exchange and a cash hedge on another exchange (eg for a delta hedge). If someone sees one half of your order and has faster execution than you, they can trade ahead of you on the other leg of your trade, which increases your execution cost. This is even more important if you’re doing something like an index options trade on one side and the cash basket (all the stocks in the index) on the hedge side.

The fix for this is to use hi-res exchange timestamps (which the exchange gives you on executed trades) to tune a delay on one leg of your execution so both halves hit at precisely the same time. This ensures that HFTs can’t derive an information advantage from seeing one half of your trade before you place the other half of the order.

toolslive•7mo ago

I encountered the situation where one company had the data, and considered this to be really valuable and did not want to show/share it. Another company had a model, which was very considered very valuable and did not want to show it. So they were stuck in a catch22. Eventually they solved the perceived risk via contracts, but it could have been solved technically if FHE were viable.

deknos•7mo ago

From your perspective: which FHE is actually usable? Or is only PHE actually usable?

Tryk•7mo ago

Interesting! Can you provide some sources for this claim?

reliabilityguy•7mo ago

Even without bootstrapping FHE will never be as fast as plaintext computation: the ciphertext is about three orders of magnitude much larger than the plaintext data it encrypts, which means you have to have more memory bandwidth and more compute. You can’t bridge this gap.

paulgerhardt•7mo ago

That actually sounds pretty reasonable and feels almost standard at this point?

To pick one out of a dozen possible examples: I regularly read 500 word news articles from 8mb web pages with autoplaying videos, analytics beacons, and JS sludge.

That’s about 3 orders of magnitude for data and 4-5 orders of magnitude for compute.

TechDebtDevin•7mo ago

I dont remember the last time I saw a news page that was <50mb

SilasX•7mo ago

There’s still Druge Report.

https://www.drudgereport.com

kiicia•7mo ago

This is basically rss

reliabilityguy•7mo ago

Sure, but downloading a lot of data is not the same as compute on this data. With web you simply download the data, and pass the pointers to this data around. With FHE, you have to compute on extremely large cipher texts, using every byte of them. FHE is roughly 1000x more data to process and it takes about 1000x more time.

blintz•7mo ago

Technically, there are rate-1 homomorphic encryption schemes, where ‘rate’ refers to the size ratio between the plaintext and the ciphertext. They’re not super practical, so your general point stands.

reliabilityguy•7mo ago

Oh, interesting. Can you point to a paper about one?

blintz•7mo ago

This is a good one: https://eprint.iacr.org/2019/720.pdf

reliabilityguy•7mo ago

Thank you, I’ll give it a read.

asah•7mo ago

Thx! I'm curious about your thoughts...

- FHE for classic key-value stores and simple SQL database tables?

- the author's argument that FHE is experiencing accelerated Moore's law, and therefore will close 1000x gap quickly?

Thx!

mti•7mo ago

There is an even more fundamental reason why FHE cannot realistically be used for arbitrary computation: it is that some computations have much larger asymptomatic complexity on encrypted data compared to plaintext.

A critical example is database search: searching through a database on n elements is normally done in O(log n), but it becomes O(n) when the search key is encrypted. This means that fully homomorphic Google search is fundamentally impractical, although the same cannot be said of fully homomorphic DNN inference.

blintz•7mo ago

There has been a theoretical breakthrough that makes search a O(log n) problem, actually, (https://eprint.iacr.org/2022/1703) but it is pretty impractical (and not getting much faster).

mti•7mo ago

Good point. Note however that PIR is a rather restricted form of search (e.g., with no privacy for the server), but even so, DEPIR has polylog(n) queries (not log n), and requires superlinear preprocessing and a polynomial blowup in the size of the database. I think recent concrete estimates are around a petabyte of storage for a database of 2^20 words. So as you say, pretty impractical.

tonetegeatinst•7mo ago

I'd also like to comment on how everything used to be a PCIE expansion card.

Your GPU was, and we also used to have dedicated math coprocessor accelerators. Now most of the expansion card tech is all done by general purpose hardware, which while cheaper will never be as good as a custom dedicated silicon chip that's only focused on 1 task.

Its why I advocate for a separate ML/AI card instead of using GPU's. Sure their is hardware architecture overlap but your sacrificing so much because your AI cards are founded on GPU hardware.

I'd argue the only AI accelerators are something like what goes into modern SXM (sockets). This ditches the power issues and opens up more bandwidth. However only servers have the sxm sockets....and those are not cheap.

pxeger1•7mo ago

> most of the expansion card tech is all done by general purpose hardware, which while cheaper will never be as good as a custom dedicated silicon chip that's only focused on 1 task

I think one reason they can be as good as or better than dedicated silicon is that they can be adjusted on the fly. If a hardware bug is found in your network chip, too bad. If one is found in your software emulation of a network chip, you can update it easily. What if a new network protocol comes along?

Don't forget the design, verification, mask production, and other one-time costs of making a new type of chip are immense ($millions at least).

> Its why I advocate for a separate ML/AI card instead of using GPU's. Sure their is hardware architecture overlap but your sacrificing so much because your AI cards are founded on GPU hardware.

I think you may have the wrong impression of what modern GPUs are like. They may be descended from graphics cards (as in graphics ), but today they are designed fully with the AI market in mind. And they are design to strike an optional balance between fixed functionality for super-efficient calculations that we believe AI will always need, and programmability to allow innovation in algorithms. Anything more fixed would be unviable immediately because AI would have moved on by the time it could hit the market (and anything less fixed would be too slow).

benlivengood•7mo ago

I think the only thing that could make FHE truly world-changing is if someone figures out how to implement something like multi-party garbled circuits under FHE where anyone can verify the output of functions over many hidden inputs since that opens up a realm of provably secure HSMs, voting schemes, etc.

bruce511•7mo ago

I get the "client side" of this equation; some number of users want to keep their actions/data private enough that they are willing to pay for it.

What I don't think they necessarily appreciate is how expensive that would be, and consequently how few people would sign up.

I'm not even assuming that the compute cost would be higher than currently. Let's leave aside the expected multiples in compute cost - although they won't help.

Assume, for example, a privacy-first Google replacement. What does that cost? (Google revenue is a good place to start that Calc.) Even if it was say $100 a year (hint; it's not) how many users would sign up for that? Some sure, but a long long way away from a noticeable percentage.

Once we start adding zeros to that number (to cover the additional compute cost) it gets even lower.

While imperfect, things like Tor provide most of the benefit, and cost nothing. As an alternative it's an option.

I'm not saying that HE is useless. I'm saying it'll need to be paid for, and the numbers that will pay to play will be tiny.

barisozmen•7mo ago

An FHE Google today would be incredible expensive and incredibly slow. No one would pay for it.

The key question I think is how much computing speed will improve in the future. If we assume FHE will take 1000x more time, but hardware also becomes 1000x faster, then the FHE performance will be similar to today's plaintext speed.

Predicting the future is impossible, but as software improves and hardware becoming faster and cheaper every year, and as FHE provides a unique value of privacy, it's plausible that at some point it can become the default (if not 10 years, maybe in 50 years).

Today's hardware is many orders of magnitudes faster compared to 50 years ago.

There are of course other issues too. Like ciphertext size being much larger than plaintext, and requirement of encrypting whole models or indexes per client on the server side.

FHE is not practical for most things yet, but its venn diagram of feasible applications will only grow. And I believe there will be a time in the future that its venn diagram covers search engines and LLMs.

demaga•7mo ago

> If we assume FHE will take 1000x more time, but hardware also becomes 1000x faster, then the FHE performance will be similar to today's plaintext speed

Yeah but this also means you can do 1000x more things on plaintext.

latentsea•7mo ago

But think of the children?

paulrudy•7mo ago

> FHE enables computation on encrypted data

This is fascinating. Could someone ELI5 how computation can work using encrypted data?

And does "computation" apply to ordinary internet transactions like when using a REST API, for example?

pluto_modadic•7mo ago

a simple example of partial homomorphic encryption (not full), would be if a system supports addition or multiplication. You know the public key, and the modulus, so you can respect the "wrap around" value, and do multiplication on an encrypted number.

other ones I imagine behave kinda like translating, stretching, or skewing a polynomial or a donut/torus, such that the point/intercepts are still solveable, still unknown to an observer, and actually represent the correct mathematical value of the operation.

just means you treat the []byte value with special rules

paulrudy•7mo ago

Thank you. So based on your examples it sounds like the "computation" term is quite literal. How would this apply at larger levels of complexity like interacting anonymously with a database or something like that?

strangecasts•7mo ago

There are FHE schemes which effectively allow putting together arbitrary logical circuits, so you can make larger algorithms FHE by turning them into FHE circuits -- Jeremy Kun's 2024 overview [1] has a good summary

[1] https://www.jeremykun.com/2024/05/04/fhe-overview/ - discussed previously: https://news.ycombinator.com/item?id=40262626

dachrillz•7mo ago

A very basic way of how it works: encryption is basically just a function e(m, k)=c. “m” is your plaintext and “c” is the encrypted data. We call it an encryption function if the output looks random to anyone that does not have the key

If we could find some kind of function “e” that preserves the underlying structure even when the data is encrypted you have the outline of a homomorphic system. E.g. if the following happens:

e(2,k)*e(m,k) = e(2m,k)

Here we multiplied our message with 2 even in its encrypted form. The important thing is that every computation must produce something that looks random, but once decrypted it should have preserved the actual computation that happened.

It’s been a while since I did crypto, so google might be your friend here; but there are situations when e.g RSA preserves multiplication, making it partially homomorphic.

littlecranky67•7mo ago

I get how that works for arithmetic operations - what about stuff like sorting, finding an element in a set etc? This would require knowledge of the cleartext data, wouldn't it?

barisozmen•7mo ago

You can reduce anything happening on the computer to arithmetic operations. If you can do additions and multiplications, then it's turing complete. All others can be constructed from them.

littlecranky67•7mo ago

While correct, that doesn't answer the question at all, though. If I have my address book submited into an FHE system and want to sort by name - how do you do that if the FHE system does not have access to cleartext names?

barisozmen•7mo ago

You can do that by encrypting the names. You send encrypted names to the FHE-server, and then the server does necessary sorting computations on it.

The point of FHE is it can operate on gibberish-looking ciphertext, and when this ciphertext decrypted afterwards, the result is correct.

Indeed, there are those working on faster FHE sorting: https://eprint.iacr.org/2021/551.pdf

gadders•7mo ago

Honestly it breaks my brain as well. I just have to take it on trust that it apparently works.

Tryk•7mo ago

When comparing two ciphertexts A,B a FHE sorting function will output a sorted pair of two new ciphertexts:

E.g. FHE_SORT(A,B) -> (X,Y)

where Dec(X)<Dec(Y)

But without decoding, there's no way of knowing whether X (or Y) comes from A or B.

Source: II. D of https://eprint.iacr.org/2015/995.pdf

dcow•7mo ago

It’s not that simple. The client has to send the server the comparison function.

To do anything practical the server usually needs to provide the client with gigabytes of per-client-key encrypted seed data.

j2kun•7mo ago

Comparisons can be implemented by approximating a < b with

    0.5 * (sign(a - b) + 1)

And the sign function can be approximated by a polynomial that uses only additions and multiplications and products with constants.

Other FHE schemes have support for small-bitwidth lookup tables that makes supporting comparison more direct.

xhrpost•7mo ago

This made me wonder if there is such a thing as homomorphic compression. A cursory search says yes but seems like limited information.

Tryk•7mo ago

What do you mean by homomorphic compression?

Given that the operations you can execute on the ciphertext are Turing complete (it suffices to show that we can do addition and multiplication) then it follows that any conceivable computation can be performed on the ciphertext.

xhrpost•7mo ago

Oh this is outside the context of encryption. My curiosity was, is there such a compression function that permits operations on the compressed data without first decompressing it?

dachrillz•7mo ago

One that is kind of in this spirit is that you can describe sparse matrices by omitting all the zeros and only describe the indices that have data. In this compression you can still perform normal matrix operations without having to unpack them into the “normal form”. Now this is neither encryption nor a particularly interesting compression, but it does prove that it is possible in principle ;p

JohnFen•7mo ago

> If we could find some kind of function “e” that preserves the underlying structure even when the data is encrypted

But isn't such a function a weakened form of encryption? Properly encrypted data should be indistinguishable from noise. "Preserving underlying structure" seems to me to be in opposition to the goal of encryption.

paulrudy•7mo ago

Thank you, this really clarified things for me!

harvie•7mo ago

Ok, lets stop being delusional here. I'll tell you how this will actualy work:

Imagine your device sending Google an encrypted query and getting back the exact results it wanted — without you having any way of knowing what that query was or what result they returned. The technique to do that is called Fully Homomorphic Encryption (FHE).

pluto_modadic•7mo ago

queries are Oblivious Transfer - a second limited case of FHE that actually addresses the filter threat model.

teo_zero•7mo ago

I think the opening example involving Google is misleading. When I hear "Google" I think "search the web".

The articles is about getting an input encrypted with key k, processing it without decrypting it, and sending back an output that is encrypted with key k, too. Now it looks to me that the whole input must be encrypted with key k. But in the search example, the inputs include a query (which could be encrypted with key k) and a multi-terabyte database of pre-digested information that's Google's whole selling point, and there's no way this database could be encrypted with key k.

In other words this technique can be used when you have the complete control of all the inputs, and are renting the compute power from a remote host.

Not saying it's not interesting, but the reference to Google can be misunderstood.

ElFitz•7mo ago

> Now it looks to me that the whole input must be encrypted with key k. But in the search example, the inputs include a query […] and a multi-terabyte database […]

That’s not the understanding I got from Apple’s CallerID example[0][1]. They don’t seem to be making an encrypted copy of their entire database for each user.

[0]: https://machinelearning.apple.com/research/homomorphic-encry...

[1]: https://machinelearning.apple.com/research/wally-search

yorwba•7mo ago

They do not explicitly state this fact, but they link to the homomorphic encryption scheme they're using, which works like this. To perform an operation between a plaintext value and an encrypted value, you first encrypt the plaintext with the public key and then you can do your operation on the encrypted values to get the encrypted output.

Moreover, even if the details were slightly different, a scheme that reveals absolutely no information about the query while interacting with a database always needs to do a full scan. If some parts remain unread depending on the query, this tells you what the query wasn't. If you're okay with revealing some information, you can also hash the query and take a short prefix of the hash with many colliders, then only scan values with the same hash prefix. This is how browsers typically do safe browsing lookups, but by downloading that subset of the database instead of doing the comparison homomorphically on the server.

ElFitz•7mo ago

Is that what they mean in the Wally paper post by

> In previous private search systems, for each client query, the server must perform at least one expensive cryptographic operation per database entry.

yorwba•7mo ago

Exactly. (I had only looked at the homomorphic encryption post, not the Wally post.) Wally tries to work around this limitation by only using homomorphic encryption for a subset of the database, and reducing the resulting information leakage by using an anonymous network to hide which client is querying which subset. They say this network is operated by a third party, but ultimately you still have to trust that the network operator isn't colluding with the server operator to deanonymize your queries. That's a weaker privacy guarantee, but at least it's not painfully slow.

meindnoch•7mo ago

Homomorphically encrypted services don't need a priori knowledge of the encryption key. That's literally the whole point.

Consider the following (very weak) encryption scheme:

m, k ∈ Z[p], E(m) = m * k mod p, D(c) = c * k⁻¹ mod p

With this, I can implement a service that receives two cyphertexts and computes their encrypted sum, without knowledge of the key k:

E(x) + E(y) = x * k + y * k mod p = (x + y) * k mod p = E(x + y)

Of course, such a service is not too interesting, but if you could devise an algebraic structure that supported sufficiently complex operations on cyphertexts (and with a stronger encryption), then by composing these operations one could implement arbitrarily complex computations.

yorwba•7mo ago

In the case of searching Google, E(x) is the encrypted query and y is Google's database. Can you compute E(x + y) without doing at least as much work as computing E(y)? I don't think so. Instead, you use public key cryptography so that the server can compute E(y) (yes, encrypting the entire database) without being able to decrypt D(E(x)) = x.

meindnoch•7mo ago

Wrong. In the case of searching, the database is the function that you feed the input query into.

E.g. consider the following system:

E(x) = x ^ k, D(x) = x ^ k

So a one-time pad. Let's say that I provide a service that lets you decide whether a number is even or odd:

IsOdd(E(x)) = E(x) mod 2

You give it an encrypted number, and it gives you back an encrypted bit that you can decrypt to see if the original number was even or not. All while it has zero knowledge of your plaintext number.

A homomorphically encrypted database is just like IsOdd(x), except orders of magnitude more complex. One idea is that any computation can be turned into a Boolean circuit, so if you have homomorphic building blocks for Boolean circuits, you can implement any computation. Obviously, some caveats apply, like all loops have to be unrolled, etc. That's why the whole thing is so inefficient. But mathematically it works.

yorwba•7mo ago

If I'm generous, that "database" stores a single number. If you transform a more realistic database (one that can store many arbitrary values) into a Boolean circuit, you'll generally end up with at least one operation per stored value. And when you evaluate the circuit on encrypted data, you have to evaluate all those operations every time. That's why I wrote "without doing at least as much work as computing E(y)" instead of just "without computing E(y)." Yes, you do not necessarily explicitly encrypt all stored values. But you'll end up performing a corresponding amount of computation anyways.

bcrl•7mo ago

It's an excellent way of upholding Wirth's Law: software will continue to get slower more rapidly than hardware becomes faster.

teo_zero•7mo ago

> IsOdd(E(x)) = E(x) mod 2

I don't get this. Did you mean

IsOdd(E(x)) = x mod 2

But who provides such function? In the case of a web search,the function is the search engine. I expect Google to provide it, not the user: the refinement and completeness level of its engine is the only reason to choose Google over its competitor. If I have to provide the function, then I'm only buying the compute power from them.

meindnoch•7mo ago

>I don't get this. Did you mean

>IsOdd(E(x)) = x mod 2

No, that's literally the point. IsOdd() operates on the cyphertext E(x). It doesn't see the plaintext x. And yet, due to its algebraic properties, the server can answer the query without decrypting it.

For example:

  Client:
    x = 13
    k = 45
    E(x) = 13 xor 45 = 32

  Server:
    IsOdd(E(x)) = 32 mod 2 = 0

  Client:
    D(IsOdd(E(x))) = D(0) = 0 xor TrimLength(45, 1) = 1 (we trim the decryption key to the length of the response, which is 1bit)

So what happens in this case is the server sends you a bit, which you'll either flip or not flip, depending on the key k. The server doesn't know whether you're going to flip its answer or not, so it doesn't know if your plaintext number is odd or even.

teo_zero•7mo ago

Now I see it. Thank you.

What I still don't understand is how this can be used to have the remote host give me responses that I couldn't have calculated myself locally, because all the data is stored in the cloud, not locally (that's how Google search works).

meindnoch•7mo ago

Well, consider a function that is vastly more complex than IsOdd() from my previous example. IsOdd() was a function from Z[2]^n to Z[2]. Instead, imagine a function from Z[2]^n to Z[2]^m, aka a general N-to-M bit Boolean function. Any such Boolean function can be represented as a logic circuit, that is, AND/OR/NOT gates and connections between them. Now, if we can implement these logic gates (and the connections between them) homomorphically, then we can implement a homomorphic realization of arbitrary functions. One "small" caveat when transforming programs into logic circuits is that all data must be inlined, all loops must be unrolled, all branches must be executed.

charles_f•7mo ago

I don't know, when I hear Google I hear Gmail, Google docs, and every other service they have to know about people. My mom would probably think mostly about search, but then she would not read an article about HME

aitchnyu•7mo ago

E2EE git was invented. I asked the creator if server can enforce protected branches or force pushes. He has no solution for evil clients. Maybe this could lead to E2EE Github?

https://news.ycombinator.com/item?id=44530927

athrowaway3z•7mo ago

> Internet's "Spy by default" can become "Privacy by default".

I've been building and promoting digital signatures for years. Its bad for people and market-dynamics to have Hacker News or Facebook be the grand arbiter of everyone's identity in a community.

Yet here we are because its just that much simpler to build and use it this way, which gets them more users and money which snowballs until alternatives dont matter.

In the same vein, the idea that FHE is a missing piece many people want is wrong. Everything is still almost all run on trust, and that works well enough that very few use cases want the complexity cost - regardless of operation overhead - to consider FHE.

JumpCrisscross•7mo ago

> that works well enough that very few use cases want the complexity cost

FHE + AI might be the killer combination, the latter sharing the complexity burden.

immibis•7mo ago

Is there any reason to think this is a meaningful combination, or do you just like saying the word AI?

JumpCrisscross•7mo ago

Potentially the only thing AI is good at is trudging through tedium. The barrier OP identified for FHE is tedium.

Searching Google query by query as one prosecutes a question with FHE would be annoying. Asking an on-device LLM to go back and forth with Google using FHE is not. I'm also assuming that FHE won't cover all operations, and that its coverage would be both constantly changing and well documented, which is another place where an LLM could abstract away smoothly failing back from FHE to open querying.

Put another way, AIs' text-first prompt-oriented UI seems to be a good fit for FHE in a way that e.g. a dashboard is not.

immibis•7mo ago

The best tool for trudging through tedium is a for loop.

lcnPylGDnU4H9OF•7mo ago

What do you do in the for loop?

zahlman•7mo ago

One iteration of the computation, presumably.

bigfishrunning•7mo ago

> I've been building and promoting digital signatures for years.

I agree with this wholeheartedly, and yet I do get the following question a lot "What's all that nonsense at the end of your emails". Any explanation is met with eye-rolls and 1000 yard stares. Have you managed to get laypeople on-board with any kind of client-side cryptography? how?

derangedHorse•7mo ago

Everything needs to be built into the application in a way that users don't notice if they don't care. A signature at the end of your emails is more cryptic than the recipient's email client showing an icon with a green checkmark or something similar. I take Chrome's rollout of tls-enforcement by default as a great example of this. All I would had to say to people was "check that the padlock next to your url bar is green."

gblargg•7mo ago

The idea that these will keep being improved on in speed reminds me of the math problem about average speed:

> An old car needs to go up and down a hill. In the first mile–the ascent–the car can only average 15 miles per hour (mph). The car then goes 1 mile down the hill. How fast must the car go down the hill in order to average 30 mph for the entire 2 mile trip?

Past improvement is no indicator of future possibility, given that each improvement was not re-application of the same solution as before. These are algorithms, not simple physical processes shrinking.

perching_aix•7mo ago

41 mph, assuming the person asking the question was just really passionate about rounding numbers and/or had just the bare minimum viable measurement tooling available :)))

swores•7mo ago

I'm afraid your maths doesn't add up, so you've missed their point: it can't be done.

To average 30mph over 2 miles, you need to complete those 2 miles in 4 minutes.

But travelling the first mile at 15mph means that took 4 minutes. So from that point the only way to do a second mile and bring your average to 30mph is to teleport it in 0 seconds.

(Doing the second mile at 41mph would give you an average speed of just under 22mph for the two miles.)

perching_aix•7mo ago

Of course.

My math only "checks out" if you accept and account for the additional assumption I made there: that the datapoints provided in the question have been rounded or were low resolution from the get-go.

The motivation behind this assumption is twofold: the numbers in the question are awfully whole (atypical for any practical problem), and that just the rote derivation of it all doesn't produce very interesting results (gives you the infinite speed answer). :)

Try introducing some error terms and see how the result changes! It's pretty fun, and it's how I was able to eek out that 41 mph result in the end.

swores•7mo ago

Considering the first mile would need to have been faster than 23mph for your 41mph to give an average of 30... your answer is either completely wrong, or is "if we pretend that the numbers are completely different to what they are then my answer is right", either way it just seems pointlessly wrong rather than pretty fun. But I guess good for you if you enjoyed working out that answer.

perching_aix•7mo ago

I can appreciate if someone doesn't (or if most people don't, even) see the fun in this, sure.

> Considering the first mile would need to have been faster than 23mph

That said, note that it's not just the up-leg's average speed that's been provided (15 mph), but also the distance as you say (1 mile). If you explore the error term for both of these, you'll see that it's not necessary to go at the ludicrously high speed of 23 mph after all. """15 mph""" (15.444... mph) will do just fine.

swores•7mo ago

If you go as far as assuming the mile distance is wrong then the entire question is pointless, maybe they've already travelled more than two miles and they need to go backwards to average 30mph! At that point literally any number is just as correct as 41 is...

perching_aix•7mo ago

It does allow for making any number between 40.833546516585657030783661635542 (exact) and ∞ work, 41 just being the lowest whole number that works. Travelling backwards, being arbitrarily wrong, or any arbitrary number working doesn't fit these new constraints still, however.

ralferoo•7mo ago

Is the downhill section a cliff? Google informs me terminal velocity of a car is 200-300mph, so to fall a mile at 300mph, the car will need 12 seconds, so let's round up to 15 seconds to account for the time it's accelerating.

To cover the full 2 miles at an average of 30mph, we need to complete the entire journey in 4 minutes, leaving 225 seconds for the ascent.

We know that the old car was averaging 15 miles per hour, but the speedo on an old car is likely inaccurate, and we only need to assume a 6% margin of error for the car to show 15 miles per hour and cover the mile in 225 seconds. You probably couldn't even tell the difference between 15 and 16 on the speed anyway, but let's say that we also fitted out the car with brand new tyres (so the outer circumference will be more than old worn tyres), and it's entirely possible.

So, let's say 240mph. That's the average speed of our mile freefall in 15 seconds.

DeathArrow•7mo ago

Most states will probably either forbid this or demand back doors.

latentsea•7mo ago

FHE stands for Federally Hacked Encryption

IshKebab•7mo ago

I think this should talk about the kinds of applications you can actually do with FHE because you definitely can't implement most applications (not at a realistic scale anyway).

j2kun•7mo ago

You might enjoy https://jeremykun.com/fhe-in-production

IshKebab•7mo ago

I meant it should say what can't be implemented. What are the constraints? Currently the article makes it sound like you can do anything, just slower, which definitely isn't the case.

utf_8x•7mo ago

As someone who knows basically nothing about cryptography - wouldn't training an LLM to work on encrypted data also make that LLM extremely good at breaking that encryption?

I assume that doesn't happen? Can someone ELI5 please?

mynameismon•7mo ago

From my understanding of cryptography, most schemes are created with the assumption that _any_ function that does not have access to the secret key will have a probabilistically small chance of decoding the correct message (O(exp(-key_length)) usually). As LLMs are also a function, it is extremely unlikely for cryptographic protocols to be broken _unless_ LLMs can allow for new types of attacks all together.

4gotunameagain•7mo ago

Because math. The data that would be necessary to train an LLM to break (properly) encrypted information would be indistinguishable from random bytes.

How do you train a model when the input has no apparent correlation to the output ?

strangecasts•7mo ago

Good encryption schemes are designed so that ciphertexts are effectively indistinguishable from random data -- you should not be able to see any pattern in the encrypted text without knowledge of the key and the algorithm.

If your encryption scheme satisfies this, there are no patterns for the LLM to learn: if you only know the ciphertext but not the key, every continuation of the plaintext should be equally likely, so trying to learn the encryption scheme from examples is effectively trying to predict the next lottery numbers.

This is why FHE for ML schemes [1] don't try to make ML models work directly on encrypted data, but rather try to package ML models so they can run inside an FHE context.

[1] It's not for language models, but I like Microsoft's CryptoNets - https://www.microsoft.com/en-us/research/wp-content/uploads/... - as a more straightforward example of how FHE for ML looks in practice

reliabilityguy•7mo ago

I am confused: you can implement LLM learning with FHE. It’s a different problem than learning on encrypted data.

strangecasts•7mo ago

I didn't mean to suggest otherwise! That's why I also linked the CryptoNets paper - to show that you're transforming the inference to happen inside an FHE context, not trying to learn encrypted data

reliabilityguy•7mo ago

Yes, you can do Cryptonets. What I’m saying is that you don’t have to do cryptonets, you can simply use FHE to train the network in fully encrypted manner: both the network and the data are FHE-encrypted, so the training itself is an FHE application. It would be insanely slow and I doubt it can be done today even for “small” LLMs due to high overheads of FHE.

derangedHorse•7mo ago

> This is why FHE for ML schemes [1] don't try to make ML models work directly on encrypted data, but rather try to package ML models so they can run inside an FHE context.

I don't think @strangecasts was trying to say you couldn't. I believe their point was that you can't have a model learn to coherently respond to encrypted inputs with just traditional learning mechanisms (so without FHE). Doing so would require an implicit breaking of the encryption scheme by the model because it would need a semantic understanding of the plaintext to provide a cogent, correctly encrypted response.

strangecasts•7mo ago

That's a much better way of phrasing what I was trying to get at :)

zkmon•7mo ago

What baffles me is, how can code perform computations and comparisons on data that is still encrypted in memory.

VMG•7mo ago

> 3. Data while processing is un-encrypted, as code need to 'see' the data

read the article again

baby•7mo ago

code in FHE doesn't need to see the data

tsimionescu•7mo ago

It's simple conceptually: you find an encryption method Enc that guarantees `Sum(Enc(x), Enc(y)) = Enc(Sum(x, y))`. That's ultimately all there is to it. Then, you give the server enc_x and enc_y, the server computes the sum, and returns to you enc_sum. You then decrypt the value you got and that's x+y.

Since lots of functions behave in this way in relation to sums and products, you "just" need to find ones that are hard to reverse so they can be used for encryption as well.

Unfortunately this turns out to not work so simply. In reality, they needed to find different functions FHESum and FHEMultiply, that are actually much harder to compute (1000x more CPU than the equivalent "plaintext" function is a low estimate of the overhead) but that guarantee the above.

orwin•7mo ago

I interrupted this fascinating read to tell that "actually", quantum computers are great at multi-dimensional calculation if you find the correct algorithms. It's probably the only thing they will ever be great at. You want to show that finding the algorithm is not possible with our current knowledge.

anyway, making the computer do the calculation is one thing, getting it to spew the correct data is another.... But still, the article (which seems great at the moment) brushes it of a bit too quickly.

redleader55•7mo ago

Full homomorphic encryption is not the future for private internet, confidential VMs are. CVMs are using memory encryption and separation from the host OS. ARM has TEE, AMD has SEV and Intel has been fumbling around with SGX and TDX for more than a decade.

udev4096•7mo ago

https://sgx.fail

Retr0id•7mo ago

I think SGX (et al) can still be useful as part of a layered defense. We know how to defeat security mitigations like NX and ASLR, but that doesn't mean they're useless.

The problem is that SGX is marketed as the solution.

immibis•7mo ago

NX and ASLR make it harder for other people to exploit your code on your computer. SGX tries to make it easier for other people to run code on your computer without you seeing the code or what it's doing. They're not in the same category.

Retr0id•7mo ago

SGX on consumer client devices is sucky for that reason, but SGX on the server can be used to defend user interests.

If I put my sensitive customer data inside SGX (such that I can operate on it but not extract it), and the nation-state adversary says "we have a warrant for your customer data, hand it over", I can reasonably say "I can't".

I could also produce attestations that my code really is running inside SGX, verifiable by clients (this is a weak proof since it assumes SGX is not compromised, but it's better than nothing).

The adversary may demand physical access to the server pwn SGX themselves, but like bypassing ASLR or NX, that's an extra step. They're only going to bother if they really care about that data.

immibis•7mo ago

SGX on the server is breakable if and only if SGX on the client is breakable. You can either own other people's computers, or you can prevent other people owning your computer. You can't eat your cake and have it.

Yes, it might be good for ass-covering as you indicate. A lot of ineffective technical solutions are effective legal liability shields anyway. But if this becomes mainstream, the NSA will develop something they can covertly (or not) install on any such server to break SGX, so make sure you have a backup plan anyway.

Also note that Intel removed SGX from their processors because it was breakable and underused.

Retr0id•7mo ago

They removed SGX but are still working on SGX-like technologies (I forget the acronyms) specifically for server-oriented processors.

I'm sure the NSA already has various tools to break SGX but they'll be protective of that investment, they're probably not going to be using them against lower-priority targets.

I used NX and ASLR as a point of comparison because they are mitigations that are routinely bypassed - but we still usually consider them a good idea.

glitchc•7mo ago

As long as the key and compute are custodied by the vendor, confidential compute is little more than "trust us, we'll keep your data safe."

Retr0id•7mo ago

> The entire business model built on harvesting user data could become obsolete.

This is far too optimistic. Just because you can build a system that doesn't harvest data, doesn't necessarily mean it's a profitable business model. I'm sure many of us here would be willing to pay for a FHE search engine, for example, but we're a minority.

meindnoch•7mo ago

Yeah, I can totally see companies rushing to implement homomorphically encrypted services that consume 1000000x more compute than necessary, are impossible to debug, and prevent them from analyzing usage data.

charcircuit•7mo ago

How do you send a password reset email with this. Eventually your mail server will need the plaintext address in order to send the email. And that point can be leaked in a data breach.

It's idealistic to think this could solve data braches because businesses knowing who their customers are is such a fundamental concept.

johnisgood•7mo ago

A password reset e-mail is supposed to expire pretty quickly though, so would it really matter in practice?

charcircuit•7mo ago

The email must be able to be used at any time which means that and attacker may be able to also "use" them.

j2kun•7mo ago

I don't think this is possible with FHE alone.

perlgeek•7mo ago

FHE might allow arbitrary computation, but I use most services because they have some data I want to use: their search index, their knowledge, their database of chemicals, my bank account transactions, whatever.

So unless Google lets me encrypt their entire search index, they can still see my query at the time it interacts with the index, or else they cannot fulfill it.

The other point is incentives: outside of some very few, high-trust high-stakes applications, I don't see why companies would go through the trouble and FHE services.

shikon7•7mo ago

From what I understand, only the sensitive data needs to be encrypted (e.g. your bank transactions). It is still possible to use public unencryped data in the computation, as the function you want to compute doesn't have to be encrypted.

jmcqk6•7mo ago

In a world where Target can figure out a women is pregnant before she knows herself due to her shopping habits, the line that separates sensitive data is pretty ambiguous.

CannotCarrot•7mo ago

Small correction: according to that story, it's before her father knows, not herself.

niclas-183•7mo ago

Exactly what I thought. In the end it really isn't in most of the big corps interest to not see your data/query. They need/want to see it so why would they degrade their ability to do so if they can just say no and you will have to rely on using their services without FHE. For banking applications cool, everyone else debatable if it will ever be accepted.

adamc•7mo ago

Would depend on market pressures, no?

thrance•7mo ago

Here's an implementation of a fully private search engine using FHE that allows querying Wikipedia with the server remaining oblivious as to what you're reading: https://spiralwiki.com/

teddyh•7mo ago

It’s ridiculously easier just to download a Wikipedia database dump and read it locally, with Kiwix: <https://kiwix.org/>

j2kun•7mo ago

You're right about incentives, but wrong about the first part. Private lookups of a plaintext database are possible and have been for a while now (5+ years?). The problem is it often requires some nontrivial preprocessing of the plaintext database, or in the worst case a linear scan of the entire database.

perlgeek•7mo ago

> Private lookups of a plaintext database are possible and have been for a while now (5+ years?). The problem is it often requires some nontrivial preprocessing of the plaintext database, or in the worst case a linear scan of the entire database.

So that basically means that if a company has data that my program might want to use, the entirety of that data needs to be loaded into my program. Not quite feasible for something like the Google search index, which (afaik) doesn't even fit onto a single machine.

Also, while Google is fine with us doing searches, making the whole search index available to a homomorphic encrypted program is probably a quite different beast.

dcow•7mo ago

You can process the data such that only a structured lookup table is shared with the client. That data structure is massive.

The use case isn't really “search Google without them knowing my query”, it’s search my own data without them knowing my data”. Which limits the practically applicable scope considerably.

j2kun•7mo ago

> the entirety of that data needs to be loaded into my program

What? No. I'm not saying the entire Google search index is feasible, but you can do a lot. Here are some concrete numbers from what is now considered an "old" paper (2022; it has been improved since then)

https://eprint.iacr.org/2022/949

To make queries to a 1 GB database [in a scheme called DoublePIR] the client must download a 16 MB "hint" about the database contents; thereafter, the client may make an unbounded number of queries, each requiring 345 KB of communication, and a throughput of 7.4 GB/s/core.

dcow•7mo ago

Which ultimately results in gigabytes of per-client-encrypted data needing to be downloaded, and regenerated and redownloaded every time the index is updated.

Barrin92•7mo ago

"The implications are big. The entire business model built on harvesting user data could become obsolete. Why send your plaintext when another service can compute on your ciphertext?"

Why do people always do this thing where they think inventing a technology has somehow changed economics? I think the implications are very small. There is value in people's user data and people are very eager to barter that value against cheaper services, we can tell because people continue to vote with their wallets and feet.

You could already encrypt or offer zero retention policies on large amounts of internet businesses and every major company has competitors that do, but they exist on the margins because most people don't take that deal.

sim7c00•7mo ago

all great until you realize no one is allowed to export things to other regions if it works too well (crypto). Then besides that, the companies who now litterally live off of your personal data (most of big tech), wont suddenly drop their main source of income on behalf of the privacy of their users which clearly, they care nothing about.

unless replacement services are offered and adopted en masse (they wont be, u cant market against companies who can throw billions at breaking you), those giants wont give away their main source of revenue...

so even if technical challenges are overcome, there are more human and political challenges which will likely be even harder to crack...

JohnFen•7mo ago

Here's what I don't understand about homomorphic encryption and so struggle to trust in the very concept. If you can process encrypted data and get useful results, then a major part of the purpose of encryption is defeated, right? How am I wrong?

ramchip•7mo ago

The result is encrypted. It's useful to the key holder, not to the party doing the computation.

JohnFen•7mo ago

Yes, I understand that part. The part I struggle with is how the very fact that a party without the key can do the computation on it is not an indication that the encryption is leaking information. If the encryption were airtight, then such computation shouldn't be possible.

Given that cryptography experts seem to be asserting otherwise, I assume that there's something important that I'm not understanding here.

prophesi•7mo ago

The tl;dr is that breaking FHE would mean solving lattice problems that have been studied for decades to be nontrivial to break[0].

[0] https://arxiv.org/abs/2208.08125

JohnFen•7mo ago

I'm not talking about the possibility of breaking FHE, though.

What I don't understand is this: if I get encrypted data from someone and, without breaking that encryption, I can perform computations on it that yield a sensible result (even if the result is also encrypted with a key I don't have), then how does that not mean the encryption has been weakened? If the encryption were strong, that should not be possible.

Actually breaking the encryption is a different thing, and I wasn't questioning that.

prophesi•7mo ago

The operations used to perform these computations are utilizing algorithms backed by lattice-problems that are NP-hard to break.

edit: The only leaking information in this case are what operations you'd like to perform on the encrypted data. e.g., you now know that you've incremented the password by x amount, but you don't know what the plaintext was before, the plaintext after, _or_ infer what the value is by knowing you've modified the encrypted data by x amount.

edit2: Now I think I understand the question. The encryption is technically weakened because you can know what operations are made on the underlying data. Though it's still an advancing field, and there are promising developments with what's called circuit privacy[0] to prevent the server knowing the operations made as well.

[0] https://eprint.iacr.org/2022/1459

j2kun•7mo ago

Your assumption that operations leak info is just not correct. RSA has homomorphic properties (you can multiply two RSA ciphertexts and get the encrypted product of the plaintext), just not enough to enable general purpose computation.

JohnFen•7mo ago

> Your assumption that operations leak info is just not correct.

My assumption is that you're right, that my assumption is incorrect. What I'm trying to do is understand why it's incorrect.

It's not just about operations leaking info, though, it's also an issue that, intuitively, leaving enough underlying structure in the encrypted form of the data to allow for this implies that the encrypted form is weaker. I'm also trying to understand how that intuition is wrong.

Edit: OK, The jeremykun link this comment provided gave me a little more clarity: https://news.ycombinator.com/item?id=44602472

I still don't adequately understand, but it does give me a little bit of a handle my brain can grab onto. As I understand it right now, HME is a weaker form of encryption, but perhaps still strong enough to be a worthwhile tradeoff for the use cases being discussed.

j2kun•7mo ago

All modern cryptography is based on problems with some mathematical structure. With enough structure it's true, it does weaken security, and many cryptosystems are broken by exploiting the underlying structure. So I suppose the only solid answer here is that attacks haven't been discovered yet despite many smart people trying.

JohnFen•7mo ago

Heh, I didn't realize that you were the Jeremy Kun that wrote the piece that gave me a bit of enlightenment (that I mentioned in the edit to my comment above). Thank you for writing that. It was helpful.

plopilop•7mo ago

> As I understand it right now, HME is a weaker form of encryption, but perhaps still strong enough to be a worthwhile tradeoff for the use cases being discussed.

Exactly. Homomorphism was first seen as a weakness in encryption, since it implies malleability. For instance, in the one-time pad encryption where you XOR your message with the secret key, flipping a bit in the ciphertext will result in same bit being flipped in the decryption. The attacker does not know what the end result is, but knows that the bit has been flipped, hence OTP encryption is malleable. This is enough for some attacks. With FHE encryption you have a bit of the same, from Enc(a) and Enc(b) it is easy to create Enc(a+b), hence is malleable too.

Cryptography uses several security levels. The top one for encryption is NM-CCA2 (non-malleability under chosen ciphertext attack). For instance, RSA-OAEP is NM-CCA2 secure. Since FHE schemes are malleable, they are not NM-CCA2 secure. However, a slightly lower security notion is IND-CPA (indistinguishability under chosen plaintext attack). FHE schemes are IND-CPA secure. Furthermore, IND-CPA security is shown to be equivalent to semantic security, which means that given a ciphertext the attacker cannot know any bit of information about the underlying cleartext.

Hence, FHE schemes guarantee that for all the ciphertexts they receive, the attacker cannot know anything about the underlying cleartexts. You can run a ton of operations on the ciphertexts, let's say run a homomorphic LLM, the attacker will still have no idea about what the final output is. Hence, in the model where you consider that the attacker has full control over the LLM, will behave honestly but will try to learn your secrets, you are fine. However, in the model where an attacker runs a MITM and just wants to disrupt the numbers you get back from the LLM, then you are not fine, since this encryption is malleable (in theory we could add some verifiable execution proofs but that is another topic).

As you say, everything is a tradeoff.

InfoSecErik•7mo ago

I think the disconnect is that you assume that being able to do useful computation on some data implies that it must be possible to derive some insight into what the data is (side-channels or the like).

It's a fair assumption to start with. But the folks building FHE basically claim "nuh-uh", and I haven't seen anything to indicate they're wrong. Maybe some new Math grad will sort it out.

Qision•7mo ago

If I understand correctly companies like OpenAI could run LLMs without having access to the users new inputs. It seems to me new users data are really useful for further training of the models. Can they still train the models over encrypted data? If this new data is not usable, why would the companies still want it?

Let's assume they can train the LLMs over encrypted data, what if a large number of users inject some crappy data (like it has been seen with the Tay chatbot story). How can the companies still keep a way to clean the data?

j2kun•7mo ago

> Can they still train the models over encrypted data?

Yes but then the model becomes encrypted.

IMO ML training is not a realistic application for FHE, but things like federated training would be the way to do that privately enough.

maerF0x0•7mo ago

FHE is an important tool because right now companies can be coerced by governments to break encryption for specific targets. FHE removes the need for companies to have a back bone, they can simply shrug and say "We literally do not see the plaintext, ever". They can kinda do this with End to End encryption when they're simply the network/carrier, but cannot currently do this anytime they're processing the plaintext data.

I come from a values basis that privacy is a human right, and governments should be extremely limited in retailiatory powers against a just and democratic usage of powers against them. (things like voting, arts, media, free speech etc)

adamc•7mo ago

Very cool, although I have some reservations about "... closest vector problem is believed to be NP-hard and even quantum-resistant". "Believed to be" is kind of different from "known to be".

j2kun•7mo ago

If it makes you feel better, no cryptographic assumptions we use today are known to be NP-hard. Or maybe that makes you feel worse, not sure. But it doesn't really matter because NP-hardness is a statement about worst case inputs and cryptography needs guarantees about average case inputs since keys are generated randomly.

cwmma•7mo ago

all modern encryption is currently held together by asymmetric encryption that are all based on "believed to be" foundations not "known to be" foundations

charles_f•7mo ago

> The only way to protect data is to keep it always encrypted on servers, without the servers having the ability to decrypt.

> If FHE is a possible option, people and institutions will demand it.

I don't think that privacy is a technical problem. To take the article's example, why would Google allow you to search without spying on you? Why would chatgpt discard your training data?

GPG has been around for decades. You can relatively easily add a plug-in to use it on top of gmail. Surely the protocol is not perfect, but could have been made better much more easily than it is to improve HPE, since a lot of its clunkiness can be corrected by UX. But people never cared enough that everything they write is read by Google to encrypt it. And since Google loves reading what you write, they'll never introduce something like HPE without overwhelming adoption and requirements by others.

liampulles•7mo ago

> Privacy awareness of users is increasing. Privacy regulations are increasing.

I beg your unbelievable pardon, but no? This part of the equation is not addressed in the article, but it is by far and away the biggest missing piece for there to be any hope of FHE seeing widespread adoption.

tpurves•7mo ago

It's a distraction to try and imagine homomorphic encryption for generic computing or internet needs. At least not for many more generations of moore's law and then even still.

However, where FHE will shine already is in specific high-value, high consequence and high confidentiality applies, but relatively low complexity computational calculations. Smart contracts, banking, potentially medical have lots of these usecases. And the curve of Moore's law + software optimizations are now starting to finally bend into the zone of practicality for some of these.

See what Zama https://www.zama.ai/ is doing, both on the hardware as well as the devtools for FHE.

vkaku•7mo ago

The tendency to increase complexity will be at odds with privacy and user control.

The problem is that the internet is a centralized system practically even though it is decentralized and some are fighting to keep it free.

Fight for decentralization instead, it will remove the need for unnecessary security and reduce the compute cost significantly.

dandraper•7mo ago

CipherStash founder here: FHE isn't the only option here. Specialized searchable encryption schemes exist and are much faster than FHE. Different flavours can be combined to create a comprehensive search system which is very close to the performance of plaintext information retrieval. FHE remains an option for generalized computation but can be reserved for small datasets that have been narrowed down using fast searchable encryption.

I don't think FHE is the solution to PIR but it might well form a part of it when combined with more practical approaches.

Leaving Google has actively improved my life

OpenAI raises $110B on $730B pre-money valuation

The Robotic Dexterity Deadlock

NASA announces overhaul of Artemis program amid safety concerns, delays

A better streams API is possible for JavaScript

Let's discuss sandbox isolation

Dan Simmons, author of Hyperion, has died

A Chinese official’s use of ChatGPT revealed an intimidation operation

Writing a Guide to SDF Fonts

A new California law says all operating systems need to have age verification

Allocating on the Stack

Kyber (YC W23) Is Hiring an Enterprise Account Executive

Show HN: Claude-File-Recovery, recover files from your ~/.claude sessions

Modeling cycles of grift with evolutionary game theory

"Just a little detail that wouldn't sell anything"

Building secure, scalable agent sandbox infrastructure

Court finds Fourth Amendment doesn’t support broad search of protesters’ devices

Open source calculator firmware DB48X forbids CA/CO use due to age verification

PCB Tracer

Get free Claude max 20x for open-source maintainers

Implementing a Z80 / ZX Spectrum emulator with Claude Code

Reading English from 1000 AD

Can you reverse engineer our neural network?

Tell HN: MitID, Denmark's digital ID, was down

Show HN: RetroTick – Run classic Windows EXEs in the browser

Rob Grant, creator of Red Dwarf, has died

We gave terabytes of CI logs to an LLM

Sprites on the Web

F-Droid Board of Directors nominations 2026

Statement from Dario Amodei on our discussions with the Department of War

Leaving Google has actively improved my life

OpenAI raises $110B on $730B pre-money valuation

The Robotic Dexterity Deadlock

NASA announces overhaul of Artemis program amid safety concerns, delays

A better streams API is possible for JavaScript

Let's discuss sandbox isolation

Dan Simmons, author of Hyperion, has died

A Chinese official’s use of ChatGPT revealed an intimidation operation

Writing a Guide to SDF Fonts

A new California law says all operating systems need to have age verification

Allocating on the Stack

Kyber (YC W23) Is Hiring an Enterprise Account Executive

Show HN: Claude-File-Recovery, recover files from your ~/.claude sessions

Modeling cycles of grift with evolutionary game theory

"Just a little detail that wouldn't sell anything"

Building secure, scalable agent sandbox infrastructure

Court finds Fourth Amendment doesn’t support broad search of protesters’ devices

Open source calculator firmware DB48X forbids CA/CO use due to age verification

PCB Tracer

Get free Claude max 20x for open-source maintainers

Implementing a Z80 / ZX Spectrum emulator with Claude Code

Reading English from 1000 AD

Can you reverse engineer our neural network?

Tell HN: MitID, Denmark's digital ID, was down

Show HN: RetroTick – Run classic Windows EXEs in the browser

Rob Grant, creator of Red Dwarf, has died

We gave terabytes of CI logs to an LLM

Sprites on the Web

F-Droid Board of Directors nominations 2026

Statement from Dario Amodei on our discussions with the Department of War

Fully homomorphic encryption and the dawn of a private internet

Comments