What? No, the server sends you the changes you've not seen yet, you decrypt and merge them, and so you get the latest version of the document. Right?
Homomorphic encryption is a fascinating topic, but it's almost never the answer if you need anything resembling reasonable performance and/or reasonable bandwidth.
I've seen a paper that ingeniously uses homomorphic encryption to implement arbitrary algorithmic computations, totally secret, by encoding a (custom-crafted) CPU together with RAM and then running a "tick a clock" algorithm on them. And it works, so you can rent some huge AWS instance and run your super-important calculations there, at 1 Hz. I am not kidding, it's literally 1 virtual CPU instruction per second. Well, if you are okay with such speed and costs, you either have very small data, at which point just run your computation locally, or you're really, really rich, at which point just buy your own goddamn hardware and, again, run it locally.
If the friend is online then sending operations is possible, because they can be decrypted and merged.
This scheme doesn't require the two people to be online simultaneously: all updates are mediated via the sync server, after all. So, where am I wrong?
This could be done to reduce the time required for a client to catch up once it comes online (because it would need to replay all changes that have happened since it last connected to achieve the conflict free modification). But the article also mentions something about keeping the latest version quickly accessible.
One way to solve this is end-to-end encryption. You and your friend agree
on a secret key, known only to each other. You each use that key to encrypt
your changes before sending them, decrypt them upon receipt, and no one in
the middle is able to listen in. Because the document is a CRDT, you can
each still get the latest document without the sync server merging the
updates.
That is indeed a solution,
but then for some reason claims that this scheme requires both parties to be online simultaneously. No, it doesn't, unless this scheme is (tacitly) supposed to be directly peer-to-peer, which I find unlikely: if it were P2P, there would be no need for "the sync server" in the first place, and the description clearly states that in this scheme it doesn't do anything with document updates except for relaying them.
The way I see it there are a couple of ways this can shake out:
1. If you have a sync server that only relays the updates between peers, then you can of course have it work asynchronously — just store the encrypted updates and send them when a peer comes back online. The problem is that there's no way for the server to compress any of the updates; if a peer is offline for an extended period of time, they might need to download a ton of data.
2. If your sync server can merge updates, it can send compressed updates to each peer when it comes online. The downside, of course, is that the server can see everything.
Ink & Switch's Keyhive (which I link to at the end) proposes a method for each peer to independently agree on how updates should be compressed [1] which attempts to solve the problems with #1.
[1] https://github.com/inkandswitch/keyhive/blob/main/design/sed...
That introduces a communication overhead, but is still likely to be orders of magnitude cheaper than homomorphic encryption
The naive raw stream of changes is far too inefficient due to the immense amount of overhead required to indicate relationships between changes. Changing a single character in a document needs to include the peer ID (e.g., a 128-bit UUID, or a public key), a change ID (like a commit hash - also about 128-bit), and the character’s position in the document (usually a reference to the parent’s ID and relative marker indicating the insert is either before or after the parent).
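To put rough numbers on that overhead, here is a back-of-the-envelope sketch. The field sizes are assumptions: the 128-bit IDs come from the comment above, while the parent reference being the same size as a change ID, the 1-byte before/after marker, and the 1-byte character are guesses made purely for illustration.

```python
# Rough size of one naive single-character insert operation.
# All sizes below are illustrative assumptions, not measurements of a real CRDT.
PEER_ID_BYTES = 16      # 128-bit UUID identifying the peer
CHANGE_ID_BYTES = 16    # ~128-bit change ID (like a commit hash)
PARENT_ID_BYTES = 16    # reference to the parent change (assumed same size as a change ID)
SIDE_MARKER_BYTES = 1   # "before" or "after" the parent
PAYLOAD_BYTES = 1       # the single ASCII character being inserted


def naive_op_size() -> int:
    """Total bytes sent for one character, metadata included."""
    return (PEER_ID_BYTES + CHANGE_ID_BYTES + PARENT_ID_BYTES
            + SIDE_MARKER_BYTES + PAYLOAD_BYTES)


def overhead_ratio() -> float:
    """How many bytes go over the wire per byte of actual content."""
    return naive_op_size() / PAYLOAD_BYTES


print(naive_op_size())   # 50
print(overhead_ratio())  # 50.0
```

Under these assumptions a single typed character costs about 50 bytes before any encryption framing is added, which is why real implementations try to batch or compress runs of operations.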
The other obvious compression is deletions. They will be compressed to tombstones so that the original change messages for deleted content do not need to be relayed.
And I know it is only implied, but peer to peer independent edits are the point of CRDTs. The “relay server” is there only for the worst case scenario described: when peers are not simultaneously available to perform the merge operation.
So instead of merging changes on the server, all you need is some way of knowing which messages you haven’t received yet. Importantly this does not require the server to be able to actually read those messages. All it needs is some metadata (basically just an id per message), and when reconnecting, it needs to send all the not-yet-received messages to the client, so it’s probably useful to keep track of which client has received which messages, to prevent having to figure that out every time a client connects.
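A minimal sketch of that kind of relay, assuming sequential message IDs and a per-client "highest ID received" cursor. The `Relay` class and its method names are hypothetical; the point is only that the server handles ciphertext as opaque bytes and never needs to read it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Relay:
    """Hypothetical sync server: stores opaque ciphertexts, never decrypts them."""
    log: List[Tuple[int, bytes]] = field(default_factory=list)  # (message id, ciphertext)
    acked: Dict[str, int] = field(default_factory=dict)         # client -> highest id received

    def publish(self, ciphertext: bytes) -> int:
        """Append an encrypted update and return its sequential id."""
        msg_id = len(self.log) + 1
        self.log.append((msg_id, ciphertext))
        return msg_id

    def fetch(self, client: str) -> List[Tuple[int, bytes]]:
        """Return every message the client has not acknowledged yet."""
        last = self.acked.get(client, 0)
        return [(msg_id, blob) for msg_id, blob in self.log if msg_id > last]

    def ack(self, client: str, msg_id: int) -> None:
        """Record that the client has received everything up to msg_id."""
        self.acked[client] = max(self.acked.get(client, 0), msg_id)
```

A reconnecting client would call `fetch`, decrypt and merge locally, then `ack` the highest ID it saw. This sketch ignores details a real server needs, such as not echoing a client's own messages back to it and pruning the log.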
Or the user's client can flatten un-acked changes and tell the server to store that instead.
It can just always flatten until it hears back from a peer.
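As a rough illustration of client-side flattening, assume the document is a last-writer-wins map where each update maps a key to a `(timestamp, value)` pair. That data model and the `flatten` helper are assumptions for the example, not something the thread specifies. The client would merge its un-acked updates into one, encrypt the result, and ask the server to store that single blob in place of the individual ones.

```python
from typing import Dict, List, Tuple

# Assumed update shape: key -> (logical timestamp, value)
Update = Dict[str, Tuple[int, str]]


def flatten(pending: List[Update]) -> Update:
    """Merge un-acked last-writer-wins updates into a single equivalent update."""
    merged: Update = {}
    for update in pending:
        for key, entry in update.items():
            # Keep the entry with the highest timestamp; ties break on the value
            # so every peer picks the same winner.
            if key not in merged or entry > merged[key]:
                merged[key] = entry
    return merged
```

For example, `flatten([{"title": (1, "Draft")}, {"title": (2, "Final"), "body": (1, "hi")}])` yields one update holding only the latest `title` plus `body`, so a returning peer downloads one blob instead of two.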
The entire scenario is over-contrived. I wish they had just shown it off instead of making the lie of a justification.
The purpose of these papers is to map out what's possible, etc, which might at some point help with actual R&D.
Ouch!
The first FHE scheme required keys of several TB/PB, and bootstrapping (an operation that is pivotal in FHE schemes, needed when too many multiplications are computed) would take thousands of hours. We are now down to keys of "only" 30 MB, and bootstrapping in less than 0.1 second.
Hopefully progress will continue and FHE will become more practical.
edit so i bring some "proof" of my claim: from this very page : `To calculate the new map, the server must go through and merge every single key. After that, it needs to transfer the full map to each peer — because remember, as far as it knows, the entire map is different.`
By having a server in the mix it feels like we're forcing a hub/spoke model on something that wants to be a partial mesh. Not surprising that the hub is stressed out.
What kinds of CRDTs specifically are you referring to? On its own this statement sounds far too broad to be meaningful. It's like saying "nested for loops are crazy slow".
See: https://josephg.com/blog/crdts-go-brrr/
(And even these optimizations are nascent. It can still get so much better.)
The section you quoted describes an effect of homomorphic encryption alone.
There is the problem that both CRDTs and encryption add some overhead, and the overhead is additive when used together. But I can’t tell if that is the point you are trying to make.
The overhead is usually multiplicative per-item. Let's say you're doing N things. CRDTs make that O(Nk) for some scaling factor k, and adding encryption makes it O(Nkj) for some scaling factor j.
Give or take some multiplicative log (or worse) factors depending on the implementation.
You must back up your extraordinary claim with some extraordinary evidence. There is nothing inherently slow in CRDTs.
Also, applying changes is hardly on anyone's hot path.
The only instance where I saw anyone complaining about CRDT performance, it turned out to come from very naive, overly chatty implementations that spammed changes. If you come up with any code that requires a full HTTPS connection to send a single character down the wire, the problem is not the algorithm.
compared to what? c'mon
On code signing and the SETI@home screensaver
To name a few: nice style (colors, font), "footnotes" visible in the margin, an always-on table of contents, interactivity, and link previews on hover.
Nice. What's your tech stack?
In other words the server could forward and not store if all parties are always online (at the same time).
Before uploading data, the client checks the hash/ETag of the blob it originally fetched. If the blob on the server has a different one, the client downloads it, decrypts it, patches the new data onto the existing data, encrypts it, and re-uploads.
What's the catch?
AES is hardware-accelerated on most devices, so even with all those operations it will be significantly faster than any homomorphic encryption available nowadays.
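The fetch, check, merge, re-upload loop described above can be sketched as a compare-and-swap on the ETag. Everything here is hypothetical: `BlobStore` stands in for the server, and `merge`, `encrypt`, and `decrypt` are passed in by the caller (for example an AES-based cipher and a CRDT merge), since the comment doesn't pin those down.

```python
import hashlib
from typing import Callable, Tuple


class BlobStore:
    """Hypothetical server holding one opaque encrypted blob plus its ETag."""

    def __init__(self) -> None:
        self.blob = b""
        self.etag = hashlib.sha256(self.blob).hexdigest()

    def get(self) -> Tuple[bytes, str]:
        return self.blob, self.etag

    def put_if_match(self, blob: bytes, expected_etag: str) -> bool:
        """Store the blob only if nobody else has written since expected_etag."""
        if expected_etag != self.etag:
            return False
        self.blob = blob
        self.etag = hashlib.sha256(blob).hexdigest()
        return True


def upload(store: BlobStore,
           local_change: bytes,
           merge: Callable[[bytes, bytes], bytes],
           encrypt: Callable[[bytes], bytes],
           decrypt: Callable[[bytes], bytes],
           max_retries: int = 5) -> bool:
    """Fetch, decrypt, merge, encrypt, and conditionally re-upload, retrying on conflict."""
    for _ in range(max_retries):
        blob, etag = store.get()
        current = decrypt(blob) if blob else b""
        merged = merge(current, local_change)
        if store.put_if_match(encrypt(merged), etag):
            return True
        # Someone else uploaded in between: loop to fetch and merge again.
    return False
```

One cost visible in the sketch is that every write downloads and re-uploads the whole blob, and busy documents will hit the retry path often.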
qualeed•4h ago
https://en.wikipedia.org/wiki/Conflict-free_replicated_data_...
Xeoncross•3h ago