Private Inference: https://confer.to/blog/2026/01/private-inference/
Private Inference: https://confer.to/blog/2026/01/private-inference/
I mean, e2ee is great and welcome, of course. That's a wonderful thing. But I need more.
> LLMs are fundamentally stateless—input in, output out—which makes them ideal for this environment. For Confer, we run inference inside a confidential VM. Your prompts are encrypted from your device directly into the TEE using Noise Pipes, processed there, and responses are encrypted back. The host never sees plaintext.
I don’t know what model they’re using, but it looks like everything should be staying on their servers, not going back to, eg, OpenAI or Anthropic.
(It got submitted a few times but did not get any comments - might as well consolidate these threads)
Edit: https://news.ycombinator.com/item?id=46600839 this comment says that the gpu have such capabilities as well, So I am interested what you were mentioning in the first place?
Even so, you're still exposing your data to Confer, and so you have to trust them that they'll behave as you want. That's a security problem that Confer doesn't help with.
I'm not saying Confer isn't useful, though. e2ee is very useful. But it isn't enough to make me feel comfortable.
They use a https://en.wikipedia.org/wiki/Trusted_execution_environment and iiuc claim that your client can confirm (attest) that the code they run doesn't leak your data, see https://confer.to/blog/2026/01/private-inference/
So you should be able to run https://github.com/conferlabs/confer-image yourself and get a hash of that and then confer.to will send you that same hash, but now it's been signed by Intel I guess? to tell you that yes not only did confer.to send you that hash, but that hash is indeed a hash of what's running inside the Trusted Execution Environment.
I feel like this needs diagrams.
Using remote attestation in the browser to attest the server rather than the client is refreshing.
Using passkeys to encrypt data does limit browser/hardware combinations, though. My Firefox+Bitwarden setup doesn't work with this, unfortunately. Firefox on Android also seems to be broken, but Chrome on Android works well at least.
"This application requires passkey with PRF extension support for secure encryption key storage. Your browser or device doesn't support these advanced features.Please use Chrome 116+, Firefox 139+, or Edge 141+ on a device with platform authentication (Face ID, Touch ID, Windows Hello, etc.)."
We are allowed into the blog though! https://confer.to/blog/
https://developer.nvidia.com/blog/confidential-computing-on-...
This is similar to the weasely language Google is now using with the Magic Cue feature ever since Android 16 QPR 1. When it launched, it was local only -- now it's local and in the cloud "with attestation". I don't like this trend and I don't think I'll be using such products
Similarly with AI. The AI is one of the ends of the conversation.
Think PCs in 5y to 10y that can run SoTA multi-modal LLMs (cf Mac Pro) will cost as much as cars do, and I reckon folks will buy it.
Lots of ifs there, though. I do trust Moxie in terms of execution though. Doesn’t seem like the type of person to take half measures.
Sure, for e.g. E2E email, the expectation is that all the computation occurs on the client, and the server is a dumb store of opaque encrypted stuff.
In a traditional E2E chat app, on the other hand, you've still got a backend service acting as a dumb pipe, that shouldn't have the keys to decrypt traffic flowing through it; but you've also got multiple clients — not just your own that share your keybag, but the clients of other users you're communicating with. "E2E" in the context of a chat app, means "messages are encrypted within your client; messages can then only be decrypted within the destination client(s) [i.e. the client(s) of the user(s) in the message thread with you.]"
"E2E AI chat" would be E2E chat, with an LLM. The LLM is the other user in the chat thread with you; and this other user has its own distinct set of devices that it must interact through (because those devices are within the security boundary of its inference infrastructure.) So messages must decrypt on the LLM's side for it to read and reply to, just as they must decrypt on another human user's side for them to read and reply to. The LLM isn't the backend here; the chat servers acting as a "pipe" are the backend, while the LLM is on the same level of the network diagram as the user is.
Let's consider the trivial version of an "E2E AI chat" design, where you physically control and possess the inference infrastructure. The LLM infra is e.g. your home workstation with some beefy GPUs in it. In this version, you can just run Signal on the same workstation, and connect it to the locally-running inference model as an MCP server. Then all your other devices gain the ability to "E2E AI chat" with the agent that resides in your workstation.
The design question, being addressed by Moxie here, is what happens in the non-trivial case, when you aren't in physical possession of any inference infrastructure.
Which is obviously the applicable case to solve for most people, 100% of the time, since most people don't own and won't ever own fancy GPU workstations.
But, perhaps more interesting for us tech-heads that do consider buying such hardware, and would like to solve problems by designing architectures that make use of it... the same design question still pertains, at least somewhat, even when you do "own" the infra; just as long as you aren't in 100% continuous physical possession of it.
You would still want attestation (and whatever else is required here) even for an agent installed on your home workstation, so long as you're planning to ever communicate with it through your little chat gateway when you're not at home. (Which, I mean... why else would you bother with setting up an "E2E AI chat" in the first place, if not to be able to do that?)
Consider: your local flavor of state spooks could wait for you to leave your house; slip in and install a rootkit that directly reads from the inference backend's memory; and then disappear into the night before you get home. And, no matter how highly you presume your abilities to detect that your home has been intruded into / your computer has been modified / etc once you have physical access to those things again... you'd still want to be able to detect a compromise of your machine even before you get home, so that you'll know to avoid speaking to your agent (and thereby the nearby wiretap van) until then.
Also fwiw I think tees and remote attestation are a pretty pragmatic solution here that meaningfully improves on the current state of the art for llm inference and I'm happy to see it.
The entire point of E2EE is that both "ends" need to be fully under your control.
From Wikipedia: "End-to-end encryption (E2EE) is a method of implementing a secure communication system where only the sender and intended recipient can read the messages."
Both ends do not need to be under your control for E2EE.
> This application requires passkey with PRF extension support for secure encryption key storage. Your browser or device doesn't support these advanced features.
> Please use Chrome 116+, Firefox 139+, or Edge 141+ on a device with platform authentication (Face ID, Touch ID, Windows Hello, etc.).
(Running Chrome 143)
So... does this just not support desktops without overpriced webcams, or am I missing something?
I see references to vLLM in the GitHub but not which actual model (Llama, Mistral, etc.) or if they have a custom fine tune, or you give your own huggingface link?
AdmiralAsshat•7h ago