SturdyRefs are tricky. My feeling is that they don’t really belong in the RPC protocol itself, because the mechanism by which you restore a SturdyRef is very dependent on the platform in which you're running. Cloudflare Workers, for example, may soon support storing capabilities into Durable Object storage. But the way this will work is very tied to the Cloudflare Workers platform. Sandstorm, similarly, had a persistent capability mechanism, but it only made sense inside Sandstorm – which is why I removed the whole notion of persistent capabilities from Cap’n Proto itself.
The closest thing to a web standard for SturdyRefs is OAuth. I could imagine defining a mechanism for SturdyRefs based on OAuth refresh tokens, which would be pretty cool, but it probably wouldn’t actually be what you want inside a specific platform like Sandstorm or Workers.
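To sketch the shape of the idea (purely hypothetical; none of these names exist in Cap'n Web, though Cap'n Proto's persistence interface had a similar save()):

    // Hypothetical: save() returns an opaque token, analogous to an OAuth
    // refresh token, that outlives the current connection.
    let sturdyRef = await liveCap.save();

    // ...later, in a completely new session or process...

    // Hypothetical: restore() exchanges the token for a live capability,
    // much like exchanging a refresh token for an access token.
    let liveCap2 = await session.restore(sturdyRef);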
I’ve ended up building similar things over and over again. For example, simplifying the worker-page connection in a browser, or the connection between Chrome extension “background” scripts and content scripts.
There’s a reason many prefer “npm install” on some simple SDK that just wraps an API.
This also reminds me a lot of MCP, especially the bi-directional nature and capability focus.
The name "Cap'n Proto" came from "capabilities and protobuf". The first, never-released version was based on Protobuf serialization. The first public release (way back on April 1, 2013) had its own, all-new serialization.
There's also a pun with it being a "cerealization protocol" (Cap'n Crunch is a well-known brand of cereal).
Tiny remark for @kentonv if you're reading: it looks like you've got the wrong code sample immediately following the text "Putting it together, a code sequence like this".
The code was supposed to be:
    let namePromise = api.getMyName();          // no await: we get a promise back immediately
    let result = await api.hello(namePromise);  // pass the promise itself as an argument
    console.log(result);
That is, the client is not packaging up all its logic and sending a single blob that describes the fully-chained logic to the server on its initial request. Right?
When I first read it, I was thinking it meant 1 client message and 1 server response. But I think "one round trip" more or less means "1 server message in response to potentially many client messages". That's a fair use of "1 RTT", but it took me a moment to understand.
Just to make that distinction clear from a different angle, suppose the client were _really_ _really_ slow and it did not send the second promise message to the server until AFTER the server had computed the result for promise1. Would the server have already responded to the client with the result? That would be a way to incur multiple RTTs, although the application wouldn't care, since in this case it's bottlenecked by the client CPU, not the network.
I realize this is unlikely. I'm just using it to elucidate the system-level guarantee for my understanding.
As always, thanks for sharing this, Kenton!
The client either sends 3 separate calls in one message, or one message describing some computation (run this function with the result of this function), and the server responds with one payload.
See the "But how do we solve arrays" part:
> .map() is special. It does not send JavaScript code to the server, but it does send something like "code", restricted to a domain-specific, non-Turing-complete language. The "code" is a list of instructions that the server should carry out for each member of the array
But the client can send all three messages back-to-back without waiting for any replies from the server. In terms of network communications, it's effectively the same as sending one message.
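For example (method names illustrative):

    // No awaits until the end, so all three calls are written to the
    // socket back-to-back, before any reply has come back.
    let user = api.authenticate(token);     // message 1: returns a stub immediately
    let friends = user.getFriends();        // message 2: pipelined on message 1's result
    let names = await friends.map((f) => f.name); // message 3, then one round trip to resolve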
Although it seems to solve one of the problems that GraphQL solved and tRPC doesn't (the ability to request nested information from items in a list, or properties of an object, without changes to server-side code), there is no included solution for the server-side problem this creates, which the dataloader pattern was intended to solve: a naive GraphQL server implementation makes a database query per item in a list.
Until the server-side tooling for this matures and has equivalents for the dataloader pattern, persisted/allowlist queries, etc., I'll probably only use this for server <-> server (worker <-> worker) or client <-> iframe communication, and keep my client <-> server communication along more pre-defined boundaries.
However, if your database is sqlite in a Cloudflare Durable Object, and the RPC protocol is talking directly to it, then N+1 selects are actually just fine.
Building an operation description from the callback inside the `map` is wild. Does that add much in the way of restrictions programmers need to be careful of? I could imagine branching inside that closure, for example, could make things awkward. Reminiscent of the React hook rules.
So it turns out it's actually not easy to mess up in a map callback. The main thing you have to avoid is side effects that modify stuff outside the callback. If you do that, the effect you'll see is those modifications only get applied once, rather than N times. And any stubs you exfiltrate from the callback simply won't work if called later.
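For example (method names illustrative):

    let counter = 0;

    let names = await api.getFriends().map((friend) => {
      // Side effect: the callback is only actually executed once, to record
      // the instructions, so this increment happens once, not once per friend.
      counter++;
      return friend.name;  // property access is recorded and replayed per element
    });

    // counter is 1 here no matter how long the list was.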
edit: Downvoted; is this a bad question? The title says "web servers" generically, and obviously the content of the post focuses primarily on TypeScript, but I'm trying to determine if there's something unique about this that means it cannot be implemented in other languages. The server-side DSL execution could be difficult to implement, but since it's not strictly JavaScript, I imagine it's not impossible?
* Use Cap'n Proto in your Rust backend. This is what you want in a type-safe language like Rust: generated code based on a well-defined schema.
* We'll build some sort of proxy that, given a Cap'n Proto schema, converts between Cap'n Web and Cap'n Proto. So your frontend can speak Cap'n Web.
But this proxy is just an idea for now. No idea if or when it'll exist.
> as of this writing, the feature set is not exactly the same between the two. We aim to fix this over time, by adding missing features to both sides until they match.
Do you think that once the two reach parity, the parity will remain? Or is it more likely that Cap'n Web will trail Cloudflare Workers, and if so, by what length of time?
[1] https://github.com/cloudflare/capnweb/tree/main?tab=readme-o...
If anything I'd expect Cap'n Web to run ahead of Workers RPC (as it is already doing, with the new pipeline features) because Cap'n Web's implementation is actually much simpler than Workers'. Cap'n Web will probably be the place where we experiment with new features.
Cap'n Proto was inspired by Protobuf, and Protobuf has gRPC and gRPC-Web.
At my last startup, we used Protobuf/gRPC/gRPC-Web both in the backends and for public endpoints powering React/TS UIs. It worked great, particularly with the GCP Kubernetes infrastructure; both the API and operational aspects were non-problems. However, navigating the dumpster fire around Protobuf, gRPC, and gRPC-Web, with the lack of community leadership from Google, was a clusterfuck.
That said, I'm a bit at a loss as to the meaning of "schemaless". You can take different approaches to schemas (see Avro vs. Protobuf), but you can't fundamentally eschew schemas/types. It's information tied to a communication channel, and it needs to live somewhere: explicit, implicit, handled by the RPC layer, pushed into the type system, or (worse) pushed all the way to the user/dev. Moreover, schemas tend to evolve, and any protocol needs to take that into account.
Historically, Protobuf has done a good job managing the various tradeoffs here, but I have no experience with Cap'n Proto. I've seen mostly good things about it, so perhaps I'm just missing something.
But Cap'n Web itself does not need to know about any of that. Cap'n Web just accepts whatever method call you make, sends it to the other end of the connection, and attempts to deliver it. The protocol itself has no idea if your invocation is valid or not. That's what I mean by "schemaless" -- you don't need to tell Cap'n Web about any schemas.
With that said, I strongly recommend using TypeScript with Cap'n Web. As always, TypeScript schemas are used for build-time type checking, but are then erased before runtime. So Cap'n Web at runtime doesn't know anything about your TypeScript types.
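In practice it looks something like this (along the lines of the README examples; `MyApi` and `hello()` are whatever your app defines):

    import { newWebSocketRpcSession } from "capnweb";

    // This interface exists purely for TypeScript's benefit; it is erased
    // at build time, and Cap'n Web never sees it.
    interface MyApi {
      hello(name: string): Promise<string>;
    }

    let api = newWebSocketRpcSession<MyApi>("wss://example.com/api");

    // TypeScript checks this at compile time, but at runtime Cap'n Web just
    // forwards the method name and arguments without validating anything.
    let greeting = await api.hello("Alice");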
I'm confused. How is this a "protocol" if its core premises rely on a very specific implementation of concurrency in a very specific language?
Anyway, the point here is that early RPC systems worked by blocking the calling thread while performing the network request, which was obviously a terrible idea.
Some friends and I still jokingly troll each other in the vein of these, interjecting with "When async programming was discovered in 2008...", or "When memory safe compiled languages were invented in 2012..." and so forth.
edit: I was skimming the GitHub repo https://github.com/cloudflare/capnweb/tree/main?tab=readme-o...
and saw this, which answers my question:
> Supports passing functions by reference: If you pass a function over RPC, the recipient receives a "stub". When they call the stub, they actually make an RPC back to you, invoking the function where it was created. This is how bidirectional calling happens: the client passes a callback to the server, and then the server can call it later.
> Similarly, supports passing objects by reference: If a class extends the special marker type RpcTarget, then instances of that class are passed by reference, with method calls calling back to the location where the object was created.
Gonna skim some more to see if I can find some example code.
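Something like this, I'd guess (a sketch based on the README's description, so the details may be off):

    import { RpcTarget } from "capnweb";

    // Server side: a method that takes a callback.
    class ChatServer extends RpcTarget {
      async subscribe(onMessage: (text: string) => void) {
        // onMessage arrived as a stub; calling it makes an RPC back to
        // the client, where the function was actually created.
        await onMessage("hello from the server");
      }
    }

    // Client side: pass a plain function; the server gets a stub for it.
    await api.subscribe((text) => console.log("received:", text));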
The part that's most exciting to me is actually the bidirectional calling. Having set this up before via JSON-RPC / a custom protocol, the experience was super "messy", and I'm looking forward to a framework making it all better.
Can't wait to try it out!
OTOH, JSON RPC is extremely simple. Cap'n Web is a relatively complicated and subtle underlying protocol.
Is the server holding onto some state in memory that this specific client has already authenticated? Or is the API key somehow stored in the new AuthenticatedSession stub on the client side and included in subsequent requests? Or is it something else entirely?
You mention that it’s schemaless as if that’s a good thing. Having a well defined schema is one of the things I like about tRPC and zod. Is there some way that you get the benefits of a schema with less work?
Well, except you don't get runtime type checking with TypeScript, which might be something you really want over RPC. For now I actually suggest using zod for type checks, but my dream is to auto-generate type checks based on the TypeScript types...
One thing about a traditional RPC system where every call is top-level and you pass keys and such on every call is that multiple calls in a sequence can usually land on different servers and work fine.
Is there a way to serialize and store the import/export tables to a database so you can do the same here, or do you really need something like server affinity or Durable Objects?
When using WebSockets, that's the lifetime of the WebSocket.
But when using the HTTP batch transport, a session is a single HTTP request that performs a batch of calls all at once.
So there's actually no need to hold state across multiple HTTP requests or connections, at least as far as Cap'n Web is concerned.
This does imply that you shouldn't design a protocol where it would be catastrophic if the session suddenly disconnected in the middle and you lost all your capabilities. It should be possible to reconnect and reconstruct them.
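For example, with the batch transport (method names illustrative), everything below happens within a single HTTP request:

    import { newHttpBatchRpcSession } from "capnweb";

    // Each batch session corresponds to exactly one HTTP request.
    let api = newHttpBatchRpcSession<MyApi>("https://example.com/rpc");

    // Both calls go out in the same batch: the stub returned by
    // authenticate() is used before the server has sent any reply.
    let session = api.authenticate(apiKey);
    let profile = await session.getProfile();

    // Once the response arrives, the batch is over. Capabilities created in
    // it don't outlive the request, so there's no server state to lose.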
RPC SDKs should have session management; otherwise you end up in this situation:
"Any sufficiently complicated gRPC or Cap'n Proto program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Akka"
It looks like server affinity is accomplished by using WebSockets. The HTTP batching simply sends all the requests at once and then waits for the response.
I don't love this because it makes load balancing hard. If a bunch of chatty clients get a socket to the same server, now that server is burdened and potentially overloadable.
Further, it makes scaling servers in and out really annoying. Persistent, long-lived connections are beasts to deal with, because now you have to handle "what do I do if multiple requests are in flight?".
One more thing I don't really love about this, it requires a timely client. This seems like it might be trivial to DDOS as a client can simply send a stream of push events and never pull. The server would then be burdened to keep those responses around so long as the client remains connected. That seems bad.
That said, type checking is called out both in the blog post (in the section on TypeScript) and in the readme (under "Security Considerations"). You probably should use some runtime type checking library, just like you should with traditional JSON inputs.
In the future I'm hoping someone comes up with a way to auto-generate type checks based on TypeScript types.
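Until then, it looks something like this on the server (zod usage is standard; the method is illustrative):

    import { z } from "zod";
    import { RpcTarget } from "capnweb";

    const HelloParams = z.object({ name: z.string() });

    class MyApiServer extends RpcTarget {
      hello(params: unknown) {
        // TypeScript types are erased at runtime, so validate at the RPC
        // boundary just as you would with ordinary JSON input.
        const { name } = HelloParams.parse(params);
        return `Hello, ${name}!`;
      }
    }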
But it may be tough to justify when we already have working Cap'n Proto implementations speaking the existing protocol, which took a lot of work to build. Yes, the new implementations would be less work than the originals, but it's still a lot of work that is essentially running in place.
OTOH, it might make it easier for Cap'n Proto RPC to be implemented in more languages, which might be worth it... idk.
That makes sense. There is some opportunity, though, since Cap'n Proto has always lacked a JavaScript RPC implementation. For example, I had been planning on using the Cap'n Proto OCaml implementation (which has full RPC) with one of the two mature OCaml->JavaScript frameworks to get a JavaScript implementation. Long story short: not now, but I'd be interested in seeing whether Cap'n Web can be ported to OCaml. I suspect other language communities may be interested too. Promise chaining is a killer feature and was (previously) difficult to implement. Aside: promise chaining is quite undersold in your blog post; it is co-equal to capabilities in my estimation.