SturdyRefs are tricky. My feeling is that they don’t really belong in the RPC protocol itself, because the mechanism by which you restore a SturdyRef is very dependent on the platform in which you're running. Cloudflare Workers, for example, may soon support storing capabilities into Durable Object storage. But the way this will work is very tied to the Cloudflare Workers platform. Sandstorm, similarly, had a persistent capability mechanism, but it only made sense inside Sandstorm – which is why I removed the whole notion of persistent capabilities from Cap’n Proto itself.
The closest thing to a web standard for SturdyRefs is OAuth. I could imagine defining a mechanism for SturdyRefs based on OAuth refresh tokens, which would be pretty cool, but it probably wouldn’t actually be what you want inside a specific platform like Sandstorm or Workers.
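To sketch the shape of the idea (purely hypothetical; none of these names exist in Cap'n Web, though Cap'n Proto's persistence interface had a similar save()):

    // Hypothetical: save() returns an opaque token, analogous to an OAuth
    // refresh token, that outlives the current connection.
    let sturdyRef = await liveCap.save();

    // ...later, in a completely new session or process...

    // Hypothetical: restore() exchanges the token for a live capability,
    // much like exchanging a refresh token for an access token.
    let liveCap2 = await session.restore(sturdyRef);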
I’ve ended up building similar things over and over again. For example, simplifying the worker-page connection in a browser, or the connection between Chrome extension “background” scripts and content scripts.
There’s a reason many prefer “npm install” on some simple SDK that just wraps an API.
This also reminds me a lot of MCP, especially the bi-directional nature and capability focus.
The name "Cap'n Proto" came from "capabilities and protobuf". The first, never-released version was based on Protobuf serialization. The first public release (way back on April 1, 2013) had its own, all-new serialization.
There's also a pun with it being a "cerealization protocol" (Cap'n Crunch is a well-known brand of cereal).
Tiny remark for @kentonv if you're reading: it looks like you've got the wrong code sample immediately following the text "Putting it together, a code sequence like this".
The code was supposed to be:
    let namePromise = api.getMyName();          // no await: we get a promise back immediately
    let result = await api.hello(namePromise);  // pass the promise itself as an argument
    console.log(result);
That is, the client is not packaging up all its logic and sending a single blob that describes the fully-chained logic to the server on its initial request. Right?
When I first read it, I was thinking it meant 1 client message and 1 server response. But I think "one round trip" more or less means "1 server message in response to potentially many client messages". That's a fair use of "1 RTT", but it took me a moment to understand.
Just to make that distinction clear from a different angle, suppose the client were _really_ _really_ slow and it did not send the second promise message to the server until AFTER the server had computed the result for promise1. Would the server have already responded to the client with the result? That would be a way to incur multiple RTTs, although the application wouldn't care, since in this case it's bottlenecked by the client CPU, not the network.
I realize this is unlikely. I'm just using it to elucidate the system-level guarantee for my understanding.
As always, thanks for sharing this, Kenton!
The client either sends 3 separate calls in one message, or one message describing some computation (run this function with the result of this function), and the server responds with one payload.
See the "But how do we solve arrays" part:
> .map() is special. It does not send JavaScript code to the server, but it does send something like "code", restricted to a domain-specific, non-Turing-complete language. The "code" is a list of instructions that the server should carry out for each member of the array
But the client can send all three messages back-to-back without waiting for any replies from the server. In terms of network communications, it's effectively the same as sending one message.
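For example (method names illustrative):

    // No awaits until the end, so all three calls are written to the
    // socket back-to-back, before any reply has come back.
    let user = api.authenticate(token);     // message 1: returns a stub immediately
    let friends = user.getFriends();        // message 2: pipelined on message 1's result
    let names = await friends.map((f) => f.name); // message 3, then one round trip to resolve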
Although it seems to solve one of the problems that GraphQL solved and tRPC doesn't (the ability to request nested information from items in a list, or properties of an object, without changes to server-side code), there is no included solution for the server-side problem this creates, which the dataloader pattern was intended to solve: a naive GraphQL server implementation makes a database query per item in a list.
Until the server-side tooling for this matures and has equivalents for the dataloader pattern, persisted/allowlist queries, etc., I'll probably only use this for server <-> server (worker <-> worker) or client <-> iframe communication, and keep my client <-> server communication along more pre-defined boundaries.
However, if your database is sqlite in a Cloudflare Durable Object, and the RPC protocol is talking directly to it, then N+1 selects are actually just fine.
Building an operation description from the callback inside the `map` is wild. Does that add much in the way of restrictions programmers need to be careful of? I could imagine branching inside that closure, for example, could make things awkward. Reminiscent of the React hook rules.
So it turns out it's actually not easy to mess up in a map callback. The main thing you have to avoid is side effects that modify stuff outside the callback. If you do that, the effect you'll see is those modifications only get applied once, rather than N times. And any stubs you exfiltrate from the callback simply won't work if called later.
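For example (method names illustrative):

    let counter = 0;

    let names = await api.getFriends().map((friend) => {
      // Side effect: the callback is only actually executed once, to record
      // the instructions, so this increment happens once, not once per friend.
      counter++;
      return friend.name;  // property access is recorded and replayed per element
    });

    // counter is 1 here no matter how long the list was.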
edit: Downvoted; is this a bad question? The title says "web servers" generically, and obviously the content of the post focuses primarily on TypeScript, but I'm trying to determine if there's something unique about this that means it cannot be implemented in other languages. The server-side DSL execution could be difficult to implement, but since it's not strictly JavaScript, I imagine it's not impossible?
* Use Cap'n Proto in your Rust backend. This is what you want in a type-safe language like Rust: generated code based on a well-defined schema.
* We'll build some sort of proxy that, given a Cap'n Proto schema, converts between Cap'n Web and Cap'n Proto. So your frontend can speak Cap'n Web.
But this proxy is just an idea for now. No idea if or when it'll exist.
> as of this writing, the feature set is not exactly the same between the two. We aim to fix this over time, by adding missing features to both sides until they match.
Do you think that once the two reach parity, the parity will remain? Or is it more likely that Cap'n Web will trail Cloudflare Workers, and if so, by what length of time?
[1] https://github.com/cloudflare/capnweb/tree/main?tab=readme-o...
If anything I'd expect Cap'n Web to run ahead of Workers RPC (as it is already doing, with the new pipeline features) because Cap'n Web's implementation is actually much simpler than Workers'. Cap'n Web will probably be the place where we experiment with new features.
Cap'n Proto was inspired by Protobuf, and Protobuf has gRPC and gRPC-Web.
At my last startup, we used Protobuf/gRPC/gRPC-Web both in the backends and for public endpoints powering React/TS UIs. It worked great, particularly with the GCP Kubernetes infrastructure; both the API and operational aspects were non-problems. However, navigating the dumpster fire around Protobuf, gRPC, and gRPC-Web, with the lack of community leadership from Google, was a clusterfuck.
That said, I'm a bit at a loss as to the meaning of "schemaless". You can take different approaches to schemas (see Avro vs. Protobuf), but you can't fundamentally eschew schemas/types. It's information tied to a communication channel, and it needs to live somewhere: explicit, implicit, handled by the RPC layer, pushed into the type system, or (worse) pushed all the way to the user/dev. Moreover, schemas tend to evolve, and any protocol needs to take that into account.
Historically, Protobuf has done a good job managing the various tradeoffs here, but I have no experience with Cap'n Proto. I've seen mostly good things about it, so perhaps I'm just missing something.
But Cap'n Web itself does not need to know about any of that. Cap'n Web just accepts whatever method call you make, sends it to the other end of the connection, and attempts to deliver it. The protocol itself has no idea if your invocation is valid or not. That's what I mean by "schemaless" -- you don't need to tell Cap'n Web about any schemas.
With that said, I strongly recommend using TypeScript with Cap'n Web. As always, TypeScript schemas are used for build-time type checking, but are then erased before runtime. So Cap'n Web at runtime doesn't know anything about your TypeScript types.
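In practice it looks something like this (along the lines of the README examples; `MyApi` and `hello()` are whatever your app defines):

    import { newWebSocketRpcSession } from "capnweb";

    // This interface exists purely for TypeScript's benefit; it is erased
    // at build time, and Cap'n Web never sees it.
    interface MyApi {
      hello(name: string): Promise<string>;
    }

    let api = newWebSocketRpcSession<MyApi>("wss://example.com/api");

    // TypeScript checks this at compile time, but at runtime Cap'n Web just
    // forwards the method name and arguments without validating anything.
    let greeting = await api.hello("Alice");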
I'm confused. How is this a "protocol" if its core premises rely on a very specific implementation of concurrency in a very specific language?
Anyway, the point here is that early RPC systems worked by blocking the calling thread while performing the network request, which was obviously a terrible idea.
Some friends and I still jokingly troll each other in the vein of these, interjecting with "When async programming was discovered in 2008...", or "When memory safe compiled languages were invented in 2012..." and so forth.
edit: I was skimming the GitHub repo https://github.com/cloudflare/capnweb/tree/main?tab=readme-o...
and saw this, which answers my question:
> Supports passing functions by reference: If you pass a function over RPC, the recipient receives a "stub". When they call the stub, they actually make an RPC back to you, invoking the function where it was created. This is how bidirectional calling happens: the client passes a callback to the server, and then the server can call it later.
> Similarly, supports passing objects by reference: If a class extends the special marker type RpcTarget, then instances of that class are passed by reference, with method calls calling back to the location where the object was created.
Gonna skim some more to see if I can find some example code.
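Something like this, I'd guess (a sketch based on the README's description, so the details may be off):

    import { RpcTarget } from "capnweb";

    // Server side: a method that takes a callback.
    class ChatServer extends RpcTarget {
      async subscribe(onMessage: (text: string) => void) {
        // onMessage arrived as a stub; calling it makes an RPC back to
        // the client, where the function was actually created.
        await onMessage("hello from the server");
      }
    }

    // Client side: pass a plain function; the server gets a stub for it.
    await api.subscribe((text) => console.log("received:", text));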
The part that's most exciting to me is actually the bidirectional calling. Having set this up before via JSON-RPC / a custom protocol, the experience was super "messy", and I'm looking forward to a framework making it all better.
Can't wait to try it out!
OTOH, JSON RPC is extremely simple. Cap'n Web is a relatively complicated and subtle underlying protocol.
Is the server holding onto some state in memory that this specific client has already authenticated? Or is the API key somehow stored in the new AuthenticatedSession stub on the client side and included in subsequent requests? Or is it something else entirely?
You mention that it’s schemaless as if that’s a good thing. Having a well defined schema is one of the things I like about tRPC and zod. Is there some way that you get the benefits of a schema with less work?
Well, except you don't get runtime type checking with TypeScript, which might be something you really want over RPC. For now I actually suggest using zod for type checks, but my dream is to auto-generate type checks based on the TypeScript types...
One thing about a traditional RPC system where every call is top-level and you pass keys and such on every call is that multiple calls in a sequence can usually land on different servers and work fine.
Is there a way to serialize and store the import/export tables to a database so you can do the same here, or do you really need something like server affinity or Durable Objects?
When using WebSockets, that's the lifetime of the WebSocket.
But when using the HTTP batch transport, a session is a single HTTP request that performs a batch of calls all at once.
So there's actually no need to hold state across multiple HTTP requests or connections, at least as far as Cap'n Web is concerned.
This does imply that you shouldn't design a protocol where it would be catastrophic if the session suddenly disconnected in the middle and you lost all your capabilities. It should be possible to reconnect and reconstruct them.
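For example, with the batch transport (method names illustrative), everything below happens within a single HTTP request:

    import { newHttpBatchRpcSession } from "capnweb";

    // Each batch session corresponds to exactly one HTTP request.
    let api = newHttpBatchRpcSession<MyApi>("https://example.com/rpc");

    // Both calls go out in the same batch: the stub returned by
    // authenticate() is used before the server has sent any reply.
    let session = api.authenticate(apiKey);
    let profile = await session.getProfile();

    // Once the response arrives, the batch is over. Capabilities created in
    // it don't outlive the request, so there's no server state to lose.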
RPC SDKs should have session management; otherwise you end up in this situation:
"Any sufficiently complicated gRPC or Cap'n Proto program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Akka"
It looks like server affinity is accomplished by using WebSockets. The HTTP batching simply sends all the requests at once and then waits for the response.
I don't love this because it makes load balancing hard. If a bunch of chatty clients get a socket to the same server, now that server is burdened and potentially overloadable.
Further, it makes scaling servers in and out really annoying. Persistent, long-lived connections are beasts to deal with, because now you have to handle "what do I do if multiple requests are in flight?".
One more thing I don't really love about this, it requires a timely client. This seems like it might be trivial to DDOS as a client can simply send a stream of push events and never pull. The server would then be burdened to keep those responses around so long as the client remains connected. That seems bad.
That said, type checking is called out both in the blog post (in the section on TypeScript) and in the readme (under "Security Considerations"). You probably should use some runtime type checking library, just like you should with traditional JSON inputs.
In the future I'm hoping someone comes up with a way to auto-generate type checks based on TypeScript types.
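Until then, it looks something like this on the server (zod usage is standard; the method is illustrative):

    import { z } from "zod";
    import { RpcTarget } from "capnweb";

    const HelloParams = z.object({ name: z.string() });

    class MyApiServer extends RpcTarget {
      hello(params: unknown) {
        // TypeScript types are erased at runtime, so validate at the RPC
        // boundary just as you would with ordinary JSON input.
        const { name } = HelloParams.parse(params);
        return `Hello, ${name}!`;
      }
    }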
But it may be tough to justify when we already have working Cap'n Proto implementations speaking the existing protocol, which took a lot of work to build. Yes, the new implementations would be less work than the originals, but it's still a lot of work that is essentially running in place.
OTOH, it might make it easier for Cap'n Proto RPC to be implemented in more languages, which might be worth it... idk.
That makes sense. There is some opportunity, though, since Cap'n Proto has always lacked a JavaScript RPC implementation. For example, I had been planning on using the Cap'n Proto OCaml implementation (which has full RPC) with one of the two mature OCaml->JavaScript frameworks to get a JavaScript implementation. Long story short: not now, but I'd be interested in seeing whether Cap'n Web can be ported to OCaml. I suspect other language communities may be interested too. Promise chaining is a killer feature and was (previously) difficult to implement. Aside: promise chaining is quite undersold in your blog post; it is co-equal to capabilities in my estimation.