It doesn't quite work at the capabilities level, but it does provide some novel protections against unusual supply-chain attacks such as denial-of-service attacks which may otherwise require no special capabilities.
This is similar to my work on LavaMoat (https://lavamoat.github.io/), which provides runtime supply-chain security protections to JS apps (Node.js or browser) by eliminating ambient authority and only exposing global capabilities per npm package according to a user-specified policy. LavaMoat is used in production at MetaMask, protecting ~300M users.
The first problem is that their attempt to abstract program location has a lot of bugs. You can't solve cross-process GC with the approach they outline (in, like, one paragraph). Using finalizers to release cross-process references fails the moment there's a reference cycle. Tracing cycles across heaps requires a new kind of GC that doesn't exist today, as far as I know, made much harder by the fact that they're including DoS attacks in their threat model. And intercepting value writes then batching them to save IPCs changes program semantics in pretty subtle ways. I think this is actually the core problem of sandboxing libraries and everything else is a well-understood implementation distraction, so it'd be good to have a paper that focused exclusively on that. They also seem to think that modules return DAGs, but they don't: program heaps are directed cyclic graphs, not acyclic ones.
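A minimal Go sketch of why the finalizer approach deadlocks on cycles (everything here is invented for illustration; the essential bit is that each side pins objects it exports in a table so the other side's proxies stay valid, and a proxy's finalizer is what releases the remote entry):

    package main

    import "runtime"

    type heap struct{ exports map[int]any }

    type proxy struct {
        remote *heap
        id     int
    }

    type obj struct{ peer *proxy }

    func newProxy(remote *heap, id int) *proxy {
        p := &proxy{remote: remote, id: id}
        runtime.SetFinalizer(p, func(p *proxy) {
            delete(p.remote.exports, p.id) // release the remote object
        })
        return p
    }

    func link(a, b *heap) {
        oa, ob := &obj{}, &obj{}
        a.exports[1] = oa // oa is referenced from heap b
        b.exports[1] = ob // ob is referenced from heap a
        oa.peer = newProxy(b, 1)
        ob.peer = newProxy(a, 1)
    } // all local references dropped here

    func main() {
        a := &heap{exports: map[int]any{}}
        b := &heap{exports: map[int]any{}}
        link(a, b)
        runtime.GC()
        // Neither finalizer can ever run: a.exports[1] keeps oa alive, oa
        // keeps the proxy to ob alive, which keeps b.exports[1] populated,
        // and so on around the cycle. In the real cross-process setting
        // each table is a root in a different process, so no per-process
        // collector ever sees the whole loop.
    }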
The second problem is that their policy language is too simple and doesn't have any notion of transitive permissions. This is the same issue that made SecurityManager hard to use. You can grant a module filesystem access, but in a typical JS app being tested or run on a developer's desktop, that's equivalent to granting all permissions (because it can write to some module directory). Even if you use Docker to ship the app, there's no guarantee the JS files inside the container are read-only, since programs in a container often run as root.
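To make the escalation concrete: a hypothetical "filesystem-only" library can promote itself to arbitrary code execution just by rewriting code that something else will later run (package and paths are illustrative):

    package evil

    import "os"

    // Granted "filesystem" only, but one write into the module tree means
    // this payload later runs with whatever the host app is allowed to do.
    func init() {
        payload := []byte("module.exports = require('./index.real'); /* plus anything */")
        _ = os.WriteFile("node_modules/left-pad/index.js", payload, 0o644)
    }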
The third problem is that the only sandbox they offer is LXC containers, but containers aren't meant to be sandboxes and often aren't "out of the box". And of course they're Linux-specific, but development of JS apps often takes place on non-Linux machines. The details of actually doing kernel sandboxing for real are rather complex.
Still, something like this is the right direction of travel. The usability issues with process sandboxing arise due to performance problems and the harshness of the address space transition. Allowing object graphs to be 'shadowed' into another process, with proper handling of memory management, and then integrating that into language runtimes seems like the right approach.
Probably the compiled program should just get the permissions it needs.
A simple capability system for libraries might be the good that is the enemy of perfect:
- Pure - can only access compute and its own memory plus passed-in parameters (needs immutable languages or serialization at interop)
- Storage IO - Pure, but can also do storage IO. IO on what? Anything the program has access to.
- Network IO - similar concept, for the network
- Desktop in Window - can do UI stuff in the window
- Desktop General - modals, notifications, new windows, etc.
- Etc...
Not very fine grained, but many libraries can be Pure.
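A rough Go rendering of the first few tiers (all names here are invented): the idea is that a library can only ask for the tier it declares, and authority arrives via parameters rather than being picked up ambiently.

    package caps

    import (
        "io"
        "net"
    )

    // Pure: no parameter carries authority; compute only.
    type Formatter interface {
        Format(input string) string
    }

    // Storage IO: the library touches storage, but only through handles
    // the caller passes in, never paths it picks itself.
    type Archiver interface {
        Archive(dst io.Writer, src io.Reader) error
    }

    // Network IO: likewise, the caller supplies the connection.
    type Fetcher interface {
        Fetch(conn net.Conn, resource string) ([]byte, error)
    }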
It ain't perfect.
A Pure library that formats a string can still inject some nasty JS hoping that you'll use that string on a web page! Ultimately... useful computation is messy and you can't secure everything in advance through capabilities alone.
https://privacysandbox.google.com/private-advertising/sdk-ru...
From their first high level goal:
> Define a set of portable, modular, runtime-independent, and WebAssembly-native APIs which can be used by WebAssembly code to interact with the outside world. These APIs preserve the essential sandboxed nature of WebAssembly through a Capability-based API design.
https://github.com/WebAssembly/WASI/blob/main/README.md#wasi...
How can you implement an object capability system on WASM? It gives modules a flat memory space in which you can run C so nothing stops one library interfering with the memory of another other than software-level verification, at which point you don't need WASM anymore. At most it could be a Mojo-style system in which cooperating message-passing processes can send each other interfaces.
It's been well known for decades that the germ of an object capability system already exists in Unix - the file descriptor (that's why the control message for transferring them over sockets is called SCM_RIGHTS).
Capsicum was developed to make that potentiality a reality. It didn't require too much work. Since most things are represented by file descriptors, that was just a matter of tightening what you can do with the existing FDs (no going from a directory FD to its parent unless you have the right to the parent, or some parent of that parent, by another FD), and introducing process and anonymous shared memory FDs so that a global namespace is no longer needed to deal with these resources.
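Go's os.Root (added in Go 1.24) is a userland echo of exactly this directory-FD discipline, if you want to play with the idea without a Capsicum kernel; paths here are illustrative:

    package main

    import (
        "fmt"
        "os"
    )

    func main() {
        root, err := os.OpenRoot("data") // authority over ./data only
        if err != nil {
            panic(err)
        }
        defer root.Close()

        if f, err := root.Open("notes.txt"); err == nil { // fine: inside the subtree
            f.Close()
        }
        // Lexical tricks and symlinks pointing out of the subtree are rejected:
        if _, err := root.Open("../etc/passwd"); err != nil {
            fmt.Println("escape blocked:", err)
        }
    }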
So WASI has derived itself from an actually existing object capability architecture - Capsicum - one which happens to be a simple refinement of the Unix API that everyone knows and every modern OS has at least been very much inspired by.
Every modern OS has very much not been inspired by UNIX. Windows has little in common with it e.g. no fork/exec equivalents, the web is a sort of OS these days and has no shared heritage with UNIX, and many operating systems that ship a UNIX core as a convenience don't use its APIs or design patterns at the API level you're meant to use, e.g. an Android app doesn't look anything like a UNIX app, Cocoa APIs aren't UNIX either.
Check "Inside Windows NT" by Helen Custer, an official account. She explicitly credits the handles to Unix. That's not surprising - not only was Unix fresh on the minds of the NT developers, with quite a few of them having Unix backgrounds, but every conceptual ancestor of Windows NT was at least significantly influenced by Unix:
- VMS: The VAX/VMS team were in regular contact with Ken Thompson, and got the idea for channel IDs (= file descriptors) for representing open files and devices from him, as well as the idea of three standard channels which child processes inherit by default: input, output, error (the error one was at the time a very recent development, I think in Unix v6 or v7)
- MICA: Would have been a combined Unix and VMS compatible system.
- OS/2: FDs with read, write, ioctl again.
Even MS-DOS is already highly Unix-influenced: they brought in file descriptors in DOS 2.0 and even called the source files implementing the API "XENIX.ASM" and "XENIX2.ASM" (see the recent open source release.)
I have deliberately chosen to not make anything of the fact that Windows NT was intended to be POSIX compatible either (and even supports fork, which WASI mercifully doesn't) because my point is that all modern general-purpose operating systems are at least very much inspired by, and deeply indebted to, Unix. I would accept that OSes that are not general purpose may not be, and old operating systems made in siloed environments like IBM's are fundamentally very different. IBM i is very different to Unix, and that's clear in its native APIs.
Cocoa and Android APIs don't look much like the basic Unix APIs, it's true, even if they are implemented in terms of them. WASI wants to define APIs at that lower level of abstraction. It's tackling the problem at a different level (the inter-process level) to what object capability _languages_ are tackling (the intra-process level).
NT might have been intended to have a POSIX personality in the very beginnings of the project, but that never happened. People who have tried to make this work over the years have always retreated licking their wounds, including Microsoft themselves. WSL1 tried to use this and failed because NT is too different to UNIX to implement a Linux personality on top, so WSL2 is just a regular VM.
On a related note, I found Thomas Leonard's blog post (2023) on Lambda Capabilities to be a very interesting approach: https://roscidus.com/blog/blog/2023/04/26/lambda-capabilitie...
If I had to guess, the supply chain problems that may eventually cause this to be created will need to get, oh, I don't know, call it two orders of magnitude worse before the system as a whole really takes note. Then, since you can't really write a new language just for this, even though I'd like that to happen, it'll get bodged on to the side of existing languages and it won't be all that slick.
That said, I do think there's probably some 80/20 value in creating an annotation to the effect of "this library doesn't need filesystem access or sockets" and having a linter or some other tool, external to the main language compiler/runtime, validate it. The point of this would not be to solve the capabilities problem for libraries that are doing intrinsically tricky things, because that's really hard to do correctly, but just to get a lot of libraries out of the line of fire. There are a lot of libraries that already don't need to do those things, and more that could easily be tweaked to just take passed-in file handles or whatever if there was a concrete reason to design them that way.
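A sketch of such a linter, with an invented "capability:pure" marker. It checks only direct imports; a real tool would have to walk the import graph transitively and also ban cgo, plugins, and so on:

    package main

    import (
        "fmt"
        "go/parser"
        "go/token"
        "os"
        "strings"
    )

    var denied = map[string]bool{
        "os": true, "os/exec": true, "net": true, "net/http": true,
        "syscall": true, "unsafe": true, "plugin": true,
    }

    func main() {
        fset := token.NewFileSet()
        pkgs, err := parser.ParseDir(fset, os.Args[1], nil, parser.ParseComments)
        if err != nil {
            panic(err)
        }
        for _, pkg := range pkgs {
            for filename, file := range pkg.Files {
                declaredPure := false
                for _, cg := range file.Comments {
                    if strings.Contains(cg.Text(), "capability:pure") {
                        declaredPure = true
                    }
                }
                if !declaredPure {
                    continue
                }
                for _, imp := range file.Imports {
                    path := strings.Trim(imp.Path.Value, `"`)
                    if denied[path] {
                        fmt.Printf("%s: declared pure but imports %s\n", filename, path)
                    }
                }
            }
        }
    }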
The library that I personally could do the most damage with on my GitHub is a supervision tree library for Go. It doesn't need any capabilities to speak of. The closest thing is that you can pass in a logger object and that is constrained to specific calls too. Even a hack that just lets me say that this library doesn't need anything interesting would at least get that out of the set of libraries that could be exploited.
Or to put it another way, rather than trying to perfectly label all the code doing tricksy stuff, maybe we can start by labelling the code that doesn't.
I'd also point out that I think the question of libraries is different than things like Chrome isolation. Those things are good, but they're for treating data carefully and limiting blast radiuses; I'm looking at the problem of "if I download this library and miss one single file is it going to upload every AWS token it can find to someone who shouldn't have them".
- Loading native code = granting all permissions
- Access to the unsafe package = granting all permissions
- Many syscalls that write data to user buffers = granting all permissions
- Being able to run a sub-process = granting all permissions
So you need at minimum to exclude all of those too. But then you also have:
- Tampering with global state of any kind, e.g. Go's default HTTP mux and transport are mutable package-level variables that any imported code can swap out (see the sketch after this list). If you can modify the logging system to write to a new location you might be able to use that to escape the sandbox.
- Deserialization of objects can be a sandbox escape.
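For the Go case, that looks like this; any package you import can do it from init, before main even runs:

    package sneaky

    import "net/http"

    type tap struct{ next http.RoundTripper }

    func (t tap) RoundTrip(r *http.Request) (*http.Response, error) {
        // A hostile library could copy r.Header.Get("Authorization") out
        // here before forwarding the request unchanged.
        return t.next.RoundTrip(r)
    }

    func init() {
        // Every request made via http.Get etc. now flows through the tap.
        http.DefaultTransport = tap{next: http.DefaultTransport}
    }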
And then what about the threat model? If you can cause every process that includes your library to segfault simultaneously, that's a DoS attack on a company that can be profitably used for extortion. Are DoS-driven extortions in scope or out? This is why System.exit is a permission in the SecurityManager.
And so on. The number of ways you can accidentally configure an exploitable set of permissions is huge, and because nobody seemed to care much, there was no tooling to help avoid such misconfigurations.
LavaMoat (https://lavamoat.github.io/), while not quite object capabilities, builds on HardenedJS to provide runtime supply-chain security protections to JS apps (Node.js or browser) by eliminating ambient authority and only exposing global capabilities per npm package according to a user-specified policy. LavaMoat is used in production at MetaMask, protecting ~300M users.
OCapN (https://github.com/ocapn/ocapn/) is a nascent effort to standardize a distributed object capability protocol (transferring capabilities across mutually distrusting peers).
This shows up a subtle detail of sandboxing schemes that is often overlooked. The guarantees Java provides around safety are tightly scoped and often little more than saying the JVM itself won't crash. It's not that hard to arrange for a stack overflow to occur whilst some standard library code is running, which means execution can abort in nearly any place. If the code you're calling into isn't fully exception-safe, the library's global variables (if any) can be left in a logically corrupted state, which might be exploitable.
If Java finally blocks had a filter clause, that could help, but finally is sometimes implicit as with try-with-resources.
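A Go analogue of the same hazard, with invented names (in Java the injected abort is a StackOverflowError rather than a panic, but the corrupted-global outcome is the same):

    package ledger

    var balances = map[string]int{"a": 100, "b": 0}

    // Transfer is not abort-safe: if anything panics between the two
    // writes and is recover()ed higher up, money has silently vanished
    // and the package-level state is logically corrupt from then on.
    func Transfer(from, to string, n int) {
        balances[from] -= n
        mightPanic() // e.g. a callback, allocation failure, injected abort
        balances[to] += n
    }

    func mightPanic() {}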
https://tutorial.ponylang.io/object-capabilities/object-capa...
It's normal for an application to be built from many independent modules that accept their dependencies as inputs via dependency inversion[2]. The modules are initialized at program start by code that composes everything together. Using the "god object"[1] pattern from the article is basically the same thing.
[1] https://en.wikipedia.org/wiki/God_object [2] https://en.wikipedia.org/wiki/Dependency_inversion_principle
Dependency injection doesn't help here much, at least not with today's languages and injectors. The injector doesn't have any opinion on whether a piece of code should be given something or not, it just hands out whatever a dependency requests. And the injector often doesn't have enough information to precisely resolve what a piece of code needs or resolve it at the right time, so you need workarounds like injecting factories. It could be worth experimenting with a security-aware dependency injector, but if you gave it opinions about security it'd probably end up looking a lot like the SecurityManager did (some sort of config language with a billion fine grained permissions).
The application's entry point takes the God Object of capabilities, slices it accordingly to what it thinks its submodules should be able to access, and then initializes the submodules with the capabilities it has decided upon. Obviously, if some submodules declare that they need access to everything plus a kitchen sink, the choice is either a) give up and give them access to everything; b) look for a replacement that requires less capabilities.
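A minimal Go sketch of that entry-point slicing (all names invented):

    package main

    import (
        "io"
        "net/http"
        "os"
    )

    // God holds the full authority the process starts with.
    type God struct {
        FS   func(path string) (io.ReadWriteCloser, error)
        HTTP *http.Client
    }

    // A submodule declares only what it needs.
    type ReportWriter interface {
        Write(name string, data []byte) error
    }

    type fsReports struct{ open func(string) (io.ReadWriteCloser, error) }

    func (r fsReports) Write(name string, data []byte) error {
        f, err := r.open("reports/" + name) // confined to one subtree
        if err != nil {
            return err
        }
        defer f.Close()
        _, err = f.Write(data)
        return err
    }

    func main() {
        god := God{
            FS: func(p string) (io.ReadWriteCloser, error) {
                return os.OpenFile(p, os.O_CREATE|os.O_WRONLY, 0o644)
            },
            HTTP: http.DefaultClient,
        }
        // The report module gets file access only, under one directory.
        var reports ReportWriter = fsReports{open: god.FS}
        _ = reports
        _ = god.HTTP // handed only to the modules that need the network
    }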
It's pretty easy for me to imagine a world where running code was safe by default, and this followed all the way to the top. It's obviously not that onerous, else JavaScript wouldn't be as successful as it is. Most of the details the post touches on are then just package management and grouping concerns.
Ignoring microcontrollers, and tiny embedded stuff, no hardware or modern operating systems I know of works that way.
Modern hardware almost all has an MMU (which blocks I/O once the process table is set up), and most have an IOMMU (which partitions the hardware so that mutually distrusting operating systems can run directly on the same machine).
The remaining architectural holes are side channel / timing attacks that hit JS just as hard as bare metal.
        move r0, #0          ; pointer: start of the address space
        move r1, #0          ; accumulator for the "hash"
    L0:
        load.w r2, [r0]      ; read whatever memory is reachable...
        add r1, r1, r2
        store.w [r0], #0     ; ...and zero it behind us
        add r0, #wordSize
        bnz r0, L0           ; loop until the pointer wraps back to 0
        move r0, #42
        ret
which tries to calculate a silly hash of all the memory it can reach via indirect loads and then to zero whatever memory it can reach via indirect stores (which is usually just the whole of the process's memory on modern systems, in both cases). What mechanisms do you propose that would allow one to blindly run this code without erasing all kinds of precious information in memory, ideally still returning 42 in r0 to the caller at the end, but without leaking any sensitive information via r1?

But enlightening: I did not previously know that CHERI had an explicit tool (CCall) to implement the unspoofable "restore privileges and return from subroutine" instruction.
Answering the question: “Can this process access this resource?” is equivalent to solving the halting problem.
There’s a reason simpler access control models are popular. Even ACLs are completely untenable in practice. Look at all the trouble accidentally-public s3 buckets create.
Is that line of code executed?
Replace the line with “halt”, and change the question to “Does this program halt?”
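Spelling the reduction out (mystery and accessResource are placeholders):

    package main

    func mystery() {
        for { // stands in for arbitrary computation
        }
    }

    func accessResource() {}

    func main() {
        mystery()
        accessResource() // reached if and only if mystery() returns
    }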
Anyways, JAAS' Permission class and model are weak, but yeah, they could be used to limit libraries' capabilities. A capability model would be much better than a permission model.
https://docs.oracle.com/en/java/javase/24/docs/api/java.base...
=> "The following methods in this class for user-based authorization that are dependent on Security Manager APIs are deprecated for removal: "
Spritely seems very relevant, but I don't see it get much mention when this pops up.
In fact, this passage from the article describes very closely what it is like to use Bluefin[1], yet I bet the author doesn't even know of Bluefin's existence! That makes me think I am on the right lines with my design.
> The first is that if you want the entire application to be written in an object-capability style then its main() method must be given a “god object” exposing all the ambient authorities the app begins with. You then have to write proxy objects that restrict access to this god object whilst implementing standard interfaces.
This part is too pessimistic, however:
> No language has such an object or such interfaces in its standard library, and in fact “god objects” are viewed as violating good object oriented design.
As you correctly point out, cryptonector, Haskell's IO is the "god object". It allows you to do roughly anything. And indeed Bluefin's runEff[2] gives you access to the god object (which is IO wrapped up in an effect (or "capability") called IOE). The rest of the program runs by "peeling parts off" this god object so it only uses the parts that it needs.
I have found this a very satisfactory way to program.
[1] https://hackage.haskell.org/package/bluefin
[2] https://hackage-content.haskell.org/package/bluefin-0.0.15.0...
It's cool how the Java dev community keeps finding programming language theory in Haskell and folding it into Java.
Ok it's not a language-based approach. But interesting project aiming for similar goals as discussed in the article. And a nice counter-example to:
"Not one OS has a sandboxing API that is all of documented, stable and usefully powerful"
Binder handles work just like object capabilities: you can only use what's sent to you, and a process can delegate other binder handles onward.
Android hides most of this behind its permission model, but the capability still exists and can be implemented by anyone in the system.
Binder is also somewhat like Mojo in that you can do fast in-process calls with it, iirc. The problem is that, as you note, this isn't very useful in the Android context because within a process there's no way to keep a handle private. Mojo's ability to move code in and out of processes actually is used by Chrome extensively, usually either for testing (simpler to run everything in-process when debugging) or because not every OS it runs on requires the same configuration of process networks.