[0] - https://unikraft.org
Some of us are still fighting for docker images to not include a vim install ("but it's so handy!") and here we've got madlads building their app as its own bootable machine image.
https://github.com/unikraft/unikraft/issues/414
Also - one needs to be careful, because many of the workloads they advertise on their site do not actually run under their kernel - they run under Linux, which breaks a completely different type of trust boundary.
As for trust/full disclosure - I'm with nanovms.com
My point is that you shouldn't go around talking about how "secure" you are when you have gaping holes like this. This, btw, is not the only major security issue they have.
I'd be curious to hear from someone at Google whether gVisor gets a ton of internal use there, or whether it really was built mainly for GCP/GKE.
Poor I/O performance and a couple of missing syscalls made it hard to predict how your app was going to behave before you deployed it.
Another example of a switch like this is WSL 1 to WSL 2 on Windows.
It seems like unless you have a niche use case, it's hard to truly replicate a full Linux kernel.
This causes a few issues:
- the proxying can be slightly slower
- it's not a VM, so you can't use things like confidential compute (memory encryption)
- you can't actually instrument all syscalls (most work, but there are a few edge cases where gVisor won't and a VM will work just fine)
On the flip side, some potential kernel vulnerabilities will be blocked by gVisor that wouldn't be in a VM (where it wouldn't be a hypervisor escape, but you'd still be able to run code as the guest kernel).
This is to say: there are some good use cases for gVisor, but there are fewer of them than for (micro) VMs in general.
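If you've never looked at how userspace syscall interception actually works, a toy ptrace-based tracer shows the moving parts. To be clear, this is my own sketch of the general mechanism, not gVisor's code (its real platforms, like systrap and KVM, are far more elaborate), and it's Linux/x86-64 only:

    /* toy_strace.c - minimal syscall interceptor (Linux x86-64).
       Build: gcc -o toy_strace toy_strace.c
       Run:   ./toy_strace /bin/ls */
    #include <stdio.h>
    #include <sys/ptrace.h>
    #include <sys/types.h>
    #include <sys/user.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(int argc, char *argv[]) {
        if (argc < 2) {
            fprintf(stderr, "usage: %s <program> [args...]\n", argv[0]);
            return 1;
        }
        pid_t child = fork();
        if (child == 0) {
            /* Child: ask to be traced, then exec the target. */
            ptrace(PTRACE_TRACEME, 0, NULL, NULL);
            execvp(argv[1], &argv[1]);
            return 1; /* only reached if exec failed */
        }
        int status;
        waitpid(child, &status, 0); /* child stops at exec */
        int entering = 1;
        while (1) {
            /* Resume the child until the next syscall entry or exit. */
            ptrace(PTRACE_SYSCALL, child, NULL, NULL);
            waitpid(child, &status, 0);
            if (WIFEXITED(status))
                break;
            if (entering) {
                struct user_regs_struct regs;
                ptrace(PTRACE_GETREGS, child, NULL, &regs);
                /* orig_rax holds the syscall number on x86-64. A
                   sandbox would decide here whether to emulate,
                   rewrite, or deny the call instead of logging it. */
                fprintf(stderr, "syscall %llu\n", regs.orig_rax);
            }
            entering = !entering;
        }
        return 0;
    }

The sandbox lives in that loop: every syscall becomes a stop it can emulate, rewrite, or reject, and that decision point is exactly where the edge cases above come from.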
Google developed both gVisor and crosvm (which Firecracker and others are based on) and uses both in different products.
AFAIK, there isn't a ton of gVisor use internally if it's not already in the product, though some use it in Borg (they have a "sandbox multiplexer" called vanadium where you can pick and choose your isolation mechanism)
For indexing most languages we didn't need it, because they were pretty well supported on the Borg stack with all the Google internals. But Kythe indexes 45 different languages, and so inevitably we ran into problems with some of them. I think it was the newer Python indexer?
> really was mainly for GCP/GKE
I mean... I don't know. That could also be true. There's a whole giant pile of internal software at Google that starts out as "built for <XYZ>", but then it gets traction and starts being used in a ton of other unrelated places. It's part of the glory of the monorepo - visibility into tooling is good, and reusability is pretty easy (and performant), because everyone is on the same build system, etc.
But all of those together still come to less than 30. What am I missing?
1. The core stack of internal languages (or internally created but also released externally) - protobuf, GCL, etc.
2. Some more well-known languages that aren't as big at Google but are still used, and people wrote indexers for them: C#, Lisp, Haskell, etc.
3. All the random domain-specific langs that people built and then wrote indexers for.
There's a bunch more that don't have indexers too.
LD_PRELOAD simply loads a library of your choice that executes code in the process's context, that's all. Folks usually do this when they cannot recompile or change the running binary, which means they also hook and/or overwrite functions of said program.
Generally folks will have gVisor calls integrated into their sandbox code before the target process starts, so there's no need for preloading anything in most cases.
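For anyone who hasn't written one, a minimal interposer is just a shared object that defines the function it wants to hook and chains to the real implementation via dlsym(RTLD_NEXT, ...). A toy example (mine, purely illustrative; the file name is made up):

    /* logopen.c - log every open(2) the host process makes.
       Build: gcc -shared -fPIC -o logopen.so logopen.c -ldl
       Use:   LD_PRELOAD=./logopen.so ls /tmp */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <fcntl.h>
    #include <stdarg.h>
    #include <stdio.h>

    int open(const char *path, int flags, ...) {
        /* Look up the libc open() that our definition shadows. */
        int (*real_open)(const char *, int, ...) =
            dlsym(RTLD_NEXT, "open");

        mode_t mode = 0;
        if (flags & O_CREAT) { /* mode is only present with O_CREAT */
            va_list ap;
            va_start(ap, flags);
            mode = va_arg(ap, mode_t);
            va_end(ap);
        }

        fprintf(stderr, "open(%s)\n", path);
        return real_open(path, flags, mode);
    }

Note the limits, which are why this is a debugging trick rather than a sandbox: it only catches calls that go through the dynamic linker, so static binaries and raw syscall(2) invocations sail right past it.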
gVisor's Achilles heel is its missing or inaccurate syscalls, but the gVisor team is first class at responding to GitHub issues, so it's really quite manageable in practice if you know how to debug and hack on a userspace kernel.
Is gVisor a kernel, or a proxy for syscalls plus select subsystems (like network/GPU)? In my head, a monolithic kernel (like Linux) does more than just handle syscalls (memory management, device management, filesystems, etc.).
I am way out of my depth here, but can anyone make a comparison with the "micro virtual machines" concept?
hyperlight shaves way more off (e.g. no access to the various devices you'd find via QEMU or Firecracker). It does make use of virtualization, but it doesn't try to present a full-blown machine, so it's better for things like embedding simple functions. I actually think it's an interesting concept, but it is very different from what Firecracker is doing.
EDIT: It seems that gVisor has a KVM mode too. https://gvisor.dev/docs/architecture_guide/platforms/#kvm
I've been working on KVMServer [2] recently, which uses TinyKVM to run existing Linux server applications by intercepting epoll calls. While there is a small overhead to crossing the KVM boundary to handle syscalls, we get the ability to quickly reset the state of the guest. This means we can provide per-request isolation with an order of magnitude less overhead than alternative approaches like forking a process or even spinning up a V8 isolate.
[1] Previous discussion: https://news.ycombinator.com/item?id=43358980
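To make the reset trick concrete: the "snapshot guest memory and registers once, roll back after every request" loop is small enough to sketch against raw /dev/kvm. This is my own toy, not TinyKVM's actual code (a real implementation would restore only dirty pages rather than memcpy everything); error handling is elided and the guest's "request handler" is a single hlt in real mode:

    /* kvm_reset.c - per-request VM state reset sketch.
       Build: gcc -o kvm_reset kvm_reset.c */
    #include <fcntl.h>
    #include <linux/kvm.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define MEM_SIZE 0x10000

    int main(void) {
        int kvm = open("/dev/kvm", O_RDWR | O_CLOEXEC);
        int vmfd = ioctl(kvm, KVM_CREATE_VM, 0);

        /* Guest "program": one hlt. A real guest is your app image. */
        uint8_t *mem = mmap(NULL, MEM_SIZE, PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        mem[0] = 0xf4; /* hlt */

        struct kvm_userspace_memory_region region = {
            .slot = 0,
            .guest_phys_addr = 0,
            .memory_size = MEM_SIZE,
            .userspace_addr = (uint64_t)mem,
        };
        ioctl(vmfd, KVM_SET_USER_MEMORY_REGION, &region);

        int vcpufd = ioctl(vmfd, KVM_CREATE_VCPU, 0);
        int map_size = ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, NULL);
        struct kvm_run *run = mmap(NULL, map_size,
                                   PROT_READ | PROT_WRITE,
                                   MAP_SHARED, vcpufd, 0);

        /* Minimal real-mode setup: execute from address 0. */
        struct kvm_sregs sregs;
        ioctl(vcpufd, KVM_GET_SREGS, &sregs);
        sregs.cs.base = 0;
        sregs.cs.selector = 0;
        ioctl(vcpufd, KVM_SET_SREGS, &sregs);
        struct kvm_regs clean_regs = { .rip = 0, .rflags = 0x2 };

        /* Snapshot the pristine guest memory once. */
        static uint8_t snapshot[MEM_SIZE];
        memcpy(snapshot, mem, MEM_SIZE);

        for (int request = 0; request < 3; request++) {
            /* Reset: roll registers and memory back to the snapshot.
               Restoring only the pages the last request dirtied is
               what makes this fast in practice. */
            ioctl(vcpufd, KVM_SET_REGS, &clean_regs);
            memcpy(mem, snapshot, MEM_SIZE);

            /* "Handle one request": run until the guest halts. */
            do {
                ioctl(vcpufd, KVM_RUN, 0);
            } while (run->exit_reason != KVM_EXIT_HLT);
            printf("request %d done, guest rolled back\n", request);
        }
        return 0;
    }

The rollback is just a register write plus a bounded memory copy, with no new address space or page tables to set up, which is why it can undercut fork-per-request or spinning up a fresh isolate.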
I suspect harvesting VM state from a production workload would be counterproductive to the goal of isolation.
I'm not saying it's not helpful. I'm just flagging that JIT research is pretty clear that the performance improvements from a JIT are hugely dependent on actually running the realistic code paths and data types you see over and over again. If there's divergence, you get suboptimal or even negative gains, because the JIT will start generating code for misoptimizations you don't actually care about. If you have control of the JIT you can mitigate some of these problems, but it sounds like you don't, in which case it's something to keep in mind as a problem at scale - i.e. it could end up being 5-10% of global compute if all your traffic is JITed, and it would certainly negatively impact latencies of this code running on your service. Of course, I'm sure you've got bigger technical problems to solve. It's a very interesting approach for sure. Great idea!
For the Varnish TinyKVM vmod, they brought up examples of running image transcoding, which is definitely something that benefits from per-request isolation given the history of exploits in those kinds of C/C++ libraries.
It's worth noting that Cloudflare/AWS Lambda don't have per-request isolation, and that's pretty important for server-side rendering use cases where code was initially written with client-side assumptions.
Not sure this will ever turn into a business for me personally - my motivation is in trying to regain some of the simplicity of the CGI days without giving up the performance gains of modern software stacks. Though it would be helpful to have a production workload to improve at some point.
> It's worth noting that Cloudflare/AWS Lambda don't have per-request isolation, and that's pretty important for server-side rendering use cases where code was initially written with client-side assumptions.
It wasn't just because of SSR. There are numerous opportunities for security vulnerabilities because of request confusion in global state. Per-request isolation is definitely something Cloudflare would enable if they had a viable solution from that perspective. As such, it's irrelevant what language you write it in - Rust is just as vulnerable to this problem as JS or anything else.
> If you don't then deploying a container to AWS Lambda or GCP Cloud Run is already pretty easy
Yea, but cloud functions like you're talking about are best for running at the edge, as close to the user as possible, not for traditional centralized servers. They also promote a very different programming paradigm that, when you fit into it, is significantly cheaper to run and maintain, because you can decompose your service.
> It might be possible to offer better cold start performance with the TinyKVM approach, but that is still an unknown.
https://blog.cloudflare.com/eliminating-cold-starts-with-clo...
You’d want to start prewarming an instance to be ready to handle the request when a TLS connection for a function comes in.
PhilippGille•6mo ago
> And, long story short, we now have an implementation of certificate-based SSH, running over gVisor user-mode TCP/IP, running over userland wireguard-go, built into flyctl.
tptacek•6mo ago
https://fly.io/blog/our-user-mode-wireguard-year/
https://fly.io/blog/jit-wireguard-peers/
This is another one of those things where the graph of our happiness about a technical decision is sinusoidal. :)
quotemstr•6mo ago
The concept of an OS still makes sense on a system with no privilege-level transitions and a single address space (e.g. DOS, FreeRTOS): therefore, mystical low-level register goo isn't essential to the concept.
The boundary between the OS and the rest of the software is a lot more porous and a lot less arcane than people imagine. In the end, it's just software.
jchw•6mo ago
gVisor's modular design seems to have been its strongest point. It's not that nobody understood that an OS is just software; it's that actually ripping the Linux TCP stack out and using it in userland isn't trivial. Meanwhile, a lot of projects have made use of the gVisor networking components, since they're pretty self-contained.
I think gVisor is one of the coolest things written in Go, although it's not really that easy to convey why.
Seriously, just check out the list of packages in the pkg directory:
https://pkg.go.dev/gvisor.dev/gvisor
(I should acknowledge, though, that I don't know of that many unique use cases for all of these packages; and while the TCP stack is very useful, it's mainly used for WireGuard tunneling, and user-mode TCP stacks are not particularly new. Still, the gVisor network stack is nicer than hacked-together stuff using SLiRP-derived code, imo.)