And then this post today, which makes a very strong case for it. (Yes, a VM isn’t an entire OS. Yes, it would be lighter weight than a complete OS. Yes, it would be industry-wide. Yes, we’d likely use an existing OS or codebase to start. Yes, nuance.)
I'm going to put in a lot of work anyway to keep the LLM from accidentally overwriting the code running it, from mishandling customer data, or from being overwhelmed with implementation details; having a standard for this makes it much easier and lets me rely on other people's model training.
If it's merely that I have to train a dev on an XR SDK, I can pay them a salary or encourage schools to teach it. AI needs a team for an R&D project plus compute time, which can get a lot more expensive at the high end.
Control tool access like OSes enforce file permissions: I understand it’s a metaphor, but also isn’t the track record of OSes here pretty bad?
Check whether the agent is allowed to use the booking tool: so a web browser? Isn’t a browser a pretty powerful general-purpose tool, which by the way could also expose the agent to, like, a jailbreak?
> As such, security researchers have to devise new mitigations to prevent AI models taking adversarial actions even with the virtual machine constraints.
An understated reminder that yes, we really ought to solve alignment.
All Hands is incredibly friendly and responsive to feedback as well, and that means a lot.
From the hosting perspective the article talks about, I would worry more about just keeping the AI agent functional and alive in whatever environment it runs in; that's a big challenge. Using AI is great, but stability in basically any use case has been rough for me personally.
From a developer perspective I've been using devcontainers with rootless docker via wsl and while I'm sure there's some malware that can bypass that (where this VM approach would be a lot stronger) I feel a lot safer this way than running things on the host OS. Furthermore you get the same benefits like reproducibility and separation of concerns and whenever the AI screws something up in your environment you can simply rebuild the container.
Rather than requiring a new OS, I think a Fuchsia-like system based on WASM and WASI components, that can be hosted from the cloud to the phone, is likely the way this goes.
The agent running in a VM - at least by default - was a key feature during the AI pilot I ran a few months ago.
It's just too bad tcl, lua, forth, js, wasm, etc just aren't AI-scale.
[1] I'm looking at single-person/small office LLMs to do simple jobs: summarize these pdfs, structure this data, help drafting a document, that sort of thing, not a be-all end-all monster. Think of a bunch of highly intelligent Python scripts as opposed to Microsoft Office.
For example, in the book-a-ticket scenario - I want it to be able to check a few websites to compare prices, and I want it to be able to pay for me.
I don't want it to decide to send me on a 37-hour trip with three stops because it is $3 cheaper.
Alternatively, I want to be able to lookup my benefits status, but the LLM should physically not be able to provide me any details about the benefits status of my coworkers.
That is the _same_ tool call, but in a different scope.
For that matter, if I'm in HR - I _should_ be able to look at the benefits status of employees that I am responsible for, of course, but that creates an audit log, etc.
In other words, it isn't the action that matters, but the intent.
The LLM should be placed in the same box as the user it is acting on behalf of.
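The "same tool call, different scope" idea above can be sketched in a few lines. This is a toy illustration, not any real HR system: the tool, the data, and names like `lookup_benefits` and `HR_RESPONSIBILITY` are all invented. The visible set of employees is derived from who the agent is acting on behalf of, and cross-user views land in an audit log.

```python
# Toy sketch: one benefits tool, scoped by the user the agent acts for.
# All names and data here are made up for illustration.
BENEFITS = {"alice": "spouse+1", "bob": "single"}
HR_RESPONSIBILITY = {"maria": {"alice", "bob"}}  # HR user -> employees she covers
AUDIT_LOG = []

def lookup_benefits(acting_for: str, employee: str) -> str:
    # Everyone may see their own record; HR may also see their reports.
    visible = {acting_for} | HR_RESPONSIBILITY.get(acting_for, set())
    if employee not in visible:
        raise PermissionError(f"{acting_for} may not view {employee}")
    if employee != acting_for:
        AUDIT_LOG.append((acting_for, employee))  # cross-user views are audited
    return BENEFITS[employee]
```

The point is that the enforcement and the audit trail live outside the LLM; the model only ever sees what the scope allows.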
If the knowledge is one-sided, then so is the ability to negotiate. This benefits nobody except the company which already had an advantageous position in negotiations.
What benefits an employee is _eligible_ for - sure, no problem with that being public. What they chose and how they’re using them should be protected.
(Imagine finding out a coworker you thought was single is on the spouse+benefits plan!)
This would cause me to.... do a double take?
In this example, I might want an LLM instance to be able to talk to booking websites, but not send them my SSN and bank account info.
So there's a data provenance and privilege problem here. The more sensitive data a task has access to, the more restricted its actions need to be, and vice versa. So data needs to carry permission information with it, and a mediator needs to restrict either the data or the actions that tasks have as they are spawned.
There's a whole set of things that need to be done at the mediator level to allow for parent tasks to safely spawn different-privileged child tasks - eg, the trip planner task spawns a child task to find tickets (higher network access) but the mediator ensures the child only has access to low-sensitive data like a portion of the itinerary, and not PII.
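The trip-planner example can be sketched concretely. This is a minimal illustration of the mediator idea, with made-up names (`Task`, `Mediator`, the sensitivity labels): child capabilities are intersected with the parent's, and the child only receives data at or below an allowed sensitivity level.

```python
from dataclasses import dataclass

# Made-up sensitivity lattice for illustration.
SENSITIVITY = {"public": 0, "internal": 1, "pii": 2}

@dataclass
class Task:
    name: str
    capabilities: set   # e.g. {"network", "payments"}
    data: dict          # field name -> (value, sensitivity label)

class Mediator:
    def spawn_child(self, parent, name, capabilities, max_label):
        # A child can never gain capabilities its parent lacks.
        caps = capabilities & parent.capabilities
        # A child only sees data at or below the allowed sensitivity.
        data = {k: v for k, v in parent.data.items()
                if SENSITIVITY[v[1]] <= SENSITIVITY[max_label]}
        return Task(name, caps, data)

planner = Task("trip-planner",
               capabilities={"network", "payments"},
               data={"itinerary": ("NYC->LON, May 3-10", "internal"),
                     "ssn": ("123-45-6789", "pii")})

# The ticket-finder child gets network access but never sees PII.
finder = Mediator().spawn_child(planner, "ticket-finder",
                                capabilities={"network", "filesystem"},
                                max_label="internal")
```

Here the child asked for filesystem access too, but the intersection with the parent strips it; the SSN never enters the child's view at all.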
In that light, it's kind of hard to imagine any of this ever working. Given the choice between figuring out exactly how to set up permissions so that I can hire a malicious individual to book my trip, and just booking it myself, I know which one I'd choose.
It's very coarse grained and it's kind of surprising that bad things don't happen more often.
It's also very limiting: very large organizations have enough at stake to generally try to deserve that trust. But most savvy people wouldn't trust all their financial information to Bob's Online Tax Prep.
But what if you could verify that Bob's Online Tax Prep runs in a container that doesn't have I/O access, and can only return prepared forms back to you? Then maybe you'd try it (modulo how well it does the task).
So I think this is less of an AI problem and more of a software trust problem that AI exacerbates a lot.
The danger is when you're calling anything free-form. Even if getting a vetted listing from Airbnb, the listing may have a review that tells AI to re-request the listing, but with password or PII in the querystring to get more information, or whatever. In this case, if any PII is anywhere in the context for some reason, even if the agent doesn't have direct access to it, then it will be shared, without violating any permissions you gave the agent.
This is actually pretty nice because you can check each step for risks independently, and then propagate possible context leaks across steps as a graph.
There's still potential for side-channel stuff: it could write your password to some placeholder like a cookie during the login step, when it has read access to one and write access to the other, and then still exfiltrate it in a subsequent step even after it loses access to the password and the context has been wiped.
Maybe that's a reasonably robust approach? Or maybe there are still holes it doesn't cover, or the side channel problem is unfixable. But high level it seems a lot better than just providing a single set of permissions for the whole workflow.
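The per-step checking plus leak propagation can be sketched as a tiny taint analysis over the workflow graph. This is a toy model under assumed names (each step declares what it `reads` and what `sinks` it can write to); note how it catches exactly the password-to-cookie-to-network side channel described above:

```python
# Toy leak propagation over an ordered workflow. Step names, "reads",
# and "sinks" are illustrative, not a real framework.
steps = [
    ("login",  {"reads": {"password"}, "sinks": {"cookie"}}),
    ("browse", {"reads": {"cookie"},   "sinks": {"network"}}),
]

tainted = {"password"}   # items considered sensitive up front
risky_steps = []
for name, step in steps:
    if step["reads"] & tainted:
        # Anything this step can write to may now carry the secret.
        tainted |= step["sinks"]
        risky_steps.append(name)
```

After the loop, `network` is in the tainted set even though no single step both reads the password and talks to the network; a per-step check alone would have missed it.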
The model is simple: the LLM agent is a user. Another user on the machine. And given the context it is working in, it is given permissions. E.g. it has read/write permissions under this folder of source code, but read-only permissions for this other one.
Those permissions vary by context. The LLM Agent working on one coding project would be given different permissions than if it were working on a different project on the same machine.
The permissions are an intersection or subset of the permissions of the user it is running on behalf of. Permissions fall into three categories: Allow, Deny, and Ask, where it will ask an accountable user if it is allowed to do something (i.e. ask the user on whose behalf it is running whether it can perform action X).
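A minimal sketch of that Allow/Deny/Ask model, assuming a first-match-wins path-prefix policy; the paths and rules are invented for illustration:

```python
from enum import Enum

class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ASK = "ask"

# Hypothetical per-context policy; first matching rule wins.
POLICY = [
    ("read",  "/home/me/project/src", Decision.ALLOW),
    ("write", "/home/me/project/src", Decision.ASK),
    ("write", "/",                    Decision.DENY),  # default: deny writes elsewhere
]

def check(action, path, ask_user):
    """Return True if the agent may perform the action; ASK defers
    to the accountable user via the ask_user callback."""
    for act, prefix, decision in POLICY:
        if act == action and path.startswith(prefix):
            if decision is Decision.ALLOW:
                return True
            if decision is Decision.ASK:
                return ask_user(action, path)
            return False
    return False  # nothing matched: deny by default
```

A real implementation would additionally intersect every decision with what the human user is themselves allowed to do, so the agent can never exceed its principal.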
The problem is that OSes (and apps and data) generally aren't fine grained enough in their permissions, and will need to become so. It's not that an LLM can or can't use git, it should only be allowed to use specific git commands. Git needs to be designed this way, along with many more things.
As a result we get apps trying to re-create this model in user land and using a hodge-podge of regexes and things to do so.
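A user-land shim of exactly the kind described might look like this: the agent may run only read-style git subcommands, and everything else is refused. The allowlist is illustrative.

```python
import shlex

# Illustrative allowlist: read-only git subcommands the agent may run.
ALLOWED_GIT_SUBCOMMANDS = {"status", "diff", "log", "show"}

def is_permitted(cmdline: str) -> bool:
    # Parse the command line the way a shell would, then check
    # that it is git invoked with an allowlisted subcommand.
    argv = shlex.split(cmdline)
    return (len(argv) >= 2
            and argv[0] == "git"
            and argv[1] in ALLOWED_GIT_SUBCOMMANDS)
```

The fragility is obvious: this guards one binary's argv and knows nothing about aliases, `git -c` tricks, or other binaries, which is the point of the comment; the enforcement really belongs below the app, in the tool or OS itself.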
The workflow is: similar to sudo, I launch an app as my LLM Agent user. It inherits its default permissions. I give it a context to work in, and it is granted and/or denied permissions based on that context.
I make requests and it works on my behalf doing what I permit it to do, and it never can do more than what I'm allowed to do.
Instead now every agentic app needs to rebuild this workflow or risk rogue agents. It needs to be an OS service.
The hacky stepping stone in between is to create a temporary user per agent context/usage. Grant that user perms and communicate only over IPC/network with the local LLM running as that user. Though you'll be spinning up and deleting a lot of user accounts in the process.
Unfortunately, no mainstream OS actually implements the capability model, despite some prominent research attempts [2], some half-hearted attempts at commercializing the concept that have largely failed in the marketplace [3], and some attempts to bolt capability-based security on top of other OSes that have also largely failed in the marketplace [4]. So the closest thing to capability-based security that is actually widely available in the computing world is a virtual machine, where you place only the tools that provide the specific capabilities you want to offer in the VM. This is quite imperfect - many of these tools are a lot more general than true capabilities should be - but again, modern software is not built on the principle of least privilege because software that is tends to fail in the marketplace.
[1] https://en.wikipedia.org/wiki/Capability-based_security
Fingers crossed that this is going to change now that there is increased demand due to AI workflows.
The dynamic that led to the Principle of Least Privilege failing in the market is that new technological innovations tend to succeed only when they enter new virgin territory that isn't already computerized, not when they're an incremental improvement over existing computer systems. And which markets will be successful tends to be very unpredictable. When you have those conditions, where new markets exist but are hard to find, the easiest way to expand into them is to let your software platforms do the greatest variety of things, and then expose that functionality to the widest array of developers possible in hopes that some of them will see a use you didn't think of. In other words, the opposite of the Principle of Least Privilege.
This dynamic hasn't really changed with AI. If anything, it's accelerated. The AI boom kicked off when Sam Altman decided to just release ChatGPT to the general public without knowing exactly what it was for or building a fully-baked idea. There's going to be a lot of security misses in the process, some possibly catastrophic.
IMHO the best shot that any capability-based software system has for success is to build out simplified versions of the most common consumer use-cases, and then wait for society to collapse. Because there's a fairly high likelihood of that, where the security vulnerabilities in existing software just allow a catastrophic compromise of the institutions of modern life, and a wholly new infrastructure becomes needed, and at that point you can point out exactly how we got to this point and how to ensure it never happens again. On a small scale, there's historical precedent for this: a lot of the reason webapps took off in the early 2000s was that there was a huge proliferation of worms and viruses targeting MS OSes in the late 90s and early 2000s, and it got to the point where consumers would only use webapps because they couldn't be confident that random software downloaded off the Internet wouldn't steal their credit card numbers.
This is a major reason why capability security has failed in the marketplace.
And totally agree that instead of reinventing the wheel here, we should just lift from how operating systems work, for two reasons:
1. there's a bunch of work and proven systems there already
2. it uses tools that exist in training data, instead of net new tools
This sounds hard; as in: if you can define and enforce what a good-enough response from an LLM looks like, you don't really need the LLM.
> what is the intent.
For the HR person you have a human with intents you can ask; for an LLM it's harder, as they don't have intents.
Even if the LLM is capable of it, websites will find some method to detect an LLM, and up the pricing. Or mess with its decision tree.
Come to think of it, with all the stuff on the cusp, there's going to be an LLM API. After all, it's beyond dumb to spend time making websites for humans to view, then making an LLM spend power, time, and so on decoding that back into a simple DB lookup.
I'm astonished there isn't an 'rss + json' API anyone can use, without all the crap. Hell, BBS text interfaces from the 70s/80s, or SMS menu systems from early phone era are far superior to a webpage for an LLM.
Just data, and choice.
And why even serve an ad to an LLM? The only ad worth serving to an LLM is one that tries to trick it, mess with it. Ads are bad enough, but to be of use when an LLM hits a site, you need to make them far more malign. Trick the LLM into thinking the ad is what it is looking for.
EG, search for a flight, the ad tricks the LLM into thinking it got the best deal.
Otherwise of what use is an ad? The LLM is just going to ignore ads, and perform a simple task.
If all websites had RSS, and all transactional websites had a standard API, we'd already be able to use existing models to do things. It'd just be dealing with raw data.
edit: actually, hilarious. Why not? AI is super simple to trick, at least at this stage. An ad company specifically tailoring AI would be awesome. You could divert them to your website, trick them into picking your deal, have them report to their owner that your company was the best, and more.
Super simple to do, too. Hmm.
In this case, I don't even know if we're in the paradigm of "hard to satisfy types"--a lot of the time you can for example probably use an autobooking feature to get something you'd be okay with as a backup, but since you know it is suboptimal you still want to try to do better if possible. There are also plenty of real world control systems which perform fairly involved calculations, but still perform some basic sanity limits checks on the inputs and outputs to make sure that if the calculations screwed up, things don't fail catastrophically. In such cases the limits are much easier to define than a spec for how the whole thing works.
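The "basic sanity limits" pattern from control systems translates directly: accept the LLM's proposal only if it passes simple bounds checks, otherwise fall back to a known-acceptable backup. The thresholds and fields below are invented for illustration:

```python
# Illustrative sanity limits on an LLM-proposed itinerary. The limits
# are far easier to specify than the booking logic itself.
MAX_STOPS = 1
MAX_HOURS = 12

def within_limits(itinerary: dict) -> bool:
    return (itinerary["stops"] <= MAX_STOPS
            and itinerary["hours"] <= MAX_HOURS)

llm_proposal = {"stops": 3, "hours": 37, "price": 120}  # $3 cheaper, 37 hours
backup       = {"stops": 0, "hours": 8,  "price": 123}  # known-acceptable option

# Take the proposal only if it survives the sanity check.
chosen = llm_proposal if within_limits(llm_proposal) else backup
```

This captures the asymmetry in the comment: specifying the limits took two constants, while specifying the full booking behavior would have taken the whole system.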
Yes. Are they proposing a virtual machine execution engine? Docker for LLMs? Or what? This looks like some kind of packaging thing.
Badly designed packaging systems are a curse. Look at how many packaging systems Python has gone through.
We have user accounts, Read/Write/Exec for User/Groups. Read can grant access tokens which solves temporary+remote requirements. Every other capabilities model can be defined in those terms.
I'd much rather see a simplification of the tools already available than the re-invention of another abstract machine/protocol.
I hope we'll eventually get a fundamental shift in the approach to software as a whole. Currently, everybody is still experimenting with building more new stuff, but it is also a great opportunity to re-evaluate and, at acceptable cost, try to strip out all the cruft and reduce something to its simplest form.
For example: I found an MCP server I liked. Told Claude to remove all the MCP stuff and put it into a CLI. Now I can just call that tool (without paying the context cost). Took me 10 minutes. I doubt Claude is smart enough to build it back in without heavy guidance.
I think defense in depth will eventually matter more, but there are a LOT of low-hanging fruit for attackers right now when it comes to turning AI agents against their users, which is what I think you’re alluding to!
Of course if the user truly desires a zero-guardrail experience they should be able to get that, but it probably shouldn’t be the default. Software should be on a very short leash until the user has indicated trust, and even then privileges should be granted only on a per-domain basis. A program designed to visually represent disk usage will need full filesystem access for example, but there’s no reason it should be able to sniff around on my local network (or on platforms where package managers handle updates, connect to the internet at all).
Note, this is the case whether running in VM or not, so I agree that VM is not a security solution.
The VM analogy is simply insufficient for securing LLM workflows where you can't trust the LLM to do what you told it to with potentially sensitive data. You may have a top-level workflow that needs access to both sensitive operations (network access) and sensitive data (PII, credentials), and an LLM that's susceptible to prompt injection attacks and general correctness and alignment problems. You can't just run the LLM calls in a VM with access to both sensitive operations and data.
You need to partition the workflow, subtasks, operations, and data so that most subtasks have a very limited view of the world, and use information-flow to track data provenance. The hopefully much smaller subset of subtasks that need both sensitive operations and data will then need to be highly trusted and reviewed.
This post does touch on that though. The really critical bit, IMO, is the "Secure Orchestrators" part, and the FIDES paper, "Securing AI Agents with Information-Flow Control" [1].
The "VM" bit is running some task in a highly restricted container that only has access to the capabilities and data given to it. The "orchestrator" then becomes the critical piece that spawns these containers, gives them the appropriate capabilities, and labels the data they produce correctly (taint-tracking: data derived from sensitive data is sensitive, etc.).
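The taint-tracking rule mentioned here (data derived from sensitive data is sensitive) is simple to state in code. A minimal illustration, with made-up names (`Labeled`, `derive`):

```python
from dataclasses import dataclass

# Minimal taint-tracking rule: an output inherits the strictest
# label among its inputs. Names are made up for illustration.
@dataclass(frozen=True)
class Labeled:
    value: str
    sensitive: bool

def derive(*inputs: Labeled, value: str) -> Labeled:
    # If any input is sensitive, the derived value is sensitive.
    return Labeled(value, any(i.sensitive for i in inputs))

itinerary = Labeled("NYC->LON", sensitive=False)
passport  = Labeled("P1234567", sensitive=True)
booking   = derive(itinerary, passport, value="booking#42")
```

An orchestrator enforcing this rule can then refuse to hand `booking` to any container whose capabilities include uncontrolled network access, which is the FIDES-style information-flow discipline the comment points at.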
They seem on the right track to me, and I know others working in this area who would agree. I think they need a better hook than "VMs for AI" though. Maybe "partitioning" or "isolation" and emphasize the data part somehow.
The next step is to not provide any tools to the LLM, and ask it to invent them on-the-fly. Some problems need to be brute-forced.
If you're curious to see one real-life implementation of this (I'm sure there are others), we're pretty far along in doing this with Dagger:
- We already had system primitives for running functions in a sandboxed runtime
- We added the ability for functions to 1) prompt LLMs, and 2) pass other functions to the LLMs as callbacks.
- This way, a function can call LLMs, and an LLM can call functions, in any permutation.
- This allows exploring the full spectrum from fully deterministic workflows, to autonomous agents, and everything in between - without locking yourself in a particular programming language, library or framework.
- We've also experimented with passing objects to the LLM, and mapping each of the object's methods to a tool call. This opens interesting possibilities, since the objects can carry state - effectively extending the LLM's context from text only, to arbitrary structured data, without additional dependencies like complex databases etc.
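The object-methods-as-tools idea can be sketched generically (this is not Dagger's actual API, just an illustration with Python introspection): each public method becomes a tool definition, and the object instance carries the state between calls.

```python
import inspect

# Toy stateful object whose methods we expose as tools.
class Cart:
    def __init__(self):
        self.items = []
    def add_item(self, name: str) -> None:
        "Add an item to the cart."
        self.items.append(name)
    def count(self) -> int:
        "Return the number of items."
        return len(self.items)

def as_tools(obj):
    # Map each public bound method to a tool-call description.
    tools = []
    for name, fn in inspect.getmembers(obj, inspect.ismethod):
        if name.startswith("_"):
            continue  # skip dunders and privates
        tools.append({"name": name,
                      "description": inspect.getdoc(fn) or "",
                      "parameters": list(inspect.signature(fn).parameters)})
    return tools

tools = as_tools(Cart())
```

Because the tool closures are bound to the instance, successive tool calls mutate shared state, which is what extends the LLM's effective context beyond text without an external database.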
Relevant documentation page: https://docs.dagger.io/features/llm
From a security perspective, the real problem seems to me that LLMs cannot distinguish between instructions and data; I don't see how this proposal even attempts to address this, but then I haven't really understood their problem description (if there was one).
csmpltn•6h ago
There's a point beyond which LLMs are an overkill, where a simple script or a "classic" program can outdo the LLM across speed, accuracy, scalability, price and more. LLMs aren't supposed to solve "universal computing". They are another tool in the toolbox, and it's all about using the right tool for the problem.
baby_souffle•6h ago
Let’s see if that continues to be the case after some time. On a long enough timeline, that deterministic 100-line Python script is going to beat the non-deterministic LLM.
They are a tool. They're not an omnitool.
Imustaskforhelp•5h ago
The problem is not the tool; the problem is the people selling the hype, the people accepting the hype as-is, and this "everyone" person I keep hearing about. I wonder why his takes are so often wrong, yet he never owns up to them...
People can be wrong. People are wrong in a lot of contexts; I don't think the world is efficient in this sense. We are emotional beings: sell us hype and we will accept it and run with it.
Agree on how it is not an omnitool. Why are we going towards an inferior product (AI for VMs??) when this doesn't make sense, when we already have a superior product (I think)?
I guess some things don't make sense, and if everyone jumps down a well, most people will follow (maybe including myself, if I didn't want to be contrarian in this sense; my knowledge is really limited regarding AI, so all of the above statements could be wrong, and hey, I am wrong, I usually am).
Swizec•5h ago
Yeah and when I was in college, StackOverflow was full of questions like “how do I add 2 numbers with jQuery”. This is normal. The newbies don’t know what they’re doing and with time they will get enough hard knocks to learn. We’ve all gone through this and still are in areas that are new to us.
LLMs aren’t gonna solve the fundamentals: Seniors still gotta senior and newbies still gotta learn.