It's open source too :) https://github.com/dylibso/mcp.run-servlets/tree/main/servle...
We also use Wasm to sandbox all our servlets https://docs.mcp.run/blog/2025/04/07/mcp-run-security
(I work at Dylibso)
Seems like a lot of work to me. Is this really the best way to create and run Python sandboxes?
Other than that... VMs? The fact that people consider JS/WASM engines good security sandboxes is a bit scary tbf.
WASM engines run in almost every browser on earth, billions of times a day. Security problems in those get spotted very quickly.
For example, JS runs in almost every browser on earth too, yet it took V8 devs 2 years to find out that `Math.expm1()` could return -0.0 (https://chromium.googlesource.com/v8/v8.git/+/56f7dda67fdc97...). This is a cherry-picked example, and JS is clearly more complex than WASM, but still.
Just because stuff runs on a lot of devices doesn't mean it's more or less secure.
Linux runs on quite a few devices too, yet we still find bugs, people still don't ship updates to said bugs, yadda yadda yadda.
My point is just that lots of devs often skip the threat modeling and just think "I'll slap it in a WASM thingie and it'll be fine". Well, good luck.
To my knowledge they don't yet have a run-Python-in-WASM-on-the-server implementation.
I was looking into using WASM in Python yesterday for some image processing. It requires pulling in a full WASM runtime like wasmtime. Still better than calling out to native binaries like ImageMagick, but definitely more complicated than doing it in Deno. If I was writing it myself I'd do Deno, but LLMs are so good at writing Python.
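For context, "calling Deno" from Python is mostly just a subprocess call that leans on Deno's deny-by-default permission model. A rough, hypothetical sketch (the script contents, timeout, and file handling are only illustrative):

    import os
    import subprocess
    import tempfile

    # Hypothetical LLM-generated JavaScript we want to run in a sandbox.
    js = "console.log(JSON.stringify({answer: 6 * 7}));"

    with tempfile.NamedTemporaryFile("w", suffix=".js", delete=False) as f:
        f.write(js)
        path = f.name

    try:
        # No --allow-* flags: Deno denies file, network, and env access by default;
        # --no-prompt makes denied permissions fail instead of prompting interactively.
        result = subprocess.run(
            ["deno", "run", "--no-prompt", path],
            capture_output=True, text=True, timeout=10,
        )
        print(result.stdout)
    finally:
        os.unlink(path)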
I'm hoping some day to find a recipe I really like for running Python code in a WASM container directly inside Python. Here's the closest I've got, using wasmtime: https://til.simonwillison.net/webassembly/python-in-a-wasm-s...
much better than calling deno, at least if you have no pip dependencies...
just had to update to the new API:

    # store.add_fuel(fuel)
    store.set_fuel(fuel)
    fuel_consumed = fuel - store.get_fuel()
and it works!!
time to hello world: hello_wasm_python311.py 0.20s user 0.03s system 97% cpu 0.234 total
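For anyone curious how those set_fuel/get_fuel calls fit into the full picture, here's a rough sketch of the wasmtime-py setup in the spirit of the TIL above - the python.wasm filename, fuel budget, and the commented-out preopened directory are placeholders, not anything from the original posts:

    from wasmtime import Config, Engine, Linker, Module, Store, WasiConfig

    PYTHON_WASM = "python-3.11.4.wasm"  # placeholder: a WASI build of CPython
    FUEL = 400_000_000                  # placeholder budget; CPython burns a lot of fuel

    config = Config()
    config.consume_fuel = True          # meter execution so runaway code gets cut off
    engine = Engine(config)

    linker = Linker(engine)
    linker.define_wasi()
    module = Module.from_file(engine, PYTHON_WASM)

    store = Store(engine)
    store.set_fuel(FUEL)                # older releases used store.add_fuel(FUEL)

    wasi = WasiConfig()
    wasi.argv = ("python", "-c", "print('hello from sandboxed python')")
    wasi.inherit_stdout()
    wasi.inherit_stderr()
    # wasi.preopen_dir("sandbox", "/sandbox")  # optionally expose one host directory
    store.set_wasi(wasi)

    instance = linker.instantiate(store, module)
    try:
        instance.exports(store)["_start"](store)
    except Exception:
        pass  # a WASI program's exit surfaces as a trap here

    print("fuel consumed:", FUEL - store.get_fuel())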
0.000636230 seconds time elapsed
0.000759000 seconds user
0.000000000 seconds sys
That's 36,800% faster. Hand-written assembly was very slightly slower. Using the standard library for output instead of a syscall brought it down to 20,900% faster. (Yes, I used percentages to underscore how big the difference is. It's 368x and 209x respectively. That's huge.)
Begrudgingly, here are the standard Python numbers:
real 0m0.019s
user 0m0.015s
sys 0m0.004s
About 1230% faster than the sandbox, i.e. 12.3x. About an order of magnitude, which is typical for these kinds of exercises.
Will come with macOS support very soon :) Does work on Linux.
Apple's equivalent is the Apple Virtualization Framework, which exposes KVM-like functionality at a higher level.
[edit] looks really simple, except I'll have to look into how their raw-exec takes care of writeableRoots: https://github.com/openai/codex/blob/0d6a98f9afa8697e57b9bae...
[edit2] lol raw-exec doesn't do anything at all with writeableRoots, it's handled in the fullPolicy (from scopedWritePolicy)
https://gist.github.com/fzzzy/319d6cbbdfff9c340d0e9c362247ae...
But what would be the use case for this?
> The code is executed using Pyodide in Deno and is therefore isolated from the rest of the operating system.
To me personally, the premise is a bit naive - it assumes that Deno's WASM VM doesn't have exploits, that Pyodide doesn't have bugs, etc. One might as well ask the LLM to produce JavaScript code and run it under Deno directly; that would be simpler.
In the end, the problem is one of risk budget. If you're running this in a VM you control and it's only you running your own prompts on it, maybe it's "good enough". If, on the other hand, you want to sell this service to others who will attack your infrastructure, then no - it's not even close to being enough.
Your question is a bit vague because it doesn't explain what "best way" means for you. Cheap, secure, implementable by a person over a weekend?
Eh, I wouldn't call this naive. Two points:
1. Pyodide bugs should not be a huge concern here. As long as your python code is executing on top of a JS runtime, the runtime is what matters first and foremost from a security pov.
2. Yes, it's possible for Deno to have bugs. But frankly: it's much less likely to than most any other method for doing this sort of sandboxing. Deno sits on V8, which is the engine used by Chrome, and there are very few applications in the world that get closer scrutiny or have a larger dedicated security budget than Chrome. V8 can have bugs, sure, but I would expect it (along with JSC and maybe SpiderMonkey) will have far fewer than any other runtime for a serious dynamic language on the market today.
Yes, a VM would be better (and frankly, when you're talking about running Python on top of a JS runtime, it might not even be less performant), but the reason why is not that they "have fewer bugs".
At Temporal, we required a sandbox but didn't have any security requirement, so we wrote it from scratch with eval/exec and a custom importer [0]. It is not a foolproof sandbox, but it does a good job at isolating state, intercepting and preventing illegal calls we don't like, and allowing some imports to "pass through" from the outside instead of being reloaded, for performance reasons.
0 - https://github.com/temporalio/sdk-python?tab=readme-ov-file#...
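Not what Temporal actually ships - just a minimal sketch of the general eval/exec-plus-custom-importer pattern described above, with a made-up module allowlist and set of exposed builtins:

    ALLOWED_MODULES = {"math", "json", "datetime"}

    def restricted_import(name, globals=None, locals=None, fromlist=(), level=0):
        # Intercept `import` statements and only allow an explicit set of modules.
        if name.split(".")[0] not in ALLOWED_MODULES:
            raise ImportError(f"import of {name!r} is not allowed in this sandbox")
        return __import__(name, globals, locals, fromlist, level)

    def run_sandboxed(source: str) -> dict:
        # A fresh globals dict per run keeps state isolated between executions.
        sandbox_globals = {
            "__builtins__": {
                "__import__": restricted_import,
                "print": print,
                "len": len,
                "range": range,
            }
        }
        exec(compile(source, "<sandbox>", "exec"), sandbox_globals)
        return sandbox_globals

    run_sandboxed("import math\nprint(math.sqrt(2))")  # allowed
    # run_sandboxed("import os")                       # raises ImportError

As the parent comment says, this isolates state and blocks calls you don't like, but it is not a real security boundary.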
> but it does a good job at isolating state, intercepting and preventing illegal calls we don't like
Sounds like they put the reason just there.
I suspect the downvotes are for “… stupid AI crap.”
But I like using WASM, especially in a hosted environment like Deno. It feels like a more scalable solution, and probably less maintenance too, with the downside that we won't be able to run just any command.
I am happy to provide more details and point to the tool if anyone is interested. It is not open-source, but you can play with it for free.
What agent framework is truly the top dog? Is it just working with the big model providers native frameworks, such as OpenAI’s Agents SDK?
Short-ish version:
ANTHROPIC_API_KEY="$(llm keys get anthropic)" \
uv run --with devtools --with pydantic-ai python -c '
import asyncio
from devtools import pprint
from pydantic_ai import Agent, capture_run_messages
from pydantic_ai.mcp import MCPServerStdio

server = MCPServerStdio(
    "deno",
    args=[
        "run",
        "-N",
        "-R=node_modules",
        "-W=node_modules",
        "--node-modules-dir=auto",
        "jsr:@pydantic/mcp-run-python",
        "stdio",
    ],
)
agent = Agent("claude-3-5-haiku-latest", mcp_servers=[server])

async def main():
    with capture_run_messages() as messages:
        async with agent.run_mcp_servers():
            result = await agent.run("How many days between 2000-01-01 and 2025-03-18?")
    pprint(messages)
    print(result.output)

asyncio.run(main())'
Output here: https://gist.github.com/simonw/54fc42ef9a7fb8f777162bbbfbba4...
I got it running against Mistral Small 3.1 running locally too - notes on that here: https://simonwillison.net/2025/Apr/18/mcp-run-python/