> Runs in an isolated sandbox: Every task runs in a secure, isolated Daytona sandbox.
Oh, so fake open source? Daytona is an AGPL-licensed codebase that doesn't actually open-source the control plane, and the first instruction in the README is to sign up for their service.
> From the "open-swe" README:
Open SWE can be used in multiple ways:
* From the UI. You can create, manage and execute Open SWE tasks from the web application. See the 'From the UI' page in the docs for more information.
* From GitHub. You can start Open SWE tasks directly from GitHub issues simply by adding a label open-swe, or open-swe-auto (adding -auto will cause Open SWE to automatically accept the plan, requiring no intervention from you). For enhanced performance on complex tasks, use open-swe-max or open-swe-max-auto labels which utilize Claude Opus 4.1 for both planning and programming. See the 'From GitHub' page in the docs for more information.
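The label scheme described in the README excerpt above can be sketched as a small lookup. This is only an illustration of the four labels as quoted, not Open SWE's actual code; `TaskOptions` and `options_for_label` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaskOptions:
    """Hypothetical container for what a label implies, per the README text."""
    auto_accept_plan: bool  # "-auto": the plan is accepted without intervention
    use_max_model: bool     # "-max": the larger model plans and programs


# The four labels named in the README excerpt.
_LABELS = {
    "open-swe": TaskOptions(auto_accept_plan=False, use_max_model=False),
    "open-swe-auto": TaskOptions(auto_accept_plan=True, use_max_model=False),
    "open-swe-max": TaskOptions(auto_accept_plan=False, use_max_model=True),
    "open-swe-max-auto": TaskOptions(auto_accept_plan=True, use_max_model=True),
}


def options_for_label(label: str) -> Optional[TaskOptions]:
    """Return the implied options, or None if the label is unrelated."""
    return _LABELS.get(label.strip().lower())
```

Any other issue label (say `bug`) returns `None`, i.e. it would not start a task in this sketch.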
* * *
The "from the UI" links to their hosted web interface. If I cannot run it myself, it's fake open-source.
How can it be AGPL and not provide full source? AGPL is like the most aggressive of the GPL license variants. If they somehow circumvented the intent behind this license, that is a problem.
I hit an error that was not recoverable. I'd love to see functionality to bring all that context over to a new thread, or otherwise force it to attempt to recover.
This caught my eye too. Given they say 'most', what other tools support this?
It's not super surprising coming from this pole of over-engineering, so thick I'm surprised it wasn't developed by Microsoft in the 90s or 00s.
1: https://github.com/sst/opencode
dabockster•6mo ago
> Run asynchronously in the cloud
> cloud
Reality check:
https://huggingface.co/Menlo/Jan-nano-128k-gguf
That model will run, with decent conversation quality, at roughly the same memory footprint as a few Chrome tabs. It's only a matter of time until we get coding models that can do that, and then only a further matter of time until we see agentic capabilities at that memory footprint. I mean, I can already get agentic coding with one of the new Qwen3 models - super slowly, but it works in the first place. And the quality matches or even beats some of the cloud models and vibe coding apps.
And that model is just one example. Researchers all over the world are making new models almost daily that can run on an off-the-shelf gaming computer. If you have a modern Nvidia graphics card, you can run AI on your own computer totally offline. That's the reality.
koakuma-chan•6mo ago
dabockster•6mo ago
koakuma-chan•6mo ago
toshinoriyagi•6mo ago
It outperforms those other models, which are not using tools, thanks to the tool use and specificity.
Because it is only 4B parameters, I believe it is naturally terrible at other things; it's not designed for them and doesn't have enough parameters.
In hindsight, "MCP-based methodology" likely refers to its tool-use.
cbcoutinho•6mo ago
> Most language models face a fundamental tradeoff where powerful capabilities require substantial computational resources. We shatter this constraint with Jan-nano, a 4B parameter language model that redefines efficiency through radical specialization: instead of trying to know everything, it masters the art of finding anything instantly. Fine-tuned from Qwen3-4B using our novel multi-stage Reinforcement Learning with Verifiable Rewards (RLVR) system that completely eliminates reliance on next token prediction training (SFT), Jan-nano achieves 83.2% on SimpleQA benchmark with MCP integration while running on consumer hardware. With 128K context length, Jan-nano proves that intelligence isn't about scale, it's about strategy.
> For our MCP evaluation, we used mcp-server-serper which provides google search and scrape tools
https://arxiv.org/abs/2506.22760
Martinussen•6mo ago
merelysounds•6mo ago
Also, usage patterns can be different; with storage, if I use 90% of my local content only occasionally, I can archive that to the cloud and continue using the remaining local 10%.
prophesi•6mo ago
Of course, the line will always be pushed back as frontier models incrementally improve, but the quality gap is night and day between the open models consumers can feasibly run and even the cheaper frontier models.
That said, I too have no interest in this if local models aren't supported and hope that's down the pipeline just so I can try tinkering with it. Though it looks like it utilizes multiple models for various tasks (planner, programmer, reviewer, router, and summarizer) so that only adds to the difficulty of the VRAM bottleneck if you'd like to load different models per task. So I think it makes sense for them to focus on just Claude for now to prove the concept.
edit: I personally use Qwen3 Coder 30B 4bit for both autocomplete and talking to an agent, and switch to a frontier model for the agent when Qwen3 starts running in circles.
diggan•6mo ago
Tiny correction: Even without quantization, you can run GPT-OSS-120B (with full context) on roughly 60GB of VRAM :)
prophesi•6mo ago
> Native MXFP4 quantization: The models are trained with native MXFP4 precision for the MoE layer, making gpt-oss-120b run on a single 80GB GPU (like NVIDIA H100 or AMD MI300X) and the gpt-oss-20b model run within 16GB of memory.
Even if you _could_ fit it within ~60GB VRAM, the amount of VRAM required varies with context length and prompt size, so it would OOM pretty quickly.
edit: Ah and MXFP4 in itself is a quantization, just supposedly closer to the original FP16 than the rest with a smaller VRAM requirement.
diggan•6mo ago
No, the number I put above is literally the VRAM usage I see when I load 120B with llama.cpp; it's a real-life number, not a theoretical one :)
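For anyone wanting to sanity-check the figures in this exchange, here is a back-of-the-envelope sketch of weight memory alone. It rests on assumptions not stated in the thread: a nominal 120e9 parameters (taken from the model name), every weight stored at the same precision (the quoted blurb says only the MoE layers are MXFP4), and MXFP4 costing 4 bits per weight plus one 8-bit scale shared by each block of 32 weights. KV cache and activations, which grow with context length, are not counted.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Memory for the weights only, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: MXFP4 = 4-bit elements + one 8-bit scale per 32-element block.
MXFP4_BITS = 4 + 8 / 32  # 4.25 bits per weight on average

if __name__ == "__main__":
    n_params = 120e9  # nominal, assumed from the model name
    for name, bits in [("FP16", 16.0), ("MXFP4", MXFP4_BITS)]:
        print(f"{name}: {weight_memory_gb(n_params, bits):.1f} GB")
```

Under these assumptions the MXFP4 weights land in the low 60s of GB, which is in the same range as the llama.cpp figure reported above, while an unquantized FP16 copy would be far larger. Whatever headroom remains on a given card is what the context has to fit into.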