Open SWE: An open-source asynchronous coding agent

https://blog.langchain.com/introducing-open-swe-an-open-source-asynchronous-coding-agent/

27•palashshah•3h ago

Comments

dabockster•2h ago

> We believe that all agents will look more like this in the future - long running, asynchronous, more autonomous. Specifically, we think that they will:

> Run asynchronously in the cloud

> cloud

Reality check:

https://huggingface.co/Menlo/Jan-nano-128k-gguf

That model will run, with decent conversation quality, at roughly the same memory footprint as a few Chrome tabs. It's only a matter of time until we get coding models that can do that, and then only a further matter of time until we see agentic capabilities at that memory footprint. I mean, I can already get agentic coding with one of the new Qwen3 models - super slowly, but it works in the first place. And the quality matches or even beats some of the cloud models and vibe coding apps.

And that model is just one example. Researchers all over the world are making new models almost daily that can run on an off-the-shelf gaming computer. If you have a modern Nvidia graphics card, you can run AI on your own computer totally offline. That's the reality.

koakuma-chan•1h ago

Do you know what "MCP-based methodology" is? I am skeptical of a 4B model scoring twice as high as Gemini 2.5 Pro

dabockster•1h ago

Yeah I know about Model Context Protocol. But it's still only a small part of the AI puzzle. I'm saying that we're at a point now where a whole AI stack can run, in some form, 100% on-device with okayish accuracy. When you think about that, and where we're headed, it makes the whole idea of cloud AI look like a dinosaur.

koakuma-chan•1h ago

I mean, I am asking what "MCP-based methodology" is, because it doesn't make sense for a 4B model to outperform Gemini 2.5 Pro et al by that much.

toshinoriyagi•39m ago

I'm not too sure what "MCP-based methodology" is, but Jan-nano-128k is a small model specifically designed to be able to answer in-depth questions accurately via tool-use (researching in a provided document or searching the web).

It outperforms those other models, which are not using tools, thanks to the tool use and specificity.

Because it is only 4B parameters, it is naturally terrible at other things, I believe; it's not designed for them and doesn't have enough parameters.

In hindsight, "MCP-based methodology" likely refers to its tool-use.

Martinussen•1h ago

Data storage has gotten cheaper and more efficient/manageable every year for decades, yet people seem content with having less storage than a mid-range desktop from a decade and a half ago, split between their phone and laptop, and leaving everything else to the "> cloud" - I wouldn't be so sure we're going to see people reach for technological independence this time either.

merelysounds•49m ago

One factor here is people preferring portable devices. Note that portable SSDs are also popular.

Also, cloud storage solutions (like archiving or collaboration) have different usage patterns than AI so far.

prophesi•49m ago

I'm also excited for local LLMs to be capable of assisting with nontrivial coding tasks, but we're far from reaching that point. VRAM remains a huge bottleneck for even a top-of-the-line gaming PC to run them. The best these days for agentic coding that get close to the vibe-check of frontier models seem to be Qwen3-Coder-480B-A35B-Instruct, DeepSeek-Coder-V2-236B, GLM 4.5, and GPT-OSS-120B. The last is the only one capable of fitting on a 64 to 96GB VRAM machine with quantization.

Of course, the line will always be pushed back as frontier models incrementally improve, but the quality is night and day between these open models consumers can feasibly run versus even the cheaper frontier models.

That said, I too have no interest in this if local models aren't supported and hope that's down the pipeline just so I can try tinkering with it. Though it looks like it utilizes multiple models for various tasks (planner, programmer, reviewer, router, and summarizer) so that only adds to the difficulty of the VRAM bottleneck if you'd like to load different models per task. So I think it makes sense for them to focus on just Claude for now to prove the concept.

edit: I personally use Qwen3 Coder 30B 4bit for both autocomplete and talking to an agent, and switch to a frontier model for the agent when Qwen3 starts running in circles.
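To put rough numbers on the VRAM point above, here is a back-of-the-envelope sketch in Python. It assumes the parameter counts implied by the model names, a uniform 4-bit quantization, and counts weights only (no KV cache, activations, or runtime overhead), so real requirements will be higher. The function name `weight_memory_gb` is made up for illustration.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the weights alone, in GB (1 GB = 1e9 bytes).

    Assumption: every weight is stored at the same bit width; KV cache,
    activations and framework overhead are ignored.
    """
    return params_billion * bits_per_weight / 8


# Parameter counts (in billions) read off the model names in the comment above.
models = {
    "Qwen3-Coder-480B-A35B-Instruct": 480,
    "DeepSeek-Coder-V2-236B": 236,
    "GPT-OSS-120B": 120,
    "Qwen3 Coder 30B": 30,
}

for name, params in models.items():
    print(f"{name}: ~{weight_memory_gb(params, 4):.0f} GB at 4-bit")
```

Under those assumptions only the 120B and 30B models land below the 64 to 96GB range mentioned in the comment, which is consistent with the claim being made.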

cowpig•1h ago

I was excited by the announcement but then

> Runs in an isolated sandbox: Every task runs in a secure, isolated Daytona sandbox.

Oh, so fake open source? Daytona is an AGPL-licensed codebase that doesn't actually open-source the control plane, and the first instruction in the README is to sign up for their service.

From the "open-swe" README:

Open SWE can be used in multiple ways:

* From the UI. You can create, manage and execute Open SWE tasks from the web application. See the 'From the UI' page in the docs for more information.

* From GitHub. You can start Open SWE tasks directly from GitHub issues simply by adding a label open-swe, or open-swe-auto (adding -auto will cause Open SWE to automatically accept the plan, requiring no intervention from you). For enhanced performance on complex tasks, use open-swe-max or open-swe-max-auto labels which utilize Claude Opus 4.1 for both planning and programming. See the 'From GitHub' page in the docs for more information.
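As a reading aid, the four labels described in the README excerpt above could be thought of as a lookup from label to two flags. This is only a sketch of that mapping: `TaskOptions`, `LABELS`, and `options_for_labels` are hypothetical names, not Open SWE's actual code, and the behavior is inferred solely from the quoted text.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class TaskOptions:
    auto_accept_plan: bool  # "-auto": plan is accepted without user intervention
    use_max_model: bool     # "-max": Claude Opus 4.1 for planning and programming


# Hypothetical table built from the README wording quoted above.
LABELS = {
    "open-swe": TaskOptions(auto_accept_plan=False, use_max_model=False),
    "open-swe-auto": TaskOptions(auto_accept_plan=True, use_max_model=False),
    "open-swe-max": TaskOptions(auto_accept_plan=False, use_max_model=True),
    "open-swe-max-auto": TaskOptions(auto_accept_plan=True, use_max_model=True),
}


def options_for_labels(issue_labels: Iterable[str]) -> Optional[TaskOptions]:
    """Return options for the first recognised label, or None if none match."""
    for label in issue_labels:
        if label in LABELS:
            return LABELS[label]
    return None


print(options_for_labels(["bug", "open-swe-max-auto"]))
```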

* * *

The "From the UI" link points to their hosted web interface. If I cannot run it myself, it's fake open source.

mitchitized•1h ago

Hol up

How can it be AGPL and not provide full source? AGPL is like the most aggressive of the GPL license variants. If they somehow circumvented the intent behind this license that is a problem.

esafak•55m ago

It's a hosted service with an open source client?

tevon•1h ago

Very cool! Am using it now and really like the sidebar chat that allows you to add context during a run.

I hit an error that was not recoverable. I'd love to see functionality to bring all that context over to a new thread, or otherwise force it to attempt to recover.
