
Qwen3-Coder-Next

https://qwen.ai/blog?id=qwen3-coder-next
179•danielhanchen•1h ago

Comments

danielhanchen•1h ago
For those interested, made some Dynamic Unsloth GGUFs for local deployment at https://huggingface.co/unsloth/Qwen3-Coder-Next-GGUF and made a guide on using Claude Code / Codex locally: https://unsloth.ai/docs/models/qwen3-coder-next
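For reference, a minimal sketch of pulling one of these quants and loading it with the llama-cpp-python bindings; the quant filename below is a guess, so check it against the repo's file listing:

    # Minimal sketch: fetch a quant from the Unsloth repo and load it locally.
    # The filename is an assumption -- check the actual listing at
    # huggingface.co/unsloth/Qwen3-Coder-Next-GGUF before running.
    from huggingface_hub import hf_hub_download
    from llama_cpp import Llama

    model_path = hf_hub_download(
        repo_id="unsloth/Qwen3-Coder-Next-GGUF",
        filename="Qwen3-Coder-Next-UD-Q4_K_XL.gguf",  # hypothetical filename
    )
    llm = Llama(model_path=model_path, n_ctx=32768, n_gpu_layers=-1)
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Write a hello world in Python."}]
    )
    print(out["choices"][0]["message"]["content"])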
ranger_danger•1h ago
What is the difference between the UD and non-UD files?
danielhanchen•1h ago
UD stands for "Unsloth-Dynamic", which upcasts important layers to higher bit widths. Non-UD files are just standard llama.cpp quants. Both still use our calibration dataset.
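To illustrate the idea only (this is not Unsloth's code): rank layers by how much quantizing them hurts on a calibration set, then give the most sensitive ones more bits. A toy sketch with hypothetical layer names:

    # Toy sketch of the "dynamic" idea: spend more bits on sensitive layers.
    def assign_bits(sensitivity: dict[str, float], hi_frac: float = 0.2) -> dict[str, int]:
        # sensitivity: layer name -> error increase when that layer is
        # quantized, measured on a calibration dataset
        ranked = sorted(sensitivity, key=sensitivity.get, reverse=True)
        n_hi = max(1, int(len(ranked) * hi_frac))
        return {name: (8 if name in ranked[:n_hi] else 4) for name in ranked}

    layers = {"embed_tokens": 1.5, "attn.q_proj": 0.9, "attn.k_proj": 0.7,
              "mlp.up_proj": 0.2, "mlp.down_proj": 0.1}
    print(assign_bits(layers))  # embed_tokens stays at 8 bits, the rest get 4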
CamperBob2•11m ago
Please consider authoring a single, straightforward introductory-level page somewhere that explains what all the filename components mean, and who should use which variants.

The green/yellow/red indicators for different levels of hardware support are really helpful, but far from enough IMO.

binsquare•58m ago
How did you do it so fast?

Great work as always btw!

simonw•1h ago
This GGUF is 48.4GB - https://huggingface.co/Qwen/Qwen3-Coder-Next-GGUF/tree/main/... - which should be usable on higher-end laptops.

I still haven't experienced a local model that fits on my 64GB MacBook Pro and can run a coding agent like Codex CLI or Claude Code well enough to be useful.

Maybe this will be the one? This Unsloth guide from a sibling comment suggests it might be: https://unsloth.ai/docs/models/qwen3-coder-next

vessenes•1h ago
I'm thinking the next step would be to include this as a 'junior dev' and let Opus farm simple stuff out to it. It could be local, but if it's on Cerebras, it could be realllly fast.
ttoinou•1h ago
Cerebras already has GLM 4.7 in the code plans
vessenes•1h ago
Yep. But this is like 10x faster; 3B active parameters.
ttoinou•1h ago
Cerebras is already 200-800 tps; do you need even faster?
overfeed•8m ago
Yes! I don't try to read agent tokens as they are generated, so if code generation decreases from 1 minute to 6 seconds, I'll be delighted. I'll even accept 10s -> 1s speedups. Considering how often I've seen agents spin wheels with different approaches, faster is always better, until models can 1-shot solutions without the repeated "No, wait..." / "Actually..." thinking loops
danielhanchen•1h ago
It works reasonably well for general tasks, so we're definitely getting there! Qwen3 CLI might be better suited, but I haven't tested it yet.
1dom•1h ago
I run Qwen3-Coder-30B-A3B-Instruct gguf on a VM with 13gb RAM and a 6gb RTX 2060 mobile GPU passed through to it with ik_llama, and I would describe it as usable, at least. It's running on an old (5 years, maybe more) Razer Blade laptop that has a broken display and 16gb RAM.

I use opencode and have done a few toy projects and little changes in small repositories, and can get a pretty speedy and stable experience up to a 64k context.

It would probably fall apart if I wanted to use it on larger projects, but I've often set tasks running on it, stepped away for an hour, and had a solution when I returned. It's definitely useful for smaller projects, scaffolding, basic bug fixes, extra UI tweaks, etc.

I don't think "usable" is a binary thing though. I know you write a lot about this, but it'd be interesting to understand what you're asking the local models to do, and what it is about what they do that you consider unusable on a relative monster of a laptop?

embedding-shape•1h ago
> I still haven't experienced a local model that fits on my 64GB MacBook Pro and can run a coding agent like Codex CLI or Claude Code well enough to be useful

I've had mild success with GPT-OSS-120b (MXFP4, ends up taking ~66GB of VRAM for me with llama.cpp) and Codex.

I'm wondering whether one could crowdsource chat logs from GPT-OSS-120b running with Codex, then seed another post-training run that fine-tunes the 20b variant on the good runs from the 120b, and whether that'd make a big difference. Both models with reasoning_effort set to high are actually quite good compared to other downloadable models, although the 120b is just about out of reach for 64GB, so making the 20b better for specific use cases seems like it'd be useful.
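A rough sketch of what that pipeline could look like: filter logged 120b sessions for runs that ended well, and emit them as chat-format SFT data for the 20b. The log schema and the success criterion here are assumptions:

    # Hypothetical sketch: keep only GPT-OSS-120b + Codex sessions that ended
    # well and write them out as SFT examples for the 20b. The JSONL schema
    # and the "success" signals are assumptions, not a real log format.
    import json

    def to_sft(log_path: str, out_path: str) -> int:
        kept = 0
        with open(log_path) as src, open(out_path, "w") as dst:
            for line in src:
                session = json.loads(line)
                # e.g. success = tests passed and the user kept the change
                if not session.get("tests_passed") or session.get("reverted"):
                    continue
                dst.write(json.dumps({"messages": session["messages"]}) + "\n")
                kept += 1
        return kept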

gigatexal•48m ago
I’ve a 128GB M3 Max MacBook Pro. Running the GPT-OSS model on it via LM Studio, once the context gets large enough the fans spin to 100% and it’s unbearable.
dust42•29m ago
Unfortunately, Qwen3-Next is not well supported on Apple silicon; it seems the Qwen team doesn't really care about Apple.

On an M1 64GB, Q4KM on llama.cpp gives only 20 tok/s, while on MLX it is more than twice as fast. However, MLX has problems with KV cache consistency and especially with branching. So while in theory it is twice as fast as llama.cpp, it often does the prompt processing (PP) all over again, which completely trashes performance, especially with agentic coding.

So the agony is deciding whether to endure half the possible speed but get much better KV caching in return, or to have twice the speed but then often sit through prompt processing again.

But who knows, maybe Qwen gives them a hand? (hint, hint)

ttoinou•22m ago
I can run nightmedia/qwen3-next-80b-a3b-instruct-mlx at 60-74 tps using LM Studio. What did you try? What benefit do you get from KV caching?
dust42•4m ago
KV caching means that when you have a 10k prompt, all follow-up questions return immediately - this is standard with all inference engines.

Now if you are not happy with the last answer, maybe you want to simply regenerate it or change your last question - this is branching of the conversation. Llama.cpp is capable of re-using the KV cache up to that point, while MLX is not (I am using the MLX server from the MLX community project). I haven't tried with LM Studio. Maybe worth a try, thanks for the heads-up.
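The difference comes down to prefix reuse: an engine that keeps the longest common token prefix only recomputes what follows the branch point. A toy version of that logic:

    # Toy sketch of prefix-based KV cache reuse. An engine that does this
    # (llama.cpp, per the comment above) only recomputes tokens after the
    # point where the branched prompt diverges; one that doesn't re-runs
    # prompt processing from token zero.
    def reusable_prefix(cached: list[int], new: list[int]) -> int:
        n = 0
        for a, b in zip(cached, new):
            if a != b:
                break
            n += 1
        return n

    cached = [1, 2, 3, 4, 5, 6]  # tokens from the previous turn
    branched = [1, 2, 3, 9, 9]   # the user edited their last question
    keep = reusable_prefix(cached, branched)
    print(f"reuse {keep} tokens, recompute {len(branched) - keep}")  # 3 and 2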

dehrmann•26m ago
I wonder if the future in ~5 years is almost all local models. High-end computers and GPUs can already handle decent models, but not SOTA models. 5 years is enough time to ramp up memory production, for consumers to level up their hardware, and for models to optimize down to lower-end hardware while still being really good.
manbitesdog•11m ago
Plus a long queue of yet-undiscovered architectural improvements
infinitezest•7m ago
A lot of manufacturers are bailing on consumer lines to focus on enterprise from what I've read. Not great.
vessenes•1h ago
3B active parameters, and slightly worse than GLM 4.7. On benchmarks. That's pretty amazing! With better orchestration tools being deployed, I've been wondering if faster, dumber coding agents paired with wise orchestrators might be overall faster than using, say, Opus 4.5 at the bottom for coding. At least we might want to hand simple tasks off to these guys.
doctorpangloss•1h ago
Time will tell. All this stuff will get more adoption when Anthropic, Google and OpenAI raise prices.
markab21•1h ago
It's getting a lot easier to do this using sub-agents with tools in Claude. I have a fleet of Mastra agents (TypeScript). I use those agents inside my project as CLI tools to do repetitive tasks that gobble tokens, such as scanning code, web search, library search, and even Sourcegraph traversal.

Overall, it's allowed me to maintain more consistent workflows as I'm less dependent on Opus. Now that Mastra has introduced the concept of Workspaces, which allow for more agentic development, this approach has become even more powerful.

solumunus•18m ago
Are you just exposing Mastra CLI commands to Claude Code via Markdown context? I’d love you to elaborate on this if you have time.
endymion-light•1h ago
Looks great - I'll try to check it out on my gaming PC.

On a misc note: What's being used to create the screen recordings? It looks so smooth!

zamadatix•1h ago
Can anyone help me understand the "Number of Agent Turns" vs "SWE-Bench Pro (%)" figure? I.e. what does the spread of Qwen3-Coder-Next from ~50 to ~280 agent turns represent for a fixed score of 44.3%: that sometimes it takes that spread of agent turns to achieve said fixed score for the given model?
edude03•1h ago
Essentially, the more turns you have, the more likely the agent is to fail, since error compounds per turn. Agentic models are tuned for "long horizon tasks", i.e. being able to go many, many turns on the same problem without failing.
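To make the compounding concrete: if each turn succeeds independently with probability p, a run of n turns succeeds with probability p^n, which collapses quickly. Illustrative numbers only:

    # Illustrative only: per-turn reliability compounds over long agent runs.
    for p in (0.999, 0.99, 0.95):
        for n in (50, 150, 280):  # roughly the turn range in the figure
            print(f"p={p} n={n}: overall success ~ {p ** n:.3f}")
    # e.g. p=0.99, n=280 -> ~0.060, so long-horizon tuning is about raising p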
zamadatix•1h ago
Much appreciated, but I mean more around "what do the error bars in the figure represent" than what the turn scaling itself is.
esafak•32m ago
For the tasks in SWE-Bench Pro they obtained a distribution of agent turns, summarized as the box plot. The box likely describes the inter-quartile range while the whiskers describe some other range. You'd have to read their report to be sure. https://en.wikipedia.org/wiki/Box_plot
jsnell•32m ago
That's a box plot, so those are not error bars but a visualization of the distribution of a metric (min, max, median, 25th percentile, 75th percentile).

The benchmark consists of a bunch of tasks. The chart shows the distribution of the number of turns taken over all those tasks.
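For intuition, a minimal sketch of how such a figure is built, with made-up data standing in for the real per-task turn counts:

    # Minimal sketch of the figure's construction: a box plot over per-task
    # turn counts for the whole benchmark. The data here is fake.
    import random
    import matplotlib.pyplot as plt

    turns = [max(1.0, random.gauss(150, 50)) for _ in range(1865)]  # placeholder
    plt.boxplot(turns, vert=False, whis=(0, 100))  # whiskers at min/max
    plt.xlabel("Agent turns per task")
    plt.savefig("turns_boxplot.png")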

yorwba•31m ago
SWE-Bench Pro consists of 1865 tasks. https://arxiv.org/abs/2509.16941 Qwen3-Coder-Next solved 44.3% (826 or 827) of these tasks. To solve a single task, it took between ≈50 and ≈280 agent turns, ≈150 on average. In other words, a single pass through the dataset took ≈280000 agent turns. Kimi-K2.5 solved ≈84 fewer tasks, but also only took about a third as many agent turns.
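A quick sanity check of those numbers:

    # 44.3% of SWE-Bench Pro's 1865 tasks, and turns per full pass
    print(round(1865 * 0.443))   # -> 826 tasks solved
    print(1865 * 150)            # -> 279750, i.e. ~280k agent turns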
throwaw12•1h ago
We are getting there. As a next step, please release something that outperforms Opus 4.5 and GPT 5.2 on coding tasks.
gordonhart•59m ago
By the time that happens, Opus 5 and GPT-5.5 will be out. At that point will a GPT-5.2 tier open-weights model feel "good enough"? Based on my experience with frontier models, once you get a taste of the latest and greatest it's very hard to go back to a less capable model, even if that less capable model would have been SOTA 9 months ago.
tosh•52m ago
It feels like the gap between open weight and closed weight models is closing though.
theshrike79•32m ago
More like open local models are becoming "good enough".

I got stuff done with Sonnet 3.7 just fine; it did need a bunch of babysitting, but it was still a net positive for productivity. Now local models are at that level, closing in on the current SOTA.

When "anyone" can run an Opus 4.5 level model at home, we're going to be getting diminishing returns from closed online-only models.

cirrusfan•43m ago
I think it depends on what you use it for. Coding, where time is money? You probably want the Good Shit, but also want decent open weights models to keep prices sane rather than sama’s 20k/month nonsense. Something like a basic sentiment analysis? You can get good results out of a 30b MoE that runs at good pace on a midrange laptop. Researching things online with many sources and decent results I’d expect to be doable locally by the end of 2026 if you have 128GB ram, although it’ll take a while to resolve.
bwestergard•32m ago
What does it mean for U.S. AI firms if the new equilibrium is devs running open models on local hardware?
selectodude•23m ago
OpenAI isn’t cornering the market on DRAM for kicks…
rglullis•26m ago
I'm going in the opposite direction: with each new model, I try harder to optimize my existing workflows by breaking tasks down so that I can delegate them to the less powerful models, and only rely on the newer ones if the results are not acceptable.
yorwba•19m ago
When Alibaba succeeds at producing a GPT-5.2-equivalent model, they won't be releasing the weights. They'll only offer API access, like for the previous models in the Qwen Max series.

Don't forget that they want to make money in the end. They release small models for free because the publicity is worth more than they could charge for them, but they won't just give away models that are good enough that people would pay significant amounts of money to use them.

skhameneh•57m ago
It’s hard to overstate just how wild this model might be if it performs as claimed. The claim is that it can perform close to Sonnet 4.5 for assisted coding (SWE-Bench) while using only 3B active parameters. That is obscenely small for the claimed performance.
cirrusfan•51m ago
If it sounds too good to be true…
theshrike79•34m ago
Should be possible with optimised models, just drop all "generic" stuff and focus on coding performance.

There's no reason for a coding model to contain all of AO3 and Wikipedia =)

noveltyaccount•7m ago
I think I like coding models that know a lot about the world. They can disambiguate my requirements and build better products.
alexellisuk•57m ago
Is this going to need 1x or 2x of those RTX PRO 6000s to allow for a decent KV for an active context length of 64-100k?

It's one thing running the model without any context, but coding agents build it up close to the max and that slows down generation massively in my experience.
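For a rough sense of scale, here is the back-of-envelope KV math for a standard transformer layout. Qwen3-Next replaces most attention layers with linear attention, which caches far less, so treat these hyperparameters as placeholder assumptions:

    # Back-of-envelope KV cache size for a *standard* transformer layout.
    # Hyperparameters below are placeholders, not Qwen3-Coder-Next's real
    # config; its hybrid linear-attention layers cache much less than this.
    def kv_bytes(n_layers, n_kv_heads, head_dim, n_tokens, bytes_per_elem=2):
        return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_elem  # 2x: K and V

    gib = kv_bytes(n_layers=48, n_kv_heads=8, head_dim=128, n_tokens=100_000) / 2**30
    print(f"~{gib:.1f} GiB at 100k context")  # ~18.3 GiB under these assumptions

Under those assumptions a single 96GB card would hold a ~48GB quant plus full-context KV with headroom, and the hybrid architecture should only make that easier.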

Soerensen•56m ago
The agent orchestration point from vessenes is interesting - using faster, smaller models for routine tasks while reserving frontier models for complex reasoning.

In practice, I've found the economics work like this:

1. Code generation (boilerplate, tests, migrations) - smaller models are fine, and latency matters more than peak capability

2. Architecture decisions, debugging subtle issues - worth the cost of frontier models

3. Refactoring existing code - the model needs to "understand" before changing, so context and reasoning matter more

The 3B active parameters claim is the key unlock here. If this actually runs well on consumer hardware with reasonable context windows, it becomes the obvious choice for category 1 tasks. The question is whether the SWE-Bench numbers hold up for real-world "agent turn" scenarios where you're doing hundreds of small operations.
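One way to operationalize that split is a thin router in front of two OpenAI-compatible endpoints (llama.cpp's server and most hosted APIs speak this protocol). The endpoint, model names, and routing rule below are placeholders:

    # Hypothetical task router: local model for category-1 work, frontier
    # model for the rest. Endpoint, model names, and rule are placeholders.
    from openai import OpenAI

    local = OpenAI(base_url="http://localhost:8080/v1", api_key="none")
    frontier = OpenAI()  # reads OPENAI_API_KEY from the environment

    CHEAP = {"boilerplate", "tests", "migrations"}

    def run_task(kind: str, prompt: str) -> str:
        client, model = ((local, "qwen3-coder-next") if kind in CHEAP
                         else (frontier, "gpt-5.2"))  # placeholder names
        resp = client.chat.completions.create(
            model=model, messages=[{"role": "user", "content": prompt}])
        return resp.choices[0].message.content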

cirrusfan•51m ago
I find it really surprising that you’re fine with low-end models for coding - I went through a lot of open-weights models, local and "local", and I consistently found the results underwhelming. GLM-4.7 was the smallest model I found to be somewhat reliable, but that’s a sizable 350B and stretches the definition of local-as-in-at-home.
NitpickLawyer•32m ago
You're replying to a bot, fyi :)
syntaxing•43m ago
Is the Qwen3-Next architecture ironed out in llama.cpp?
orliesaurus•34m ago
how can anyone keep up with all these releases... what's next? Sonnet 5?
Squarex•20m ago
Well, there are rumors Sonnet 5 is coming today, so...
gessha•16m ago
Tune it out, come back in 6 months, the world is not going to end. In 6 months, you’re going to change your API endpoint and/or your subscription and then spend a day or two adjusting. Off to the races you go.
cedws•31m ago
I kind of lost interest in local models. Then Anthropic started saying I’m not allowed to use my Claude Code subscription with my preferred tools and it reminded me why we need to support open tools and models. I’ve cancelled my CC subscription, I’m not paying to support anticompetitive behaviour.
wahnfrieden•30m ago
OpenAI committed to allowing it btw. I don't know why Anthropic gets so much love here
jmathai•28m ago
Probably because the alternatives are OpenAI, Google, Meta. Not throwing shade at those companies but it's not hard to win the hearts of developers when that's your competition.
cedws•26m ago
Thanks, I’ll try out Codex to bridge until local models get to the level I need.
rustyhancock•26m ago
Cause they make the best coding model.

It's that simple. Everyone else is trying to compete in other ways, and Anthropic are pushing to dominate the market.

They'll eventually lose their performance edge, and suddenly they'll be back to being cute and fluffy.

I've cancelled a Claude sub, but still have one.

bheadmaster•14m ago
Agreed.

I've tried all of the models available right now, and Claude Opus is by far the most capable.

I had an assertion triggered in a fairly complex open-source C library I was using, and Claude Opus not only found the cause, but wrote self-contained reproduction code I could add to a GitHub issue. It also added tests for that issue and fixed the underlying bug.

I am sincerely impressed by the capabilities of Claude Opus. Too bad its usage is so expensive.

varispeed•8m ago
On the other hand, I feel like 5.2 is getting progressively dumbed down. It used to work well, but now the initial few prompts go in the right direction and then it goes off the rails, reminding me more of GPT-3.5.

I wonder what they are up to.

tomashubelbauer•17m ago
Anthropic banned my account when I whipped up a solution to control Claude Code running on my Mac from my phone when I'm out and about. No commercial angle, just a tool I made for myself since they wouldn't ship this feature (and still haven't). I wasn't their biggest fanboy to begin with, but it gave me the kick in the butt needed to go and explore alternatives until local models get good enough that I don't need to use hosted models altogether.
RationPhantoms•11m ago
There is a weaponized malaise employed by these frontier model providers, and I feel like dark patterns like the one you pointed out, and others, are used to rate-limit certain subscriptions.
darkwater•9m ago
I control it with SSH and sometimes tmux (Termux + WireGuard leads to a surprisingly stable connection). Why did you need more than that?
tomashubelbauer•5m ago
I didn't like the existing SSH applications for iOS, and I already have a local app I made that I keep open 24/7, so I added a screen that uses xterm.js and Bun.spawn with Bun.Terminal to mirror the process running on my Mac to my phone. This let me add a few bells and whistles that a generic SSH client wouldn't have, like notifications when Claude Code is done working, etc.
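The same pattern sketched in Python, for anyone not on Bun: run the CLI under a pty and mirror its output stream; notifications and the phone-facing transport would hang off the read loop. The `claude` executable name is an assumption:

    # Analogous sketch in Python: run a CLI under a pty and mirror its output.
    # Assumes a `claude` executable on PATH; forwarding to a phone (websocket,
    # xterm.js frontend, push notifications) would go in the read loop.
    import os, pty, select, subprocess

    master, slave = pty.openpty()
    proc = subprocess.Popen(["claude"], stdin=slave, stdout=slave, stderr=slave)
    os.close(slave)
    while proc.poll() is None:
        ready, _, _ = select.select([master], [], [], 0.5)
        if ready:
            chunk = os.read(master, 4096)
            os.write(1, chunk)  # mirror to local stdout; send to the phone too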
giancarlostoro•10m ago
I do wonder if they locked things down due to people abusing their CC token.
ossicones•29m ago
What browser use agent are they using here?
valcron1000•13m ago
Still nothing to compete with GPT-OSS-20B for local use with 16GB of VRAM.
