Qwen3.5 122B and 35B models offer Sonnet 4.5 performance on local computers

https://venturebeat.com/technology/alibabas-new-open-source-qwen3-5-medium-models-offer-sonnet-4-5-performance

53•lostmsu•2h ago

Comments

xenospn•1h ago

Are there any non-Chinese open models that offer comparable performance?

aliljet•45m ago

Is this actually true? I want to see actual evals that match this up with Sonnet 4.5.

lostmsu•35m ago

Not exactly, but pretty close: https://artificialanalysis.ai/models/capabilities/coding?mod...

Somewhere between Haiku 4.5 and Sonnet 4.5

CharlesW•20m ago

> Somewhere between Haiku 4.5 and Sonnet 4.5

That's like saying "somewhere between Eliza and Haiku 4.5". Haiku is not even a so-called 'reasoning model'.¹

¹ To preempt the easily-offended, this is what the latest Opus 4.6 in today's Claude Code update says: "Claude Haiku 4.5 is not a reasoning model — it's optimized for speed and cost efficiency. It's the fastest model in the Claude family, good for quick, straightforward tasks, but it doesn't have extended thinking/reasoning capabilities."

pinum•9m ago

Looks much closer to Haiku than Sonnet.

Maybe "Qwen3.5 122B offers Haiku 4.5 performance on local computers" would be a more realistic and defensible claim.

mark_l_watson•30m ago

The new 35B model is great. That said, it has slight incompatibilities with Claude Code. It is very good for tool use.

johnnyApplePRNG•7m ago

Claude Code is designed for Anthropic models. Try it with opencode!

erelong•29m ago

What kind of hardware does HN recommend or like to run these models?

xienze•25m ago

It's less than you'd think. I'm using the 35B-A3B model on an A5000, which is something like a slightly faster 3080 with 24GB VRAM. I'm able to fit the entire Q4 model in memory with 128K context (and I think I would probably be able to do 256K since I still have like 4GB of VRAM free). Prompt processing runs at something like 1K tokens/second, and generation at around 100 tokens/second. Plenty fast for agentic use via Opencode.
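
For anyone who wants to sanity-check numbers like these, here is a rough back-of-the-envelope sketch. It is not a measurement: the 4.5 bits-per-weight figure for a "Q4" quant is an assumption (real quants vary by scheme), the function names are made up for illustration, and the estimate covers weights only, ignoring KV cache and runtime overhead. The default speeds are the approximate figures quoted in the comment above.

```python
def weight_memory_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GiB.

    Ignores KV cache, activations, and runtime overhead.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def turn_seconds(prompt_tokens: int, output_tokens: int,
                 prefill_tps: float = 1000.0, decode_tps: float = 100.0) -> float:
    """Rough wall-clock time for one request: prefill plus generation."""
    return prompt_tokens / prefill_tps + output_tokens / decode_tps


if __name__ == "__main__":
    # ASSUMPTION: ~4.5 bits/weight on average for a Q4-style quant.
    print(f"35B weights at ~4.5 bpw: {weight_memory_gib(35, 4.5):.1f} GiB")
    # A hypothetical agentic turn: 20K tokens of context in, 500 tokens out.
    print(f"20K in / 500 out: {turn_seconds(20_000, 500):.0f} s")
```

Under those assumptions the weights alone take most of a 24GB card, so how much context fits depends on the model's KV cache size, which this sketch does not try to model.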

rahimnathwani•14m ago

There seem to be a lot of different Q4s of this model: https://www.reddit.com/r/LocalLLaMA/s/kHUnFWZXom

I'm curious which one you're using.

suprjami•9m ago

Unsloth Dynamic. Don't bother with anything else.

msuniverse2026•7m ago

I've had an AMD card for the last 5 years, so I kinda just tuned out of local LLM releases because AMD seemed to abandon ROCm for my card (6900 XT). Is AMD capable of anything these days?

suprjami•19m ago

The cheapest option is two 3060 12G cards. You'll be able to fit the Q4 of the 27B or 35B with an okay context window.

If you want to spend twice as much for more speed, get a 3090/4090/5090.

If you want long context, get two of them.

If you have enough spare cash to buy a car, get an RTX Ada with 96G VRAM.

dajonker•7m ago

Radeon R9700 with 32 GB VRAM is relatively affordable for the amount of RAM and with llama.cpp it runs fast enough for most things. These are workstation cards with blower fans and they are LOUD. Otherwise if you have the money to burn get a 5090 for speeeed and relatively low noise, especially if you limit power usage.

solarkraft•18m ago

What are the recommended 4-bit quants for the 35B model? I don’t see official ones: https://huggingface.co/models?other=base_model:quantized:Qwe...

solarkraft•15m ago

Smells like hyperbole. A lot of people making such claims don’t seem to have sustained real-world experience with these models, or seem to have very weird standards for what they consider usable.

Up until relatively recently, while people had already long been making these claims, they came with the asterisk of “oh, but you can’t practically use more than a few K tokens of context”.
