
Qwen3.6-Plus: Towards Real World Agents

https://qwen.ai/blog?id=qwen3.6
123•pretext•1h ago

Comments

srmatto•1h ago
The benchmarks provided are for Opus 4.5, not for the latest Opus 4.6, and Qwen is still lagging in a lot of them.
thegeomaster•1h ago
And it seems they've decided to go closed-source for their largest, best models.
FuckButtons•34m ago
3.5-Plus was also only available via API. I don't know what the long-term business model for open weights is (I hope there is one), but it seems foolish to assume that companies will be willing to spend millions of dollars of compute on an asset worth zero in perpetuity.
kgeist•9m ago
>I don’t know what the long term business model for open weights is

Maybe the hypothesis is that if a developer is fine with the intelligence of the smaller open-weights variants, it's not worth spending scarce compute on them: they can just run those on their own hardware, so there's probably not much economic gain for Alibaba there. At the same time, if such a developer later wants more intelligence as they scale, they're more likely to switch to a larger model in the same family (similar quirks, less prompt tweaking), and that's where Alibaba can start charging. Basically, the lack of compute in China forces them to focus on a few larger clients that need maximum intelligence, and the open-weights models are a free trial that Alibaba doesn't have to pay for.

kgeist•24m ago
They've always had closed-source variants:

- Qwen3.5-Plus

- Qwen3-Max

- Qwen2.5-Max

etc. Nothing has really changed so far.

Aurornis•1h ago
There is no reason to benchmark against Opus 4.5 when Opus 4.6 has been out so long, other than to be misleading.
jgbuddy•1h ago
Worth noting that this model, unlike almost all Qwen models, is not open-weight, nor is the parameter count disclosed. Also odd that it's compared against Opus 4.5 even though 4.6 was released like two months ago.
pferdone•1h ago
They said in the last paragraph[0]:

"[...] In the coming days, we will also open-source smaller-scale variants, reaffirming our commitment to accessibility and community-driven innovation. [...]"

[0] https://qwen.ai/blog?id=qwen3.6#summary--future-work

deaux•59m ago
> we will also open-source smaller-scale variants

In other words, like GP said, this Qwen3.6-Plus model is not open-weight unlike the other Qwen models.

pferdone•55m ago
> unlike almost all qwen models

"Almost all" means there have been models before that were not open. So, no contradiction there.

kennywinker•49m ago
> unlike the other Qwen models

Please send the download link for qwen 3.5-plus.

Also, who cares? If you have the hardware to run a ~400B model, I don't think you count as a home user anymore.

dgb23•41m ago
In a practical sense, I'm primarily interested in small to medium sized models being open. I think that's a common sentiment.

However, my hope is that there will be at least somewhat competitive big and open models as well, from an ethical/ideological perspective. These things were trained on data that was provided by people without their consent, so they should at least be publicly accessible or even public domain.

thepasch•34m ago
Qwen3.5-Plus is the largest variant of the open weight Qwen3.5 model, expanded with a 1M context window and fine-tuned on the Qwen-native harness’ specific tools.
zozbot234•30m ago
I wouldn't say "almost all" seeing as -MAX and -Omni models were always closed.
Art9681•1h ago
How convenient of them to compare themselves to the last generation Opus and GPT models to make their model look better than it really is.
MarsIronPI•1h ago
It's not open weights so I'm not interested.
karimf•1h ago
> In the coming days, we will also open-source smaller-scale variants, reaffirming our commitment to accessibility and community-driven innovation.
Aurornis•1h ago
This is their hosted-only model, not an open weight model like they've become known for. They got a lot of good publicity for their open weight releases, which was the goal. The hard part is pivoting from being an open weight provider to being considered a competitor to Claude and ChatGPT. Initial reactions are mostly anger from everyone who didn't realize the play all along was to give away the smaller models as advertising, not because they were feeling generous.

Comparing to Opus 4.5 instead of the current 4.6 and other last-gen models is clearly an attempt to deceive, which isn’t winning them any points either.

I think there is a moderately large market for models like this that aren’t quite SOTA level but can be served up much cheaper. I don’t know how successful they’ll be in the race to the bottom in this market niche, though. Most users of cheap API tokens are not loyal to any brand and will change providers overnight each time someone releases a slightly better model.

cubefox•55m ago
> I think there is a moderately large market for models like this that aren’t quite SOTA level but can be served up much cheaper.

There isn't, pretty much everyone wants the best of the best.

scoopdewoop•51m ago
That isn't true. In a Codex or Claude Code instance, sure... but those are not the main users of APIs. If you are using LLMs in a service for customers, costs matter.
Aurornis•50m ago
The market for API tokens is bigger than people like you and me (who also want the best) using them for code.

There are a lot of data science problems that benefit from running the dataset through an LLM, which becomes bottlenecked on per-token costs. For these you take a sample subset and run it against multiple providers and then do a cost versus accuracy tradeoff.

The market for API tokens is not just people using OpenCode and similar tools.
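
The cost-versus-accuracy workflow described above can be sketched roughly like this. Everything here is hypothetical: provider names, per-million-token prices, and the sample results are made-up placeholders, not real benchmark data.

```python
# Sketch: rank providers by accuracy-per-dollar on a sampled subset.
# All names, prices, and accuracy numbers are illustrative placeholders.

def evaluate(provider_results, price_per_mtok):
    """Score each provider on the sample: accuracy, dollar cost, accuracy/$."""
    scored = []
    for name, (correct, total, tokens_used) in provider_results.items():
        accuracy = correct / total
        cost = tokens_used / 1_000_000 * price_per_mtok[name]
        scored.append((name, accuracy, cost, accuracy / cost))
    # Best accuracy-per-dollar first
    return sorted(scored, key=lambda row: row[3], reverse=True)

# Hypothetical results from running a 500-row sample through three providers:
results = {
    "big-sota":   (480, 500, 2_000_000),  # (correct, total, tokens used)
    "mid-tier":   (455, 500, 1_800_000),
    "cheap-open": (430, 500, 1_700_000),
}
prices = {"big-sota": 15.0, "mid-tier": 3.0, "cheap-open": 0.5}  # $/Mtok

for name, acc, cost, _ in evaluate(results, prices):
    print(f"{name}: {acc:.1%} accuracy, ${cost:.2f} sample cost")
```

With numbers like these, the cheap model loses a few points of accuracy but wins the accuracy-per-dollar ranking by a wide margin, which is exactly the tradeoff being described.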

sidrag22•48m ago
Maybe there isn't, but as understanding grows, people will realize that having an orchestration agent delegate simple work to lesser agents is significant not only for cost savings, but also for preserving context window space.
Someone1234•44m ago
> There isn't, pretty much everyone wants the best of the best.

For direct user interaction or coding problems, perhaps. But as API calls get cheaper, it becomes more realistic to use them for completely automated workflows against data-sets, or as sub-agents called from expensive SOTA models.

For example, in Claude, using Opus as an orchestrator to call Sonnet sub-agents is a popular usage "hack." That only gets more powerful as the Sonnet-equivalent model gets cheaper. Now you can spawn entire teams of small specialized sub-agents with small context windows but limited scope.
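
The orchestrator/sub-agent pattern amounts to something like the sketch below. `call_model` is a stand-in for a real provider API client, and the model names are illustrative; the point is that each sub-agent gets its own small, isolated context while the expensive model only plans and synthesizes.

```python
# Sketch of an orchestrator delegating legwork to a cheaper model.
# call_model() is a placeholder for a real chat-API client.

def call_model(model: str, prompt: str) -> str:
    # Placeholder: in practice this would hit a provider's chat API.
    return f"[{model}] response to: {prompt[:40]}"

def orchestrate(task: str, subtasks: list[str]) -> str:
    # Cheap model handles each subtask in its own tiny context window.
    partials = [call_model("cheap-model", sub) for sub in subtasks]
    # Expensive model sees only the task plus the condensed sub-results.
    synthesis_prompt = task + "\n\nSub-results:\n" + "\n".join(partials)
    return call_model("expensive-model", synthesis_prompt)

result = orchestrate(
    "Summarize the security posture of this repo",
    ["Audit auth code", "Audit dependency versions", "Audit CI secrets"],
)
```

Because only the short synthesis prompt ever reaches the expensive model, both the token bill and the orchestrator's context usage stay small as the number of sub-agents grows.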

regularfry•43m ago
Everyone may want the best, but the amount of AI-addressable work outstrips the budget available for buying the best by quite a wide margin.
joefourier•43m ago
Ever hit your daily limit on Claude Code and saw how expensive it is to pay per token?
PhilippGille•40m ago
The OpenRouter usage stats indicate the opposite: https://openrouter.ai/rankings?view=month
jjice•32m ago
OpenRouter usage is likely skewed toward LLMs that are more niche and/or self-hostable on solid hardware that's available but that most consumers don't have on hand. I imagine Anthropic and OpenAI LLMs often get called directly through their own APIs instead.

At least from my experience and friends of mine, we use OpenRouter for cases where we want to use smaller LLMs like Qwen, but when I've used ChatGPT and Claude, I use those APIs directly.

noman-land•36m ago
OpenCode allows for free inference tho.
esafak•9m ago
That's only because current models don't saturate people's needs. Once they are fast and smart enough people will pick cheaper ones.
wolttam•8m ago
Nope. I get very good results from GLM 5 and 5.1. I'm not working on anything so complex and groundbreaking that I need the best.

Coding is a rung on the ladder of model capability. Frontier models will grow to take on more capabilities, while smaller, more focused models become the economical choice for coding.

wongarsu•5m ago
For coding I want the best. Both me and $work do lots of things besides coding where smaller models like qwen3.5-27b work great, at much lower cost.
zozbot234•31m ago
> not an open weight model like they’ve become known for.

Right, they state that they'll release "smaller" variants openly at some point, with few details as to what that means. Will there be a ~300B variant as with Qwen 3.5? The blog post doesn't say.

dev_l1x_be•8m ago
How stupid does somebody have to be to mix up Opus with Qwen?
daft_pink•58m ago
Not really interested in using models hosted on alibaba cloud.

I like Qwen local for its privacy, but I trust the privacy of Google/OpenAI/Anthropic more than Alibaba.

rvz•50m ago
> I like Qwen local for its privacy, but I trust the privacy of Google/OpenAI/Anthropic more than Alibaba.

None should be trusted, unless you are running them locally.

the_pwner224•42m ago
I had the exact opposite reaction. I stopped using OpenAI/Google a while ago due to privacy and moved to local Qwen, now I'm considering using Alibaba cloud. You know Google and OpenAI are going to share everything with the US government and Western ad networks. But with Alibaba, who cares if the CCP & Chinese ad networks have a comprehensive profile on me? From a pragmatic perspective it's much better for (outcomes related to) privacy.
zobzu•28m ago
So if China has the data it's good, but if the US has the data it's bad? Got it, lol.

The US actually has laws around this, and providers aren't sharing very much with the US government today. China shares 100% as required by law. And neither cares much about "how long do I cook eggs for", but they do care about code generation a lot.

thereitgoes456•14m ago
> so if China has the data good, us has the data bad

It's not that, it's about relative risk to your own life. Asking questions about "DEI" for example is much more likely to have adverse effects on your life if you ask Grok or an OpenAI chatbot, though still not that likely.

wongarsu•8m ago
From an espionage perspective, your own government is the safest. But from a civil rights perspective, your own government is your most immediate threat. China isn't going to arrest me for my opinions on Netanyahu; my own government could.

And the US government has repeatedly shown that it is very interested in collecting all the data available, not unlike China. In China this is simply done in the open, while the US has a veneer of protection for citizens. But where data collection is forbidden by law, they either ignore the law or ask another Five Eyes member to do the spying and share the results. Both are well documented.

CamperBob2•18m ago
As with all arguments equivalent to "I have nothing to hide, so I have nothing to fear," it may be true now, but it may not be true later. The only certainty is that this will not be your call.
the_pwner224•16m ago
Agreed
woeirua•58m ago
Just more evidence that the B tier models are six months behind. Ultimately that’s good. Opus 4.6 level intelligence will be cheap later this year!
eis•58m ago
Quite strong results in the benchmarks, but why Gemini 3 Pro instead of 3.1? Why only for a few of the benchmarks? Why is OpenAI not in the coding benchmarks? Why Opus 4.5 and not 4.6? It just strikes me as a bit strange.

As always, we'll have to try it and see how it performs in the real world, but the open weight Qwen models were pretty decent for some tasks, so I'm still excited to see what this brings.

esafak•46m ago
Does anyone have experience with Alibaba's coding plan? Not that I'm very tempted at $50/month...
linolevan•29m ago
I’m surprised that people are surprised. Qwen has been hosting private plus and max variants for a while now.
giancarlostoro•27m ago
I hope their open source variants are just as good, having a 1 million token window for a fully offline model would be VERY interesting.
sosodev•15m ago
I don't know how well it performs, but you can extend Qwen3.5 to 1 million token context using YaRN. Also, Nemotron 3 Super was recently released and scales up to 1 million token context natively.
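
For reference, YaRN-based context extension on earlier Qwen releases (e.g. Qwen2.5) was enabled by adding a `rope_scaling` section to the model's `config.json`. Whether a hypothetical Qwen3.5 uses the same mechanism, and what factor it would need, are assumptions here; the numbers below are the documented Qwen2.5 example:

```json
{
  "rope_scaling": {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
}
```

A factor of 4.0 over a 32K native window gives roughly 131K tokens; reaching 1M would require a correspondingly larger factor and/or native window, typically at some cost to short-context quality.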
throwaw12•12m ago
I would love to hear from people using both (Claude Code or Codex) and Qwen about their experience with the Qwen models: are they on par, and if not, how far behind are they?
scottcha•2m ago
I switch between Claude Code (Opus/Sonnet) and Qwen (OpenCode, OpenClaw) multiple times throughout the day, and Qwen 3.5 is really nice. I also use Kimi K2.5 and GLM 5 pretty often, and I'm starting to get the sense that the agent tool is becoming a little more important than the model at this level of capability, as long as tool calling and prompt quality are all configured correctly by the provider.
furyofantares•6m ago
I'll diverge from some of these comments, I don't find it misleading to compare to Opus 4.5.

I can remember how good Opus 4.5 was. If I'm considering using this, the most informative comparison is to the closest model I'm already familiar with.

I'm obviously not switching to this if I want the best model. I'm switching if I'm hopeful that the smaller versions are close to it, or if I want to have more options for providers, or for any other reasons unrelated to getting the highest quality responses possible.
