Doesn't seem to be far ahead of existing proprietary implementations. But it's still good that someone's willing to push that far and release the results. Getting multimodal input to work even this well is not at all easy.
Relevant comparison is on page 15: https://arxiv.org/abs/2509.17765
Some of the reasons could be:
- mitigating US AI supremacy
- commoditizing AI use to push innovation forward and to sell the platforms that run it, e.g. if the iPhone wins at local intelligence, that benefits China, because China manufactures those phones
- the talent war inside China
- softening sentiment against China in the US
- they're just awesome people
- and many more
https://openrouter.ai/qwen/qwen3-235b-a22b-thinking-2507
I'll use this to identify and caption meal photos and user photos for other workflows. Very cool!
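For reference, a minimal sketch of what that captioning call could look like through OpenRouter's OpenAI-compatible chat API. The "qwen/qwen3-vl-235b-a22b-instruct" slug, the OPENROUTER_API_KEY variable, and the image URL are illustrative assumptions, not taken from the thread:

```python
# Minimal sketch: caption one image through OpenRouter's OpenAI-compatible API.
# Assumptions (not from the thread): the model slug, the env var, and the image URL.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

resp = client.chat.completions.create(
    model="qwen/qwen3-vl-235b-a22b-instruct",  # assumed slug; check OpenRouter's model list
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Caption this meal photo in one short sentence."},
            {"type": "image_url", "image_url": {"url": "https://example.com/meal.jpg"}},
        ],
    }],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```

The same request shape works for batching user photos: loop over image URLs and reuse the client.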
- https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Thinking
- https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Instruct
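If you want to run the open weights yourself, here is a rough local-inference sketch. It assumes the Instruct checkpoint loads through transformers' generic AutoModelForImageTextToText / AutoProcessor classes and multimodal chat templates (the exact class and minimum transformers version are on the model card), that you have hardware for a 235B-parameter MoE, and that the image URL is a placeholder:

```python
# Minimal sketch: caption an image with the open Qwen3-VL Instruct weights via transformers.
# Assumptions: the generic Auto* classes and chat-template image loading work for this
# checkpoint (check the model card); the image URL is a placeholder.
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "Qwen/Qwen3-VL-235B-A22B-Instruct"
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"  # 235B MoE: multi-GPU or offloading required
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/meal.jpg"},  # placeholder image
        {"type": "text", "text": "Caption this photo in one short sentence."},
    ],
}]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=64)
# Decode only the newly generated tokens, skipping the prompt.
caption = processor.batch_decode(
    output[:, inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)[0]
print(caption)
```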