
Qwen3-Max-Thinking

https://qwen.ai/blog?id=qwen3-max-thinking
135•vinhnx•1h ago

Comments

throwaw12•1h ago
Aghhh, in my earlier comments I wished they'd release a model that outperforms Opus 4.5 in agentic coding; it seems I should wait longer. But I am hopeful
wyldfire•57m ago
By the time they release something that outperforms Opus 4.5, Opus 5.2 will have been released which will probably be the new state-of-the-art.

But these open weight models are tremendously valuable contributions regardless.

wqaatwt•53m ago
Qwen 3 Max wasn't originally open, or did they release it?
OGEnthusiast•55m ago
Check out the GLM models, they are excellent
khimaros•19m ago
MiniMax M2.1 rivals GLM 4.7 and fits in 128GB with 100k context at 3-bit quantization.
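The "fits in 128GB" claim above can be sanity-checked with a back-of-envelope estimate. A minimal sketch, assuming a hypothetical ~230B-parameter model and illustrative KV-cache shapes (none of these numbers are published MiniMax M2.1 specs):

```python
# Back-of-envelope memory estimate for running a quantized LLM.
# All concrete numbers below (parameter count, layer/head shapes) are
# illustrative assumptions, not official model specs.

def weight_gb(params_b: float, bits: float) -> float:
    """Approximate weight memory in GB: params * bits-per-weight / 8."""
    return params_b * 1e9 * bits / 8 / 1e9

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context: int, bytes_per_elem: int = 2) -> float:
    """KV cache: 2 (K and V) * layers * kv_heads * head_dim * tokens * bytes."""
    return 2 * layers * kv_heads * head_dim * context * bytes_per_elem / 1e9

# Hypothetical ~230B-parameter model at 3-bit quantization, 100k context:
w = weight_gb(230, 3)
kv = kv_cache_gb(layers=60, kv_heads=8, head_dim=128, context=100_000)
print(f"weights ~= {w:.1f} GB, KV cache ~= {kv:.1f} GB, total ~= {w + kv:.1f} GB")
```

At these assumed sizes, 3-bit weights plus a 100k-token fp16 KV cache land around 110 GB, which is at least consistent with the claim of fitting in 128GB with headroom.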
lofaszvanitt•17m ago
Like these benchmarks mean anything.
frankc•5m ago
One of the ways the Chinese companies are keeping up is by training their models on the outputs of the American frontier models. I'm not saying they don't innovate in other ways, but this is part of how they caught up so quickly. However, it pretty much means they are always going to lag.
siliconc0w•59m ago
I don't see a hugging face link, is Qwen no longer releasing their models?
tosh•50m ago
afaiu not all of their models are open weight releases, this one so far is not open weight (?)
sidchilling•42m ago
What would be a good coding model to run on an M3 Pro (18GB) to get a Codex-like workflow and quality? Essentially, I run out of usage quickly when using Codex-High in VS Code on the $20 ChatGPT plan, and I'm looking for cheaper / free alternatives (even if a little slower, but the same quality). Any pointers?
medvezhenok•40m ago
Short answer: there is none. You can't get frontier-level performance from any open source model, much less one that would work on an M3 Pro.

If you had more like 200GB ram you might be able to run something like MiniMax M2.1 to get last-gen performance at something resembling usable speed - but it's still a far cry from codex on high.

mittermayr•40m ago
at the moment, I think the best you can do is qwen3-coder:30b -- it works, and it's nice to get some fully-local llm coding up and running, but you'll quickly realize that you've long tasted the sweet forbidden nectar that is hosted llms. unfortunately.
Mashimo•39m ago
A local model with 18GB of RAM that has the same quality as Codex High? Yeah, nah mate.

The best bet would be GLM 4.7 Flash, and I doubt it's close to what you want.

atwrk•35m ago
"run" as in run locally? There's not much you can do with that little RAM.

If remote models are ok you could have a look at MiniMax M2.1 (minimax.io) or GLM from z.ai or Qwen3 Coder. You should be able to use all of these with your local openai app.

duffyjp•34m ago
Nothing. This summer I set up a dual 16GB GPU / 64GB RAM system and nothing I could run was even remotely close. Big models that didn't fit in 32GB of VRAM had marginally better results but were at least an order of magnitude slower than what you'd pay for, and still much worse in quality.

I gave one of the GPUs to my kid to play games on.

jgoodhcg•32m ago
Z.ai has glm-4.7. It's almost as good, for about $8/mo.
dust42•50m ago
Max was always closed.
Mashimo•55m ago
I tried to search, could not find anything, do they offer subscriptions? Or only pay per tokens?
isusmelj•54m ago
I just wanted to check whether there is any information about the pricing. Is it the same as Qwen Max? Also, I noticed on the pricing page of Alibaba Cloud that the models are significantly cheaper within mainland China. Does anyone know why? https://www.alibabacloud.com/help/en/model-studio/models?spm...
epolanski•31m ago
I guess they want to partially subsidize local developers?

Maybe that's a requirement from whoever funds them, probably public money.

segmondy•6m ago
Seriously? Does Netflix or Spotify cost the same everywhere around the world? They earn less and their buying power is less.
arendtio•50m ago
> By scaling up model parameters and leveraging substantial computational resources

So, how large is that new model?

DeathArrow•43m ago
Mandatory pelican on bicycle: https://www.svgviewer.dev/s/U6nJNr1Z
kennykartman•39m ago
Ha ha, I was curious about that! I wonder if (when? if not already) some company is using some version of this in their training set. I'm still impressed that this benchmark has been out for so long and yet produces this kind of (ugly?) result.
saberience•27m ago
Because no one cares about optimizing for this because it's a stupid benchmark.

It doesn't mean anything. No frontier lab is trying hard to improve the way its model produces SVG format files.

simonw•22m ago
+1 to "it's a stupid benchmark".
lofaszvanitt•13m ago
It shows that these are nowhere near anything resembling human intelligence. You wouldn't have to optimize for anything if it were a general intelligence of sorts.
CamperBob2•10m ago
Here's a pencil and paper. Let's see your SVG pelican.
obidee2•8m ago
Why stupid? Vector images are widely used and extremely useful, both directly and for rendering raster images at different scales. It's also highly connected with spatial and geometric reasoning and precision, which would open up a whole new class of problems these models could tackle. Sure, it's secondary to raster image analysis and generation, but I'm curious why it would be stupid to pursue.
NitpickLawyer•23m ago
It would be trivial to detect such gaming, though. That's the beauty of the test, and that's why they're probably not doing it. If a model draws "perfect" (whatever that means) pelicans on a bike, you start testing for owls riding a lawnmower, or crows riding a unicycle, or x _verb_ on y ...
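The varied-prompt test described above can be sketched in a few lines; the word lists here are my own illustrative examples, not an established benchmark suite:

```python
import itertools

# Sketch of the varied-prompt idea: if a model were benchmark-tuned on one
# canonical drawing prompt, fresh subject/verb/vehicle combinations would
# expose it. The word lists are illustrative assumptions.
subjects = ["pelican", "owl", "crow", "walrus"]
verbs = ["riding", "balancing on", "pedaling"]
vehicles = ["a bicycle", "a lawnmower", "a unicycle"]

prompts = [f"Generate an SVG of a {s} {v} {o}"
           for s, v, o in itertools.product(subjects, verbs, vehicles)]
print(len(prompts), "prompt variants, e.g.:", prompts[0])
```

If scores collapse on the fresh combinations while the canonical pelican prompt stays "perfect", that's evidence of special-casing.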
Sharlin•14m ago
It could still be special-case RLHF trained, just not up to perfection.
lofaszvanitt•13m ago
A salivating pelican :D.
airstrike•40m ago
2026 will be the year of open and/or small models.
lysace•25m ago
I tried it at https://chat.qwen.ai/.

Prompt: "What happened on Tiananmen square in 1989?"

Reply: "Oops! There was an issue connecting to Qwen3-Max. Content Security Warning: The input text data may contain inappropriate content."

asciii•19m ago
This is what I find hilarious when these articles assess "factual" knowledge.

We are in the realm of the semantic / symbolic, where even the release article needs some meta discussion.

It's quite the litmus test of LLMs. LLMs just carry humanity's flaws.

lysace•16m ago
(Edited, sorry.)

Yes, of course LLMs are shaped by their creators. Qwen is made by Alibaba Group. They are essentially one with the CCP.

lifetimerubyist•12m ago
What happens when you run one of their open-weight models of the same family locally?
lysace•9m ago
Last time I tried something like that with an offline Qwen model I received a non-answer.
tekno45•11m ago
ask who was responsible for the insurrection on january 6th
lysace•8m ago
You do it, my IP is now flagged - they want to have my phone number to let me continue :).
xcodevn•6m ago
I'm not familiar with these open-source models. My bias is that they're heavily benchmaxxing and not really helpful in practice. Can someone with a lot of experience using these, as well as Claude Opus 4.5 or Codex 5.2 models, confirm whether they're actually on the same level? Or are they not that useful in practice?
miroljub•5m ago
I don't know where your impression about benchmaxxing comes from. Why would you assume closed models are not benchmaxxing? Being closed and commercial, they have more incentive to fake it than the open models.

pi

https://buildwithpi.ai/
1•tosh•22s ago•0 comments

India, EU wrap up talks for landmark trade deal amid strained US ties

https://www.reuters.com/world/china/india-eu-wrap-up-talks-landmark-trade-deal-amid-strained-us-t...
1•alephnerd•1m ago•0 comments

Did they just nuke Opus 4.5 into the ground?

https://old.reddit.com/r/ClaudeCode/comments/1qmsfyo/did_they_just_nuke_opus_45_into_the_ground/
1•tamnd•1m ago•0 comments

Maia 200: The AI accelerator built for inference

https://blogs.microsoft.com/blog/2026/01/26/maia-200-the-ai-accelerator-built-for-inference/
1•Handy-Man•2m ago•0 comments

Women Game Designers

https://daily.jstor.org/the-hidden-history-of-women-game-designers/
1•ohjeez•2m ago•0 comments

Show HN: a habit tracker that only lets you track one habit

https://ahabit.app
1•davvie•3m ago•0 comments

Sometimes Your Job Is to Stay the Hell Out of the Way

https://randsinrepose.com/archives/sometimes-your-job-is-to-stay-the-hell-out-of-the-way/
2•Tomte•4m ago•1 comments

Ask HN: How do you prevent children from accessing your products?

3•eastoeast•4m ago•0 comments

Show HN: EhAye Engine – Give your AI a voice

https://ehaye.io
1•avidcoder•6m ago•0 comments

Ask HN: Open Source PM Work

2•conner_h5•6m ago•0 comments

Return Void (Hacking a Vespera II Telescope)

https://thomasloupe.com/project/a-telescope-for-the-world/
3•alnwlsn•7m ago•0 comments

Former astronaut on lunar spacesuits: "I don't think they're great "

https://arstechnica.com/space/2026/01/former-astronaut-on-lunar-spacesuits-i-dont-think-theyre-gr...
2•CharlesW•7m ago•0 comments

Show HN: Chord: Clawdbot alternative with a security layer

https://github.com/tvytlx/chord-releases
1•arctanx•7m ago•0 comments

Show HN: OffLingua on Device AI Translator

https://offlingua.rdcopilot.com/
1•mvpasarel•7m ago•0 comments

Literature Clock

https://literature-clock.jenevoldsen.com/
1•grajmanu•8m ago•0 comments

Temperature and the Sackur–Tetrode Equation [video]

https://www.youtube.com/watch?v=gRPv4rd_6O4
1•surprisetalk•9m ago•0 comments

The mountain that weighed the Earth

https://signoregalilei.com/2026/01/18/the-mountain-that-weighed-the-earth/
2•surprisetalk•9m ago•0 comments

What Drives Retention (2019)

https://www.raphkoster.com/2019/01/30/what-drives-retention/
1•surprisetalk•9m ago•0 comments

Bop It Playing Robot [video]

https://www.youtube.com/watch?v=i9Kmm2tILVo
1•surprisetalk•9m ago•0 comments

Study: More market freedom may mean fewer homicides

https://news.uga.edu/more-market-freedom-fewer-homicides/
1•giuliomagnifico•9m ago•0 comments

Show HN: Dhi – 520x faster data validation for Python, 77x faster for TypeScript

https://github.com/justrach/dhi
1•rachpradhan•9m ago•0 comments

Manage AI Agent skills easily with one CLI command

https://ai-devkit.com/docs/7-skills/
1•hoangnnguyen•11m ago•0 comments

Curl Project Drops Bug Bounties Due to AI Slop Blog – By Maya Posch

https://hackaday.com/2026/01/26/the-curl-project-drops-bug-bounties-due-to-ai-slop/
1•grajmanu•11m ago•1 comments

MCP and Skills: Why Not Both?

https://kvg.dev/posts/20260125-skills-and-mcp/
2•tanelpoder•12m ago•0 comments

The Death of Software 2.0 (A Better Analogy)

https://www.fabricatedknowledge.com/p/the-death-of-software-20-a-better
1•sasvari•13m ago•0 comments

Show HN: Recal – Turn meetings and Slack threads into actionable tasks

https://tryrecal.com
1•markbuilds•14m ago•0 comments

Challenging the dance of bailout and austerity (2025)

https://www.cambridge.org/core/journals/finance-and-society/article/challenging-the-dance-of-bail...
1•robtherobber•15m ago•0 comments

RTX 5090 pricing has risen by 55% since July

https://overclock3d.net/news/gpu-displays/rtx-5090-pricing-spikes-55-increase/
1•akyuu•15m ago•0 comments

Let's Make Sweet Music – a music editor built in Svelte, Vite, and Opus 4.5

https://lets-make-sweet-music.com
1•paulbjensen•16m ago•0 comments

Post-Perihelion Integral Field Spectroscopy of the Interstellar Comet 3I/Atlas

https://arxiv.org/abs/2601.16983
1•bikenaga•17m ago•1 comments