
Qwen3-Max-Thinking

https://qwen.ai/blog?id=qwen3-max-thinking
181•vinhnx•1h ago•82 comments

Is It Time for a Nordic Nuke?

https://warontherocks.com/2026/01/is-it-time-for-a-nordic-nuke/
29•ryan_j_naughton•41m ago•13 comments

MapLibre Tile: a modern and efficient vector tile format

https://maplibre.org/news/2026-01-23-mlt-release/
286•todsacerdoti•6h ago•64 comments

What "The Best" Looks Like

https://www.kuril.in/blog/what-the-best-looks-like/
34•akurilin•1h ago•12 comments

France Aiming to Replace Zoom, Google Meet, Microsoft Teams, etc.

https://twitter.com/lellouchenico/status/2015775970330882319
71•bwb•49m ago•38 comments

After two years of vibecoding, I'm back to writing by hand

https://atmoio.substack.com/p/after-two-years-of-vibecoding-im
404•mobitar•3h ago•247 comments

Exactitude in Science – Borges (1946) [pdf]

https://kwarc.info/teaching/TDM/Borges.pdf
44•jxmorris12•2h ago•15 comments

Google AI Overviews cite YouTube more than any medical site for health queries

https://www.theguardian.com/technology/2026/jan/24/google-ai-overviews-youtube-medical-citations-...
147•bookofjoe•2h ago•75 comments

Television is 100 years old today

https://diamondgeezer.blogspot.com/2026/01/tv100.html
20•qassiov•2h ago•1 comments

Things I've learned in my 10 years as an engineering manager

https://www.jampa.dev/p/lessons-learned-after-10-years-as
375•jampa•4d ago•96 comments

Show HN: Only 1 LLM can fly a drone

https://github.com/kxzk/snapbench
79•beigebrucewayne•6h ago•41 comments

OSS ChatGPT WebUI – 530 Models, MCP, Tools, Gemini RAG, Image/Audio Gen

https://llmspy.org/docs/v3
47•mythz•2h ago•6 comments

The Holy Grail of Linux Binary Compatibility: Musl and Dlopen

https://github.com/quaadgras/graphics.gd/discussions/242
161•Splizard•9h ago•120 comments

The mountain that weighed the Earth

https://signoregalilei.com/2026/01/18/the-mountain-that-weighed-the-earth/
5•surprisetalk•33m ago•0 comments

OracleGPT: Thought Experiment on an AI Powered Executive

https://senteguard.com/blog/#post-7fYcaQrAcfsldmSb7zVM
27•djwide•2h ago•16 comments

The browser is the sandbox

https://simonwillison.net/2026/Jan/25/the-browser-is-the-sandbox/
268•enos_feedler•11h ago•149 comments

First, make me care

https://gwern.net/blog/2026/make-me-care
718•andsoitis•22h ago•216 comments

Text Is King

https://www.experimental-history.com/p/text-is-king
103•zdw•5d ago•42 comments

QMD - Quick Markdown Search

https://github.com/tobi/qmd
9•saikatsg•6d ago•3 comments

Blade Runner Costume Design (2020)

https://costumedesignarchive.blogspot.com/2020/12/blade-runner-1982.html
36•exvi•5d ago•6 comments

Runjak.codes: An adversarial coding test

https://runjak.codes/posts/2026-01-21-adversarial-coding-test/
14•todsacerdoti•4d ago•1 comments

Transfering Files with gRPC

https://kreya.app/blog/transfering-files-with-grpc/
40•CommonGuy•3h ago•13 comments

Porting 100k lines from TypeScript to Rust using Claude Code in a month

https://blog.vjeux.com/2026/analysis/porting-100k-lines-from-typescript-to-rust-using-claude-code...
94•ibobev•3h ago•58 comments

TSMC Risk

https://stratechery.com/2026/tsmc-risk/
75•swolpers•6h ago•45 comments

Scientists identify brain waves that define the limits of 'you'

https://www.sciencealert.com/scientists-identify-brain-waves-that-define-the-limits-of-you
265•mikhael•17h ago•75 comments

Vibe coding kills open source

https://arxiv.org/abs/2601.15494
230•kgwgk•4h ago•208 comments

AI will not replace software engineers (hopefully)

https://medium.com/@sig.segv/ai-will-not-replace-software-engineers-hopefully-84c4f8fc94c0
33•fwef64•2h ago•38 comments

Wind Chime Length Calculator (2022)

https://www.snyderfamily.com/chimecalcs/
37•hyperific•5d ago•14 comments

The future of software engineering is SRE

https://swizec.com/blog/the-future-of-software-engineering-is-sre/
217•Swizec•18h ago•109 comments

LED lighting undermines visual performance unless supplemented by wider spectra

https://www.nature.com/articles/s41598-026-35389-6
172•bookofjoe•19h ago•156 comments

Qwen3-Max-Thinking

https://qwen.ai/blog?id=qwen3-max-thinking
179•vinhnx•1h ago

Comments

throwaw12•1h ago
Aghhh, in my earlier comments I'd wished they would release a model that outperforms Opus 4.5 in agentic coding; it seems I should wait longer. But I am hopeful.
wyldfire•1h ago
By the time they release something that outperforms Opus 4.5, Opus 5.2 will have been released which will probably be the new state-of-the-art.

But these open weight models are tremendously valuable contributions regardless.

wqaatwt•1h ago
Qwen 3 Max wasn’t originally open, or did they release it?
OGEnthusiast•1h ago
Check out the GLM models, they are excellent
khimaros•43m ago
MiniMax M2.1 rivals GLM 4.7 and fits in 128GB with 100k context at 3-bit quantization.
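The memory claim above can be sanity-checked with simple arithmetic (the ~230B parameter count used here is an illustrative assumption, not a figure from the thread):

```python
def weight_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate weight memory in GB: 1e9 params * (bits/8) bytes each = GB."""
    return params_billions * bits_per_param / 8

# Assuming roughly 230B parameters at 3-bit quantization:
print(round(weight_gb(230, 3)))  # ~86 GB of weights, leaving headroom for KV cache in 128GB
```

This ignores the KV cache and runtime overhead, which is why the 100k-context figure still matters for whether the whole thing fits.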
lofaszvanitt•40m ago
Like these benchmarks mean anything.
frankc•29m ago
One of the ways the Chinese companies are keeping up is by training their models on the outputs of the American frontier models. I'm not saying they don't innovate in other ways, but this is part of how they caught up quickly. However, it pretty much means they will always lag.
aurareturn•12m ago
They are. There is no way to lead unless China has access to as much compute power.
Onavo•7m ago
Does the model collapse proof still hold water these days?
auspiv•19m ago
There have been a couple of "studies" comparing various frontier-tier AIs that conclude Chinese models are somewhere around 7-9 months behind US models. Another comment says Opus will be at 5.2 by the time Qwen matches Opus 4.5. That's accurate, and there is some data to show by how much.
siliconc0w•1h ago
I don't see a hugging face link, is Qwen no longer releasing their models?
tosh•1h ago
afaiu not all of their models are open weight releases, this one so far is not open weight (?)
sidchilling•1h ago
What would be a good coding model to run on an M3 Pro (18GB) to get a Codex-like workflow and quality? Essentially, I quickly run out of quota when using Codex-High in VSCode on the $20 ChatGPT plan, and I'm looking for cheaper / free alternatives (even if a little slower, but same quality). Any pointers?
medvezhenok•1h ago
Short answer: there is none. You can't get frontier-level performance from any open source model, much less one that would work on an M3 Pro.

If you had more like 200GB ram you might be able to run something like MiniMax M2.1 to get last-gen performance at something resembling usable speed - but it's still a far cry from codex on high.

mittermayr•1h ago
at the moment, I think the best you can do is qwen3-coder:30b -- it works, and it's nice to get some fully-local llm coding up and running, but you'll quickly realize that you've long tasted the sweet forbidden nectar that is hosted llms. unfortunately.
Mashimo•1h ago
A local model with 18GB of RAM that has the same quality as Codex high? Yeah, nah mate.

The best could be GLM 4.7 Flash, and I doubt it's close to what you want.

atwrk•59m ago
"run" as in run locally? There's not much you can do with that little RAM.

If remote models are ok you could have a look at MiniMax M2.1 (minimax.io) or GLM from z.ai or Qwen3 Coder. You should be able to use all of these with your local openai app.

duffyjp•57m ago
Nothing. This summer I set up a dual 16GB GPU / 64GB RAM system, and nothing I could run was even remotely close. Big models that didn't fit in 32GB of VRAM had marginally better results but were at least an order of magnitude slower than what you'd pay for, and still much worse in quality.

I gave one of the GPUs to my kid to play games on.

jgoodhcg•56m ago
Z.ai has glm-4.7. It's almost as good, for about $8/mo.
margorczynski•5m ago
Not sure if it's just me, but at least for my use cases (software dev, small-medium projects) Claude Opus + Claude Code beats OpenCode + GLM 4.7 by quite a margin. At least for me, Claude "gets it" eventually, while GLM will get stuck in a loop, not understanding what the problem is or what I expect.
dust42•1h ago
Max was always closed.
Mashimo•1h ago
I tried to search, could not find anything, do they offer subscriptions? Or only pay per tokens?
isusmelj•1h ago
I just wanted to check whether there is any information about the pricing. Is it the same as Qwen Max? Also, I noticed on the pricing page of Alibaba Cloud that the models are significantly cheaper within mainland China. Does anyone know why? https://www.alibabacloud.com/help/en/model-studio/models?spm...
epolanski•55m ago
I guess they want to partially subsidize local developers?

Maybe that's a requirement from whoever funds them, probably public money.

segmondy•30m ago
Seriously? Does Netflix or Spotify cost the same everywhere around the world? They earn less and their buying power is less.
epolanski•16m ago
Sure, and so do professional tools like Microsoft Teams, or compute, in different places of the world.
KlayLay•20m ago
It could be that energy is a lot cheaper in China, but it could be other reasons, too.
QianXuesen•18m ago
There’s a domestic AI price war in China, plus pricing in mainland China benefits from lower cost structures and very substantial government support e.g., local compute power vouchers and subsidies designed to make AI infrastructure cheaper for domestic businesses and widespread adoption. https://www.notebookcheck.net/China-expands-AI-subsidies-wit...
arendtio•1h ago
> By scaling up model parameters and leveraging substantial computational resources

So, how large is that new model?

DeathArrow•1h ago
Mandatory pelican on bicycle: https://www.svgviewer.dev/s/U6nJNr1Z
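For context, the benchmark asks a model to emit raw SVG markup for "a pelican riding a bicycle". A minimal hand-written toy example of that kind of output (ours, not an actual model response) looks like:

```python
# A toy example of the raw SVG markup the benchmark asks a model to produce.
# (Hand-written illustration, not actual model output.)
svg = """<svg xmlns="http://www.w3.org/2000/svg" width="200" height="120">
  <circle cx="55" cy="95" r="20" fill="none" stroke="black"/>
  <circle cx="145" cy="95" r="20" fill="none" stroke="black"/>
  <line x1="55" y1="95" x2="100" y2="55" stroke="black"/>
  <line x1="100" y1="55" x2="145" y2="95" stroke="black"/>
  <ellipse cx="100" cy="40" rx="20" ry="13" fill="white" stroke="black"/>
  <polygon points="120,36 145,42 120,46" fill="orange"/>
</svg>"""
print(svg.count("<circle"))  # two wheels
```

The difficulty isn't the markup itself but composing coordinates into a coherent scene, which is why results stay so uneven.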
kennykartman•1h ago
Haha, I was curious about that! I wonder if (when? if not already) some company is using some version of this in their training set. I'm still impressed by the fact that this benchmark has been out for so long and yet produces this kind of (ugly?) result.
saberience•50m ago
Because no one cares about optimizing for this because it's a stupid benchmark.

It doesn't mean anything. No frontier lab is trying hard to improve the way its model produces SVG format files.

I would also add, the frontier labs are spending all their post-training time on working on the shit that is actually making them money: i.e. writing code and improving tool calling.

The Pelican on a bicycle thing is funny, yes, but it doesn't really translate into more revenue for AI labs so there's a reason it's not radically improving over time.

simonw•45m ago
+1 to "it's a stupid benchmark".
lofaszvanitt•37m ago
It shows that these are nowhere near anything resembling human intelligence. You wouldn't have to optimize for anything if it were a general intelligence of sorts.
CamperBob2•34m ago
Here's a pencil and paper. Let's see your SVG pelican.
zebomon•18m ago
This exactly. I don't understand the argument that seems to be, if it were real intelligence, it would never have to learn anything. It's machine learning, not machine magic.
CamperBob2•2m ago
One aspect worth considering: given a human who knows HTML and graphics coding but has never heard of SVG, they could be expected to perform such a task (eventually) if given a chance to learn SVG from the spec.

Current-gen LLMs might be able to do that with in-context learning, but if limited to pretraining alone, or even pretraining followed by post-training, would one book be enough to lend genuine SVG skill to the model weights themselves?

My understanding is that the answer would be no, a single copy of the SVG spec would not be anywhere near enough to make the resulting base model any good at SVG authorship. Quite a few other examples and references would be needed in post-training.

So one measure of AGI -- necessary but not sufficient on its own -- might be the ability to gain knowledge and skills with no more exposure to training material than a human student would be given. I don't believe we are there yet, but of course that could change by 5 PM today, the way things are going.

vladms•9m ago
So you think if we gave a pencil and paper to the model, it would do better?

I don't think SVG is the problem. It just shows that models are fragile (nothing new): even if they can (probably) make a good PNG of a pelican on a bike, and they can (probably) make some good SVG, they do not "transfer" things because they do not "understand" them.

I do expect models to fail randomly at tasks that are not "average and common", so for me personally the benchmark is not very useful (and that does not mean they can't work, just that I would not bet on it). If there are people who think "if an LLM outputted an SVG for my request, it means it can output an SVG for every image", there might be some value.

obidee2•32m ago
Why stupid? Vector images are widely used and extremely useful, both directly and to render raster images at different scales. It's also highly connected with spatial and geometric reasoning and precision, which would open up a whole new class of problems these models could tackle. Sure, it's secondary to raster image analysis and generation, but curious why it would be stupid to pursue?
storystarling•11m ago
I suspect there is actually quite a bit of money on the table here. For those of us running print-on-demand workflows, the current raster-to-vector pipeline is incredibly brittle and expensive to maintain. Reliable native SVG generation would solve a massive architectural headache for physical product creation.
NitpickLawyer•46m ago
It would be trivial to detect such gaming, tho. That's the beauty of the test, and that's why they're probably not doing it. If a model draws "perfect" (whatever that means) pelicans on a bike, you start testing for owls riding a lawnmower, or crows riding a unicycle, or x _verb_ on y ...
Sharlin•38m ago
It could still be special-case RLHF trained, just not up to perfection.
derefr•13m ago
It’d be difficult to use in any automated process, as the judgement for how good one of these renditions is, is very qualitative.

You could try to rasterize the SVG and then use an image2text model to describe it, but I suspect it would just “see through” any flaws in the depiction and describe it as “a pelican on a bicycle” anyway.

lofaszvanitt•37m ago
A salivating pelican :D.
airstrike•1h ago
2026 will be the year of open and/or small models.
acessoproibido•24m ago
What makes you say that? This is neither open nor small
lysace•49m ago
I tried it at https://chat.qwen.ai/.

Prompt: "What happened on Tiananmen square in 1989?"

Reply: "Oops! There was an issue connecting to Qwen3-Max. Content Security Warning: The input text data may contain inappropriate content."

asciii•42m ago
This is what I find hilarious when these articles assess "factual" knowledge.

We are in the realm of the semantic / symbolic, where even the release article needs some meta discussion.

It's quite the litmus test of LLMs. LLMs just carry humanity's flaws.

lysace•39m ago
(Edited, sorry.)

Yes, of course LLMs are shaped by their creators. Qwen is made by Alibaba Group. They are essentially one with the CCP.

lifetimerubyist•36m ago
What happens when you run one of their open-weight models of the same family locally?
lysace•33m ago
Last time I tried something like that with an offline Qwen model I received a non-answer, no matter how hard I prompted it.
tekno45•35m ago
ask who was responsible for the insurrection on january 6th
lysace•32m ago
You do it, my IP is now flagged (tried incognito and clearing cookies) - they want to have my phone number to let me continue using it after that one prompt.
tekno45•10m ago
thats even funnier. thanks for the update.
Erlangen•21m ago
It even censors content related to the GDR. I asked a question about the travel restrictions mentioned in Jenny Erpenbeck's novel Kairos, and it displayed a content security warning as well.
overfeed•14m ago
Go ahead and ask ChatGPT who Jonathan Turley is, you'll get a similar error "Unable to process response".

It turns out "AI company avoids legal jeopardy" is universal behavior.

lysace•13m ago
This one seems to be related to an individual who was incorrectly smeared by chatgpt. (Edited.)

> The AI chatbot fabricated a sexual harassment scandal involving a law professor--and cited a fake Washington Post article as evidence.

https://www.washingtonpost.com/technology/2023/04/05/chatgpt...

That is way different. Nice curveball though.

overfeed•4m ago
As I said earlier - both present legal jeopardy, and both result in unexplained errors to the users.
xcodevn•30m ago
I'm not familiar with these open-source models. My bias is that they're heavily benchmaxxing and not really helpful in practice. Can someone with a lot of experience using these, as well as Claude Opus 4.5 or Codex 5.2 models, confirm whether they're actually on the same level? Or are they not that useful in practice?

P.S. I realize Qwen3-Max-Thinking isn't actually an open-weight model (only accessible via API), but I'm still curious how it compares.

miroljub•28m ago
I don't know where your impression about benchmaxxing comes from. Why would you assume closed models are not benchmaxxing? Being closed and commercial, they have more incentive to fake it than the open models.
orangebread•25m ago
I haven't used qwen3 max yet, but my gut feeling is that they are benchmaxxing. If I were to rate the open models worth using by rank it'd be:

- Minimax

- GLM

- Deepseek

segmondy•23m ago
Your ranking is way off, Deepseek crushes Minimax and GLM. It's not even a competition.
segmondy•24m ago
You are not familiar, yet you claim a bias. Bias based on what? I've used pretty much just open-source models for the last 2 years. I occasionally give OpenAI and Anthropic a try to see how good they are, but I stopped supporting them when they started calling for regulation of open models. I haven't seen folks get ahead of me with closed models. I'm keeping up just fine with these free open models.
diblasio•22m ago
Censored.

There is a famous photograph of a man standing in front of tanks. Why did this image become internationally significant?

{'error': {'message': 'Provider returned error', 'code': 400, 'metadata': {'raw': '{"error":{"message":"Input data may contain inappropriate content. For details, see: https://www.alibabacloud.com/help/en/model-studio/error-code..."} ...

torginus•18m ago
Man, the Chinese government must be a bunch of saints that you must go back 35 years to dig up something heinous that they did.
yoz-y•17m ago
To my knowledge this model is not 35 years old.
spankalee•17m ago
Are you actually defending the censorship of Tiananmen Square?
j_maffe•13m ago
Perhaps they're pointing out the level of double standards in condemnation China gets compared to the US, lack of censorship notwithstanding.
rwmj•11m ago
Are you saying we cannot talk about the bad things the US has done?
j_maffe•11m ago
No, I'm saying we can, unlike how it is in China. Besides that point, I think GP is arguing that China is villainized more than the US.
spankalee•6m ago
Are you actually claiming the US is not criticized here?
itsyonas•8m ago
This suggests that the Chinese government recognises that its legitimacy is conditional and potentially unstable. Consequently, the state treats uncontrolled public discourse as a direct threat. By contrast, countries such as the United States can tolerate the public exposure of war crimes, illegal actions or state violence, since such revelations rarely result in any significant consequences. While public outrage may influence narratives or elections to some extent, it does not fundamentally endanger the continuity of power.

I am not sure if one approach is necessarily worse than the other.

denysvitali•17m ago
Why is this surprising? Isn't it mandatory for Chinese companies to adhere to the censorship?

Aside from the political aspect of it, which makes it probably a bad knowledge model, how would this affect coding tasks for example?

One could argue that Anthropic has similar "censorships" in place (alignment) that prevent their model from doing illegal stuff - where illegal is defined as something not legal (likely?) in the USA.

woodrowbarlow•11m ago
here's an example of how model censorship affects coding tasks: https://github.com/orgs/community/discussions/72603
denysvitali•8m ago
Oh, lol. This though seems to be something that would affect only US models... ironically
radial_symmetry•16m ago
I, for one, have found this censorship helpful.

I've been testing adding support for outside models on Claude Code to Nimbalyst, the easiest way for me to confirm that it is working is to go against a Chinese model and ask if Taiwan is an independent country.

torginus•16m ago
It just occurred to me that it underperforms Opus 4.5 on benchmarks when search is not enabled, but outperforms it when it is. Is it possible that the Chinese internet has better-quality content available?

My problem with deep research tends to be that it searches the internet, and most of what it turns up is the half-baked garbage that gets repeated on every topic.

sciencesama•15m ago
What RAM and minimum system requirements do you need to run this on a personal system?
jen729w•13m ago
If you have to ask, you don't have it.
saberience•11m ago
My Prompt: Tell me about the great leap forward

Response: The Great Leap Forward was a major campaign launched by the Chinese Communist Party (CCP) from 1958 to 1962, led by Chairman Mao Zedong, with the goal of rapidly transforming China from an agrarian society into a modern industrial power en.wikipedia.org . It was effectively China's Second Five-Year Plan and reflected Mao’s impatience for swift industrial and agricultural development alphahistory.com . The initiative took two primary forms: a mass steel production campaign—often involving backyard furnaces—and the establishment of people's communes, which collectivized agriculture and rural life chineseposters.net . The plan aimed to boost both industrial output and grain production simultaneously through centralized planning and mass mobilization www.britannica.com . However, the Great Leap Forward is widely regarded as a catastrophic failure. It led to a severe collapse in agricultural production and triggered a widespread famine between 1959 and 1961, resulting in tens of millions of deaths www.jstor.org . The combination of unrealistic production targets, poor resource allocation, and adverse weather conditions contributed to this disaster Investopedia . Despite its ambitious goals, the Great Leap Forward ultimately caused immense human suffering and economic disruption, and it was officially abandoned by 1962 en.wikipedia.org .

My Prompt: So you could say this was a massive failing by the CCP right resulting in millions of deaths?

Response:

Oops! There was an issue connecting to Qwen3-Max. Content Security Warning: The input text data may contain inappropriate content.

pier25•8m ago
Tried it and it's super slow compared to other LLMs.

I imagine the Alibaba infra is being hammered hard.