Qwen3-4B-Thinking-2507

https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507
92•IdealeZahlen•2h ago

Comments

gok•1h ago
So this 4B dense model gets very similar performance to the 30B MoE variant with a 7.5x smaller footprint.
smallerize•9m ago
It gets similar performance to the old version of the 30B MoE model, but not the updated version. https://huggingface.co/Qwen/Qwen3-30B-A3B-Thinking-2507
esafak•1h ago
This one should work on personal computers! I'm thankful for Chinese companies raising the floor.
johndhi•17m ago
Are some of the comments in this thread (and the others about Qwen) potentially generated by shills?

I get the feeling I'm being slightly propagandized in this comment.

redman25•12m ago
I’m American. Just giving some background to the feeling. There’s some discontent in some Western communities (localllama) that Chinese developers have been open-weighting all of their models while most Western models have been closed-weight.
thatwasunusual•10m ago
Let's say any country creates the most powerful - and thus best - LLMs. Over time, they infuse them with their political will. Over 20-30 years, I'd imagine people asking those LLMs will have their minds shifted.

But. That's just me, my pessimism-sci-fi scenario.

whimsicalism•3m ago
You can obviously see they're not from clicking on their name. I wouldn’t assume every positive comment about China is a ‘shill’ - there are many people unhappy with our current neo-cold war.

I’d also reread the HN guidelines

frontsideair•1h ago
According to the benchmarks, this one is improved in every one of them compared to the previous version, some better than 30B-A3B. Definitely worth a try, it’ll easily fit into memory and token generation speed will be pleasantly fast.
GaggiX•33m ago
There is a new Qwen3-30B-A3B; you are comparing it to the old one. https://huggingface.co/Qwen/Qwen3-30B-A3B-Thinking-2507
tolerance•1h ago
Is there like a leaderboard or power rankings sort of thing that tracks these small open models and assigns ratings or grades to them based on particular use cases?
esafak•1h ago
https://artificialanalysis.ai/leaderboards/models?open_weigh...
cowpig•54m ago
Compare these rankings to actual usage: https://openrouter.ai/rankings

Claude is not cheap, so why is it far and away the most popular if it's not top 10 in performance?

Qwen3 235b ranks highest on these benchmarks among open models, but I have never met someone who prefers its output over Deepseek R1. It's extremely wordy and often gets caught in thought loops.

My interpretation is that the models at the top of ArtificialAnalysis are focusing the most on public benchmarks in their training. Note I am not saying XAI is necessarily nefariously doing this, could just be that they decided it's better bang for the buck to rely on public benchmarks than to try to focus on building their own evaluation systems.

But Grok is not very good compared to the anthropic, openai, or google models despite ranking so highly in benchmarks.

GaggiX•43m ago
Claude Opus is in the top 10. Also, people on OpenRouter mostly use these models for coding, and Claude models are particularly good at that. The benchmark doesn't account only for coding capabilities, though.
byefruit•39m ago
The openrouter rankings can be biased.

For example, Google's inexplicable design decisions around libraries and APIs mean it's often worth the 5% premium to just use OpenRouter to access their models. In other cases it's about which models particular agents default to.

Sonnet 4 is extremely good for tool-usage agentic setups though - something I have found other models struggle to do over a long context.

ImageXav•17m ago
Thanks for sharing that. Interesting that the leaderboard is dominated by Anthropic, Google and DeepSeek. OpenAI doesn't even register.
jampa•22m ago
Am I reading this right? Is this model way better than Gemma 3n[1]? (Only for the benchmarks that are common to both models.)

===== LiveCodeBench

E4B IT: 13.2

Qwen: 55.2

===== AIME25

E4B IT: 11.6

Qwen: 81.3

[1]: https://huggingface.co/google/gemma-3n-E4B

film42•14m ago
Is there a crowd-sourced sentiment score for models? I know all these scores are juiced like crazy. I stopped taking them at face value months ago. What I want to know is if other folks out there actually use them or if they are unreliable.
nurettin•10m ago
This has been around for a while https://lmarena.ai/leaderboard/text/coding
klohto•9m ago
openrouter usage stats
esafak•5m ago
https://openrouter.ai/rankings

The new qwen3 model is not out yet.
