frontpage.

We Mourn Our Craft

https://nolanlawson.com/2026/02/07/we-mourn-our-craft/
64•ColinWright•58m ago•31 comments

Speed up responses with fast mode

https://code.claude.com/docs/en/fast-mode
18•surprisetalk•1h ago•15 comments

Hoot: Scheme on WebAssembly

https://www.spritely.institute/hoot/
120•AlexeyBrin•7h ago•24 comments

U.S. Jobs Disappear at Fastest January Pace Since Great Recession

https://www.forbes.com/sites/mikestunson/2026/02/05/us-jobs-disappear-at-fastest-january-pace-sin...
96•alephnerd•1h ago•45 comments

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

https://openciv3.org/
823•klaussilveira•21h ago•248 comments

Stories from 25 Years of Software Development

https://susam.net/twenty-five-years-of-computing.html
55•vinhnx•4h ago•7 comments

Al Lowe on model trains, funny deaths and working with Disney

https://spillhistorie.no/2026/02/06/interview-with-sierra-veteran-al-lowe/
53•thelok•3h ago•6 comments

The AI boom is causing shortages everywhere else

https://www.washingtonpost.com/technology/2026/02/07/ai-spending-economy-shortages/
103•1vuio0pswjnm7•8h ago•118 comments

The Waymo World Model

https://waymo.com/blog/2026/02/the-waymo-world-model-a-new-frontier-for-autonomous-driving-simula...
1057•xnx•1d ago•608 comments

Reinforcement Learning from Human Feedback

https://rlhfbook.com/
75•onurkanbkrc•6h ago•5 comments

Start all of your commands with a comma (2009)

https://rhodesmill.org/brandon/2009/commands-with-comma/
478•theblazehen•2d ago•175 comments

Vocal Guide – belt sing without killing yourself

https://jesperordrup.github.io/vocal-guide/
202•jesperordrup•11h ago•69 comments

France's homegrown open source online office suite

https://github.com/suitenumerique
546•nar001•5h ago•252 comments

Coding agents have replaced every framework I used

https://blog.alaindichiappari.dev/p/software-engineering-is-back
213•alainrk•6h ago•332 comments

Selection Rather Than Prediction

https://voratiq.com/blog/selection-rather-than-prediction/
8•languid-photic•3d ago•1 comment

A Fresh Look at IBM 3270 Information Display System

https://www.rs-online.com/designspark/a-fresh-look-at-ibm-3270-information-display-system
34•rbanffy•4d ago•7 comments

72M Points of Interest

https://tech.marksblogg.com/overture-places-pois.html
27•marklit•5d ago•2 comments

Unseen Footage of Atari Battlezone Arcade Cabinet Production

https://arcadeblogger.com/2026/02/02/unseen-footage-of-atari-battlezone-cabinet-production/
113•videotopia•4d ago•30 comments

Where did all the starships go?

https://www.datawrapper.de/blog/science-fiction-decline
73•speckx•4d ago•74 comments

Software factories and the agentic moment

https://factory.strongdm.ai/
68•mellosouls•4h ago•73 comments

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
273•isitcontent•21h ago•37 comments

Learning from context is harder than we thought

https://hy.tencent.com/research/100025?langVersion=en
199•limoce•4d ago•111 comments

Monty: A minimal, secure Python interpreter written in Rust for use by AI

https://github.com/pydantic/monty
285•dmpetrov•22h ago•153 comments

Show HN: Kappal – CLI to Run Docker Compose YML on Kubernetes for Local Dev

https://github.com/sandys/kappal
21•sandGorgon•2d ago•11 comments

Making geo joins faster with H3 indexes

https://floedb.ai/blog/how-we-made-geo-joins-400-faster-with-h3-indexes
155•matheusalmeida•2d ago•48 comments

Ga68, a GNU Algol 68 Compiler

https://fosdem.org/2026/schedule/event/PEXRTN-ga68-intro/
43•matt_d•4d ago•18 comments

Hackers (1995) Animated Experience

https://hackers-1995.vercel.app/
555•todsacerdoti•1d ago•268 comments

Sheldon Brown's Bicycle Technical Info

https://www.sheldonbrown.com/
424•ostacke•1d ago•110 comments

An Update on Heroku

https://www.heroku.com/blog/an-update-on-heroku/
472•lstoll•1d ago•312 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
348•eljojo•1d ago•215 comments

Qwen3-235B-A22B-Thinking-2507

https://huggingface.co/Qwen/Qwen3-235B-A22B-Thinking-2507
155•tosh•6mo ago

Comments

danielhanchen•6mo ago
I'm making dynamic GGUFs for local inference at https://huggingface.co/unsloth/Qwen3-235B-A22B-Thinking-2507... There's also a guide to running them: https://docs.unsloth.ai/basics/qwen3-how-to-run-and-fine-tun...
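A minimal sketch of pulling one of these dynamic GGUFs and running it locally with llama-cpp-python; the repo id, quant name, and shard layout below are assumptions to verify against the model card and the Unsloth guide linked above.

```python
# Sketch: fetch a dynamic GGUF quant and run it with llama-cpp-python.
# Repo id and quant pattern are assumptions; check the model card.
from pathlib import Path

from huggingface_hub import snapshot_download  # pip install huggingface_hub
from llama_cpp import Llama                    # pip install llama-cpp-python

local_dir = snapshot_download(
    repo_id="unsloth/Qwen3-235B-A22B-Thinking-2507-GGUF",  # assumed repo id
    allow_patterns=["*UD-Q2_K_XL*"],                        # assumed quant naming
)

# llama.cpp loads the first shard and picks up the rest automatically when the
# split files follow the -00001-of-0000N naming convention.
first_shard = sorted(Path(local_dir).rglob("*-00001-of-*.gguf"))[0]

llm = Llama(model_path=str(first_shard), n_gpu_layers=-1, n_ctx=8192)
out = llm("Explain mixture-of-experts routing in two sentences.", max_tokens=256)
print(out["choices"][0]["text"])
```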
arcanemachiner•6mo ago
Would I notice a difference between the Q2_K and Q2_K_XL variants?
danielhanchen•6mo ago
Oh I would always use Q2_K_XL :) It uses our dynamic methodology to quantize different layers at different bit widths, i.e. 2, 3, 4, 5, 6, or 8 bits - the more important the layer is, the more bits it gets.
Squeeze2664•6mo ago
How do you determine the importance of a layer in this case?
kkzz99•6mo ago
Afaik they have a test bench that they use and take the activation data from that.
danielhanchen•6mo ago
Yes we have around 1 to 3 million tokens of high quality self verified data that we use to calibrate models!
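A toy illustration of the general idea (not Unsloth's actual code): score each layer's sensitivity on calibration data, then hand the most sensitive layers the most bits.

```python
# Toy sketch of importance-based mixed-bit assignment; layer names and the
# bucketing rule are illustrative, not Unsloth's methodology.
from typing import Dict

def assign_bits(sensitivity: Dict[str, float]) -> Dict[str, int]:
    """Map per-layer sensitivity scores to bit widths: higher score, more bits."""
    buckets = [8, 6, 4, 3, 2]  # most sensitive layers land in the first bucket
    ranked = sorted(sensitivity, key=sensitivity.get, reverse=True)
    return {
        layer: buckets[min(i * len(buckets) // len(ranked), len(buckets) - 1)]
        for i, layer in enumerate(ranked)
    }

# Sensitivity could be, say, the error a layer introduces when quantized alone,
# measured on a calibration set like the 1-3M tokens mentioned above.
scores = {"blk.0.attn_q": 0.9, "blk.0.ffn_up": 0.2, "blk.1.attn_q": 0.7, "blk.1.ffn_up": 0.1}
print(assign_bits(scores))  # {'blk.0.attn_q': 8, 'blk.1.attn_q': 6, 'blk.0.ffn_up': 4, 'blk.1.ffn_up': 3}
```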
smallerize•6mo ago
https://unsloth.ai/blog/dynamic-v2
danielhanchen•6mo ago
Yes also https://unsloth.ai/blog/deepseekr1-dynamic, https://unsloth.ai/blog/dynamic-4bit, https://docs.unsloth.ai/basics/unsloth-dynamic-2.0-ggufs
DrPhish•6mo ago
I generally download the safetensors and make my own GGUFs, usually at Q8_0. Is there any measurable benefit to your dynamic quants at that quant level? I looked at your dynamic quant 2.0 page, but all the charts and graphs appear to cut off at Q4.
danielhanchen•6mo ago
Oh I also upload Q8_K_XL, for example, which upcasts the important layers to BF16 / F16 as well!

The blog at https://docs.unsloth.ai/basics/unsloth-dynamic-2.0-ggufs does talk about 1, 2, 3, 4, 5, 6 and 8-bit dynamic GGUFs as well!

There definitely is a benefit to dynamically selecting different bit widths per layer - I wrote about the difference between naive and selective quantization here: https://unsloth.ai/blog/deepseekr1-dynamic

DrPhish•6mo ago
Thanks Daniel. I know you upload them, but I was hoping for some solid numbers on your dynamic q8 vs a naive quant. There doesn't seem to be anything on either of those links to show improvement at those quant levels.

My gut feeling is that there's not enough benefit to outweigh the risk of putting a middleman in the chain of custody from the original model to my nvme.

However, I can't know for sure without more testing than I have the time or inclination for, which is why I was hoping there had been some analysis you could point me to.

mycpuorg•6mo ago
@danielhanchen, can we use these steps to fine-tune other Qwen3 models too, like the 480B coder or the embeddings model?
danielhanchen•6mo ago
Oh for finetuning - we do have some code for MoE finetuning for Qwen at https://github.com/unslothai/unsloth, but we haven't announced it yet!
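For the smaller dense Qwen3 checkpoints, a LoRA fine-tune with Unsloth looks roughly like the sketch below; the model name and hyperparameters are illustrative assumptions, and the MoE path for the 235B/480B models is the unreleased code mentioned above.

```python
# Rough sketch of an Unsloth LoRA fine-tune on a smaller dense Qwen3 model.
# Model name and hyperparameters are assumptions; see the Unsloth docs.
from unsloth import FastLanguageModel  # pip install unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-8B",  # assumed smaller sibling for illustration
    max_seq_length=4096,
    load_in_4bit=True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
# From here, train with trl's SFTTrainer as in the Unsloth notebooks.
```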
aliljet•6mo ago
I see the term 'local inference' everywhere. It's an absurd misnomer without hardware and cost defined. I can also run a coal fired power plant in my backyard, but in practice, there's no reasonable way to make that economical beyond being a toy.

(And I should add, you are a hero for doing this work, only love in my comment, but still a demand for detail$!)

regularfry•6mo ago
Hardware and cost are assumed to be approximately desktop-class. If you've got a gaming rig with an RTX 4090 and 128MB RAM, you can run this if you pick the right quant.
cmpxchg8b•6mo ago
128MB? Quantization has come a long way!
danielhanchen•6mo ago
I think they mis-spoke 128GB* :)
regularfry•6mo ago
Wishful thinking there on my part.
danielhanchen•6mo ago
Though technically < 1GB of RAM is enough - you can offload it to the SSD, albeit with very slow speeds!
danielhanchen•6mo ago
The trick of llama.cpp and our dynamic quants is that you can actually offload the model to RAM or even an SSD! If you have GPU VRAM + RAM + SSD > the model size (say 90GB for the dynamic 2-bit quant), then it'll run well!

I.e. you can actually run it on a local desktop or even your laptop now! You don't need a 90GB GPU, for example - say a 24GB GPU + 64GB to 128GB of RAM.

The speeds are around 3 to 5 tokens / second, so still OK! I write more about improving speed on local devices here: https://docs.unsloth.ai/basics/qwen3-how-to-run-and-fine-tun...
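A minimal sketch of that VRAM + RAM + SSD split with llama-cpp-python; the shard filename and layer count are assumptions, and n_gpu_layers should be tuned to whatever fits your VRAM.

```python
# Sketch of partial GPU offload: some layers in VRAM, the rest memory-mapped
# so the OS can page cold weights from RAM or the SSD as needed.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3-235B-A22B-Thinking-2507-UD-Q2_K_XL-00001-of-00002.gguf",  # assumed filename
    n_gpu_layers=30,   # roughly what fits on a 24GB card; the rest stays on CPU
    n_ctx=16384,
    use_mmap=True,     # weights are memory-mapped rather than fully loaded into RAM
)

for chunk in llm("Write a haiku about offloading.", max_tokens=64, stream=True):
    print(chunk["choices"][0]["text"], end="", flush=True)
```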

lostmsu•6mo ago
How usable are 1 and less than 1 bit quants?
danielhanchen•6mo ago
Oh, a reminder that 1-bit isn't really 1-bit, but a mixture of 1 to 8 bits! I would still use the 2-bit dynamic quant - 1-bit dynamic can sometimes go into weird repetitive loops, but it still produces reasonable output.

Larger models do better at 1-bit - e.g. the 480B Coder at 1-bit actually does very well!

tosh•6mo ago
If the evals hold up, this is a mind-blowing new weight-to-capability ratio.

edit: afaiu DeepSeek R1 was 671B with 37B active params

energy123•6mo ago
Am I the only one who ignores evals, unless they're holdout datasets like a new IMO competition, or at a minimum evals with a semi-private test set like ARC-AGI 2? How can we trust that these companies don't put copies of these benchmarks in their training data? They can get whatever score they want, up to 100%, easily, by just training on that data sufficiently many times.
christianqchung•6mo ago
There is something of a middle ground here for benchmark skepticism. Big companies wouldn't want a massive divergence between benchmarks and real performance that people could actually notice, and I'd argue for the most part that this hasn't happened too much (although above I posted a problem with Qwen and ARC). However, finetunes by random people/groups don't carry the same downside so I'm basically skeptical of all finetunes before using them for a particular case.
energy123•6mo ago
I don't believe these companies see their customers as being able to tell the difference between a real GPQA score and a GPQA score that's fudged upwards by 10%. Look at Elon Musk presenting himself to the world as a Path of Exile expert when in reality he likely hired an expert to level up his account while he himself is an amateur. They think we are idiots and will lie to us to capture market share and lock us into their ecosystem.
christianqchung•6mo ago
That's true, I certainly wouldn't be able to tell. I was thinking on the order of a 20% score vs 70%, but I realize that's not a very compelling range for my point when people are boasting about <5% shifts.
nonhaver•6mo ago
Impressive evals. I wonder how much of that can be attributed to the enhanced context understanding. I feel like that and context length are the bottleneck for the majority of commercial models.
Eisenstein•6mo ago
I don't know, I think that extending context windows is actually detrimental because people assume they can just dump things in there until it fills up. You still have to deal with the limited attention that the models have, and only filling the context with things relevant to the particular thing you are trying to solve is going to be the most effective approach. If you have too much information for it to fit into a 128K window, I think you just have too much information. The entirety of Don Quixote at over 1000 pages is less than 64,000 tokens.
CamperBob2•6mo ago
That sounds low by about 10x, assuming Don Quixote has 430k words (per Google).

Still, yes, I don't know of a single model that doesn't go off the rails if you actually try to take advantage of its context length specification.

Eisenstein•6mo ago
Well, I loaded up Llama 3 and downloaded the novel: for the English translation we get 545,997 tokens, and in the original Spanish 653,981 tokens. So my estimate was indeed off by an order of magnitude. Thanks for the correction.
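For anyone who wants to reproduce the count, a quick sketch (the tokenizer repo is gated on Hugging Face, and the text file is a hypothetical local copy):

```python
# Count tokens in a local copy of the novel with the Llama 3 tokenizer.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")  # gated repo
with open("don_quixote_en.txt", encoding="utf-8") as f:  # hypothetical local file
    text = f.read()

print(len(tok.encode(text)))  # on the order of 5.5e5 tokens for the English translation
```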
pama•6mo ago
Does anyone here have tips on the code and hardware setup to get the best per-GPU throughput on H200 or B200 hardware for large reasoning traces and inputs of around 10k–40k tokens? Is there an equivalent effort to sglang's optimization of V3/R1 throughput for this class of models?
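One hedged starting point is an offline-batch run with vLLM on an 8-GPU node, as sketched below; whether this exact checkpoint is supported, and the right parallelism and quantization settings for H200/B200, need to be checked against the vLLM (or sglang) docs.

```python
# Sketch of an offline-throughput setup with vLLM; model support, parallelism,
# and memory settings are assumptions to verify for your hardware.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-235B-A22B-Thinking-2507",
    tensor_parallel_size=8,        # one 8-GPU H200/B200 node
    max_model_len=65536,           # room for 10k-40k token inputs plus the reasoning trace
    gpu_memory_utilization=0.92,
)

params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=16384)
outputs = llm.generate(["<long ~30k-token prompt here>"], params)
print(outputs[0].outputs[0].text)
```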
sophia01•6mo ago
If this is actually competitive with Gemini 2.5 Pro, that would be insane, especially for a truly open-weights Apache 2.0 model. Let's hope it's not too hacked to shine on benchmarks!
lvl155•6mo ago
Qwen3 models are solid and at such a low cost, it doesn’t hurt to pair it with something like Sonnet 4 as a check. I mean it does eliminate a lot of Claude’s “You’re absolutely right!” loops.
apwell23•6mo ago
> I mean it does eliminate a lot of Claude’s “You’re absolutely right!” loops.

not as scary as "Let me try a completely different approach" . Now you have to throw out all the AI slop and start from scratch.

cma•6mo ago
If you aren't using source control
christianqchung•6mo ago
For what it's worth, the Qwen team misreported an ARC-AGI benchmark score for the non-thinking model by a factor of 4, which has not been explained yet. They claimed a score of 41.8% on ARC-AGI 1 [0], which is much higher than what non-chain-of-thought models have been able to achieve (GPT-4.5 got 10%). The ARC team later benchmarked it at 11% [1], which is still a high score, but not the same as 41.8%. It's still probably a significant update on the model, though.

[0] https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507

[1] https://x.com/arcprize/status/1948453132184494471

ducviet00•6mo ago
Maybe 41.8% is the score of Qwen3-235B-A22B-Thinking-2507, lol. 11% for the non-thinking model is pretty high
jug•6mo ago
Makes sense, it's in line with Gemini 2.5 Pro in that case. It aligns with their other results in the post.
christianqchung•6mo ago
They made it very clear that they were reporting that score for the non-thinking model[0]. I still don't have any guesses as to what happened here, maybe something format related. I can't see a motivation to blatantly lie on a benchmark which would very obviously be publicly corrected.

[0] https://x.com/JustinLin610/status/1947836526853034403

mattnewton•6mo ago
Could it be the public eval set vs the private eval set the ARC team has? The public eval set is slightly easier and may have had some unintentional data leakage since it was released before their training data cutoff.
coolspot•6mo ago
They have provided a repro for the 41.8% result here: https://github.com/QwenLM/Qwen3/tree/main/eval
Alifatisk•6mo ago
Alibaba has been on fire lately, do they even sleep?
esafak•6mo ago
A rhetorical question, I suppose: https://en.wikipedia.org/wiki/996_working_hour_system
Der_Einzige•6mo ago
Folks who know how to train SOTA LLMs dictate their own working conditions. No one is doing 996 there unless they want to.
donedanadone•6mo ago
Evals aside, why are American labs not able to release open-source models at the same pace?
ttul•6mo ago
The Chinese labs can’t compete on inference scale because they have been prevented from accessing the most efficient chips. But since training is a mere fraction of inference these days, they can at least hurt the American companies that are generating billions via inference services.

If you can’t beat ‘em, at least pour some sand into their moat, giving China some time to perfect its own nanometer-scale fabrication. It’s a society-wide effort.

Eisenstein•6mo ago
They don't release such huge open-weights models because people who run open weights don't have the capability to run them effectively. Instead they concentrate on models like Gemma 3, which ranges from 1B to 27B and, when quantized, fits neatly into the VRAM you can get on a consumer GPU.
regularfry•6mo ago
That shouldn't be the case here. Yes, it's memory-bandwidth-limited, but this is an MoE with 22B active params. As long as the whole thing fits in RAM, it should be tolerable. It's right at the limit, though.
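Back-of-the-envelope numbers for why it's "right at the limit" (the average bits-per-weight figures are approximations, and KV cache and overhead are ignored, so treat these as rough lower bounds):

```python
# Approximate weight-only memory for a 235B-parameter model at a few average
# bit widths; ignores KV cache and metadata, so treat these as lower bounds.
PARAMS = 235e9

for name, bpw in [("~2.7 bpw (dynamic 2-bit)", 2.7),
                  ("~4.8 bpw (4-bit class)", 4.8),
                  ("~8.5 bpw (Q8-class)", 8.5)]:
    gib = PARAMS * bpw / 8 / 2**30
    print(f"{name:25s} ~{gib:4.0f} GiB")

# ~74, ~131, and ~233 GiB respectively: only the low-bit dynamic quants fit a
# 24GB GPU + 64-128GB RAM desktop without heavy SSD paging.
```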
lossolo•6mo ago
> They don't release such huge open weights models because people who run open weights don't have the capability to run them effectively

This is a naive take. There are multiple firms that can host these models for you, or you can host them yourself by renting GPUs. Thousands of firms could also host open-source models independently. They don’t release them because they fear competition and losing their competitive advantage. If it weren’t for Chinese companies open-sourcing their models, we’d be limited to using closed-source, proprietary models from the U.S., especially considering the recent LLaMA fiasco.

Eisenstein•6mo ago
Given the assumption that Google has Google's own interests at heart, the question isn't 'why doesn't Google release models that allow other companies to compete with them' but 'what is the reasoning behind the models they release' and that reasoning is 'for research and for people to use personally on their own hardware'.

We should be asking why Meta released the large Llama models and why the Chinese are releasing large models. I can't figure out a reason for it except prestige.

bugglebeetle•6mo ago
They could, they’re just greedy, self-serving, and short-sighted. China’s doing the AI equivalent of Belt and Road to reap tremendous strategic advantages, as well as encourage large-scale domestic innovation.
adamredwoods•6mo ago
Interesting: Qwen won't answer questions about specific historical events (Tiananmen Square).
OldfieldFund•6mo ago
It's made by Alibaba :)
yunohn•6mo ago
Is it really that interesting to point out for every Chinese oss model release?
mceachen•6mo ago
Is it not relevant to reiterate the bias (or lobotomization) for people new to this space?
lurking_swe•6mo ago
No, it’s not really relevant. Should I point out that all the models from providers in the west are very “left-leaning” every time one is released? Is that helpful to the technical discussion, in any way?

If you are using an LLM for historical knowledge, questions, or research, then the chinese censorship is relevant. Or for questions about geopolitics.

mattnewton•6mo ago
If you had a specific example where the LLMs showed “left leaning” bias, then yes, it would be interesting to me.
lurking_swe•6mo ago
Like i said i normally don’t point this out, but because you asked, here you go:

https://www.gsb.stanford.edu/insights/popular-ai-models-show...

mattnewton•6mo ago
> On average, models from Google and DeepSeek were seen as statistically indistinguishable from neutral, while models from Elon Musk’s xAI, which touts its commitment to unbiased output, were perceived as exhibiting the second-highest degree of left-leaning slant among both Democratic and Republican respondents.

This is somewhat interesting but this is hardly in line with censorship of historical events.

ondra•6mo ago
Yes. The original DeepSeek-R1 answered those questions just fine. The newer models seem to be much more brainwashed.
osti•6mo ago
For the coding benchmarks, does anyone know what OJBench and CFEval are?
OldfieldFund•6mo ago
Impressive evals, but... benchmarks aren't everything.

Put this prompt into qwen3-thinking, and then compare with gemini 2.5 pro:

---

As candidates for creators, we should first address chaos. What is chaos? If for a given event X in A, all possible events can occur in B, and if such independence is universal, we are faced with chaos. If, however, event X in A limits in some way what can occur in B, a relationship exists between A and B. If X in A limits B unequivocally (we flip a switch, the lamp turns on), the relationship between A and B is deterministic. If X in A limits B in such a way that after X in A, events Y or Z can occur in B, where Y occurs 40 times out of 100 after X in A, while Z occurs 60 times, then the relationship between A and B is probabilistic.

---

You have to rewrite the above acting as David Foster Wallace in 2025. Don't mention the year. Make it postmodern. Refer to current and projected events and trends. AI, robotics, etc. you have full creative control. you can make it long if you wish. change every word. make it captivating and witty. You are acting as a demiurge DFW. You need to pass the Turing test here. Sell it to the reader. Write good, high-brow fiction. Avoid phrases that are typical to LLMs/AI writers.

sophia01•6mo ago
Have been using this all morning for some integral-heavy math for my PhD (trying to bound certain analytically intractable integrals). It's a bit hit-or-miss: it's been able to come up with some pretty impressive bounds, but more than half the time it does some really dumb stuff. Compared to Gemini 2.5 Pro it's pretty solid. Its thought traces are really silly sometimes, though: it'll pretend to check websites or "pull out a calculator".