frontpage.

Qwen3-Next

https://qwen.ai/blog?id=4074cca80393150c248e508aa62983f9cb7d27cd&from=research.latest-advancement...
135•tosh•3h ago•45 comments

Examples from The LaTeX Companion book (3rd edition)

https://ctan.org/pkg/tlc3-examples
15•teleforce•1h ago•2 comments

Float Exposed

https://float.exposed/
228•SomaticPirate•10h ago•53 comments

Debian 13, Postgres, and the US time zones

https://rachelbythebay.com/w/2025/09/11/debtz/
125•move-on-by•7h ago•49 comments

Top model scores may be skewed by Git history leaks in SWE-bench

https://github.com/SWE-bench/SWE-bench/issues/465
398•mustaphah•15h ago•125 comments

Using Emacs Org-Mode With Databases: A getting-started guide

https://gitlab.com/ryanprior/emacs-org-data-starter
35•adityaathalye•3d ago•3 comments

Claude’s memory architecture is the opposite of ChatGPT’s

https://www.shloked.com/writing/claude-memory
347•shloked•15h ago•183 comments

Classic GTK1 GUI Library

https://gitlab.com/robinrowe/gtk1
9•MaximilianEmel•3d ago•1 comment

Logging in Go with Slog: A Practitioner's Guide

https://www.dash0.com/guides/logging-in-go-with-slog
12•ayoisaiah•3d ago•4 comments

Show HN: I made a generative online drum machine with ClojureScript

https://dopeloop.ai/beat-maker/
4•chr15m•1h ago•1 comment

Doorbell prankster that tormented residents of apartments turns out to be a slug

https://www.theguardian.com/world/2025/sep/08/doorbell-prankster-that-tormented-residents-of-germ...
187•robin_reala•3d ago•91 comments

AirPods live translation blocked for EU users with EU Apple accounts

https://www.macrumors.com/2025/09/11/airpods-live-translation-eu-restricted/
346•thm•22h ago•398 comments

XFN – XHTML Friends Network (2003)

https://gmpg.org/xfn/
29•thinkingemote•4d ago•6 comments

Building my childhood dream PC

https://fabiensanglard.net/2168/
144•joexbayer•4d ago•43 comments

Our website looks like an operating system

https://posthog.com/blog/why-os
356•bnc319•10h ago•259 comments

Behind the scenes of Bun Install

https://bun.com/blog/behind-the-scenes-of-bun-install
386•Bogdanp•21h ago•128 comments

Show HN: C++ Compiler Support Page

https://cppstat.dev
44•cemdervis•4d ago•11 comments

Toddlerbot: Open-Source Humanoid Robot

https://toddlerbot.github.io/
66•base698•10h ago•14 comments

Rails on SQLite: new ways to cause outages

https://andre.arko.net/2025/09/11/rails-on-sqlite-exciting-new-ways-to-cause-outages/
155•ingve•15h ago•47 comments

Samsung taking market share from Apple in U.S. as foldable phones gain momentum

https://www.cnbc.com/2025/08/16/samsungs-us-market-share-apple-rivalry-foldable-phones.html
223•mgh2•1d ago•263 comments

Full Moon: Seestar S50 vs. Samsung S25

https://www.4rknova.com//blog/2025/09/08/moon-photos
31•ibobev•3d ago•26 comments

Bulletproof host Stark Industries evades EU sanctions

https://krebsonsecurity.com/2025/09/bulletproof-host-stark-industries-evades-eu-sanctions/
181•todsacerdoti•16h ago•68 comments

Danish supermarket chain is setting up "Emergency Stores"

https://swiss.social/@swaldorff/115186445638788782
279•sohkamyung•11h ago•264 comments

From burner phones to decks of cards: NYC teens adjusting to the smartphone ban

https://gothamist.com/news/from-burner-phones-to-decks-of-cards-nyc-teens-are-adjusting-to-the-sm...
238•geox•20h ago•184 comments

CRISPR offers new hope for treating diabetes

https://www.wired.com/story/no-more-injections-crispr-offers-new-hope-for-treating-diabetes/
213•manveerc•20h ago•55 comments

The challenge of maintaining curl

https://lwn.net/Articles/1034966/
131•signa11•8h ago•35 comments

Conway's Game of Life, but musical

https://www.hudsong.dev/digital-darwin
189•hudsongr•20h ago•32 comments

NT OS Kernel Information Disclosure Vulnerability

https://www.crowdfense.com/nt-os-kernel-information-disclosure-vulnerability-cve-2025-53136/
137•voidsec•18h ago•29 comments

‘Robber bees’ invade apiarist’s shop in attempted honey heist

https://www.cbc.ca/news/canada/british-columbia/robber-bees-terrace-bc-apiary-1.7627532
141•lemonberry•17h ago•77 comments

A Web Framework for Zig

https://www.jetzig.dev/
123•nivethan•16h ago•16 comments

Qwen3-Next

https://qwen.ai/blog?id=4074cca80393150c248e508aa62983f9cb7d27cd&from=research.latest-advancements-list
135•tosh•3h ago

Comments

Jgoauh•2h ago
Seems impressive. I believe better architectures are really the path forward; I don't think you need more than 100B params, given what this model and GPT-OSS 120B can achieve.
NitpickLawyer•1h ago
New arch seems cool, and it's amazing that we have these published in the open.

That being said, qwen models are extremely overfit. They can do some things well, but they are very limited in generalisation compared to closed models. I don't know if it's simply scale, or training recipes, or regimes. But if you test them OOD (out of distribution), the models utterly fail to deliver, where the closed models still provide value.

vintermann•1h ago
Could you give some practical examples? I don't know what Qwen's 36T-token training set is like, so I don't know what it's overfitting to...
NitpickLawyer•1h ago
Take math and coding for example:

- in math, if they can solve a problem, or a class of problems, they'll solve it. If you use a "thinking" model + maj@x, you'll get strong results. But if you try, for example, to have the model consider a particular way or method of exploring a problem, it'll default to "solving" mode. It's near impossible to have it do anything else with a math problem other than solve it. Say "explore this part, in this way, using this method". Can't do it. It'll maybe play a bit, but then enter "solving" mode and continue to solve it as it was trained.

In practice, this means that "massive parallel" test time compute becomes harder to do with these models, because you can't "guide" them towards certain aspects of a problem. They are extremely "stubborn".

- in coding it's even more obvious. Ask them to produce any 0-shot, often-tested and often-shown thing (SPA, game, visualisation, etc.) and they do it. Convincingly.

But ask them to look at a piece of code and extract meaning, and they fail. Or ask them to reverse an implementation. Figure out what a function does and reverse its use, or make it do something else, and they fail.

elbear•1h ago
It sounds like some people.
vintermann•54m ago
Oof, that sounds frustrating. Yeah, I can relate to this failure mode; it's basically "did you mean (more likely query)" turned up to 11.

It does sound like an artifact of the dialog/thinking tuning though.

croemer•2h ago
ERR_NAME_NOT_RESOLVED
croemer•2h ago
https://archive.is/JH9XL
jychang•1h ago
Coolest part of Qwen3-Next, in my opinion (after the linear attention parts), is that they do MTP (multi-token prediction) without adding another un-embedding matrix.

DeepSeek R1 also has an MTP layer (layer 61): https://huggingface.co/deepseek-ai/DeepSeek-R1/blob/main/mod...

But DeepSeek R1 adds embed_tokens and shared_head.head tensors, which are each [129280, 7168], or about 2GB total at FP8.

Qwen3-Next doesn't have that: https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Instruct/blob...

So it saves a few GB in active parameters for MTP, which is a big deal. This is one of the changes that significantly speeds up inference.
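
For scale, a quick back-of-the-envelope check on that figure (a sketch; FP8 is taken as one byte per parameter, and the shape is the one quoted above):

  # Size of DeepSeek R1's extra MTP tensors at FP8 (1 byte per parameter).
  vocab, hidden = 129280, 7168           # the [129280, 7168] shape quoted above
  gb = vocab * hidden * 1 / 1e9          # one tensor at FP8
  print(f"{gb:.2f} GB per tensor, {2 * gb:.2f} GB for both")
  # -> 0.93 GB per tensor, 1.85 GB for embed_tokens + shared_head.head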

puilp0502•1h ago
What kind of benefit does Multi-Token Prediction bring to the inference side? Is it only relevant for pretraining efficiency?
rfoo•1h ago
It could be a better speculative model than a separately trained EAGLE etc. for speculative decoding.
jychang•1h ago
Speculative decoding! It makes inference a LOT faster.

Instead of generating tokens one at a time, you generate the second one as well, and then use speculative decoding on that second token (instead of having it be produced by a draft model like Qwen 0.6B). If the token passes the check, the 2nd token gets generated MUCH faster.

If it's wrong, you have to generate it again the normal way (a lot slower than just checking it). Usually, it's correct, so inference is a lot faster.
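
Concretely, the verify step looks something like this (a toy greedy-decoding sketch; `model.forward` and the token layout are illustrative, not Qwen's actual inference API, and real engines use rejection sampling rather than an exact-match check):

  import numpy as np

  def decode_step(model, seq, draft):
      """seq: token ids so far; draft: the MTP head's guess for the next token."""
      logits = model.forward(seq + [draft])   # ONE pass scores both positions
      verified = int(np.argmax(logits[-2]))   # model's real pick for the draft's slot
      if verified == draft:
          bonus = int(np.argmax(logits[-1]))  # only valid because the draft held up
          return seq + [draft, bonus]         # two tokens from one forward pass
      return seq + [verified]                 # rejected: keep the corrected token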

moffkalast•11m ago
Hmm, but isn't the checking only required because the draft model is not the same model and can only speculate about what the main one is thinking, hence the name? If the main model generates two tokens itself, then how can it be wrong about its own predictions?
SonOfLilit•4m ago
If you ask me to guess an answer, I'll _usually_ produce the same answer as if I had time to think about it deeply, but not always...
slimebot80•1h ago
Complete newbie here - some questions, if I may!

This stuff can run on a local machine without internet access, correct?

And it can pretty much match Nano Banana? https://github.com/PicoTrex/Awesome-Nano-Banana-images/blob/...

Also -- what are the specs for a machine to run it (even if slowly!)

Davidzheng•1h ago
Isn't this one a text model?
slimebot80•1h ago
Ah, maybe! I am lost reading this page with all the terminology.
arcanemachiner•54m ago
You'll get used to it.

Make sure to lurk on r/LocalLlama.

prawel•1h ago
What you mean is Qwen Image and Qwen Image Edit; those you can run on a local machine, using the Draw Things application, for example.

The model discussed here is a text model, similar to ChatGPT. You will also be able to run it on your local machine, but not yet, as apps need to be updated with Qwen3-Next support (llama.cpp, Ollama, etc.).

NitpickLawyer•1h ago
This model can be run completely offline, yes. You'll need anywhere from 60-200 GB of RAM (either VRAM for high speeds, or a combination of VRAM and RAM, or just CPU+RAM). The active params are really low (3B), so it'll likely run fine even on CPU; you should get 10-15+ t/s even on old DDR4 systems. Offload some experts to a GPU (as little as 8-16 GB) and you'll see greater speeds.

This has nothing to do with nano banana, or image generation. For that you want the qwen image edit[1] models.

1 - https://huggingface.co/Qwen/Qwen-Image-Edit

dragonwriter•1h ago
> This stuff can run on a local machine without internet access, correct?

Yes.

> And it can pretty much match Nano Banana?

No, Qwen3-Next is not a multimodal model, it has no image generation function.

mynti•1h ago
For anyone curious about what the Gated Delta Network is: https://arxiv.org/pdf/2412.06464
yorwba•1h ago
Also, Gated Attention: https://arxiv.org/abs/2505.06708
yekanchi•1h ago
how much vram it requires?
DiabloD3•1h ago
That's not a meaningful question. Models can be quantized to fit much smaller memory budgets, and not all MoE layers (in MoE models) have to be offloaded to VRAM to maintain performance.
yekanchi•1h ago
I mean 4-bit quantized. I can roughly calculate VRAM for dense models from the model size, but I don't know how to do it for MoE models.
EnPissant•1h ago
MoE models need just as much VRAM as dense models because every token may use a different set of experts. They just run faster.
regularfry•46m ago
This isn't quite right: it'll run with the full model loaded into RAM, swapping the experts in as it needs them. It has turned out in the past that experts can be stable across more than one token, so you're not swapping as much as you'd think. I don't know if that's been confirmed to still be true on recent MoEs, but I wouldn't be surprised.
EnPissant•29m ago
What you are describing would be uselessly slow and nobody does that.
NitpickLawyer•1h ago
A good rule of thumb is to think of one param as one unit of storage. The "default" unit of storage these days is bf16 (i.e. 16 bits for 1 weight). So for an 80B model that'll be ~160GB of weights. Then you have quantisation, usually to 8 bits or 4 bits, meaning each weight is "stored" in 8 or 4 bits. So for an 80B model that'll be ~80GB in fp8 and ~40GB in fp4/int4.

But in practice you need a bit more than that: you also need some space for the context and its KV cache, potentially a model graph, etc.

So you'll see in practice that you need 20-50% more RAM than this rule of thumb.

For this model, you'll need anywhere from 50GB (tight) to 200GB (full) of RAM, but it also depends on how you run it. With MoE models, you can selectively load some experts (parts of the model) into VRAM while offloading the rest to RAM. Or you could run it fully on CPU+RAM, since the active parameter count is low (3B). This should work pretty well even on older (DDR4) systems.
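
That rule of thumb fits in a few lines (a sketch; the 30% overhead factor is a rough assumption, not a measured number):

  def est_memory_gb(params_billions, bits_per_weight, overhead=0.30):
      """Weights = params * bits/8, plus headroom for context/KV cache etc."""
      weights_gb = params_billions * bits_per_weight / 8
      return weights_gb * (1 + overhead)

  for bits in (16, 8, 4):
      print(f"80B @ {bits}-bit: ~{est_memory_gb(80, bits):.0f} GB")
  # -> ~208 GB (bf16), ~104 GB (fp8), ~52 GB (fp4/int4)

Which lines up with the 50GB (tight) to 200GB (full) range above.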

theanonymousone•1m ago
But the RAM+VRAM can never be less than the size of the total (not active) model, right?
irthomasthomas•1h ago
llm -m qwen3-next-80b-a3b-thinking "An ASCII of spongebob"

Here's a classic ASCII art representation of SpongeBob SquarePants:

     .------.
    /  o o  \
   |         |
   |  \___/  |
    \_______/


  llm -m chutes/Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 \
  "An ASCII of spongebob"

  Here's an ASCII art of SpongeBob SquarePants:

      .--..--..--..--..--..--.
    .' \  (`._   (_)     _   \
  .'    |  '._)         (_)  |
  \ _.')\      .----..--.'  /
  |(_.'  |    /    .-\-.  \---.
  \     0|    |   ( O| O) |    |
   |  _  |  .--.____.'._.-.  |
   \ (_) | o         -` .-`  |
    |    \   |`-._ _ _ _ _\ /
    \    |   |  `. |_||_|   |
     | o |    \_       \    |     -.   .-.
     |.-.  \    `--..-'   O |     `.`-' .'
   _.'  .' |     `-.-'      /-.__   ' .-'
  .' `-.` '.|='=.='=.='=.='=|._/_ `-'.'
  `-._  `.  |________/\_____|    `-.'
     .'   ).| '=' '='\/ '=' |
     `._.`  '---------------'
             //___\   //___\
               ||       ||
               ||_.-.   ||_.-.
              (_.--__) (_.--__)


Meta: I generated a few dozen spongebobs last night on the same model and NONE were as good as this. Most started well but collapsed into decoherence at the end, missing the legs. Then this morning the very same prompt to the same model API produced a perfect bob on the first attempt. Can utilization affect response quality, if all else remains constant? Or was it just random luck?

Edit: OK, the very next attempt, a few minutes later, failed, so I guess it is just random: you have about a 1 in 10 chance of getting a perfect spongebob from qwen3-coder, and ~0 chance with qwen3-next.

dev_hugepages•1h ago
memorized: https://www.asciiart.eu/cartoons/spongebob-squarepants
ginko•1h ago
Conveniently removed the artist's signature though.
eurekin•59m ago
Certainly not defending LLMs here, don't mistake this for that.

Humans do it too. I have given up on my country's non-local news sources, because I could recognize original sources that were being deliberately omitted. There's a satirical webpage that is basically a Reddit scrape; most users don't notice, and those who do don't seem to care.

yorwba•54m ago
Yes, the most likely reason the model omitted the signature is that humans reposted more copies of this image omitting the signature than ones that preserve it.
irthomasthomas•54m ago
Yes - they all do that. Actually, most attempts start well but unravel toward the end.

  llm -m chutes/Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 \
  "An ASCII of spongebob"
  Here's an ASCII art of SpongeBob SquarePants:
  
  ```
      .--..--..--..--..--..--.
    .' \  (`._   (_)     _   \
  .'    |  '._)         (_)  |
  \ _.')\      .----..--.   /
  |(_.'  |    /    .-\-.  \
  \     0|    |   ( O| O) |
   |  _  |  .--.____.'._.-.
   /.' )  | (_.' .-'"`-. _.-._.-.--.-.
  / .''.  |  .' `-. .-'-. .-'"`-.`-._)
   .'.' |  |   |  |  |  |  |  |  |  |
  .'.'   |  |   |  |  |  |  |  |  |  |
  .'.'   |  |   |  |  |  |  |  |  |  |
  .'.'   |  |   |  |  |  |  |  |  |  |
  .'.'   |  |   |  |  |  |  |  |  |  |
  .'.'   |  |   |  |  |  |  |  |  |  |
  ```
irthomasthomas•1h ago
Naturally. That's how LLMs work: during training you measure the loss, the difference between the model output and the ground truth, and try to minimize it. We prize models for their ability to learn. Here we can see that the large model does a great job of learning to draw bob, while the small model performs poorly.
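
That objective is plain next-token cross-entropy; a minimal PyTorch sketch (generic, nothing Qwen-specific):

  import torch.nn.functional as F

  def train_step(model, optimizer, tokens):       # tokens: (batch, seq_len) ids
      logits = model(tokens[:, :-1])              # predict each next token
      loss = F.cross_entropy(                     # gap vs. the ground-truth token
          logits.reshape(-1, logits.size(-1)),
          tokens[:, 1:].reshape(-1),
      )
      loss.backward()                             # minimize that gap
      optimizer.step()
      optimizer.zero_grad()
      return loss.item()
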
endymion-light•57m ago
I'd argue that, actually, the smaller model is doing a better job at "learning", in that it's including key characteristics in an ASCII image, even if a poor one.

The larger model already has it in the training corpus, so it's not a particularly good measure. I'd much rather see the capabilities of a model in trying to represent in ASCII something that's unlikely to be in its training data.

Maybe a pelican riding a bike as ASCII, for both?

keyle•1h ago
For a model that can run offline, they've nailed how the website can too.

And it appears like it's thinking about it! /s

syntaxing•1h ago
The craziest part is how far MoE has come thanks to Qwen. This beats all those 72B dense models we had before and runs faster than a 14B model, depending on how you offload between VRAM and CPU. That's insane.
moffkalast•7m ago
In retrospect it's actually funny that last year Meta spent so many resources training a dense 405B model that both underperforms models a tenth its size and is impossible to run at a reasonable speed on any hardware in existence.
pveierland•1h ago
> "The content loading failed."

It's amazing how far and how short we've come with software architectures.

techsystems•37m ago
How does the context length scaling at 256K tokens compare to Llama's 1M in terms of performance? How are the contexts treated differently?
jwr•6m ago
Hmm. 80B. These days I'm on the lookout for new models in the 32B range, since that's what fits and runs comfortably on my MacBook Pro (M4, 64GB).

I use Ollama every day for spam filtering: gemma3:27b works great, but I use gpt-oss:20b on a daily basis because it's so much faster and comparable in performance.