It's not that they trained a new model, but they took an existing model and RL'd it a bit?
The scores are very close to QwQ-32B, and at the end:
"Overall, as QwQ-32B was already extensively trained with RL, it was difficult to obtain huge amounts of generalized improvement on benchmarks beyond our improvements on the training dataset. To see stronger improvements, it is likely that better base models such as the now available Qwen3, or higher quality datasets and RL environments are needed."
Personal story time: I met a couple of their engineers at an event a few months back. They mentioned they were building a distributed training system for LLMs.
I asked them how they were building it and they mentioned Python. I said something along the lines of “not to be the typical internet commenter guy, but why aren’t you using something like Rust for the distributed system parts?”
They mumbled something about Python as the base for all current LLMs, and then kinda just walked away…
From their article:

> "Rust-based orchestrator and discovery service coordinate permissionless workers"
Glad to see that I wasn’t entirely off-base :)
Someone intentionally invoking that history is interesting indeed. Someone doing it by accident might be more so. But I already gave that choice the name I judge it deserves.
./llama.cpp/llama-cli -hf unsloth/INTELLECT-2-GGUF:Q4_K_XL -ngl 99
Also it's best to read https://docs.unsloth.ai/basics/tutorial-how-to-run-qwq-32b-e... on sampling issues for QwQ based models.
Or TLDR, use the below settings:
./llama.cpp/llama-cli -hf unsloth/INTELLECT-2-GGUF:Q4_K_XL -ngl 99 --temp 0.6 --repeat-penalty 1.1 --dry-multiplier 0.5 --min-p 0.00 --top-k 40 --top-p 0.95 --samplers "top_k;top_p;min_p;temperature;dry;typ_p;xtc"
Maybe this could be used as proof of work? To stop wasting computing resources on cryptocurrencies and get something useful as a byproduct.
Bitcoin is the only major cryptocurrency that still uses proof of work today (others either use "proof of stake" or are "Layer 2" chains), and due to its (relative lack of) governance structure, it's very unlikely to ever change.
Without the ability to validate that training compute is heading in the globally desired direction, it is unlikely you could use it as the foundation of a (sound) cryptocurrency.
And like, what are you doing? You've managed to find a use case where you don't care that you're doing compute on some untrusted servers online (and no, there's no magic AI homomorphic encryption) but at the same time you're willing to accept the latency of doing the work multiple times AND it's probably all low end 4090s doing the work AND you're willing to pay for the wasted compute? I'm here shuddering at the thought of model setup times when one node in a cluster goes down and you're facing that on... well, probably most inferences? If you're not administering the infra, you get the lowest common denominator of performance.
The model breaks where work can be counterfeited (usually impossible) or where energy prices go to zero, which is why "bitcoin colonialism" was briefly a thing last decade. Much of bitcoin's design, this aspect included, is intended to protect against the bare-fanged, red-eyed money weasels it was also designed to attract.
Wrappers on candy don’t have value intrinsically but improve the quality of the candy.
Not totally convinced the analogy maps but interesting.
Still I do think there's some validity to the comparison. Fiat currencies are not backed by "nothing." They are backed by a state. Some percentage of the cost of operating a state is therefore "work" done to back the currency's value.
The question is: if we had a cryptocurrency backed by digital PoW that scaled to the level of fiat currencies (millions of transactions per second) and had some of their other desirable characteristics, would the state be able to proportionally shrink? That's what I'm not convinced of, but it'd be an interesting experiment if we could spin up another universe and try it.
There's nothing provable here. Crypto proof of work is easily verified (does the hash of this value look the way I expect?). How do you prove in ~O(1) time that someone did some operation with their GPU? You don't. You don't even know what the thing is that you're training (without a trained model, you have no way to know whether the model that was allegedly trained learned the thing you want it to learn).
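For contrast, here's roughly what that "easily verified" property looks like for hash-based proof of work. A toy sketch in Python (the block format and difficulty scheme are made up, not Bitcoin's actual ones): finding the nonce takes on the order of 2^difficulty hashes, while checking it takes one.

    import hashlib

    def verify_pow(block: bytes, nonce: int, difficulty_bits: int) -> bool:
        # Verification is a single hash: does the digest start with enough zero bits?
        digest = hashlib.sha256(block + nonce.to_bytes(8, "big")).digest()
        return int.from_bytes(digest, "big") >> (256 - difficulty_bits) == 0

    def mine(block: bytes, difficulty_bits: int) -> int:
        # Finding a valid nonce takes ~2**difficulty_bits hashes on average.
        nonce = 0
        while not verify_pow(block, nonce, difficulty_bits):
            nonce += 1
        return nonce

    nonce = mine(b"example block", difficulty_bits=16)
    assert verify_pow(b"example block", nonce, difficulty_bits=16)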
The work in this case could be that the weights, after the work was done, have lower loss than the input weights. Applying the new weights to the input to check that the loss is lower is much cheaper than calculating the weights, which is the same trend as proof of work (not sure whether the magnitude of the difficulty is enough to replace proof of work, though).
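As a toy illustration of that asymmetry (a linear model and plain numpy standing in for an LLM and its training run; everything in this sketch is made up): producing the improved weights costs many gradient steps over the data, while checking the claim costs a single forward pass.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(10_000, 64))                       # "dataset"
    y = X @ rng.normal(size=64) + 0.1 * rng.normal(size=10_000)

    def loss(w):
        # One forward pass over the data: the cost of *verifying* the claim.
        return float(np.mean((X @ w - y) ** 2))

    def train(w, steps=500, lr=0.01):
        # Many gradient steps: the cost of *producing* the claimed weights.
        for _ in range(steps):
            w = w - lr * 2 * X.T @ (X @ w - y) / len(y)
        return w

    w_old = np.zeros(64)
    w_new = train(w_old.copy())
    # The "proof": the submitted weights really do achieve lower loss.
    assert loss(w_new) < loss(w_old)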
- Minimizing loss could be a useful heuristic on a base model. Here, we expect the distribution to be different because we are only doing RL. Measuring loss means measuring the difference against the base model's inputs: a non-goal, since we expect reasoning after RL training to look quite different from a web scrape.
Let's set that aside. Let's say lower loss = model improved.
- Checking the loss requires the entire dataset used to train the base model plus a forward pass over it. That's O(N·d), where N is the number of samples and d is the model size. This takes us from "cool demo that RL can be done on the edge with little benefit" to "we're constantly shipping terabytes of data around among clients".
- Proof of work as a technical term is different from proof of work as a colloquial term: the former is a cryptographic puzzle whose solution is universally and instantly checkable, while the latter just means "I can show I did something," with no strict guarantee of uniqueness. Randomly perturbing one parameter could count as "proof of work" without the work we actually wanted ever being done (a sketch of a related shortcut follows this list).
- Early in base model training, shaving 0.01 off the loss is easy; later, it's impossible. In an RL environment, we expect some rollouts to go badly. Under the interpretation "loss decrease means the model got better means you did work," that would register as a loss increase -- but that is simply how learning works in an RL environment, and it does not mean no work was done.
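To make the counterfeiting worry concrete, here's a toy variant of the earlier linear-model sketch (again, everything here is invented for illustration): a "lazy" worker that takes one cheap gradient step instead of the claimed long training run still passes a naive "loss went down" check, so the check proves the result improved, not that the claimed amount of work was done.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1_000, 16))
    y = X @ rng.normal(size=16)

    def mse(w):
        return float(np.mean((X @ w - y) ** 2))

    w_old = np.zeros(16)
    # One gradient step, a tiny fraction of the claimed training run...
    w_lazy = w_old - 0.01 * 2 * X.T @ (X @ w_old - y) / len(y)
    # ...and the naive "lower loss = work was done" check still passes.
    assert mse(w_lazy) < mse(w_old)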
I think what matters most is that the verification is much, much cheaper than the calculation it is meant to prove; it doesn't explicitly have to be O(1). E.g., the difference in cost just has to exceed a certain threshold for proof of work to be viable.
nsingh2•2d ago
> based on top of novel components such as TOPLOC, which verifies rollouts from untrusted inference workers
https://github.com/PrimeIntellect-ai/toploc
xmasotto•2d ago
At a glance it looks like something akin to computing a checksum that's locality sensitive, so it's robust to floating point errors, etc.
What's to stop someone from sending bad data + a matching bad checksum?
yorwba•2d ago
The checksum is validated by redoing the computation, but making use of the fact that you already have the entire response to enable greater parallelism than when generating it one token at a time.
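A toy sketch of that verification pattern, not TOPLOC itself (which, per the comments above, uses a locality-sensitive checksum over the computation so it tolerates floating-point differences). The stand-in "model" below is just a deterministic function of the context: the worker has to decode sequentially, but the verifier already holds the whole claimed response, so it can recompute the logits at every position independently, the way a single prefill pass would on a real model, and reject any tampered token.

    import numpy as np

    VOCAB = 100

    def logits(model_seed: int, context: tuple) -> np.ndarray:
        # Stand-in for an LLM forward pass: deterministic logits per context.
        rng = np.random.default_rng(hash((model_seed, context)) % 2**32)
        return rng.normal(size=VOCAB)

    def generate_greedy(model_seed: int, prompt: tuple, n: int) -> list:
        # Untrusted worker: sequential decoding, one token at a time.
        seq = list(prompt)
        for _ in range(n):
            seq.append(int(np.argmax(logits(model_seed, tuple(seq)))))
        return seq[len(prompt):]

    def verify(model_seed: int, prompt: tuple, claimed: list) -> bool:
        # Verifier: each position's logits depend only on the prompt and the
        # *claimed* tokens, so they can all be recomputed in parallel and
        # compared against the token the worker says it produced.
        seq = list(prompt)
        for tok in claimed:
            if int(np.argmax(logits(model_seed, tuple(seq)))) != tok:
                return False
            seq.append(tok)
        return True

    resp = generate_greedy(42, (1, 2, 3), n=8)
    assert verify(42, (1, 2, 3), resp)
    assert not verify(42, (1, 2, 3), resp[:-1] + [(resp[-1] + 1) % VOCAB])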