Of course an increase in both would be optimal, but a small sacrifice in performance/accuracy in exchange for being 200% faster is worth noting. Around a 10% drop in accuracy for a 200% speed-up: some would take that trade!
"Due to the strict new guidelines of the EU AI Act that take effect on August 2nd 2025, we recommend that each R1T/R1T2 user in the EU either familiarizes themselves with these requirements and assess their compliance, or ceases using the model in the EU after August 1st, 2025."
Doesn't the DeepSeek licence already completely forbid any use in the EU? How can a German company legally build this in the first place (which they presumably did)?
Care to explain?
https://deepseeklicense.github.io/
https://github.com/deepseek-ai/DeepSeek-Coder/blob/main/LICE...
Calling us a lab is not quite right, we are a consulting company.
But hacking is not limited to the time between placements; everybody has (at least) two days per month for that, regardless of any work for customers.
Also, since AI is such a strategically important topic, we have a team that just works on AI stuff internally. That’s where R1T and R1T2 come from.
We need about three orders of magnitude more tests to make these numbers meaningful.
Anecdotally, I can say that my personal experience with the model is in line with what the benchmarks claim: it's a bit smarter than R1, a bit faster than R1, and much faster than R1-0528, but not quite as smart. (Faster here meaning fewer output tokens.) For me, it hits a sweet spot and I use it as my daily driver.
Keep in mind, the speed improvement doesn't come from the model running any faster (it's the exact same architecture as R1, after all) but from emitting fewer output tokens while still achieving very good results.
UrineSqueegee•6h ago
yorwba•5h ago
It produces 60% fewer output tokens than R1-0528 and scores about 10% higher on their benchmark than R1.
So it's a way to turn R1-0528, which is better than R1 but slower, into a model that's worse than R1-0528 but better and faster than R1.
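To see why "60% fewer tokens" translates into a large wall-clock speed-up, here is a back-of-envelope sketch. The 60% token reduction is the figure quoted above; the response length and per-token latency are made-up illustrative assumptions, and the key premise (same architecture, hence roughly the same decode speed per token) comes from the comments above.

```python
# Back-of-envelope: with an identical architecture, wall-clock time for a
# response scales roughly with the number of output tokens generated.
# All concrete figures below are illustrative assumptions, not measurements.

tokens_r1_0528 = 10_000          # hypothetical response length for R1-0528
token_reduction = 0.60           # R1T2 reportedly emits ~60% fewer tokens
tokens_r1t2 = tokens_r1_0528 * (1 - token_reduction)

latency_per_token = 0.05         # seconds/token, assumed equal for both models

time_r1_0528 = tokens_r1_0528 * latency_per_token
time_r1t2 = tokens_r1t2 * latency_per_token

speedup = time_r1_0528 / time_r1t2
print(f"{speedup:.1f}x faster end to end")  # 2.5x under these assumptions
```

Note that the assumed per-token latency cancels out entirely: emitting 60% fewer tokens gives a 1/0.4 = 2.5x end-to-end speed-up regardless of the hardware, which is roughly the "200% faster" figure discussed upthread.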
saubeidl•5h ago