It's crazy. In a few years we will be able to buy Qwen on a chip, doing 10K tokens per second.
androiddrew•57m ago
Yeah, it might well just come on your new laptop.
bradleyy•27m ago
Or your phone.
comandillos•1h ago
This is still far from viable for actually useful models, like bigger MoE ones with much larger context windows. I mean, the technology is very promising, just like Cerebras, but we need to see whether they can keep this up as models evolve over the next few years. Extremely interesting nevertheless.
spzb•1h ago
Is this a paid ad placement? I'm seeing a load of breathless "commentary" on Taalas and next to no serious discussion about whether their approach is even remotely scalable. A one-off tech demo using a comparatively ancient open source model is hardly going to be giving Jensen Huang sleepless nights.
androiddrew•1h ago
Give me a 120B dense model on one of these and yeah, my API use will probably drop.
amelius•1h ago