I took part of the headline from the top comment there.
The complete two-paragraph message below is what they wrote (credit to Few_Painter_5588):
We introduce LongCat-Flash-Lite, a non-thinking 68.5B parameter Mixture-of-Experts (MoE) model with approximately 3B activated parameters, supporting a 256k context length through the YaRN method. Building upon the LongCat-Flash architecture, LongCat-Flash-Lite distinguishes itself through the integration of an N-gram embedding table designed to enhance both model performance and inference speed. Despite allocating over 30B parameters to embeddings, LongCat-Flash-Lite not only outperforms parameter-equivalent MoE baselines but also demonstrates exceptional competitiveness against existing models of comparable scale, particularly in the agentic and coding domains.
To my knowledge, this is the first proper open-weight model of this size that uses N-gram embeddings, and it seems to have boosted the model's performance quite substantially. Imagine what DeepSeek V4 could be if it used this technique. A rough sketch of the idea is below.
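For readers unfamiliar with the technique: the rough idea is that, in addition to the usual per-token embedding, each position also looks up embeddings keyed by local n-grams (e.g. the previous-token/current-token pair), typically hashed into a very large auxiliary table. That is where the "over 30B parameters in embeddings" can go while adding almost no compute per token. The snippet below is a minimal, hypothetical sketch of a hashed bigram embedding layer in PyTorch; the class, table size, and hashing scheme are my own assumptions, not LongCat-Flash-Lite's actual implementation.

```python
import torch
import torch.nn as nn

class NGramEmbedding(nn.Module):
    """Hypothetical sketch: token embeddings plus hashed bigram embeddings.
    Not the actual LongCat-Flash-Lite design, just the general idea."""

    def __init__(self, vocab_size: int, ngram_table_size: int, dim: int):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, dim)            # standard token embeddings
        self.bigram_emb = nn.Embedding(ngram_table_size, dim)   # large hashed bigram table
        self.ngram_table_size = ngram_table_size

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) integer tensor
        tok = self.tok_emb(token_ids)
        # Pair each token with its predecessor (position 0 pairs with id 0).
        prev = torch.roll(token_ids, shifts=1, dims=1)
        prev[:, 0] = 0
        # Hash the (prev, cur) pair into the auxiliary table. Real systems
        # would use a better hash family and possibly several n-gram orders.
        h = (prev * 1000003 + token_ids) % self.ngram_table_size
        return tok + self.bigram_emb(h)

# Usage: the auxiliary table can dominate the parameter count when it is huge,
# but at inference it only adds an extra embedding lookup per token.
emb = NGramEmbedding(vocab_size=32000, ngram_table_size=5_000_000, dim=512)
x = torch.randint(0, 32000, (2, 16))
print(emb(x).shape)  # torch.Size([2, 16, 512])
```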