I'm glad to see more domain-focused SLMs; we need more of them! A programming-focused MoE should work well across many languages.
Could you teach a 5-year-old to drive a car? A 10-year-old? A 12-year-old? Driving a car requires being able to read, having judgement about icy or rainy conditions, and anticipating a child running after a ball. By the time a human is in their mid teens they have acquired the base knowledge...
Small models need enough base knowledge to be good enough, even in a seemingly narrow domain. Where is that threshold? Obviously they don't need all the obscure knowledge of a frontier model, but there is some base level, and it is probably higher than it first seems.
tyre•41m ago
It would look really dumb if someone asked it that, but that's fine. You're trying to make a model that is optimized for efficiency for a specific task. As much as possible, you should prune uncorrelated things.
fwipsy•49m ago
> these findings motivate the Parametric Compression-Coverage Hypothesis, which views verifiable reasoning as compressible into compact reasoning cores, while open-domain knowledge and general-purpose competence require broad parameter coverage over facts, concepts, and long-tail scenarios.