[I guess that must be a useful market niche, though; apparently this is by a company selling batch compute on exactly those small open-weights models.]
The problem is that the author evaluates models by dividing the Artificial Analysis score by a blended cost per token, but most tasks have an intelligence "floor": below it, a model will never succeed no matter how cheap it is. And once you strip out the very high results from the super cheap 4B OSS models, the rest are significantly outclassed by Flash 2.0 (not on his chart, but still worth considering) and 2.5, not to mention other models that might be better at domain-specific tasks, like grok-3 mini for code.
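To make the floor point concrete, here is a minimal sketch of the difference between ranking purely by score per dollar and ranking the same way after dropping everything below a task's floor. The model names, scores, and prices are made up for illustration (they are not from the article or from Artificial Analysis), and `rank_by_value` is a hypothetical helper, not anything the author used.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    score: float          # benchmark score (hypothetical value)
    cost_per_mtok: float  # blended $ per million tokens (hypothetical value)


def rank_by_value(models, floor=0.0):
    """Rank models by score per dollar, ignoring any model below `floor`."""
    eligible = [m for m in models if m.score >= floor]
    return sorted(eligible, key=lambda m: m.score / m.cost_per_mtok, reverse=True)


# Placeholder data, invented purely for illustration.
MODELS = [
    Model("tiny-4b", 20.0, 0.05),
    Model("mid-flash", 50.0, 0.25),
    Model("big-frontier", 65.0, 5.0),
]

# Pure score/cost: the tiny model wins on paper.
naive = [m.name for m in rank_by_value(MODELS)]

# Same ranking, but the task needs a score of at least 40 to succeed at all.
floored = [m.name for m in rank_by_value(MODELS, floor=40.0)]

print(naive)
print(floored)
```

With a floor in place the tiny model drops out entirely, which is the whole objection: a ratio only means something among models that can actually do the task.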
(Nobody should be using Haiku in 2025. The OpenAI mini models are not as bad as Haiku on price/performance, and maybe there is a use case for preferring one over Flash, but if so I don't know what it is.)
(This is a big advantage of open-weight models: even if they're too big to host yourself, if the model is worth anything there's a lot of competition for inference.)
You should probably use grok 3 mini if you want the "cheapest model that is reasonably good at code".
ramesh31•6mo ago
behnamoh•6mo ago
grepfru_it•6mo ago
Aeolun•6mo ago
ekianjo•6mo ago
jbellis•6mo ago
lostmsu•6mo ago
jbellis•6mo ago
cootsnuck•6mo ago
grepfru_it•6mo ago
jacob019•6mo ago
mkl•6mo ago
mgraczyk•6mo ago
mkl•6mo ago
shmoogy•6mo ago
I definitely do appreciate and believe in the value of open source / open weight LLMs, but inference is so cheap right now for non-frontier models.
cortesoft•6mo ago
genewitch•6mo ago
I opened aider and gave a small prompt, roughly:
That's it. Several hours later, it finished. The game ran. It was worth it because this was in the winter and it heated my house a bit, yay. I think the resulting 1-shot output is on my GitHub. I know it was in the training set, etc., but I wanted to see how big of a hassle it was, whether it would 1-shot with such a small prompt, and how long it would take.
Makes me want to try deepseek 671B, but I don't have any machines with >1TB of memory.
I do take donations of hardware.
mechagodzilla•6mo ago
3036e4•6mo ago
xfalcox•6mo ago
diggan•6mo ago
ChromaticPanic•6mo ago
diggan•6mo ago
oooyay•6mo ago
diggan•6mo ago
> I may be spoiled in having worked for companies that have ML
Sounds likely, yeah. How many companies have ML departments today? DS departments seem common, but ML I'm not too sure about.
fourthark•6mo ago
achierius•6mo ago
pegasus•6mo ago
cortesoft•6mo ago
dTal•6mo ago
We already went through this with HTTPS everywhere. Previously, encryption was considered "only for sensitive data".