[I guess that must be a useful market niche, though; apparently this is by a company selling batch compute on exactly those small open weights models.]
The problem is that the author evaluates models by dividing the Artificial Analysis score by a blended cost per token, but most tasks have an intelligence "floor" below which it doesn't matter how cheap a model is: it will never succeed. And once you strip out the very high results from super-cheap 4B OSS models, the rest are significantly outclassed by Flash 2.0 (not on his chart, but still worth considering) and 2.5, not to mention other models that might be better at domain-specific tasks, like grok-3 mini for code.
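To make the "floor" point concrete, here is a minimal sketch of ranking by score per dollar with and without a minimum score. The model names, scores, and prices are made up for illustration and are not taken from the article or from Artificial Analysis; `rank_by_value` and the `floor` parameter are hypothetical names too.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    score: float          # benchmark index (hypothetical value)
    cost_per_mtok: float  # blended USD per million tokens (hypothetical value)


def rank_by_value(models, floor=0.0):
    """Rank models by score / cost, skipping any model scoring below `floor`."""
    eligible = [m for m in models if m.score >= floor]
    return sorted(eligible, key=lambda m: m.score / m.cost_per_mtok, reverse=True)


# Illustrative numbers only, not real benchmark results or prices.
models = [
    Model("tiny-4b", 20.0, 0.05),
    Model("mid-flash", 50.0, 0.25),
    Model("big-frontier", 65.0, 5.00),
]

# Pure score-per-dollar: the tiny model wins on ratio alone.
naive = [m.name for m in rank_by_value(models)]

# With a task-specific floor of 45, the tiny model is excluded before ranking.
floored = [m.name for m in rank_by_value(models, floor=45.0)]

print(naive)
print(floored)
```

The floor has to come from your own task (what score is "good enough" to succeed at all), which is why a single score/cost chart can't pick the model for you.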
(Nobody should be using Haiku in 2025. The OpenAI mini models are not as bad as Haiku in price/performance, and maybe there is a use case for preferring one over Flash, but if so I don't know what it is.)
(This is a big advantage of open weight models; even if they're too big to host yourself, if they're worth anything there's a lot of competition for inference.)
You should probably use grok 3 mini if you want the "cheapest model that is reasonably good at code".
ramesh31•8mo ago
behnamoh•8mo ago
grepfru_it•8mo ago
Aeolun•8mo ago
ekianjo•8mo ago
jbellis•8mo ago
lostmsu•8mo ago
jbellis•8mo ago
cootsnuck•8mo ago
grepfru_it•8mo ago
jacob019•8mo ago
mkl•8mo ago
mgraczyk•8mo ago
mkl•8mo ago
shmoogy•8mo ago
I definitely do appreciate and believe in the value of open source / open weight LLMs - but inference is so cheap right now for non-frontier models.
cortesoft•8mo ago
genewitch•8mo ago
I opened aider and gave a small prompt, roughly:
That's it. Several hours later, it finished. The game ran. It was worth it because this was in the winter and it heated my house a bit, yay. I think the resulting 1-shot output is on my GitHub. I know it was in the training set, etc., but I wanted to see how big of a hassle it was, whether it would 1-shot with such a small prompt, and how long it would take.
Makes me want to try deepseek 671B, but I don't have any machines with >1TB of memory.
I do take donations of hardware.
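For a rough sense of where a ">1TB" figure for a 671B-parameter model can come from, here is a back-of-the-envelope sketch. It counts weights only (no KV cache, activations, or runtime overhead), and the precision choices are assumptions for illustration, not a statement about how any particular release is stored; `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(params_billion, bytes_per_param):
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bytes_per_param / 1e9


# Assumed storage precisions; weights only, excludes KV cache and activations.
for label, bytes_per_param in [("16-bit", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    gb = weight_memory_gb(671, bytes_per_param)
    print(f"{label}: ~{gb:.0f} GB")
```

At 16 bits per weight the total is about 1.3 TB, while quantized variants need proportionally less, at some cost in quality.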
mechagodzilla•8mo ago
3036e4•8mo ago
xfalcox•8mo ago
diggan•8mo ago
ChromaticPanic•8mo ago
diggan•8mo ago
oooyay•8mo ago
diggan•8mo ago
> I may be spoiled in having worked for companies that have ML
Sounds likely, yeah. How many companies have ML departments today? DS departments seem common, but ML I'm not too sure about.
fourthark•8mo ago
achierius•8mo ago
pegasus•8mo ago
cortesoft•8mo ago
dTal•8mo ago
We already went through this with HTTPS everywhere. Previously, encryption was considered "only for sensitive data".