incomingpain•7h ago
I legitimately don't understand why anyone would want a 4B model.
They might as well call all models 4B and smaller after psychedelics, because they be hallucinating.
hammyhavoc•6h ago
And yet so many people on HN are adamant that the more tokens, the better, and it's all just a matter of throwing more money at it, and it's inevitable it will somehow "get better", because there's "so much money riding on it".
I wonder when the penny will drop?
incomingpain•5h ago
I wonder if my understanding is flawed. I've tested this using LM Studio, and there are a lot of dials involved.
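Roughly what I mean by the dials, as a sketch: LM Studio can serve an OpenAI-compatible API on localhost, so something like the following works. The model name, port, and sampling values here are just placeholders, not anything specific to this test.

    import openai  # LM Studio exposes an OpenAI-compatible local server

    # Placeholder local setup: LM Studio's default address and a stand-in model name.
    client = openai.OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

    resp = client.chat.completions.create(
        model="qwen3-4b",   # placeholder; use whatever model is actually loaded
        temperature=0.2,    # one of the "dials": lower tends to mean fewer flights of fancy
        top_p=0.9,          # nucleus-sampling cutoff, another dial
        max_tokens=512,     # cap on the length of the reply
        messages=[
            {"role": "system", "content": "Answer only from the provided context."},
            {"role": "user", "content": "Does the provided context declare a license?"},
        ],
    )
    print(resp.choices[0].message.content)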
incomingpain•3h ago
I was just testing this a bit more.
I grabbed qwen3:4b and cranked the context to its max of 32k tokens.
It's fast, to be sure, and I'm struggling to get it to hallucinate; but it is giving me a ton of 'The provided context does not include specifics'.
Resource-wise it's like running a 12-16B, but faster. But as soon as you expand the 12B to something like a 10k-token context, it's clearly better for barely any more resources.
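For reference, this is roughly what the 32k run looks like through Ollama's HTTP API, as a sketch: the port is Ollama's default, num_ctx is the context-window setting, and the prompt is just a placeholder.

    import requests  # assumes a local Ollama server on its default port

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "qwen3:4b",
            "prompt": "Using only the context below, list any license declarations.\n\n<paste context here>",
            "stream": False,
            "options": {"num_ctx": 32768},  # the "cranked to 32k tokens" part
        },
        timeout=300,
    )
    print(resp.json()["response"])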
mdaniel•5h ago
Plus, this specific use case is also "to detect legally relevant text like license declarations in code and documentation", so I guess they really bought into that regex adage about "and now you have two problems" and thought they'd introduce some floating-point math instead.
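For contrast, the "two problems" version is just something like this, a naive sketch where the patterns and file extensions are arbitrary examples rather than anything authoritative:

    import re
    from pathlib import Path

    # Deliberately naive: scan files for a handful of common license phrases / SPDX tags.
    LICENSE_RE = re.compile(
        r"(SPDX-License-Identifier:\s*\S+"
        r"|Licensed under the Apache License"
        r"|Permission is hereby granted, free of charge"  # MIT preamble
        r"|GNU General Public License)",
        re.IGNORECASE,
    )

    def find_license_mentions(root: str):
        for path in Path(root).rglob("*"):
            if path.is_file() and path.suffix in {".txt", ".md", ".py", ".c", ".h"}:
                try:
                    text = path.read_text(errors="ignore")
                except OSError:
                    continue
                for m in LICENSE_RE.finditer(text):
                    yield path, m.group(0)

    for path, hit in find_license_mentions("."):
        print(f"{path}: {hit}")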