Ask HN: Small LM or API?

4•ostefani•1h ago
Are small language models still worth it in 2026, or are most people just using APIs now?

Comments

politelemon•1h ago
Depends on what you're using it for. A small model could be viable as long as you're willing to absorb the maintenance overhead of running and deploying your own inference. A simple API would be much more cost-effective, especially if you have scaling requirements and time constraints.
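The cost trade-off above can be sketched as a quick back-of-envelope calculation. All prices and workload figures below are illustrative assumptions, not quotes from any provider:

```python
# Back-of-envelope: pay-per-token API vs an always-on self-hosted GPU.
# The $0.50/1M-token rate and $0.40/hr GPU rate are assumed placeholders.

def api_monthly_cost(tokens_per_month, price_per_million_tokens):
    """Cost of a metered API at a blended per-token price."""
    return tokens_per_month / 1_000_000 * price_per_million_tokens

def selfhost_monthly_cost(gpu_hourly_rate, hours_per_month=730):
    """Cost of keeping one GPU instance running all month."""
    return gpu_hourly_rate * hours_per_month

# Assumed workload: 50M tokens/month.
api = api_monthly_cost(50_000_000, 0.50)   # $25/mo
gpu = selfhost_monthly_cost(0.40)          # ~$292/mo
print(f"API: ${api:.0f}/mo, self-hosted GPU: ${gpu:.0f}/mo")
```

At low volume the API wins easily; self-hosting only starts to pay off once token volume is high enough that metered pricing overtakes the fixed GPU cost (and that's before counting the ops time mentioned above).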
ostefani•38m ago
The use case is a support chatbot. I see a lot of open-source models but I'm not sure it's worth it. Wouldn't replies via an API from a larger LLM be better?
JaceDev•52m ago
tbh small lms are better if u
ostefani•37m ago
But then you need to host it yourself? And wouldn't a small model give worse results?
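One thing that softens the hosting question: most local inference servers (llama.cpp's server and vLLM, for example) expose the same OpenAI-compatible /v1/chat/completions endpoint that hosted APIs use, so the chatbot client code doesn't change, only the base URL. A minimal sketch (the URLs and model names are assumed placeholders):

```python
# Build an OpenAI-compatible chat completion request. The same client
# code works against a hosted API or a local server; only base_url differs.

def chat_request(base_url, model, user_message):
    """Return the endpoint URL and JSON body for a chat completion call."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return f"{base_url}/chat/completions", payload

# Hosted API (placeholder URL):
url, body = chat_request("https://api.example.com/v1", "hosted-model", "Hi")
# Local small model, same client, different base URL (assumed local port):
url, body = chat_request("http://localhost:8080/v1", "local-model", "Hi")
```

That means you can prototype against an API and swap in a self-hosted small model later (or vice versa) without rewriting the bot.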
ok_computer_•21m ago
Gemma 4 dropped two days ago and it's a pretty direct answer to this question. Google DeepMind built it explicitly for local deployment: the 26B MoE activates only 3.8B parameters during inference (so it runs at roughly 4B-parameter cost while hitting near-31B benchmark quality), and the smaller E4B variant runs fully offline on an 8GB laptop. The 31B Dense currently ranks third among all open models on the Arena AI leaderboard. The quality-per-parameter gap between local and cloud is closing faster than most people expected.

That said, "worth it" still depends heavily on your hardware. A 4070 Ti gets you a very different answer than a 3060.
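Whether a given model fits a given card comes down to simple arithmetic: weights memory scales with *total* parameters and the quantization bit-width, even for an MoE whose compute scales with active parameters. A rough sketch (the 1.2x overhead factor for KV cache and activations is an assumed ballpark, not a measured figure):

```python
# Rough VRAM estimate for hosting a model locally at a given quantization.

BITS_PER_PARAM = {"fp16": 16, "q8": 8, "q4": 4}

def vram_gb(total_params_b, quant="q4", overhead=1.2):
    """Approximate GPU memory in GB: quantized weights plus an
    assumed overhead factor for KV cache and activations."""
    bytes_for_weights = total_params_b * 1e9 * BITS_PER_PARAM[quant] / 8
    return bytes_for_weights * overhead / 1e9

# A 26B-total-parameter MoE at 4-bit quantization: VRAM tracks the full
# 26B even though inference compute tracks only the active parameters.
print(f"{vram_gb(26, 'q4'):.1f} GB")   # ~15.6 GB
```

By this estimate a 26B model at 4-bit needs roughly 16 GB, which is why a 16GB 4070 Ti and a 12GB 3060 give such different answers.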

Disclosure: I'm building localllm-advisor.com (free, client-side), which helps answer exactly these questions: it shows which models fit your GPU, with quantization options and estimated tok/s, or which GPU you'd need to run a specific model. Relevant to the question so I'm mentioning it, but take it for what it is.