I'll add one more: an LLM small enough that it can be trained from scratch on one A100 in 24 hours. Is it really small if it costs $10,000 to train? Or should we reserve that term for $200 models?
Back to your definitions, there are sub-1B models people are using. I think I saw one in the 400-600M range for audio. Another person here posted a 100M-200M model for extracting data from web pages. We told them to just use a rules-based approach where possible, but they believed the SLM worked better.
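For the curious, the rules-based approach we had in mind looks roughly like the sketch below; it assumes the pages are structured enough for CSS selectors, and the URL and selector names are hypothetical placeholders:

    # Rules-based extraction with CSS selectors; the URL and
    # selectors below are hypothetical placeholders.
    import requests
    from bs4 import BeautifulSoup

    html = requests.get("https://example.com/item/123", timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    record = {
        "title": soup.select_one("h1.title").get_text(strip=True),
        "price": soup.select_one("span.price").get_text(strip=True),
    }
    print(record)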
Then there are projects like BabyLM that can be useful at 10M:
It would also be nice to see the resources needed for fine-tuning.
For those reasons, users might want to train a new model from scratch.
Researchers of training methods have a different problem. They need to see whether a new technique, like an optimization algorithm, gets better results. They can try techniques more quickly and cheaply if they have small training runs that are representative of what larger models do. If BabyLM-10M were representative, they could test each technique at the FLOPs/$ of a 10M model instead of a 1B model.
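A rough sketch of why that gap matters, assuming the common ~6·N·D estimate of FLOPs per training run and a Chinchilla-style ~20 tokens per parameter (both heuristics, not measured numbers):

    # Back-of-envelope training cost at two scales, assuming
    # FLOPs ~= 6 * params * tokens and ~20 tokens per parameter.
    def train_flops(params: float, tokens_per_param: float = 20.0) -> float:
        return 6.0 * params * (tokens_per_param * params)

    for n in (10e6, 1e9):
        print(f"{n/1e6:6.0f}M params: ~{train_flops(n):.1e} training FLOPs")

    # ~1.2e16 FLOPs at 10M vs ~1.2e20 at 1B: each experiment is
    # roughly 10,000x cheaper, if results transfer across scale.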
So, both researchers and users might want new models trained from scratch. The cheaper to train, the better.
Small: runs on an average laptop not optimized for LLM inference, e.g. Gemma 3 4B.
Medium: runs on a very high-spec computer that people can buy for less than $5k: 30B or 70B dense models, or larger MoEs.
Large: models that big LLM providers sell as "mini", "flash", ...
Extra Large / SOTA: Gemini 2.5 Pro, Claude 4 Opus, OpenAI o3, ...
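A back-of-envelope way to see where those tiers come from is the memory the weights alone need at common quantization levels; this ignores KV cache and activations, so real requirements run higher:

    # Weight memory only; KV cache and activations add more.
    def weight_gb(params_billion: float, bits: int) -> float:
        return params_billion * 1e9 * bits / 8 / 1024**3

    for name, p in [("Gemma 3 4B", 4), ("30B dense", 30), ("70B dense", 70)]:
        sizes = {b: round(weight_gb(p, b), 1) for b in (16, 8, 4)}
        print(name, sizes)

    # ~1.9 GB puts a 4-bit 4B model on an average laptop; ~32.6 GB
    # for a 4-bit 70B model needs the <$5k high-spec machine.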
These are typically small yet performant, both in compute and in accuracy/utility, from what I've seen.
I think with all the hype at the moment, AI/ML has sometimes become too synonymous with LLMs.
However, the brand-new Gemma 3n E2B and E4B models might fit, and they come with vision.
But as a hobbyist I would rather program with an LLM than learn a bunch of algorithms and sensor-reading details. It's also very similar to how I would think about the problem myself, which makes it easier to debug.
How is that a "language model"?
Classical time-series methods would work in that case.
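For instance, a minimal sketch of simple exponential smoothing, one classical option among many; the sensor readings here are invented:

    # Simple exponential smoothing; the next-step forecast is the
    # final smoothed level. Readings are invented examples.
    def forecast(series: list[float], alpha: float = 0.3) -> float:
        level = series[0]
        for x in series[1:]:
            level = alpha * x + (1 - alpha) * level
        return level

    print(forecast([21.0, 21.4, 22.1, 23.0, 22.8]))  # ~22.2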
This is the problem I have with the general discourse around "AI", even on Hacker News of all places. Nothing you listed is an example of a *language model*.
All of those can be implemented as a simple "if", a decision tree, or a decision table, with actual ML reserved for the camera and time-series-prediction examples.
Using an LLM is not just ridiculous here but totally the wrong fit and a waste of resources.
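To make the "simple if" point concrete, here is a sketch for a hypothetical temperature sensor; the thresholds and labels are made up:

    # The "simple if" alternative: hardcoded thresholds for a
    # hypothetical temperature sensor. Values are illustrative.
    def classify(celsius: float) -> str:
        if celsius > 30.0:
            return "overheating"
        if celsius < 5.0:
            return "too cold"
        return "ok"

    assert classify(35.0) == "overheating"
    assert classify(20.0) == "ok"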
100%. It has enough technical details that maybe a human did something. But who knows.
“tiny” can run on a microcontroller, “compact” on a Raspberry Pi, “small” on a phone, “medium” on a single-GPU machine, “large” on AI-class workstation hardware, and “huge” on a data-center cluster.
Does this mean without a dedicated electric power plant?
I wanted to say "Right, big-sized. Do you want fries with that?", but I couldn't figure out how to work that in, so I won't say it.