I'll add one more: an LLM small enough that it can be trained from scratch on one A100 in 24 hours. Is it really small if it costs $10,000 to train? Or should we reserve that term for $200 models?
Back to your definitions, there are sub-1B models people are using. I think I saw one in the 400-600M range for audio. Another person posted a 100M-200M model here for extracting data from web pages. We told them to just use a rules-based approach where possible, but they believed the SLM worked better.
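For what it's worth, the rules-based version of that extraction task is often just a selector or regex. A minimal sketch in Python, where the HTML and the pattern are made up for illustration:

    # Hypothetical sketch of the rules-based alternative: pull structured
    # fields from a page with a regex instead of a 100M-200M model.
    import re

    html = '<div class="product"><span class="price">$19.99</span></div>'
    # A price pattern like this is often stable enough that no model is needed.
    prices = re.findall(r'class="price">\$([0-9]+\.[0-9]{2})<', html)
    print(prices)  # ['19.99']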
Then there are projects like BabyLM that can be useful at 10M parameters.
It would also be nice to see the resources needed for fine-tuning.
For those reasons, users might want to train a new model from scratch.
Researchers of training methods have a different problem: they need to see whether a new technique, like an optimization algorithm, gets better results. They can try techniques more quickly and cheaply if they have small training runs that are representative of what larger models do. If BabyLM-10M were representative, they could test each technique at the FLOPs and dollar cost of a 10M model instead of a 1B model.
So, both researchers and users might want new models trained from scratch. The cheaper they are to train, the better.
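A minimal sketch of that researcher workflow, assuming PyTorch; the toy model, corpus, and hyperparameters below are stand-ins, not BabyLM's actual setup:

    # Compare two optimizers on a tiny character-level LM; if tiny runs are
    # representative, the cheap winner here predicts the winner at scale.
    import torch
    import torch.nn as nn

    text = "hello world, this is a tiny corpus for a tiny model. " * 200
    vocab = sorted(set(text))
    stoi = {c: i for i, c in enumerate(vocab)}
    data = torch.tensor([stoi[c] for c in text])

    class TinyLM(nn.Module):
        def __init__(self, vocab_size, dim=64):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, dim)
            self.rnn = nn.GRU(dim, dim, batch_first=True)
            self.head = nn.Linear(dim, vocab_size)

        def forward(self, x):
            h, _ = self.rnn(self.emb(x))
            return self.head(h)

    def final_loss(make_opt, steps=200, seq_len=64, batch=32):
        torch.manual_seed(0)  # same init and data order for both runs
        model = TinyLM(len(vocab))
        opt = make_opt(model.parameters())
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(steps):
            ix = torch.randint(0, len(data) - seq_len - 1, (batch,)).tolist()
            x = torch.stack([data[i:i + seq_len] for i in ix])
            y = torch.stack([data[i + 1:i + seq_len + 1] for i in ix])
            loss = loss_fn(model(x).reshape(-1, len(vocab)), y.reshape(-1))
            opt.zero_grad(); loss.backward(); opt.step()
        return loss.item()

    # The open question is whether the winner at this scale also wins at 1B+.
    print("AdamW:", final_loss(lambda p: torch.optim.AdamW(p, lr=3e-3)))
    print("SGD:  ", final_loss(lambda p: torch.optim.SGD(p, lr=0.5)))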
Could you post a link to this comment or thread? I can't seem to find this model by searching, but I would love to try it out.
Small: runs on an average laptop that isn't optimized for LLM inference, e.g. Gemma 3 4B.
Medium: runs on a very high-spec computer that people can buy for less than $5k: 30B or 70B dense models, or larger MoEs.
Large: Models that big LLM providers sell as "mini", "flash", ...
Extra Large / SOTA: Gemini 2.5 Pro, Claude 4 Opus, OpenAI o3, ...
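To make those boundaries concrete, here's a rough sketch that buckets a dense parameter count into the same tiers. The cutoffs are my own guesses from the hardware descriptions above, not anything official:

    # Rough sketch: bucket a dense-model parameter count into the tiers above.
    # All cutoffs are guesses inferred from the hardware tiers, not standards.
    def size_tier(params_billion: float) -> str:
        if params_billion <= 8:
            return "small"            # average laptop, e.g. Gemma 3 4B
        if params_billion <= 120:
            return "medium"           # <$5k machine: 30B-70B dense, larger MoEs
        if params_billion <= 400:
            return "large"            # providers' "mini"/"flash" offerings
        return "extra large / SOTA"   # Gemini 2.5 Pro, Claude 4 Opus, o3, ...

    print(size_tier(4))   # -> small
    print(size_tier(70))  # -> medium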
These are typically small and, from what I've seen, performant in both compute cost and accuracy/utility.
I think with all the hype at the moment, AI/ML has become too synonymous with LLMs.
However, the brand-new Gemma 3n E2B and E4B models might fit, with vision support.
But as a hobbyist I would rather program with an LLM than learn a bunch of algorithms and sensor-reading details. It's also very similar to how I would think about the problem myself, which makes it easier to debug.
The solution for water-constrained operations in the Americas is to move to a location with more water, not AI.
For field crops in the Americas, land and water are too cheap and crop prices too low for AI optimization to pay off in the present era. The Americas (10% of world population) could meet 70% of world food demand if pressed with today's technologies, and 40% without breaking a sweat. The Americas are blessed.
Talk to the Saudis, the Israelis, etc., but even there you will lose more production by interfering with the motivations, engagement levels, and cultures of working farmers than can be gained by optimizing through any complex, opaque technological scheme, AI or not. New cultivars, new chemicals, even new machinery…few problems there (but see India for counterexamples). Changing millennia of farming practice with expensive, opaque, not-locally-maintainable technology…just no. A great truth learned over the last 70 years of development.
Keep in mind that there are other wireless communication systems, long-range and low-power, that are specifically designed to handle this scenario.
How is that a "language model"?
Treating it as time-series data would work in that case.
This is the problem I have with the general discourse around "AI", even on Hacker News of all places. Nothing you listed is an example of a *language model*.
All of those can be implemented as a simple "if", a decision tree, or a decision table, with actual ML needed only for the camera and time-series prediction examples.
Using an LLM is not just ridiculous here but totally the wrong fit and a waste of resources.
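For example, the "simple if" version of a sensor-driven decision can literally be one rule. A sketch where the sensor names and thresholds are invented for illustration:

    # Hypothetical example: the entire "model" is one threshold rule,
    # doing a job an LLM would be overkill for. Numbers are made up.
    def should_irrigate(soil_moisture_pct: float, rain_forecast_mm: float) -> bool:
        # Water only when the soil is dry and no meaningful rain is coming.
        return soil_moisture_pct < 20.0 and rain_forecast_mm < 2.0

    print(should_irrigate(15.0, 0.0))   # True: dry, no rain expected
    print(should_irrigate(15.0, 10.0))  # False: the rain will handle it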
Time and labor are resources too. There's a whole host of problems where "good enough" is tremendously valuable.
100%. It has enough technical details that maybe a human did something. But who knows.
“tiny” can run on a microcontroller, “compact” on a Raspberry Pi, “small” on a phone, “medium” on a single-GPU machine, “large” on AI-class workstation hardware, and “huge” on a data-center cluster.
Does this mean without a dedicated electric power plant?
I wanted to say "Right, big-sized. Do you want fries with that?", but I couldn't figure out how to work that in, so I won't say it.
That, and running local LLMs pretty much requires an outlet: the GPU alone goes up to 50 watts, and battery life drops from many hours to less than one.
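Rough numbers, assuming a typical ~60 Wh laptop battery: at ~5 W of light use that's about 12 hours, but with the GPU pulling 50 W plus roughly 10 W for everything else you're draining ~60 W, and 60 Wh / 60 W ≈ 1 hour.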
Also, when your laptop is using 25 watts just for the RAM, that's ~20 watts less for the CPU/GPU when you want to power them up.