frontpage.

Will Future Generations Think We're Gross?

https://chillphysicsenjoyer.substack.com/p/will-future-generations-think-were
1•crescit_eundo•2m ago•0 comments

Kernel Key Retention Service

https://www.kernel.org/doc/html/latest/security/keys/core.html
1•networked•2m ago•0 comments

State Department will delete Xitter posts from before Trump returned to office

https://www.npr.org/2026/02/07/nx-s1-5704785/state-department-trump-posts-x
1•righthand•5m ago•0 comments

Show HN: Verifiable server roundtrip demo for a decision interruption system

https://github.com/veeduzyl-hue/decision-assistant-roundtrip-demo
1•veeduzyl•6m ago•0 comments

Impl Rust – Avro IDL Tool in Rust via Antlr

https://www.youtube.com/watch?v=vmKvw73V394
1•todsacerdoti•7m ago•0 comments

Stories from 25 Years of Software Development

https://susam.net/twenty-five-years-of-computing.html
2•vinhnx•7m ago•0 comments

minikeyvalue

https://github.com/commaai/minikeyvalue/tree/prod
3•tosh•12m ago•0 comments

Neomacs: GPU-accelerated Emacs with inline video, WebKit, and terminal via wgpu

https://github.com/eval-exec/neomacs
1•evalexec•17m ago•0 comments

Show HN: Moli P2P – An ephemeral, serverless image gallery (Rust and WebRTC)

https://moli-green.is/
2•ShinyaKoyano•21m ago•1 comment

How I grow my X presence?

https://www.reddit.com/r/GrowthHacking/s/UEc8pAl61b
2•m00dy•22m ago•0 comments

What's the cost of the most expensive Super Bowl ad slot?

https://ballparkguess.com/?id=5b98b1d3-5887-47b9-8a92-43be2ced674b
1•bkls•23m ago•0 comments

What if you just did a startup instead?

https://alexaraki.substack.com/p/what-if-you-just-did-a-startup
3•okaywriting•30m ago•0 comments

Hacking up your own shell completion (2020)

https://www.feltrac.co/environment/2020/01/18/build-your-own-shell-completion.html
2•todsacerdoti•33m ago•0 comments

Show HN: Gorse 0.5 – Open-source recommender system with visual workflow editor

https://github.com/gorse-io/gorse
1•zhenghaoz•33m ago•0 comments

GLM-OCR: Accurate × Fast × Comprehensive

https://github.com/zai-org/GLM-OCR
1•ms7892•34m ago•0 comments

Local Agent Bench: Test 11 small LLMs on tool-calling judgment, on CPU, no GPU

https://github.com/MikeVeerman/tool-calling-benchmark
1•MikeVeerman•35m ago•0 comments

Show HN: AboutMyProject – A public log for developer proof-of-work

https://aboutmyproject.com/
1•Raiplus•35m ago•0 comments

Expertise, AI and Work of Future [video]

https://www.youtube.com/watch?v=wsxWl9iT1XU
1•indiantinker•36m ago•0 comments

So Long to Cheap Books You Could Fit in Your Pocket

https://www.nytimes.com/2026/02/06/books/mass-market-paperback-books.html
3•pseudolus•36m ago•1 comment

PID Controller

https://en.wikipedia.org/wiki/Proportional%E2%80%93integral%E2%80%93derivative_controller
1•tosh•40m ago•0 comments

SpaceX Rocket Generates 100GW of Power, or 20% of US Electricity

https://twitter.com/AlecStapp/status/2019932764515234159
2•bkls•40m ago•0 comments

Kubernetes MCP Server

https://github.com/yindia/rootcause
1•yindia•42m ago•0 comments

I Built a Movie Recommendation Agent to Solve Movie Nights with My Wife

https://rokn.io/posts/building-movie-recommendation-agent
4•roknovosel•42m ago•0 comments

What were the first animals? The fierce sponge–jelly battle that just won't end

https://www.nature.com/articles/d41586-026-00238-z
2•beardyw•50m ago•0 comments

Sidestepping Evaluation Awareness and Anticipating Misalignment

https://alignment.openai.com/prod-evals/
1•taubek•50m ago•0 comments

OldMapsOnline

https://www.oldmapsonline.org/en
2•surprisetalk•52m ago•0 comments

What It's Like to Be a Worm

https://www.asimov.press/p/sentience
2•surprisetalk•52m ago•0 comments

Don't go to physics grad school and other cautionary tales

https://scottlocklin.wordpress.com/2025/12/19/dont-go-to-physics-grad-school-and-other-cautionary...
2•surprisetalk•53m ago•0 comments

Lawyer sets new standard for abuse of AI; judge tosses case

https://arstechnica.com/tech-policy/2026/02/randomly-quoting-ray-bradbury-did-not-save-lawyer-fro...
5•pseudolus•53m ago•0 comments

AI anxiety batters software execs, costing them combined $62B: report

https://nypost.com/2026/02/04/business/ai-anxiety-batters-software-execs-costing-them-62b-report/
1•1vuio0pswjnm7•53m ago•0 comments

Why are your models so big? (2023)

https://pawa.lt/braindump/tiny-models/
38•jxmorris12•2mo ago

Comments

siddboots•2mo ago
I think I have almost the opposite intuition. The fact that attention models are capable of making sophisticated logical constructions within a recursive grammar, even for a simple DSL like SQL, is kind of surprising. I think it’s likely that this property does depend on training on a very large and more general corpus, and hence demands the full parameter space that we need for conversational writing.
semiinfinitely•2mo ago
I don’t understand why today’s laptops are so large. Some of the smallest "ultrabooks" getting coverage sit at 13 inches, but even this seems pretty big to me.

If you need raw compute, I totally get it. Things like compiling the Linux kernel or training local models require a high level of thermal headroom, and the chassis has to dissipate heat in a manner that prevents throttling. In cases where you want the machine to act like a portable workstation, it makes sense that the form factor would need to be a little juiced up.

That said, computing is a whole lot more than just heavy development work. There are some domains that have a tightly-scoped set of inputs and require the user to interact in a very simple way. Something like responding to an email is a good example: typing "LGTM" requires a very small screen area, and it requires no physical keyboard or active cooling. Checking the weather is similar: you don’t need 16 inches of screen real estate to go from wondering if it’s raining to seeing a cloud icon.

I say all this because portability is expensive. It’s expensive not only in terms of back pain: maintaining the ecosystem required to run these machines gets pretty complicated. You either end up shelling out money for specialized backpacks or fighting for outlet space at a coffee shop just to keep the thing running. In either case, you’re paying big money (and calorie) costs every time a user types "remind me to eat a sandwich."

I think the future will be full of much smaller devices. Some hardware to build these already exists, and you can even fit them in your pocket. This mode of deployment is inspiring to me, and I’m optimistic about a future where 6.1 inches is all you need.

Archelaos•2mo ago
A typical use case for a large laptop is when you want to store it away after work or only carry it occasionally. I have a PC for coding at home, but use a ThinkPad with the largest screen I could get for coding in my camper van (storing it away when not using it, because of lack of space) or when staying at my mother's home for longer (setting it up once at the start of my visit). I also have another very small, light, and inexpensive subnotebook that I can carry around easily, but I rarely use it these days, and not for coding at all.
bee_rider•2mo ago
I dunno. It kinda works, and points for converting the whole article. But something is lost in the switch-up here. The size of a laptop is more or less the size of the display (unless we’re going to get weird and have a projector built in), so it is basically a figure-of-merit.

Nobody actually wants more weights in their LLMs, right? They want the things to be “smarter” in some sense.

hobs•2mo ago
With a comfortable spread, my hands are 9.5 inches from pinky to thumb; a thirteen-inch laptop is so painfully small I can barely use it.
tebruno99•2mo ago
Try being over 30 after sitting at a desk your whole life, and then try to use a 13” screen. Eye strain is a huge deal.

My opinion on this changed drastically when I started interacting with people outside of tech who aren't my own age. A device you struggle to see is miserable.

unleaded•2mo ago
Still relevant today. Many problems people throw onto LLMs can be done more efficiently with text completion than begging a model 20x the size (and probably more than 20x the cost) to produce the right structured output. https://www.reddit.com/r/LocalLLaMA/comments/1859qry/is_anyo...
_ea1k•2mo ago
Why would you do that when you could spend months building metadata and failing to tune prompts for a >100B parameter LLM? /s
crystal_revenge•2mo ago
I used to work very heavily with local models and swore by text completion despite many people thinking it was insane that I would choose not to use a chat interface.

LLMs are designed for text completion, and the chat interface is basically a fine-tuning hack that turns prompting into a natural form of text completion so the average user gets a more "intuitive" interface (I don't even want to think about how many AI "enthusiasts" don't really understand this).

But with open/local models in particular: each instruct/chat interface is slightly different. There are tools that help mitigate this, but the more closely you're working with the model, the more likely you are to make a stupid mistake because you didn't understand some detail of how the instruct interface was fine-tuned.

Once you accept that LLMs are "auto-complete on steroids", you can get much better results by programming them the way they were naturally designed to work. It also helps a lot with prompt engineering, because you can more easily understand what the model's natural tendency is and work with that to get better results.

It's funny, because a good chunk of my comments on HN these days are combating AI hype, but man, LLMs really are fascinating to work with if you approach them with a more clear-headed perspective.
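
To make that concrete, here is a minimal Python sketch of the two prompting styles side by side, assuming a local Hugging Face checkpoint (the model name is only a placeholder; any small causal LM with a chat template would do):

    # Sketch: plain completion vs. chat-templated prompting (Hugging Face transformers).
    # The checkpoint name is a placeholder for whatever local model you actually run.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder
    tok = AutoTokenizer.from_pretrained(model_id)
    lm = AutoModelForCausalLM.from_pretrained(model_id)

    # 1) Raw text completion: the model simply continues the string.
    prompt = "SELECT name, total FROM orders WHERE"
    ids = tok(prompt, return_tensors="pt").input_ids
    out = lm.generate(ids, max_new_tokens=30)
    print(tok.decode(out[0], skip_special_tokens=True))

    # 2) Chat-style prompting: the same completion engine, but the tokenizer wraps
    # the message in whatever instruct template this particular model was tuned on.
    messages = [{"role": "user", "content": "Finish this SQL query: SELECT name, total FROM orders WHERE"}]
    chat_ids = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
    out = lm.generate(chat_ids, max_new_tokens=30)
    print(tok.decode(out[0], skip_special_tokens=True))

The second path is still completion underneath; the template is the only part that differs from model to model, which is exactly where the mistakes tend to creep in.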

hippo22•2mo ago
Maybe? The loop process of try-fail-try-again-succeed is pretty powerful. Not sure how you get that purely with text completion.
lsb•2mo ago
My threshold for “does not need to be smaller” is “can this run on a Raspberry Pi”. This is a helpful benchmark for maximum likely useful optimization.

A Pi has 4 cores and 16GB of memory these days, so running Qwen3 4B on a Pi is pretty comfortable: https://leebutterman.com/2025/11/01/prompt-optimization-on-a...
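
For reference, a rough sketch of what CPU-only inference like that looks like with llama-cpp-python; the GGUF filename is a placeholder, and a ~4-bit quant of a 4B model fits comfortably in 16GB of RAM:

    # Sketch: small quantized model on CPU via llama-cpp-python.
    # The GGUF path is a placeholder; use whatever quantized file you downloaded.
    from llama_cpp import Llama

    llm = Llama(model_path="qwen3-4b-instruct-q4_k_m.gguf",  # placeholder file
                n_ctx=2048,
                n_threads=4)  # one thread per Pi core

    out = llm("Summarize in one sentence why small models are useful:",
              max_tokens=64)
    print(out["choices"][0]["text"])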

debo_•2mo ago
2000: My spoon is too big

2023: My model is too big

lynndotpy•2mo ago
> I think the future will be full of much smaller models trained to do specific tasks.

This was the very recent past! Up until we got LLM-crazy in 2021, this was the primary thing that deep learning papers produced: New models meant to solve very specific tasks.

_ea1k•2mo ago
Yeah, it is insane how many people think that tuning models is nearly impossible, or that it requires a multibillion dollar data center.

It is one of the weirdest variations of people buying into too much hype.

socketcluster•2mo ago
The incumbents are trying to fully control the market, but they don't have a justification for that. A company like Google, which already had a monopoly over search, needs to convince the market that this will allow it to expand past search. If the narrative is that anyone can run a specialized model on their own machine for different tasks, that doesn't justify AI companies selling themselves on the assumption of a total market monopoly and a stranglehold over the economy.

They cannot sell themselves without concealing reality. This is not a new thing. There were a lot of suppressed projects in the blockchain industry: everyone denied their existence, most people never heard about them, and people talk as if the best coin in existence can do a measly 4 transactions per second, as if that were state of the art... Solutions like the "Lightning Network" don't actually work, but they are pitched as revolutionary... I bet there are more people shilling Bitcoin's Lightning Network than there are people actually using it. This is the power of centralized financial incentives. Everyone ends up operating on top of a shared deception, "the official truth", which may not be true at all.

forgotTheLast•2mo ago
One argument against local fine-tuning was that by the time you were done training your finetune of model N, model N+1 was out and outperformed your finetune out of the box. That kinda stopped being the case last year, though.
brainless•2mo ago
May I add GLiNER to this? The original Python version and the Rust version. Fantastic (non-LLM) models for entity extraction. There are many others.

I really think using small models for a lot of small tasks is the best way forward, but it's not easy to orchestrate.
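
For anyone curious, this is roughly how the GLiNER Python package gets used for zero-shot extraction; the checkpoint name and label set here are only illustrative:

    # Sketch: zero-shot entity extraction with the gliner package.
    # Checkpoint name and labels are illustrative, not a recommendation.
    from gliner import GLiNER

    model = GLiNER.from_pretrained("urchade/gliner_small-v2.1")

    text = "Apple is opening a new office in Austin, Texas in March 2026."
    labels = ["organization", "location", "date"]

    for ent in model.predict_entities(text, labels):
        print(ent["label"], "->", ent["text"])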

jgalt212•2mo ago
The net $5.5T the Fed printed had to go somewhere. The AI arms race was the answer. And when the models got good, we needed agentic AI to create unbounded demand for inference, just as there was unbounded demand for training.

https://fred.stlouisfed.org/series/WALCL

lioeters•2mo ago
The graph is horrifying. Before the 2008 crisis it was under $1 trillion. By the time of the 2020 crisis it had hit $4 trillion, then over the next few years it more than doubled to $9 trillion. That may help explain why the rich are swimming in free money while the underclass can't afford to live anymore. With AI eating up the job market, we seem to be headed for another, even bigger crisis.
K0IN•2mo ago
I'm always so surprised that embedding models we've had for years, like MiniLM (~80MB), are so small, and I really wonder why more on-device searches don't use something like them.
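
A minimal sketch of that kind of on-device search with sentence-transformers (all-MiniLM-L6-v2 is the roughly 80MB model usually meant here; the documents and query are just examples):

    # Sketch: tiny on-device semantic search with a MiniLM embedding model.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # ~80MB on disk

    docs = ["invoice from the dentist",
            "photos from the lake trip",
            "notes on the PID controller article"]
    doc_emb = model.encode(docs, convert_to_tensor=True)

    query_emb = model.encode("tooth bill", convert_to_tensor=True)
    scores = util.cos_sim(query_emb, doc_emb)[0]
    print(docs[int(scores.argmax())])  # best match for the query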
musicandpiss•2mo ago
Thank you for sharing :-)