I propose a theory: LLMs do not actually require a lot of computing power. What they require is a small computer, but they are bloated with unnecessary information to artificially inflate demand for high-powered hardware. So the real reason models are the way they are, especially local models, is that they are deliberately badly optimized. As I said, complexity is even added on purpose to drive hardware sales.
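To make the claim testable, here is a rough sketch of what inference demands under some assumed numbers (a hypothetical 7-billion-parameter model stored at 16-bit precision, and the common ~2 FLOPs-per-parameter-per-token rule of thumb for a dense transformer). These are illustrative assumptions, not measurements of any specific model; you can plug in your own figures and judge whether the result fits on "a small computer":

```python
# Back-of-envelope inference cost for an assumed 7B-parameter model.
# All numbers are illustrative assumptions, not benchmarks.

params = 7e9              # assumed parameter count
bytes_per_param = 2       # fp16/bf16 weight storage

# Memory just to hold the weights (ignoring activations and KV cache)
weight_memory_gb = params * bytes_per_param / 1e9

# Rule of thumb: ~2 FLOPs per parameter per generated token
flops_per_token = 2 * params

print(f"weights alone: ~{weight_memory_gb:.0f} GB")
print(f"compute per generated token: ~{flops_per_token / 1e9:.0f} GFLOPs")
```

Whether those figures count as "a lot" of computing power, or as artificially inflated, is exactly what I'm asking you to argue from first principles.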
What do you see as the pros and cons of this theory? And do not fall back on secondhand information, as in "well, I was told" or "well, I read a paper".