I am wondering why people rave so much about local "small device" LLMs vs. what Codex or Claude Code are capable of.
Sadly there is too much hype around local LLMs; they look great for 5-minute tests and that's it.
The use of NVFP4 results in a 3.5x reduction in model memory footprint relative to FP16 and a 1.8x reduction compared to FP8, while maintaining model accuracy with less than 1% degradation on key language modeling tasks for some models.
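As a back-of-envelope check on those ratios (assuming NVFP4 stores 4-bit values plus one FP8 scale per 16-element block, i.e. roughly 4.5 effective bits per weight; the exact overhead is an assumption on my part):

    # Rough footprint estimate for a hypothetical 70B-parameter model.
    # Assumption: NVFP4 ~= 4 bits/weight + one FP8 scale per 16-weight
    # block ~= 4.5 effective bits per weight.
    params = 70e9
    bits_per_weight = {"FP16": 16, "FP8": 8, "NVFP4": 4 + 8 / 16}
    for fmt, bits in bits_per_weight.items():
        gb = params * bits / 8 / 1e9
        print(f"{fmt}: ~{gb:.0f} GB")
    # FP16: ~140 GB, FP8: ~70 GB, NVFP4: ~39 GB
    # 16 / 4.5 ~= 3.5x vs FP16; 8 / 4.5 ~= 1.8x vs FP8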
babblingfish•1h ago
gedy•1h ago
On device I would gladly pay for good hardware - it's my machine and I'm using it as I see fit, like an IDE.
aurareturn•1h ago
gedy•51m ago
Code tools that free up my time are very nice.
aurareturn•1h ago
I'm not convinced that local LLMs use less electricity either. Per token at the same level of intelligence, cloud LLMs should run circles around local LLMs in efficiency (a rough sketch of why is below). If they don't, what are we paying hundreds of billions of dollars for?
I think local LLMs will continue to grow and there will be a "ChatGPT" moment for them when good enough models meet good enough hardware. We're not there yet though.
Note, this is why I'm big on investing in chip manufacturing companies. Not only are they completely maxed out due to cloud LLMs, but soon they will be doubly maxed out having to replace local computer chips with ones suited to AI inference. This is a massive transition and will fuel another chip manufacturing boom.
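Here is the rough sketch (all numbers are illustrative assumptions, not measurements). Decoding at batch size 1 is memory-bandwidth-bound: a local machine re-reads every weight for each token, while a cloud GPU amortizes that same weight read over a whole batch of concurrent requests.

    # Back-of-envelope: decode throughput is roughly bounded by
    # bandwidth / bytes of weights read per step, and a batch of B
    # requests shares one weight read. All figures are assumptions.
    model_gb = 18                      # e.g. a ~35B model at 4-bit
    local_bw, local_batch = 273, 1     # GB/s, laptop-class memory
    cloud_bw, cloud_batch = 8000, 64   # GB/s, HBM server GPU
    local_tps = local_bw / model_gb * local_batch
    cloud_tps = cloud_bw / model_gb * cloud_batch
    print(f"local: ~{local_tps:.0f} tok/s")
    print(f"cloud: ~{cloud_tps:.0f} tok/s across the batch")
    # The server moves far more tokens per unit of DRAM traffic,
    # which is where the per-token efficiency gap comes from.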
AugSun•1h ago
QuantumNomad_•41m ago
Ericson2314•34m ago
CC: Claude Code
TC: total comp(ensation)
virtue3•52m ago
The WebGPU model in my browser on my M4 Pro MacBook was as good as ChatGPT 3.5 and doing 80+ tokens/s.
Local is here.
raincole•13m ago
It's just wishful thinking (and hatred towards American megacorps). Old as the hills. Understandable, but not based on reality.
AugSun•1h ago
selcuka•22m ago
ChatGPT free falls back to GPT-5.2 Mini after a few interactions.
melvinroest•1h ago
Recently I built a GraphRAG app with Qwen 3.5 4b for small tasks like classifying what type of question I am asking, and for the entity extraction itself, since GraphRAG depends on extracted triplets (entity1, relationship_to, entity2). I used Qwen 3.5 27b for actually answering my questions.
It works pretty well. I have to be a bit patient but that’s it. So in that particular use case, I would agree.
I used MLX on my M1 64GB machine. I found that MLX is definitely faster at extracting entities and triplets in batches.
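A minimal sketch of what that extraction step can look like, assuming the mlx-lm Python API (the model name, prompt, and output format here are illustrative placeholders):

    # Sketch: GraphRAG-style triplet extraction with a small local
    # model via mlx-lm. Model name and prompt are illustrative.
    from mlx_lm import load, generate

    model, tokenizer = load("mlx-community/Qwen2.5-3B-Instruct-4bit")

    def extract_triplets(text):
        prompt = (
            "Extract (entity1, relationship_to, entity2) triplets "
            "from the text below, one per line as: e1 | rel | e2\n\n"
            + text
        )
        out = generate(model, tokenizer, prompt=prompt, max_tokens=256)
        triplets = []
        for line in out.splitlines():
            parts = [p.strip() for p in line.split("|")]
            if len(parts) == 3:
                triplets.append(tuple(parts))
        return triplets

    print(extract_triplets("Ada Lovelace worked with Charles Babbage."))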
pezgrande•1h ago
aurareturn•58m ago
When VCs inevitably ask their AI labs to start making money or shut down, those free open source LLMs will cease to be free.
Chinese AI labs have to release free open source models because they distill from OpenAI and Anthropic. They will always be behind. Therefore, they can't charge the same prices as OpenAI and Anthropic. Free open source is how they can get attention and how they can stay fairly close to OpenAI and Anthropic. They have to distill because they're banned from Nvidia chips and TSMC.
Before people tell me Chinese AI labs do use Nvidia chips: there is a huge difference between using gimped older Nvidia H100s (the H20) or sneaking Blackwell chips around Southeast Asia, and officially being allowed to buy millions of Nvidia's latest chips to build massive gigawatt data centers.
spiderfarmer•26m ago
Car manufacturers said the same.
aurareturn•22m ago
pezgrande•17m ago
They don't really have to, though; they just need to be good enough and cheaper (even if distilled). That being said, it is true they are gaining a lot of visibility (especially Qwen) because of being open source (open weight, really).
Hardware-wise, they seem likely to catch up in 3-5 years (Nvidia is kind of irrelevant; what matters is the process node).