
Ollama is now powered by MLX on Apple Silicon in preview

https://ollama.com/blog/mlx
109•redundantly•2h ago

Comments

babblingfish•1h ago
LLMs on device are the future. It's more secure, it relieves the mismatch between inference demand and data center supply, and it would use less electricity. It's just a matter of getting the performance good enough. Most users don't need frontier model performance.
gedy•1h ago
Man I really hope so. As much as I like Claude Code, I hate the company paying for it and tracking your usage, the bullshit management control, etc. I feel like I'm training my replacement. Things feel like they're tightening rather than giving us more power and freedom.

On device I would gladly pay for good hardware - it's my machine and I'm using it as I see fit, like an IDE.

aurareturn•1h ago
When local LLMs get good enough for you to use delightfully, cloud LLMs will have gotten so much smarter that you'll still use them for stuff that needs more intelligence.
gedy•51m ago
True, but I'm already producing code/features faster than the company knows what to do with (even though every company says "omg we need this yesterday", etc). It was basically the same even before AI coding.

Code tools that free up my time are very nice.

aurareturn•1h ago
It isn't going to replace cloud LLMs since cloud LLMs will always be faster in throughput and smarter. Cloud and local LLMs will grow together, not replace each other.

I'm not convinced that local LLMs use less electricity either. Per token at the same level of intelligence, cloud LLMs should run circles around local LLMs in efficiency. If it doesn't, what are we paying hundreds of billions of dollars for?

I think local LLMs will continue to grow, and there will be a "ChatGPT moment" for them when good-enough models meet good-enough hardware. We're not there yet though.

Note: this is why I'm big on investing in chip manufacturing companies. Not only are they completely maxed out due to cloud LLMs, but soon they will be doubly maxed out having to replace local computer chips with ones suited for AI inference. This is a massive transition and will fuel another chip manufacturing boom.

AugSun•1h ago
Looking at the downvotes, I feel good about the SDE future in 3-5 years. We will have a swamp of "vibe-experts" who won't be able to pay 100K a month for CC. Meanwhile, people who still remember how to code in Vim will (slowly) get back to pre-COVID TC levels.
QuantumNomad_•41m ago
What is CC and TC? I have not heard these abbreviations (except for CC to mean credit card or carbon copy, neither of which is what I think you mean here).
Ericson2314•34m ago
I figured it out from context clues

CC: Claude Code

TC: total comp(ensation)

virtue3•52m ago
We are 100% there already. In browser.

The WebGPU model in my browser on my M4 Pro MacBook was as good as ChatGPT 3.5 and doing 80+ tokens/s.

Local is here.

raincole•13m ago
Yep. People were claiming DeepSeek was "almost as good as SOTA" when it came out. Local will always be one step away, like fusion.

It's just wishful thinking (and hatred towards American megacorps). Old as the hills. Understandable, but not based on reality.

AugSun•1h ago
"Most users don't need frontier model performance" unfortunately, this is not the case.
selcuka•22m ago
Any citations? Because that was my impression, too. I want frontier model performance for my coding assistant, but "most users" could do with smaller/faster models.

ChatGPT free falls back to GPT-5.2 Mini after a few interactions.

melvinroest•1h ago
I have journaled digitally for the last 5 years with this expectation.

Recently I built a graphRAG app with Qwen 3.5 4b for small tasks like classifying what type of question I am asking, or for the entity extraction process itself, since graphRAG depends on extracted triplets (entity1, relationship_to, entity2). I used Qwen 3.5 27b for actually answering my questions.

It works pretty well. I have to be a bit patient but that’s it. So in that particular use case, I would agree.

I used MLX and my M1 64GB device. I found that MLX definitely works faster when it comes to extracting entities and triplets in batches.
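The graphRAG flow described above hinges on turning the model's raw extraction output into (entity1, relationship_to, entity2) triplets. A minimal sketch of that parsing step, assuming the model is prompted to emit one parenthesized triplet per line (the format and names here are illustrative, not from the post):

```python
import re

# Hypothetical triplet format: "(entity, relation, entity)" per line of LLM output.
TRIPLET_RE = re.compile(r"\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)")

def parse_triplets(llm_output: str) -> list[tuple[str, str, str]]:
    """Extract (entity1, relationship_to, entity2) tuples from raw model text."""
    return [m.groups() for m in TRIPLET_RE.finditer(llm_output)]

def build_graph(triplets):
    """Index triplets by source entity for quick neighborhood lookups."""
    graph: dict[str, list[tuple[str, str]]] = {}
    for head, rel, tail in triplets:
        graph.setdefault(head, []).append((rel, tail))
    return graph

out = "(Ollama, powered_by, MLX)\n(MLX, runs_on, Apple Silicon)"
graph = build_graph(parse_triplets(out))
# graph maps each head entity to its (relation, tail) pairs
```

The regex-based parse is deliberately forgiving, since small models often add chatter around the triplets; a production pipeline would also deduplicate and normalize entity names.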

pezgrande•1h ago
You could argue that the only reason we have good open-weight models is that companies are trying to undermine the big dogs, and they are spending millions to make sure they don't get too far ahead. If the bubble pops, there won't be an incentive to keep doing it.
aurareturn•58m ago
I agree. I can totally see open-source LLMs turning into paying a lump sum for the model in the future. Many will shut down. Some will turn into closed-source labs.

When VCs inevitably ask their AI labs to start making money or shut down, those free open-source LLMs will cease to be free.

Chinese AI labs have to release free open source models because they distill from OpenAI and Anthropic. They will always be behind. Therefore, they can't charge the same prices as OpenAI and Anthropic. Free open source is how they can get attention and how they can stay fairly close to OpenAI and Anthropic. They have to distill because they're banned from Nvidia chips and TSMC.

Before people tell me Chinese AI labs do use Nvidia chips: there is a huge difference between using older, gimped Nvidia H100 chips (the H20) or sneaking around Southeast Asia for Blackwell chips, and officially being allowed to buy millions of Nvidia's latest chips to build massive gigawatt data centers.

spiderfarmer•26m ago
“They will always be behind”

Car manufacturers said the same.

aurareturn•22m ago
It did take decades for them to catch up to and surpass US car makers, right?
pezgrande•17m ago
> have to release free open source models because they distill from OpenAI and Anthropic

They don't really have to though; they just need to be good enough and cheaper (even if distilled). That being said, it is true they are gaining a lot of visibility (especially Qwen) because of being open-source (open-weight).

Hardware-wise they seem like they will catch up in 3-5 years (Nvidia is kind of irrelevant; what matters is the node).

codelion•1h ago
How does it compare to some of the newer mlx inference engines like optiq that support turboquantization - https://mlx-optiq.pages.dev/
dial9-1•1h ago
Still waiting for the day I can comfortably run Claude Code with local LLMs on macOS with only 16GB of RAM.
gedy•1h ago
How close is this? It says it needs 32GB min?
HDBaseT•1h ago
You can run Qwen3.5-35B-A3B on 32GB of RAM, sure, although to get "Claude Code" performance (by which I assume he means Sonnet- or Opus-level models in 2026), it will likely be a few years before that's runnable locally on reasonable hardware.
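The 32GB figure checks out with back-of-envelope math: the weights-only footprint is the parameter count times bits per weight. A rough sketch (the 1.2x overhead factor for KV cache, activations, and runtime is an assumption, not a measured number):

```python
def approx_model_ram_gb(params_billion: float, bits_per_weight: float,
                        overhead: float = 1.2) -> float:
    """Rough RAM estimate: weight bytes times a fudge factor for
    KV cache, activations, and runtime overhead (1.2x is an assumption)."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9  # decimal GB, close enough here

# A 35B-parameter model at 4-bit: 17.5 GB of weights, ~21 GB with overhead,
# which is why it fits in 32 GB of unified memory but not 16 GB.
estimate = approx_model_ram_gb(35, 4)
```

The same arithmetic explains the 16GB ceiling in the parent comment: even a 4-bit ~20B model plus OS and KV cache leaves little headroom.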
Foobar8568•54m ago
I fully agree. I run that one at Q4 on my MBP, and the performance (including the quality of responses) is a letdown.

I wonder how people rave so much about local "small device" LLMs vs what Codex or Claude Code are capable of.

Sadly there's too much hype around local LLMs; they look great for 5-minute tests and that's it.

brcmthrowaway•51m ago
Just train it better with AGENTS.md
LuxBennu•1h ago
Already running Qwen 70B 4-bit on an M2 Max 96GB through llama.cpp and it's pretty solid for day-to-day stuff. The MLX switch is interesting because Ollama was basically shelling out to llama.cpp on Mac before, so native MLX should mean better memory handling on Apple Silicon. Curious to see how it compares on the bigger models vs the GGUF path.
AugSun•1h ago
"We can run your dumbed down models faster":

> The use of NVFP4 results in a 3.5x reduction in model memory footprint relative to FP16 and a 1.8x reduction compared to FP8, while maintaining model accuracy with less than 1% degradation on key language modeling tasks for some models.
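The quoted ratios are internally consistent: both imply roughly 4.5 effective bits per weight for NVFP4, which matches 4 data bits plus per-block scale-factor overhead. A quick sanity check of the arithmetic:

```python
# Sanity-checking the quoted compression ratios against the bit widths.
fp16_bits, fp8_bits = 16.0, 8.0

effective_bits_vs_fp16 = fp16_bits / 3.5  # ~4.57 bits/weight
effective_bits_vs_fp8 = fp8_bits / 1.8    # ~4.44 bits/weight

# Both quoted ratios imply roughly 4.5 effective bits per weight:
# 4 data bits plus the shared per-block scaling factors.
assert 4.0 < effective_bits_vs_fp16 < 5.0
assert 4.0 < effective_bits_vs_fp8 < 5.0
```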

brcmthrowaway•49m ago
What is the difference between Ollama, llama.cpp, ggml and gguf?
xiconfjs•45m ago
Ollama on macOS is a one-click solution with stable one-click updates. Happy so far. MLX support was the only missing piece for me.
benob•31m ago
Ollama is a user-friendly UI for LLM inference. It is powered by llama.cpp (or a fork of it), which is more power-user oriented and requires command-line wrangling. GGML is the math library behind llama.cpp, and GGUF is the associated file format used for storing LLM weights.
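To make the GGUF part concrete: a GGUF file is a little-endian container that opens with a fixed header (the ASCII magic "GGUF", a version number, then tensor and metadata-entry counts) before the metadata key/values and tensor data. A toy sketch that builds and re-parses just that header, not a full parser:

```python
import struct

# GGUF fixed header (little-endian): 4-byte magic "GGUF", uint32 version,
# uint64 tensor count, uint64 metadata key/value count.
def build_gguf_header(version: int, n_tensors: int, n_kv: int) -> bytes:
    return b"GGUF" + struct.pack("<IQQ", version, n_tensors, n_kv)

def parse_gguf_header(blob: bytes):
    """Validate the magic and return (version, n_tensors, n_kv)."""
    if blob[:4] != b"GGUF":
        raise ValueError(f"not a GGUF file: magic={blob[:4]!r}")
    return struct.unpack_from("<IQQ", blob, 4)

hdr = build_gguf_header(version=3, n_tensors=291, n_kv=24)
assert parse_gguf_header(hdr) == (3, 291, 24)
```

The metadata section that follows the header is where model architecture, tokenizer, and quantization details live, which is why a single .gguf file is self-describing enough for llama.cpp (and Ollama) to load without sidecar config files.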
mfa1999•39m ago
How does this compare to llama.cpp in terms of performance?

Axios compromised on NPM – Malicious versions drop remote access trojan

https://www.stepsecurity.io/blog/axios-compromised-on-npm-malicious-versions-drop-remote-access-t...
420•mtud•3h ago•125 comments

Google's 200M-parameter time-series foundation model with 16k context

https://github.com/google-research/timesfm
38•codepawl•58m ago•16 comments

Universal Claude.md – cut Claude output tokens

https://github.com/drona23/claude-token-efficient
234•killme2008•4h ago•89 comments

Artemis II is not safe to fly

https://idlewords.com/2026/03/artemis_ii_is_not_safe_to_fly.htm
174•idlewords•3h ago•103 comments

Fedware: Government apps that spy harder than the apps they ban

https://www.sambent.com/the-white-house-app-has-huawei-spyware-and-an-ice-tip-line/
516•speckx•12h ago•163 comments

Do your own writing

https://alexhwoods.com/dont-let-ai-write-for-you/
464•karimf•17h ago•168 comments

Show HN: Raincast – Describe an app, get a native desktop app (open source)

https://github.com/tihiera/raincast
6•tito777•40m ago•3 comments

Android Developer Verification

https://android-developers.googleblog.com/2026/03/android-developer-verification-rolling-out-to-a...
205•ingve•8h ago•190 comments

GitHub backs down, kills Copilot pull-request ads after backlash

https://www.theregister.com/2026/03/30/github_copilot_ads_pull_requests/
42•_____k•1h ago•11 comments

Clojure: The Documentary, official trailer [video]

https://www.youtube.com/watch?v=JJEyffSdBsk
121•fogus•4d ago•7 comments

Turning a MacBook into a touchscreen with $1 of hardware (2018)

https://anishathalye.com/macbook-touchscreen/
277•HughParry•10h ago•128 comments

How to turn anything into a router

https://nbailey.ca/post/router/
648•yabones•16h ago•225 comments

Safeguarding cryptocurrency by disclosing quantum vulnerabilities responsibly

https://research.google/blog/safeguarding-cryptocurrency-by-disclosing-quantum-vulnerabilities-re...
36•madars•2h ago•4 comments

Rock Star: Reading the Rosetta Stone

https://www.historytoday.com/archive/feature/original-rock-star
7•samizdis•2d ago•0 comments

Sony halts memory card shipments due to NAND shortage

https://www.techzine.eu/news/devices/140058/sony-halts-memory-card-shipments-due-to-nand-shortage/
18•methuselah_in•1h ago•3 comments

Mr. Chatterbox is a Victorian-era ethically trained model

https://simonwillison.net/2026/Mar/30/mr-chatterbox/
22•y1n0•3h ago•4 comments

Incident March 30th, 2026 – Accidental CDN Caching

https://blog.railway.com/p/incident-report-march-30-2026-accidental-cdn-caching
42•cebert•4h ago•15 comments

Bird brains (2023)

https://www.dhanishsemar.com/writing/bird-brains
310•DiffTheEnder•17h ago•195 comments

Show HN: I turned a sketch into a 3D-print pegboard for my kid with an AI agent

https://github.com/virpo/pegboard
29•virpo•7h ago•5 comments

OpenGridWorks: The Electricity Infrastructure, Mapped

https://www.opengridworks.com
88•jonbraun•9h ago•10 comments

One of the largest salt mines in the world exists under Lake Erie

https://apnews.com/article/cleveland-salt-mine-winter-road-0daf091e3d56f65766bcf6a597683893
7•1659447091•2d ago•0 comments

Agents of Chaos

https://agentsofchaos.baulab.info/report.html
94•luu•3d ago•10 comments

Unit: A self-replicating Forth mesh agent running in a browser tab

https://davidcanhelp.github.io/unit/
24•DavidCanHelp•4d ago•1 comment

Cherri – programming language that compiles to an Apple Shortcut

https://github.com/electrikmilk/cherri
287•mihau•3d ago•57 comments

CodingFont: A game to help you pick a coding font

https://www.codingfont.com/
377•nvahalik•15h ago•193 comments

Researchers find 3,500-year-old loom that reveals textile revolution

https://web.ua.es/en/actualidad-universitaria/2026/marzo2026/23-31/ua-researchers-find-3-500-year...
97•geox•3d ago•9 comments

Vulnerability research is cooked

https://sockpuppet.org/blog/2026/03/30/vulnerability-research-is-cooked/
148•pedro84•11h ago•106 comments

R3 Bio pitched “brainless clones” to serve the role of backup human bodies

https://www.technologyreview.com/2026/03/30/1134780/r3-bio-brainless-human-clones-full-body-repla...
46•joozio•19h ago•58 comments

Oscar Reutersvärd (2021)

https://escherinhetpaleis.nl/en/about-escher/escher-today/oscar-reutersvard
11•layer8•1d ago•0 comments