frontpage.

Made with ♥ by @iamnishanth

Open Source @Github


Ollama is now powered by MLX on Apple Silicon in preview

https://ollama.com/blog/mlx
59•redundantly•1h ago

Comments

babblingfish•41m ago
On-device LLMs are the future. They're more secure, they solve the problem of inference demand outstripping data center supply, and they would use less electricity. It's just a matter of getting the performance good enough. Most users don't need frontier model performance.
gedy•27m ago
Man, I really hope so. As much as I like Claude Code, I hate the company paying for it and tracking your usage, the bullshit management control, etc. I feel like I'm training my replacement. Things feel like they're tightening rather than giving more power and freedom.

On device, I would gladly pay for good hardware - it's my machine and I'm using it as I see fit, like an IDE.

aurareturn•9m ago
By the time local LLMs get good enough for you to use happily, cloud LLMs will have gotten so much smarter that you'll still use them for stuff that needs more intelligence.
aurareturn•17m ago
It isn't going to replace cloud LLMs, since cloud LLMs will always have higher throughput and be smarter. Cloud and local LLMs will grow together, not replace each other.

I'm not convinced that local LLMs use less electricity either. Per token, at the same level of intelligence, cloud LLMs should run circles around local LLMs in efficiency. If they don't, what are we paying hundreds of billions of dollars for?

I think local LLMs will continue to grow, and there will be a "ChatGPT moment" for them when good-enough models meet good-enough hardware. We're not there yet, though.

Note, this is why I'm big on investing in chip manufacturing companies. Not only are they completely maxed out due to cloud LLMs, but soon they will be doubly maxed out having to replace local computer chips with ones suited for AI inference. This is a massive transition and will fuel another chip manufacturing boom.

AugSun•2m ago
Looking at the downvotes, I feel good about the SDE future in 3-5 years. We will have a swamp of "vibe experts" who won't be able to pay 100K a month for CC. Meanwhile, people who still remember how to code in Vim will (slowly) get back to pre-COVID TC levels.
AugSun•15m ago
"Most users don't need frontier model performance" unfortunately, this is not the case.
melvinroest•4m ago
I have journaled digitally for the last 5 years with this expectation.

Recently I built a graphRAG app, using Qwen 3.5 4B for small tasks like classifying what type of question I'm asking and for the entity extraction itself, since graphRAG depends on extracted triplets (entity1, relationship_to, entity2). I used Qwen 3.5 27B for actually answering my questions.

It works pretty well. I have to be a bit patient but that’s it. So in that particular use case, I would agree.

I used MLX on my M1 64GB device. I found that MLX is definitely faster when it comes to extracting entities and triplets in batches.
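The triplet-extraction step described above can be sketched roughly like this. This is a minimal illustration of the parsing side only: the prompt wording, the `parse_triplets` helper, and the sample model output are all assumptions, not the commenter's actual code.

```python
import re

# Illustrative prompt template asking a local model for triplets
# in the (entity1, relationship_to, entity2) shape the comment mentions.
EXTRACTION_PROMPT = (
    "Extract knowledge-graph triplets from the text below.\n"
    "Output one triplet per line as (entity1, relationship_to, entity2).\n\n"
    "Text: {chunk}"
)

def parse_triplets(llm_output: str) -> list[tuple[str, str, str]]:
    """Parse '(a, rel, b)' lines from a model's response into 3-tuples."""
    pattern = re.compile(r"\(([^,()]+),\s*([^,()]+),\s*([^,()]+)\)")
    return [tuple(part.strip() for part in m.groups())
            for m in pattern.finditer(llm_output)]

# What a small local model might plausibly return for one text chunk.
sample = "(Ollama, runs_on, Apple Silicon)\n(MLX, developed_by, Apple)"
print(parse_triplets(sample))
# → [('Ollama', 'runs_on', 'Apple Silicon'), ('MLX', 'developed_by', 'Apple')]
```

Keeping the extraction output line-oriented like this makes a small model's job easier and the parsing trivial; the resulting triplets can then be loaded into whatever graph store the app uses.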

pezgrande•4m ago
You could argue that the only reason we have good open-weight models is that companies are trying to undermine the big dogs, spending millions to make sure they don't get too far ahead. If the bubble pops, there won't be an incentive to keep doing it.
codelion•30m ago
How does it compare to some of the newer MLX inference engines, like optiq with its turboquantization support? https://mlx-optiq.pages.dev/
dial9-1•30m ago
Still waiting for the day I can comfortably run Claude Code with local LLMs on macOS with only 16GB of RAM.
gedy•26m ago
How close is this? It says it needs 32GB min?
HDBaseT•11m ago
You can run Qwen3.5-35B-A3B on 32GB of RAM, sure, although "Claude Code" performance, by which I assume he means Sonnet- or Opus-level models in 2026, is likely a few years away from being runnable locally (with reasonable hardware).
LuxBennu•20m ago
Already running Qwen 70B 4-bit on an M2 Max with 96GB through llama.cpp, and it's pretty solid for day-to-day stuff. The MLX switch is interesting because Ollama was basically shelling out to llama.cpp on Mac before, so native MLX should mean better memory handling on Apple Silicon. Curious to see how it compares on the bigger models vs. the GGUF path.
AugSun•17m ago
"We can run your dumbed down models faster":

#The use of NVFP4 results in a 3.5x reduction in model memory footprint relative to FP16 and a 1.8x reduction compared to FP8, while maintaining model accuracy with less than 1% degradation on key language modeling tasks for some models.
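The quoted ratios line up with simple bits-per-element arithmetic. The 16-element block size and 8-bit scale factor below are assumptions about NVFP4's layout for illustration, not figures from the quoted text.

```python
# Back-of-the-envelope check of the quoted NVFP4 memory ratios.
# NVFP4 stores 4-bit values with a shared scale factor per small block;
# block size and scale width here are assumed for illustration.
bits_fp16 = 16
bits_fp8 = 8
block_size = 16          # elements per shared scale factor (assumed)
scale_bits = 8           # one 8-bit scale per block (assumed)

bits_per_elem_nvfp4 = 4 + scale_bits / block_size  # 4.5 effective bits/element

print(round(bits_fp16 / bits_per_elem_nvfp4, 2))  # → 3.56, i.e. ~3.5x vs FP16
print(round(bits_fp8 / bits_per_elem_nvfp4, 2))   # → 1.78, i.e. ~1.8x vs FP8
```

The per-block scale overhead is why the claimed reduction is ~3.5x rather than the naive 16/4 = 4x.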

Sony halts memory card shipments due to NAND shortage

https://www.techzine.eu/news/devices/140058/sony-halts-memory-card-shipments-due-to-nand-shortage/
2•methuselah_in•8m ago•0 comments

Gone (Almost) Phishin'

https://ma.tt/2026/03/gone-almost-phishin/
1•luu•9m ago•0 comments

Scientists say we've been looking in the wrong place for human origins

https://www.sciencedaily.com/releases/2026/03/260327230113.htm
1•DeathArrow•9m ago•0 comments

GitHub backs down, kills Copilot pull-request ads after backlash

https://www.theregister.com/2026/03/30/github_copilot_ads_pull_requests/
2•_____k•9m ago•0 comments

Information Flow Kernel for Claude Code Hooks

https://github.com/coproduct-opensource/nucleus/blob/main/docs/quickstart-hook.md
1•difc•9m ago•1 comments

Vector Databases Explained in 3 Levels of Difficulty

https://machinelearningmastery.com/vector-databases-explained-in-3-levels-of-difficulty/
1•eigenBasis•11m ago•0 comments

A Knowledge Graph

https://tjid3.org/test/kg12.74
1•TimothyMJones•11m ago•1 comments

Would you use a GitHub App that auto-generates changelogs from commit diffs?

1•mandeepsng•12m ago•2 comments

Run virtualized iOS with Private Cloud Compute drivers

https://github.com/wh1te4ever/super-tart-vphone-writeup
1•goranmoomin•14m ago•0 comments

Read-Only vs. Action AI: Why Most Odoo AI Tools Stop at the Report

https://www.odooclaw.ai/blog/read-only-vs-action-ai-why-most-odoo-ai-tools-stop-at-the-report
1•oktra_dev•16m ago•0 comments

Securing Elliptic Curve Cryptocurrencies Against Quantum Vulnerabilities [pdf]

https://quantumai.google/static/site-assets/downloads/cryptocurrency-whitepaper.pdf
1•nstj•19m ago•0 comments

Meta Testing Instagram Plus Subscription with Exclusive Features

https://techlomedia.in/2026/03/meta-testing-instagram-plus-subscription-with-exclusive-features-1...
1•deepanker70•21m ago•1 comments

Distributed builds of LLVM with CMake, recc, and NativeLin

https://reidkleckner.dev/
1•swq115•29m ago•0 comments

Claude Code bug can silently 10-20x API costs

https://old.reddit.com/r/ClaudeCode/comments/1s7mitf/psa_claude_code_has_two_cache_bugs_that_can
2•wg0•30m ago•0 comments

Cognitive profiling from speech, not multiple choice

https://expressivecognition.org/
1•baplantas•35m ago•0 comments

The Philosopher and the Tsar

https://www.the-hinternet.com/p/the-philosopher-and-the-tsar
1•Caiero•36m ago•0 comments

Ask HN: Why have supply chain attacks become a near daily occurrence?

2•dhruv3006•36m ago•0 comments

Mental Health Fashion

https://nosaddays.com/
2•samdreamin•38m ago•0 comments

Show HN: I built an app after nearly missing a passport expiry

https://traveldocumentvault.com/
2•mustafahafeez•40m ago•1 comments

Screenpipe

https://github.com/screenpipe/screenpipe
2•handfuloflight•41m ago•0 comments

You can now run a full Linux operating system inside a 6mb PDF

https://twitter.com/oliviscusAI/status/2038563166431346865
3•matthewsinclair•47m ago•1 comments

Show HN: Provero – Data quality checks in YAML, compiled to single SQL queries

https://github.com/provero-org/provero
2•andreahlert•47m ago•0 comments

Office EU: European-owned cloud based office suite

https://office.eu
3•koenraad•47m ago•1 comments

Tokens Are the New Oil: How China Is Quietly Winning the AI Economy

https://thamizhelango.medium.com/tokens-are-the-new-oil-how-china-is-quietly-winning-the-ai-econo...
3•KnuthIsGod•49m ago•0 comments

The United States has become a rogue state

https://www.washingtonpost.com/ripple/2026/03/26/united-states-trump-rogue-state-iran/
2•hkhn•50m ago•0 comments

Show HN: Tiny Axios Alternative, Fch

https://www.npmjs.com/package/fch
1•franciscop•50m ago•0 comments

Our AI traced the axios NPM attack and found how the payload hid itself

https://app.strix.ai/share/chats/NDIxNzZiMTItZWQ2My00NDY4LWIzYzUtNDEyZDgyMWI1YjYzLm1uZTJldnQ0LkVt...
2•ahmedallam2•50m ago•0 comments

What is 'tokenomics' and how would China gain the edge

https://www.scmp.com/tech/big-tech/article/3347495/how-china-could-dominate-ai-eras-tokenomics-va...
1•KnuthIsGod•50m ago•0 comments

Delve – Fake Compliance as a Service – Part II

https://substack.com/home/post/p-192665132
2•theahura•53m ago•0 comments

California to impose new AI regulations in defiance of Trump call

https://www.theguardian.com/us-news/2026/mar/30/california-ai-regulations-trump
1•thm•54m ago•0 comments