Show HN: oMLX – SSD-backed KV cache cuts coding agent TTFT from 90s to 1s on Mac

https://github.com/jundot/omlx
4•jundot•1h ago

Comments

jundot•1h ago
Hi HN, I'm the developer. I built oMLX because every Mac LLM server I tried fell apart with coding agents.

The core problem: tools like Claude Code and OpenClaw send requests where the prompt prefix keeps shifting. Most servers invalidate the KV cache after each turn, forcing a full re-prefill. On large contexts (50-100K tokens), that means 30-90 seconds of waiting per request. After a few turns, it's practically unusable.

oMLX solves this with paged SSD caching. Every KV cache block is persisted to disk. When a previous prefix comes back, it's restored from SSD instead of being recomputed. In practice, TTFT drops from 30-90s to 1-3s on follow-up requests. Cache blocks survive server restarts too.
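To make the paged idea concrete, here is a minimal sketch of prefix-keyed block caching on disk. It is not oMLX's actual code: the block size, the chained SHA-256 keys, the pickle file layout, and every name here (`block_keys`, `DiskBlockCache`, `prefill`, `fake_compute`) are assumptions for illustration only. Each block's key hashes the block together with its parent's key, so a key identifies the entire prefix up to that block, and a returning prefix finds its blocks on disk even after a restart.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path

BLOCK_SIZE = 4  # tokens per block; tiny on purpose (assumed value)


def block_keys(tokens, block_size=BLOCK_SIZE):
    """Return one chained hash per full block of tokens.

    Each key covers the block plus its parent's key, so it identifies
    the whole prefix ending at that block, not just the block itself.
    """
    keys = []
    parent = b""
    full = len(tokens) - len(tokens) % block_size  # ignore a trailing partial block
    for start in range(0, full, block_size):
        chunk = tokens[start:start + block_size]
        digest = hashlib.sha256(parent + repr(chunk).encode()).hexdigest()
        keys.append(digest)
        parent = digest.encode()
    return keys


class DiskBlockCache:
    """Hypothetical on-disk store: one file per KV block, named by its key."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key):
        return self.root / f"{key}.kv"

    def put(self, key, kv_block):
        self._path(key).write_bytes(pickle.dumps(kv_block))

    def get(self, key):
        path = self._path(key)
        return pickle.loads(path.read_bytes()) if path.exists() else None


def prefill(tokens, cache, compute_block):
    """Restore cached blocks from disk and compute only the missing ones.

    Returns (kv_blocks, restored_count, computed_count).
    """
    blocks, restored, computed = [], 0, 0
    for i, key in enumerate(block_keys(tokens)):
        kv = cache.get(key)
        if kv is None:
            chunk = tokens[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
            kv = compute_block(chunk)  # stand-in for the expensive prefill step
            cache.put(key, kv)
            computed += 1
        else:
            restored += 1
        blocks.append(kv)
    return blocks, restored, computed


def fake_compute(chunk):
    """Placeholder for real KV computation (assumption, not a model call)."""
    return [t * 2 for t in chunk]


def demo():
    with tempfile.TemporaryDirectory() as d:
        turn1 = list(range(12))                    # first request: 3 full blocks
        _, r1, c1 = prefill(turn1, DiskBlockCache(d), fake_compute)
        turn2 = turn1 + [100, 101, 102, 103]       # follow-up: same prefix + 1 block
        # A fresh cache object over the same directory simulates a server restart.
        _, r2, c2 = prefill(turn2, DiskBlockCache(d), fake_compute)
        return (r1, c1), (r2, c2)


if __name__ == "__main__":
    print(demo())
```

In this toy run the first turn computes all three blocks, and the follow-up restores those three from disk and computes only the one new block. A real server would store tensors rather than pickled lists and would need eviction and a size budget, which this sketch leaves out.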

What's under the hood:

- Continuous batching via mlx-lm (multiple concurrent requests)

- Multi-model serving (LLM + VLM + Embedding + Reranker simultaneously, LRU eviction)

- OpenAI + Anthropic compatible APIs

- Tool calling support (JSON, Qwen, Gemma, MiniMax, GLM formats + MCP)

- Native macOS menubar app (PyObjC, signed DMG) — download, drag, done

As of v0.2.0: Vision-Language Model support with the same tiered caching

Benchmarks on M3 Ultra 512GB with Qwen3-Coder-Next-8bit: 58.7 tok/s single request, 243 tok/s at 8x batch. Full results on the README.
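For a rough sense of what those two numbers imply, the arithmetic can be worked through as below. This assumes the 243 tok/s figure is aggregate throughput summed across the 8 concurrent requests, which the post does not state explicitly; the variable names are only for illustration.

```python
single_tok_s = 58.7   # single-request decode rate from the post
batch_tok_s = 243.0   # 8x batch rate from the post (assumed to be aggregate)
batch_size = 8

speedup = batch_tok_s / single_tok_s      # aggregate gain over one request
per_request = batch_tok_s / batch_size    # rate each request sees under load
efficiency = speedup / batch_size         # fraction of ideal linear scaling

print(f"aggregate speedup: {speedup:.2f}x")
print(f"per-request rate:  {per_request:.1f} tok/s")
print(f"scaling efficiency: {efficiency:.0%}")
```

Under that assumption, batching gives about 4.1x the total throughput while each individual request runs at roughly half its single-request speed.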

It reuses LM Studio's model directory, so you don't need to re-download anything.

100% open source (Apache 2.0): https://github.com/jundot/omlx
