Hi HN, I'm the developer. I built oMLX because every Mac LLM server I tried fell apart with coding agents.
The core problem: tools like Claude Code and OpenClaw send requests where the prompt prefix keeps shifting. Most servers invalidate the KV cache after each turn, forcing a full re-prefill. On large contexts (50-100K tokens), that means 30-90 seconds of waiting per request. After a few turns, it's practically unusable.
oMLX solves this with paged SSD caching. Every KV cache block is persisted to disk. When a previous prefix comes back, it's restored from SSD instead of being recomputed. In practice, TTFT drops from 30-90s to 1-3s on follow-up requests. Cache blocks survive server restarts too.
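To make the idea concrete, here's a toy sketch of block-level prefix caching in pure Python — not oMLX's actual code, just the gist: each fixed-size block of the prompt is keyed by a hash of all tokens up to and including it, so a follow-up request that extends a previous prompt finds its shared prefix on disk and only prefills the tail.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path

BLOCK = 4  # tokens per cache block (illustrative; real block sizes are larger)

class DiskKVCache:
    """Toy prefix-block cache: each block is keyed by a hash of all tokens
    up to and including it, so two prompts that share a prefix map to the
    same files on disk and the shared part never needs re-prefilling."""

    def __init__(self, root: Path):
        self.root = root
        root.mkdir(parents=True, exist_ok=True)

    def _key(self, tokens, end):
        return hashlib.sha256(repr(tokens[:end]).encode()).hexdigest()

    def save(self, tokens, kv_blocks):
        # Persist one file per block; strings stand in for real KV tensors.
        for i, kv in enumerate(kv_blocks):
            path = self.root / self._key(tokens, (i + 1) * BLOCK)
            path.write_bytes(pickle.dumps(kv))

    def longest_prefix(self, tokens):
        # Walk block-aligned prefixes until the first cache miss.
        restored = []
        for i in range(len(tokens) // BLOCK):
            path = self.root / self._key(tokens, (i + 1) * BLOCK)
            if not path.exists():
                break
            restored.append(pickle.loads(path.read_bytes()))
        return len(restored) * BLOCK, restored

cache = DiskKVCache(Path(tempfile.mkdtemp()))
turn1 = list(range(10))                    # pretend token IDs from turn 1
cache.save(turn1, ["kv0", "kv1"])          # two blocks cover tokens 0..7
hit, blocks = cache.longest_prefix(turn1 + [99, 100])  # turn 2 extends turn 1
# hit == 8: only the new tail needs prefilling on the follow-up request
```

The real system does the same walk over hashed pages but restores actual KV tensors, which is why restarts don't lose the cache.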
What's under the hood:
- Continuous batching via mlx-lm (multiple concurrent requests)
- Multi-model serving (LLM + VLM + Embedding + Reranker simultaneously, LRU eviction)
- OpenAI- and Anthropic-compatible APIs
- Tool calling support (JSON, Qwen, Gemma, MiniMax, GLM formats + MCP)
- Native macOS menubar app (PyObjC, signed DMG) — download, drag, done
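For anyone curious what tool calling looks like against the OpenAI-compatible endpoint, this is the standard request shape for POST /v1/chat/completions (the model name and tool schema here are illustrative, not oMLX specifics):

```python
import json

# Request body for POST /v1/chat/completions on an OpenAI-compatible
# server. The model name and the tool itself are made up for illustration.
payload = {
    "model": "qwen3-coder",
    "messages": [{"role": "user", "content": "List the files in /tmp"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "list_files",  # hypothetical tool
            "description": "List files in a directory",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
    }],
}
body = json.dumps(payload)
```

When the model decides to use the tool, a compliant server returns a `tool_calls` entry in the assistant message rather than plain text, which is what agents like Claude Code rely on.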
As of v0.2.0: Vision-Language Model support with the same tiered caching
Benchmarks on an M3 Ultra (512GB) with Qwen3-Coder-Next-8bit: 58.7 tok/s for a single request, 243 tok/s at 8x batch. Full results are in the README.
It reuses LM Studio's model directory, so you don't need to re-download anything.
100% open source (Apache 2.0): https://github.com/jundot/omlx