Sub-Millisecond RAG on Apple Silicon. No Server. No API. One File

https://github.com/christopherkarani/Wax
35•ckarani•4h ago

Comments

ckarani•4h ago
I built Wax because every RAG solution required either Pinecone/Weaviate in the cloud or ChromaDB/Qdrant running locally. I wanted the SQLite of RAG -- import a library, open a file, query. Except for multimodal content at GPU speed.

The architecture that makes this work: Metal-accelerated vector search -- Embeddings live directly in unified memory (MTLBuffer). Zero CPU-GPU copy overhead. Adaptive SIMD4/SIMD8 kernels + GPU-side bitonic sort = sub-millisecond search on 10K+ vectors (vs ~100ms CPU). This isn't just "faster" -- it enables interactive search UX that wasn't possible before.
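
For readers unfamiliar with bitonic sort, here is a minimal CPU sketch of the compare-exchange network in plain Swift. It is not Wax's Metal kernel: the function name and the sample distances are made up for illustration, and the input length is assumed to be a power of two. On a GPU, each iteration of the innermost loop would typically be one thread, which is why the fixed, data-independent pattern maps well to that hardware.

```swift
// CPU sketch of a bitonic sorting network (ascending order).
// Assumption: input length is a power of two. Not Wax's actual kernel.
func bitonicSort(_ input: [Float]) -> [Float] {
    precondition(input.count > 0 && (input.count & (input.count - 1)) == 0,
                 "length must be a power of two")
    var a = input
    let n = a.count
    var k = 2                      // size of the bitonic sequences being merged
    while k <= n {
        var j = k / 2              // distance between compared elements
        while j > 0 {
            for i in 0..<n {       // on a GPU, one thread per i
                let partner = i ^ j
                if partner > i {
                    let ascending = (i & k) == 0
                    let outOfOrder = ascending ? a[i] > a[partner] : a[i] < a[partner]
                    if outOfOrder { a.swapAt(i, partner) }
                }
            }
            j /= 2
        }
        k *= 2
    }
    return a
}

// Hypothetical distances for eight candidate vectors.
let distances: [Float] = [0.42, 0.07, 0.91, 0.13, 0.66, 0.29, 0.80, 0.05]
let sortedDistances = bitonicSort(distances)
print(sortedDistances)
```

The sequence of comparisons depends only on the array length, never on the data, so every run does the same amount of work.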

Atomic single-file storage (.mv2s) -- Everything in one crash-safe binary: embeddings, BM25 index, metadata, compressed payloads. Dual-header writes with generation counters = kill -9 safe. Sync via iCloud, email it, commit to git. The file format is deterministic -- identical input produces byte-identical output.
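
A rough sketch of how dual-header writes with generation counters can survive a torn write. The .mv2s layout is not described in this post, so the `Header` fields, the FNV-1a checksum, and the slot-selection rule below are all assumptions for illustration. The idea: a commit only ever overwrites the slot that does not hold the current header, and a reader picks the valid header with the highest generation.

```swift
// Hypothetical header; the real .mv2s layout is not specified in the post.
struct Header {
    var generation: UInt64
    var payloadLength: UInt64
    var checksum: UInt64

    // FNV-1a over both fields, standing in for whatever checksum the format uses.
    static func digest(generation: UInt64, payloadLength: UInt64) -> UInt64 {
        var hash: UInt64 = 0xcbf29ce484222325
        for value in [generation, payloadLength] {
            for shift in stride(from: 0, to: 64, by: 8) {
                hash ^= (value >> UInt64(shift)) & 0xff
                hash = hash &* 0x100000001b3
            }
        }
        return hash
    }

    init(generation: UInt64, payloadLength: UInt64) {
        self.generation = generation
        self.payloadLength = payloadLength
        self.checksum = Header.digest(generation: generation, payloadLength: payloadLength)
    }

    var isValid: Bool {
        return checksum == Header.digest(generation: generation, payloadLength: payloadLength)
    }
}

struct HeaderPair {
    var slots: [Header?] = [nil, nil]

    // The newest header that passes its checksum is the current one.
    var current: Header? {
        return slots.compactMap { $0 }
            .filter { $0.isValid }
            .max(by: { $0.generation < $1.generation })
    }

    // Generation g lives in slot g % 2, so a commit never touches the current header.
    mutating func commit(payloadLength: UInt64) {
        let next = (current?.generation ?? 0) + 1
        slots[Int(next % 2)] = Header(generation: next, payloadLength: payloadLength)
    }
}

var pair = HeaderPair()
pair.commit(payloadLength: 100)    // generation 1
pair.commit(payloadLength: 250)    // generation 2

// Simulate a crash halfway through writing generation 3: the checksum is wrong.
var torn = Header(generation: 3, payloadLength: 400)
torn.checksum ^= 1
pair.slots[1] = torn

let recovered = pair.current       // falls back to generation 2
pair.commit(payloadLength: 400)    // retry overwrites the torn slot
let latest = pair.current
```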

Query-adaptive hybrid fusion -- Four parallel search lanes (BM25, vector, timeline, structured memory). Lightweight classifier detects intent ("when did I..." → boost timeline, "find documentation about..." → boost BM25). Reciprocal Rank Fusion with deterministic tie-breaking = identical queries always return identical results.
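
A minimal sketch of weighted Reciprocal Rank Fusion with a deterministic tie-break. The `Lane` type, the lane weights, and the constant k = 60 are assumptions for illustration, not Wax's actual types or values. Each lane contributes weight / (k + rank) per document, and ties are broken by document ID so the output does not depend on dictionary iteration order.

```swift
// Hypothetical lane type; weights would come from the intent classifier.
struct Lane {
    let name: String
    let weight: Double
    let rankedIDs: [String]    // best match first
}

func fuse(_ lanes: [Lane], k: Double = 60) -> [(id: String, score: Double)] {
    var scores: [String: Double] = [:]
    for lane in lanes {
        for (index, id) in lane.rankedIDs.enumerated() {
            // Ranks are 1-based in the usual RRF formula.
            scores[id, default: 0] += lane.weight / (k + Double(index + 1))
        }
    }
    // Score descending, then ID ascending: a total order, so results are stable.
    return scores
        .map { (id: $0.key, score: $0.value) }
        .sorted { a, b in
            a.score != b.score ? a.score > b.score : a.id < b.id
        }
}

let fused = fuse([
    Lane(name: "bm25", weight: 1.0, rankedIDs: ["a", "b", "c"]),
    Lane(name: "vector", weight: 1.0, rankedIDs: ["b", "a", "d"]),
])
print(fused.map { $0.id })    // ["a", "b", "c", "d"]
```

Here "a" and "b" end up with identical scores (each is ranked first in one lane and second in the other), so the ID tie-break is what fixes their order.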

Photo/Video RAG -- Index your photo library with OCR, captions, GPS binning, per-region embeddings. Query "find that receipt from the restaurant" searches text, visual similarity, and location simultaneously. Videos get segmented with keyframe embeddings + transcript mapping. Results include timecodes for jump-to-moment navigation. All offline -- iCloud-only photos get metadata-only indexing.

Swift 6.2 strict concurrency -- Every orchestrator is an actor. Thread safety proven at compile time, not runtime. Zero data races, zero @unchecked Sendable, zero escape hatches.
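
To illustrate the actor point, a small sketch of what actor isolation buys you. `NoteStore` is a made-up type, not one of Wax's orchestrators. The mutable array can only be reached through the actor, so the 100 concurrent writers below cannot race, and the compiler rejects any attempt to touch `notes` without `await`.

```swift
// Hypothetical actor; Wax's own orchestrators are not shown in the post.
actor NoteStore {
    private var notes: [String] = []

    func remember(_ text: String) {
        notes.append(text)
    }

    func count() -> Int {
        return notes.count
    }
}

let store = NoteStore()

// 100 concurrent writers; the actor serializes access to `notes`.
await withTaskGroup(of: Void.self) { group in
    for i in 0..<100 {
        group.addTask { await store.remember("note \(i)") }
    }
}

let total = await store.count()
print(total)    // 100
```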

Deterministic context assembly -- Same query + same data = byte-identical context every time. Three-tier surrogate compression (full/gist/micro) adapts based on memory age. Bundled cl100k_base tokenizer = no network, no nondeterminism.
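
One way tier selection and deterministic assembly could fit together, as a sketch. The age thresholds (7 and 90 days), the `Memory` fields, and the greedy budget rule are assumptions, since the post does not say how Wax picks tiers or fills the context window. The ordering is total (score, then ID), so the same inputs always yield the same context.

```swift
enum SurrogateTier { case full, gist, micro }

// Hypothetical record; token counts would come from the bundled tokenizer.
struct Memory {
    let id: Int
    let score: Double
    let ageInDays: Double
    let fullTokens: Int
    let gistTokens: Int
    let microTokens: Int

    func cost(in tier: SurrogateTier) -> Int {
        switch tier {
        case .full: return fullTokens
        case .gist: return gistTokens
        case .micro: return microTokens
        }
    }
}

// Assumed thresholds: older memories get a more compressed surrogate.
func tier(forAgeInDays age: Double) -> SurrogateTier {
    if age >= 90 { return .micro }
    if age >= 7 { return .gist }
    return .full
}

func assemble(_ memories: [Memory], budget: Int) -> [(id: Int, tier: SurrogateTier)] {
    // Score descending, ID ascending, so the order never depends on input order.
    let ordered = memories.sorted { a, b in
        a.score != b.score ? a.score > b.score : a.id < b.id
    }
    var remaining = budget
    var picked: [(id: Int, tier: SurrogateTier)] = []
    for memory in ordered {
        let chosen = tier(forAgeInDays: memory.ageInDays)
        let cost = memory.cost(in: chosen)
        if cost <= remaining {
            remaining -= cost
            picked.append((id: memory.id, tier: chosen))
        }
    }
    return picked
}

let context = assemble([
    Memory(id: 3, score: 0.5, ageInDays: 400, fullTokens: 300, gistTokens: 60, microTokens: 8),
    Memory(id: 2, score: 0.9, ageInDays: 30, fullTokens: 200, gistTokens: 50, microTokens: 12),
    Memory(id: 1, score: 0.9, ageInDays: 1, fullTokens: 120, gistTokens: 40, microTokens: 10),
], budget: 175)
print(context.map { $0.id })    // [1, 2]
```

With a 175-token budget, memory 1 goes in at full size (120 tokens), memory 2 as a gist (50 tokens), and memory 3's micro surrogate (8 tokens) no longer fits in the 5 tokens left.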

import Wax

let brain = try await MemoryOrchestrator(at: URL(fileURLWithPath: "brain.mv2s"))

// Index
try await brain.remember("User prefers dark mode, gets headaches from bright screens")

// Retrieve
let context = try await brain.recall(query: "user display preferences")
// Returns relevant memories with source attribution, ready for LLM context

What makes this different:

  - Zero dependencies on cloud infrastructure -- No API keys, no vendor lock-in, no telemetry
  - Production-grade concurrency -- Not "it works in my tests," but compile-time proven thread safety
  - Multimodal from the ground up -- Text, photos, videos indexed with shared semantics
  - Performance that unlocks new UX -- Sub-millisecond latency enables real-time RAG workflows

## Wax Performance (Apple Silicon, as of Feb 17, 2026)

  - 0.84ms vector search at 10K docs (Metal, warm cache)
  - 9.2ms first-query after cold-open for vector search
  - ~125x faster than CPU (105ms) and ~178x faster than SQLite FTS5 (150ms) in
    the same 10K benchmark
  - 17ms cold-open → first query overall
  - 10K ingest in 7.756s (~1289 docs/s) with hybrid batched ingest
  - 0.103s hybrid search on 10K docs
  - Recall path: 0.101–0.103s (smoke/standard workloads)

Built for: Developers shipping AI-native apps who want RAG without the infrastructure overhead. Your data stays local, your users stay private, your app stays fast.

The storage format and search pipeline are stable. The API surface is early but functional. If you're building RAG into Swift apps, I'd love your feedback.

GitHub: https://github.com/christopherkarani/Wax

Star it if you're tired of spinning up vector databases for what should be a library call.

mshekow•43m ago
Would Wax also be usable as a simple variant of a hybrid search solution? (i.e., not in the context of "agent memory", where knowledge added earlier is worth less than knowledge added more recently)
ckarani•40m ago
Yes—Wax can absolutely be used as a general hybrid search layer, not just an “agent memory” feature.

  It already combines text + vector retrieval and reranking, so you can treat
  remember(...) as ingestion and recall(query:) as search for any document
  corpus.

  It does not natively do “recency decay” (newer beats older) out of the box in
  the core call signature. If you want recency weighting, add timestamps in
  metadata and apply post-retrieval re-scoring or filtering in your app logic
  (or query-time preprocessing).
I've added this to the backlog; it comes in handy when dealing with time-sensitive data. Expect a PR this week.
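
Until that lands, the post-retrieval re-scoring described above could look roughly like this. The `Hit` type is hypothetical (it is not Wax's result type), and the exponential half-life decay is just one reasonable choice; it assumes each result carries a timestamp taken from metadata.

```swift
import Foundation

// Hypothetical result shape: an ID, a relevance score, and a metadata timestamp.
struct Hit {
    let id: String
    let score: Double
    let timestamp: Date
}

// Halve a hit's score for every `halfLifeDays` of age, then re-sort.
func rescore(_ hits: [Hit], now: Date, halfLifeDays: Double) -> [Hit] {
    let decayed = hits.map { hit -> Hit in
        let ageDays = max(0, now.timeIntervalSince(hit.timestamp)) / 86_400
        let decay = pow(0.5, ageDays / halfLifeDays)
        return Hit(id: hit.id, score: hit.score * decay, timestamp: hit.timestamp)
    }
    // Ties broken by ID to keep the ordering deterministic.
    return decayed.sorted { a, b in
        a.score != b.score ? a.score > b.score : a.id < b.id
    }
}

let now = Date(timeIntervalSince1970: 1_000_000_000)
let ranked = rescore([
    Hit(id: "old", score: 0.9, timestamp: now.addingTimeInterval(-60 * 86_400)),
    Hit(id: "new", score: 0.6, timestamp: now),
], now: now, halfLifeDays: 30)
print(ranked.map { $0.id })    // ["new", "old"]
```

With a 30-day half-life, the 60-day-old hit drops from 0.9 to about 0.225 and falls below the fresh hit at 0.6.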
owenm•31m ago
Any plans to make it available to other languages via bindings?
anonymoushn•7m ago
ideally users could be banned for posting LLM outputs as if they were authored by humans https://www.pangram.com/history/49335ddf-118d-43e4-9340-a58a...
kleton•1h ago
sqlite-vec is already the SQLite for AI memory
Stefan-H•59m ago
Any chance you went beyond the surface comparison and have thoughts on how the libraries compare in functionality?
ckarani•24m ago
sqlite-vec is a great vector index — Wax actually uses SQLite under the hood too.

The difference is the layer. sqlite-vec gives you vec_distance_cosine() in SQL. Wax gives you: hand it a .mov file, get back token-budgeted, LLM-ready context from keyframes and transcripts, with EXIF-accurate timestamps and hybrid BM25+vector search via RRF fusion — all on-device.

It's the difference between a B-tree and an ORM. You'd still need to write the entire ingestion pipeline, media parsing, frame hierarchy, token counting, and context assembly on top of sqlite-vec. That's what Wax is.

peterloron•27m ago
Looks cool! Thoughts on exposing this through a cli or mcp for local knowledge access for agents? For example, I use Claude Code for research and I have a local corpus of PDFs that I would like to make available as additional domain-specific information that Claude can use in addition to what it has in Opus or whatever model I'm using.
giancarlostoro•22m ago
That's what I'm wondering as well.

Claude Sonnet 4.6

https://www.anthropic.com/news/claude-sonnet-4-6
412•adocomplete•2h ago•346 comments

Using go fix to modernize Go code

https://go.dev/blog/gofix
151•todsacerdoti•3h ago•21 comments

Gentoo on Codeberg

https://www.gentoo.org/news/2026/02/16/codeberg.html
114•todsacerdoti•2h ago•24 comments

GrapheneOS – Break Free from Google and Apple

https://blog.tomaszdunia.pl/grapheneos-eng/
903•to3k•9h ago•595 comments

Show HN: AsteroidOS 2.0 – Nobody asked, we shipped anyway

https://asteroidos.org/news/2-0-release/index.html
17•moWerk•37m ago•3 comments

HackMyClaw

https://hackmyclaw.com/
179•hentrep•3h ago•92 comments

So you want to build a tunnel

https://practical.engineering/blog/2026/2/17/so-you-want-to-build-a-tunnel
80•crescit_eundo•3h ago•28 comments

Async/Await on the GPU

https://www.vectorware.com/blog/async-await-on-gpu/
86•Philpax•3h ago•22 comments

Chess engines do weird stuff

https://girl.surgery/chess
85•admiringly•2h ago•43 comments

Physicists Make Electrons Flow Like Water

https://www.quantamagazine.org/physicists-make-electrons-flow-like-water-20260211/
20•rbanffy•3d ago•0 comments

Show HN: I wrote a technical history book on Lisp

https://berksoft.ca/gol/
80•cdegroot•4h ago•19 comments

I converted 2D conventional flight tracking into 3D

https://aeris.edbn.me/?city=SFO
158•kewonit•5h ago•40 comments

Trata (YC W25) Is Hiring Founding Engineers (NYC)

1•emc329•3h ago

Don't pass on small block ciphers

https://00f.net/2026/02/10/small-block-ciphers/
30•jstrieb•2d ago•8 comments

Is Show HN dead? No, but it's drowning

https://www.arthurcnops.blog/death-of-show-hn/
301•acnops•9h ago•254 comments

Launch HN: Sonarly (YC W26) – AI agent to triage and fix your production alerts

https://sonarly.com/
15•Dimittri•2h ago•1 comments

Discord Rival Gets Overwhelmed by Exodus of Players Fleeing Age-Verification

https://kotaku.com/discord-alternative-teamspeak-age-verification-check-rivals-2000669693
75•thunderbong•2h ago•26 comments

Stephen Colbert says CBS forbid interview of Democrat because of FCC threat

https://arstechnica.com/tech-policy/2026/02/stephen-colbert-says-cbs-forbid-interview-of-democrat...
59•voxadam•41m ago•12 comments

Show HN: 6cy – Experimental streaming archive format with per-block codecs

https://github.com/byte271/6cy
21•yihac1•3h ago•4 comments

Climbing Mount Fuji visualized through milestone stamps

https://fuji.halfof8.com/
24•gessha•2h ago•4 comments

Show HN: Continue – Source-controlled AI checks, enforceable in CI

https://docs.continue.dev
27•sestinj•2h ago•5 comments

Tesla 'Robotaxi' adds 5 more crashes in Austin in a month – 4x worse than humans

https://electrek.co/2026/02/17/tesla-robotaxi-adds-5-more-crashes-austin-month-4x-worse-than-humans/
51•Bender•59m ago•26 comments

Four Column ASCII (2017)

https://garbagecollected.org/2017/01/31/four-column-ascii/
307•tempodox•2d ago•74 comments

Labyrinth Locator

https://labyrinthlocator.org/
26•emigre•3d ago•4 comments

Semantic ablation: Why AI writing is generic and boring

https://www.theregister.com/2026/02/16/semantic_ablation_ai_writing/
160•benji8000•3h ago•139 comments

Show HN: I taught LLMs to play Magic: The Gathering against each other

https://mage-bench.com/
59•GregorStocks•3h ago•49 comments

Hamming Distance for Hybrid Search in SQLite

https://notnotp.com/notes/hamming-distance-for-hybrid-search-in-sqlite/
59•enz•2d ago•10 comments

Show HN: I built a simulated AI containment terminal for my sci-fi novel

https://vertex.flowlogix.ai
22•stevengreser•3h ago•12 comments

Canadians promised to boycott travel to US. They meant it

https://www.usatoday.com/story/travel/2026/02/12/canadian-tourism-us-decline/88632515007/
8•djkivi•20m ago•0 comments

Show HN: Glitchy camera – a circuit-bent camera simulator in the browser

https://glitchycam.com
156•elayabharath•1d ago•21 comments