I built llm.sql, an LLM inference framework that reimagines the LLM execution pipeline as a series of structured SQL queries atop SQLite.
The motivation: Edge LLMs are getting better, but hardware remains a bottleneck, especially RAM (size and bandwidth).
When available memory is smaller than the model weights plus the KV cache, the OS incurs page faults and swaps pages using LRU-like heuristics, causing throughput degradation that's hard to notice and even harder to debug. But the memory access pattern during LLM inference is deterministic - we know exactly which weights are needed and when. That means even Bélády's optimal page-replacement algorithm is applicable here.
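To make the Bélády point concrete, here's a small simulation sketch (illustrative, not llm.sql's actual code): since the order in which a transformer touches its layer weights is known in advance, the optimal policy - evict the page whose next use is farthest in the future - can be computed exactly.

```python
def belady_misses(trace, cache_size):
    """Simulate Belady's optimal replacement over a known access trace;
    return the number of cache misses."""
    cache, misses = set(), 0
    for i, page in enumerate(trace):
        if page in cache:
            continue
        misses += 1
        if len(cache) >= cache_size:
            # Evict the cached page whose next use is farthest away
            # (or that is never used again).
            def next_use(p):
                try:
                    return trace.index(p, i + 1)
                except ValueError:
                    return float("inf")
            cache.remove(max(cache, key=next_use))
        cache.add(page)
    return misses

# A decoder touches layer weights in a fixed repeating order,
# e.g. 4 layers over 4 decode steps:
trace = [0, 1, 2, 3] * 4
print(belady_misses(trace, 3))  # -> 8
```

On this cyclic trace, LRU with room for 3 of the 4 layers misses on every one of the 16 accesses (the classic thrashing case), while the optimal policy misses only 8 times - which is exactly the kind of gap deterministic access patterns let you exploit.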
So instead of letting the OS manage memory, llm.sql takes over:
- Model parameters are stored in SQLite BLOB tables
- Computational logic is implemented as SQLite C extensions
- Memory management is handled explicitly, not by the OS
- Zero heavy dependencies. No PyTorch, no Transformers. Just Python, C, or C++
This gives us explicit, deterministic control over what's in memory at each step of inference.
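A minimal stdlib-only sketch of the storage idea - weights as BLOB rows fetched one tensor at a time, so the process decides exactly which bytes become resident. The schema and tensor names here are illustrative assumptions, not llm.sql's actual layout:

```python
import array
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE weights (name TEXT PRIMARY KEY, data BLOB NOT NULL)")

def put_tensor(name, values):
    # Store int8 weights as raw bytes in a BLOB row.
    db.execute("INSERT INTO weights VALUES (?, ?)",
               (name, array.array("b", values).tobytes()))

def get_tensor(name):
    # Fetch exactly one tensor's bytes; nothing else is paged in.
    (blob,) = db.execute("SELECT data FROM weights WHERE name = ?",
                         (name,)).fetchone()
    return array.array("b", blob)

put_tensor("layers.0.attn.wq", range(-3, 3))
print(list(get_tensor("layers.0.attn.wq")))  # -> [-3, -2, -1, 0, 1, 2]
```

For tensors too large to materialize at once, SQLite also supports incremental blob I/O (exposed as `sqlite3.Connection.blobopen` in Python 3.11+), which lets you read byte ranges out of a BLOB without loading the whole row.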
Results:
Qwen2.5-0.5B-INT8 (a ~640MB model) runs at 7.40 tokens/s with a peak RSS of ~210MB.
Alpha version is available on GitHub: https://github.com/xuxianghong12/llm.sql
I'm the developer, happy to answer any technical questions about the design and implementation.