Turns out, no.
PureBee is a complete GPU defined as a software specification — Memory, Engine, Instruction Set, Runtime. It runs Llama 3.2 1B inference at 3.6 tok/sec on a single CPU core. The model answers questions correctly.
What makes it different from llama.cpp or WebLLM:
The WASM compute kernel is constructed byte-by-byte in JavaScript at runtime. No Emscripten. No Rust. No compiler. No build step. The binary that runs the Q4 SIMD matrix math is emitted by readable JavaScript, so every layer of the stack, including the thing executing the math, is auditable source.
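To make "constructed byte-by-byte" concrete, here's the technique at toy scale (this is not PureBee's emitter, just a minimal hand-assembled module exporting an `add` function):

```javascript
// A complete WASM module, every byte written by hand:
// (func (export "add") (param i32 i32) (result i32))
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d,             // magic: "\0asm"
  0x01, 0x00, 0x00, 0x00,             // binary version 1
  0x01, 0x07, 0x01,                   // type section: 1 entry
  0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, //   (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,             // function section: func 0 uses type 0
  0x07, 0x07, 0x01,                   // export section: 1 entry
  0x03, 0x61, 0x64, 0x64, 0x00, 0x00, //   "add" -> func 0
  0x0a, 0x09, 0x01,                   // code section: 1 body
  0x07, 0x00,                         //   body is 7 bytes, no locals
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, //   local.get 0, local.get 1, i32.add, end
]);

const { exports } = new WebAssembly.Instance(new WebAssembly.Module(bytes));
console.log(exports.add(2, 3)); // 5
```

The real kernels do the same thing with SIMD opcodes and a memory section, but the principle is identical: the "compiler" is a JS file you can read.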
The progression from first principles:
```
Baseline JS 0.08 tok/sec
Typed arrays 0.21 tok/sec
WASM kernels 0.70 tok/sec
Q4 quantization 1.30 tok/sec
SIMD 3.00 tok/sec
Worker threads 3.60 tok/sec
```
45× total. Single CPU core. Zero npm dependencies.
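The typed-array step is the easiest one to reproduce yourself. This sketch (not PureBee's actual kernel) runs the same dot-product loop over a plain `Array` and a `Float32Array`; the flat, unboxed f32 storage is what lets the JIT keep the inner loop in machine floats:

```javascript
// Same loop; only the backing storage differs. A plain Array holds
// boxed doubles behind a polymorphic access path; a Float32Array is
// one contiguous unboxed buffer.
function dot(a, b) {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

const n = 1_000_000;
const plainA = Array.from({ length: n }, (_, i) => (i % 7) * 0.5);
const plainB = Array.from({ length: n }, (_, i) => (i % 5) * 0.25);
const typedA = Float32Array.from(plainA);
const typedB = Float32Array.from(plainB);

for (const [label, a, b] of [["Array       ", plainA, plainB],
                             ["Float32Array", typedA, typedB]]) {
  const t0 = performance.now();
  const s = dot(a, b);
  console.log(label, (performance.now() - t0).toFixed(2), "ms", s);
}
```

(For a fair benchmark you'd warm up each shape separately so the function doesn't go megamorphic; this is just to show the shape of the win.)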
The claim isn't that this is faster than a real GPU. The claim is that a GPU was never the hardware — it was always the math. The hardware is just one way to run the math fast. PureBee is another way. If that's true, it changes where inference can run.
To run:
```
git clone https://github.com/PureBee/purebee
node download.js llama3
node --max-old-space-size=4096 chat-llama3.js
```
Requires Node.js ≥ 20. The heap flag is not optional.
Licensed FSL-1.1 (converts to Apache 2.0 in 2 years). Free for personal and internal use.
Happy to go deep on the WASM binary construction, the Q4 nibble layout, or the SharedArrayBuffer weight cache that runs a 4.5GB model in 1.8GB of RAM.
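For anyone who hasn't seen Q4 before, the idea is two weights per byte: each 4-bit value is an offset index scaled by a per-block factor. This is a sketch of one common convention (GGUF Q4_0-style: 32 weights per block, low nibble first, offset 8); I'm not claiming it matches PureBee's exact layout:

```javascript
// Dequantize one hypothetical Q4 block: 16 packed bytes -> 32 f32 weights.
// Each nibble is an unsigned 0..15 value; subtracting 8 recenters it,
// and the per-block scale restores magnitude.
function dequantBlock(scale, packed /* Uint8Array of length 16 */) {
  const out = new Float32Array(32);
  for (let i = 0; i < 16; i++) {
    const byte = packed[i];
    out[2 * i]     = ((byte & 0x0f) - 8) * scale; // low nibble
    out[2 * i + 1] = ((byte >> 4)   - 8) * scale; // high nibble
  }
  return out;
}

// A byte of 0x99 holds two nibbles of 9, i.e. two weights of (9-8)*scale.
console.log(dequantBlock(0.5, new Uint8Array(16).fill(0x99)));
```

The fast path in a real kernel never materializes `out`; it accumulates the dot product directly from the nibbles, which is exactly what the SIMD WASM kernel does.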