Hey HN,
I've been working on ARIA Protocol — an open-source P2P network for distributed AI inference using 1-bit quantized models (ternary weights: -1, 0, +1). The key insight: multiplications become additions/subtractions, so any CPU can run LLMs efficiently without a GPU.
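To make the "multiplications become additions" point concrete, here's a toy ternary mat-vec in plain Python (a simplified illustration; the actual bitnet.cpp LUT kernels are far more optimized):

    # Toy example, not the bitnet.cpp kernel: with weights in {-1, 0, +1},
    # each dot product is just adds and subtracts of activations.
    def ternary_matvec(W, x):
        out = []
        for row in W:
            acc = 0.0
            for w, xi in zip(row, x):
                if w == 1:
                    acc += xi
                elif w == -1:
                    acc -= xi
                # w == 0: contributes nothing, skip entirely
            out.append(acc)
        return out

    print(ternary_matvec([[1, -1, 0], [0, 1, 1]], [0.5, 2.0, -1.0]))  # [-1.5, 1.0]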
Real benchmarks (AMD Ryzen 9 7845HX, 8 threads):
0.7B model: 89.65 tokens/s — ~11 mJ/token
2.4B model: 36.94 tokens/s — ~27 mJ/token
8.0B model: 15.03 tokens/s — ~66 mJ/token
Memory: 10x reduction (2B model: 4.0 GB → 0.4 GB)
Based on Microsoft Research's BitNet b1.58 (arXiv:2402.17764) and bitnet.cpp. These are natively trained 1-bit models, not post-training quantization, so the quality gap versus full precision is fundamentally different from what you get by quantizing an existing model.
How it works:
Each node contributes CPU cycles to run real AI inference (Proof of Useful Work — every compute cycle produces actual output, zero wasted computation)
Models are sharded across nodes via pipeline parallelism (Node A: layers 0-7, Node B: 8-15, etc.); a sharding sketch follows this list
Every inference is recorded on a lightweight provenance ledger (not a heavy PoW chain — just timestamped hashes for traceability); a sample entry is sketched after this list
Energy is tracked per-token (Proof of Sobriety) — 70-82% reduction vs GPU-based inference
Explicit consent contracts: you set CPU/RAM limits, time windows, and task types (example contract sketched after this list)
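For the pipeline-parallel sharding, the assignment step amounts to handing each node a contiguous block of layers; a simplified sketch (not the actual scheduler code):

    # Simplified sketch: split n_layers into contiguous chunks, one per node.
    def assign_layer_ranges(n_layers, node_ids):
        per_node = -(-n_layers // len(node_ids))  # ceiling division
        plan, start = {}, 0
        for node in node_ids:
            end = min(start + per_node, n_layers)
            plan[node] = range(start, end)
            start = end
        return plan

    # 32 layers over 4 nodes -> node-a gets layers 0-7, node-b 8-15, etc.
    print(assign_layer_ranges(32, ["node-a", "node-b", "node-c", "node-d"]))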
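The provenance ledger can be pictured as chained, timestamped hashes of each inference, roughly like this (a simplified sketch, not the real schema):

    # Simplified ledger entry: hashes of prompt/output plus a timestamp,
    # chained to the previous entry's hash for traceability.
    import hashlib, json, time

    def ledger_entry(prompt, output, prev_hash):
        record = {
            "ts": time.time(),
            "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
            "output_sha256": hashlib.sha256(output.encode()).hexdigest(),
            "prev": prev_hash,
        }
        record["hash"] = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()).hexdigest()
        return record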
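And a consent contract is essentially a node-operator policy object, along these lines (simplified sketch, not the actual data model):

    # Simplified consent contract: the operator's declared resource limits.
    from dataclasses import dataclass

    @dataclass
    class ConsentContract:
        max_cpu_percent: int = 50            # never use more than half the CPU
        max_ram_gb: float = 2.0              # cap memory used for inference
        allowed_hours: tuple = (22, 23)      # only run during these local hours
        task_types: tuple = ("inference",)   # e.g. no training/fine-tuning jobs

        def permits(self, task_type, hour):
            return task_type in self.task_types and hour in self.allowed_hours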
What's built:
Python backend: 11 modules, 196 passing tests, OpenAI-compatible API (usage sketch after this list)
Desktop app (Tauri 2.0 / Electron): 1-click node setup, AI chat, model manager, energy dashboard — 12 languages
Auto-download from HuggingFace, P2P WebSocket mesh with TLS
Full threat model documented (Sybil, Eclipse, MITM mitigations)
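Since the API is OpenAI-compatible, a local node works as a drop-in endpoint for existing clients; roughly like this (port, path, and model name below are placeholders, check the repo for the real values):

    # Example of hitting the OpenAI-compatible endpoint on a local node.
    # URL and model name are placeholders, not documented values.
    import requests

    resp = requests.post(
        "http://localhost:8000/v1/chat/completions",
        json={
            "model": "bitnet-b1.58-2B",
            "messages": [{"role": "user", "content": "Hello from a 1-bit model"}],
        },
        timeout=60,
    )
    print(resp.json()["choices"][0]["message"]["content"])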
Total cost of ownership (3 years, 10M tokens/day): $76 on existing CPU hardware vs $164K on cloud APIs. That's a 2,161x difference.
What's next: Testnet Alpha (v0.6.0) — Kademlia DHT, NAT traversal, Falcon3 1-bit models (1B to 10B from TII Abu Dhabi, which outperform Microsoft's original BitNet at 53.17% vs 51.54% avg accuracy), and public bootstrap nodes.
The bottleneck is memory bandwidth, not compute — 1-bit LUT kernels are memory-bound, which is why CPUs can compete. Throughput peaks at around 8 threads regardless of core count.
MIT licensed, fully reproducible benchmarks, no token/crypto component.
GitHub: https://github.com/spmfrance-cloud/aria-protocol
Happy to answer technical questions about the architecture, energy methodology (CPU-time × TDP estimation, not direct measurement — transparency matters), or the P2P consensus design.
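For anyone curious about the energy numbers, the estimation boils down to roughly this (a simplified sketch of the CPU-time × TDP approach, not the project's exact code):

    # Simplified: energy is estimated as CPU time x TDP, not measured at the wall.
    def energy_mj_per_token(cpu_time_s, tdp_watts, tokens):
        joules = cpu_time_s * tdp_watts
        return joules * 1000.0 / tokens   # millijoules per token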