Hey HN,
I've been working on ARIA Protocol — an open-source P2P network for distributed AI inference using 1-bit quantized models (ternary weights: -1, 0, +1). The key insight: multiplications become additions/subtractions, so any CPU can run LLMs efficiently without a GPU.
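To make the "multiplications become additions" point concrete, here's a toy ternary mat-vec in plain Python (a simplified illustration; the actual bitnet.cpp LUT kernels are far more optimized):

    # Toy example, not the bitnet.cpp kernel: with weights in {-1, 0, +1},
    # each dot product is just adds and subtracts of activations.
    def ternary_matvec(W, x):
        out = []
        for row in W:
            acc = 0.0
            for w, xi in zip(row, x):
                if w == 1:
                    acc += xi
                elif w == -1:
                    acc -= xi
                # w == 0: contributes nothing, skip entirely
            out.append(acc)
        return out

    print(ternary_matvec([[1, -1, 0], [0, 1, 1]], [0.5, 2.0, -1.0]))  # [-1.5, 1.0]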
Real benchmarks (AMD Ryzen 9 7845HX, 8 threads):
0.7B model: 89.65 tokens/s — ~11 mJ/token
2.4B model: 36.94 tokens/s — ~27 mJ/token
8.0B model: 15.03 tokens/s — ~66 mJ/token
Memory: 10x reduction (2B model: 4.0 GB → 0.4 GB)
Based on Microsoft Research's BitNet b1.58 (arXiv:2402.17764) and bitnet.cpp. These are natively trained 1-bit models, not post-training quantization, so the quality gap versus full precision is fundamentally different from what you get by quantizing an existing model.
How it works:
Each node contributes CPU cycles to run real AI inference (Proof of Useful Work — every compute cycle produces actual output, zero wasted computation)
Models are sharded across nodes via pipeline parallelism (Node A: layers 0-7, Node B: 8-15, etc.); a sharding sketch follows this list
Every inference is recorded on a lightweight provenance ledger (not a heavy PoW chain — just timestamped hashes for traceability); a sample entry is sketched after this list
Energy is tracked per-token (Proof of Sobriety) — 70-82% reduction vs GPU-based inference
Explicit consent contracts: you set CPU/RAM limits, time windows, and task types (example contract sketched after this list)
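For the pipeline-parallel sharding, the assignment step amounts to handing each node a contiguous block of layers; a simplified sketch (not the actual scheduler code):

    # Simplified sketch: split n_layers into contiguous chunks, one per node.
    def assign_layer_ranges(n_layers, node_ids):
        per_node = -(-n_layers // len(node_ids))  # ceiling division
        plan, start = {}, 0
        for node in node_ids:
            end = min(start + per_node, n_layers)
            plan[node] = range(start, end)
            start = end
        return plan

    # 32 layers over 4 nodes -> node-a gets layers 0-7, node-b 8-15, etc.
    print(assign_layer_ranges(32, ["node-a", "node-b", "node-c", "node-d"]))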
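The provenance ledger can be pictured as chained, timestamped hashes of each inference, roughly like this (a simplified sketch, not the real schema):

    # Simplified ledger entry: hashes of prompt/output plus a timestamp,
    # chained to the previous entry's hash for traceability.
    import hashlib, json, time

    def ledger_entry(prompt, output, prev_hash):
        record = {
            "ts": time.time(),
            "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
            "output_sha256": hashlib.sha256(output.encode()).hexdigest(),
            "prev": prev_hash,
        }
        record["hash"] = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()).hexdigest()
        return record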
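And a consent contract is essentially a node-operator policy object, along these lines (simplified sketch, not the actual data model):

    # Simplified consent contract: the operator's declared resource limits.
    from dataclasses import dataclass

    @dataclass
    class ConsentContract:
        max_cpu_percent: int = 50            # never use more than half the CPU
        max_ram_gb: float = 2.0              # cap memory used for inference
        allowed_hours: tuple = (22, 23)      # only run during these local hours
        task_types: tuple = ("inference",)   # e.g. no training/fine-tuning jobs

        def permits(self, task_type, hour):
            return task_type in self.task_types and hour in self.allowed_hours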
What's built:
Python backend: 11 modules, 196 passing tests, OpenAI-compatible API (usage sketch after this list)
Desktop app (Tauri 2.0 / Electron): 1-click node setup, AI chat, model manager, energy dashboard — 12 languages
Auto-download from HuggingFace, P2P WebSocket mesh with TLS
Full threat model documented (Sybil, Eclipse, MITM mitigations)
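Since the API is OpenAI-compatible, a local node works as a drop-in endpoint for existing clients; roughly like this (port, path, and model name below are placeholders, check the repo for the real values):

    # Example of hitting the OpenAI-compatible endpoint on a local node.
    # URL and model name are placeholders, not documented values.
    import requests

    resp = requests.post(
        "http://localhost:8000/v1/chat/completions",
        json={
            "model": "bitnet-b1.58-2B",
            "messages": [{"role": "user", "content": "Hello from a 1-bit model"}],
        },
        timeout=60,
    )
    print(resp.json()["choices"][0]["message"]["content"])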
Total cost of ownership (3 years, 10M tokens/day): $76 on existing CPU hardware vs $164K on cloud APIs. That's a 2,161x difference.
What's next: Testnet Alpha (v0.6.0) — Kademlia DHT, NAT traversal, Falcon3 1-bit models (1B to 10B from TII Abu Dhabi, which outperform Microsoft's original BitNet at 53.17% vs 51.54% avg accuracy), and public bootstrap nodes.
The bottleneck is memory bandwidth, not compute — 1-bit LUT kernels are memory-bound, which is why CPUs can compete. Throughput peaks at around 8 threads regardless of core count.
MIT licensed, fully reproducible benchmarks, no token/crypto component.
GitHub: https://github.com/spmfrance-cloud/aria-protocol
Happy to answer technical questions about the architecture, energy methodology (CPU-time × TDP estimation, not direct measurement — transparency matters), or the P2P consensus design.
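For anyone curious about the energy numbers, the estimation boils down to roughly this (a simplified sketch of the CPU-time × TDP approach, not the project's exact code):

    # Simplified: energy is estimated as CPU time x TDP, not measured at the wall.
    def energy_mj_per_token(cpu_time_s, tdp_watts, tokens):
        joules = cpu_time_s * tdp_watts
        return joules * 1000.0 / tokens   # millijoules per token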