ARIA is a peer-to-peer protocol for running 1.58-bit quantized LLMs (ternary weights: -1, 0, +1) on ordinary CPUs. No GPU needed.
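The CPU-friendliness follows directly from the weight format: with every weight restricted to {-1, 0, +1}, a matrix-vector product needs no multiplications at all, only additions, subtractions, and one scale per output row. A minimal numpy sketch of that idea (function and variable names are illustrative, not ARIA's internals):

```python
import numpy as np

def ternary_matvec(W: np.ndarray, x: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Multiply-free matvec: W holds only -1/0/+1, scale is a per-row factor."""
    pos = (W == 1)   # positions that contribute +x[j]
    neg = (W == -1)  # positions that contribute -x[j]
    # y_i = scale_i * (sum of selected x - sum of negated x); no multiplies by W
    return scale * (np.where(pos, x, 0.0).sum(axis=1) -
                    np.where(neg, x, 0.0).sum(axis=1))

# Sanity check against a plain float matmul
W = np.random.choice([-1, 0, 1], size=(4, 8)).astype(np.int8)
x = np.random.randn(8).astype(np.float32)
s = np.ones(4, dtype=np.float32)
assert np.allclose(ternary_matvec(W, x, s), s * (W.astype(np.float32) @ x), atol=1e-5)
```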
We benchmarked on a Ryzen 9, CPU only: 89.65 tokens/s at 0.7B parameters, 36.94 tokens/s at 2.4B, and 15.03 tokens/s at 8B, at ~28 mJ/token (99.5% less energy than GPU inference).
Key design choices:
- WebSocket-based P2P with pipeline parallelism for sharding a model across nodes.
- A provenance ledger that records every inference immutably (one plausible construction is sketched after this list).
- Proof of Useful Work in place of wasteful hash mining: the "mining" is the inference itself.
- Consent contracts, so no node's resources are used without explicit permission.
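A minimal sketch of what an immutable inference ledger could look like, assuming an append-only, hash-chained log where each record commits to the previous entry's hash; the `ProvenanceLedger` class and its fields are assumptions for illustration, not ARIA's actual schema:

```python
import hashlib
import json
import time

class ProvenanceLedger:
    """Append-only log; tampering with any past entry breaks the hash chain."""

    def __init__(self):
        self.entries = []
        self.head = "0" * 64  # genesis hash

    def record(self, node_id: str, prompt_hash: str, output_hash: str) -> dict:
        entry = {
            "prev": self.head,      # link to the prior entry
            "node": node_id,
            "prompt": prompt_hash,
            "output": output_hash,
            "ts": time.time(),
        }
        digest = hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()
        entry["hash"] = digest
        self.entries.append(entry)
        self.head = digest
        return entry

    def verify(self) -> bool:
        """Recompute every hash and check each link back to genesis."""
        prev = "0" * 64
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            recomputed = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if e["prev"] != prev or recomputed != e["hash"]:
                return False
            prev = e["hash"]
        return True
```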
Drop-in OpenAI-compatible API. ~5,800 lines of Python, MIT licensed, 102 tests passing.
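Because the API is OpenAI-compatible, the stock `openai` client should work unchanged by pointing `base_url` at a node; the port and model id below are placeholders, not documented ARIA defaults:

```python
from openai import OpenAI

# Point the standard client at a local ARIA node (address/model are assumed)
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

resp = client.chat.completions.create(
    model="ternary-2.4b",  # placeholder model id
    messages=[{"role": "user", "content": "Hello from a ternary model on CPU"}],
)
print(resp.choices[0].message.content)
```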