Author here. I've been collecting historical computing documentation for a few years and came across Brusentsov's balanced ternary research from Moscow State University (1958-1965), and applied it to modern transformers.
Some interesting results:
- 93.8% energy reduction per inference
- 16x memory compression (7B model: 28GB → 1.75GB)
- Zero floating-point multiplication (rough sketch of what that means below)
- Runs on CPUs, no GPU required
- Architectural epistemic uncertainty (it won't hallucinate what it doesn't know)
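For anyone wondering what "zero floating-point multiplication" looks like in practice, here's a minimal sketch of the idea (mine, not code from the repo): with weights restricted to {-1, 0, +1}, a matrix-vector product reduces to adds and subtracts of the activations plus one per-row scale. The function names and the 0.7 × mean|w| threshold are illustrative assumptions, not the repo's exact method. The 16x memory figure is consistent with going from 4 bytes per weight (7B × 4 B ≈ 28 GB) to ternary values bit-packed at ~2 bits each (7B × 2 bit ≈ 1.75 GB).

    # Illustrative sketch only: balanced-ternary quantization + multiply-free matvec.
    # Threshold and per-row scaling are assumptions, not the repo's exact scheme.
    import numpy as np

    def quantize_ternary(W, thresh_ratio=0.7):
        """Map float weights to trits in {-1, 0, +1} plus a per-row scale."""
        scale = np.mean(np.abs(W), axis=1, keepdims=True)   # per-row magnitude
        T = np.zeros_like(W, dtype=np.int8)
        T[W >  thresh_ratio * scale] =  1
        T[W < -thresh_ratio * scale] = -1
        return T, scale.astype(np.float32)

    def ternary_matvec(T, scale, x):
        """y ≈ W @ x using only adds/subtracts of x, scaled once per row."""
        pos = np.where(T ==  1, x, 0.0).sum(axis=1)         # sum of x where weight = +1
        neg = np.where(T == -1, x, 0.0).sum(axis=1)         # sum of x where weight = -1
        return (pos - neg) * scale.ravel()

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        W = rng.normal(size=(8, 16)).astype(np.float32)
        x = rng.normal(size=16).astype(np.float32)
        T, s = quantize_ternary(W)
        print("float matvec  :", (W @ x)[:4])
        print("ternary matvec:", ternary_matvec(T, s, x)[:4])

The sketch keeps the trits as int8 for clarity; a real implementation would bit-pack them to hit the memory numbers above.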
Repo: https://github.com/Zaneham/Ternary_inference
Happy to answer questions :-) Happy holidays and Merry Christmas!