Interesting that SparseLoCo held up at 72B scale with permissionless participants. I run distributed inference across multiple machines over Tailscale (M2 Max + RTX 5070 Ti), and even in that controlled setup, network variance is the dominant bottleneck. The fact that they got competitive quality with peers joining and leaving freely on 1.1T tokens is impressive — though I'd love to see how much the blockchain verification overhead actually cost in effective compute utilization.
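For anyone wondering why it's the variance rather than the mean latency that hurts: a synchronous step waits on the slowest peer, so effective step time is the max over per-peer latencies, and that max grows with jitter even when mean latency is flat. A quick back-of-the-envelope simulation (peer count and latency numbers are made up, not from the paper):

    import random
    import statistics

    def sync_step_ms(n_peers, mean_ms, jitter_ms):
        # A synchronous all-reduce waits for the slowest peer,
        # so step time is the max over per-peer latencies.
        return max(random.gauss(mean_ms, jitter_ms) for _ in range(n_peers))

    random.seed(0)
    for jitter in (1, 10, 50):  # hypothetical per-peer jitter in ms
        steps = [sync_step_ms(32, 20, jitter) for _ in range(10_000)]
        print(f"jitter={jitter:>2}ms -> mean step {statistics.mean(steps):.1f}ms")

Mean latency stays at 20ms in all three cases, but the effective step time balloons with jitter. That's the whole straggler problem in three lines.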
Kave0ne•1h ago
The Byzantine fault tolerance question here is interesting. With 72B parameters trained across untrusted peers, even a small fraction of malicious nodes could introduce subtle gradient poisoning that degrades model quality in non-obvious ways. Curious how they handle the verification overhead at scale: cryptographic proofs on gradient updates would add significant latency. Is the threat model just Sybil attacks, or also honest-but-curious nodes leaking gradient information?
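No idea what SparseLoCo actually uses here, but the textbook baseline against outright poisoning is robust aggregation, e.g. a coordinate-wise trimmed mean, which needs no cryptography at all. Minimal sketch (peer counts and values are hypothetical):

    import numpy as np

    def trimmed_mean(grads, trim_frac=0.1):
        # Coordinate-wise trimmed mean: sort each coordinate across peers,
        # drop the top and bottom trim_frac, average the rest.
        stacked = np.sort(np.stack(grads), axis=0)
        k = int(len(grads) * trim_frac)
        return stacked[k:len(grads) - k].mean(axis=0)

    rng = np.random.default_rng(0)
    honest = [rng.normal(1.0, 0.1, size=4) for _ in range(18)]
    poisoned = [np.full(4, -100.0) for _ in range(2)]  # 10% malicious peers
    print(trimmed_mean(honest + poisoned))  # stays near 1.0; outliers get trimmed

The catch is that this only blunts large deviations: an attacker who keeps perturbations small enough to survive the trim can still bias training, which is exactly the subtle, non-obvious degradation worry above.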
LuxBennu•1h ago