Modern ML mostly runs on other people's hardware (clouds, inference marketplaces, heterogeneous GPU fleets). Today you can't verify what actually ran: which model, which inputs, or whether your result was silently downgraded (model swap, early exit, quantization, ad-embedding tweaks). The usual fix (recompute and compare) breaks because GPU execution is nondeterministic: IEEE-754 floating-point arithmetic is non-associative, kernels reorder reductions, and thread schedules and atomics vary between runs. Bitwise replay at scale is brittle and costly.
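To see the failure in miniature (plain NumPy, nothing to do with our runtime): summing the same million float32 values in two legitimate orders already changes the bits.

    # float32 addition is not associative, so two valid reduction orders
    # produce (slightly) different results -- which breaks bitwise replay.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(1_000_000).astype(np.float32)

    s1 = x.sum(dtype=np.float32)                       # one reduction order
    s2 = x.reshape(1000, 1000).sum(axis=0, dtype=np.float32).sum(dtype=np.float32)

    print(s1 == s2)                    # typically False
    print(abs(float(s1) - float(s2)))  # tiny but nonzero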
We built a system that makes ML results verifiable without requiring determinism. Instead of demanding exact equality, we verify outputs up to principled, per-operator error bounds and resolve disagreements with an optimistic, Merkle-anchored dispute game.
What's inside (precise + concise):
Semantics: "Tolerance-aware correctness" for tensor programs and operator-specific acceptance regions induced by IEEE-754 rounding.
Two error models:
Theoretical IEEE-754 bounds (sound, per-operator, element-wise; conservative but cheap to verify).
Empirical percentile thresholds calibrated across GPUs (tight, model-specific).
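To make the leaf-level check concrete, here's a rough sketch for a single matmul (illustrative only: the helper names are made up for this comment, and the per-element radius is just the textbook forward-error bound |C_hat - C| <= gamma_K * (|A| @ |B|) with gamma_K = K*u/(1 - K*u); our real per-operator bounds are more involved).

    # Sketch of tolerance-aware acceptance for one operator (matmul).
    import torch

    def theoretical_matmul_bound(A: torch.Tensor, B: torch.Tensor) -> torch.Tensor:
        u = torch.finfo(A.dtype).eps / 2        # unit roundoff (2^-24 for float32)
        K = A.shape[-1]                         # inner (reduction) dimension
        gamma = (K * u) / (1 - K * u)
        return gamma * (A.abs() @ B.abs())      # element-wise acceptance radius

    def accept(C_claimed, C_reference, bound, empirical_tol=None):
        err = (C_claimed - C_reference).abs()
        if bool((err <= bound).all()):          # (i) sound theoretical certificate
            return True
        if empirical_tol is not None:           # (ii) calibrated percentile fallback
            return bool((err <= empirical_tol).all())
        return False

    A, B = torch.randn(64, 128), torch.randn(128, 32)
    C_ref = (A.double() @ B.double()).float()   # higher-precision reference
    print(accept(A @ B, C_ref, theoretical_matmul_bound(A, B)))   # expected: True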
Dispute protocol: If a result is challenged, we recursively partition the traced graph (with Merkle proofs anchoring each subgraph's claimed I/O) until the dispute narrows to a single operator. At that leaf, we either (i) certify the claim against the theoretical bound or (ii) run a small committee vote against the empirical threshold. Proven violations are slashed.
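In miniature, the narrowing step behaves like a binary search over the traced operator sequence (heavily simplified sketch; challenger_disputes stands in for the real exchange of Merkle-anchored claims and bonds):

    # Sketch: narrow a disputed trace to one operator. `challenger_disputes(seg)`
    # is a stand-in for the challenger asserting that the prover's committed
    # output for `seg` is not within tolerance of its committed input.
    def find_disputed_operator(ops, challenger_disputes):
        segment = list(ops)
        while len(segment) > 1:
            mid = len(segment) // 2
            left, right = segment[:mid], segment[mid:]
            # the challenger must dispute at least one half, else the challenge fails
            segment = left if challenger_disputes(left) else right
        return segment[0]   # leaf: certify via theoretical bound or committee vote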
Runtime + coordinator: a PyTorch-compatible runtime traces graphs, computes bounds on the fly, records subgraph I/O, and emits/validates commitments; a lightweight coordination layer handles authenticated logs and bonds (we prototyped it on Ethereum, but the protocol doesn't depend on a blockchain). Overhead: ~0.3% added latency on Qwen3-8B, with no extra memory beyond native subgraph execution.
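For flavor, here is a toy version of the commitment side (not our actual trace format, chunking, or hash scheme): trace with torch.fx, hash recorded tensors, and fold the hashes into a Merkle root that the prover posts and bonds.

    # Toy commitment over a traced graph's recorded I/O.
    import hashlib
    import torch
    from torch import fx

    def tensor_leaf(t: torch.Tensor) -> bytes:
        return hashlib.sha256(t.detach().cpu().contiguous().numpy().tobytes()).digest()

    def merkle_root(leaves: list[bytes]) -> bytes:
        nodes = list(leaves) or [hashlib.sha256(b"").digest()]
        while len(nodes) > 1:
            if len(nodes) % 2:                  # duplicate last node on odd levels
                nodes.append(nodes[-1])
            nodes = [hashlib.sha256(a + b).digest()
                     for a, b in zip(nodes[::2], nodes[1::2])]
        return nodes[0]

    class Net(torch.nn.Module):
        def forward(self, x):
            return torch.relu(x @ x.t()).sum(dim=-1)

    traced = fx.symbolic_trace(Net())           # operator-level trace
    x = torch.randn(8, 8)
    leaves = [tensor_leaf(x), tensor_leaf(traced(x))]   # in practice: every subgraph's I/O
    print(merkle_root(leaves).hex())            # the commitment a prover would post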
Scope: Designed for the open-model setting (public weights/graphs enable subgraph extraction and thresholding). Closed-model APIs can still expose commitments to permissioned verifiers. No TEEs required; no deterministic kernels needed.
We’d love feedback on blind spots in the threat model, better partition policies for the dispute game, and where empirical thresholds might be gamed or leak information.