Tessera is an activation-based protocol that lets trained ML models transfer knowledge to other models across architectures. Instead of dumping weight tensors, it encodes what a model has learnt —
activations, feature representations, behavioural patterns — into self-describing tokens that a receiving model can decode into its own
architecture.
The reference implementation (tessera-core) is a Python/PyTorch library. Current benchmarks show positive transfer across CNN, Transformer, and LSTM pairs. It runs on CPU and the demo finishes in
under 60 seconds.
Happy to answer questions about the protocol design, the wire format, or the benchmark methodology.
0xecro1•2h ago
kirkmaddocks•2h ago
Mode A (activation transfer) operates at the representation level, not the parameter level. The source model's knowledge gets projected into a 2048-dim hub space — the receiving model doesn't need to match architecturally or in precision. A 200M FP32 training model and a 5M INT8 edge model can both have UHS encoders/decoders. The hub space is agnostic to what's underneath.
Mode B (behavioural) is probably the most interesting path for your use case. It transfers decision boundaries rather than activations or weights. If the quantised model can reproduce the input-output mapping, internal precision is irrelevant.
It's similar in spirit to distillation but decoupled through the hub space — teacher and student don't need to be online simultaneously, and you get a full audit trail of what knowledge went where (which matters if you're shipping medical/industrial edge models under EU AI Act).
The gap today is the decoder side. DecoderMLP outputs FP32. We'd need a quantisation-aware variant that respects the INT8 grid — straight-through estimator at minimum, learned quantisation boundaries ideally. We'd also want empirical drift characterisation across FP32→FP16→INT8→INT4 so you'd know your expected fidelity floor for a given target.
The swarm angle is where it gets genuinely useful for edge fleets. If you've got N devices training locally on-site data, they contribute quantised-model tokens back to a full-precision aggregator. The robust aggregation strategy (Huber-style cosine clipping) handles quantisation noise across heterogeneous devices naturally.
We're planning a quantisation-aware transfer module next. If you're interested in testing against real Cortex-A INT8 workloads, we'd welcome the collaboration — repo is at github.com/incocreativedev/tessera-core.