I’ve been working independently on a method that replaces full-transformer inference with a low-rank “meaning field” extracted from internal activations.
The core result: a frozen Llama-3.3-70B can be distilled into a 256-dimensional field representation, yielding 224× compression while scoring slightly higher than the full model on several benchmarks. A small student model then learns to generate these fields directly from text, removing the transformer from the inference path.
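To give a feel for the shape of the pipeline, here is a simplified PyTorch sketch. The pooling strategy, projection, student architecture, and loss below are illustrative stand-ins on my part, not the exact method; that is specified in the paper.

    # Simplified sketch of the distill-then-replace pipeline. The pooling,
    # projection, and student architecture are illustrative stand-ins,
    # not the exact method from the paper.
    import torch
    import torch.nn as nn

    FIELD_DIM = 256        # dimensionality of the "meaning field"
    TEACHER_HIDDEN = 8192  # Llama-3.3-70B hidden size

    class FieldProjector(nn.Module):
        """Low-rank projection from frozen teacher activations to the field."""
        def __init__(self):
            super().__init__()
            self.proj = nn.Linear(TEACHER_HIDDEN, FIELD_DIM, bias=False)

        def forward(self, hidden_states, attention_mask):
            # Mean-pool the teacher's last hidden layer over non-pad tokens,
            # then project down to the 256-d field.
            mask = attention_mask.unsqueeze(-1).float()
            pooled = (hidden_states * mask).sum(1) / mask.sum(1).clamp(min=1)
            return self.proj(pooled)

    class FieldStudent(nn.Module):
        """Small model mapping text embeddings directly to fields, so the
        70B teacher is no longer needed at inference time."""
        def __init__(self, in_dim=768):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_dim, 1024), nn.GELU(),
                nn.Linear(1024, FIELD_DIM),
            )

        def forward(self, text_emb):
            return self.net(text_emb)

    # Distillation: regress student fields onto teacher fields.
    def distill_loss(student_field, teacher_field):
        return nn.functional.mse_loss(student_field, teacher_field)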
The Zenodo link contains the full paper, statistical results, and methodology.
An unoptimized reference implementation is here: https://github.com/Anima-Core/an1-core
Production variants (AN1-Turbo, FPU work, etc.) are not included.
I’m an outsider to academia, so I’m posting this openly to get technical feedback, replication attempts, and critique from people who understand this space.