frontpage.

Post-transformer inference: 224× compression of Llama-70B with improved accuracy

https://zenodo.org/records/17873275
14•anima-core•2h ago

Comments

anima-core•2h ago
I’ve been working independently on a method that replaces full-transformer inference with a low-rank “meaning field” extracted from internal activations.

The core result: a frozen Llama-3.3-70B can be distilled into a 256-dimensional field representation, giving 224× compression and slightly higher accuracy on several benchmarks. A small student model then learns to directly generate these fields from text, removing the transformer from the inference path.
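To give a rough sense of the pipeline's shape, here is a minimal PyTorch sketch. Everything in it (anchor-layer indices, mean-pooling, the student's size and architecture) is an illustrative stand-in, not the AN1 reference or production configuration:

    import torch
    import torch.nn as nn

    ANCHOR_LAYERS = [4, 12, 20, 28, 36, 44, 52]  # illustrative anchor layers, not the real set
    HIDDEN_DIM = 8192                            # Llama-3.3-70B hidden size
    FIELD_DIM = 256                              # "meaning field" dimensionality

    class FieldProjector(nn.Module):
        """Collapses frozen anchor-layer activations into a low-rank field."""
        def __init__(self):
            super().__init__()
            self.proj = nn.Linear(len(ANCHOR_LAYERS) * HIDDEN_DIM, FIELD_DIM)

        def forward(self, hidden_states):
            # hidden_states: per-layer activations, each [batch, seq, HIDDEN_DIM]
            pooled = [hidden_states[i].mean(dim=1) for i in ANCHOR_LAYERS]
            return self.proj(torch.cat(pooled, dim=-1))  # [batch, FIELD_DIM]

    class FieldStudent(nn.Module):
        """Small non-transformer student that predicts the field directly from
        text, so the 70B model is no longer needed at inference time."""
        def __init__(self, vocab_size=128_256, d_model=1024):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)
            self.mlp = nn.Sequential(
                nn.Linear(d_model, d_model), nn.GELU(),
                nn.Linear(d_model, FIELD_DIM),
            )

        def forward(self, input_ids):
            pooled = self.embed(input_ids).mean(dim=1)  # crude bag-of-tokens pooling
            return self.mlp(pooled)                     # predicted field, [batch, FIELD_DIM]

At inference time only the student (plus a small task head) runs; the 70B teacher is used offline to produce the field targets.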

The Zenodo link contains the full paper, statistical results, and methodology. A reference implementation (non-optimized) is here: https://github.com/Anima-Core/an1-core

Production variants (AN1-Turbo, FPU work, etc.) are not included.

I’m an outsider to academia so I’m posting this openly to get technical feedback, replication attempts, and critique from people who understand this space.

farhanhubble•36m ago
I've only skimmed the paper and have no idea how sound or reproducible it is, but it's well written, and the notation in particular is clear. After reading yesterday's weight subspace paper (https://news.ycombinator.com/item?id=46199623), this does sound plausible to me.
bigtones•30m ago
Here is a working link to the same paper: https://github.com/Anima-Core/an1-core/blob/main/papers/Post...
gcr•25m ago
Thanks for sharing! If I understand correctly, you're training a smaller model to approximate concatenate(layer[1], layer[5], layer[10], ...), using a loss function that combines reconstruction error with end-to-end accuracy. Then you're transferring that smaller representation into a smaller transformer model. Is that right?
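In other words, I'm picturing an objective roughly like the sketch below; the MSE/cross-entropy choice and the weighting are my guesses, not details from the paper:

    import torch.nn.functional as F

    def distill_loss(student_field, teacher_field, logits, labels, alpha=0.5):
        # Reconstruction term: match the teacher's compressed anchor-layer field.
        recon = F.mse_loss(student_field, teacher_field)
        # Task term: keep the field useful for the downstream classification label.
        task = F.cross_entropy(logits, labels)
        return alpha * recon + (1 - alpha) * task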

If I were a paper reviewer, here are a couple of red flags that stood out to me. I'd suggest starting here if you want to rework this for an academic submission:

1. Your LaTeX citations in the related work are broken; I see [?] everywhere. To a reviewer this is often a strong sign of an AI-hallucinated bibliography, though many of your references actually do exist and are contextually relevant, so I'm not quite sure what's going on here. Similarly, the figure references need to be fixed; I see "Figure ?" throughout.

2. Bluntly, "Exact architecture details remain proprietary for production deployments" and "Production systems use architecture search tailored to target latency and accuracy constraints" is not how IP protection works in this field. Do your experiments use the "MLP baselines" or your proprietary architecture? Since you say the code "Achieves 80-90% of paper performance using baseline heuristics," this approach effectively isn't reproducible. As a reviewer, this really worries me. I strongly recommend benchmarking only the system you're able to open-source. I suspect there's a lot of "secret sauce" in how you actually approximate the anchor layers and how that gets transferred back to your student transformer model; that's the part worth the most time, effort, and writing, but it's glossed over as an implementation detail in this manuscript.

3. I'm glad you ablate over your system's hyperparameters, but how does it compare to (a) an ordinary smaller model of identical size trained end-to-end, and (b) distilling from a single layer's activations? For example, a reviewer might consider this work a novel method of model distillation, so what makes it better than previous distillation methods?

4. I found the paper fairly hard to read because it's full of sentence fragments rather than full thoughts. A little background on the benchmarks, failure cases, etc. would go a long way, and some discussion of why you think your approach improves on similar distillation methods would also be welcome.

5. "compression" is overloaded. Does 224x compression refer to (nparams(field transfer)+nparams(student model))/nparams(original model), or does it refer to reducing the representation dimensionality, 7*8192/256 ?

6. [nitpick] I'd suggest changing the name "meaning field" to something a little more digestible, like "compressed representation" or "latent activation distillation".
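Quick arithmetic for point 5, just to make the two readings concrete (7 anchor layers and the 8192 hidden size are the values your ratio implies; the student's parameter count is a placeholder):

    anchor_layers, hidden_dim, field_dim = 7, 8192, 256
    dim_compression = anchor_layers * hidden_dim / field_dim  # 224.0: representation-size reading
    # param_compression = 70e9 / (student_params + head_params)  # parameter-count reading; needs the student's size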

Sorry for being so critical; iron sharpens iron, though. Hopefully these thoughts are helpful to get you started. Excited to see where this work leads.

utopcell•18m ago
Very strong claim in the title, given the following limitation:

> Generation tasks. Method applies to classification only. Preliminary decoder experiments show perplexity increases.

daemonologist•13m ago
Yeah, burying this on page 8 is a bit suspect, IMO (the eval datasets are listed on page 3, so if you were familiar with them you'd have a hint by then).

The distillation of a student that predicts "anchor layers" and then acts as a backbone for classification is perfectly cool on its own; no need to stretch the title/abstract so much.

gcr•5m ago
Agreed re: title/abstract stretching. Good work stands on its own without needing hype. "We found a nifty way to distill Llama-70B using a much smaller student transformer model; the key is using intermediate activation layers in a compressed representation" would be about as effective at selling it while being more immediately approachable, IMO.

Show HN: Gemini Pro 3 hallucinates the HN front page 10 years from now

https://dosaygo-studio.github.io/hn-front-page-2035/news
2037•keepamovin•12h ago•676 comments

Making macOS Bearable

https://seg6.space/posts/making-macos-bearable/
30•seg6•1h ago•38 comments

NYC congestion pricing cuts air pollution by a fifth in six months

https://airqualitynews.com/cars-freight-transport/nyc-congestion-pricing-cuts-air-pollution-by-22...
87•pseudolus•49m ago•49 comments

PeerTube is recognized as a digital public good by Digital Public Goods Alliance

https://www.digitalpublicgoods.net/r/peertube
434•fsflover•10h ago•76 comments

The end of the kernel Rust experiment

https://lwn.net/Articles/1049831/
11•rascul•32m ago•3 comments

Django: what’s new in 6.0

https://adamj.eu/tech/2025/12/03/django-whats-new-6.0/
182•rbanffy•7h ago•44 comments

Mistral releases Devstral2 and Mistral Vibe CLI

https://mistral.ai/news/devstral-2-vibe-cli
514•pember•13h ago•262 comments

If you're going to vibe code, why not do it in C?

https://stephenramsay.net/posts/vibe-coding.html
354•sramsay•10h ago•393 comments

Hands down one of the coolest 3D websites

https://bruno-simon.com/
455•razzmataks•11h ago•108 comments

Pebble Index 01 – External memory for your brain

https://repebble.com/blog/meet-pebble-index-01-external-memory-for-your-brain
416•freshrap6•12h ago•413 comments

10 Years of Let's Encrypt

https://letsencrypt.org/2025/12/09/10-years
515•SGran•8h ago•223 comments

Italy's longest-serving barista reflects on six decades behind the counter

https://www.reuters.com/lifestyle/culture-current/anna-possi-six-decades-behind-counter-italys-ba...
89•NaOH•5d ago•28 comments

Donating the Model Context Protocol and establishing the Agentic AI Foundation

https://www.anthropic.com/news/donating-the-model-context-protocol-and-establishing-of-the-agenti...
171•meetpateltech•10h ago•77 comments

Writing our own Cheat Engine in Rust

https://lonami.dev/blog/woce-1/
16•hu3•4d ago•4 comments

Qt, Linux and everything: Debugging Qt WebAssembly

http://qtandeverything.blogspot.com/2025/12/debugging-qt-webassembly-dwarf.html
51•speckx•6h ago•12 comments

Post-transformer inference: 224× compression of Llama-70B with improved accuracy

https://zenodo.org/records/17873275
14•anima-core•2h ago•7 comments

Operando interlayer expansion of curved graphene for dense supercapacitors

https://www.nature.com/articles/s41467-025-63485-0
12•westurner•5d ago•0 comments

The stack circuitry of the Intel 8087 floating point chip, reverse-engineered

https://www.righto.com/2025/12/8087-stack-circuitry.html
86•elpocko•9h ago•34 comments

So you want to speak at software conferences?

https://dylanbeattie.net/2025/12/08/so-you-want-to-speak-at-software-conferences.html
130•speckx•9h ago•64 comments

When a video codec wins an Emmy

https://blog.mozilla.org/en/mozilla/av1-video-codec-wins-emmy/
15•todsacerdoti•4d ago•1 comment

Agentic AI Foundation

https://block.xyz/inside/block-anthropic-and-openai-launch-the-agentic-ai-foundation
77•thinkingkong•7h ago•16 comments

Kaiju – General purpose 3D/2D game engine in Go and Vulkan with built in editor

https://github.com/KaijuEngine/kaiju
159•discomrobertul8•12h ago•78 comments

Linux CVEs, more than you ever wanted to know

http://www.kroah.com/log/blog/2025/12/08/linux-cves-more-than-you-ever-wanted-to-know/
31•voxadam•5h ago•24 comments

A supersonic engine core makes the perfect power turbine

https://boomsupersonic.com/flyby/ai-needs-more-power-than-the-grid-can-deliver-supersonic-tech-ca...
75•simonebrunozzi•11h ago•127 comments

Show all your application errors using Cloudflare Error Page

https://github.com/donlon/cloudflare-error-page
6•sawirricardo•1h ago•2 comments

OpenEvolve: Teaching LLMs to Discover Algorithms Through Evolution

https://algorithmicsuperintelligence.ai/blog/openevolve-overview/index.html
20•codelion•4h ago•5 comments

30 Year Anniversary of WarCraft II: Tides of Darkness

https://www.jorsys.org/archive/december_2025.html#newsitem_2025-12-09T07:42:19Z
191•sjoblomj•18h ago•128 comments

Clearspace (YC W23) Is Hiring a Founding Designer

https://www.ycombinator.com/companies/clearspace/jobs/yamWTLr-founding-designer-at-clearspace
1•roycebranning•10h ago

Apple's slow AI pace becomes a strength as market grows weary of spending

https://finance.yahoo.com/news/apple-slow-ai-pace-becomes-104658095.html
264•bgwalter•12h ago•319 comments

The AI-Education Death Spiral a.k.a. Let the Kids Cheat

https://anandsanwal.me/ai-education-death-spiral/
43•LouisLazaris•2h ago•51 comments