Apple: Embarrassingly Simple Self-Distillation Improves Code Generation

https://arxiv.org/abs/2604.01193

83•Anon84•1h ago

Comments

jofzar•34m ago

> simple self-distillation (SSD):

Sorry apple, SSD is already taken, you can't use that acronym.

ape4•30m ago

ATT=All TLAs are Taken

love2read•28m ago

You're right, I offer these alternatives:

Consistency Preservation Update (CPU)

Guided Probability Update (GPU)

History-aware Distillation Driving (HDD)

Probability Smoothing Update (PSU)

dist-epoch•30m ago

A heuristics I have lately: if more then half of the authors name on an AI paper are Chinese, it's worth reading. Works as a filter too: you don't lose much skipping papers with mostly non Chinese sounding author names.

100 years ago most scientific papers were written in German. I wonder when the switch to Chinese will happen.

https://en.wikipedia.org/wiki/Languages_of_science

0x3f•28m ago

That's... almost every AI paper.

amelius•20m ago

"Made in China, designed by Apple in California"

should be:

"Made in China, designed by Chinese people in California"?

ptidhomme•18m ago

I used to have the opposite rule in my signal processing field : the more Chinese names, the less innovation was there.

They seemed like they had to be churning out papers and any little adaptation to existing research triggered a new publication.

But it may have changed now.

avaer•13m ago

I definitely pay more attention to papers affiliated with Chinese companies; the economics seem to be more conducive to doing good academic work and publishing it. I would say the same for companies like Apple (where TFA came from).

But to filter based on author's names sounds pretty darn racist.

0x3f•29m ago

Haven't read the paper yet, but it is interesting how seemingly simple many breakthroughs in ML are. Even transformers are like that. Maybe it's hindsight bias.

I suppose we just don't have a deeper underlying theory to lean on and help us 'design' anything.

khalic•27m ago

Incredible, will translate to better coding models in the near future.

We really need to develop better tools to understand what's happening inside these NNs. Working with high-D spaces is not something we're good at, and we're basically throwing stuff at it and seeing if it sticks.

politelemon•22m ago

It's cringe worthy to see that the original paper itself is editorialised.

Title should be: Simple Self-Distillation Improves Code Generation

StevenWaterman•15m ago

"Embarrassingly" has a history as a technically meaningful word roughly equivalent to "maximally", see "Embarrassingly parallel"

https://en.wikipedia.org/wiki/Embarrassingly_parallel

Aurornis•11m ago

The phrase embarrassingly parallel has a history in computer science.

Many computer science paper titles allude to past titles in other CS papers.

Calling it “cringe worthy” is unnecessarily mean. There is context and history you don’t understand.

ape4•20m ago

Shouldn't a scientific paper be using metric units (like 30T) rather than 30B

roger_•20m ago

Skimmed this but don't have an intuitive understanding of why this works and how temperature and truncation factor in.

bensyverson•16m ago

Really fascinating how this works; it's basically context-aware decoding. From the paper:

> Code interleaves fork positions, where several continuations are genuinely plausible and may correspond to different solution approaches, with lock positions, where syntax and semantics leave little ambiguity but a low-probability distractor tail still remains… The best global decoding setting is therefore necessarily a compromise; we call this tension the precision-exploration conflict.

In other words, just like us, the model needs to shift from "exploration" in "fork" mode (divergent thinking to produce a creative solution) to "precision" in "lock" mode (producing syntactically correct code).

What this paper shows is that their simple technique (SSD) can improve the ranking of optimal tokens in both lock and fork positions, meaning the model is more likely to explore when it should be exploring, and more likely to be precise when it needs to be.

I love that we're still learning the emergent properties of LLMs!

wg0•7m ago

After TurboQuant and Gemma 4, came across the following video[0] running Gemma on local machine at 50 token/second.

That already looks like Sonnet 3x and 4 level capabilities to me where the model in question (Gemma 4) set ups whole python project with a UI and installs python libraries using uv etc.

Add this Simple Self Distillation to the picture and by 2028 I see cheaper coding model providers with much more generous usage limits in the future and power users would be mostly running their own models anyway.

Anyone using these models as "non-deterministic transpilers" from natural language to code (experienced engineers who can write code themselves) would probably not be paying to any AI providers.

[0] https://www.youtube.com/watch?v=-_hC-C_Drcw

smallerize•5m ago

I don't suppose they published the improved models?

How good is this DRAM cycle?

France, Russia and China block UN vote on Iran war

Show HN: A Vim plugin to search DuckDuckGo – directly from command mode (FOSS)

LLMs audit code from the same blind spot they wrote it from. Here's the fix

The Asiyah Protocol: Ethics Toward AI Under Uncertainty

Ask HN: Small LM or API?

Tesla confirms Model S and Model X production is over – only ~600 left

How did you learn Google and Hadoop File System?

Artemis Mission Tracker – Live Orion Spacecraft Position

Turing Machines and Formal Computation

Jack Dorsey says Block employees now bring prototypes, not slides, to meetings

AoBoy

The Innocence Tax: The Cost of Proving You're Human

Delx: AI therapist for AI agents, informed by Anthropic's emotion research

How to Back Up Your Digital Life (2026)

Show HN: Clusterflock: An AI orchestrator for networked hardware

Stanford CS 153 2026: Frontier Systems [video]

Show HN: I successfully failed at one-shot-ing a video codec like h.264

Show HN: Pluck – Copy any UI from any website, paste it into AI coding tools

Show HN: I made a tool that helps you find verifiably 'white space' products

DeepFocus-BP: SOTA NLP Confirmed! Fail Complete CNN. NLP SOTA LESS 66% FLOPs.

Known but clever approach to know how much your performance can be

Don't You Think Your AI Is Too Optimistic?

Living Brain Cells Enable Machine Learning Computations

YouTube playables games save data is just plain JSON and you can edit it

Dev Tool

I attacked myself with Google Spreadsheets (2012)

The CMS is dead. Long live the CMS

Tesla Is Sitting on a Record 50k Unsold EVs

Rendering arbitrary-scale emojis using the Slug algorithm