
Apple: Embarrassingly Simple Self-Distillation Improves Code Generation

https://arxiv.org/abs/2604.01193
83•Anon84•1h ago

Comments

jofzar•34m ago
> simple self-distillation (SSD):

Sorry Apple, SSD is already taken; you can't use that acronym.

ape4•30m ago
ATT = All TLAs are Taken
love2read•28m ago
You're right, I offer these alternatives:

Consistency Preservation Update (CPU)

Guided Probability Update (GPU)

History-aware Distillation Driving (HDD)

Probability Smoothing Update (PSU)

dist-epoch•30m ago
A heuristic I've developed lately: if more than half of the author names on an AI paper are Chinese, it's worth reading. It works as a filter too: you don't lose much by skipping papers with mostly non-Chinese-sounding author names.

100 years ago most scientific papers were written in German. I wonder when the switch to Chinese will happen.

https://en.wikipedia.org/wiki/Languages_of_science

0x3f•28m ago
That's... almost every AI paper.
amelius•20m ago
So

"Made in China, designed by Apple in California"

should be:

"Made in China, designed by Chinese people in California"?

ptidhomme•18m ago
I used to have the opposite rule in my signal processing field: the more Chinese names, the less innovation there was.

It seemed like they had to churn out papers, and any small adaptation of existing research triggered a new publication.

But it may have changed now.

avaer•13m ago
I definitely pay more attention to papers affiliated with Chinese companies; the economics seem to be more conducive to doing good academic work and publishing it. I would say the same for companies like Apple (where TFA came from).

But filtering based on authors' names sounds pretty darn racist.

0x3f•29m ago
Haven't read the paper yet, but it is interesting how seemingly simple many breakthroughs in ML are. Even transformers are like that. Maybe it's hindsight bias.

I suppose we just don't have a deeper underlying theory to lean on and help us 'design' anything.

khalic•27m ago
Incredible; this will translate to better coding models in the near future.

We really need to develop better tools for understanding what's happening inside these NNs. Working with high-dimensional spaces is not something we're good at, so we're basically throwing stuff at the wall and seeing what sticks.

politelemon•22m ago
It's cringeworthy to see that the original paper itself is editorialised.

Title should be: Simple Self-Distillation Improves Code Generation

StevenWaterman•15m ago
"Embarrassingly" has a history as a technically meaningful word roughly equivalent to "maximally", see "Embarrassingly parallel"

https://en.wikipedia.org/wiki/Embarrassingly_parallel

Aurornis•11m ago
The phrase embarrassingly parallel has a history in computer science.

Many computer science paper titles allude to past titles in other CS papers.

Calling it “cringeworthy” is unnecessarily mean. There is context and history you don’t understand.

ape4•20m ago
Shouldn't a scientific paper be using metric units (like 30T) rather than 30B?
roger_•20m ago
Skimmed this, but I don't have an intuitive understanding of why this works or how temperature and truncation factor in.
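
A minimal sketch of what those two knobs do in ordinary decoding, assuming standard temperature scaling and nucleus (top-p) truncation; the function name and the default values below are illustrative, not from the paper:

    import numpy as np

    def sample_with_temperature_and_top_p(logits, temperature=0.8, top_p=0.9):
        """Temperature rescales the logits (lower = sharper distribution);
        top-p keeps only the smallest set of tokens whose cumulative
        probability reaches p, cutting off the low-probability tail."""
        scaled = np.asarray(logits, dtype=float) / temperature
        probs = np.exp(scaled - np.max(scaled))
        probs /= probs.sum()

        # Nucleus truncation: sort descending, keep tokens up to mass top_p.
        order = np.argsort(probs)[::-1]
        cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
        keep = order[:cutoff]

        truncated = np.zeros_like(probs)
        truncated[keep] = probs[keep]
        truncated /= truncated.sum()
        return np.random.choice(len(truncated), p=truncated)
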
bensyverson•16m ago
Really fascinating how this works; it's basically context-aware decoding. From the paper:

> Code interleaves fork positions, where several continuations are genuinely plausible and may correspond to different solution approaches, with lock positions, where syntax and semantics leave little ambiguity but a low-probability distractor tail still remains… The best global decoding setting is therefore necessarily a compromise; we call this tension the precision-exploration conflict.

In other words, just like us, the model needs to shift from "exploration" in "fork" mode (divergent thinking to produce a creative solution) to "precision" in "lock" mode (producing syntactically correct code).

What this paper shows is that their simple technique (SSD) can improve the ranking of optimal tokens in both lock and fork positions, meaning the model is more likely to explore when it should be exploring, and more likely to be precise when it needs to be.

I love that we're still learning the emergent properties of LLMs!
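
The fork/lock framing suggests a simple toy illustration: measure the entropy of the next-token distribution and sharpen sampling only at low-entropy ("lock") positions, while sampling normally at high-entropy ("fork") positions. To be clear, this is a sketch of the intuition only, not the paper's SSD procedure, and the entropy threshold here is made up:

    import numpy as np

    def softmax(x):
        z = np.exp(x - np.max(x))
        return z / z.sum()

    def adaptive_decode_step(logits, fork_threshold=2.0):
        """Toy fork/lock decoding: at high-entropy positions (several
        plausible continuations) sample at temperature 1.0; at
        low-entropy positions sharpen the distribution to suppress
        the low-probability distractor tail."""
        logits = np.asarray(logits, dtype=float)
        p = softmax(logits)
        entropy = -np.sum(p * np.log(p + 1e-12))  # in nats
        temperature = 1.0 if entropy > fork_threshold else 0.3
        probs = softmax(logits / temperature)
        return np.random.choice(len(probs), p=probs)
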

wg0•7m ago
After TurboQuant and Gemma 4, I came across the following video [0] of Gemma running on a local machine at 50 tokens/second.

That already looks like Sonnet 3.x and 4 level capabilities to me: the model in question (Gemma 4) sets up a whole Python project with a UI and installs Python libraries using uv, etc.

Add this simple self-distillation to the picture, and by 2028 I see cheaper coding-model providers with much more generous usage limits, with power users mostly running their own models anyway.

Anyone using these models as "non-deterministic transpilers" from natural language to code (experienced engineers who can write code themselves) would probably not be paying any AI provider.

[0] https://www.youtube.com/watch?v=-_hC-C_Drcw

smallerize•5m ago
I don't suppose they published the improved models?

How good is this DRAM cycle?

https://xcancel.com/lfg_cap/status/2040113414555472139
1•ironyman•8m ago•0 comments

France, Russia and China block UN vote on Iran war

https://newsukraine.rbc.ua/news/france-russia-and-china-block-un-vote-on-1775166736.html
1•vrganj•9m ago•0 comments

Show HN: A Vim plugin to search DuckDuckGo – directly from command mode (FOSS)

https://github.com/digitalby/ddg-vim
2•vashchylau•10m ago•0 comments

LLMs audit code from the same blind spot they wrote it from. Here's the fix

1•brodeurmartin•10m ago•0 comments

The Asiyah Protocol: Ethics Toward AI Under Uncertainty

https://github.com/thansz137/asiyah-protocol
1•thansz•13m ago•1 comments

Ask HN: Small LM or API?

1•ostefani•17m ago•0 comments

Tesla confirms Model S and Model X production is over – only ~600 left

https://electrek.co/2026/04/01/tesla-model-s-x-production-over-only-inventory-left/
2•taubek•17m ago•0 comments

How did you learn Google and Hadoop File System?

1•shivajikobardan•20m ago•0 comments

Artemis Mission Tracker – Live Orion Spacecraft Position

https://issinfo.net/artemis
1•mpweiher•20m ago•0 comments

Turing Machines and Formal Computation

https://max-amb.github.io/blog/an_introduction_to_turing_machines_and_computation/
1•max-amb•20m ago•0 comments

Jack Dorsey says Block employees now bring prototypes, not slides, to meetings

https://www.businessinsider.com/block-ceo-jack-dorsey-bring-prototypes-not-slide-decks-meetings-2...
3•taubek•21m ago•0 comments

AoBoy

1•aoboy•23m ago•0 comments

The Innocence Tax: The Cost of Proving You're Human

https://www.wanderingwonderingstar.com/p/undertow-004-the-innocence-tax
3•jlzsignal•28m ago•0 comments

Delx: AI therapist for AI agents, informed by Anthropic's emotion research

https://delx.ai
2•davidmosiah•32m ago•0 comments

How to Back Up Your Digital Life (2026)

https://www.wired.com/story/how-to-back-up-your-digital-life/
4•swq115•34m ago•0 comments

Show HN: Clusterflock: An AI orchestrator for networked hardware

2•notum•34m ago•0 comments

Stanford CS 153 2026: Frontier Systems [video]

https://www.youtube.com/watch?v=mZqh7emiz9Q
2•walterbell•34m ago•3 comments

Show HN: I successfully failed at one-shot-ing a video codec like h.264

https://github.com/DheerG/libsinter
1•bushido•38m ago•0 comments

Show HN: Pluck – Copy any UI from any website, paste it into AI coding tools

https://www.pluck.so/
1•bring-shrubbery•38m ago•1 comments

Show HN: I made a tool that helps you find verifiably 'white space' products

https://www.nichescout.pro/
1•MoOk-OSC•40m ago•1 comments

DeepFocus-BP: SOTA NLP Confirmed! Fail Complete CNN. NLP SOTA LESS 66% FLOPs.

https://zenodo.org/records/19415887
1•sunbagger•43m ago•0 comments

Known but clever approach to know how much your performance can be

https://www.youtube.com/watch?v=rTH7fHrEskk
1•manishfoodtechs•46m ago•0 comments

Don't You Think Your AI Is Too Optimistic?

https://markhuang.ai/blog/dont-you-think-your-ai-is-too-optimistic
1•zh_code•46m ago•0 comments

Living Brain Cells Enable Machine Learning Computations

https://www.tohoku.ac.jp/en/press/living_brain_cells_enable_machine_learning_computations.html
1•giuliomagnifico•49m ago•0 comments

YouTube playables games save data is just plain JSON and you can edit it

https://www.youtube.com/playables/Ugkxto-OwJZo4rm8Xl2Nj3K403nHlYThf-sr
1•birdculture•49m ago•0 comments

Dev Tool

https://www.adgenai.ca/
1•kissablepicasso•50m ago•0 comments

I attacked myself with Google Spreadsheets (2012)

https://www.behind-the-enemy-lines.com/2012/04/google-attack-how-i-self-attacked.html
1•downbad_•50m ago•1 comments

The CMS is dead. Long live the CMS

https://next.jazzsequence.com/posts/the-cms-is-dead-long-live-the-cms
5•taubek•53m ago•0 comments

Tesla Is Sitting on a Record 50k Unsold EVs

https://insideevs.com/news/791999/tesla-unsold-inventory-record-q1-2026/
29•vrganj•58m ago•27 comments

Rendering arbitrary-scale emojis using the Slug algorithm

https://leduyquang753.name.vn/blog/2026/4/4/rendering-arbitrary-scale-emojis-using-the-slug-algor...
1•leduyquang753•59m ago•0 comments