frontpage.

Simple self-distillation improves code generation

https://arxiv.org/abs/2604.01193
185•Anon84•3h ago•42 comments

Tell HN: Anthropic no longer allowing Claude Code subscriptions to use OpenClaw

833•firloop•14h ago•649 comments

Some Unusual Trees

https://thoughts.wyounas.com/p/some-unusual-trees
86•simplegeek•4h ago•26 comments

Artemis II crew take “spectacular” image of Earth

https://www.bbc.com/news/articles/ce8jzr423p9o
852•andsoitis•18h ago•290 comments

The CMS is dead. Long live the CMS

https://next.jazzsequence.com/posts/the-cms-is-dead-long-live-the-cms
37•taubek•2h ago•17 comments

The Cathedral, the Bazaar, and the Winchester Mystery House

https://www.dbreunig.com/2026/03/26/winchester-mystery-house.html
28•dbreunig•2d ago•9 comments

iNaturalist

https://www.inaturalist.org/
465•bookofjoe•20h ago•111 comments

OpenClaw privilege escalation vulnerability

https://nvd.nist.gov/vuln/detail/CVE-2026-33579
436•kykeonaut•21h ago•210 comments

Mbodi AI (YC P25) Is Hiring

https://www.ycombinator.com/companies/mbodi-ai/jobs/mf9L3sy-senior-robotics-engineer-systems-cont...
1•chitianhao•1h ago

The most-disliked people in the publishing industry

https://www.woman-of-letters.com/p/the-most-disliked-people-in-the-publishing
28•Caiero•3d ago•6 comments

Herbie: Automatically improve imprecise floating point formulas

https://herbie.uwplse.org/doc/latest/tutorial.html
150•summarity•4d ago•23 comments

Claude Code Found a Linux Vulnerability Hidden for 23 Years

https://mtlynch.io/claude-code-found-linux-vulnerability/
106•eichin•13h ago•71 comments

Run Linux containers on Android, no root required

https://github.com/ExTV/Podroid
160•politelemon•15h ago•55 comments

Improving my focus by giving up my big monitor

https://ounapuu.ee/posts/2026/04/01/focus/
124•Fudgel•3d ago•143 comments

We replaced RAG with a virtual filesystem for our AI documentation assistant

https://www.mintlify.com/blog/how-we-built-a-virtual-filesystem-for-our-assistant
344•denssumesh•1d ago•127 comments

Why Inventing Color TV Was So Difficult [video]

https://www.youtube.com/watch?v=hyjCmIbRRvs
5•DamnInteresting•3d ago•0 comments

The Technocracy Movement of the 1930s

https://donotresearch.substack.com/p/welcome-to-the-technocracy
127•lazydogbrownfox•1d ago•100 comments

What changes when you turn a Linux box into a router

https://patrickmccanna.net/7-configuration-changes-that-turn-a-multi-homed-host-into-a-switch-rou...
187•0o_MrPatrick_o0•4d ago•47 comments

Go on Embedded Systems and WebAssembly

https://tinygo.org/
176•uticus•20h ago•24 comments

Jack Dorsey says Block employees now bring prototypes, not slides, to meetings

https://www.businessinsider.com/block-ceo-jack-dorsey-bring-prototypes-not-slide-decks-meetings-2...
21•taubek•1h ago•5 comments

Build your own Dial-up ISP with a Raspberry Pi

https://www.jeffgeerling.com/blog/2026/build-your-own-dial-up-isp-with-a-raspberry-pi/
175•arjunbajaj•22h ago•32 comments

F-15E jet shot down over Iran

https://www.theguardian.com/world/2026/apr/03/us-fighter-jet-confirmed-shot-down-over-iran
527•tjwds•21h ago•1174 comments

Big-Endian Testing with QEMU

https://www.hanshq.net/big-endian-qemu.html
101•jandeboevrie•1d ago•113 comments

Delve removed from Y Combinator

https://www.ycombinator.com/companies/delve
384•carabiner•12h ago•236 comments

How to make a sliding, self-locking, and predator-proof chicken coop door (2020)

https://www.backyardchickens.com/articles/how-to-make-a-sliding-self-locking-and-predator-proof-c...
115•uticus•18h ago•50 comments

Fake Fans

https://www.wordsfromeliza.com/p/fake-fans
128•performative•15h ago•34 comments

The house is a work of art: Frank Lloyd Wright

https://aeon.co/essays/frank-lloyd-wright-as-a-mirror-of-the-american-condition
96•midnightfish•15h ago•40 comments

Why are we still using Markdown?

https://bgslabs.org/blog/why-are-we-using-markdown/
157•veqq•19h ago•227 comments

Sequential Optimal Packing for PCB Placement

https://blog.autorouting.com/p/sequential-optimal-packing-for-pcb
15•seveibar•2d ago•5 comments

The FAA’s flight restriction for drones is an attempt to criminalize filming ICE

https://www.eff.org/deeplinks/2026/04/faas-temporary-flight-restriction-drones-blatant-attempt-cr...
460•detaro•13h ago•145 comments

Simple self-distillation improves code generation

https://arxiv.org/abs/2604.01193
185•Anon84•3h ago

Comments

jofzar•2h ago
> simple self-distillation (SSD):

Sorry, Apple, SSD is already taken; you can't use that acronym.

ape4•1h ago
ATT=All TLAs are Taken
love2read•1h ago
You're right, I offer these alternatives:

Consistency Preservation Update (CPU)

Guided Probability Update (GPU)

History-aware Distillation Driving (HDD)

Probability Smoothing Update (PSU)

drittich•1h ago
I used to invent TLAs on the spot for fun, and when someone asked what one was, I would respond, "It's a PUA," eventually revealing that it meant "previously unknown acronym". It was even more annoying than it sounds.
0x3f•1h ago
Haven't read the paper yet, but it is interesting how seemingly simple many breakthroughs in ML are. Even transformers are like that. Maybe it's hindsight bias.

I suppose we just don't have a deeper underlying theory to lean on and help us 'design' anything.

christophilus•1h ago
A lot of discoveries are like that. In fact, simplicity is often the hallmark of correctness, and complexity is often a sign that our understanding is incomplete and we’re still stumbling towards the right model. Not always, but often. It’s been a good rule of thumb in my programming career.
heeton•1h ago
100%. I have a guiding approach when solving problems: keep reframing and exploring until the solution becomes obvious.

I often find, if I've got a complicated solution, it’s because I haven’t fully examined the problem.

khalic•1h ago
Incredible; this will translate to better coding models in the near future.

We really need to develop better tools to understand what's happening inside these NNs. Working with high-D spaces is not something we're good at, and we're basically throwing stuff at it and seeing if it sticks.

politelemon•1h ago
It's cringeworthy to see that the original paper itself is editorialised.

Title should be: Simple Self-Distillation Improves Code Generation

StevenWaterman•1h ago
"Embarrassingly" has a history as a technically meaningful word, roughly equivalent to "maximally"; see "embarrassingly parallel":

https://en.wikipedia.org/wiki/Embarrassingly_parallel

Aurornis•1h ago
The phrase embarrassingly parallel has a history in computer science.

Many computer science paper titles allude to past titles in other CS papers.

Calling it “cringe worthy” is unnecessarily mean. There is context and history you don’t understand.

gottheUIblues•1h ago
"Embarrassingly" considered harmful?
cbm-vic-20•53m ago
"Embarrassingly" considered harmful is all you need.
ape4•1h ago
Shouldn't a scientific paper be using metric units (like 30T) rather than 30B?

There are two distinct billions. https://en.wikipedia.org/wiki/Billion

mikkupikku•46m ago
Objective one should be to communicate effectively, not confuse everybody.
roger_•1h ago
Skimmed this but don't have an intuitive understanding of why this works and how temperature and truncation factor in.
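For anyone else unsure how temperature and truncation enter the picture: both are sampling-time transforms applied to the model's next-token logits before drawing a token. A toy numpy sketch (illustrative values only, not from the paper; `sample_probs` is a made-up name):

```python
import numpy as np

def sample_probs(logits, temperature=1.0, top_p=1.0):
    """Temperature-scale logits, softmax, then truncate to the top-p nucleus."""
    z = np.asarray(logits, dtype=float) / temperature
    p = np.exp(z - z.max())          # numerically stable softmax
    p /= p.sum()
    # Nucleus (top-p) truncation: keep the smallest set of tokens whose
    # cumulative probability reaches top_p, zero out the low-probability tail.
    order = np.argsort(p)[::-1]
    cum = np.cumsum(p[order])
    keep = order[: np.searchsorted(cum, top_p) + 1]
    q = np.zeros_like(p)
    q[keep] = p[keep]
    return q / q.sum()               # renormalize over the kept tokens

# Low temperature + truncation -> sharper, "precise" distribution with no tail
print(sample_probs([2.0, 1.0, 0.1, -3.0], temperature=0.5, top_p=0.9))
```

Lower temperature sharpens the distribution; top-p cuts the distractor tail entirely. The paper's point is that the best single global setting for these two knobs is a compromise across positions.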
bensyverson•1h ago
Really fascinating how this works; it's basically context-aware decoding. From the paper:

> Code interleaves fork positions, where several continuations are genuinely plausible and may correspond to different solution approaches, with lock positions, where syntax and semantics leave little ambiguity but a low-probability distractor tail still remains… The best global decoding setting is therefore necessarily a compromise; we call this tension the precision-exploration conflict.

In other words, just like us, the model needs to shift from "exploration" in "fork" mode (divergent thinking to produce a creative solution) to "precision" in "lock" mode (producing syntactically correct code).

What this paper shows is that their simple technique (SSD) can improve the ranking of optimal tokens in both lock and fork positions, meaning the model is more likely to explore when it should be exploring, and more likely to be precise when it needs to be.

I love that we're still learning the emergent properties of LLMs!
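The fork/lock distinction can be made concrete with the entropy of the next-token distribution. A tiny sketch with made-up, illustrative probabilities (not numbers from the paper):

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a probability distribution, in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# "Fork" position: several continuations genuinely plausible -> high entropy
fork = [0.35, 0.30, 0.20, 0.10, 0.05]
# "Lock" position: syntax leaves little ambiguity, but a low-probability
# distractor tail remains -> low entropy, nonzero tail
lock = [0.96, 0.01, 0.01, 0.01, 0.01]

print(f"fork entropy: {entropy(fork):.2f} bits")
print(f"lock entropy: {entropy(lock):.2f} bits")
```

A single temperature has to serve both kinds of position at once, which is exactly the precision-exploration conflict the paper names.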

stingraycharles•1h ago
Seems like this is true for not just code but for all content being generated? Albeit for code it’s more well-defined, but the fork / lock mechanism works for a lot more problem domains.
bensyverson•1h ago
That would seem intuitively true; it certainly applies to written language, where a clause could go off in another direction, but at other positions the correct grammar/syntax is unambiguous.
bryanrasmussen•1h ago
thinking - well if we think of lock as happening in a narrative, then I think we can see there can be points where "everything you know is wrong" which essentially allows you to go back into a sort of fork mode and work towards another lock.

Completely artistic creation, creating something that does not exist and that cannot produce things out of itself, means that locking can be more diffuse, not as settled.

stingraycharles•1h ago
I think this seems similar to what Anthropic had been doing since the latest few Opus releases, which is interleaved thinking; CoT reasoning in the middle of a message. But they operate at different layers.
michaelbuckbee•42m ago
I don't really understand the internal mechanics of this, but my first thought was: why not combine this with a linter/tests, so that it produces all the forks and keeps only the syntactically correct ones?
user_7832•42m ago
> I love that we're still learning the emergent properties of LLMs!

TBH, this is (very much my opinion, btw) the least surprising thing. LLMs (and especially their emergent properties) are still black boxes. Humans have been studying the human brain for millennia, and we are barely better at predicting how humans work (or, e.g., to what extent free will is a thing). Hell, the emergent properties of traffic were not understood or given proper attention, even though a researcher, as a driver, knows what a driver does. Right now, on the front page, is this post:

> 14. Claude Code Found a Linux Vulnerability Hidden for 23 Years (mtlynch.io)

So it's pretty cool we're learning new things about LLMs, sure, but it's barely surprising that we're still learning it.

(Sorry, mini grumpy man rant over. I just wish we knew more of the world but I know that's not realistic.)

bensyverson•3m ago
Learning about the emergent properties of these black boxes is not surprising, but it's also not an everyday occurrence. I think every new insight is worth celebrating.
khalic•32m ago
Another example of the mindf@#$ these systems are: I was doing some fine-tuning on a small model, taking data fields and making a sentence out of them. I was running into mode collapse (basically, when the AI simplifies too much and always outputs the same thing).

I got unstuck by randomizing the field order for each row at training time?!? Now I'm thinking I should do the same at inference time...
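The fix described above amounts to a per-row shuffle when serializing fields into training text. A minimal sketch (the field names and `serialize` helper are made up for illustration):

```python
import random

def serialize(row, rng=random):
    """Turn a dict of fields into training text, shuffling field order
    per row so the model can't latch onto one fixed template."""
    fields = list(row.items())
    rng.shuffle(fields)
    return ", ".join(f"{k}: {v}" for k, v in fields)

# Hypothetical example row; each call may yield a different ordering
print(serialize({"name": "Ada", "city": "London", "job": "engineer"}))
```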

DavidPiper•30m ago
Sounds just like John Cleese's "Open Mode" and "Closed Mode" - https://www.youtube.com/watch?v=Pb5oIIPO62g
TacticalCoder•5m ago
> What this paper shows is that their simple technique (SSD)

"Simple Self-Distillation". We already had an acronym for solid-state drive. Don't know about the technique, but the naming sure sounds... simple?

wg0•1h ago
After TurboQuant and Gemma 4, I came across the following video[0] of Gemma running on a local machine at 50 tokens/second.

That already looks like Sonnet 3.x- and 4-level capability to me, where the model in question (Gemma 4) sets up a whole Python project with a UI and installs Python libraries using uv, etc.

Add this simple self-distillation to the picture, and by 2028 I see cheaper coding-model providers with much more generous usage limits, with power users mostly running their own models anyway.

Anyone using these models as "non-deterministic transpilers" from natural language to code (experienced engineers who can write code themselves) would probably not be paying any AI provider.

[0] https://www.youtube.com/watch?v=-_hC-C_Drcw

spiderfarmer•56m ago
I always wonder how much smaller and faster models could be if they were only trained on the latest versions of the languages I use; for me that is PHP, SQL, HTML, JS, CSS, Dutch, and English, plus tool use for my OS of choice (macOS).

Right now it feels like hammering a house onto a nail instead of the other way around.

BarryMilo•41m ago
I seem to remember that's one of the first things they tried, but the general models tended to win out. Turns out there's more to learn from all code/discussions than from just JS.
Someone1234•7m ago
Wouldn't that mean they're bad at migration tasks? I feel like for most languages, going from [old] to [current] is a fairly to very common usage scenario.
smallerize•1h ago
I don't suppose they published the improved models?
l5870uoo9y•1h ago
> Our method, simple self-distillation (SSD), is embarrassingly simple: sample solutions from the base model with specified temperature and truncation, then fine-tune on those raw, unverified samples via standard cross-entropy loss.

So you prompt the base model for an answer and then rerun the prompt with the answer from the first run?

ACCount37•1h ago
No. There's no "answer" really.

They use self-distillation to shift the output distribution of the model towards that of the same model, but running with different temperature/truncation settings in sampling.

This effectively "folds" the logit tail truncation behavior into the model itself.

Not entirely unlike a few "model controlled sampling settings" things I've seen in what it does, but different in execution.
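To make the "folding in" concrete: here is a toy numpy sketch of the SSD recipe as quoted from the abstract, with a single categorical distribution standing in for an LLM. All numbers are illustrative, and top-k stands in for whatever truncation the paper actually uses:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy "model": one categorical distribution over 5 tokens,
# parameterized directly by its logits.
logits = np.array([2.0, 1.0, 0.0, -1.0, -2.0])

# Step 1: sample from the model itself with low temperature and top-k
# truncation (standing in for the paper's temperature/truncation settings).
def truncated_sample(logits, temperature=0.7, top_k=2, n=5000):
    keep = np.argsort(logits)[::-1][:top_k]
    p = softmax(logits[keep] / temperature)
    return rng.choice(keep, size=n, p=p)

samples = truncated_sample(logits)

# Step 2: fine-tune on the raw, unverified samples with standard
# cross-entropy (plain SGD; the CE gradient for a softmax is p - onehot).
student = logits.copy()
for tok in samples:
    p = softmax(student)
    p[tok] -= 1.0          # gradient of cross-entropy w.r.t. logits
    student -= 0.1 * p

print("before:", np.round(softmax(logits), 3))
print("after: ", np.round(softmax(student), 3))
```

After training, the student's own untruncated distribution has almost no tail mass: the truncation behaviour has been absorbed into the weights, so greedy or lightly truncated sampling from the student behaves like heavily truncated sampling from the original.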

drooby•1h ago
Fascinating...

This feels eerily similar to sleep consolidation or synaptic pruning

vishnugupta•57m ago
Can someone please ELI5 this to a humble web developer? I read the abstract but couldn't understand much.
xbmcuser•12m ago
So the chances of Singularity went up.
4b11b4•2m ago
Self-consistency meets fine-tuning?