From Memorization to Reasoning in the Spectrum of Loss Curvature

https://arxiv.org/abs/2510.24256
26•andy12_•3h ago

Comments

andy12_•3h ago
Very concise summary of the procedure described in this paper:

1. Run the model once across a dataset to estimate loss curvature per MLP weight matrix via K-FAC (activation/gradient covariances).

2. Decompose each weight matrix into curvature-ordered components; low-curvature directions correspond most to verbatim memorization, higher curvature to shared/general mechanisms.

3. Edit by dropping the low-curvature subspace, keeping only the top directions.
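The three steps above can be sketched in numpy. This is a toy illustration under the K-FAC Kronecker assumption (curvature of a weight matrix approximated as A ⊗ G); the sizes, random data, and cutoff k are invented, and the paper's actual procedure runs per MLP matrix over a real dataset:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 8, 6
W = rng.normal(size=(d_out, d_in))

# K-FAC approximates the curvature of W as a Kronecker product A (x) G,
# where A is the input-activation covariance and G the output-gradient
# covariance, both estimated over a dataset (step 1).
acts = rng.normal(size=(100, d_in))
grads = rng.normal(size=(100, d_out))
A = acts.T @ acts / 100          # (d_in, d_in)
G = grads.T @ grads / 100        # (d_out, d_out)

# Step 2: eigendecompose both factors; the curvature of component (i, j)
# is the eigenvalue product lam_G[i] * lam_A[j].
lam_A, U_A = np.linalg.eigh(A)
lam_G, U_G = np.linalg.eigh(G)

# Express W in the Kronecker eigenbasis.
W_tilde = U_G.T @ W @ U_A
curv = np.outer(lam_G, lam_A)    # curvature per component, shape (d_out, d_in)

# Step 3: keep only the k highest-curvature components, zero the rest,
# and map back. The zeroed low-curvature bulk is where (per the paper)
# verbatim memorization lives.
k = 10
mask = curv >= np.sort(curv.ravel())[-k]
W_edited = U_G @ (W_tilde * mask) @ U_A.T
```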

vessenes•2h ago
Thank you for this huge time saver.

Now, about the paper: that's super interesting. I imagine the dream here is to distill down into a "reasoning" core, or maybe to reclaim space for more generalization. Lots of interesting use cases.

getnormality•2h ago
Thank you!

I think you may have accidentally switched low and high in #2, no? The abstract speaks of high curvature as associated with memorization:

> curvature for memorized training points is much sharper than non memorized

radarsat1•1h ago
This sounds more correct to me. I've read somewhere that better generalization is usually associated with wider, smoother minima, and that this is one reason regularization matters: it has a smoothing effect on the loss landscape.
getnormality•1h ago
Yes. This is also not hard to see intuitively from scratch.

Say you have a smooth but highly flexible model y = f(x) and some data points you are fitting with a machine learning algorithm. For whatever reason, the algorithm decides it wants to reduce training error by interpolating one specific point (x0, y0) without hurting training error on nearby points. The direct, guaranteed way to do this would be to force f(x0) = y0 exactly by adding a Dirac delta at x0, leaving the rest of f as-is. But a differentiable model can't do that, since it would create a discontinuity.

The next best thing such a model can actually do is replace the Dirac delta with a smooth but very narrow bump (e.g. a Gaussian). And this narrow bump will inevitably have extremely high curvature at x0: the bump is flat at its peak, yet it has to merge back into the neighborhood around x0 within a very short distance.

Think of driving: if you have to change lanes in a very short distance, you're going to have to steer hard. Steering is curvature.
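The bump argument can be checked numerically. A small sketch assuming a Gaussian bump of height a and width sigma (all constants here are invented for illustration): the peak curvature is -a/sigma², so halving the width quadruples the curvature.

```python
import numpy as np

# A smooth "correction" bump g(x) = a * exp(-(x - x0)^2 / (2 * sigma^2))
# added to a model to interpolate a single point (x0, y0).
def bump_curvature_at_peak(a, sigma):
    # analytic second derivative of the Gaussian at x = x0
    return -a / sigma**2

# Check the analytic value against a central finite-difference estimate.
def fd_curvature(a, sigma, h=1e-4):
    g = lambda x: a * np.exp(-x**2 / (2 * sigma**2))
    return (g(h) - 2 * g(0) + g(-h)) / h**2

for sigma in (1.0, 0.1, 0.01):
    print(sigma, bump_curvature_at_peak(1.0, sigma), fd_curvature(1.0, sigma))
```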

woadwarrior01•1h ago
That's very reminiscent of the idea behind the SAM (Sharpness Aware Minimization) family of optimizers.
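For reference, the core SAM update is only a few lines: ascend to the worst-case point within a rho-ball around the weights, then apply the gradient taken there, which biases optimization toward flat minima. A minimal numpy sketch on a toy quadratic loss (the matrix H, learning rate, and rho are invented for illustration; real SAM wraps a base optimizer like SGD):

```python
import numpy as np

H = np.diag([10.0, 0.1])            # one sharp and one flat direction
loss = lambda w: 0.5 * w @ H @ w
grad = lambda w: H @ w

def sam_step(w, lr=0.05, rho=0.05):
    g = grad(w)
    # ascend to the (approximately) sharpest nearby point within the rho-ball
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # descend using the gradient evaluated at the perturbed weights
    return w - lr * grad(w + eps)

w = np.array([1.0, 1.0])
for _ in range(100):
    w = sam_step(w)
```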
andy12_•56m ago
Actually, no! Look at this passage in the paper:

> In extending from studying per-example to bulk memorization, we propose a novel inversion of the previous interpretation of loss curvature: while individual memorized points are associated with high curvature, the direction of curvature varies across examples, meaning that, averaged across multiple examples, memorization directions are actually flatter than generalizing directions, which maintain a consistent moderate curvature across points
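The quoted inversion is easy to reproduce in a toy model: give each example a sharp rank-1 curvature term in a fresh random direction (memorization) plus a moderate term in one fixed direction (generalization), then average. All constants below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 16, 500

shared = np.zeros(d)
shared[0] = 1.0                     # one consistent "generalizing" direction

H_avg = np.zeros((d, d))
for _ in range(n):
    # memorization: very sharp (50) but in a new random direction per example
    v = rng.normal(size=d)
    v /= np.linalg.norm(v)
    H_avg += 50.0 * np.outer(v, v)
    # generalization: only moderate (5) but always the same direction
    H_avg += 5.0 * np.outer(shared, shared)
H_avg /= n

eigvals, eigvecs = np.linalg.eigh(H_avg)
top = eigvecs[:, -1]                # eigh sorts ascending; last is sharpest
```

Per example the memorized term is ten times sharper (50 vs. 5), but the random directions wash out to roughly 50/d per axis in the average, so the top eigenvector of the averaged curvature aligns with the shared generalizing direction.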

getnormality•19m ago
Ah! I figured I should be circumspect in the question, since I hadn't read the paper in full and there could be some crazy reason it's actually the opposite.
kingstnap•1h ago
A very similar idea is presented in the first five minutes of this recent talk, though more from observing a kink in loss curves.

https://youtu.be/UyK3DgWY7yw?si=NN3f9Erik8o_Nfbs

Leaving Meta and PyTorch

https://soumith.ch/blog/2025-11-06-leaving-meta-and-pytorch.md.html
505•saikatsg•9h ago•113 comments

A Fond Farewell

https://www.farmersalmanac.com/fond-farewell-from-farmers-almanac
399•erhuve•12h ago•128 comments

Meta projected 10% of 2024 revenue came from scams

https://sherwood.news/tech/meta-projected-10-of-2024-revenue-came-from-scams-and-banned-goods-reu...
270•donohoe•3h ago•183 comments

OpenMW 0.50.0 Released – open-source Morrowind reimplementation

https://openmw.org/2025/openmw-0-50-0-released/
95•agluszak•2h ago•27 comments

PyTorch Helion

https://pytorch.org/blog/helion/
63•jarbus•5d ago•12 comments

From Memorization to Reasoning in the Spectrum of Loss Curvature

https://arxiv.org/abs/2510.24256
26•andy12_•3h ago•9 comments

You should write an agent

https://fly.io/blog/everyone-write-an-agent/
856•tabletcorry•19h ago•340 comments

Comparison Traits – Understanding Equality and Ordering in Rust

https://itsfoxstudio.substack.com/p/comparison-traits-understanding-equality
11•rpunkfu•5d ago•1 comment

Two billion email addresses were exposed

https://www.troyhunt.com/2-billion-email-addresses-were-exposed-and-we-indexed-them-all-in-have-i...
543•esnard•19h ago•379 comments

1973 Implementation of Wordle was Published by DEC (2022)

https://troypress.com/1973-implementation-of-wordle-was-published-by-dec/
29•msephton•6d ago•14 comments

Text case changes the size of QR codes

https://www.johndcook.com/blog/2025/10/31/smaller-qr-codes/
101•ibobev•5d ago•31 comments

We chose OCaml to write Stategraph

https://stategraph.dev/blog/why-we-chose-ocaml
71•lawnchair•2h ago•71 comments

Claude Is Down

https://status.claude.com/incidents/tgtw1sqs9ths
17•agrocrag•1h ago•11 comments

Sweep (YC S23) is hiring to build autocomplete for JetBrains

https://www.ycombinator.com/companies/sweep/jobs/8dUn406-founding-engineer-intern
1•williamzeng0•3h ago

Is Software the UFOlogy of Engineering Disciplines?

https://codemanship.wordpress.com/2025/11/07/is-software-the-ufology-of-engineering-disciplines/
63•flail•2h ago•97 comments

Show HN: I scraped 3B Goodreads reviews to train a better recommendation model

https://book.sv
491•costco•1d ago•191 comments

Game design is simple

https://www.raphkoster.com/2025/11/03/game-design-is-simple-actually/
408•vrnvu•17h ago•128 comments

The Silent Scientist: When Software Research Fails to Reach Its Audience

https://cacm.acm.org/opinion/the-silent-scientist-when-software-research-fails-to-reach-its-audie...
54•mschnell•6d ago•25 comments

I Love OCaml

https://mccd.space/posts/ocaml-the-worlds-best/
4•art-w•1h ago•0 comments

I'm Making a Small RPG and I Need Feedback Regarding Performance

https://jslegenddev.substack.com/p/im-making-a-small-rpg-and-i-need
32•ibobev•2h ago•22 comments

Revisiting Interface Segregation in Go

https://rednafi.com/go/interface-segregation/
8•ingve•5d ago•2 comments

Analysis indicates that the universe’s expansion is not accelerating

https://ras.ac.uk/news-and-press/research-highlights/universes-expansion-now-slowing-not-speeding
213•chrka•19h ago•171 comments

From web developer to database developer in 10 years

https://notes.eatonphil.com/2025-02-15-from-web-developer-to-database-developer-in-10-years.html
125•pmbanugo•3d ago•45 comments

OpenTelemetry: Escape Hatch from the Observability Cartel

https://oneuptime.com/blog/post/2025-11-03-opentelemetry-escape-from-observability-cartel/view
61•ndhandala•3d ago•47 comments

Lessons from Growing a Piracy Streaming Site

https://prison.josh.mn/lessons
207•zuhayeer•8h ago•127 comments

Machine Scheduler in LLVM – Part II

https://myhsu.xyz/llvm-machine-scheduler-2/
25•mshockwave•5d ago•0 comments

Cryptography 101 with Alfred Menezes

https://cryptography101.ca
84•nmadden•4d ago•11 comments

Kimi K2 Thinking, a SOTA open-source trillion-parameter reasoning model

https://moonshotai.github.io/Kimi-K2/thinking.html
826•nekofneko•1d ago•364 comments

JermCAD: Browser-Based CAD Software

https://github.com/jeremyaboyd/jerm-cad
46•azhenley•11h ago•27 comments

A.I. and Social Media Contribute to 'Brain Rot'

https://www.nytimes.com/2025/11/06/technology/personaltech/ai-social-media-brain-rot.html
2•pretext•19m ago•0 comments