Show HN: The Hessian of tall-skinny networks is easy to invert

https://github.com/a-rahimi/hessian
12•rahimiali•1h ago
It turns out the inverse of the Hessian of a deep net is easy to apply to a vector. Doing this naively takes cubically many operations in the number of layers (so impractical), but it's possible to do this in time linear in the number of layers (so very practical)!

This is possible because the Hessian of a deep net has a matrix polynomial structure that factorizes nicely. The Hessian-inverse-product algorithm that takes advantage of this is similar to running backprop on a dual version of the deep net. It echoes an old idea of Pearlmutter's for computing Hessian-vector products.
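
For readers unfamiliar with the Pearlmutter idea mentioned above: the Hessian-vector product Hv is the directional derivative of the gradient along v, so it can be computed exactly without ever forming H. Below is a minimal sketch of that idea only, using dual numbers on a hand-written gradient. The toy function and the names `Dual`, `grad_f`, and `hvp` are assumptions made for illustration; this is not code from the linked repo, and it does not implement the Hessian-inverse algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """Number of the form val + eps * e with e**2 == 0 (forward-mode AD)."""
    val: float
    eps: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.eps + other.eps)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (a + a'e)(b + b'e) = ab + (a b' + a' b)e
        return Dual(self.val * other.val,
                    self.val * other.eps + self.eps * other.val)

    __rmul__ = __mul__


def _lift(x):
    """Promote plain numbers to Dual so mixed arithmetic works."""
    return x if isinstance(x, Dual) else Dual(float(x))


def grad_f(x):
    """Hand-written gradient of the toy function f(x) = x0^2 * x1 + x1^3."""
    x0, x1 = x
    return [2 * x0 * x1, x0 * x0 + 3 * x1 * x1]


def hvp(grad, x, v):
    """Return H(x) v as the derivative of grad(x + t*v) with respect to t at t = 0."""
    duals = [Dual(xi, vi) for xi, vi in zip(x, v)]
    return [g.eps for g in grad(duals)]


# Hessian of f at (1, 2) is [[4, 2], [2, 12]], so H @ (3, 4) = (20, 54).
print(hvp(grad_f, [1.0, 2.0], [3.0, 4.0]))  # [20.0, 54.0]
```

The cost is a small constant multiple of one gradient evaluation, which is why Hessian-vector products are considered cheap even when H itself is far too large to store.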

Maybe this idea is useful as a preconditioner for stochastic gradient descent?

Comments

MontyCarloHall•36m ago
>If the Hessian-vector product is Hv for some fixed vector v, we're interested in solving Hx=v for x. The hope is to soon use this as a preconditioner to speed up stochastic gradient descent.

Silly question, but if you have some clever way to compute the inverse Hessian, why not go all the way and use it for Newton's method, rather than as a preconditioner for SGD?

rahimiali•22m ago
Good q. The method computes Hessian-inverse on a batch. When people say "Newton's method" they're often thinking H^{-1} g, where both the Hessian and the gradient g are on the full dataset. I thought saying "preconditioner" instead of "Newton's method" would make it clear this is solving H^{-1} g on a batch, not on the full dataset.
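
In symbols, the distinction drawn above looks roughly like this. The notation (parameters \theta_t, batch B_t) is assumed here for illustration and is not taken from the paper:

```latex
% Newton's method as usually understood: H and g over the full dataset
\theta_{t+1} = \theta_t - H^{-1} g

% The batch variant described above: H and g both computed on the batch B_t
\theta_{t+1} = \theta_t - H_{B_t}^{-1} \, g_{B_t}
```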
MontyCarloHall•21m ago
I'd call it "Stochastic Newton's Method" then. :-)
rahimiali•16m ago
fair. thanks. i'll sleep on it and update the paper if it still sounds right tomorrow.

probably my nomenclature bias is that i started this project as a way to find new preconditioners on deep nets.

hodgehog11•4m ago
Just a heads up in case you didn't know, taking the Hessian over batches is indeed referred to as Stochastic Newton, and methods of this kind have been studied for quite some time. Inverting the Hessian is often done with CG, which tends to work pretty well. The only problem is that the Hessian is often not invertible so you need a regularizer (same as here I believe). Newton methods work at scale, but no-one with the resources to try them at scale seems to be aware of them.

It's an interesting trick though, so I'd be curious to see how it compares to CG.

[1] https://arxiv.org/abs/2204.09266 [2] https://arxiv.org/abs/1601.04737 [3] https://pytorch-minimize.readthedocs.io/en/latest/api/minimi...
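
A minimal sketch of the CG approach described in this comment, assuming a damping term of the form H + damping * I as the regularizer. It needs only a function that returns Hessian-vector products. The function name, the `damping` parameter, and the singular 2x2 toy Hessian are all illustrative assumptions, not taken from the linked repo or the cited papers.

```python
def conjugate_gradient(matvec, b, damping=0.0, tol=1e-10, max_iter=100):
    """Solve (H + damping * I) x = b, given only the map v -> H v.

    Assumes H + damping * I is symmetric positive definite.
    """
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for the start x = 0
    p = list(r)                      # first search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:      # residual norm small enough: done
            break
        Hp = matvec(p)
        Ap = [hp + damping * pi for hp, pi in zip(Hp, p)]
        alpha = rs_old / sum(pi * ai for pi, ai in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Toy Hessian [[4, 2], [2, 1]] is singular (determinant 0), so damping is needed.
def toy_hvp(v):
    return [4.0 * v[0] + 2.0 * v[1], 2.0 * v[0] + 1.0 * v[1]]


x = conjugate_gradient(toy_hvp, [1.0, 1.0], damping=1.0)
print(x)  # approximately [0.0, 0.5]
```

Each CG iteration costs one Hessian-vector product, so the comparison hodgehog11 asks about comes down to how many iterations CG needs versus the cost of one exact Hessian-inverse product.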

jeffjeffbear•25m ago
I haven't looked into it in years, but would the inverse of a block bi-diagonal matrix have some semiseparable structure? Maybe that would be good to look into?
rahimiali•18m ago
just to be clear, semiseparable in this context means H = D + CC', where D is block diagonal and C is tall & skinny?

If so, it would be nice if this were the case, because you could then just use the Woodbury formula to invert H. But I don't think such a decomposition exists. I tried to exhaustively search through all the decompositions of H that involved one dummy variable (of which the above is a special case) and I couldn't find one. I ended up having to introduce two dummy variables instead.
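
For reference, this is the special case of the Woodbury identity that would apply if such a decomposition existed, written with C^\top for C' and with k as the assumed name for the number of columns of C:

```latex
% H = D + C C^\top, with D invertible and C of size n x k (k much smaller than n)
H^{-1} = D^{-1} - D^{-1} C \left( I_k + C^\top D^{-1} C \right)^{-1} C^\top D^{-1}
```

The only dense solve is with the k-by-k matrix I_k + C^\top D^{-1} C, and D^{-1} is cheap because D is block diagonal, which is why this structure would have made inversion easy.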

jeffjeffbear•11m ago
> just to be clear, semiseparable in this context means H = D + CC', where D is block diagonal and C is tall & skinny?

Not quite, it means any submatrix taken from the upper (lower) part of the matrix has some low rank. For example, a matrix is {3,4}-semiseparable if any submatrix taken from the lower triangular part has rank at most 3 and any submatrix taken from the upper triangular part has rank at most 4.

The inverse of an upper bidiagonal matrix is {0,1}-semiseparable.

There are a lot of fast algorithms if you know a matrix is semiseparable.

edit: link https://people.cs.kuleuven.be/~raf.vandebril/homepage/public...
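
The {0,1} claim above can be checked numerically on a small case. The sketch below inverts a 4x4 upper bidiagonal matrix exactly with fractions, then tests that the strictly lower part is zero and that every 2x2 minor drawn from the upper triangular part vanishes (equivalent to rank at most 1). It assumes "upper triangular part" includes the diagonal; conventions vary in the literature. The function names and the example entries are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations


def inverse_upper_bidiagonal(diag, sup):
    """Invert the upper bidiagonal matrix with diagonal `diag`, superdiagonal `sup`."""
    n = len(diag)
    inv = [[Fraction(0)] * n for _ in range(n)]
    for j in range(n):
        # Column j solves B x = e_j by back substitution; entries below row j stay 0.
        inv[j][j] = Fraction(1) / diag[j]
        for i in range(j - 1, -1, -1):
            inv[i][j] = -sup[i] * inv[i + 1][j] / diag[i]
    return inv


def lower_part_is_zero(m):
    """Rank 0 below the diagonal: every strictly lower entry is zero."""
    return all(m[i][j] == 0 for i in range(len(m)) for j in range(i))


def upper_part_has_rank_at_most_one(m):
    """Every 2x2 minor lying on or above the diagonal must vanish."""
    n = len(m)
    for i1, i2 in combinations(range(n), 2):
        # Columns start at i2 so all four entries are on or above the diagonal.
        for j1, j2 in combinations(range(i2, n), 2):
            if m[i1][j1] * m[i2][j2] != m[i1][j2] * m[i2][j1]:
                return False
    return True


inv = inverse_upper_bidiagonal(diag=[2, 3, 5, 7], sup=[1, 4, 6])
print(lower_part_is_zero(inv), upper_part_has_rank_at_most_one(inv))  # True True
```

This checks scalar entries only; whether the block bidiagonal case discussed in the thread has an analogous structure is the open question jeffjeffbear raises.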

rahimiali•6m ago
thanks for the explanation! sorry i had misread the AI summary on "semiseparable".

i need to firm my intuition on this first before i can say anything clever, but i agree it's worth thinking about!

Apple is fighting for TSMC capacity as Nvidia takes center stage

https://www.culpium.com/p/exclusiveapple-is-fighting-for-tsmc
465•speckx•7h ago•298 comments

Inside The Internet Archive's Infrastructure

https://hackernoon.com/the-long-now-of-the-web-inside-the-internet-archives-fight-against-forgetting
164•dvrp•1d ago•27 comments

CVEs affecting the Svelte ecosystem

https://svelte.dev/blog/cves-affecting-the-svelte-ecosystem
117•tobr•4h ago•23 comments

Ask HN: How can we solve the loneliness epidemic?

244•publicdebates•5h ago•457 comments

JuiceFS is a distributed POSIX file system built on top of Redis and S3

https://github.com/juicedata/juicefs
70•tosh•3h ago•40 comments

Use of Bayesian methodology in clinical trials of drug and biological products [pdf]

https://www.fda.gov/media/190505/download
18•brendanashworth•14h ago•1 comments

Linux boxes via SSH: suspended when disconnected

https://shellbox.dev/
47•messh•1h ago•29 comments

Go-legacy-winxp: Compile Golang 1.24 code for Windows XP

https://github.com/syncguy/go-legacy-winxp/tree/winxp-compat
24•Oxodao•3d ago•2 comments

Aviator (YC S21) is hiring to build multiplayer AI coding platform

https://www.ycombinator.com/companies/aviator/jobs
1•ankitdce•1h ago

Show HN: OpenWork – an open-source alternative to Claude Cowork

https://github.com/different-ai/openwork
82•ben_talent•1d ago•19 comments

Data is the only moat

https://frontierai.substack.com/p/data-is-your-only-moat
18•cgwu•3h ago•3 comments

Claude is good at assembling blocks, but still falls apart at creating them

https://www.approachwithalacrity.com/claude-ne/
114•bblcla•1d ago•92 comments

Ask HN: One IP, multiple unrealistic locations worldwide hitting my website

18•nacho-daddy•3h ago•11 comments

European troops arrive in Greenland to boost the Arctic island's security

https://www.npr.org/2026/01/15/g-s1-106113/european-troops-arrive-greenland
57•geox•1h ago•44 comments

Photos capture the breathtaking scale of China's wind and solar buildout

https://e360.yale.edu/digest/china-renewable-photo-essay
384•mrtksn•12h ago•314 comments

Supply Chain Vuln Compromised Core AWS GitHub Repos & Threatened the AWS Console

https://www.wiz.io/blog/wiz-research-codebreach-vulnerability-aws-codebuild
64•uvuv•4h ago•8 comments

Claude Cowork runs Linux VM via Apple virtualization framework

https://gist.github.com/simonw/35732f187edbe4fbd0bf976d013f22c8
67•jumploops•1d ago•25 comments

UK offshore wind prices come in 40% cheaper than gas in record auction

https://electrek.co/2026/01/14/uk-offshore-wind-record-auction/
129•doener•3h ago•66 comments

Found: Medieval Cargo Ship – Largest Vessel of Its Kind Ever

https://www.smithsonianmag.com/smart-news/archaeologists-say-theyve-unearthed-a-massive-medieval-...
102•bookofjoe•7h ago•23 comments

Pocket TTS: A high quality TTS that gives your CPU a voice

https://kyutai.org/blog/2026-01-13-pocket-tts
10•pain_perdu•17h ago•0 comments

25 Years of Wikipedia

https://wikipedia25.org
380•easton•9h ago•336 comments

A Unique Performance Optimization for a 3D Geometry Language

https://cprimozic.net/notes/posts/persistent-expr-memo-optimization-for-geoscript/
5•Ameo•4d ago•1 comments

Design and Implementation of Sprites

https://fly.io/blog/design-and-implementation/
106•sethev•6h ago•82 comments

First impressions of Claude Cowork

https://simonw.substack.com/p/first-impressions-of-claude-cowork
92•stosssik•1d ago•47 comments

I learned everything I know about programming

https://agentultra.com/blog/how-i-learned-everything-i-know/index.html
16•speckx•2h ago•9 comments

Show HN: Tusk Drift – Turn production traffic into API tests

https://github.com/Use-Tusk/tusk-drift-cli
12•jy-tan•3h ago•0 comments

Show HN: Tabstack – Browser infrastructure for AI agents (by Mozilla)

89•MrTravisB•1d ago•14 comments

Virginia Faulkner: Writer, editor, and ghostwriter?

https://lithub.com/virginia-faulkner-writer-editor-and-ghostwriter/
19•samclemens•6d ago•2 comments

Show HN: TinyCity – A tiny city SIM for MicroPython (Thumby micro console)

https://github.com/chrisdiana/TinyCity
105•inflam52•8h ago•18 comments