Show HN: The Hessian of tall-skinny networks is easy to invert

https://github.com/a-rahimi/hessian
12•rahimiali•1h ago
It turns out the inverse of the Hessian of a deep net is easy to apply to a vector. Doing this naively takes cubically many operations in the number of layers (so impractical), but it's possible to do this in time linear in the number of layers (so very practical)!
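This is a generic illustration of how structure turns a cubic solve into a linear one. It is not the algorithm in the repo, and the block-tridiagonal structure is an assumption made only for the example. Suppose a matrix has L diagonal blocks of size p, and each block is coupled only to its neighbours. Block Gaussian elimination then solves Hx = v in O(L p^3) time. A dense solve of the same Lp-by-Lp system costs O(L^3 p^3).

```python
import numpy as np

def block_tridiag_solve(A, B, C, v):
    """Solve M x = v for a block-tridiagonal M in O(L p^3) time.

    A[i]: diagonal block i (p x p), i = 0..L-1
    B[i]: super-diagonal block at block position (i, i+1)
    C[i]: sub-diagonal block at block position (i+1, i)
    v[i]: block i of the right-hand side (length p)
    """
    L = len(A)
    Dp, vp = [A[0]], [v[0]]  # diagonal blocks and RHS after elimination
    # Forward sweep: eliminate each sub-diagonal block using the row above.
    for i in range(1, L):
        # F = C[i-1] @ inv(Dp[i-1]), computed with a solve instead of an inverse.
        F = np.linalg.solve(Dp[i - 1].T, C[i - 1].T).T
        Dp.append(A[i] - F @ B[i - 1])
        vp.append(v[i] - F @ vp[i - 1])
    # Back substitution from the last block upward.
    x = [None] * L
    x[-1] = np.linalg.solve(Dp[-1], vp[-1])
    for i in range(L - 2, -1, -1):
        x[i] = np.linalg.solve(Dp[i], vp[i] - B[i] @ x[i + 1])
    return np.concatenate(x)

rng = np.random.default_rng(0)
L, p = 6, 3
A = [rng.standard_normal((p, p)) + 4 * p * np.eye(p) for _ in range(L)]
B = [rng.standard_normal((p, p)) for _ in range(L - 1)]
C = [rng.standard_normal((p, p)) for _ in range(L - 1)]
v = [rng.standard_normal(p) for _ in range(L)]

# Assemble the dense (L*p) x (L*p) matrix only to check the result.
M = np.zeros((L * p, L * p))
for i in range(L):
    M[i*p:(i+1)*p, i*p:(i+1)*p] = A[i]
for i in range(L - 1):
    M[i*p:(i+1)*p, (i+1)*p:(i+2)*p] = B[i]
    M[(i+1)*p:(i+2)*p, i*p:(i+1)*p] = C[i]

x = block_tridiag_solve(A, B, C, v)
print(np.allclose(M @ x, np.concatenate(v)))  # True
```

The solver only ever touches the 3L - 2 nonzero blocks. The dense `M` exists solely to verify the answer.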

This is possible because the Hessian of a deep net has a matrix polynomial structure that factorizes nicely. The Hessian-inverse-product algorithm that takes advantage of this is similar to running backprop on a dual version of the deep net. It echoes an old idea of Pearlmutter's for computing Hessian-vector products.
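Pearlmutter's trick computes Hv exactly, at roughly the cost of one extra gradient evaluation, without ever materializing H. A framework with forward-over-reverse autodiff would do this directly. Without one, the NumPy sketch below uses the central finite difference of the gradient, Hv ≈ (∇f(w + εv) - ∇f(w - εv)) / 2ε. This is the same directional derivative that Pearlmutter's R-operator evaluates exactly. The toy objective f(w) = ½ wᵀAw + ¼ Σ w⁴ is an arbitrary assumption. It was chosen because its Hessian A + diag(3w²) is known in closed form and can be used as a check.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
Q = rng.standard_normal((n, n))
A = Q @ Q.T + n * np.eye(n)  # symmetric positive definite

def grad(w):
    # Gradient of the toy objective f(w) = 0.5 * w @ A @ w + 0.25 * sum(w**4).
    return A @ w + w**3

def hessian(w):
    # Closed-form Hessian, used only to check the matrix-free product.
    return A + np.diag(3 * w**2)

def hvp_fd(w, v, eps=1e-5):
    """Approximate H(w) @ v from two gradient calls, never forming H."""
    return (grad(w + eps * v) - grad(w - eps * v)) / (2 * eps)

w = rng.standard_normal(n)
v = rng.standard_normal(n)
print(np.allclose(hvp_fd(w, v), hessian(w) @ v, atol=1e-6))  # True
```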

Maybe this idea is useful as a preconditioner for stochastic gradient descent?

Comments

MontyCarloHall•31m ago
>If the Hessian-vector product is Hv for some fixed vector v, we're interested in solving Hx=v for x. The hope is to soon use this as a preconditioner to speed up stochastic gradient descent.

Silly question, but if you have some clever way to compute the inverse Hessian, why not go all the way and use it for Newton's method, rather than as a preconditioner for SGD?

rahimiali•17m ago
Good q. The method computes Hessian-inverse on a batch. When people say "Newton's method" they're often thinking H^{-1} g, where both the Hessian and the gradient g are on the full dataset. I thought saying "preconditioner" instead of "Newton's method" would make it clear this is solving H^{-1} g on a batch, not on the full dataset.
MontyCarloHall•16m ago
I'd call it "Stochastic Newton's Method" then. :-)
rahimiali•11m ago
fair. thanks. i'll sleep on it and update the paper if it still sounds right tomorrow.

probably my nomenclature bias is that i started this project as a way to find new preconditioners on deep nets.
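Here is a toy sketch that makes the "stochastic Newton" reading concrete. It uses least-squares regression with badly scaled features, which is an assumption: it is not a deep net and not the repo's solver. Each step solves (H_B + λI) d = g_B, where H_B is the batch Hessian and g_B is the batch gradient. The damping λ = 1e-3, the batch size, and the SGD step size are all arbitrary choices. Plain mini-batch SGD runs alongside on the same batches for comparison.

```python
import numpy as np

rng = np.random.default_rng(2)
N, d, batch = 1000, 10, 64
scales = np.linspace(0.1, 3.0, d)             # badly conditioned features
X = rng.standard_normal((N, d)) * scales
w_true = rng.standard_normal(d)
y = X @ w_true + 0.01 * rng.standard_normal(N)

def batch_grad_hess(w, idx):
    # Gradient and Hessian of 0.5 * mean((X w - y)^2) on the rows in idx.
    Xb = X[idx]
    g = Xb.T @ (Xb @ w - y[idx]) / len(idx)
    H = Xb.T @ Xb / len(idx)
    return g, H

def full_loss(w):
    return 0.5 * np.mean((X @ w - y) ** 2)

lam, lr, steps = 1e-3, 0.05, 20
w_newton = np.zeros(d)
w_sgd = np.zeros(d)
for _ in range(steps):
    idx = rng.choice(N, size=batch, replace=False)
    # Damped batch-Newton step: solve (H_B + lam I) step = g_B.
    g, H = batch_grad_hess(w_newton, idx)
    w_newton -= np.linalg.solve(H + lam * np.eye(d), g)
    # Plain SGD on the same batch for comparison.
    g_sgd, _ = batch_grad_hess(w_sgd, idx)
    w_sgd -= lr * g_sgd

print(f"start {full_loss(np.zeros(d)):.4f}  "
      f"newton {full_loss(w_newton):.6f}  sgd {full_loss(w_sgd):.4f}")
```

On this problem, the batch Hessian matches the full one up to sampling noise. As a result, the preconditioned step gets close to the least-squares solution within a few steps. SGD's step size, by contrast, is capped by the largest curvature, so it crawls along the flat directions.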

jeffjeffbear•20m ago
I haven't looked into it in years, but would the inverse of a block bidiagonal matrix have some semiseparable structure? Maybe that would be worth looking into?
rahimiali•13m ago
just to be clear, semiseparable in this context means H = D + CC', where D is block diagonal and C is tall & skinny?

If so, it would be nice if this were the case, because you could then just use the Woodbury formula to invert H. But I don't think such a decomposition exists. I tried to exhaustively search through all the decompositions of H that involved one dummy variable (of which the above is a special case) and I couldn't find one. I ended up having to introduce two dummy variables instead.
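This sketch shows what the Woodbury shortcut would look like if H did decompose as D + CC', with D block diagonal and C tall and skinny. The matrices here are random stand-ins, since rahimiali reports that no such decomposition exists for the deep-net Hessian. Applying the inverse then needs only per-block solves with D plus one k-by-k solve, where k is the number of columns of C.

```python
import numpy as np

def block_diag_solve(blocks, rhs):
    # Solve D z = rhs for block-diagonal D; rhs may be a vector or a matrix.
    out, start = [], 0
    for Bk in blocks:
        p = Bk.shape[0]
        out.append(np.linalg.solve(Bk, rhs[start:start + p]))
        start += p
    return np.concatenate(out)

def woodbury_solve(blocks, C, v):
    """Solve (D + C C^T) x = v using the Woodbury identity

        (D + C C^T)^{-1} = D^{-1} - D^{-1} C (I + C^T D^{-1} C)^{-1} C^T D^{-1}

    which needs only block solves with D plus one k x k solve (k = C.shape[1]).
    """
    Dinv_v = block_diag_solve(blocks, v)
    Dinv_C = block_diag_solve(blocks, C)
    cap = np.eye(C.shape[1]) + C.T @ Dinv_C   # small k x k "capacitance" matrix
    return Dinv_v - Dinv_C @ np.linalg.solve(cap, C.T @ Dinv_v)

def random_spd(p, rng):
    Q = rng.standard_normal((p, p))
    return Q @ Q.T + p * np.eye(p)

rng = np.random.default_rng(3)
L, p, k = 5, 4, 2                      # 5 blocks of size 4, C is 20 x 2
blocks = [random_spd(p, rng) for _ in range(L)]
C = rng.standard_normal((L * p, k))
v = rng.standard_normal(L * p)

# Dense H, built only to check the answer.
H = np.zeros((L * p, L * p))
for i, Bk in enumerate(blocks):
    H[i*p:(i+1)*p, i*p:(i+1)*p] = Bk
H += C @ C.T

x = woodbury_solve(blocks, C, v)
print(np.allclose(H @ x, v))  # True
```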

jeffjeffbear•6m ago
> just to be clear, semiseparable in this context means H = D + CC', where D is block diagonal and C is tall & skinny?

Not quite, it means any submatrix taken from the upper (lower) triangular part of the matrix has low rank. For example, a matrix is {3,4}-semiseparable if any submatrix taken from the lower triangular part has rank at most 3 and any submatrix taken from the upper triangular part has rank at most 4.

The inverse of an upper bidiagonal matrix is {0,1}-semiseparable.

There are a lot of fast algorithms if you know a matrix is semiseparable.

edit: link https://people.cs.kuleuven.be/~raf.vandebril/homepage/public...
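This is a quick numerical check of the {0,1} claim. It uses a random upper bidiagonal U with nonzero entries; the size and entry ranges are arbitrary choices. Every block X[:k+1, k:] of X = U⁻¹ that sits on or above the diagonal has rank at most one. So X is described by two "generator" vectors a and b, with X[i, j] = a[i]·b[j] for i ≤ j. That turns the matrix-vector product into an O(n) suffix sum. This generator trick is one of the simplest fast semiseparable techniques, not necessarily one taken from the linked page.

```python
import numpy as np

def numerical_rank(M, rtol=1e-10):
    # Count singular values above rtol times the largest one.
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > rtol * s[0])) if s.size and s[0] > 0 else 0

rng = np.random.default_rng(4)
n = 8
d = rng.uniform(1.0, 2.0, n)        # diagonal of U
e = rng.uniform(0.5, 1.0, n - 1)    # superdiagonal of U, kept nonzero
U = np.diag(d) + np.diag(e, 1)
X = np.linalg.inv(U)

# Strictly lower part is zero (rank 0) ...
lower_is_zero = np.allclose(np.tril(X, -1), 0.0)
# ... and every block X[:k+1, k:] on or above the diagonal has rank <= 1.
upper_ranks = [numerical_rank(X[:k + 1, k:]) for k in range(n)]
print(lower_is_zero, upper_ranks)  # True [1, 1, 1, 1, 1, 1, 1, 1]

# Rank one means two generator vectors a, b with X[i, j] = a[i] * b[j] for i <= j.
a = X[:, -1] / X[0, -1]
b = X[0, :]
assert np.allclose(np.triu(np.outer(a, b)), X)

# O(n) product: (X @ v)[i] = a[i] * sum_{j >= i} b[j] * v[j], a suffix sum.
v = rng.standard_normal(n)
y_fast = a * np.cumsum((b * v)[::-1])[::-1]
print(np.allclose(y_fast, X @ v))  # True
```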

Show HN: OpenWork – an open-source alternative to Claude Cowork

https://github.com/different-ai/openwork
79•ben_talent•1d ago•19 comments

Show HN: Tusk Drift – Turn production traffic into API tests

https://github.com/Use-Tusk/tusk-drift-cli
10•jy-tan•3h ago•0 comments

Show HN: Tabstack – Browser infrastructure for AI agents (by Mozilla)

87•MrTravisB•1d ago•13 comments

Show HN: Munimet.ro – ML-based status page for the local subways in SF

https://munimet.ro/
5•MrEricSir•3h ago•0 comments

Show HN: TinyCity – A tiny city SIM for MicroPython (Thumby micro console)

https://github.com/chrisdiana/TinyCity
105•inflam52•8h ago•18 comments

Show HN: Sparrow-1 – Audio-native model for human-level turn-taking without ASR

https://www.tavus.io/post/sparrow-1-human-level-conversational-timing-in-real-time-voice
111•code_brian•1d ago•43 comments

Show HN: GoGen – A simple template-based file generator written in Go

https://github.com/zaheershaikh936/gogen
2•zaheer9360•1h ago•1 comment

Show HN: ContextFort – Visibility and controls for browser agents

https://contextfort.ai/
8•ashwinr2002•1d ago•1 comment

Show HN: Beni AI – Real-time face-to-face AI companion

https://thebeni.ai/
4•chaeeunlee9611•21h ago•0 comments

Show HN: WebTiles – create a tiny 250x250 website with neighbors around you

https://webtiles.kicya.net/
224•dimden•5d ago•38 comments

Show HN: Webctl – Browser automation for agents based on CLI instead of MCP

https://github.com/cosinusalpha/webctl
119•cosinusalpha•1d ago•35 comments

Show HN: Voice Composer – Browser-based pitch detection to MIDI/strudel/tidal

https://dioptre.github.io/tidal/
29•dioptre•3d ago•6 comments

Show HN: I built an 11MB offline PDF editor because mobile Acrobat is 500MB

https://revpdf.com/
4•pawandeepsingh•3h ago•1 comment

Show HN: Cache Explorer – The Compiler Explorer for CPU Cache Behavior

https://github.com/AveryClapp/Cache-Explorer
2•AveryClapp•4h ago•0 comments

Show HN: Keypost – Policy enforcement for MCP pipelines

https://keypost.ai
3•kxb4032•4h ago•1 comment

Show HN: I'm building an open-source AI agent runtime using Firecracker microVMs

https://github.com/moru-ai/moru
2•markoh49•4h ago•0 comments

Show HN: Tiny FOSS Compass and Navigation App (<2MB)

https://github.com/CompassMB/MBCompass
130•nativeforks•1d ago•45 comments

Show HN: HyTags – HTML as a Programming Language

https://hytags.org
67•lassejansen•2d ago•32 comments

Show HN: A 10KiB kernel for cloud apps

https://github.com/ReturnInfinity/BareMetal-Cloud
66•ianseyler•1d ago•11 comments

Show HN: Ever wanted to look at yourself in Braille?

https://github.com/NishantJoshi00/dith
26•cat-whisperer•6d ago•13 comments

Show HN: I spent 10k hours building the perfect language learning app

https://phrasing.app/
3•barrell•6h ago•2 comments

Show HN: Xoscript

https://xoscript.com/history.xo
53•gabordemooij•1d ago•43 comments

Show HN: A fast CLI and MCP server for managing Lambda cloud GPU instances

https://github.com/Strand-AI/lambda-cli
23•odedfalik•1d ago•2 comments

Show HN: Digital Carrot – Block social media with programmable rules and goals

https://www.digitalcarrot.app/
37•newswangerd•1d ago•11 comments

Show HN: Stash: End-to-end encrypted file sharing with zero friction

https://stash-app.xyz/
3•alepacheco-dev•8h ago•0 comments

Show HN: 1D-Pong Game at 39C3

https://github.com/ogermer/1d-pong
67•oger•4d ago•13 comments

Show HN: OSS AI agent that indexes and searches the Epstein files

https://epstein.trynia.ai/
205•jellyotsiro•1d ago•95 comments

Show HN: Nogic – VS Code extension that visualizes your codebase as a graph

https://marketplace.visualstudio.com/items?itemName=Nogic.nogic
128•davelradindra•2d ago•50 comments

Show HN: The Tsonic Programming Language

https://tsonic.org
59•jeswin•2d ago•9 comments