
Show HN: The Hessian of tall-skinny networks is easy to invert

https://github.com/a-rahimi/hessian
31•rahimiali•3w ago
It turns out the inverse of the Hessian of a deep net is easy to apply to a vector. Doing this naively takes cubically many operations in the number of layers (so impractical), but it's possible to do this in time linear in the number of layers (so very practical)!

This is possible because the Hessian of a deep net has a matrix polynomial structure that factorizes nicely. The Hessian-inverse-product algorithm that takes advantage of this is similar to running backprop on a dual version of the deep net. It echoes an old idea of Pearlmutter's for computing Hessian-vector products.
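For readers who have not met Hessian-vector products before, here is a minimal sketch in plain Python. It is not code from the linked repository: the `Dual` class, the toy function, and `hvp` are all made up for illustration. The idea, in the spirit of Pearlmutter's trick, is that differentiating the gradient computation along a direction v with forward-mode differentiation gives Hv exactly, without ever forming H.

```python
from dataclasses import dataclass


@dataclass
class Dual:
    """Number val + dot*eps with eps**2 == 0; dot carries a directional derivative."""
    val: float
    dot: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (a + a'eps)(b + b'eps) = ab + (a b' + a' b) eps
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)

    __rmul__ = __mul__


def _lift(x):
    """Wrap plain numbers as constants (zero derivative)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def grad_f(x):
    """Hand-written gradient of the toy function f(x0, x1) = x0**2 * x1 + x1**3."""
    x0, x1 = x
    return [2 * x0 * x1, x0 * x0 + 3 * x1 * x1]


def hvp(grad, x, v):
    """Hessian-vector product H(x) v, obtained by differentiating grad along v."""
    seeded = [Dual(xi, vi) for xi, vi in zip(x, v)]
    return [_lift(g).dot for g in grad(seeded)]


print(hvp(grad_f, [1.0, 2.0], [1.0, 1.0]))
```

At x = (1, 2) the hand-derived Hessian of this toy function is [[4, 2], [2, 12]], so the result can be checked against an ordinary matrix-vector product. The cost is a small constant multiple of one gradient evaluation, which is what makes products with H cheap even when H itself is too large to store.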

Maybe this idea is useful as a preconditioner for stochastic gradient descent?

Comments

MontyCarloHall•3w ago
>If the Hessian-vector product is Hv for some fixed vector v, we're interested in solving Hx=v for x. The hope is to soon use this as a preconditioner to speed up stochastic gradient descent.

Silly question, but if you have some clever way to compute the inverse Hessian, why not go all the way and use it for Newton's method, rather than as a preconditioner for SGD?

rahimiali•3w ago
Good q. The method computes Hessian-inverse on a batch. When people say "Newton's method" they're often thinking H^{-1} g, where both the Hessian and the gradient g are on the full dataset. I thought saying "preconditioner" instead of "Newton's method" would make it clear this is solving H^{-1} g on a batch, not on the full dataset.
MontyCarloHall•3w ago
I'd call it "Stochastic Newton's Method" then. :-)
rahimiali•3w ago
fair. thanks. i'll sleep on it and update the paper if it still sounds right tomorrow.

probably my nomenclature bias is that i started this project as a way to find new preconditioners on deep nets.

hodgehog11•3w ago
Just a heads up in case you didn't know, taking the Hessian over batches is indeed referred to as Stochastic Newton, and methods of this kind have been studied for quite some time. Inverting the Hessian is often done with CG, which tends to work pretty well. The only problem is that the Hessian is often not invertible so you need a regularizer (same as here I believe). Newton methods work at scale, but no-one with the resources to try them at scale seems to be aware of them.

It's an interesting trick though, so I'd be curious to see how it compares to CG.

[1] https://arxiv.org/abs/2204.09266 [2] https://arxiv.org/abs/1601.04737 [3] https://pytorch-minimize.readthedocs.io/en/latest/api/minimi...
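As a rough illustration of the CG approach described above, here is a minimal sketch in plain Python. It assumes the Hessian is only available through a matrix-vector product `matvec` (a hypothetical callback), and it adds a `damping` term as the regularizer for the case where H is not invertible. The names and the tiny test matrices are illustrative, not taken from any of the linked references.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def conjugate_gradient(matvec, b, damping=0.0, tol=1e-10, max_iter=100):
    """Solve (H + damping*I) x = b using only products matvec(p) = H p.

    Assumes H + damping*I is symmetric positive definite.
    """
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x, with x = 0 initially
    p = list(r)            # first search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        Ap = [hp + damping * pi for hp, pi in zip(matvec(p), p)]
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def make_matvec(H):
    """Wrap an explicit matrix as a product callback (only for this demo)."""
    return lambda p: [dot(row, p) for row in H]


# A well-conditioned case, then a singular H rescued by damping.
print(conjugate_gradient(make_matvec([[4.0, 1.0], [1.0, 3.0]]), [1.0, 2.0]))
print(conjugate_gradient(make_matvec([[1.0, 1.0], [1.0, 1.0]]), [1.0, 0.0], damping=1.0))
```

Each iteration costs one Hessian-vector product, so the total cost depends on how many iterations the conditioning of H demands. That is the quantity a direct Hessian-inverse-product method would be compared against.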

semi-extrinsic•3w ago
For solving physics equations there are also Jacobian-free Newton-Krylov methods.
conformist•3w ago
Yes, the combination of Krylov and quasi-Newton methods is very successful for physics problems (https://en.wikipedia.org/wiki/Quasi-Newton_method).

IIRC, GMRES, for example, is a popular Krylov subspace method.

throwaway198846•3w ago
I lately used these methods and BFGS worked better than CG for me.
hodgehog11•3w ago
Absolutely plausible (BFGS is awesome), but this is situation dependent (no free lunch and all that). In the context of training neural networks, it gets even more complicated when one takes implicit regularisation coming from the optimizer into account. It's often worthwhile to try a SGD-type optimizer, BFGS, and a Newton variant to see which type works best for a particular problem.
jeffjeffbear•3w ago
I haven't looked into it in years, but would the inverse of a block bi-diagonal matrix have some semiseparable structure? Maybe that would be good to look into?
rahimiali•3w ago
just to be clear, semiseparate in this context means H = D + CC', where D is block diagonal and C is tall & skinny?

If so, it would be nice if this were the case, because you could then just use the Woodbury formula to invert H. But I don't think such a decomposition exists. I tried to exhaustively search through all the decompositions of H that involved one dummy variable (of which the above is a special case) and I couldn't find one. I ended up having to introduce two dummy variables instead.
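For context, this is roughly what the Woodbury route would buy if such a decomposition did exist. The sketch below, in plain Python, covers only the simplest case: D is diagonal and C is a single column c, which is the Sherman-Morrison special case of the Woodbury formula. The function name and the numbers are hypothetical and have nothing to do with the repository.

```python
def solve_diag_plus_rank_one(d, c, v):
    """Solve (diag(d) + c c^T) x = v in O(n) via the Sherman-Morrison formula.

    x = D^{-1} v - D^{-1} c * (c^T D^{-1} v) / (1 + c^T D^{-1} c)
    """
    dinv_v = [vi / di for vi, di in zip(v, d)]
    dinv_c = [ci / di for ci, di in zip(c, d)]
    numerator = sum(ci * yi for ci, yi in zip(c, dinv_v))
    denominator = 1.0 + sum(ci * zi for ci, zi in zip(c, dinv_c))
    scale = numerator / denominator
    return [yi - scale * zi for yi, zi in zip(dinv_v, dinv_c)]


# H = diag(2, 4) + [1, 2][1, 2]^T = [[3, 2], [2, 8]]
print(solve_diag_plus_rank_one([2.0, 4.0], [1.0, 2.0], [1.0, 1.0]))
```

The appeal is that the solve never touches an n x n matrix: with a tall-and-skinny C of k columns, the general Woodbury formula only needs solves with D plus one k x k system.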

jeffjeffbear•3w ago
> just to be clear, semiseparate in this context means H = D + CC', where D is block diagonal and C is tall & skinny?

Not quite, it means any submatrix taken from the upper (lower) part of the matrix has some low rank. For example, a matrix is {3,4}-semiseparable if any submatrix taken from the lower triangular part has at most rank 3 and any submatrix taken from the upper triangular part has at most rank 4.

The inverse of an upper bidiagonal matrix is {0,1}-semiseparable.

There are a lot of fast algorithms if you know a matrix is semiseparable.

edit: link https://people.cs.kuleuven.be/~raf.vandebril/homepage/public...
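The bidiagonal claim above can be checked numerically on a small example. The sketch below, in plain Python, builds the inverse of an upper bidiagonal matrix column by column using back substitution, then verifies that every 2x2 minor lying in the upper triangle vanishes (so those submatrices have rank at most 1) and that the lower triangle is zero (rank 0). It assumes "upper triangular part" includes the diagonal; definitions in the literature vary on that point. All function names and the test matrix are made up for illustration.

```python
def solve_upper_bidiagonal(diag, sup, v):
    """Solve B x = v in O(n); B has `diag` on the diagonal and `sup` just above it."""
    n = len(diag)
    x = [0.0] * n
    x[-1] = v[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (v[i] - sup[i] * x[i + 1]) / diag[i]
    return x


def inverse_upper_bidiagonal(diag, sup):
    """Dense inverse, built one column at a time (only sensible for small n)."""
    n = len(diag)
    cols = [solve_upper_bidiagonal(diag, sup, [float(i == j) for i in range(n)])
            for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def max_upper_minor(M):
    """Largest |2x2 minor| over blocks on or above the diagonal."""
    n = len(M)
    worst = 0.0
    for i1 in range(n):
        for i2 in range(i1 + 1, n):
            for j1 in range(i2, n):          # j1 >= i2 keeps all four entries in the upper part
                for j2 in range(j1 + 1, n):
                    minor = M[i1][j1] * M[i2][j2] - M[i1][j2] * M[i2][j1]
                    worst = max(worst, abs(minor))
    return worst


inv = inverse_upper_bidiagonal([2.0, 3.0, 4.0, 5.0], [1.0, 1.0, 1.0])
print(max_upper_minor(inv))
```

The printed value should be zero up to rounding. The same structure is why `solve_upper_bidiagonal` runs in linear time: the dense inverse never needs to be formed.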

rahimiali•3w ago
thanks for the explanation! sorry i had misread the AI summary on "semiseparable".

i need to firm my intuition on this first before i can say anything clever, but i agree it's worth thinking about!

Lerc•3w ago
I am not a mathematician, but I do enough weird stuff that I encounter things referring to Hessians, yet I don't really know what they are, because everyone who writes about them does so in terms that assumes the reader knows what they are.

Any hints? The Battenburg graphics of matrices?

stevenae•3w ago
This helped me, coming from an ml background: https://randomrealizations.com/posts/xgboost-explained/
Nevermark•3w ago
GRADIENT

In the context of optimizing parameters of a model, the Gradient consists of all the derivatives of the output being optimized (i.e. the total error measure) with respect to each of the model's parameters.

This creates a simplified version of the model, linearized around its current parameter values, making it easy to see which direction to take a small step to move the ultimate output in the direction that is desired.

And easy to see which parameters adjust the desired output more vs. less.

[EDIT] Nx1 1st derivative vector, N = #parameters, 1 = scalar output.

HESSIAN

The Hessian consists of all 2nd order derivatives, i.e. not just slope, but the curvature of the model, around the current parameter values.

Calculating all the first and second derivatives takes more calculations and memory, but gives more information about which direction to take a learning step in: not only do we know how the output will respond linearly to a small parameter change, but also whether larger changes will produce higher or lower than linear responses.

This can allow for the calculation of much larger changes to parameters, with high output improvements, speeding up training considerably, per training step.

But the trade off is each learning step requires more derivative calculations and memory. So a conducive model architecture, and clever tricks, are often needed to make the Hessian worth using, on larger models.

[EDIT] NxNx1 = NxN 2nd derivative matrix, N = #parameters, 1 = scalar output.
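To make the gradient and Hessian descriptions concrete, here is a minimal sketch in plain Python. It estimates both by central finite differences on a made-up two-parameter "error" function and then takes one Newton step. The function `f`, the step size `h`, and all names are assumptions chosen for illustration. Real training code would use automatic differentiation instead of finite differences.

```python
def gradient(f, x, h=1e-3):
    """Central-difference estimate of the N first derivatives of f at x."""
    g = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        dn = list(x)
        dn[i] -= h
        g.append((f(up) - f(dn)) / (2 * h))
    return g


def hessian(f, x, h=1e-3):
    """Central-difference estimate of the N x N second derivatives of f at x."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                y = list(x)
                y[i] += si * h
                y[j] += sj * h
                return f(y)
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)
    return H


def newton_step(f, x):
    """One Newton update x - H^{-1} g, written out for N = 2 parameters."""
    g = gradient(f, x)
    (a, b), (c, d) = hessian(f, x)
    det = a * d - b * c
    step = [(d * g[0] - b * g[1]) / det, (a * g[1] - c * g[0]) / det]
    return [x[0] - step[0], x[1] - step[1]]


def f(p):
    """Toy quadratic error surface with its minimum at (0, 0)."""
    return p[0] ** 2 + p[0] * p[1] + 5 * p[1] ** 2


start = [3.0, -2.0]
print(gradient(f, start))      # slope information, N numbers
print(hessian(f, start))       # curvature information, N x N numbers
print(newton_step(f, start))   # lands near the minimum in one step
```

Because this toy surface is exactly quadratic, a single Newton step lands (up to rounding) on the minimum, whereas a plain gradient step only moves part of the way. On a real network the surface is not quadratic and the Hessian has N x N entries, which is the cost trade-off described above.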

JACOBIAN

Another derivative type is the Jacobian, which is the derivative of every individual output (i.e. all those numbers we normally think of as the outputs, not just the final error measure), with respect to every parameter.

Jacobians can become enormous matrices. For billions of parameters, on billions of examples, with 100's of output elements, we would get a billions x 100's of billions derivative matrix. So calculating the Jacobian can take enormous amounts of extra computation and memory. But there are still occasions (much fewer) when using it can radically speed up training.

[EDIT] NxQxM 1st derivative matrix, N = #parameters, Q = #samples, M = #output elements

At this point, we have enough computer power and memory available, that all small enough problems should be trained with Jacobians in my view. Levenberg-Marquardt is an optimization algorithm that uses Jacobians. It can be orders of magnitude faster than gradient descent.
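As a small illustration of how the Jacobian enters Levenberg-Marquardt, here is a minimal sketch in plain Python that fits a line y = a*x + b. It is a simplification: the damping is held fixed, whereas a real Levenberg-Marquardt implementation raises or lowers it depending on whether each step reduced the error. The function name, the data, and the fixed damping value are all assumptions for illustration.

```python
def lm_fit(xs, ys, theta, damping=1e-3, steps=20):
    """Fit y = a*x + b with damped Gauss-Newton steps (fixed-damping LM)."""
    a, b = theta
    for _ in range(steps):
        residuals = [y - (a * x + b) for x, y in zip(xs, ys)]
        # Jacobian: one row per sample, one column per parameter (Q x N).
        jac = [[x, 1.0] for x in xs]
        # Normal equations (J^T J + damping*I) delta = J^T r, written out for N = 2.
        jtj00 = sum(row[0] * row[0] for row in jac) + damping
        jtj01 = sum(row[0] * row[1] for row in jac)
        jtj11 = sum(row[1] * row[1] for row in jac) + damping
        jtr0 = sum(row[0] * r for row, r in zip(jac, residuals))
        jtr1 = sum(row[1] * r for row, r in zip(jac, residuals))
        det = jtj00 * jtj11 - jtj01 * jtj01
        a += (jtj11 * jtr0 - jtj01 * jtr1) / det
        b += (jtj00 * jtr1 - jtj01 * jtr0) / det
    return a, b


# Data generated from y = 2x + 1, starting from a = b = 0.
print(lm_fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0], (0.0, 0.0)))
```

The method never forms the full Hessian. It uses J^T J as a curvature estimate built from first derivatives of the individual outputs, which is why it needs the per-sample Jacobian and not just the gradient of the total error.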

tubs•3w ago
You explain well so what I never understood is how the Jacobians aren't the first derivatives themselves?

Also, if you happen to have any suggestions for linear algebra for someone who uses it without really understanding it (I can write a measurement function for an EKF from scratch OK, but I don't really understand why the maths does what it does) I would really appreciate it.

mxwsn•3w ago
The Jacobian is first derivatives, but for a function mapping N to M dimensions. It's the first derivative of every output wrt every input, so it will be an N x M matrix.

The gradient is a special case of the Jacobian for functions mapping N to 1 dimension, such as loss functions. The gradient is an N x 1 vector.

Nevermark•3w ago
[EDIT] Updated original comment to include matrix dimensions.

If you want a serious text that goes through the relevant linear algebra and optimization mathematics in depth up front, Neural Network Design, 2nd edition is a good one. [Disclaimer, co-author]. We took great pains to walk through every conceptual and mathematical topic before we apply those concepts to machine learning. We use MATLAB a lot, which may or may not be helpful.

Another potential option is "Linear Algebra and Optimization for Machine Learning", which looks good and also starts out with linear algebra before machine learning. I haven't read it, but the first 2020 edition gets good reviews, and a second 2026 edition just came out, apparently with a fair amount of positive revision. Given the speed of change, that's nice to see.

Lerc•3w ago
Thank you very much for this description.

If I understand it in a nutshell: if the Gradient is the angle, the Hessian is the curvature.

and Jacobians let you know how much weights contributed to the blue component of something identified as a big blue cat.

I think.

Jacobians look like they could be used to train concept splitters. For instance if an LLM has a grab bag of possible conversation paths, the final embedding would have information for each path, but once the selection is made it could filter the embedding to that path, which would be beneficial for chain of thought using the filtered embedding instead of the predicted token. I always wondered how much the thinking in embedding space carried around remnants of conversation paths not taken.

petters•3w ago
Would be great to see this work continued with some training runs
rahimiali•3w ago
Agreed. But these things have a way of not working out, and in the sadness, one forgets to celebrate the intermediate victories. I wanted to share an intermediate victory before reality crushes the joy.
holg•3w ago
Great work. Making the Hessian calculation linear in depth is a solid intermediate step. Thanks for sharing this; I look forward to seeing the final results as this research matures.