frontpage.

An illustrated guide to automatic sparse differentiation

https://iclr-blogposts.github.io/2025/blog/sparse-autodiff/
137•mariuz•7mo ago

Comments

nathan_douglas•7mo ago
Picking my way through this slowly... I'm familiar with autodiff but some of these ideas are very new to me. This seems really, really exciting though.
whitten•7mo ago
This paper is written by three Europeans who clearly understand these mathematical ideas.

Is this type of analysis part of a particular mathematical heritage?

What would it be called?

Is this article relevant? https://medium.com/@lobosi/calculus-for-machine-learning-jac...

ghurtado•7mo ago
I quickly realized it was approximately 20,000 ft over my head, but I still power through these sorts of things to see if anything "sticks".

So far, nothing, but I'll keep trying...

molticrystal•7mo ago
Maybe someone else can summarize more accurately or do a better job, but I'll take a shot:

The Jacobian often appears in the final segment of a three-part calculus series, when exploring chain rules and variable transformations. Look up the Jacobian used in converting between x,y,z and spherical coordinates ρ,φ,θ and note its matrix structure. Skimming the Medium article you linked (Lobosi), it seems to emphasize this aspect.
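For reference, the spherical-coordinate Jacobian mentioned above (physics convention, with x = ρ sin φ cos θ, y = ρ sin φ sin θ, z = ρ cos φ):

```latex
\frac{\partial(x, y, z)}{\partial(\rho, \varphi, \theta)} =
\begin{pmatrix}
\sin\varphi\cos\theta & \rho\cos\varphi\cos\theta & -\rho\sin\varphi\sin\theta \\
\sin\varphi\sin\theta & \rho\cos\varphi\sin\theta & \rho\sin\varphi\cos\theta \\
\cos\varphi           & -\rho\sin\varphi          & 0
\end{pmatrix},
\qquad
\det = \rho^{2}\sin\varphi .
```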

The Jacobian also serves another purpose. As stated in the OP's article "The Jacobian operator Df:x⟼Df(x) is a linear map which provides the best linear approximation of f around a given point x."

We like approximations because they let us trade accuracy for speed and memory: you only have so much space in a register or memory cell, and pushing for more accuracy past a certain point costs more memory, computation, and time.

The article then notes that many computations involve Jacobians that are sparse matrices, meaning some matrix elements can be ignored, so with clever handling we don't have to waste time on them.
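To make the "sparse means skippable" point concrete, here's a toy sketch (my own, not from the article; plain NumPy, using finite differences only to reveal the pattern): each output of this function reads just one or two inputs, so most Jacobian entries are structurally zero.

```python
import numpy as np

def f(x):
    # Toy 4-in / 4-out function: each output touches 1-2 inputs,
    # so most Jacobian entries are structurally zero.
    return np.array([x[0] * x[1],
                     np.sin(x[1]),
                     x[1] + x[2] ** 2,
                     np.exp(x[3])])

def jacobian_fd(f, x, h=1e-6):
    # Forward-difference Jacobian, one column per input perturbation.
    fx = f(x)
    J = np.zeros((len(fx), len(x)))
    for j in range(len(x)):
        e = np.zeros_like(x)
        e[j] = h
        J[:, j] = (f(x + e) - fx) / h
    return J

x = np.array([1.0, 2.0, 3.0, 0.5])
pattern = np.abs(jacobian_fd(f, x)) > 1e-8
print(pattern.astype(int))  # only 6 of 16 entries are nonzero
```

A sparsity-aware autodiff can skip the 10 structurally zero entries entirely; for large banded or block-sparse Jacobians, that's most of the work.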

Subsequent sections cover methods to identify and label sparsity patterns. The article explains how applying their proposed coloring techniques to large matrices common in machine learning yields significant efficiency gains.
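The coloring idea can be sketched in a few lines: columns whose nonzero rows never overlap can share one "color" and be recovered from a single Jacobian-vector product. A minimal greedy version (my own toy sketch, not the article's algorithm):

```python
def color_columns(pattern):
    # Greedy distance-1 coloring of Jacobian columns.
    # pattern[j] is the set of rows where column j is nonzero; two columns
    # sharing a row cannot get the same color (their probes would collide).
    n = len(pattern)
    colors = [-1] * n
    for j in range(n):
        forbidden = {colors[k] for k in range(j)
                     if pattern[j] & pattern[k]}
        c = 0
        while c in forbidden:
            c += 1
        colors[j] = c
    return colors

# Tridiagonal 5x5 pattern: column j touches rows {j-1, j, j+1}.
pattern = [set(r for r in (j - 1, j, j + 1) if 0 <= r < 5)
           for j in range(5)]
print(color_columns(pattern))  # 3 colors: 3 probes recover all 5 columns
```

For an n-by-n tridiagonal Jacobian this stays at 3 colors no matter how large n gets, which is where the big efficiency gains come from.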

As for the mathematical heritage, I don't know the family tree, but I suspect it stems from courses blending matrix theory, linear algebra, and algorithms, so you'd want the computer science version of such math. Function approximation ties into numerical methods, though I am uncertain whether introductory texts cover Jacobians. Check out Newton's method to grasp its mechanics, then explore its Jacobian extension to systems of equations. For the coloring aspect, graph theory is where to turn; you can learn its basics with minimal prerequisites by studying the seven bridges problem or the map coloring problem (do the five color version). Many of these concepts can be turned into small programming projects. They will not rival Matlab, but they will solidify your understanding.
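Since Newton's method is suggested above as an entry point: a minimal multivariate Newton sketch (toy system of my choosing, plain NumPy). The Jacobian here is exactly the object that sparse autodiff would supply cheaply for large systems.

```python
import numpy as np

def newton(f, jac, x, tol=1e-10, max_iter=50):
    # Multivariate Newton: solve J(x) dx = -f(x), then update x += dx.
    for _ in range(max_iter):
        dx = np.linalg.solve(jac(x), -f(x))
        x = x + dx
        if np.linalg.norm(dx) < tol:
            break
    return x

# Toy system: x^2 + y^2 = 4 and x*y = 1.
f = lambda v: np.array([v[0] ** 2 + v[1] ** 2 - 4.0,
                        v[0] * v[1] - 1.0])
jac = lambda v: np.array([[2 * v[0], 2 * v[1]],
                          [v[1],     v[0]]])

root = newton(f, jac, np.array([2.0, 0.0]))
print(root, f(root))  # residual near machine precision
```

Swap the dense `np.linalg.solve` for a sparse factorization and the hand-written `jac` for a colored sparse-AD Jacobian, and this is roughly the shape of the large-scale solvers the thread is about.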

yorwba•7mo ago
The blog post mentions in an aside that "The authors of this blog post are all developers of the ASD ecosystem in Julia." which might be the closest thing to an intellectual school that this kind of work is associated with.
gdalle•7mo ago
Blog post author here, happy to answer any questions you may have!

The prerequisites for understanding the blog post are an undergrad course in calculus and linear algebra, and some graph theory. I can look up some accessible resources if you're interested :)

JohnKemeny•7mo ago
Does this article exist as a (LaTeX) pdf for printing too?
funks_•7mo ago
We don’t have plans for that, but you could try to convert the Markdown source: https://github.com/iclr-blogposts/2025/blob/main/_posts/2025...
gdalle•7mo ago
Our Arxiv preprint is a slightly longer read, available in PDF form with more precise descriptions: https://arxiv.org/abs/2501.17737
carterschonwald•7mo ago
Thx!

It’s always fun to see new flavors of AD work. My attempts in that direction haven’t been the most successful

constantcrying•7mo ago
>Is this type of analysis a part of a particular mathematical heritage ?

It is a mixture of two closely related areas of mathematics: analysis (called calculus in the US) and numerics.

The ideas behind automatic differentiation arise from the question of how to compute the derivative of a function on a computer. The "derivative" part is the Analysis part and the "on a computer" part is the numerics.

As it turns out, writing down the formal definition of the derivative and approximating it directly on a computer has many undesirable properties. So alternative approaches, like AD, were developed. But AD is much older than the recent neural network trend.
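A toy illustration of that point (my own sketch, not from the thread): forward-mode AD via dual numbers propagates exact derivative values through each operation, while the finite-difference quotient from the formal definition suffers both truncation and roundoff error.

```python
import math

class Dual:
    # Minimal forward-mode AD: carry (value, derivative) through arithmetic.
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __mul__(self, other):
        o = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)
    __rmul__ = __mul__

    def sin(self):
        # Chain rule: (sin u)' = cos(u) * u'
        return Dual(math.sin(self.val), math.cos(self.val) * self.dot)

def f(x):
    # f(x) = sin(x^2), written to accept floats or Duals
    return (x * x).sin() if isinstance(x, Dual) else math.sin(x * x)

x0 = 1.5
exact = math.cos(x0 ** 2) * 2 * x0           # analytic d/dx sin(x^2)
ad = f(Dual(x0, 1.0)).dot                    # dual-number forward mode
fd = (f(x0 + 1e-8) - f(x0)) / 1e-8           # one-sided finite difference
print(abs(ad - exact), abs(fd - exact))      # AD error is essentially zero
```

Shrinking the finite-difference step h reduces truncation error but amplifies roundoff; AD sidesteps the trade-off entirely, which is one of the "undesirable properties" the comment alludes to.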

imtringued•7mo ago
The Hessian is needed for second-order optimization methods, and this blog post was most likely motivated by improving the method used in the precursor blog post: https://iclr-blogposts.github.io/2024/blog/bench-hvp/
funks_•7mo ago
Yes, this blog post indeed inspired us to submit ours!
rdyro•7mo ago
A really cool post and a great set of visualizations!

Computing sparse Jacobians can save a lot of compute if there's a real lack of dependency between part of the input and the output. Discovering this automatically through coloring is very appealing.
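The automatic discovery mentioned here can be sketched with operator overloading: run the function once on tracer values that each carry the set of input indices they depend on, and take unions through every operation. A deliberately tiny, conservative version (my own toy in Python, not the Julia implementation):

```python
class Tracer:
    # Toy sparsity tracer: each value carries the (conservative) set of
    # input indices it depends on; every op takes the union of dependencies.
    def __init__(self, deps):
        self.deps = frozenset(deps)

    def _union(self, other):
        o = other.deps if isinstance(other, Tracer) else frozenset()
        return Tracer(self.deps | o)

    # All arithmetic just merges dependency sets.
    __add__ = __sub__ = __mul__ = _union
    __radd__ = __rmul__ = _union

def sparsity_pattern(f, n):
    # One tracing run of f yields the row-wise dependency sets,
    # i.e. the Jacobian's sparsity pattern.
    out = f([Tracer({j}) for j in range(n)])
    return [sorted(o.deps) for o in out]

def f(x):
    return [x[0] * x[1], x[1] + 2.0, x[2] * x[2]]

print(sparsity_pattern(f, 3))  # [[0, 1], [1], [2]]
```

Once the pattern is known, the coloring step decides how few forward (or reverse) passes are needed to fill in the actual values.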

Another alternative is to implement sparse rules for each operation yourself, but that often requires custom autodiff implementations, which aren't easy to get right. I wrote a small toy version of a sparse rules-based autodiff here: https://github.com/rdyro/SpAutoDiff.jl

Another example (a much more serious one) is https://github.com/microsoft/folx

gdalle•7mo ago
You might also be interested in Spadina for Enzyme. There are no Julia bindings yet but I’d be excited if someone made them! https://c.wsmoses.com/presentations/weuroad23.pdf
oulipo•7mo ago
Sparsely-related question: is the blog style/css open-source?
molticrystal•7mo ago
It seems to be based on Al-Folio, which is MIT licensed.

https://github.com/alshedivat/al-folio

FilosofumRex•7mo ago
The classic reference on the subject is "Numerical Linear Algebra" by Lloyd Trefethen. Skip to the last chapter, on iterative methods, for the computational aspects. You'll learn a lot more, and faster, with Matlab.

https://davidtabora.wordpress.com/wp-content/uploads/2015/01...

A short overview is chapter 11 in Gilbert Strang's Introduction to Linear Algebra: https://math.mit.edu/~gs/linearalgebra/ila5/linearalgebra5_1...

AD comes from a different tradition, dating back to FORTRAN 77 programmers' attempts to differentiate non-elementary functions (for loops, procedural functions, subroutines, etc.). Note the hardware specs for some nostalgia: https://www.mcs.anl.gov/research/projects/adifor/

gwf•7mo ago
Not trying to "Schmidhuber" this or anything, but I think my 1999 NIPS paper gives a cleaner derivation and explanation for working on the Jacobian. In it, I derive a Jacobian operator that allows you to compute arbitrary products between the Jacobian and any vector, with complexity that is comparable to standard backprop.

[*] G.W. Flake & B.A. Pearlmutter, "Differentiating Functions of the Jacobian with Respect to the Weights," https://proceedings.neurips.cc/paper_files/paper/1999/file/b...

goosedragons•7mo ago
There is automatic sparse differentiation available in the R ecosystem. That's what the RTMB & TMB packages do.
adgjlsfhk1•7mo ago
That's really impressive! I can't even imagine implementing sparsity tracing in a language as dynamic and hard to compile as R.
goosedragons•7mo ago
I think they do sparsity tracing and tape construction in C++ behind the scenes. RTMB does some wacky thing abusing complex values to let you write the function code in R and pass that off somehow.
patrick451•7mo ago
The optimal control framework CasADi has had the ability to compute sparse Jacobians and Hessians for a long time (maybe a decade?), and they come up all the time in trajectory optimization. This not only provides massive speed-ups in both differentiation and linear solver time, but also greatly reduces the memory requirements. If this catches on in machine learning, it will be interesting to see if we can finally move past first-order optimization methods.
gdalle•7mo ago
Indeed, CasADi is among the precursors in this area! The key difference with our approach is their use of a domain-specific language, with distinct mathematical functions and array types. This has lots of benefits, but it expects users to rewrite their existing code in the CasADi formalism. What we seek to achieve in Julia is compatibility with native code, without a DSL-imposed refactor. We share this ambition with the broader Julia autodiff ecosystem, which is focused on differentiating the language as a whole. Of course it doesn't always work, but in many cases, it enables a plug-and-play approach to (sparse) autodiff which makes really cool applications possible.

We built another object storage

https://fractalbits.com/blog/why-we-built-another-object-storage/
60•fractalbits•2h ago•11 comments

Java FFM zero-copy transport using io_uring

https://www.mvp.express/
26•mands•5d ago•6 comments

How exchanges turn order books into distributed logs

https://quant.engineering/exchange-order-book-distributed-logs.html
50•rundef•5d ago•17 comments

macOS 26.2 enables fast AI clusters with RDMA over Thunderbolt

https://developer.apple.com/documentation/macos-release-notes/macos-26_2-release-notes#RDMA-over-...
467•guiand•18h ago•237 comments

AI is bringing old nuclear plants out of retirement

https://www.wbur.org/hereandnow/2025/12/09/nuclear-power-ai
35•geox•2h ago•26 comments

Sick of smart TVs? Here are your best options

https://arstechnica.com/gadgets/2025/12/the-ars-technica-guide-to-dumb-tvs/
435•fleahunter•1d ago•363 comments

Photographer built a medium-format rangefinder, and so can you

https://petapixel.com/2025/12/06/this-photographer-built-an-awesome-medium-format-rangefinder-and...
78•shinryuu•6d ago•10 comments

Apple has locked my Apple ID, and I have no recourse. A plea for help

https://hey.paris/posts/appleid/
873•parisidau•10h ago•450 comments

GNU Unifont

https://unifoundry.com/unifont/index.html
288•remywang•18h ago•68 comments

A 'toaster with a lens': The story behind the first handheld digital camera

https://www.bbc.com/future/article/20251205-how-the-handheld-digital-camera-was-born
42•selvan•5d ago•18 comments

Beautiful Abelian Sandpiles

https://eavan.blog/posts/beautiful-sandpiles.html
84•eavan0•3d ago•16 comments

Rats Play DOOM

https://ratsplaydoom.com/
335•ano-ther•18h ago•123 comments

Show HN: Tiny VM sandbox in C with apps in Rust, C and Zig

https://github.com/ringtailsoftware/uvm32
167•trj•17h ago•11 comments

OpenAI are quietly adopting skills, now available in ChatGPT and Codex CLI

https://simonwillison.net/2025/Dec/12/openai-skills/
481•simonw•15h ago•272 comments

Computer Animator and Amiga fanatic Dick Van Dyke turns 100

110•ggm•6h ago•23 comments

Will West Coast Jazz Get Some Respect?

https://www.honest-broker.com/p/will-west-coast-jazz-finally-get
10•paulpauper•6d ago•2 comments

Formula One Handovers and Handovers From Surgery to Intensive Care (2008) [pdf]

https://gwern.net/doc/technology/2008-sower.pdf
82•bookofjoe•6d ago•33 comments

Show HN: I made a spreadsheet where formulas also update backwards

https://victorpoughon.github.io/bidicalc/
179•fouronnes3•1d ago•85 comments

Freeing a Xiaomi humidifier from the cloud

https://0l.de/blog/2025/11/xiaomi-humidifier/
126•stv0g•1d ago•51 comments

Obscuring P2P Nodes with Dandelion

https://www.johndcook.com/blog/2025/12/08/dandelion/
57•ColinWright•4d ago•1 comments

Go is portable, until it isn't

https://simpleobservability.com/blog/go-portable-until-isnt
120•khazit•6d ago•101 comments

Ensuring a National Policy Framework for Artificial Intelligence

https://www.whitehouse.gov/presidential-actions/2025/12/eliminating-state-law-obstruction-of-nati...
169•andsoitis•1d ago•217 comments

Poor Johnny still won't encrypt

https://bfswa.substack.com/p/poor-johnny-still-wont-encrypt
52•zdw•10h ago•66 comments

YouTube's CEO limits his kids' social media use – other tech bosses do the same

https://www.cnbc.com/2025/12/13/youtubes-ceo-is-latest-tech-boss-limiting-his-kids-social-media-u...
86•pseudolus•3h ago•67 comments

Slax: Live Pocket Linux

https://www.slax.org/
41•Ulf950•5d ago•5 comments

50 years of proof assistants

https://lawrencecpaulson.github.io//2025/12/05/History_of_Proof_Assistants.html
107•baruchel•15h ago•17 comments

Gild Just One Lily

https://www.smashingmagazine.com/2025/04/gild-just-one-lily/
29•serialx•5d ago•5 comments

Capsudo: Rethinking sudo with object capabilities

https://ariadne.space/2025/12/12/rethinking-sudo-with-object-capabilities.html
75•fanf2•17h ago•44 comments

Google removes Sci-Hub domains from U.S. search results due to dated court order

https://torrentfreak.com/google-removes-sci-hub-domains-from-u-s-search-results-due-to-dated-cour...
193•t-3•11h ago•35 comments

String theory inspires a brilliant, baffling new math proof

https://www.quantamagazine.org/string-theory-inspires-a-brilliant-baffling-new-math-proof-20251212/
167•ArmageddonIt•22h ago•154 comments