perfect for learning how ml frameworks work under the hood :)
We had all these issues back in 2006 when my group was implementing autograd for C++ and, later, for a computer algebra system called Axiom. We knew it'd be ideal for NNs; I was trying to build this out for my brother, who was porting AI models to GPUs. (It did not work in 2006, for both hardware and math reasons.)
So the killer cost is at compile time, not runtime; that is fundamental to how autograd operates, since the derivative graph has to be built before it can ever be evaluated.
On the flip side, it's 2025, not 2006, so modern algorithms & heuristics can change this story quite a bit.
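To make the "cost is paid at build time" point concrete, here is a minimal scalar reverse-mode sketch in C. All names here (Node, leaf, mul, backward) are hypothetical illustrations, not taken from the linked repo or the 2006 system: the graph is allocated while the expression is constructed, and the backward sweep merely replays it, so any structural blow-up happens up front.

```c
/* Minimal scalar reverse-mode AD; hypothetical API for illustration. */
#include <stdio.h>
#include <stdlib.h>

typedef struct Node {
    double value;        /* forward value */
    double grad;         /* accumulated adjoint, starts at 0 */
    struct Node *a, *b;  /* parents (NULL for leaves) */
    double da, db;       /* local partials d(this)/d(parent) */
} Node;

static Node *leaf(double v) {
    Node *n = calloc(1, sizeof *n);
    n->value = v;
    return n;
}

/* The graph grows here, while the expression is being built. */
static Node *mul(Node *x, Node *y) {
    Node *n = calloc(1, sizeof *n);
    n->value = x->value * y->value;
    n->a = x; n->da = y->value;  /* d(xy)/dx = y */
    n->b = y; n->db = x->value;  /* d(xy)/dy = x */
    return n;
}

/* The backward sweep just replays the recorded structure.
 * NOTE: correct for expression trees; shared subexpressions
 * would need a topological order instead of plain recursion. */
static void backward(Node *n) {
    if (n->a) { n->a->grad += n->grad * n->da; backward(n->a); }
    if (n->b) { n->b->grad += n->grad * n->db; backward(n->b); }
}

int main(void) {
    Node *x = leaf(3.0), *y = leaf(4.0);
    Node *z = mul(x, y);   /* structural cost is paid here */
    z->grad = 1.0;         /* seed dz/dz = 1 */
    backward(z);
    printf("dz/dx = %g, dz/dy = %g\n", x->grad, y->grad); /* 4 and 3 */
    return 0;
}
```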
All of this is spelled out in Griewank's book, Evaluating Derivatives.
i think you might be interested in MLIR/IREE: https://github.com/openxla/iree
Do you mean the method Theano uses? Anyway, the performance bottleneck usually lies in matrix multiplication or 2D convolutions (which can be reduced to matmul via im2col, sketched below). Compile-time autograd wouldn't save much time there.
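For reference, the im2col reduction mentioned above looks roughly like this. It is the generic, well-known trick, nothing specific to the linked repo; the sizes and names are illustrative (single channel, valid padding, stride 1):

```c
/* im2col: turn a KxK convolution over an HxW image into one matmul. */
#include <stdio.h>

#define H 4
#define W 4
#define K 3
#define OH (H - K + 1)  /* output height */
#define OW (W - K + 1)  /* output width  */

/* Unfold every KxK patch into one column: cols is (K*K) x (OH*OW). */
static void im2col(const float img[H][W], float cols[K * K][OH * OW]) {
    for (int oy = 0; oy < OH; oy++)
        for (int ox = 0; ox < OW; ox++)
            for (int ky = 0; ky < K; ky++)
                for (int kx = 0; kx < K; kx++)
                    cols[ky * K + kx][oy * OW + ox] = img[oy + ky][ox + kx];
}

int main(void) {
    float img[H][W], kernel[K * K], cols[K * K][OH * OW], out[OH * OW] = {0};
    for (int i = 0; i < H * W; i++) ((float *)img)[i] = (float)i;
    for (int i = 0; i < K * K; i++) kernel[i] = 1.0f / (K * K); /* box filter */

    im2col(img, cols);

    /* The conv is now a (1 x K*K) by (K*K x OH*OW) matrix multiply;
     * a real framework hands exactly this shape to a tuned GEMM. */
    for (int j = 0; j < OH * OW; j++)
        for (int i = 0; i < K * K; i++)
            out[j] += kernel[i] * cols[i][j];

    for (int j = 0; j < OH * OW; j++) printf("%.2f ", out[j]);
    printf("\n");
    return 0;
}
```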
Edit: I asked this before I read the design decisions. The reasoning, as far as I understand, is that for simplicity there are no in-place operations, hence accumulation is done on a new tensor.
https://github.com/sueszli/autograd.c/blob/main/src/autograd...
i wonder whether there is a more clever way to do this without sacrificing simplicity.
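A minimal sketch of what that out-of-place accumulation looks like. The tensor_* names here are hypothetical, not the actual API of the linked repo: each accumulation builds the sum in a fresh tensor and frees the old one, so nothing is ever mutated.

```c
/* Out-of-place gradient accumulation; hypothetical tensor API. */
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    float *data;
    size_t len;
} Tensor;

static Tensor *tensor_new(size_t len) {
    Tensor *t = malloc(sizeof *t);
    t->len = len;
    t->data = calloc(len, sizeof *t->data);  /* zero-initialized */
    return t;
}

static void tensor_free(Tensor *t) { free(t->data); free(t); }

/* Out-of-place add: returns a brand-new tensor, inputs untouched. */
static Tensor *tensor_add(const Tensor *a, const Tensor *b) {
    Tensor *out = tensor_new(a->len);
    for (size_t i = 0; i < a->len; i++)
        out->data[i] = a->data[i] + b->data[i];
    return out;
}

/* Accumulation without mutation: build the sum, drop the old buffer. */
static Tensor *accumulate_grad(Tensor *grad, const Tensor *delta) {
    Tensor *next = tensor_add(grad, delta);
    tensor_free(grad);  /* the old gradient is dead after the swap */
    return next;
}

int main(void) {
    Tensor *grad = tensor_new(4);   /* gradient starts at zero */
    Tensor *delta = tensor_new(4);
    for (size_t i = 0; i < delta->len; i++) delta->data[i] = 1.0f;

    grad = accumulate_grad(grad, delta);  /* one alloc + one free per step */
    grad = accumulate_grad(grad, delta);

    printf("grad[0] = %g\n", grad->data[0]);  /* prints 2 */
    tensor_free(grad);
    tensor_free(delta);
    return 0;
}
```

the upside is that tensors stay immutable, which keeps the graph easy to reason about; the obvious cost is one allocation per accumulation. an arena allocator or a double-buffered gradient slot might win the allocation back without giving up the pure add, though either complicates the simple story a bit.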
sueszli•1mo ago
if you are interested in the technical details, the design specs are here: https://github.com/sueszli/autograd.c/blob/main/docs/design....
if you are working on similar mlsys or compiler-style projects and think there could be overlap, please reach out: https://sueszli.github.io/