If that's your bar for quality, I'll think less of your code. I can't help it.
Also your saxpy example seems to be daxpy. s and d are short for single or double precision.
If you use diagrams like this, at least ensure they are accurately conveying the right understanding.
And in general, listen to the person I'm responding to -- be really deliberate with your graphics or omit. Most AI-generated diagrams are crap.
acc_10000•2d ago
ironkernel lets you write element-wise expressions with a Python decorator, compiles them to a Rust expression tree at definition time, and executes via rayon on all cores. ~2k lines of Rust, ~500 lines of Python.
The win is expression fusion: NumPy evaluates `where(x > 0, sqrt(abs(x)) + sin(x), 0)` as 5 passes with 4 temporaries. ironkernel fuses into 1 pass, zero temporaries, and skips dead branches (no NaN from sqrt of negatives). 2.25x NumPy on compound expressions at 10M elements. For BLAS ops like SAXPY, NumPy is faster — ironkernel doesn't call BLAS.
Early stage: f64 only, 1-D only, expression subset only (intentional — parallel safety guarantee). Numba warm is 3.2x faster (LLVM JIT vs interpreter).