Then again, maybe even that is wrong! "Notation as a tool for thought" and all that. Maybe "dimension-munging" in APL really is the best way to do these things, once you really understand it.
Anyway, the general problem of having an n-dimensional array and wanting to dynamically… I dunno, it is a little tricky. But, sometimes when I see the examples people pop up with, I wonder how much pressure could be relieved if we just had a nice way of expressing operations on block or partitioned matrices. Like the canonical annoying example of wanting to apply solve using a series of small-ish matrices on a series of vectors, that’s just a block diagonal matrix…
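To make that concrete, here is a small sketch (with made-up shapes; `block_diag` is from SciPy, everything else is standard NumPy) showing that a batched solve over a stack of small matrices is the same computation as one solve against a block-diagonal matrix:

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(0)
n, k = 4, 3                      # n small systems, each k x k
As = rng.normal(size=(n, k, k))  # stack of small matrices
xs = rng.normal(size=(n, k))     # stack of right-hand sides

# Batched solve: NumPy broadcasts over the leading axis.
batched = np.linalg.solve(As, xs[..., None])[..., 0]

# The same thing phrased as one block-diagonal system.
big_A = block_diag(*As)                  # (n*k, n*k)
big_x = xs.reshape(n * k)
blockwise = np.linalg.solve(big_A, big_x).reshape(n, k)

assert np.allclose(batched, blockwise)
```

The batched form is what NumPy actually runs efficiently; the block-diagonal form is arguably the clearer mental model.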
For example, I just cut-and-pasted the author's own cri de coeur into Claude: https://claude.ai/share/1d750315-bffa-434b-a7e8-fb4d739ac89a Presumably at least one of the vectorized versions it replied with will work, although none is identical to the author's version.
When this cycle ends, high-level programs and functions will be as incomprehensible to most mainstream developers as assembly is today. Today's specs are tomorrow's programs.
Not a bad thing, really. And overdue, as the article makes all too clear. But the transition will be a dizzying one, with plenty of collateral disruption along the way.
For transposes, np.einsum can be easier to read, as it lets you use (single-character, admittedly) axis specifiers to "name" them.
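For example, reversing the axes of a 3-D array with einsum reads as "send axis i,j,k to k,j,i":

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)

# Each letter names an axis; the arrow spells out the permutation.
y = np.einsum('ijk->kji', x)

assert y.shape == (4, 3, 2)
assert np.array_equal(y, x.transpose(2, 1, 0))
```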
I don’t use numpy often enough, but this explains many of the WTF moments: why it’s so annoying to get numpy pieces to work together.
https://news.ycombinator.com/item?id=44072775
https://news.ycombinator.com/item?id=44063553
NumPy definitely has some rough edges, I sympathize with the frustrations for sure.
Mathematical operations aren't obliged to respect the Zen of Python, but I like the thought that we could make it so that most have an obvious expression.
Also I both understand the need for declaring the matrices up front and think it's a bit of a shame that it is not seamless.
Here are some (ill-advised?) alternatives:

    X, Y, Z = dp.Slots()

    with dp.Slot() as X:
        ...

    from dp import Slot as new
    X = new['i','j'] = ...

    X = dp['i','j'] = ...
I like your alternatives! I agree that having to write
    X = dp.Slot()

before assigning to X is unfortunate. I settled on the current syntax mostly just because I thought it was the choice that made it most "obvious" what was happening under the hood. If you really want to, you could use the walrus operator and write

    (X := dp.Slot())['i','j'] = ...

but this cure seems worse than the disease... Actually, doing something like

    X = new['i','j'] = ...

could simplify the implementation. Currently, a Slot has to act both like an Array and a Slot, which is slightly awkward. If new is a special object, it could just return an Array, which would be a bit nicer.

[1]: https://numpy.org/doc/stable/reference/generated/numpy.einsu...
The code could be just:

    AiX = np.linalg.solve(A, X[:, np.newaxis, :, np.newaxis]).squeeze()
    Z = np.vecdot(Y, AiX)
Or if you don't like NumPy 2.0:

    AiX = np.linalg.solve(A, X[:, np.newaxis, :])
    Z = np.einsum("jk,ijk->ij", Y, AiX)

The NumPy 1.x code is very intuitive: you have A[i, j, n] and X[i, n], and you want to use the same X[i] for all A[i, j], so you just add an axis in the middle. Broadcasting, which the author very strongly refuses to understand, deals with all the messy parts.

Alternatively, if you hate the `[:, np.newaxis, :]` syntax, you may do:
    I, J, K, _ = A.shape
    AiX = np.linalg.solve(A, X.reshape(I, 1, K, 1)).squeeze()
    Z = np.vecdot(Y, AiX)

I can read this faster than nested for-loops. YMMV.

Avoiding the special-purpose tools designed by and used by the people who work on these problems every day is the instinct of someone who has just started needing them and wants to solve new kinds of problems with the tools they already know. But the people with experience don't do it that way (for a reason); you need to learn what the current best solution is before you can transcend it.
Guys, every few months some well-meaning person posts their great new library here that "fixes" something to remove all that pesky complexity. Invariably it's made by some beginner with whatever it is who doesn't understand why the complexity exists, and we never hear of them or the library again. This is absolutely one of those times.
Compare ggplot to matplotlib. They both serve the same purpose but one has an API you can actually reason about and use without needing to consult the documentation every five minutes.
> In DumPy, every time you index an array or assign to a dp.Slot, it checks that all indices have been included.
Not having to specify all indices makes for more generic implementations. Sure, the broadcasting rules could be simpler and more consistent, but in the meantime (implicit) broadcasting is what makes NumPy so powerful and flexible.
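As a small illustration of that genericity (a sketch with a made-up `standardize` helper, not anything from DumPy or NumPy itself), the same function body serves scalar stats and per-column stats with no loops or shape checks:

```python
import numpy as np

def standardize(x, mean, std):
    # No explicit loops or shape checks: broadcasting makes this work
    # whether mean/std are scalars or per-column arrays.
    return (x - mean) / std

data = np.arange(12, dtype=float).reshape(4, 3)

a = standardize(data, 0.0, 2.0)                             # scalar stats
b = standardize(data, data.mean(axis=0), data.std(axis=0))  # per-column stats
assert a.shape == b.shape == (4, 3)
```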
Also I think straight-up vmap would be cleaner IF Python did not intentionally make lambdas/FP so restricted and cumbersome, apparently for some emotional reasons.
    solve(X[i,_], Y[j,_], A[i,j,_,_]) = over i, j/+: Y * linalg_solve(A, X)
I wish I could peek at the alternative universe where Numpy just didn’t include broadcasting. Broadcasting is a sort of ridiculous idea. Trying to multiply an NxM matrix by a 1x1 matrix… should return an error, not perform some other operation totally unrelated to matrix multiplication!
Broadcasting is a sort of generalization of the idea of scalar-matrix product. You could make that less "ridiculous" by requiring a Hadamard product with a constant value matrix instead.
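In NumPy terms, the two views coincide: multiplying by a (1, 1) array is the Hadamard product with a constant-valued matrix, which is exactly the scalar-matrix product (a small demonstration, not from the article):

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)

# Broadcasting a (1, 1) array against A is elementwise (Hadamard)
# multiplication by a matrix whose entries are all the same constant...
via_broadcast = A * np.array([[2.0]])

# ...which is the same as an explicit constant-valued matrix...
via_hadamard = A * np.full((2, 3), 2.0)

# ...which is the scalar-matrix product.
assert np.array_equal(via_broadcast, 2.0 * A)
assert np.array_equal(via_broadcast, via_hadamard)
```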
This is a big improvement over numpy, but I don't see much of a compelling reason to go back to Python.
Being able to use sane programming workflows is quite a compelling reason.
My solution, very, very inelegant, was to focus on the scalar case and ask Copilot to vectorize my code. I did that just in the last two days. It is not all that easy. Copilot will give you something, but you still need to tweak it. In the end, I still had to go through the code line by line and make sure I understood what it was doing.
But the proposal is very questionable. I like loops instead of broadcasting (apply_...), but making them this kind of black magic just adds an equal amount of mental load.
As for compiled languages - they do optimize for SIMD, if you're OK with running on CPU. Optimizing for CUDA and GPU parallelization... that's quite a niche demand, I think, to justify inconveniencing the more common users.
> So, I removed it. In DumPy you can only do A*B if one of A or B is scalar or A and B have exactly the same shape. That’s it, anything else raises an error. Instead, use indices, so it’s clear what you’re doing.
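The quoted rule is easy to state as a plain-NumPy wrapper (a hypothetical `strict_mul` helper for illustration, not DumPy's actual implementation):

```python
import numpy as np

def strict_mul(a, b):
    """Elementwise product under the quoted rule: allowed only when one
    operand is a scalar or both shapes match exactly. Hypothetical
    sketch, not DumPy's actual code."""
    a, b = np.asarray(a), np.asarray(b)
    if a.ndim != 0 and b.ndim != 0 and a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    return a * b

x = np.ones((2, 3))
assert strict_mul(x, 2.0).shape == (2, 3)   # scalar operand: allowed
assert strict_mul(x, x).shape == (2, 3)     # identical shapes: allowed
try:
    strict_mul(x, np.ones((1, 3)))          # would silently broadcast: rejected
except ValueError:
    pass
```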
I also dislike the seemingly inconsistent adding of last axes with length 1.
I do like that existing axes of length 1 are broadcast to match the corresponding axis of the other array in the operation, and I use that all the time.
If I understand the text correctly, the author removed that as well.
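For reference, that length-1-axis behavior is the one that makes outer-product-style patterns a one-liner (a minimal demonstration in plain NumPy):

```python
import numpy as np

col = np.arange(3.0).reshape(3, 1)   # explicit length-1 trailing axis
row = np.arange(4.0).reshape(1, 4)   # explicit length-1 leading axis

# Each length-1 axis stretches to match the other operand's length.
outer = col * row
assert outer.shape == (3, 4)
assert np.array_equal(outer, np.outer(col.ravel(), row.ravel()))
```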