Is this true? My understanding is that the hard part about interpreting neural networks is that there are many, many neurons with many, many interconnections, not that the activation function itself is not explainable. Even with an explainable classifier, how do you explain trillions of them with deep layers of nested connections?
So I do think that's more interpretable in two ways:
1. You can look at specific representations in the model and "see" what they "mean"
2. This means you can give a high-level interpretation to a particular inference run: "X_i is a 7 because it's like this prototype that looks like a 7, and it has some features that only turn up in 7s"
I do think complex models doing complex tasks will sometimes have extremely complex "explanations" which may not really communicate anything to a human, and so do not function as an explanation.
Neural networks need to be overparameterized to find good solutions, meaning there is a surface of solutions. The optimization procedure tries to walk towards that surface as quickly as possible, and tends to find a low-energy point on the surface of solutions. In particular, a low-energy solution isn't sparse, and therefore isn't interpretable.
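To make the "low-energy solution isn't sparse" point concrete, here is a toy sketch. It is not a neural network: it assumes "low energy" means small L2 norm, and it uses an underdetermined linear system as a stand-in for an overparameterized model. For linear least squares, gradient descent started from zero is known to converge to the minimum-norm solution, which the pseudoinverse gives directly. All names and sizes are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: 5 equations, 20 unknowns, so a whole
# affine subspace of exact solutions exists.
n_equations, n_params = 5, 20
A = rng.standard_normal((n_equations, n_params))

# A sparse (arguably "interpretable") solution with two active parameters.
x_sparse = np.zeros(n_params)
x_sparse[[3, 11]] = [1.5, -2.0]
b = A @ x_sparse

# Minimum L2-norm solution of A x = b, via the pseudoinverse.
x_min_norm = np.linalg.pinv(A) @ b

def count_nonzero(x, tol=1e-8):
    """Number of entries whose magnitude exceeds tol."""
    return int(np.sum(np.abs(x) > tol))

print("sparse solution:   nonzeros =", count_nonzero(x_sparse),
      " norm =", np.linalg.norm(x_sparse))
print("min-norm solution: nonzeros =", count_nonzero(x_min_norm),
      " norm =", np.linalg.norm(x_min_norm))
```

Both vectors solve the system exactly, but the lower-norm one spreads its weight across many more parameters than the sparse one. Whether this carries over to deep networks is a separate question; this only shows the linear case.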
I can see how that worked for KANs because weights and activations are the bread and butter of neural networks. Changing the activations kind of does make a distinct difference. I still think there's merit in having learnable weights and activations together, but that's not very Kolmogorov-Arnold theorem, so activations only seemed like a decent starting point (but I digress).
This new thing seems more like just switching out one bit of the toolkit for another. There are any number of ways to measure how a bunch of values are like another bunch of values. Cosine similarity, despite sounding all intellectual, is just a dot product wearing a lab coat and glasses. I assume it is easily acknowledged as not the best metric, but it really can't be beaten for performance if you have a lot of multiply units lying around.
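For anyone who wants the lab coat removed: a minimal sketch showing that cosine similarity is the dot product of the two vectors after each is scaled to unit length. The vectors and the `cosine_similarity` helper are made up for illustration.

```python
import numpy as np

def cosine_similarity(a, b):
    """Dot product divided by the product of the two L2 norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical example vectors.
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.5])

# Normalize first, then take a plain dot product.
a_hat = a / np.linalg.norm(a)
b_hat = b / np.linalg.norm(b)
unit_dot = float(np.dot(a_hat, b_hat))

print("cosine similarity:       ", cosine_similarity(a, b))
print("dot of normalized inputs:", unit_dot)
```

The two printed numbers agree up to floating-point error. If embeddings are normalized once up front, every later comparison is just multiplies and adds, which is the performance point above.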
It would be worth combining this research with the efforts on translating one embedding model to another. Transferring between metrics might allow you to pick the most appropriate one at specific times.
You should have called it the Amos-Tversky Network, abbreviated ATN. An extra letter instantly increases the value of the algorithm by three orders of magnitude, at least. What, you think KAN was an accident? Amateurs.
Now you just sound like you're desperately trying to piggy-back on an existing buzzword, which has the same feel as "from the producer of Avatar" does.
Everybody knows a catchy name is more important than the technology itself. The catchy title creates citations, and citations create traction. And good luck getting cited with a two-letter acronym. Everybody knows it's the network effect that drives adoption, not quality; just look at MS Windows.
What. You think anyone gave a rat's ass about nanotechnology back when it was still just called "chemistry"?
/s
heyitsguay•5mo ago
throwawaymaths•5mo ago
no sense spending large amounts of compute on algorithms for new math unless you can prove it can crawl.
heyitsguay•5mo ago
It's also a more natural question to ask, since building projections on top of frozen foundation model embeddings is both common in an absolute sense, and much more common, relatively, than building projections off of tiny frozen networks like a ResNet-50.