Tversky Neural Networks

https://gonzoml.substack.com/p/tversky-neural-networks

131•che_shr_cat•5mo ago

Comments

heyitsguay•5mo ago

Seems cool, but the image classification model benchmark choice is kinda weak given all the fun tools we have now. I wonder how Tversky probes do on top of DINOv3 for building a classifier for some task.

throwawaymaths•5mo ago

crawl walk run.

no sense spending large amounts of compute on algorithms for new math unless you can prove it can crawl.

heyitsguay•5mo ago

It's the same amount of effort benchmarking, just a better choice of backbone that enables better choices of benchmark tasks. If the claim is that a Tversky projection layer beats a linear projection layer today, then one can test whether that's true with foundation embedding models today.

It's also a more natural question to ask, since building projections on top of frozen foundation model embeddings is both common in an absolute sense, and much more common, relatively, than building projections off of tiny frozen networks like a ResNet-50.

dkdcio•5mo ago

> Another useful property of the model is interpretability.

Is this true? My understanding is that the hard part about interpreting neural networks is that there are many, many neurons with many, many interconnections, not that the activation function itself is unexplainable. Even with an explainable classifier, how do you explain trillions of them with deep layers of nested connections?

bobmarleybiceps•5mo ago

I've decided 100% of papers saying their modification of a neural network is interpretable are exaggerating.

tpoacher•5mo ago

Personally, I'm looking forward to MNNs: Mansplainable Neural Networks.

abeppu•5mo ago

I think the case for interpretability could have been made better, but look at the middle "prototype" rows in Figure 3 for the traditional vs Tversky layers. If you scroll so you can't see the rows above, you could mostly pick out which Tversky prototype corresponds to each digit, but not which traditional/linear prototype corresponds to each digit.

So I do think that's more interpretable in two ways:

1. You can look at specific representations in the model and "see" what they "mean"

2. This means you can give a high-level interpretation to a particular inference run: "X_i is a 7 because it's like this prototype that looks like a 7, and it has some features that only turn up in 7s"

I do think complex models doing complex tasks will sometimes have extremely complex "explanations" which may not really communicate anything to a human, and so do not function as an explanation.
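To make the "it's like this prototype, and it has some features that only turn up in 7s" reading concrete, here is a minimal sketch of Tversky's classical set-based contrast model, which scores common features positively and distinctive features negatively. It is not the paper's differentiable layer: the feature names, the prototypes, and the weights `theta`, `alpha`, `beta` are all hypothetical, and feature salience is assumed to be a plain count.

```python
def tversky_contrast(a, b, theta=1.0, alpha=0.5, beta=0.5):
    """Set-based contrast model: reward shared features, penalize distinctive ones.

    a, b  -- sets of feature labels (hypothetical, hand-written here)
    theta -- weight on features common to a and b
    alpha -- weight on features in a but not in b
    beta  -- weight on features in b but not in a
    Salience of a feature set is assumed to be its size.
    """
    common = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    return theta * common - alpha * only_a - beta * only_b


# Hypothetical prototypes, one per class, described by named features.
prototypes = {
    "7": {"horizontal_top_bar", "diagonal_stroke"},
    "1": {"vertical_stroke"},
}

# Hypothetical input: a 7 written with a crossbar.
x = {"horizontal_top_bar", "diagonal_stroke", "crossbar"}

scores = {label: tversky_contrast(x, proto) for label, proto in prototypes.items()}
best = max(scores, key=scores.get)

# The explanation is read straight off the set operations.
shared_with_best = x & prototypes[best]
```

Here `shared_with_best` is the human-readable part: the input is assigned to the class whose prototype it shares the most features with, and those shared features can be listed by name.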

sdenton4•5mo ago

It's wishful thinking.

Neural networks need to be overparameterized to find good solutions, meaning there is a surface of solutions. The optimization procedure tries to walk towards that surface as quickly as possible, and tends to find a low-energy point on the surface of solutions. In particular, a low-energy solution isn't sparse, and therefore isn't interpretable.

c32c33429009ed6•5mo ago

Interesting; can you provide some references for this way of thinking?

Lerc•5mo ago

It seems a bit much to stick a Proper Noun in front of Neural Networks and call it a new paradigm.

I can see how that worked for KANs because weights and activations are the bread and butter of neural networks. Changing the activations kind of does make a distinct difference. I still think there's merit in having learnable weights and activations together, but that's not very Kolmogorov-Arnold theorem, so activations only seemed like a decent starting point (but I digress).

This new thing seems more like just switching out one bit of the toolkit for another. There are any number of ways to measure how a bunch of values are like another bunch of values. Cosine similarity, despite sounding all intellectual, is just a dot product wearing a lab coat and glasses. I assume it is easily acknowledged as not the best metric, but it really can't be beat for performance if you have a lot of multiply units lying around.

It would be worth combining this research with the efforts on translating one embedding model to another. Transferring between metrics might allow you to pick the most appropriate one at specific times.
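The "dot product wearing a lab coat" remark can be checked numerically. This is a small pure-Python sketch with made-up vectors: cosine similarity computed from its definition matches a plain dot product of the same vectors after scaling each to unit length.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def normalize(u):
    """Scale u to unit Euclidean length (assumes u is not the zero vector)."""
    norm = math.sqrt(dot(u, u))
    return [a / norm for a in u]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v, from the usual definition."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


# Arbitrary example vectors.
u = [1.0, 2.0, 3.0]
v = [4.0, 5.0, 6.0]

cos_direct = cosine_similarity(u, v)
cos_via_dot = dot(normalize(u), normalize(v))

# The two values agree up to floating-point rounding.
same = math.isclose(cos_direct, cos_via_dot)
```

If embeddings are normalized once up front, every later comparison is a bare multiply-accumulate, which is the performance point made above.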

roger_•5mo ago

Interesting, can this be applied to regression?

tpoacher•5mo ago

Fools. Everybody knows a TLA (three-letter acronym) is instantly more marketable than a two-letter one (also abbreviated TLA, but we don't talk about Bruno and all that jazz).

You should have called it the Amos-Tversky Network, abbreviated ATN. An extra letter instantly increases the value of the algorithm by three orders of magnitude, at least. What, you think KAN was an accident? Amateurs.

Now you just sound like you're desperately trying to piggy-back on an existing buzzword, which has the same feel as "from the producer of Avatar" does.

Everybody knows a catchy name is more important than the technology itself. The catchy title creates citations, and citations create traction. And good luck getting cited with a two-letter acronym. Everybody knows it's the network effect that drives adoption, not quality; just look at MS Windows.

What. You think anyone gave a rat's ass about nanotechnology back when it was still just called "chemistry"?
