Upshot: Gaussian sampling of node parameters rather than fixed values (rough sketch of my reading just after this list). This might offer one of the following:
* Better inference time accuracy on average
* Faster convergence during training
It probably costs additional inference and training compute.
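For concreteness, here's a minimal sketch of what I'm assuming the idea is: each weight gets drawn from its own learned Gaussian on every forward pass instead of being one fixed scalar. The class name and all details are mine, not the paper's.

```python
import torch
import torch.nn as nn

# Minimal sketch of my reading (not the paper's code): each weight is drawn
# from its own learned N(mu, sigma^2) on every forward pass via the
# reparameterization trick, rather than being one fixed scalar.
class GaussianSampledLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.mu = nn.Parameter(0.01 * torch.randn(out_features, in_features))
        self.log_sigma = nn.Parameter(torch.full((out_features, in_features), -3.0))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        eps = torch.randn_like(self.mu)                # fresh noise every call
        w = self.mu + torch.exp(self.log_sigma) * eps  # w ~ N(mu, sigma^2)
        return x @ w.t() + self.bias
```

If that reading is right, the extra cost is built in: every forward pass pays for the noise draw and the extra multiply-add per weight.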
The paper demonstrates worse results on MNIST, and shows the architecture is more than capable of handling the Iris test (which I hadn’t heard of; categorizing types of irises, I presume the flower, but maybe the eye?).
The paper claims to keep the number of parameters and depth the same, but it doesn’t report on:
* training time/FLOPs (probably more, I’d guess?)
* inference time/FLOPs (almost certainly more)
Intuitively, if each parameter carries a mean, a variance, and a mixing coefficient, then you have triple the data space per parameter. There’s no word on whether the networks were normalized by the total data taken up by the NN or just by the number of “parameters”.
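Back-of-the-envelope on that point, assuming each weight is replaced by a K-component Gaussian mixture with a mean, variance, and mixing coefficient per component (my framing, not the paper's):

```python
# Toy arithmetic, assuming each weight becomes a K-component Gaussian mixture
# with (mean, variance, mixing coefficient) per component. Even K=1 triples
# the storage relative to a plain fixed weight.
def storage_floats(n_weights: int, k_components: int = 1) -> int:
    return n_weights * 3 * k_components  # mu, sigma, pi per component

n = 784 * 128                             # one MNIST-sized dense layer
print(n)                                  # 100352 floats if weights are fixed
print(storage_floats(n, k_components=1))  # 301056 floats with the mixture
```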
Upshot: I don’t think this paper demonstrates any sort of benefit here or elucidates the tradeoffs.
Quick reminder: negative results are good, too. I’d almost rather see the paper framed that way.