Anyone with a billion dollars want to try this and report back?
nullc•23m ago
From the paper it appears that it's probably more useful on small-ish models.
aetherspawn•36m ago
It makes sense to me that distributing information across more parameters results in models that can be quantized more heavily (information theory: more bits available).
I wonder if anyone has figured out how the information is compressed, and calculated how much information an LLM can hold as a function of its size.
lwansbrough•38m ago
nullc•23m ago