Anyone with a billion dollars want to try this and report back?
nullc•4m ago
From the paper it appears that it's probably more useful on small-ish models.
aetherspawn•17m ago
It makes sense to me that distributing information across more parameters results in models that can be quantized more heavily (information theory: more bits available).
I wonder if anyone has figured out how the information is compressed, and calculated how much information an LLM can hold depending on its size.
lwansbrough•18m ago
nullc•4m ago