Why is it currently not feasible to just keep those in flash memory (fast PCIe SSD RAID or somesuch) and only use RAM for intermediate values/results?
Even modest success on this front seems very attractive to me, because flash storage appears much cheaper and easier to scale than GPU memory right now.
Are there any efforts in this direction? Is this a flawed approach for some reason, or am I fundamentally misunderstanding things?
sunscream89•2h ago