driaforall•7m ago
We fuse pipelined-ring parallelism, disk streaming and UMA-aware scheduling so “out of memory” stops being the limit.
https://github.com/firstbatchxyz/dnet?tab=readme-ov-file
In the alpha release, we ship a pipelined-ring strategy inspired by PRIMA.CPP. dnet's solver (distilp) extends it so devices can run models larger than their memory: layers stream from disk mid-round while compute is still running, so total model size can exceed total cluster RAM.
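To make the "stream from disk while computing" idea concrete, here is a rough sketch of the general prefetch pattern. It is not dnet's or distilp's actual code: `run_streamed`, `load_layer`, and `apply_layer` are hypothetical names, and the "layers" are toy scalars standing in for weights read from disk.

```python
from concurrent.futures import ThreadPoolExecutor


def run_streamed(layer_ids, load_layer, apply_layer, x):
    """Apply layers in order while prefetching the next one in the background.

    At most two layers are resident at a time: the one being computed
    and the one being loaded. (Hypothetical helper, for illustration only.)
    """
    if not layer_ids:
        return x
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(load_layer, layer_ids[0])
        for i in range(len(layer_ids)):
            weights = pending.result()  # wait until the current layer is loaded
            if i + 1 < len(layer_ids):
                # Start loading the next layer before computing this one,
                # so the disk read overlaps with the compute below.
                pending = pool.submit(load_layer, layer_ids[i + 1])
            x = apply_layer(weights, x)
    return x


# Toy stand-ins (assumptions): a real loader would read weights from disk.
def load_layer(i):
    return i + 1  # pretend the "weights" of layer i are the scalar i + 1


def apply_layer(w, x):
    return x * w


result = run_streamed([0, 1, 2, 3], load_layer, apply_layer, 1.0)
```

With a single background worker, the load for layer i+1 is in flight while layer i computes, which is why the working set can stay well below the full model size.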
Please let us know if you have any questions or feedback!