I looked into this because part of our pipeline is forced to be chunked. Most advice I've seen boils down to "more contiguity = better", but without numbers, or at least not generalizable ones.
My concrete tasks already reach peak performance below 128 kB, and I couldn't find pure processing workloads that benefit significantly beyond a 1 MB chunk size. The code is linked in the post; it would be nice to see results on more systems.
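If you just want a quick feel for this kind of measurement, here is a minimal sketch. It is not the benchmark linked in the post: the function names, the CRC32 stand-in workload, the 64 MiB input, and the list of chunk sizes are all assumptions picked for illustration. Python also adds per-chunk interpreter overhead that a native implementation would not have, so treat the absolute numbers as rough.

```python
import time
import zlib


def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Slice `data` into consecutive chunks of at most `chunk_size` bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def process_chunks(chunks: list[bytes]) -> int:
    """Stand-in workload (an assumption): a running CRC32 over all chunks, in order."""
    crc = 0
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)
    return crc


def throughput_gb_per_s(chunks: list[bytes], repeats: int = 5) -> float:
    """Best-of-`repeats` throughput of process_chunks, in GB/s."""
    total_bytes = sum(len(c) for c in chunks)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        process_chunks(chunks)
        best = min(best, time.perf_counter() - start)
    return total_bytes / best / 1e9


if __name__ == "__main__":
    # 64 MiB of input; size chosen arbitrarily for the sketch.
    data = bytes(range(256)) * (64 * 1024 * 1024 // 256)
    for chunk_size in (4 << 10, 32 << 10, 128 << 10, 1 << 20, 16 << 20):
        chunks = split_into_chunks(data, chunk_size)
        print(f"{chunk_size >> 10:>6} KiB chunks: {throughput_gb_per_s(chunks):.2f} GB/s")
```

The checksum is the same for every chunk size, so only the timing changes between runs. Swapping `process_chunks` for your own workload is the interesting part, since where the curve flattens depends on how much work is done per byte.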
PhilipTrettner•1h ago