A more interesting example might be one where each slot in the target array holds the next index to visit in addition to the value; that introduces a dependency chain which prevents the CPU from overlapping the loads.
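For concreteness, a minimal pointer-chasing sketch in C++ (the Slot layout and names here are hypothetical, not from the article):

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Hypothetical slot layout: a payload plus the index of the next slot to visit.
    struct Slot {
        uint32_t next;
        uint32_t value;
    };

    // The address of load n+1 comes out of load n, so the CPU cannot
    // overlap the misses; each access pays the full memory latency.
    uint64_t chase(const std::vector<Slot>& slots, uint32_t start, std::size_t steps) {
        uint64_t sum = 0;
        uint32_t i = start;
        for (std::size_t n = 0; n < steps; ++n) {
            sum += slots[i].value;
            i = slots[i].next;  // dependency: the next address comes from this load
        }
        return sum;
    }

Filling `next` with a random single-cycle permutation keeps the walk from revisiting slots early.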
Another thing I noticed is that the spike on the left-hand side of his graphs is the overhead of file access.
Without that overhead, random access over small arrays should have a much better per-element cost.
However, on some processors there's a data-dependent prefetcher that will notice the pointer-like value and start prefetching that address before the CPU requests it.
https://www.forrestthewoods.com/blog/memory-bandwidth-napkin...
I’m not sure I agree with the data presentation format. “time per element” doesn’t seem like the right metric.
Using something like the overall run length would have such large variations that only the shape of the graph would be particularly useful (to me), not so much the values themselves.
If I were showing a chart like this to "leadership" I'd show the overall run length, since I'd care more about them grasping the "real world" impact than the per-unit impact. But this is written for engineers, so for a blog like this I'd expect the focus to be on per-unit costs.
However, having said all that, I'd love to hear what your reservations are about using it as a metric.
> Random access from the cache is remarkably quick. It's comparable to sequential RAM performance
That's actually expected once you think about it; it's a natural consequence of prefetching: the prefetcher streams sequential reads in ahead of use, so sequential RAM ends up nearly as fast as cache hits.
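A rough sketch of how you could check that yourself, with sizes chosen by assumption (64 KiB as "fits in cache", 256 MiB as "lives in RAM"):

    #include <algorithm>
    #include <chrono>
    #include <cstdint>
    #include <cstdio>
    #include <numeric>
    #include <random>
    #include <vector>

    // Sum data[indices[i]] in the given order; return ns per element.
    double ns_per_element(const std::vector<uint32_t>& data,
                          const std::vector<uint32_t>& indices) {
        auto t0 = std::chrono::steady_clock::now();
        uint64_t sum = 0;
        for (uint32_t i : indices) sum += data[i];
        auto t1 = std::chrono::steady_clock::now();
        volatile uint64_t sink = sum;  // keep the loop from being optimized away
        (void)sink;
        return std::chrono::duration<double, std::nano>(t1 - t0).count() /
               indices.size();
    }

    int main() {
        std::mt19937 rng(42);

        // Random order over a 64 KiB array: every access should hit cache.
        std::vector<uint32_t> small(16 * 1024, 1);
        std::vector<uint32_t> rand_idx(small.size());
        std::iota(rand_idx.begin(), rand_idx.end(), 0u);
        std::shuffle(rand_idx.begin(), rand_idx.end(), rng);

        // Sequential order over a 256 MiB array: streams from RAM, with
        // the hardware prefetcher hiding most of the latency.
        std::vector<uint32_t> big(64 * 1024 * 1024, 1);
        std::vector<uint32_t> seq_idx(big.size());
        std::iota(seq_idx.begin(), seq_idx.end(), 0u);

        ns_per_element(small, rand_idx);  // warm-up: pull the small array into cache
        std::printf("random in cache: %.2f ns/elem\n", ns_per_element(small, rand_idx));
        std::printf("sequential RAM:  %.2f ns/elem\n", ns_per_element(big, seq_idx));
    }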
Presumably if you split the elements into 16 shards (one per CPU), summed with 16 threads, and then combined the partial sums at the end, random would be faster than sorted?
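Roughly this, assuming plain std::thread and that `data`/`idx` match the article's setup (the names are mine, not the author's):

    #include <cstddef>
    #include <cstdint>
    #include <thread>
    #include <vector>

    // Sum data[idx[i]] over all i: each thread takes one contiguous shard
    // of the index array, and the partial sums are combined at the end.
    uint64_t sharded_sum(const std::vector<uint32_t>& data,
                         const std::vector<uint32_t>& idx,
                         unsigned n_threads = 16) {
        std::vector<uint64_t> partial(n_threads, 0);
        std::vector<std::thread> workers;
        const std::size_t shard = idx.size() / n_threads;

        for (unsigned t = 0; t < n_threads; ++t) {
            const std::size_t begin = t * shard;
            const std::size_t end = (t + 1 == n_threads) ? idx.size() : begin + shard;
            workers.emplace_back([&, t, begin, end] {
                uint64_t s = 0;
                for (std::size_t i = begin; i < end; ++i) s += data[idx[i]];
                partial[t] = s;  // each thread writes only its own slot
            });
        }
        for (auto& w : workers) w.join();

        uint64_t total = 0;
        for (uint64_t p : partial) total += p;
        return total;
    }

Each thread writes only its own `partial` slot, so there's no synchronization on the hot path.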