Judging from the access-pattern results, the biggest architectural difference seems to be prefetch behavior.
M4 appears to use a much more aggressive prefetcher that strongly assumes forward-linear access. It delivers very high bandwidth when that assumption holds (114.8 GB/s sequential forward read), but performance drops sharply once the pattern breaks (sequential reverse reads lose 38.4%).
M1 is more conservative, with lower peak bandwidth but more consistent performance across access directions and patterns; its reverse reads are even 6.5% faster than forward. M4 is still faster in absolute terms on every pattern tested.
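One way to check the direction-sensitivity hypothesis independently is to time a forward and a reverse streaming read over the same buffer. The sketch below is a minimal single-threaded version under stated assumptions: it is not the tool's code, the function names (`make_buffer`, `sum_forward`, `sum_reverse`, `read_gbps`) are hypothetical, and a single thread will not reach the multi-threaded figures reported here.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: allocates n 64-bit words and writes each one once,
   so page faults happen here rather than inside the timed pass. */
uint64_t *make_buffer(size_t n) {
    uint64_t *buf = malloc(n * sizeof *buf);
    if (buf)
        for (size_t i = 0; i < n; i++) buf[i] = i;
    return buf;
}

/* Sum every word front-to-back; the sum keeps the loads from being optimized out. */
uint64_t sum_forward(const uint64_t *buf, size_t n) {
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++) s += buf[i];
    return s;
}

/* Same work, back-to-front. */
uint64_t sum_reverse(const uint64_t *buf, size_t n) {
    uint64_t s = 0;
    for (size_t i = n; i-- > 0;) s += buf[i];
    return s;
}

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Times one pass in the chosen direction and returns decimal GB/s (1e9 bytes/s).
   The sum goes to *sink so the compiler cannot discard the pass. */
double read_gbps(const uint64_t *buf, size_t n, int reverse, uint64_t *sink) {
    assert(n > 0);
    double t0 = now_sec();
    *sink = reverse ? sum_reverse(buf, n) : sum_forward(buf, n);
    double dt = now_sec() - t0;
    return (double)(n * sizeof *buf) / 1e9 / dt;
}
```

Calling `read_gbps(buf, n, 0, &sink)` and `read_gbps(buf, n, 1, &sink)` on a buffer much larger than the on-chip caches (a few hundred MiB, say) and comparing the two results should indicate whether reverse streaming is penalized on a given chip.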

MacBook Air M1 Bandwidth & Latency:
Main Memory Bandwidth Tests (multi-threaded, 8 threads):
Read : 58.63377 GB/s (Total time: 9.15634 s)
Write: 33.19277 GB/s (Total time: 16.17433 s)
Copy : 58.52319 GB/s (Total time: 18.34729 s)
Main Memory Latency Test (single-threaded, pointer chase): Total time: 21.11410 s
Average latency: 105.57 ns
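The reported figures can be cross-checked with simple arithmetic. Rate times total time gives the data each bandwidth test moved: about 536.87 GB for both read and write (very close to 500 GiB), and about twice that for copy, which suggests the copy figure counts bytes read plus bytes written. Total time divided by average latency implies roughly 2×10^8 dependent loads in the pointer chase. The M4 numbers give the same volumes and hop count. These are inferences from the printed numbers, not documented behavior of the tool, and the helper names below are hypothetical.

```c
#include <assert.h>

/* Data volume implied by a reported rate and its total run time, in GB (1e9 bytes). */
double implied_gb(double gb_per_s, double seconds) {
    assert(gb_per_s > 0.0 && seconds > 0.0);
    return gb_per_s * seconds;
}

/* Number of dependent loads implied by a total run time (s) and an
   average per-load latency (ns). */
double implied_hops(double total_s, double avg_latency_ns) {
    assert(total_s > 0.0 && avg_latency_ns > 0.0);
    return total_s / (avg_latency_ns * 1e-9);
}
```

With the M1 figures, `implied_gb(58.63377, 9.15634)` and `implied_gb(33.19277, 16.17433)` both come to about 536.87, `implied_gb(58.52319, 18.34729)` to about 1073.74, and `implied_hops(21.11410, 105.57)` to about 2.0e8.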

MacBook Air M1 Patterns (% change vs. Sequential Forward):
Sequential Forward:
Read : 58.573 GB/s
Write: 33.202 GB/s
Copy : 58.712 GB/s
Sequential Reverse:
Read : 62.390 GB/s (+6.5%)
Write: 33.116 GB/s (-0.3%)
Copy : 59.786 GB/s (+1.8%)
Strided (Cache Line - 64B):
Read : 31.476 GB/s (-46.3%)
Write: 16.584 GB/s (-50.1%)
Copy : 38.262 GB/s (-34.8%)
Strided (Page - 4096B):
Read : 6.288 GB/s (-89.3%)
Write: 14.165 GB/s (-57.3%)
Copy : 8.610 GB/s (-85.3%)
Random Uniform:
Read : 5.282 GB/s (-91.0%)
Write: 15.219 GB/s (-54.2%)
Copy : 8.259 GB/s (-85.9%)
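The post does not say exactly how the strided and random patterns are generated, so the following is only one plausible construction, with hypothetical function names and an assumed xorshift64 generator. The strided kernel visits every word exactly once, so its byte count matches the sequential test, but consecutive loads are `stride_bytes` apart. The random kernel issues independent loads at pseudo-random positions, so, unlike a pointer chase, the core can still overlap many misses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Strided read: visits every 64-bit word exactly once, but consecutive loads
   are stride_bytes apart (64 = next cache line, 4096 = next 4 KiB page).
   Touching every word keeps the byte count equal to the sequential test. */
uint64_t sum_strided(const uint64_t *buf, size_t n, size_t stride_bytes) {
    size_t step = stride_bytes / sizeof *buf;
    assert(step > 0);
    uint64_t s = 0;
    for (size_t start = 0; start < step; start++)
        for (size_t i = start; i < n; i += step) s += buf[i];
    return s;
}

/* xorshift64: a small, fast PRNG; assumed here, not the tool's generator. */
static uint64_t xorshift64(uint64_t *state) {
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

/* Random uniform read: n independent loads at pseudo-random word positions.
   The modulo adds a slight bias, which does not matter for a bandwidth test. */
uint64_t sum_random(const uint64_t *buf, size_t n, uint64_t seed) {
    uint64_t state = seed ? seed : 0x9E3779B97F4A7C15u; /* state must be nonzero */
    uint64_t s = 0;
    for (size_t k = 0; k < n; k++) s += buf[xorshift64(&state) % n];
    return s;
}
```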

Mac mini M4 Bandwidth & Latency:
Main Memory Bandwidth Tests (multi-threaded, 10 threads):
Read : 114.74709 GB/s (Total time: 4.67873 s)
Write: 68.55840 GB/s (Total time: 7.83086 s)
Copy : 105.60609 GB/s (Total time: 10.16742 s)
Main Memory Latency Test (single-threaded, pointer chase): Total time: 19.47850 s
Average latency: 97.39 ns

Mac mini M4 Patterns (% change vs. Sequential Forward):
Sequential Forward:
Read : 114.754 GB/s
Write: 65.959 GB/s
Copy : 105.532 GB/s
Sequential Reverse:
Read : 70.724 GB/s (-38.4%)
Write: 37.437 GB/s (-43.2%)
Copy : 88.743 GB/s (-15.9%)
Strided (Cache Line - 64B):
Read : 35.369 GB/s (-69.2%)
Write: 18.655 GB/s (-71.7%)
Copy : 52.914 GB/s (-49.9%)
Strided (Page - 4096B):
Read : 9.072 GB/s (-92.1%)
Write: 24.492 GB/s (-62.9%)
Copy : 12.979 GB/s (-87.7%)
Random Uniform:
Read : 6.342 GB/s (-94.5%)
Write: 21.752 GB/s (-67.0%)
Copy : 11.142 GB/s (-89.4%)
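Each percentage in the pattern results is the change relative to Sequential Forward for the same operation, that is (pattern − forward) / forward × 100; for example, M4 reverse read gives (70.724 − 114.754) / 114.754 ≈ −38.4%. A minimal helper for reproducing these figures (the name is hypothetical):

```c
#include <assert.h>

/* Percent change of a pattern's bandwidth relative to the Sequential Forward
   result for the same operation (read, write or copy). */
double pct_change(double pattern_gbps, double forward_gbps) {
    assert(forward_gbps > 0.0);
    return (pattern_gbps - forward_gbps) / forward_gbps * 100.0;
}
```

For instance, `pct_change(62.390, 58.573)` gives about +6.5, matching the M1 reverse-read figure.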
user_timo•1mo ago
Measures cache latency (pointer chase) and memory bandwidth (read/write/copy). ARM64-native.
Looking for feedback on methodology and on results from other M-series chips. So far I have tested it on an M4 Mac mini and an M1 MacBook Air.
Can be installed with Homebrew.
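On the methodology question: a common way to build a pointer-chase latency test is to link the buffer into one random cycle with Sattolo's algorithm, so every load depends on the previous one and the prefetcher has no address pattern to follow. Whether this tool does exactly that is not stated; the sketch below assumes it, and its names (`build_chain`, `chase_ns`), PRNG and memory layout are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* xorshift64, used here only to shuffle; the choice of PRNG is an assumption. */
static uint64_t rng_next(uint64_t *s) {
    uint64_t x = *s;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *s = x;
}

/* Sattolo's algorithm: shuffles the identity so that i -> next[i] is a single
   cycle through all n slots. Chasing it from any start visits every slot once
   before repeating, in an order the prefetcher cannot predict.
   Slots are packed one size_t apart for brevity; a more careful version would
   give each node its own cache line so no two hops share a line. */
size_t *build_chain(size_t n, uint64_t seed) {
    assert(n > 0);
    size_t *next = malloc(n * sizeof *next);
    if (!next) return NULL;
    for (size_t i = 0; i < n; i++) next[i] = i;
    uint64_t s = seed ? seed : 1; /* xorshift state must be nonzero */
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)(rng_next(&s) % i); /* j in [0, i): never i itself */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
    return next;
}

/* Follows the chain for `hops` dependent loads and returns average ns per load.
   The final index goes to *sink so the loop cannot be optimized away. */
double chase_ns(const size_t *next, size_t hops, size_t *sink) {
    assert(hops > 0);
    struct timespec t0, t1;
    size_t p = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t k = 0; k < hops; k++) p = next[p];
    clock_gettime(CLOCK_MONOTONIC, &t1);
    *sink = p;
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9 +
                (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)hops;
}
```

For a main-memory figure, the chain would need to span a buffer far larger than the last-level cache, so that nearly every hop misses on-chip caches.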