westurner•1h ago
Arrow Columnar Format > Buffer Alignment and Padding: https://arrow.apache.org/docs/format/Columnar.html :
> The recommendation for 64 byte alignment comes from the Intel performance guide that recommends alignment of memory to match SIMD register width. The specific padding length was chosen because it matches the largest SIMD instruction registers available on widely deployed x86 architecture (Intel AVX-512).
> The recommended padding of 64 bytes allows for using SIMD instructions consistently in loops without additional conditional checks. This should allow for simpler, efficient and CPU cache-friendly code. In other words, we can load the entire 64-byte buffer into a 512-bit wide SIMD register and get data-level parallelism on all the columnar values packed into the 64-byte buffer. Guaranteed padding can also allow certain compilers to generate more optimized code directly (e.g. One can safely use Intel’s `-qopt-assume-safe-padding`).
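To make the "no additional conditional checks" point concrete, here is a rough C sketch, not Arrow's actual code. The helper names (`padded_size`, `alloc_padded_i32`, `sum_padded_i32`) are made up for illustration, and zero-filling the padding is my own assumption so that the padded lanes don't change the sum. It uses plain C rather than AVX-512 intrinsics; the fixed 16-lane inner loop is the shape a compiler can turn into one 512-bit load per block, with no scalar tail loop.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ALIGN 64          /* bytes: one 512-bit register */
#define LANES (ALIGN / 4) /* 16 int32 values per 64-byte block */

/* Round a byte count up to the next multiple of 64. */
static size_t padded_size(size_t nbytes) {
    return (nbytes + ALIGN - 1) & ~(size_t)(ALIGN - 1);
}

/* Hypothetical helper: allocate a 64-byte-aligned buffer whose size is a
 * multiple of 64 bytes. Zeroing the padding is an assumption of this sketch. */
static int32_t *alloc_padded_i32(size_t count) {
    size_t bytes = padded_size(count * sizeof(int32_t));
    if (bytes == 0) bytes = ALIGN;            /* always at least one block */
    int32_t *p = aligned_alloc(ALIGN, bytes); /* C11; size is a multiple of ALIGN */
    if (p != NULL) {
        assert(((uintptr_t)p % ALIGN) == 0);
        memset(p, 0, bytes);
    }
    return p;
}

/* Sum `count` values by walking whole 64-byte blocks. Because the buffer is
 * padded, the last block can be read in full: no remainder loop and no
 * bounds check inside the hot loop. */
static int64_t sum_padded_i32(const int32_t *data, size_t count) {
    size_t blocks = padded_size(count * sizeof(int32_t)) / ALIGN;
    int64_t total = 0;
    for (size_t b = 0; b < blocks; b++) {
        const int32_t *blk = data + b * LANES;
        for (int i = 0; i < LANES; i++) { /* fixed trip count, vectorizable */
            total += blk[i];
        }
    }
    return total;
}
```

With 5 values the loop still reads one full 16-lane block. The 11 padding lanes are zero here, so the result is the same as a scalar sum over the first 5. For a reduction where zero is not neutral (min, product), the padding would need a different fill or a mask.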