The task is intentionally minimal:
detect newlines (\n)
for each newline, identify the first non-space character that follows
Scanning for newlines alone is trivial and runs at memory bandwidth. As soon as I add “find the first non-space after each newline,” throughput drops sharply.
There’s no branching, no backtracking, no variable-length tokens. In theory this should still be a linear, bandwidth-bound pass, but adding this second condition introduces a dependency I don’t know how to express efficiently in SIMD.
I’m interested in algorithmic / data-parallel approaches to this problem — not micro-optimizations. If you treat this as a SIMD coding challenge, what approach would you try?
Another formulation:
# Bit-Parallel Challenge: O(1) "First Set Bit After Each Set Bit"
Given two 64-bit masks `A` and `B`, count positions where `B[i]=1` and the nearest set bit in `A|B` before position `i` is in `A`.
Equivalently: for each segment between consecutive bits in `A`, does `B` have any bit set?
*Example:* `A=0b10010000`, `B=0b01100110` → answer is 2 (positions 1 and 5)
Newline scan alone: 90% memory bandwidth. Adding this drops to 50%.
Is there an O(1) bit-parallel solution using x86 BMI/AVX2, or is O(popcount(A)) the lower bound?
pestatije•58m ago
[5] codereview.stackexchange.com
zokrezyl•48m ago
zokrezyl•38m ago
One approach....