The borrow checker had opinions. You can't .filter() over a bit array (immutable borrow) and .for_each() mutate it simultaneously. Fair enough — explicit loops for the sieve phase, iterators for collection. Same assembly output, provably safe. Along the way I caught two assertion bugs the sieve was too correct to trigger (off-by-one on the 10,001st prime; 499,999 is composite, not prime).
The flat sieve hit a wall at ~10M — the bit array exceeds L1 cache (32KB) and every sieving pass thrashes. So I built a segmented version: bootstrap sieving primes via flat sieve up to √n, then process the full range in 32KB L1-sized segments. One buffer, reused, never leaves cache.
Results (single-threaded, 25 iterations, cross-verified against primal crate):
n=500K: ~440µs, 30KB sieve mem
n=10M: ~10ms, 32KB sieve mem
n=50M: ~52ms, 32KB sieve mem (flat: ~67ms)
n=100M: ~137ms, 32KB sieve mem (flat: ~190ms)
Compared to the primal crate (Rust's best-in-class), raw speed is ~2x behind — primal uses wheel-30 factorisation which skips multiples of 2, 3, and 5. This sieve only skips evens. But memory is where it wins: 187x less sieve working memory than primal at n=50M (32KB vs 6MB), and a tighter result vector from prime-counting pre-allocation.The target use case is embedded/constrained: ESP32 nodes, Raspberry Pi Zeros, distributed timing networks where every KB matters. It's also a clean reference implementation — one file, zero dependencies, compiles with a single rustc invocation.
Single file. No Cargo. No crates. Just:
rustc -C opt-level=3 -C target-cpu=native primer.rs && ./primer
Code, benchmarks, borrow checker writeup, and a full development narrative are all in the repo.