To compile SQLite, I use wasi-sdk, which uses wasi-libc, which is based on musl. It's been said that musl is slow(er than glibc), which is true, to a point.
musl uses SWAR (SIMD within a register) on a size_t to implement various functions in string.h. This is fine, except size_t is just 32 bits on Wasm, so each step only covers 4 bytes.
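For anyone unfamiliar with the trick, here is a minimal sketch of SWAR zero-byte detection. The ONES/HIGHS masks follow the usual formulation; the helper name swar_find_zero and the bounded-buffer interface are my own assumptions for illustration, not musl's code. It loads words with memcpy and never reads past n, which sidesteps the over-read questions a real strlen has to deal with.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 0x0101...01 and 0x8080...80, sized to whatever size_t is on the target. */
#define ONES  ((size_t)-1 / 0xFF)
#define HIGHS (ONES * 0x80)

/* Nonzero iff at least one byte of w is zero. */
static int has_zero_byte(size_t w) {
    return ((w - ONES) & ~w & HIGHS) != 0;
}

/* Hypothetical helper: index of the first zero byte in buf[0..n), or n if
 * there is none. Whole words are tested first, then the remainder (or the
 * word known to contain a zero) is scanned bytewise. */
size_t swar_find_zero(const unsigned char *buf, size_t n) {
    size_t i = 0;
    for (; i + sizeof(size_t) <= n; i += sizeof(size_t)) {
        size_t w;
        memcpy(&w, buf + i, sizeof w); /* alignment-safe load */
        if (has_zero_byte(w)) break;
    }
    for (; i < n && buf[i] != 0; i++) {}
    return i;
}
```

With a 32-bit size_t the word loop advances 4 bytes per iteration, with a 64-bit one it advances 8.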
I found that implementing a few of those functions with Wasm SIMD128 can make them go around 4x faster.
Other functions don't even use SWAR; redoing those can make them 16x faster.
Smooth sort also has trouble pulling its own weight; a Shell sort seems both simpler and faster, while similarly avoiding recursion, allocations and the addressable stack.
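To give an idea of how little code a Shell sort needs, here is a rough sketch over an int array. The function name shell_sort_ints and the choice of the Ciura gap sequence are assumptions for the example; a real qsort replacement would take an element size and a comparator, and may well use different gaps.

```c
#include <stddef.h>

/* Hypothetical example: in-place Shell sort, no recursion, no allocation. */
void shell_sort_ints(int *a, size_t n) {
    /* Ciura's gap sequence; an assumption, other sequences work too. */
    static const size_t gaps[] = {701, 301, 132, 57, 23, 10, 4, 1};

    for (size_t g = 0; g < sizeof gaps / sizeof gaps[0]; g++) {
        size_t gap = gaps[g];
        /* Gapped insertion sort: each element is moved back in steps of
         * `gap` until the elements `gap` apart are in order. */
        for (size_t i = gap; i < n; i++) {
            int tmp = a[i];
            size_t j = i;
            for (; j >= gap && a[j - gap] > tmp; j -= gap) {
                a[j] = a[j - gap];
            }
            a[j] = tmp;
        }
    }
}
```

The only state is a handful of locals, so nothing has to live on the addressable stack.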
I found that using SIMD intrinsics (rather than SWAR) makes it easier to avoid UB, but the code would definitely benefit from more eyeballs.
See this for some benchmarks on both x86-64 and Aarch64: https://github.com/ncruces/go-sqlite3/actions/runs/145169318...
phickey•22h ago
ncruces•22h ago
I've also only really tested wazero. I can't know for sure that this is a straight improvement for other runtimes and architectures.
For instance, the code delays using wasm_i8x16_bitmask as much as possible, because on Aarch64 it can be slower than not using SIMD at all, whereas it's plenty fast on x86-64.
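Roughly what "delaying the bitmask" looks like, as a sketch rather than the actual code: the loop only asks whether any lane matched (wasm_v128_any_true), and wasm_i8x16_bitmask runs once, after a match is known to exist. The name sketch_memchr is made up for the example, and the scalar fallback is only there so the snippet also builds on non-Wasm targets.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__wasm_simd128__)
#include <wasm_simd128.h>
#endif

/* Hypothetical memchr-like helper; never reads outside s[0..n). */
const void *sketch_memchr(const void *s, int c, size_t n) {
    const unsigned char *p = s;
#if defined(__wasm_simd128__)
    const v128_t needle = wasm_i8x16_splat((int8_t)(unsigned char)c);
    for (; n >= 16; p += 16, n -= 16) {
        v128_t eq = wasm_i8x16_eq(wasm_v128_load(p), needle);
        /* Cheap "did anything match?" test in the hot loop. */
        if (wasm_v128_any_true(eq)) {
            /* Bitmask only on the way out, to locate the first match. */
            return p + __builtin_ctz(wasm_i8x16_bitmask(eq));
        }
    }
#endif
    /* Tail (or the whole buffer when SIMD128 is unavailable). */
    for (; n > 0; p++, n--) {
        if (*p == (unsigned char)c) return p;
    }
    return NULL;
}
```

Whether this split actually pays off depends on the runtime and architecture.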
phickey•22h ago
ncruces•21h ago
One of the nice things about Go is how much that's a solved issue out of the box, compared to almost everything else; certainly compared to C.
Pinging them in an issue: https://github.com/WebAssembly/wasi-libc/issues/580