What started as a small utility for a personal project turned into a month-long deep dive into performance engineering, SIMD, and the surprisingly sharp edges of CSV parsing. My goal was straightforward: build a simple, zero-allocation CSV iterator and writer in Zig that could handle real-world inputs without sacrificing performance.
Along the way, I explored a number of parsing strategies, including approaches inspired by a well-known paper on SIMD-accelerated JSON parsing. While that technique is elegant and highly effective for JSON, I found it didn’t translate cleanly to CSV—at least not without giving up more performance than I was willing to accept. CSV’s delimiter-heavy structure and quoting rules demanded a different approach.
After iterating through several designs and benchmarking them against each other, I eventually converged on a technique that consistently outperformed my earlier implementations. When I compared the final version against some of the fastest CSV libraries I could find, the results were better than I expected.
This post walks through the design decisions behind csv-zero, the tradeoffs I made, and the techniques that ended up mattering the most. It’s also a bit of a love letter to Zig: working in the language made it much easier to reason about memory, data layout, and performance, and pushed me to tackle problems I would have otherwise avoided.
If you’re curious about SIMD-based parsing, zero-allocation APIs, or just want to see how far you can push CSV, read on.
peymo•1h ago
To make those comparisons reproducible, I put together a small benchmark suite here: https://github.com/peymanmortazavi/csv-race
And the actual implementation is here: https://github.com/peymanmortazavi/csv-zero