zsv was built because I needed a CSV parsing library to integrate with my application, and other parsers had one or more of a variety of limitations: they couldn't handle "real-world" CSV or malformed UTF-8, were too slow, degraded on very large files, couldn't compile to WebAssembly, or couldn't handle multi-row headers (which, as far as I can tell, basically no other CSV parser supports). More details are in the repo README. The closest existing solution was xsv, but it wasn't designed as an API, and I still needed a lot of flexibility that wasn't already built into it.
My first inclination was to use flex/bison, but that approach yielded surprisingly slow performance. SIMD had just been shown to deliver unprecedented performance gains for JSON parsing, so a friend and I took a page from that approach to create what is, as far as I know, the fastest (and most customizable) CSV parser that properly handles "real-world" CSV.
When I say "real-world CSV": if you've worked with CSV in the wild, you probably know what I mean, but feel free to check out the README for a more technical explanation.
With the parser built, I found that some of my use cases were generic, so I wrapped them up in a CLI. Most of the CLI commands are run-of-the-mill: echo, select, count, sql, pretty, 2tsv, stack. Others are harder to find in other utilities: compare (cell-level comparison with customizable numerical tolerance, useful when, for example, comparing CSV against data from a deconstructed XLSX, where the values may look the same but technically differ by less than 0.000001), serialize/flatten, and 2json (with a choice of several JSON output schemas). A few aren't directly CSV-related but dovetail with the others, such as 2db, which converts 2json output to sqlite3 with indexing options, so you can run e.g. `zsv 2json my.csv --unique-index mycolumn | zsv 2db -t mytable -o my.db`.
I've been using zsv for years in commercial software, both running on bare metal and in the browser (see e.g. https://liquidaty.github.io/zsv/), so I finally got around to tagging v1.0.1 as the first production-ready release.
I'd love for you to try it out and would welcome any feedback, bug reports, or questions.