Show HN: Zeekstd – Rust Implementation of the ZSTD Seekable Format

https://github.com/rorosen/zeekstd
108•rorosen•19h ago
Hello,

I would like to share a Rust implementation of the Zstandard seekable format I've been working on.

Regular zstd compressed files consist of a single frame, meaning you have to start decompression at the beginning. The seekable format splits compressed data into a series of independent frames, each compressed individually, so that decompression of a section in the middle of an archive only requires zstd to decompress at most a frame's worth of extra data, instead of the entire archive.
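To make the frame lookup concrete, here is a minimal std-only Rust sketch (not zeekstd's API) of how a reader could use the per-frame compressed and decompressed sizes stored in a seek table. It turns them into cumulative offsets and binary-searches those offsets to find the single frame that holds a given decompressed byte. `FrameEntry`, `SeekIndex`, and `frame_for_offset` are hypothetical names chosen for illustration.

```rust
/// Sizes of one independent frame, as stored in a seek table entry.
#[derive(Clone, Copy, Debug)]
pub struct FrameEntry {
    pub compressed_size: u32,
    pub decompressed_size: u32,
}

/// Cumulative start offsets of every frame. Index `i` is where frame `i`
/// begins; the extra last element is the total size.
pub struct SeekIndex {
    pub comp_starts: Vec<u64>,
    pub decomp_starts: Vec<u64>,
}

impl SeekIndex {
    pub fn new(entries: &[FrameEntry]) -> Self {
        let mut comp_starts = vec![0u64];
        let mut decomp_starts = vec![0u64];
        for e in entries {
            // Each frame starts where the previous one ended.
            let c = comp_starts[comp_starts.len() - 1] + u64::from(e.compressed_size);
            let d = decomp_starts[decomp_starts.len() - 1] + u64::from(e.decompressed_size);
            comp_starts.push(c);
            decomp_starts.push(d);
        }
        SeekIndex { comp_starts, decomp_starts }
    }

    /// Index of the frame that contains decompressed byte `offset`,
    /// or `None` if `offset` lies past the end of the data.
    pub fn frame_for_offset(&self, offset: u64) -> Option<usize> {
        let total = self.decomp_starts[self.decomp_starts.len() - 1];
        if offset >= total {
            return None;
        }
        // Binary search: count the frames that start at or before `offset`;
        // the containing frame is the last of them.
        Some(self.decomp_starts.partition_point(|&s| s <= offset) - 1)
    }
}
```

A reader would then seek to `comp_starts[i]` in the compressed file, decompress only frame `i`, and discard the first `offset - decomp_starts[i]` bytes of its output.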

I started working with the seekable format because I wanted to resume downloads of big zstd compressed files that are decompressed and written to disk on the fly. At first I created and used bindings to the C functions that are available upstream[1]. However, I ran into the first segfault rather quickly (it's now fixed) and found out that the functions only allow basic things. After looking closer at the upstream implementation, I noticed that it uses functions of the core API that are now deprecated and doesn't allow access to low-level (de)compression contexts. To me it looks like a PoC/demo implementation that isn't maintained the same way as the zstd core API, which is probably also why it's in the contrib directory.

My use case seemed to require a complete rewrite of the seekable format implementation, so I decided to implement it from scratch in Rust using bindings to the advanced zstd compression API, available since zstd 1.4.0.

The result is a library crate[2] with a single dependency, and a CLI crate[3] for the seekable format that feels similar to the regular zstd tool.

Any feedback is highly appreciated!

[1]: https://github.com/facebook/zstd/tree/dev/contrib/seekable_f...
[2]: https://crates.io/crates/zeekstd
[3]: https://github.com/rorosen/zeekstd/tree/main/cli

Comments

simeonmiteff•3h ago
This is very cool. Nice work! At my day job, I have been using a Go library[1] to build tools that require seekable zstd, but felt a bit uncomfortable with the lack of broader support for the format.

Why zeek, BTW? Is it a play on "zstd" and "seek"? My employer is also the custodian of the zeek project (https://zeek.org), so I was confused for a second.

[1] https://github.com/SaveTheRbtz/zstd-seekable-format-go

rorosen•2h ago
Thanks! I was also surprised that there are very few tools to work with the seekable format. I could imagine that at least some people have a use-case for it.

Yes, the name is a combination of zstd and seek. Funnily enough, I first wanted to name it just zeek, before I knew that it already exists, so I switched to zeekstd. You're not the first person to ask me if there is any relation to zeek, and I understand how that is misleading. In hindsight the name is a little unfortunate.

etyp•2h ago
Zeek is well known in "security" spaces, but not as much in "developer" spaces. It did get me a bit excited to see Zeek here until I realized it was unrelated, though :)

stu2010•3h ago
This is cool, I'd say that the most common tool in this space is bgzip[1]. Have you thought about training a dictionary on the first few chunks of each file and embedding the dictionary in a skippable frame at the start? Likely makes less difference if your chunk size is 2MB, but at smaller chunk sizes that could have significant benefit.

[1] https://www.htslib.org/doc/bgzip.html
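stu2010's suggestion relies on zstd skippable frames, which decoders ignore. A sketch, using a hypothetical `skippable_frame` helper, of wrapping a trained dictionary (or any other blob) in one: a 4-byte little-endian magic number in the range `0x184D2A50..=0x184D2A5F`, a 4-byte little-endian payload length, then the payload itself. The seekable spec already uses `0x184D2A5E` for its seek table frame, so a dictionary frame would presumably pick a different low nibble; how a seekable reader would locate and apply such a dictionary is not defined by the spec, as jeroenhd notes.

```rust
/// Zstd reserves magic numbers 0x184D2A50..=0x184D2A5F for skippable frames.
const SKIPPABLE_MAGIC_BASE: u32 = 0x184D_2A50;

/// Wraps `payload` (for example a trained dictionary) in a skippable frame.
/// `variant` selects the low nibble of the magic number.
pub fn skippable_frame(variant: u8, payload: &[u8]) -> Result<Vec<u8>, String> {
    if variant > 0x0F {
        return Err(format!("variant {} is outside 0..=15", variant));
    }
    if payload.len() as u64 > u64::from(u32::MAX) {
        return Err("payload does not fit in a 32-bit Frame_Size".to_string());
    }
    let mut frame = Vec::with_capacity(8 + payload.len());
    // Magic number, then payload length, both little-endian.
    frame.extend_from_slice(&(SKIPPABLE_MAGIC_BASE | u32::from(variant)).to_le_bytes());
    frame.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    frame.extend_from_slice(payload);
    Ok(frame)
}
```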

jeroenhd•3h ago
Looking at the spec (https://github.com/facebook/zstd/blob/dev/contrib/seekable_f...), I don't see any mention of custom dictionaries like you describe.

The spec does mention:

> While only Checksum_Flag currently exists, there are 7 other bits in this field that can be used for future changes to the format, for example the addition of inline dictionaries.

so I don't think seekable zstd supports these dictionaries just yet.

With multiple inline dictionaries, one could detect when new chunks compress badly with the previous dictionary and train new ones on the fly. Could be useful for compressing formats with headers and mixed data (e.g. game files, which can contain a mix of text, audio, and video, or just regular old .tar files, I suppose).
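The quoted passage refers to the `Seek_Table_Descriptor` byte in the 9-byte footer at the very end of a seekable file. A std-only sketch of parsing that footer as I read the spec: `Number_Of_Frames` as a 4-byte little-endian integer, the descriptor byte (bit 7 is `Checksum_Flag`), then the 4-byte `0x8F92EAB1` seekable magic number. `Footer` and `parse_footer` are hypothetical names, and rejecting set reserved bits is this sketch's own choice.

```rust
/// Magic number that ends every seekable zstd file (little-endian).
const SEEKABLE_MAGIC: u32 = 0x8F92_EAB1;
/// Number_Of_Frames (4) + Seek_Table_Descriptor (1) + Seekable_Magic_Number (4).
const FOOTER_SIZE: usize = 9;

#[derive(Debug, PartialEq)]
pub struct Footer {
    pub num_frames: u32,
    pub has_checksums: bool,
}

fn read_u32_le(b: &[u8]) -> u32 {
    u32::from_le_bytes([b[0], b[1], b[2], b[3]])
}

/// Parses the seek table footer from the last 9 bytes of `file`.
pub fn parse_footer(file: &[u8]) -> Result<Footer, String> {
    if file.len() < FOOTER_SIZE {
        return Err("input is shorter than the seek table footer".to_string());
    }
    let footer = &file[file.len() - FOOTER_SIZE..];
    let magic = read_u32_le(&footer[5..9]);
    if magic != SEEKABLE_MAGIC {
        return Err(format!("bad seekable magic number {:#010x}", magic));
    }
    let descriptor = footer[4];
    // Bit 7 is Checksum_Flag; bits 6-2 are reserved, so this sketch
    // refuses files that set them.
    if descriptor & 0x7C != 0 {
        return Err("reserved descriptor bits are set".to_string());
    }
    Ok(Footer {
        num_frames: read_u32_le(&footer[0..4]),
        has_checksums: descriptor & 0x80 != 0,
    })
}
```

Future flags such as inline dictionaries would show up as new bits in `descriptor`, which is why an older reader should refuse bits it doesn't understand instead of silently ignoring them.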

rwmj•3h ago
Seekable formats also allow random reads which lets you do trickery like booting qemu VMs from remotely hosted, compressed files (over HTTPS). We do this already for xz: https://libguestfs.org/nbdkit-xz-filter.1.html https://rwmj.wordpress.com/2018/11/23/nbdkit-xz-curl/

Has zstd actually standardized the seekable version? Last I checked (which was quite a while ago) it had not been declared a standard, so I was reluctant to write a filter for nbdkit, even though it's very much a requested feature.

tyilo•3h ago
I already use zstd_seekable (https://docs.rs/zstd-seekable/) in a project. Could you compare the APIs of this crate and yours?

tyilo•2h ago
Correct me if I'm wrong, but it doesn't seem like you provide the equivalent of Seekable::decompress in zstd_seekable, which decompresses at a specific offset without having to calculate which frame(s) to decompress.

This is basically the only function I use from zstd_seekable, so it would be nice to have that in zeekstd as well.
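What tyilo describes boils down to translating a decompressed byte range into a run of frames. A hedged sketch of that planning step (`ReadPlan` and `plan_read` are hypothetical and not part of zstd_seekable or zeekstd): it walks the seek table sizes and returns the frames to decompress, the compressed byte span they occupy, and how many decompressed bytes to drop from the first frame. A linear scan keeps it short; a real reader would precompute cumulative offsets and binary-search them.

```rust
/// Work needed to serve a read of `len` decompressed bytes at `offset`.
#[derive(Debug, PartialEq)]
pub struct ReadPlan {
    pub first_frame: usize,
    pub last_frame: usize, // inclusive
    pub comp_start: u64,   // compressed offset where `first_frame` begins
    pub comp_end: u64,     // compressed offset just past `last_frame`
    pub skip: u64,         // decompressed bytes to drop from `first_frame`
}

/// `frames` lists (compressed_size, decompressed_size) per frame, in order,
/// as read from a seek table. Returns `None` for an empty or out-of-range read.
pub fn plan_read(frames: &[(u32, u32)], offset: u64, len: u64) -> Option<ReadPlan> {
    if len == 0 {
        return None;
    }
    let end = offset.checked_add(len)?;
    let (mut c, mut d) = (0u64, 0u64); // start of the current frame
    let mut first: Option<(usize, u64, u64)> = None; // (index, comp_start, skip)
    for (i, &(csize, dsize)) in frames.iter().enumerate() {
        let c_next = c + u64::from(csize);
        let d_next = d + u64::from(dsize);
        if first.is_none() && offset < d_next {
            first = Some((i, c, offset - d));
        }
        if let Some((first_frame, comp_start, skip)) = first {
            if end <= d_next {
                return Some(ReadPlan { first_frame, last_frame: i, comp_start, comp_end: c_next, skip });
            }
        }
        c = c_next;
        d = d_next;
    }
    None // the requested range runs past the end of the data
}
```

The span `comp_start..comp_end` is also what a remote reader (like the nbdkit use case rwmj mentions) would fetch, for example with an HTTP `Range: bytes=<comp_start>-<comp_end - 1>` header, since HTTP byte ranges are inclusive.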

throebrifnr•2h ago
gzip has an --rsyncable option that does something similar.

Explanation here https://beeznest.wordpress.com/2005/02/03/rsyncable-gzip/

Scaevolus•1h ago
Rsyncable goes further: instead of having fixed size blocks, it makes the block split points deterministically content-dependent. This means that you can edit/insert/delete bytes in the middle of the uncompressed input, and the compressed output will only have a few compressed blocks change.
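A simplified sketch of content-dependent split points in the spirit Scaevolus describes: a rolling sum over the last `window` bytes, with a cut whenever the sum is divisible by `modulus` and the chunk has reached `min_chunk` bytes. This is not the exact algorithm gzip or zstd use; `split_points`, its parameters, and the plain byte sum as the rolling hash are assumptions chosen for brevity.

```rust
/// Returns the end offset of each content-defined chunk in `data`.
/// A chunk ends after byte `i` when the sum of the last `window` bytes is a
/// multiple of `modulus` and the chunk is at least `min_chunk` bytes long.
pub fn split_points(data: &[u8], window: usize, modulus: u64, min_chunk: usize) -> Vec<usize> {
    assert!(window > 0 && modulus > 0 && min_chunk > 0);
    let mut cuts = Vec::new();
    let mut sum: u64 = 0;
    let mut last_cut = 0;
    for i in 0..data.len() {
        // Slide the window: add the new byte, drop the one that fell out.
        sum += u64::from(data[i]);
        if i >= window {
            sum -= u64::from(data[i - window]);
        }
        let full_window = i + 1 >= window;
        if full_window && i + 1 - last_cut >= min_chunk && sum % modulus == 0 {
            cuts.push(i + 1);
            last_cut = i + 1;
        }
    }
    cuts
}
```

Because each cut depends only on the bytes inside the window, inserting bytes near the start moves later cut points along with the content instead of shifting every boundary, so most blocks after the edit tend to compress identically. With `min_chunk` above 1, the boundaries right after an edit can differ until they resynchronize.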
andrewaylett•57m ago
zstd also has an rsyncable option -- as an example of when it's useful, I take a dump of an SQLite database (my Home Assistant DB) using a command like this:

    sqlite3 -readonly "${i}" .dump | zstd --fast --rsyncable -v -o "${PART}" -

The DB is 1.2G, the SQL dump is 1.4G, the compressed dump is 286M. And I still only have to sync the parts that have changed to take a backup.

77pt77•2h ago
BTW, something similar can be done with zlib/gzip.

rwmj•37m ago
It's true, using some rather non-obvious trickery: https://github.com/madler/zlib/blob/develop/examples/zran.c

I also wrote a tool to make a randomly modifiable gzipped disk image: https://rwmj.wordpress.com/2022/12/01/creating-a-modifiable-...

dafelst•32m ago
Sure, but zstd soundly beats gzip on every single metric except ubiquity; it is just straight up a better compression/decompression strategy.

ncruces•2h ago
How's tool support these days for compressing a file into the seekable zstd format?

Given existing libraries, it should be really simple to create an SQLite VFS for my Go driver that reads (not writes) compressed databases transparently, but tool support was kinda lacking.

Will the zstd CLI ever support it? https://github.com/facebook/zstd/issues/2121
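A rough sketch of the read path such a read-only VFS could use, assuming every frame except the last decompresses to the same `frame_size` (for example a multiple of the SQLite page size). `FrameReader` is hypothetical, and the `decompress_frame` closure stands in for whatever seekable zstd library does the actual decompression. Keeping the most recently decompressed frame avoids decompressing it again when consecutive page reads fall in the same frame.

```rust
/// Serves positioned reads from frames that all decompress to `frame_size`
/// bytes (except possibly the last one).
pub struct FrameReader<F: FnMut(usize) -> Vec<u8>> {
    frame_size: usize,
    decompress_frame: F,              // stand-in for a real zstd call
    cached: Option<(usize, Vec<u8>)>, // most recently decompressed frame
}

impl<F: FnMut(usize) -> Vec<u8>> FrameReader<F> {
    pub fn new(frame_size: usize, decompress_frame: F) -> Self {
        assert!(frame_size > 0);
        FrameReader { frame_size, decompress_frame, cached: None }
    }

    /// Copies bytes starting at decompressed `offset` into `buf` and returns
    /// how many were copied (fewer than `buf.len()` only at the end of data).
    pub fn read_at(&mut self, mut offset: usize, buf: &mut [u8]) -> usize {
        let mut copied = 0;
        while copied < buf.len() {
            let idx = offset / self.frame_size;
            let within = offset % self.frame_size;
            let hit = matches!(&self.cached, Some((i, _)) if *i == idx);
            if !hit {
                let data = (self.decompress_frame)(idx);
                self.cached = Some((idx, data));
            }
            // `cached` is always Some at this point.
            let frame = &self.cached.as_ref().unwrap().1;
            if within >= frame.len() {
                break; // past the end of the data
            }
            let n = (frame.len() - within).min(buf.len() - copied);
            buf[copied..copied + n].copy_from_slice(&frame[within..within + n]);
            copied += n;
            offset += n;
        }
        copied
    }
}
```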

b0a04gl•1h ago
how do you handle cases where the seek table itself gets truncated or corrupted? do you fallback to scanning for frame boundaries or just error out? wondering if there's room to embed a minimal redundant index at the tail too for safety
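One cheap safeguard, sketched under the assumption that a file holds nothing but compressed frames followed by the seek table: check the table against the file before trusting it. The seek table frame's `Frame_Size` must equal the entries plus the 9-byte footer, and the compressed sizes plus the 8-byte skippable header plus that payload must add up to the file length. On a mismatch, a reader could fall back to scanning frame headers (not shown). `seek_table_is_consistent` is a hypothetical helper; note that the optional per-frame checksums cover decompressed data, not the table itself.

```rust
/// Size of the skippable-frame header (magic number + Frame_Size).
const SKIPPABLE_HEADER: u64 = 8;
/// Size of the seek table footer (Number_Of_Frames + descriptor + magic).
const FOOTER: u64 = 9;

/// Returns true if the seek table agrees with the file it was read from,
/// assuming the file is exactly: compressed frames, then the seek table frame.
/// `frame_size_field` is the Frame_Size value read from the seek table frame.
pub fn seek_table_is_consistent(
    file_len: u64,
    frame_size_field: u32,
    compressed_sizes: &[u32],
    has_checksums: bool,
) -> bool {
    // Each entry is Compressed_Size + Decompressed_Size, plus an optional Checksum.
    let entry_size: u64 = if has_checksums { 12 } else { 8 };
    let payload = compressed_sizes.len() as u64 * entry_size + FOOTER;
    if u64::from(frame_size_field) != payload {
        return false;
    }
    let frames_total: u64 = compressed_sizes.iter().map(|&s| u64::from(s)).sum();
    frames_total + SKIPPABLE_HEADER + payload == file_len
}
```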
threeducks•1m ago
Assuming that frames come at a cost, how much larger are the seekable zstd files? Perhaps as a graph based on frame size and for different kinds of data (text, binaries, ...).
