
Kaitai Struct: declarative binary format parsing language

https://kaitai.io/
61•djoldman•1w ago

Comments

zzlk•4h ago
I wanted to use this a long time ago, but the Rust support wasn't there. Now that it's on the front page with apparently first-class support, it looks like I can give it another go.
jdp•4h ago
I also like Protodata [1]. It's complementary as an exploration and transformation tool when working with binary data formats.

[1]: https://github.com/evincarofautumn/protodata

woodruffw•4h ago
Kaitai Struct is really great. I've used it several times over the years to quickly pull in a parser that I'd otherwise have to hand-roll (and almost certainly get subtly wrong).

Their reference parsers for Mach-O and DER work quite nicely in abi3audit[1].

[1]: https://github.com/pypa/abi3audit/tree/main/abi3audit/_vendo...

theLiminator•3h ago
Is the main difference from https://github.com/google/wuffs being that Kaitai is declarative?
setheron•3h ago
Looking at that repo... I have no clue how to get started.
nigeltao•1h ago
The top-level README has a link called "Getting Started".
Sesse__•2h ago
They overlap, but neither does strictly more than the other.

Kaitai is for describing, encoding and decoding file formats. Wuffs is for decoding images (which includes decoding certain file formats). Kaitai is multi-language, Wuffs compiles to C only. If you wrote a parser for PNGs, your Kaitai implementation could tell you what the resolution was, where the palette information was (if any), what the comments look like and on what byte the compressed pixel chunk started. Your Wuffs implementation would give you back the decoded pixels (OK, and the resolution).

Think of Kaitai as an IDL generator for file formats, perhaps. It lets you parse the file into some sort of language-native struct (say, a series of nested objects) but doesn't try to process it beyond the parse.
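To make the "parse, don't process" distinction concrete, here is a hand-written Python sketch (not Kaitai's generated code or API) of the kind of structural PNG parse described above. It walks the chunk list and reports the IHDR width and height, where a PLTE chunk sits (if any), the tEXt comments, and the byte offset where the first IDAT chunk's data starts. The compressed pixel data is only measured, never inflated. The `PngSummary` and `summarize_png` names are illustrative.

```python
import struct
import zlib
from dataclasses import dataclass, field
from typing import Dict, Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


@dataclass
class PngSummary:
    width: int = 0
    height: int = 0
    palette_offset: Optional[int] = None      # byte offset of PLTE data, if present
    comments: Dict[str, str] = field(default_factory=dict)
    first_idat_offset: Optional[int] = None   # where the compressed pixels begin
    idat_bytes: int = 0                       # total compressed size, left undecoded


def summarize_png(data: bytes) -> PngSummary:
    """Walk PNG chunks and record structure only; pixels are never decompressed."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    summary = PngSummary()
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: u32 big-endian length, 4-byte type, body, u32 CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body_start = pos + 8
        body = data[body_start:body_start + length]
        (crc,) = struct.unpack(">I", data[body_start + length:body_start + length + 4])
        if zlib.crc32(ctype + body) != crc:
            raise ValueError(f"bad CRC in {ctype!r} chunk")
        if ctype == b"IHDR":
            summary.width, summary.height = struct.unpack(">II", body[:8])
        elif ctype == b"PLTE":
            summary.palette_offset = body_start
        elif ctype == b"tEXt":
            key, _, text = body.partition(b"\x00")
            summary.comments[key.decode("latin-1")] = text.decode("latin-1")
        elif ctype == b"IDAT":
            if summary.first_idat_offset is None:
                summary.first_idat_offset = body_start
            summary.idat_bytes += length
        elif ctype == b"IEND":
            break
        pos = body_start + length + 4
    return summary
```

Turning `idat_bytes` worth of compressed data into pixels (inflating, then undoing the per-scanline filters) is the part a decoder like Wuffs does and a structural parser stops short of.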

nigeltao•1h ago
See https://github.com/google/wuffs/blob/main/doc/related-work.m...

> Kaitai Struct is in a similar space, generating safe parsers for multiple target programming languages from one declarative specification. Again, Wuffs differs in that it is a complete (and performant) end to end implementation, not just for the structured parts of a file format. Repeating a point in the previous paragraph, the difficulty in decoding the GIF format isn't in the regularly-expressible part of the format, it's in the LZW compression. Kaitai's GIF parser returns the compressed LZW data as an opaque blob.

Taking PNG as an example, Kaitai will tell you the image's metadata (including width and height) and that the compressed pixels are in the such-and-such part of the file. But unlike Wuffs, Kaitai doesn't actually decode the compressed pixels.
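The "regularly-expressible part" of GIF can be illustrated with its image data. Per the GIF89a specification, an image's pixel data is one byte giving the LZW minimum code size, followed by a chain of data sub-blocks (a length byte, then that many bytes), ending with a zero-length block. The Python sketch that follows is hand-written, not Kaitai's GIF spec or Wuffs' API, and `read_image_data` is an illustrative name. It collects the sub-blocks into a single blob and stops there, which is roughly what "returns the compressed LZW data as an opaque blob" means; the LZW decompression itself is not attempted.

```python
from typing import Tuple


def read_image_data(data: bytes, pos: int) -> Tuple[int, bytes, int]:
    """Parse GIF table-based image data starting at pos.

    Returns (lzw_min_code_size, compressed_blob, position after the terminator).
    The blob is still LZW-compressed; decoding it is the hard part left undone.
    """
    if pos >= len(data):
        raise ValueError("truncated image data")
    min_code_size = data[pos]
    pos += 1
    blob = bytearray()
    while True:
        if pos >= len(data):
            raise ValueError("missing block terminator")
        size = data[pos]
        pos += 1
        if size == 0:  # a zero-length sub-block terminates the chain
            return min_code_size, bytes(blob), pos
        if pos + size > len(data):
            raise ValueError("truncated sub-block")
        blob += data[pos:pos + size]
        pos += size
```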

---

Wuffs' generated C code also doesn't need any capabilities, including the ability to malloc or free. Its example/mzcat program (equivalent to /bin/bzcat or /bin/zcat, for decoding BZIP2 or GZIP) self-imposes a SECCOMP_MODE_STRICT sandbox, which is so restrictive (and secure!) that it prohibits any syscalls other than read, write, _exit and sigreturn.

(I am the Wuffs author.)

cxr•47m ago
> the difficulty in decoding the GIF format isn't in the regularly-expressible part of the format, it's in the LZW compression

Maybe for GIF, but it's worth pondering that all of the following are true:

(a) Wuffs doesn't implement the various archive formats; the deflate/ README says this is a TODO

(b) the very first sentence of the Wuffs README says Wuffs is "for Wrangling Untrusted File Formats Safely. Wrangling includes parsing, decoding and encoding. Example file formats include images, audio, video, fonts and compressed archives."

(c) much of the commentary accompanying the advisories about ZIP implementation exploits over the last several months has included complaints about the ZIP container format (and not deflate)

(d) for the longest time (like years), the Kaitai IDE demo for ZIP was broken (it may still be broken; I'm not in a place where I can check right now)

mturk•3h ago
Kaitai is absolutely one of my favorite projects. I use it for work (parsing scientific formats, prototyping and exploring those formats, etc.) as well as for fun (reverse engineering games, formats for DOSBox core dumps, etc.).

I gave a guest lecture in a friend's class last week where we used Kaitai to back out the file format used in "Where in Time is Carmen Sandiego" and it was a total blast. (For me. Not sure that the class agreed? Maybe.) The Web IDE made this super easy -- https://ide.kaitai.io/ .

(On my youtube page I've got recordings of streams where I work with Kaitai to do projects like these, but somehow I am not able to work up the courage to link them here.)

setheron•3h ago
Great timing! I just published https://github.com/fzakaria/nix-nar-kaitai-spec and contributed the Kaitai C++/STL runtime to nixpkgs: https://github.com/NixOS/nixpkgs/pull/454243
layoric•3h ago
I discovered this project recently and used it for Himawari Standard Data format and it made it so much easier. Definitely recommend using this if you need to create binary readers for uncommon formats.
okanat•3h ago
Even if you don't want to use it since it is not as efficient as a hand-written specialized parser, Kaitai Struct gives a perfect way of documenting file formats. I love the idea and every bit of the project!
jonstewart•47m ago
I like using it for parsing structs but then intersperse procedural code in it for loops/containers, so not everything gets read into RAM all at once.
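The streaming approach described above can be sketched in plain Python (this is not Kaitai's generated code). It assumes a hypothetical format of records, each prefixed by a little-endian u32 length, and yields only each record's offset and size, seeking past payloads so nothing beyond one header is held in memory until a caller explicitly asks for a payload. `RecordRef`, `iter_records`, and `read_payload` are illustrative names.

```python
import io
import struct
from typing import BinaryIO, Iterator, NamedTuple


class RecordRef(NamedTuple):
    offset: int  # file offset of the payload
    length: int  # payload size in bytes


HEADER = struct.Struct("<I")  # hypothetical: u32 little-endian length prefix


def iter_records(f: BinaryIO) -> Iterator[RecordRef]:
    """Yield the location of each length-prefixed record without reading payloads."""
    while True:
        header = f.read(HEADER.size)
        if not header:
            return  # clean end of file
        if len(header) < HEADER.size:
            raise EOFError("truncated record header")
        (length,) = HEADER.unpack(header)
        offset = f.tell()
        yield RecordRef(offset, length)
        # Re-seek from the stored offset, so callers may read payloads mid-iteration.
        f.seek(offset + length)


def read_payload(f: BinaryIO, ref: RecordRef) -> bytes:
    """Read one payload on demand."""
    f.seek(ref.offset)
    data = f.read(ref.length)
    if len(data) != ref.length:
        raise EOFError("truncated record payload")
    return data


if __name__ == "__main__":
    demo = io.BytesIO(struct.pack("<I", 3) + b"abc" + struct.pack("<I", 2) + b"xy")
    for ref in iter_records(demo):
        print(ref, read_payload(demo, ref))
```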
sitkack•2h ago
What was the Python based binary parsing library from around 2010? Hachoir?

https://hachoir.readthedocs.io/en/latest/index.html

ctoth•2h ago
Construct?
jonstewart•46m ago
Hachoir was rad, just not very fast.
ginko•2h ago
No pure C backend?
dhsysusbsjsi•1h ago
This would be great for most projects; the Swift backend, for example, is abandoned, with 6+ years since its last commit.
vendiddy•1h ago
It's not C, but we have sponsored a Zig target for Kaitai. If anyone reading this knows Zig well, please comment, because we would love to get a code review of the generated code!
imtringued•1h ago
https://en.wikipedia.org/wiki/Data_Format_Description_Langua...

DFDL is heavily encroaching on Kaitai Struct's territory.

dgan•1h ago
Wow, this is good. My only complaint is the annoyingly verbose YAML. What if I wanted to use Kaitai instead of protobufs? My .proto file is already a thousand lines, and splitting each of those lines into 3-4 indented YAML lines hurts readability.
Everdred2dx•39m ago
I had a ton of fun using Kaitai to write an unpacking script for a video game's proprietary pack file format. Super cool project.

I did NOT have fun trying to use Kaitai to pack the files back together. Not sure if this has improved at all, but a year or so ago you had to build the dependencies yourself, and the process was so cumbersome that it ended up being easier to just write the imperative code myself.
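Writing the packer imperatively, as described above, might look like the following Python sketch. The pack layout is entirely hypothetical (a `PACK` magic and entry count, a table of 32-byte NUL-padded names with offsets and sizes, then the concatenated file data), as are the `pack_files` and `unpack_files` names; real game pack formats vary widely.

```python
import struct
from typing import Dict

MAGIC = b"PACK"                  # hypothetical magic number
HEADER = struct.Struct("<4sI")   # magic, entry count
ENTRY = struct.Struct("<32sII")  # name (NUL-padded), data offset, data size


def pack_files(files: Dict[str, bytes]) -> bytes:
    """Serialize files into the hypothetical layout: header, entry table, then data."""
    table = bytearray()
    blobs = bytearray()
    # The writer must fix the layout up front: data begins right after the table.
    data_start = HEADER.size + ENTRY.size * len(files)
    for name, data in files.items():
        encoded = name.encode("utf-8")
        if len(encoded) > 32:
            raise ValueError(f"name too long: {name!r}")
        table += ENTRY.pack(encoded, data_start + len(blobs), len(data))
        blobs += data
    return HEADER.pack(MAGIC, len(files)) + bytes(table) + bytes(blobs)


def unpack_files(blob: bytes) -> Dict[str, bytes]:
    """Read the layout back; the reader just follows the stored offsets."""
    magic, count = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("bad magic")
    files = {}
    for i in range(count):
        raw_name, offset, size = ENTRY.unpack_from(blob, HEADER.size + i * ENTRY.size)
        files[raw_name.rstrip(b"\x00").decode("utf-8")] = blob[offset:offset + size]
    return files
```

The asymmetry is the point: the reader only follows offsets, while the writer has to decide the layout and compute every offset and size before it can emit the table.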

How memory maps (mmap) deliver faster file access in Go

https://info.varnish-software.com/blog/how-memory-maps-mmap-deliver-25x-faster-file-access-in-go
47•ingve•2h ago•21 comments

Betty White's shoulder bag is a time capsule of World War II

https://americanhistory.si.edu/explore/stories/betty-white-world-war-ii
29•thunderbong•6d ago•0 comments

Claude Memory

https://www.anthropic.com/news/memory
335•doppp•7h ago•189 comments

/dev/null is an ACID compliant database

https://jyu.dev/blog/why-dev-null-is-an-acid-compliant-database/
86•swills•2h ago•46 comments

Can “second life” EV batteries work as grid-scale energy storage?

https://www.volts.wtf/p/can-second-life-ev-batteries-work
97•davidw•5h ago•100 comments

Apple loses UK App Store monopoly case, penalty might near $2B

https://9to5mac.com/2025/10/23/apple-loses-uk-app-store-monopoly-case-penalty-might-near-2-billion/
121•thelastgallon•2h ago•74 comments

Zram Performance Analysis

https://notes.xeome.dev/notes/Zram
46•enz•4h ago•6 comments

When is it better to think without words?

https://www.henrikkarlsson.xyz/p/wordless-thought
34•Curiositry•2h ago•9 comments

Pyscripter – Open-source Python IDE written in Delphi

https://github.com/pyscripter/pyscripter
40•peter_d_sherman•3d ago•6 comments

PyTorch Monarch

https://pytorch.org/blog/introducing-pytorch-monarch/
307•jarbus•13h ago•39 comments

New updates and more access to Google Earth AI

https://blog.google/technology/research/new-updates-and-more-access-to-google-earth-ai/
121•diogenico•7h ago•38 comments

Armed police swarm student after AI mistakes bag of Doritos for a weapon

https://www.dexerto.com/entertainment/armed-police-swarm-student-after-ai-mistakes-bag-of-doritos...
415•antongribok•6h ago•257 comments

Kaitai Struct: declarative binary format parsing language

https://kaitai.io/
61•djoldman•1w ago•23 comments

Summary of the Amazon DynamoDB Service Disruption in US-East-1 Region

https://aws.amazon.com/message/101925/
376•meetpateltech•22h ago•98 comments

US probes Waymo robotaxis over school bus safety

https://www.yahoo.com/news/articles/us-investigates-waymo-robotaxis-over-102015308.html
57•gmays•11h ago•84 comments

OpenAI acquires Sky.app

https://openai.com/index/openai-acquires-software-applications-incorporated
108•meetpateltech•7h ago•56 comments

Show HN: Git for LLMs – A context management interface

https://twigg.ai
53•jborland•9h ago•11 comments

Trump pardons convicted Binance founder

https://www.wsj.com/finance/currencies/trump-pardons-convicted-binance-founder-7509bd63
664•cowboyscott•8h ago•640 comments

The OS/2 Display Driver Zoo

https://www.os2museum.com/wp/the-os-2-display-driver-zoo/
53•kencausey•1w ago•6 comments

Introduction to the concept of likelihood and its applications (2018)

https://journals.sagepub.com/doi/10.1177/2515245917744314
5•sebg•1h ago•0 comments

I managed to grow countable yeast colonies

https://chillphysicsenjoyer.substack.com/p/i-managed-to-grow-countable-yeast
20•crescit_eundo•1w ago•6 comments

Show HN: OpenSnowcat – A fork of Snowplow to keep open analytics alive

https://opensnowcat.io/
45•joaocorreia•4h ago•12 comments

RFC 9861: KangarooTwelve and TurboSHAKE

https://datatracker.ietf.org/doc/rfc9861/
9•ecesena•1w ago•1 comments

I spent a year making an ASN.1 compiler in D

https://bradley.chatha.dev/blog/dlang-propaganda/asn1-compiler-in-d/
237•BradleyChatha•11h ago•144 comments

Glasses-free 3D using webcam head tracking

https://assetstore.unity.com/packages/tools/camera/vr-without-glasses-for-webgl-332314
75•il_nets•5d ago•58 comments

What happened to Apple's legendary attention to detail?

https://blog.johnozbay.com/what-happened-to-apples-attention-to-detail.html
564•Bogdanp•5h ago•351 comments

Make Any TypeScript Function Durable

https://useworkflow.dev/
77•tilt•7h ago•53 comments

How count-min sketches work – frequencies, but without the actual data

https://www.instantdb.com/essays/count_min_sketch
36•stopachka•1d ago•7 comments

Nango (YC W23) is hiring staff back-end engineers (remote)

https://www.nango.dev/careers
1•bastienbeurier•12h ago

AI discovers a 5x faster MoE load balancing algorithm than human experts

https://adrs-ucb.notion.site/moe-load-balancing
11•melissapan•1h ago•2 comments