Binary Encodings for JSON and Variant

https://jincongho.com/posts/designing-binary-encodings-for-json-and-variant/

16•jincongho•3d ago

Comments

kstenerud•18h ago

As the author stated, it really depends on what you intend to use it for.

Fast internal scanning isn't free, because now you need pre-indexing, which is more data, and loses the incremental buildability on the encoding end.

Small transfer size and fast (full) decoding is possible with a single binary format, but unfortunately designers keep falling into the trap of adding extra things that make them incompatible with JSON. It's why I wrote https://github.com/kstenerud/bonjson/

boricj•18h ago

At work, I wrote a C++20 data binding library. It works by running visitors over a data model that binds to the application state. My comment comes from a different set of trade-offs driven by memory constraints.

I've implemented a bunch of serialization visitors. For the structured formats, most (JSON, YAML, CBOR with indefinite lengths) use an output iterator and can stream out one character/byte at a time, which is useful when your target is a MCU with 640 KiB of SRAM and you need to reply large REST API responses.

And there's the BSON serializer, which writes to a byte buffer because it uses tag-length-value and I need to backtrack in order to patch in the lengths after serializing the values. This means that the entire document needs to be written upfront before I can do something with it. It also has some annoying quirks, like array indices being strings in base 10.

There are also other trade-offs when dealing with JSON vs. its binary encodings. Strings in JSON may have escape characters that require parsing, if it has them then you can't return a view into the document, you need to allocate a string to hold the decoded value. Whereas in BSON or CBOR (excluding indefinite-length strings) the strings are not escaped and you can return a std::string_view straight from the document (and even a const char* for BSON, as it embeds a NUL character).

Some encodings like CBOR are also more expressive than JSON, allowing for example any value type to be used for map keys and not just strings.

jincongho•11h ago

Parquet file format writes its metadata including length info after all data, at the footer. It was counterintuitive when I first look at it, but smart thinking about it now. I haven't had to trade off for memory constraints, but being able to stream output is definitely easier!

Interesting point about the difference in escape characters, I stored length and the decoded value so it's ready for string view. But when I need them back as JSON string, I need to encode them again :)

GollyStream – Real-time Ethereum events at 1,265 ops/s with minipass

U.S. National Park Finder

Open-source DIY radar that's 95% cheaper than $250k commercial offerings

Show HN: Claude Code Rust – a native Rust TUI that avoids the V8 heap OOM

Reconstructing a Dead USB Protocol: From Unknown Chip to Working Implementation

In the UK, EVs are cheaper than petrol cars, thanks to Chinese competition

Crypto Hack Worth $290M Triggers DeFi Contagion Shock

The Work Runs on Different Maps

Memjar: Uncompromising, local-first second brain

Tim Davis – Probabilistic engineering and the 24-7 employee

2005 PS2 Game Returning with New Release on PS5, PS4

The framework built by a father in 2006 dominated the internet

An Amish Paradox: Diversity and Change in the Largest Amish Community

LlaMa.cpp Robot Wars

Performance • tldraw Docs

Ask HN: Is your Mac's menu bar throwing errors?

SitTall – a macOS app that uses AirPods sensors to detect slouching

Anatomy of High-Performance Matrix Multiplication (2008) [pdf]

Visual Studio Code Agents App (Preview)

Hero Engineering: In Defense of Unreasonable Optimizations

Edit store price tags using Flipper Zero

Why Musicians Are Manufacturing Sold-Out Shows

A Private Recommendation System I Can Control

Ask HN: How did you land your first projects as a solo engineer/consultant?

Waterloo’s live AI-goose tracker. Never get ambushed by a cobra chicken again.

Public grocery stores are having a moment. Can they make food more affordable?

Ask HN: Should I build *another* Markdown task manager?

Show HN: AI Primer – A Searchable AI Changelog for AI Engineers and Creatives

Deleteduser.com –A $15 PII Magnet

Shyell – a Rust shell with built-in benchmarking and project-aware prompts