Lite^3, a JSON-Compatible Zero-Copy Serialization Format

https://github.com/fastserial/lite3
52•cryptonector•6d ago•11 comments

Comments

cryptonector•6d ago
Lite^3 is a clever encoding for JSON data that is indexed as-encoded and is mutable in place.
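
As a rough illustration of "indexed as-encoded and mutable in place", here is a minimal Python sketch of the general idea. It is not Lite^3's actual layout. A hypothetical record keeps its fields at fixed offsets in a byte buffer, so a reader can read or overwrite a field directly in the buffer without first decoding it into objects. The offsets, field names, and functions are all assumptions made for illustration. Lite^3 itself also handles variable-size values and a B-tree index, which this sketch leaves out.

```python
import struct

# Hypothetical fixed layout for illustration only (not Lite^3's format):
#   bytes 0-3: uint32 "id", bytes 4-11: float64 "score", little-endian.
ID_OFFSET = 0
SCORE_OFFSET = 4
RECORD_SIZE = 12


def make_record(rec_id: int, score: float) -> bytearray:
    buf = bytearray(RECORD_SIZE)
    struct.pack_into("<I", buf, ID_OFFSET, rec_id)
    struct.pack_into("<d", buf, SCORE_OFFSET, score)
    return buf


def read_score(buf: bytearray) -> float:
    # Read the field straight out of the encoded bytes; nothing is parsed into objects first.
    return struct.unpack_from("<d", buf, SCORE_OFFSET)[0]


def set_score(buf: bytearray, score: float) -> None:
    # Overwrite the fixed-width field in place; the rest of the buffer is untouched.
    struct.pack_into("<d", buf, SCORE_OFFSET, score)


buf = make_record(7, 1.5)
set_score(buf, 2.25)
print(read_score(buf))  # 2.25
```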

Perhaps I should have posted this URI instead: https://lite3.io/design_and_limitations.html

Lite^3 deserves to be noticed by HN. u/eliasdejong (the author) posted it 23 days ago, but it didn't get very far. I'm hoping this time it gets noticed.

eric-p7•3h ago
This needs more attention than it's getting. Perhaps making some changes to the landing pages would help?

"outperforms the fastest JSON libraries (that make use of SIMD) by up to 120x depending on the benchmark. It also outperforms schema-only formats, such as Google Flatbuffers (242x). Lite³ is possibly the fastest schemaless data format in the world."

^ This should be a bar graph at the top of the page that shows both serialized sizes and speeds.

It would also be nice to see a JSON representation on the left and a color-coded string of bytes on the right that shows how the data is packed.

Then the explanation follows.

Someone•50m ago
FTA#1: “Hashmaps do not (efficiently) support range queries. Since the keys are stored in pseudorandom order”

FTA#2: “Object keys (think JSON) are hashed to a 4-byte digest and stored inside B-tree nodes”

It will likely still be faster because of better cache locality, but doesn’t that mean this also does not (efficiently) support range queries?
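
A minimal sketch of the range-query concern. It assumes the classic DJB2 variant and flattens the B-tree into one list sorted by 32-bit digest. Both are simplifications, and the names are hypothetical. Because entries are ordered by digest rather than by key string, a query for keys in [lo, hi) cannot jump to a starting point and has to scan every entry.

```python
def djb2(key: str) -> int:
    # Classic DJB2 (h = h * 33 + c, seeded with 5381), kept to 32 bits.
    h = 5381
    for c in key.encode("utf-8"):
        h = (h * 33 + c) & 0xFFFFFFFF
    return h


def build_index(obj: dict) -> list:
    # Entries sorted by 32-bit digest, standing in for the key order of a
    # hash-keyed B-tree (flattened to one sorted list for brevity).
    return sorted((djb2(k), k, v) for k, v in obj.items())


def range_query(index: list, lo: str, hi: str) -> dict:
    # Digest order says nothing about string order, so keys in [lo, hi) can sit
    # anywhere in the index: every entry has to be visited.
    return {k: v for _, k, v in index if lo <= k < hi}


index = build_index({"apple": 1, "banana": 2, "cherry": 3, "date": 4})
print(range_query(index, "b", "d"))
```

A B-tree ordered by the key strings themselves could instead binary-search to `lo` and stop at `hi`.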

That page also says

“tree traversal inside the critical path can be satisfied entirely using fixed 4-byte word comparisons, never actually requiring string comparisons except for detection of hash collisions. This design choice alone contributes to much of the runtime performance of Lite³.”

How can that be true, given that this beats, by a large margin, libraries that use hash maps, which also rarely require string comparisons?
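
The mechanism the quoted design note describes can be sketched as follows. The sketch again assumes classic DJB2 and a single sorted node, and its names are hypothetical. The search compares only 32-bit integers. A string comparison happens once, on a digest match, to rule out a collision. This illustrates the idea but does not answer the performance question, which depends on memory layout and cache behavior that Python cannot show. The sketch also assumes no two stored keys share a digest.

```python
import bisect


def djb2(key: str) -> int:
    # Classic DJB2 (h = h * 33 + c, seeded with 5381), kept to 32 bits.
    h = 5381
    for c in key.encode("utf-8"):
        h = (h * 33 + c) & 0xFFFFFFFF
    return h


def build_node(obj: dict) -> tuple:
    # One sorted "node": a list of 32-bit digests plus the matching (key, value) pairs.
    entries = sorted((djb2(k), k, v) for k, v in obj.items())
    digests = [d for d, _, _ in entries]
    pairs = [(k, v) for _, k, v in entries]
    return digests, pairs


def lookup(node: tuple, key: str):
    digests, pairs = node
    d = djb2(key)
    i = bisect.bisect_left(digests, d)  # the search itself compares only integers
    if i < len(digests) and digests[i] == d:
        stored_key, value = pairs[i]
        if stored_key == key:  # one string comparison, to rule out a hash collision
            return value
    return None


node = build_node({"id": 1, "name": "x", "tags": []})
print(lookup(node, "name"))  # x
print(lookup(node, "jC"))    # None: same digest as "id", rejected by the key check
```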

Finally, https://lite3.io/design_and_limitations.html#autotoc_md37 says:

“Inserting a colliding key will not corrupt your data or have side effects. It will simply fail to insert.”

I also notice this uses the DJB2 hash function, which has hash collisions between short strings (http://dmytry.blogspot.com/2009/11/horrible-hashes.html), and those are more likely to be present in json documents. You get about 8 + 3 × 5 = 23 bits of hash for four-character strings, for example, increasing the risk of collisions to, ballpark, about one in three thousand.
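
The short-string weakness is easy to check, assuming the classic DJB2 variant (seed 5381, h = h × 33 + c, truncated to 32 bits). Whether Lite^3 uses exactly this variant is not verified here. For a two-byte key the hash is 5381 × 33² + 33·c1 + c2, so raising the first byte by one and lowering the second by 33 gives the same digest. The last lines reproduce the rough bit count for four-byte keys given above.

```python
import math


def djb2(key: bytes) -> int:
    # Classic DJB2: start at 5381, then h = h * 33 + c for each byte, kept to 32 bits.
    h = 5381
    for c in key:
        h = (h * 33 + c) & 0xFFFFFFFF
    return h


# A two-byte key hashes to 5381 * 33**2 + 33 * c1 + c2, so bumping the first byte
# by one and lowering the second by 33 produces an identical digest.
colliding_pairs = [(b"Aa", b"B@"), (b"id", b"jC")]
for a, b in colliding_pairs:
    print(a, b, djb2(a) == djb2(b))  # True for both pairs

# Four-byte keys: the byte-dependent part of the hash is
# c1 * 33**3 + c2 * 33**2 + c3 * 33 + c4, whose largest value with bytes up to 255 is:
spread = 255 * sum(33**i for i in range(4))
print(spread, round(math.log2(spread), 2))  # 9450300 23.17
```

Real JSON keys use far fewer than 256 distinct byte values, so the effective spread is smaller still.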

=> I think that needs fixing before this can be widely used.

nneonneo•9m ago
Looking at the actual code (https://github.com/fastserial/lite3/blob/main/src/lite3.c#L2...), it seems like it performs up to 128 probes to find a target before failing, rather than bailing immediately if a collision is detected. It seems like maybe the documentation needs to be updated?
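
A sketch of the behavior described here, which retries up to 128 probes instead of failing on the first collision. The 128 bound comes from the comment above. The probe sequence (linear probing on the digest, digest + i) and all names are assumptions for illustration; the sketch does not claim this is what lite3.c does. With it, two keys whose DJB2 digests collide can both be stored.

```python
MAX_PROBES = 128  # the bound reported above; the probe sequence below is an assumption


def djb2(key: str) -> int:
    # Classic DJB2 (h = h * 33 + c, seeded with 5381), kept to 32 bits.
    h = 5381
    for c in key.encode("utf-8"):
        h = (h * 33 + c) & 0xFFFFFFFF
    return h


class ProbingIndex:
    """Hypothetical digest-keyed table. Linear probing (digest + i) is used purely
    for illustration; it is not claimed to be what lite3.c does."""

    def __init__(self):
        self.slots = {}  # 32-bit digest -> (key, value)

    def _probe(self, key: str):
        h = djb2(key)
        for i in range(MAX_PROBES):
            yield (h + i) & 0xFFFFFFFF

    def insert(self, key: str, value) -> bool:
        for d in self._probe(key):
            slot = self.slots.get(d)
            if slot is None or slot[0] == key:  # free slot, or update of the same key
                self.slots[d] = (key, value)
                return True
        return False  # all probes landed on other keys: the insert fails

    def get(self, key: str):
        for d in self._probe(key):
            slot = self.slots.get(d)
            if slot is None:
                return None  # no deletions in this sketch, so an empty slot ends the search
            if slot[0] == key:
                return slot[1]
        return None


idx = ProbingIndex()
print(idx.insert("id", 1), idx.insert("jC", 2))  # True True, although the digests collide
print(idx.get("id"), idx.get("jC"))              # 1 2
```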

It's a bit unfortunate that the wire format is tied to a specific hash function. It also means the spec will ossify around that hash function, which may not end up being the optimal choice. Neither JSON nor Protobuf has this limitation. One way around this would be to ditch the hashing and use the keys for the B-tree directly. It might be worth benchmarking: I don't think it's necessarily any slower, and an inline cache of key prefixes (basically a cheapo hash using the first N chars) should help preserve performance for common cases.
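
A minimal sketch of the "inline cache of key prefixes" idea, with N = 4 chosen arbitrarily and hypothetical names. Packing the zero-padded first N bytes into a big-endian integer makes integer order match byte-string order. Most comparisons are therefore settled by one integer compare, and ties fall back to a full comparison. Unlike a hash, the resulting order also supports range queries.

```python
import bisect
from functools import cmp_to_key

PREFIX_LEN = 4  # "the first N chars"; N = 4 here is an arbitrary choice


def prefix_word(key: bytes) -> int:
    # Zero-pad to N bytes and read big-endian, so comparing the integers orders
    # the prefixes exactly as comparing the bytes would.
    return int.from_bytes(key[:PREFIX_LEN].ljust(PREFIX_LEN, b"\x00"), "big")


def compare_keys(a: bytes, b: bytes) -> int:
    pa, pb = prefix_word(a), prefix_word(b)
    if pa != pb:
        return -1 if pa < pb else 1  # common case: settled by one integer comparison
    return (a > b) - (a < b)  # prefixes tie: fall back to a full byte comparison


keys = [b"name", b"id", b"names", b"nam", b"tags", b"a"]
ordered = sorted(keys, key=cmp_to_key(compare_keys))
print(ordered)  # [b'a', b'id', b'nam', b'name', b'names', b'tags'], same as sorted(keys)


def range_query(sorted_keys: list, lo: bytes, hi: bytes) -> list:
    # Because the order matches plain key order, a range query is two binary searches.
    return sorted_keys[bisect.bisect_left(sorted_keys, lo):bisect.bisect_left(sorted_keys, hi)]


print(range_query(ordered, b"n", b"o"))  # [b'nam', b'name', b'names']
```

In a real B-tree node the prefix word would be stored next to each key, so the common-case comparison never touches the key bytes.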

al2o3cr•6d ago
The docs mention that space for overwritten variable-sized values in the buffer is not reclaimed:

    The overridden space is never recovered, causing buffer size
    to grow indefinitely.

Is the garbage at least zeroed? Otherwise it seems like it could "leak" overwritten values when sending whole buffers via memcpy.

mjd•2h ago
“By default, deleted values are overwritten with NULL bytes (0x00). This is a safety feature since not doing so would leave 'deleted' entries intact inside the datastructure until they are overwritten by other values. If the user wishes to maximize performance at the cost of leaking deleted data, LITE3_ZERO_MEM_DELETED should be disabled.”
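
To make the trade-off in these two comments concrete, here is a hypothetical model that is not Lite^3's API. Values are appended to a growing buffer, an overwrite never reclaims the old bytes, and an optional `zero_deleted` flag scrubs them, in the spirit of LITE3_ZERO_MEM_DELETED. The quoted docs do not settle whether Lite^3 applies that zeroing to overwritten values as well as deleted ones; the sketch simply assumes it does when the flag is on.

```python
class AppendOnlyBuffer:
    """Hypothetical model, not Lite^3's API: variable-size values are appended, and
    overwriting a key never reclaims the old bytes. zero_deleted mimics the idea of
    LITE3_ZERO_MEM_DELETED for overwritten values."""

    def __init__(self, zero_deleted: bool = True):
        self.data = bytearray()
        self.slots = {}  # key -> (offset, length) of the live value
        self.zero_deleted = zero_deleted

    def set(self, key: str, value: bytes) -> None:
        old = self.slots.get(key)
        if old is not None and self.zero_deleted:
            off, n = old
            self.data[off:off + n] = bytes(n)  # scrub the stale value; its space is still not reused
        self.slots[key] = (len(self.data), len(value))
        self.data += value  # the buffer only ever grows

    def get(self, key: str) -> bytes:
        off, n = self.slots[key]
        return bytes(self.data[off:off + n])


scrubbed = AppendOnlyBuffer(zero_deleted=True)
leaky = AppendOnlyBuffer(zero_deleted=False)
for buf in (scrubbed, leaky):
    buf.set("token", b"secret-1")
    buf.set("token", b"s2")
print(len(scrubbed.data), b"secret" in scrubbed.data, b"secret" in leaky.data)  # 10 False True
```

Either way the buffer keeps the full 10 bytes. Zeroing only controls whether the stale value is still readable when the whole buffer is copied out.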

rixed•2h ago
So it's not really a serialization format; it's a compact, modifiable, untyped tree that one can therefore send to another machine with the same architecture, or deserialise into native, language-specific data structures.

Don't get me wrong, I find this type of data structures interesting and useful, but it's misleading to call it "serialization", unless my understanding is wrong.

koolala•1h ago
You have to encode the type of all the binary data. Does that make it serialization?
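
One way to see what "encoding the type" involves is a sketch in which each value carries a type tag and a fixed byte order, so any reader can decode it regardless of its own CPU. The tags and layout are hypothetical and are not Lite^3's format.

```python
import struct

# Hypothetical type tags, not Lite^3's actual encoding.
TAG_INT, TAG_FLOAT, TAG_STR = 1, 2, 3


def encode(value) -> bytes:
    # "<" pins little-endian byte order and disables padding, so the layout does
    # not depend on the machine that writes it.
    if isinstance(value, bool):
        raise TypeError("bool is left out of this sketch")
    if isinstance(value, int):
        return struct.pack("<Bq", TAG_INT, value)
    if isinstance(value, float):
        return struct.pack("<Bd", TAG_FLOAT, value)
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return struct.pack("<BI", TAG_STR, len(raw)) + raw
    raise TypeError(f"unsupported type: {type(value).__name__}")


def decode(buf: bytes, offset: int = 0):
    tag = buf[offset]  # the tag tells the reader how to interpret the bytes that follow
    if tag == TAG_INT:
        return struct.unpack_from("<q", buf, offset + 1)[0]
    if tag == TAG_FLOAT:
        return struct.unpack_from("<d", buf, offset + 1)[0]
    if tag == TAG_STR:
        (n,) = struct.unpack_from("<I", buf, offset + 1)
        start = offset + 5
        return buf[start:start + n].decode("utf-8")
    raise ValueError(f"unknown tag {tag}")


print(encode(42).hex())            # 012a00000000000000
print(decode(encode("héllo")))     # héllo
```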

koolala•1h ago
GLTF (or PLY) is like this too, isn't it? Is the main difference just the format of their headers? Just by reading the header you can parse the binary data. I'm surprised that BSON and the other binary JSON formats they list don't support reading the memory layout from a header.

lsb•1h ago
This is super interesting!

Apache Arrow is trying to do something similar, using Flatbuffers to serialize with zero-copy and zero-parse semantics, with an index structure built on top of that.

Would love to see comparisons with Arrow.

tarasglek•18m ago
The hash-collision limitation for keys is the most questionable part of the design. Usually that's handled by forcing key lookup to verify that what you looked up matches what you tried to look up. Avoiding that performance hit is probably doable with an extra table of conflicting hashes.
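
One possible reading of this suggestion, sketched with hypothetical names and classic DJB2. Every digest hit is verified against the stored key, and a key whose digest is already taken goes into a side table instead of failing. A set of known-conflicting digests means lookups for ordinary keys never touch the side table.

```python
def djb2(key: str) -> int:
    # Classic DJB2 (h = h * 33 + c, seeded with 5381), kept to 32 bits.
    h = 5381
    for c in key.encode("utf-8"):
        h = (h * 33 + c) & 0xFFFFFFFF
    return h


class VerifiedIndex:
    """Hypothetical sketch: a digest-keyed primary table whose hits are always
    verified against the stored key, plus a side table for keys whose digest is
    already taken by a different key."""

    def __init__(self):
        self.primary = {}       # digest -> (key, value)
        self.overflow = {}      # full key -> value, only for colliding keys
        self.colliding = set()  # digests known to have more than one key

    def insert(self, key: str, value) -> None:
        d = djb2(key)
        slot = self.primary.get(d)
        if slot is None or slot[0] == key:
            self.primary[d] = (key, value)
        else:
            self.colliding.add(d)
            self.overflow[key] = value

    def get(self, key: str):
        d = djb2(key)
        slot = self.primary.get(d)
        if slot is not None and slot[0] == key:  # verify: a digest match alone is not enough
            return slot[1]
        if d in self.colliding:  # only known-conflicting digests pay for the side table
            return self.overflow.get(key)
        return None


idx = VerifiedIndex()
idx.insert("id", 1)
idx.insert("jC", 2)  # same digest as "id": stored in the side table instead of failing
print(idx.get("id"), idx.get("jC"), idx.get("jD"))  # 1 2 None
```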
