https://cborbook.com/part_1/practical_introduction_to_cbor.h...
Edit: BSON seems to contain more data types than JSON, and as such it is more complex, whereas CBOR doesn't add to JSON's existing structure.
https://www.iana.org/assignments/cbor-tags/cbor-tags.xhtml
This is, for example, used by IPLD (https://ipld.io) to express references between objects through native types (https://github.com/ipld/cid-cbor/).
- `null`
- `"hello"`
- `[1,2,NaN]`
Additionally, BSON will just tell you what the type of a field is. JSON requires inferring it.
Edit: OK, actually there is a separate page for alternatives: https://cborbook.com/introduction/cbor_vs_the_other_guys.htm...
I honestly wonder sometimes if it's held back by the name— I love the campiness of it, but I feel like it could be a barrier to being taken seriously in some environments.
Working as intended. ;)
It's entirely possible that this will change in the future: like maybe we'll decide that Cap'n Proto should be a public-facing part of the Cloudflare Workers platform (rather than just an implementation detail, as it is today), in which case adoption would then benefit Workers and thus me. At the moment, though, that's not the plan.
In any case, if there's some company that fancies themselves Too Serious to use a technology with such a silly name and web site, I am perfectly happy for them to not use it! :)
I think for me it would be less about locking out stodgy, boring companies, and perhaps instead it being an issue for emerging platforms that are themselves concerned with the optics of being "taken seriously". I'm specifically in the robotics space, and over the past ten years ROS has been basically rewritten to be based around DDS, and I know during the evaluation process for that there were prototypes kicked around that would have been more webtech type stuff, things like 0mq, protobufs, etc. In the end the decision for DDS was made on technical merits, but I still suspect that it being a thing that had preexisting traction in aerospace and especially NASA influenced that.
There's also nothing stopping you from serializing unstructured data using an array of key/value structs, with a union for the value to allow for different value types (int/float/string/object/etc), although it probably wouldn't be as efficient as something like CBOR for that purpose. It could make sense if most of the data is well-defined but you want to add additional properties/metadata.
Many languages take unstructured data like JSON and parse them into a strongly-typed class (throwing validation errors if it doesn't map correctly) anyways, so having a predefined schema is not entirely a bad thing. It does make you think a bit harder about backwards-compatibility and versioning. It also probably works better when you own the code for both the sender and receiver, rather than for a format that anyone can use.
Finally, maybe not a practical thing and something that I've never seen used in practice: in theory you could send a copy of the schema definition as a preamble to the data. If you're sending 10000 records and they all have the same fields in the same order, why waste bits/bytes tagging the key name and type for every record, when you could send a header describing the struct layout. Or if it's a large schema, you could request it from the server on demand, using an id/version/hash to check if you already have it.
In practice though, 1) you probably need to map the unknown/foreign schema into your own objects anyways, and 2) most people would just zlib compress the stream to get rid of repeated key names and call it a day. But the optimizer in me says why burn all those CPU cycles decompressing and decoding the same field names over and over. CBOR could have easily added optional support for a dictionary of key strings to the header, for applications where the keys are known ahead of time, for example. (My guess is that they didn't because it would be harder for extremely-resource-constrained microcontrollers to implement).
CBOR isn't special here, similar incentives could apply to just about any format - but JSON for example is already so ubiquitous that nobody needs to promote it.
it's not hard, it's exactly like creating your own text format but you write binary data instead of text, and you can't read it with your eyes right away (but you can after you've looked at enough of it.) there is nothing to fear or to even worry about; just try it. look up how things like TLV work on wikipedia. you can do just about anything you would ever need with plain binary TLV and it's gonna perform like you wouldn't believe.
https://en.wikipedia.org/wiki/Type%E2%80%93length%E2%80%93va...
binary formats are always going to be 1-2 orders of magnitude faster than plain text formats, no matter which plain text format you're using. writing a viewer so you can easily read the data isn't zero-effort like it is for JSON or XML where any existing text editor will do, but it's not exactly hard, either. your binary format reading code is the core of what that viewer would be.
once you write and use your own binary format, existing binary formats you come across become a lot less opaque, and it starts to feel like you're developing a mild superpower.
If you did mean for production use, I assume you also implement your own encryption, encoding schemes and everything else?
no i don't write my own encoding or encryption.
why the hell would anyone use json for everything, and why would someone who doesn't do that earn your derision?
I think most people would go with something standard and documented. If you work in a team it helps if you can hire people that are familiar with tech or can read up on it easily.
And in general, unless you can show that your formatter is an actual hot path in need of optimization, you've just added another piece of code in need of care and feeding for no real gain.
Most devs/applications are fine with protobuf or even Json performance. And solving that problem is not something they can or should do.
If you write something like that just to prove a point, good for you. Also I would never want to be on the same team
With small code size it beats also BSON, EBML and others.
I haven't used it, but I thought that was the big claim.
Cap'n Proto serialization can be a huge win in terms of compute if you are communicating using shared memory or reading huge mmaped files, especially if the reader only cares to read some random subset of the message but not the whole thing.
But in the common use case of sending messages over a network, Cap'n Proto probably isn't a huge difference. Pushing the message through a socket is still O(n), and the benefits of compression might outweigh the CPU cost. (Though at least with Cap'n Proto, you have the option to skip compression. Most formats have some amount of compression baked into the serialization itself.)
Note that benchmarks vary wildly depending on the use case and the type of data being sent, so it's not really possible to say "Well it's N% faster"... it really depends. Sometimes Protobuf wins! You have to test your use case. But most people don't have time to build their code both ways to compare.
I actually think Cap'n Proto's biggest wins are in the RPC system, not the serialization. But these wins are much harder to explain, because it's not about speed, but instead expressiveness. It's really hard to understand the benefits of using a more expressive language until you've really tried it.
(I'm the author of Cap'n Proto.)
your server can do this natively for live data. your browser can decompress natively. and ++human-readable. if you're one of those that doesn't want the user to read the data, then maybe CBOR is attractive??? but why would you send data down the wire that you don't want the user to see? isn't the point of sending the data to the client is so the client can display that data?
I guess i'm just shouting at the clouds :D
SEQUENCE {
SEQUENCE {
OBJECT IDENTIFIER '1 2 840 113549 1 1 1'
NULL
}
BIT STRING 0 unused bits, encapsulates {
SEQUENCE {
INTEGER
00 EB 11 E7 B4 46 2E 09 BB 3F 90 7E 25 98 BA 2F
C4 F5 41 92 5D AB BF D8 FF 0B 8E 74 C3 F1 5E 14
9E 7F B6 14 06 55 18 4D E4 2F 6D DB CD EA 14 2D
8B F8 3D E9 5E 07 78 1F 98 98 83 24 E2 94 DC DB
39 2F 82 89 01 45 07 8C 5C 03 79 BB 74 34 FF AC
04 AD 15 29 E4 C0 4C BD 98 AF F4 B7 6D 3F F1 87
2F B5 C6 D8 F8 46 47 55 ED F5 71 4E 7E 7A 2D BE
2E 75 49 F0 BB 12 B8 57 96 F9 3D D3 8A 8F FF 97
73
INTEGER 65537
}
}
}
or this: (public-key
(rsa
(e 65537)
(n
165071726774300746220448927123206364028774814791758998398858897954156302007761692873754545479643969345816518330759318956949640997453881810518810470402537189804357876129675511237354284731082047260695951082386841026898616038200651610616199959087780217655249147161066729973643243611871694748249209548180369151859)))
I know that I’d prefer the latter. Yes, we could debate whether the big integer should be a Base64-encoded binary integer or not, but regardless writing a parser for the former is significantly more work.And let’s not even get started with DER/BER/PEM and all that insanity. Just give me text!
The BER/PER are binary formats and great where binary formats are needed. You also have XER (XML) and JER (JSON) if you want text. You can create an s-expr encoding if you want.
Separate ASN.1--the data model from ASN.1--the abstract syntax notation (what you wrote) from ASN.1's encoding formats.
[1] https://www.itu.int/en/ITU-T/asn1/Pages/asn1_project.aspx
> Significantly smaller than JSON without complex compression
Although compression of JSON could be considered complex, it's also extremely simple in that it's widely used and usually performed in a distinct step - often transparently to a user. Gzip, and increasingly zstd are widely used.
I'd be interested to see a comparison between compressed JSON and CBOR, I'm quite surprised that this hasn't been included.
Why? That goes against the narrative of promoting one over the other. Nissan doesn't advertise that a Toyota has something they don't. They just pretend it doesn't exist.
CBOR – Concise Binary Object Representation - https://news.ycombinator.com/item?id=20603378 - Aug 2019 (71 comments)
Begrudgingly Choosing CBOR over MessagePack - https://news.ycombinator.com/item?id=43229259 - Mar 2025 (78 comments)
Structured data that, by nesting, pleases the human eye, reduced to the max in a key-value fashion, pure minimalism.
And while you have to write type converters all the time for datetime, BLOBs etc., these converters are the real reasons why JSON is so useful: every OS or framework provides the heavy lifting for it.
So any elaborated new silver bullet would require solving the converter/mapper problem, which it can't.
And you can complain or explain with JSON: "Comments not a feature?! WTF!" - Add a field with the key "comment"
Some smart guys went the extra mile and nevertheless demanded more, because wouldn't it be nice to have some sort of "strict JSON"? JSON schema was born.
And here you can visibly experience the inner conflict of "on the one hand" vs "on the other hand". Applying schemas to JSON is a good cause and reasonable, but guess what happens to JSON? It looks like unreadable bloat, which means XML.
Extensibility is fine, basic operations appeal to both demands, simple and sophisticated, and don't impose the sophistication on you just for a simple 3-field exchange about dog food preferences.
{“x”: NaN}
valid JSON? How about 9007199254740993? Or -.053? If so, will that text round trip through your JSON library without loss of precision? Is that desirable if it does?Basically I think formats with syntax typed primitives always run into this problem: even if the encoder and decoder are consistent with each other about what the values are, the receiver still has to decide whether it can use the result. This after all is the main benefit of a library like Pydantic. But if we’re doing all this work to make sure the object is correct, we know what the value types are supposed to be on the receiving end, so why are we making a needlessly complex decoder guess for us?
JSON was originally parsed in javascript with eval() which allowed many things that aren't JSON through, but that doesn't make JSON more complex.
Edit: I didn’t see my thought all the way through here. Syntax typing invites this kind of nonconformity, because different programming languages mean different things by “number,” “string,” “date,” or even “null.” They will bend the format to match their own semantics, resulting in incompatibility.
JSON numbers have unlimited range in terms of the format standard, but implementations are explicitly permitted to set limits on the range and precision they generate and handle, and users are warned that:
[...] Since software that implements IEEE 754 binary64 (double precision)
numbers is generally available and widely used, good interoperability can be
achieved by implementations that expect no more precision or range than these
provide, in the sense that implementations will approximate JSON
numbers within the expected precision.
Also, you don't need an int to represent it (a wide enough int will represent it, so will unlimited precision decimals, wide enough binary floats -- of standard formats, IEEE 754 binary128 works -- etc.).Mostly, I just want to offer a gentle critique of this book's comparison with MessagePack [0].
> Encoding Details: CBOR supports indefinite-length arrays and maps (beneficial for streaming when total size is unknown), while MessagePack typically requires fixed collection counts.
This refers to CBOR's indefinite length types, but awkwardly, streaming is a protocol level feature, not a data format level feature. As a result, there's many better options, ranging from "use HTTP" to "simply send more than 1 message". Crucially, CBOR provides no facility for re-syncing a stream in the event of an error, whether that's network or simply a bad encoding. "More features" is not necessarily better.
> Standardization: CBOR is a formal IETF standard (RFC 8949) developed through consensus, whereas MessagePack uses a community-maintained specification. Many view CBOR as a more rigorous standard inspired by MessagePack.
Well, CBOR is MessagePack. Carsten Bormann forked MessagePack, changed some of the tag values, wrote a standard around it, and submitted it to the IETF against the wishes of MessagePack's creators.
> Extensibility: CBOR employs a standardized semantic tag system with an IANA registry for extended types (dates, URIs, bignums). MessagePack uses a simpler but less structured ext type where applications define tag meanings.
Warning: I have a big rant about the tag registry.
The facilities are the same (well, the tag is 8 bytes instead of 1 byte, but w/e); it's TLV all the way down (Bormann ripped this also). Bormann's contribution is the registry, which is bonkers [1]. There's... dozens of extensions there? Hundreds? No CBOR implementation supports anywhere near all this stuff. "Universal Geographical Area Description (GAD) description of velocity"? "ur:request, Transaction Request identifier"?
The registry isn't useful. Here are the possible scenarios:
If something is in high demand and has good support across platforms, then it's a no-brainer to reserve a tag. MP does this with timestamps.
If something is in high demand, but doesn't have good support across platforms, then you're putting extra burden on those platforms. Ex: it's not great if my tiny microcontroller now has to support bignums or 128-bit UUIDs. Maybe you do that, or you make them optional, but that leads us to...
If something isn't in high demand or can't easily be supported across platforms, but you want support for it anyway, there's no need to tell anyone else you're using that thing. You can just use it. That's MP's ext types.
CBOR seems to imagine that there's a hypothetical general purpose decoder out there that you can point to any CBOR API, but there isn't and there never will be. Nothing will support both "Used to mark pointers in PSA Crypto API IPC implementation" and "PlatformV_HAS_PROPERTY" (I just cannot get over this stuff). There is no world where you tell the IETF about your tags, define an API with them, and someone completely independently builds a decoder for them. It will always be a person who cares about your specific tags, in which case, why not just agree on the ext types ahead of time? A COSE decoder doesn't need also need to decode a "RAINS Message".
> Performance and Size: Comparisons vary by implementation and data. CBOR prioritizes small codec size (for constrained devices) alongside message compactness, while MessagePack focuses primarily on message size and speed.
I can't say I fully understand what this means, but CBOR and MP are equivalent here, because CBOR is MP.
> Conceptual Simplicity: MessagePack's shorter specification appears simpler, but CBOR's unification of types under its major type/additional info system and tag mechanism offers conceptual clarity.
Even if there's some subjectivity around "conceptual simplicity/clarity", again CBOR and MP are equivalent here because they're functionally the same format.
---
I have some notes about the blurb above too:
> MessagePack delivers greater efficiency than JSON
I think it's probably true that the fastest JSON encoders/decoders are faster than the fastest MP encoders/decoders. Not that JSON performance has a higher ceiling, but it's got gazillions of engineering hours poured into it, and rightly so. JSON is also usually compressed, so space benefits only matter at the perimeters. I'm not saying there's no case for MP/CBOR/etc., just that the efficienty/etc. gap is a lot smaller than one would predict.
> However, MessagePack sacrifices human-readability
This, of course, applies to CBOR as well.
> ext mechanism provides less structure than CBOR's IANA-registered tags
Again the mechanism is the same, only the registry is different.
[0]: https://cborbook.com/introduction/cbor_vs_the_other_guys.htm...
[1]: https://www.iana.org/assignments/cbor-tags/cbor-tags.xhtml
Indeed. I recall that tnetstrings were intentionally made non-streamable to discourage people from trying to do so: "If you need to send 1000 DVDs, don't try to encode them in 1 tnetstring payload, instead send them as a sequence of tnetstrings as payload chunks with checks and headers like most other protocols"
> Warning: I have a big rant about the tag registry. > ...
I completely agree with your rant w.r.t. automated decoding. However, a global tag registry can still potentially be useful in that, given CBOR encoded data with a tag that my decoder doesn't support, it may be easier for a human to infer the intended meaning. Some types may be very obvious, others less so.
e.g. Standardized MIME types are useful even if no application supports every one of them.
https://www.erlang.org/doc/apps/asn1/asn1_getting_started.ht...
https://www2.erlang.org/documentation/doc-14/lib/asn1-5.1/do... (https://www2.erlang.org/documentation/doc-14/lib/asn1-5.1/do...)
I am using ASN.1 to communicate between a client (Java / Kotlin) and server (Erlang / Elixir), but unfortunately Java / Kotlin has somewhat of a shitty support for ASN.1 in comparison to Erlang.
1: https://cborbook.com/part_1/practical_introduction_to_cbor.h...
2: i.e. 1.220703125×2¹³
Begrudgingly Choosing CBOR over MessagePack - https://news.ycombinator.com/item?id=43229259 - March 2025 (78 comments)
MessagePack vs. CBOR (RFC7049) - https://news.ycombinator.com/item?id=23838565 - July 2020 (2 comments)
CBOR – Concise Binary Object Representation - https://news.ycombinator.com/item?id=20603378 - Aug 2019 (71 comments)
CBOR – Concise Binary Object Representation - https://news.ycombinator.com/item?id=10995726 - Jan 2016 (36 comments)
Libcbor – CBOR implementation for C and others - https://news.ycombinator.com/item?id=9597198 - May 2015 (5 comments)
CBOR – A new object encoding format - https://news.ycombinator.com/item?id=6932089 - Dec 2013 (9 comments)
RFC 7049 - Concise Binary Object Representation (CBOR) - https://news.ycombinator.com/item?id=6632576 - Oct 2013 (52 comments)
To do Passkey-verification server-side, I had to implement a pure-SQL/PLpgSQL CBOR parser, out of fear that a C-implementation could crash the PostgreSQL server: https://github.com/truthly/pg-cbor
https://www.schnada.de/grapt/eriknaggum-xmlrant.html
We're going to have to think up something worse for CBOR.
fjfaase•20h ago