Protobuf has advantages, but its strict schema requirement leaves it missing support for a ton of use cases where JSON thrives.
A much stronger argument could be made for CBOR as a JSON replacement for most use cases: CBOR has the same schema flexibility as JSON but a more concise binary encoding.
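A quick Python sketch of the size difference (this assumes the third-party `cbor2` package; the exact byte counts will vary with the payload):

```python
import json

import cbor2  # third-party: pip install cbor2

record = {"id": 12345, "name": "alice", "active": True, "scores": [9.5, 7.25]}

json_bytes = json.dumps(record, separators=(",", ":")).encode("utf-8")
cbor_bytes = cbor2.dumps(record)

# CBOR encodes keys and values with compact binary headers instead of
# quotes, colons, and ASCII digits, so it's typically smaller.
print(len(json_bytes), len(cbor_bytes))

# And it round-trips without a schema, just like JSON:
assert cbor2.loads(cbor_bytes) == record
```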
Technically it sounds really good, but the actual act of managing it is hell. That, or I need a lot of practice to use them; at that point, shouldn't I just use JSON and get on with my life?
If your servers and clients deploy at different times, and are thus compiled with different versions of your specs, then many safety bets are off.
There are ways to be mostly safe (never reuse field IDs, use unknown-field-friendly copying methods, etc.), but distributed systems are distributed systems, and protobuf isn't a silver bullet that solves every problem on the author's list.
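On the "never reuse IDs" point, proto does have a `reserved` keyword for exactly this; a minimal illustration (message and field names here are made up):

```proto
syntax = "proto3";

message PlayerState {
  // Field 2 used to be `int32 hp = 2;`. Reserving the retired ID and
  // name makes protoc reject any future redefinition, so an old binary
  // can't silently misread a new field as the deleted one.
  reserved 2;
  reserved "hp";

  string name = 1;
  int32 level = 3;
}
```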
On the upside, it seems like protobuf3 fixed a lot of stuff I used to hate about protobuf2. Issues like:
> if the field is not a message, it has two states:
> - ...
> - the field is set to the default (zero) value. It will not be serialized to the wire. In fact, you cannot determine whether the default (zero) value was set or parsed from the wire or not provided at all
are now gone if you stick to protobuf3 plus the `message` keyword. That's really cool.
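For anyone who hasn't hit this: a sketch of the distinction in proto3 (field names invented):

```proto
syntax = "proto3";

import "google/protobuf/wrappers.proto";

message Settings {
  // No presence tracking: 0 and "never set" are indistinguishable,
  // and the zero value is not serialized to the wire.
  int32 timeout_ms = 1;

  // Message-typed fields always track presence; this is the
  // `message` keyword trick. has_retries() distinguishes an
  // explicit 0 from "never set".
  google.protobuf.Int32Value retries = 2;

  // Since protoc 3.15, proto3 also allows explicit presence directly:
  optional int32 max_conns = 3;
}
```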
We might disagree on what "efficient" means. OP is focusing on computer efficiency, whereas, as you'll see, I tend to optimize for human efficiency (and, let's be clear, JSON is efficient _enough_ for 99% of computer cases).
I think the "human readable" part is an often-overlooked pro by hardcore protobuf fans. One of my fundamental philosophies of engineering has historically been "clarity over cleverness." Perhaps the corollary is "...and simplicity over complexity." And I think protobuf, generally speaking, falls on the cleverness side, and certainly on the complexity side (with regard to dependencies).
JSON, on the other hand, is ubiquitous, human readable (clear), and simple (little-to-no dependencies).
I've found in my career that there's tremendous value in not needing to execute code to see what a payload contains. I've seen a lot of engineers (including myself, once upon a time!) take shortcuts like bitwise values and protobufs to make things faster or to be clever. And then I've seen those same engineers, or their successors, struggle to navigate years-old protobufs, when a JSON payload is immediately clear and understandable to any human, technical or not, at a glance.
I write MUDs for fun, and one of the things older MUD codebases do is use bit flags to compress a lot of information into a tiny integer. To know what conditions a player has (hunger, thirst, cursed, etc.), you do some bit manipulation and wind up with something like 31, which represents the player being thirsty (1), hungry (2), cursed (4), with haste (8), and with shield (16). Which is great if you're optimizing for integer compression, but really bad when you want a human to look at it. You have to do a bunch of math to de-compress that integer into something meaningful for humans, something like this in Python (flag names taken from the example above):
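```python
# Hypothetical flag values matching the example above.
CONDITIONS = {
    1: "thirsty",
    2: "hungry",
    4: "cursed",
    8: "haste",
    16: "shield",
}

def decode_conditions(flags: int) -> list[str]:
    """De-compress a packed flag integer into human-readable names."""
    return [name for bit, name in CONDITIONS.items() if flags & bit]

print(decode_conditions(31))  # ['thirsty', 'hungry', 'cursed', 'haste', 'shield']
```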
Similarly with protobuf, I find that it usually optimizes for the wrong thing. To be clear, one of my other fundamental philosophies about engineering is that performance is king and that you should try to make things fast, but there are certainly diminishing returns, especially in codebases where humans interact frequently with the data. Protobufs make things fast at a cost, and that cost is typically clarity and human readability. Versioning also creates more friction. I've seen teams spend an inordinate amount of effort trying to ensure that both the producer and consumer are using the same versions.
This is not to say that protobufs are useless. They're great for enforcing API contracts at the code level, and they provide the speed improvements OP mentions. There are certain high-throughput use cases where this complexity and relative opaqueness is not only an acceptable trade-off but the right one to make. But I've found that those cases aren't particularly common, and people reaching for protobufs are often optimizing for the wrong things. Again: clarity over cleverness, and simplicity over complexity.
I know one of the arguments is "it's better for situations where you control both sides," but if you're in any kind of team with more than a couple of engineers, this stops being true. Even if your internal API is controlled by "us," that "us" can sometimes span 100+ engineers, and you might as well consider it a public API.
I'm not a protobuf hater; I just think the vast majority of engineers could go through their careers without ever touching protobufs, never miss them, never need them, and never find themselves in a situation where eking out that extra performance is truly worth the hassle.
spagoop•19m ago
There is a really interesting discussion underneath this about the limitations of JSON and potential alternatives, but I can't help but distrust this writing because of how much it sounds like an LLM.
port11•15m ago
Seems like the author just wanted to talk about Protobuf without bothering too much about the issues with JSON (though some are mentioned).
dkdcio•13m ago
I promise you cannot tell LLM-generated content from non-LLM generated content. what you think you’re detecting is poor quality, which is orthogonal to the tooling used
spagoop•1m ago
I am not dismissing this as slop, and I actually have no beef with using LLMs to write, but yes, as you call out, I think it's just poorly written, or perhaps I'm not the specific audience for it.
Sorry if this is bad energy, I appreciate the write up regardless.