Protobuf has advantages, but its strict schema requirement leaves it missing support for a ton of use cases where JSON thrives.
A much stronger argument could be made for CBOR as a JSON replacement for most use cases: CBOR has the same schema flexibility as JSON but a more concise binary encoding.
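A quick Python sketch of the size difference (this assumes the third-party `cbor2` package; the exact byte counts will vary with the payload):

```python
import json

import cbor2  # third-party: pip install cbor2

record = {"id": 12345, "name": "alice", "active": True, "scores": [9.5, 7.25]}

json_bytes = json.dumps(record, separators=(",", ":")).encode("utf-8")
cbor_bytes = cbor2.dumps(record)

# CBOR encodes keys and values with compact binary headers instead of
# quotes, colons, and ASCII digits, so it's typically smaller.
print(len(json_bytes), len(cbor_bytes))

# And it round-trips without a schema, just like JSON:
assert cbor2.loads(cbor_bytes) == record
```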
Technically it sounds really good, but the actual act of managing it is hell. That, or I need a lot of practice to use them; at that point, shouldn't I just use JSON and get on with my life?
If your servers and clients deploy at different times, and are thus compiled with different versions of your specs, then many safety bets are off.
There are ways to be mostly safe (never reuse field IDs, use unknown-field-friendly copying methods, etc.), but distributed systems are distributed systems, and protobuf isn't a silver bullet that solves every problem on the author's list.
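On the "never reuse IDs" point, proto does have a `reserved` keyword for exactly this; a minimal illustration (message and field names here are made up):

```proto
syntax = "proto3";

message PlayerState {
  // Field 2 used to be `int32 hp = 2;`. Reserving the retired ID and
  // name makes protoc reject any future redefinition, so an old binary
  // can't silently misread a new field as the deleted one.
  reserved 2;
  reserved "hp";

  string name = 1;
  int32 level = 3;
}
```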
On the upside, it seems like protobuf3 fixed a lot of stuff I used to hate about protobuf2. Issues like:
> if the field is not a message, it has two states:
> - ...
> - the field is set to the default (zero) value. It will not be serialized to the wire. In fact, you cannot determine whether the default (zero) value was set or parsed from the wire or not provided at all
are now gone if you stick to protobuf3 plus the `message` keyword. That's really cool.
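For anyone who hasn't hit this: a sketch of the distinction in proto3 (field names invented):

```proto
syntax = "proto3";

import "google/protobuf/wrappers.proto";

message Settings {
  // No presence tracking: 0 and "never set" are indistinguishable,
  // and the zero value is not serialized to the wire.
  int32 timeout_ms = 1;

  // Message-typed fields always track presence; this is the
  // `message` keyword trick. has_retries() distinguishes an
  // explicit 0 from "never set".
  google.protobuf.Int32Value retries = 2;

  // Since protoc 3.15, proto3 also allows explicit presence directly:
  optional int32 max_conns = 3;
}
```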
We might disagree on what "efficient" means. OP is focusing on computer efficiency, whereas, as you'll see, I tend to optimize for human efficiency (and, let's be clear, JSON is efficient _enough_ for 99% of computer cases).
I think the "human readable" part is an often-overlooked pro by hardcore protobuf fans. One of my fundamental philosophies of engineering has historically been "clarity over cleverness." Perhaps the corollary is "...and simplicity over complexity." And I think protobuf, generally speaking, falls on the cleverness side, and certainly on the complexity side (with regard to dependencies).
JSON, on the other hand, is ubiquitous, human readable (clear), and simple (little-to-no dependencies).
I've found in my career that there's tremendous value in not needing to execute code to see what a payload contains. I've seen a lot of engineers (including myself, once upon a time!) take shortcuts like bitwise values and protobufs to make things faster or to be clever. And then I've seen those same engineers, or their successors, struggle to navigate years-old protobufs, when a JSON payload is immediately clear and understandable to any human, technical or not, at a glance.
I write MUDs for fun, and one of the things older MUD codebases do is use bit flags to compress a lot of information into a tiny integer. To know what conditions a player has (hunger, thirst, cursed, etc.), you do some bit manipulation and wind up with something like 31, which represents the player being thirsty (1), hungry (2), cursed (4), with haste (8), and with shield (16). Which is great if you're optimizing for integer compression, but really bad when you want a human to look at it. You have to do a bunch of math to de-compress that integer into something meaningful for humans, something like this in Python (flag names taken from the example above):
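```python
# Hypothetical flag values matching the example above.
CONDITIONS = {
    1: "thirsty",
    2: "hungry",
    4: "cursed",
    8: "haste",
    16: "shield",
}

def decode_conditions(flags: int) -> list[str]:
    """De-compress a packed flag integer into human-readable names."""
    return [name for bit, name in CONDITIONS.items() if flags & bit]

print(decode_conditions(31))  # ['thirsty', 'hungry', 'cursed', 'haste', 'shield']
```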
Similarly with protobuf, I find that it usually optimizes for the wrong thing. To be clear, one of my other fundamental philosophies about engineering is that performance is king and that you should try to make things fast, but there are certainly diminishing returns, especially in codebases where humans interact frequently with the data. Protobufs make things fast at a cost, and that cost is typically clarity and human readability. Versioning also creates more friction. I've seen teams spend an inordinate amount of effort trying to ensure that both the producer and consumer are using the same versions.
This is not to say that protobufs are useless. They're great for enforcing API contracts at the code level, and they provide the speed improvements OP mentions. There are certain high-throughput use cases where this complexity and relative opaqueness is not only an acceptable trade-off but the right one to make. But I've found that those cases aren't particularly common, and people reaching for protobufs are often optimizing for the wrong things. Again: clarity over cleverness, and simplicity over complexity.
I know one of the arguments is "it's better for situations where you control both sides," but if you're in any kind of team with more than a couple of engineers, this stops being true. Even if your internal API is controlled by "us," that "us" can sometimes span 100+ engineers, and you might as well consider it a public API.
I'm not a protobuf hater; I just think the vast majority of engineers could go through their careers without ever touching protobufs, never miss them, never need them, and never find themselves in a situation where eking out that extra performance is truly worth the hassle.
spagoop•19m ago
There is a really interesting discussion underneath this about the limitations of JSON and potential alternatives, but I can't help but distrust this writing because of how much it sounds like an LLM.
port11•15m ago
Seems like the author just wanted to talk about Protobuf without bothering too much about the issues with JSON (though some are mentioned).
dkdcio•13m ago
I promise you cannot tell LLM-generated content from non-LLM generated content. what you think you’re detecting is poor quality, which is orthogonal to the tooling used
spagoop•1m ago
I am not dismissing this as slop, and I actually have no beef with using LLMs to write, but yes, as you call out, I think it's just poorly written, or perhaps I'm not the specific audience for it.
Sorry if this is bad energy, I appreciate the write up regardless.