"You Don't Need Kafka, Just Use Postgres" Considered Harmful

https://www.morling.dev/blog/you-dont-need-kafka-just-use-postgres-considered-harmful/

39•ingve•3mo ago

Comments

ekjhgkejhgk•3mo ago

"Considered harmful" considered harmful.

wewewedxfgdf•3mo ago

It lends an academic pomposity that demands respect.

noxs•3mo ago

At a certain point this will become ""Considered harmful" considered harmful" considered harmful.

philipwhiuk•3mo ago

> Named a Java Champion, I enjoy speaking at conferences, for instance at QCon, JavaOne, Red Hat Summit, JavaZone, JavaLand and Kafka Summit.

candiddevmike•3mo ago

That tracks, I feel like Kafka is overrepresented in the Java codebases I've seen TBH.

cultofmetatron•3mo ago

I'd argue that if you are in the position where you legitimately NEED Kafka, you hopefully also know what you're doing. You're outside the audience for the "just use Postgres" crowd. That said, if you're in a startup with a few thousand users, "just use Postgres" is still solid advice.

threatofrain•3mo ago

If you need some kind of event streaming system there are other choices which have less dev ops burden, such as just using any particular cloud's proprietary or managed offerings. I've seen two companies on NATS so far, I'm trying it out myself for size as well.

There are plenty of choices between PSQL & Kafka. It's not like you take one step north and you're in the "oh no you better know what you're doing" territory.

strken•3mo ago

The problem with taking one step north and leaving the border of Postgres is what you lose, not the direct ops burden.

Postgres land is a comfy place filled with transactions across all your data at once, one backup solution that you (hopefully) have had running for months or years and has been thoroughly tested, and ACID compliance. You have a single host, probably, which means that you are neither Available nor Partition-tolerant, but at least you are Consistent.

The moment you expand beyond a single database host you now have a distributed system, and woe unto you if you don't understand what that means.

cultofmetatron•3mo ago

Well said. I've been working on my startup. We are profitable in part because I spend my time focused on building new features and improving our reliability instead of chasing after all the idiosyncratic bugs that come with distributed systems.

threatofrain•3mo ago

If you wanted such simplicity then nothing is stopping you from running single-node NATS or even just Redis. You always had all the simplicity and consistency you wanted.

strken•3mo ago

The problem is that now you use Postgres for 95% of your system, and also Redis or NATS, which means you lose the ability to atomically commit changes to your database and send a message in one transaction.

You can work around this, but to the best of my knowledge you can't have consistency (between your existing Postgres database and your separate queue or event log) and simplicity.

hbogert•3mo ago

Indeed, only eventual consistency. The article approaches this subject and mentions the use of the outbox pattern and/or using tools like Debezium.
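
As a rough illustration of the outbox pattern mentioned here: the business row and the outgoing message are written in the same database transaction, and a separate relay forwards the outbox rows afterwards. This is a minimal sketch, not code from the article. The table names `orders` and `outbox`, the helpers `place_order` and `relay`, and the use of Python's built-in sqlite3 in place of Postgres (so it runs without a server) are all assumptions made for illustration.

```python
import json
import sqlite3

# sqlite3 stands in for Postgres here so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE outbox ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " payload TEXT NOT NULL,"
    " sent INTEGER NOT NULL DEFAULT 0)"
)
conn.commit()


def place_order(conn, item, fail=False):
    """Insert the order and its event in ONE transaction (hypothetical helper)."""
    with conn:  # commits on success, rolls back if an exception escapes
        cur = conn.execute("INSERT INTO orders (item) VALUES (?)", (item,))
        event = {"type": "order_placed", "order_id": cur.lastrowid, "item": item}
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))
        if fail:
            raise RuntimeError("simulated crash before commit")


def relay(conn, publish):
    """Forward unsent outbox rows to a broker callback, then mark them sent."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))  # a crash right after this means a re-send later
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    return len(rows)


published = []  # stands in for the external broker
place_order(conn, "book")
try:
    place_order(conn, "lamp", fail=True)
except RuntimeError:
    pass  # neither the order nor its event was stored

relay(conn, published.append)
```

The failed order leaves no trace in either table, which is the atomicity being discussed. The relay step is where the "only eventual consistency" comes in: it can deliver a message more than once if it crashes between publishing and marking the row as sent, so consumers still need to tolerate duplicates.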

threatofrain•3mo ago

One of the secret traps of Kafka is that it tried to be a collection of nice primitives without that much built on top. So the Fortune 500 story of Kafka is actually one of everyone building on top of Kafka internally with a bigger team. If you weren't prepared to do a little building on top of Kafka you might find the ecosystem a little empty.

But with PSQL you have even less built on top for a use case it wasn't meant for. Now you are in the "you better know what you're doing" territory.

strken•3mo ago

My experience has been that a SQL database is an easier foundation to build on than a distributed event streaming log, until you run into performance issues that can't be solved by understanding your database better and/or scaling vertically. Companies which run into such performance issues can afford to migrate to other technologies and/or pay for a good ops team, while companies which started using message queues for systems handling 5 requests per second often struggle to make their system sound and get features out the door.

threatofrain•3mo ago

You're still talking about performance. I'm talking about how much you have to build.

When people use Redis/whatever, are you surprised that speed is just a side benefit and ergonomics is what they're looking for? Those are the same ergonomics you'd have to rebuild, as opposed to not build at all because it's already there.

If you want to rebuild Redis in PSQL you are very much in the "you better know what you're doing" territory, and you're also very much in the "are you sure this is your core business value" territory as you rediscover how much nice stuff was packed into the Redis ecosystem. And how am I supposed to uncover the surface or ergonomics of your Redis replacement?

strken•3mo ago

Can you explain how using Postgres for a job queue, ideally using something like pgmq[0], is in "you better know what you're doing" territory, while using Redis isn't, and without talking about performance?

Redis is fine as-is and I'm not suggesting anyone should rebuild the whole thing in Postgres, only that if they want a specific thing that Redis can do and they already use Postgres, then they might be able to do the same thing in Postgres with less effort and thus avoid losing atomicity and consistency and any other properties of Postgres that they find desirable.

[0] Which was mentioned by the original "You Don't Need Kafka" article

threatofrain•3mo ago

If you want something simple then you could use Redis Streams, and if you want a more mature solution you could use BullMQ, which is pretty solid and well liked. Knowing what you're doing in BullMQ is reading some docs.

Do I really need to discuss why BullMQ is non-trivial software, or Redis Streams with consumer groups?

And I asked but you didn't answer. How am I supposed to discover the ergonomics of your custom Redis or BullMQ replacement? Do I read your hand-written docs?

strken•3mo ago

You could go read the pgmq docs, look at the fifty or so lines of code that make up the thin wrapper around it, and understand it just as well as you understand BullMQ. Or you could use one of the extensive set of pgmq libraries if you aren't comfortable calling Postgres functions (for some reason).

I'm not suggesting that BullMQ and Redis are trivial and that you should rewrite them yourself, I'm telling you that Postgres has multiple existing implementations of a message queue already.

And frankly, the fact that something is non-trivial does not automatically mean you can't or shouldn't rewrite it, only that you need a good reason. Even if there was no existing Postgres message queue (there are many), atomicity is a good enough reason on its own. Imagine if you could avoid idempotency issues when calling third party APIs until you had more than ten engineering teams! Well, you can! Just don't add more than one method of storing online data to your architecture.
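
For readers who have not used a database-backed queue, the sketch below models the visibility-timeout behaviour that queues such as pgmq offer: a read hides a message for a while instead of deleting it, and the consumer deletes it once the work is done. It is a toy model, not pgmq's implementation or API. The class `TinyQueue`, its method names, and the use of sqlite3 with an injected clock are assumptions chosen so the example runs without a Postgres server.

```python
import sqlite3


class TinyQueue:
    """Toy queue with a visibility timeout (hypothetical, not the pgmq API)."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE q ("
            " msg_id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " visible_at REAL NOT NULL,"
            " body TEXT NOT NULL)"
        )

    def send(self, body, now):
        """Enqueue a message that is visible immediately; return its id."""
        with self.db:
            cur = self.db.execute(
                "INSERT INTO q (visible_at, body) VALUES (?, ?)", (now, body)
            )
        return cur.lastrowid

    def read(self, vt_seconds, now):
        """Return one visible message and hide it for vt_seconds, or None."""
        with self.db:
            row = self.db.execute(
                "SELECT msg_id, body FROM q WHERE visible_at <= ?"
                " ORDER BY msg_id LIMIT 1",
                (now,),
            ).fetchone()
            if row is None:
                return None
            self.db.execute(
                "UPDATE q SET visible_at = ? WHERE msg_id = ?",
                (now + vt_seconds, row[0]),
            )
        return row

    def delete(self, msg_id):
        """Acknowledge a message by removing it; True if it existed."""
        with self.db:
            cur = self.db.execute("DELETE FROM q WHERE msg_id = ?", (msg_id,))
        return cur.rowcount == 1


queue = TinyQueue()
first_id = queue.send("resize image 42", now=0.0)

claimed = queue.read(vt_seconds=30, now=1.0)       # worker A claims the job
hidden = queue.read(vt_seconds=30, now=2.0)        # worker B sees nothing
redelivered = queue.read(vt_seconds=30, now=40.0)  # A never acked, so it reappears
acked = queue.delete(redelivered[0])
```

A single in-memory connection sidesteps concurrency entirely. On Postgres, an implementation along these lines would typically rely on row locking such as `SELECT ... FOR UPDATE SKIP LOCKED` so that two workers cannot claim the same row, and the enqueue could share a transaction with the rest of the application's writes.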

112233•3mo ago

Current "one host" options are in ridiculous territory: 256-core CPUs with terabytes of RAM and storage in the 100 GB/s range. A decade ago that much needed a few racks.

hactually•3mo ago

isn't Kafka old news at this point?

LinkedIn have moved onto Northguard... but no GitHub yet

AceJohnny2•3mo ago

so you mean that Kafka is boring, functional and stable?

https://boringtechnology.club/

rubenvanwyk•3mo ago

Also wish there was more information available about Northguard.

blindriver•3mo ago

""You don't need Kafka" considered harmful by employees of Kafka."

redhale•3mo ago

Yes. Setting aside the specific merits of the argument, this blog post should really have a disclaimer somewhere that the author works for Confluent, a major managed Kafka service provider. Perhaps that makes him an expert on this topic, but it should still be disclosed!

> Managed services make running Kafka a very uneventful experience (pun intended) and should be the first choice

Confluent, you say?

gunnarmorling•3mo ago

> this blog post should really have a disclaimer somewhere that the author works for Confluent

Good idea; this is stated in the bio on my web site, but I've just added the same info again to the end of the post.

redhale•3mo ago

Fair point.

It might be worth adding a more direct call-out to posts like this one. Many may not go as far as reading the Bio page. That may be on them technically speaking, but still.

In any case, thank you for writing and sharing your considered opinion!

gunnarmorling•3mo ago

Thank you, appreciate it!

blindriver•3mo ago

Confluent isn't just "a major managed Kafka service provider." The founders of Confluent created Kafka and they and their employees/former employees dominate the PMC committee for Kafka, meaning they control the direction of Kafka. Confluent is Kafka.

The author is an employee of Confluent/Kafka, and because his paycheck and equity grant depend on it and on the CFLT stock price, obviously whatever he writes is going to be heavily slanted in favor of Kafka. This isn't something that belongs in a footnote at the bottom; it should be right up at the front.

pheggs•3mo ago

employee of Confluent.

I think that shouldn't matter but I still have a lot to disagree with the article.

feels like overengineering has become the standard for some people, and I quite dislike it personally.

atoav•3mo ago

Could we please just agree not to use this "considered harmful" phrase to describe advice where the answer is "depends"? This kinda makes the author seem like he has lost the ability to consider what software is out there. That he is working for Kafka doesn't help.

Example: Someone writes software that could use something simple like SQLite, and they switch to Postgres for performance reasons. Now, unless what Kafka brings is the core reason they switched to Postgres, not pulling in another dependency and adding another piece to the puzzle can be a totally legitimate engineering decision. And that renders the "considered harmful" utterly ridiculous.

Use a system like Kafka if you need what it brings (a distributed event streaming platform). If that isn't what you need or a very simple Postgres solution suffices, go for that. Maybe you need event streaming but distributing it is overkill. Maybe you just need some sort of queue. Who knows? Not the author of this post.

gunnarmorling•3mo ago

> the answer is "depends"?

Indeed that is the point I am trying to make in the article. Postgres oftentimes absolutely is the right tool to use, and oftentimes it's not. The thing I'm advocating to be wary of is "if all you have is a hammer...". This is to what "considered harmful" refers.

atoav•3mo ago

Ok then I misunderstood, sorry for that. I take everything back and proclaim the opposite of what I said.

scottcodie•3mo ago

One thing the other blog post missed and this post misses too is that you don't need Kafka to use Debezium with Postgres. This gives you a pretty seamless onramp to event streaming tools as you scale.

gunnarmorling•3mo ago

Are you referring to using Debezium embedded as a library? If so, yes, it absolutely has its place; for instance, it's used by Flink CDC. There are pros and cons to either way of running Debezium. I see embedded Debezium a lot for in-app use cases, for instance cache invalidation. Going through Kafka allows for replay and setting up multiple independent consumers for the same change event stream.
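
The replay and multiple-consumer point can be pictured with a toy append-only log in which each consumer tracks its own offset. This is only a sketch of the idea and not Kafka's or Debezium's API; `ChangeLog`, `poll`, `seek`, and the consumer names are hypothetical.

```python
class ChangeLog:
    """Toy append-only log; each consumer keeps its own read offset."""

    def __init__(self):
        self.events = []   # events are never removed when read
        self.offsets = {}  # consumer name -> index of the next event to read

    def append(self, event):
        self.events.append(event)

    def poll(self, consumer):
        """Return events this consumer has not seen yet and advance its offset."""
        start = self.offsets.get(consumer, 0)
        batch = self.events[start:]
        self.offsets[consumer] = len(self.events)
        return batch

    def seek(self, consumer, offset):
        """Move the consumer's offset, for example back to 0 to replay."""
        self.offsets[consumer] = offset


log = ChangeLog()
for change in ("insert user 1", "update user 1", "delete user 1"):
    log.append(change)

cache_batch = log.poll("cache-invalidator")  # sees all three changes
search_batch = log.poll("search-indexer")    # independent: also sees all three
empty_batch = log.poll("cache-invalidator")  # nothing new for this consumer

log.seek("search-indexer", 0)                # replay from the beginning
replayed = log.poll("search-indexer")
```

Because reading never removes anything, adding a consumer or rewinding an existing one does not affect the others. In a plain work queue, by contrast, an acknowledged message is gone.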

brettgriffin•3mo ago

> Looking to make it to the front page of HackerNews?

Nailed it. I read the original post earlier this week and was very impressed with its technical detail. But the point of the post was incongruent with the post's title. But the post got way more attention because of that title.

But if you think about the effort it took to write that post, the title was a really good bet on ROI.

tacticus•3mo ago

> > Looking to make it to the front page of HackerNews?

> Nailed it.

Worked for the confluent marketing fluff as well.

wewewedxfgdf•3mo ago

There are many, many ways to make a message queue these days. All the main SQL databases can act as a queue, everything from Postgres to MS SQL Server to MySQL to Oracle to SQLite, as can custom applications like Kafka, and for the most part they are all more or less valid. It's not all about Postgres.

Take the approach that appeals to you and feel happy about it without big open source telling you "you're holding it wrong!"

jauntywundrkind•3mo ago

I'm more interested in the "You don't need Kafka the product, when we have this Kafka protocol compatible alternative". Kafka is more than a product: it's become a standard, with many many implementations. I'd love to see wider coverage of the alternatives. RedPanda, StreamNative Ursa, OSO, Aiven, many others.

oompydoompy74•3mo ago

Insufferable tone aside, I really dislike the “right tool for the job” argument. The correct tool is the one that is handy and gets the job done. Has the author never encountered a Swiss Army Knife?

ryandvm•3mo ago

In my experience, "you don't need *MQ, just use Kafka" is a way worse problem.

Trying to explain the distinction between an event streaming platform and a distributed message queue to your enterprise architect is an exercise that no one should have to go through.

kermatt•3mo ago

> Trying to explain the distinction between an event streaming platform and a distributed message queue to your enterprise architect ...

If you have to explain that distinction, are you really speaking to an "enterprise architect" ?

gunnarmorling•3mo ago

To those who flagged this submission, can you share your reasoning? What guidelines do you see being violated by it? Thanks!

Disclaimer: I'm the author of this post.

vepo•3mo ago

I hear that we should not use Kafka, and then I look at the previous version of our application, which did not use Kafka. It's better with Kafka.
