As the article outlines, partitions are not that useful for most people. Instead of removing them, how about putting them behind a feature flag, i.e. off by default? That would solve the problem for 99% of users.
The next point in the article that resonates with me is the lack of proper schema support. That's just bad UX again, not inherent complexity of the problem space.
On the testing side, why do I need to spin up a Kafka testcontainer? Why is there no in-memory Kafka server that I can use for simple testing purposes?
Confluent has rather good marketing, and when you need messaging but can also gain a persistent, super-scalable data store and more, why not use that instead? The obvious answer: because there is no one-size-fits-all solution without drawbacks.
If you take action a, then action b, your system will throw 500s fairly regularly between those two steps, leaving your user in an inconsistent state (a = pay money, b = receive item). Re-ordering the steps will just make it break differently.
If you stick both actions into a single event ({userid} paid {money} for {item}) then "two things" has just become "one thing" in your system. The user either paid money for item, or didn't. Your warehouse team can read this list of events to figure out which items to ship, and your payments team can read this list of events to figure out users' balances and owed taxes.
(You could do the one-thing-instead-of-two-things using a DB instead of Kafka, but then you have to invent some kind of pub-sub so that callers know when to check for new events.)
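To make that concrete, here's a minimal sketch of publishing the one combined event with the standard kafka-clients producer (the topic name, key choice, and JSON shape are my own illustration):

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class PurchasePublisher {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // One atomic fact instead of two separate writes: the user
                // either paid for the item, or didn't.
                String event = "{\"user\":\"u42\",\"paid\":999,\"item\":\"widget-7\"}";
                // Keyed by user id, so one user's events stay in order.
                producer.send(new ProducerRecord<>("purchases", "u42", event));
                producer.flush();
            }
        }
    }

The warehouse team and the payments team can then each consume the same "purchases" topic independently, with their own consumer groups.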
Also, it's silly to wait around for exceptions to build up in your dev logs, or for angry customers to reach out via support tickets. When your implementation depends on publishing literal events of what happened, you can spin up side-cars which verify properties of your system in (soft) real time. One side-car could just read all the ({userid} paid {money} for {item}) events and ({item} has been shipped) events. It's a few lines of code to match those together (sketched below), and all of a sudden you have a monitor for "Whose items haven't been shipped?". Then you can debug in bulk (before the customers get angry and reach out) rather than scour the developer logs for individual userIds to try to piece together what happened.
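A minimal sketch of such a side-car, assuming the purchase events above plus a hypothetical "shipments" topic whose payloads also carry an "item" field (all names made up for illustration):

    import java.time.Duration;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class UnshippedMonitor {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "unshipped-monitor");
            props.put("auto.offset.reset", "earliest");
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", StringDeserializer.class.getName());

            // item -> user for purchases with no matching shipment yet.
            // (Unbounded in this sketch; a real monitor would age entries out.)
            Map<String, String> unshipped = new HashMap<>();

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("purchases", "shipments"));
                while (true) {
                    for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofSeconds(1))) {
                        if (rec.topic().equals("purchases")) {
                            unshipped.put(itemOf(rec.value()), rec.key()); // key = user id
                        } else {
                            unshipped.remove(itemOf(rec.value()));
                        }
                    }
                    System.out.println("Paid for but not shipped: " + unshipped);
                }
            }
        }

        // Naive parse for the sketch; use a real JSON library in practice.
        static String itemOf(String json) {
            int i = json.indexOf("\"item\":\"") + 8;
            return json.substring(i, json.indexOf('"', i));
        }
    }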
Also, read this thread https://news.ycombinator.com/item?id=43776967 from a day ago, and compare this approach to what's going on in there, with audit trails, soft-deletes and updated_at fields.
Take a look at Debezium's KafkaCluster, which is exactly that: https://github.com/debezium/debezium/blob/main/debezium-core....
It's used within Debezium's test suite. Check out the test for this class itself to see how it's being used: https://github.com/debezium/debezium/blob/main/debezium-core...
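From memory, usage looks roughly like the sketch below; treat the method names as approximate and check the linked test class for the exact API, which has changed across Debezium versions:

    import java.io.File;

    import io.debezium.kafka.KafkaCluster;

    public class EmbeddedKafkaExample {
        public static void main(String[] args) throws Exception {
            // Spins up ZooKeeper and a broker in-process; no containers needed.
            KafkaCluster cluster = new KafkaCluster()
                    .usingDirectory(new File("target/kafka-data"))
                    .deleteDataUponShutdown(true)
                    .addBrokers(1)
                    .startup();
            try {
                cluster.createTopics("test-topic");
                // ... run producers/consumers against the embedded broker ...
            } finally {
                cluster.shutdown();
            }
        }
    }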
Granted, it doesn't happen often if you plan correctly, but the possibility of partitioning and replication going wrong makes updates and upgrades nightmare fuel.
It took us four years to properly integrate Kafka into our pipelines. Everything, and I mean everything, is complicated with it: cluster management, numerous semi-tested configurations, etc.
My final conclusion is that the project just doesn't know what it wants to be. Instead, it tries to provide everything for everybody and ends up an unbelievably complicated mess.
You know, there are systems that know what they want to be (Amazon S3, Postgres, etc.), and then there are systems that try to eat the world (Kafka, k8s, systemd).
Its real goal is to make Linux administration as useless as Windows so RH can sell certifications.
Tell me the output of systemctl is not as awful as opening the Windows services panel.
But both are kind of hard to understand end-to-end, especially for an occasional user.
We use it for streaming tick data, system events, order events, etc., into kdb. We write to Kafka and forget. The messages are persisted, and we don't have to worry if kdb has an issue. Out-of-band consumers read from the topics and persist to kdb.
In several years of doing this we haven't really had any major issues. It does the job we want. Of course, we use the AWS managed service, so that simplifies quite a few things.
I read all the hate comments and wonder what we're missing.
Really? I got scared by Kafka by just reading through the documentation.
The backing DB in this wishlist would be something in the vein of Aurora, to achieve the storage/compute split.
I feel like Kafka is a victim of its own success: it's excellent for what it was designed for, but since the design is simple and elegant, people have been using it for all sorts of things it was not designed for. And, well, of course it's not perfect for those use cases.
Kafka gets misused for some weird stuff. I've seen it used as a user database, which makes absolutely no sense. I've also seen it used as a "key/value" store, which I can't imagine being efficient, as you'd have to scan the entire log.
Part of it seems to stem from "We need somewhere to store X. We already have Kafka, and requesting a database or key/value store is just a bit too much work, so let's stuff it into Kafka".
I had a client ask for a Kafka cluster; when queried about what they'd need it for, we got "We don't know yet". Well, that's going to make it a bit hard to dimension and tune correctly. Everyone else used Kafka, so they wanted to use it too.
Kafka is simple and elegant?
Still, if a hypothetical new Kafka incorporated some of NATS's features, that would be a good thing.
Just don't use Kafka.
Write to the downstream datastore directly. Then you know your data is committed and you have a database to query.
In the plural, accomplishing that task in a performant way at enterprise scale seems to involve turning every function call into an asynchronous, queued service of some sort.
Which then begets additional deployment and monitoring services.
A queued problem requires a cute solution, bringing acute pain.
There are many things that can go wrong, and how you handle them depends on your data guarantees and consistency requirements.
If you're not queuing, what are you doing when a write fails? Throwing away the data?
The reason message queue systems exist is scale. Good luck sending a notification at 9am to your 3 million users and keeping your database alive under the sudden influx of activity. You need to queue that load.
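A rough sketch of the consuming side of that, where a worker drains a hypothetical "notifications" topic at its own pace instead of letting 3 million sends hit the database at once (topic name and delivery stub are made up):

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class NotificationWorker {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "notification-workers");
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", StringDeserializer.class.getName());
            // Cap the batch size per poll so downstream systems see a steady
            // trickle rather than the full 9am spike.
            props.put("max.poll.records", "100");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("notifications"));
                while (true) {
                    for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofSeconds(1))) {
                        send(rec.key(), rec.value()); // DB writes happen at worker pace
                    }
                }
            }
        }

        // Hypothetical stand-in for the real delivery + database write.
        static void send(String userId, String payload) {
            System.out.printf("notify %s: %s%n", userId, payload);
        }
    }

The queue absorbs the spike; you tune the smoothing by choosing how many workers share the consumer group.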
I distinctly recall running it at home as a global unioning filesystem where content had to fit fully on the specific device it was targeted at (rather than being striped or whatever).
>I cannot make you understand. I cannot make anyone understand what is happening inside me. I cannot even explain it to myself. -Franz Kafka, The Metamorphosis
I'm currently building a full workload scheduler/orchestrator. I'm sick of Kubernetes. The world needs better -> https://x.com/GeoffreyHuntley/status/1915677858867105862
But don’t public cloud providers already all have cloud-native event sourcing? If that’s what you need, just use that instead of Kafka.
This is exactly what I take away from these kinds of articles: engineering for the sake of engineering. I am not saying we should not investigate how to improve our engineered artifacts, or that we should not improve them. But I see a generalized lack of reflection on why we should do it, and I think it is related to a detachment from the domains we create software for. The article suggests uses of the technology that come from such different ways of using it that it loses coherence as a technical item.
> "Key-level streams (... of events)"
When you are leaning on the storage backend for physical partitioning (as per the cloud example, where they would literally partition based on keys), doesn't this effectively just boil down to renaming partitions to keys, and keys to events?
I skipped learning Kafka, and jumped right into Pulsar. It works great for our use case. No complaints. But I wonder why so few use it?
While it would be convenient to just blame Kafka for being bad technology, it seems many other people get it wrong, too.
Anyone here have some real world experience with it?
Feels like there is another squeeze in that idea if someone “just” took all their docs and replicated the feature set. But maybe that’s what S2 is already aiming at.
I wonder how long WarpStream's docs, marketing materials, and useful blog posts will stay up.