Currently, Ursa is only available in our cloud service, but we do plan to open-source the core soon. Stay tuned.
The sad reality is that most people in open source want stuff for free and won't give anything back, and that sucks. So what are your thoughts on this? I'm genuinely curious.
On the second part: as someone noted in a reply to the parent comment you're responding to, the code is not the most important part here. How much do you agree with that statement? Because from my perspective, if I can self-host the open-source version directly on Amazon instead of going through your cloud service, that might well be cheaper than using the cloud service.
https://streamnative.io/blog/how-we-run-a-5-gb-s-kafka-workl...
And the test result was verified by Databricks: https://www.linkedin.com/posts/kramasamy_incredible-streamna...
The analysis in the blog is based on two key assumptions:
- Multi-zone deployment on AWS
- Tiered storage is not enabled
If you’re looking to estimate costs with tiered storage, you can ignore the differences in storage costs mentioned in the post.
One important point not covered in the blog is that Ursa compacts data directly into a Lakehouse (this is also the major differentiator from WarpStream). This means you maintain only a single copy of data, shared between both streaming reads and table queries. This significantly reduces costs related to:
- Managing and maintaining connectors
- Duplicated data across streaming and Lakehouse systems
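To put a toy number on the single-copy point (all figures below are made-up placeholders for illustration, not actual pricing):

```python
# Toy comparison: keeping two copies of the data (one in the streaming
# system, one in the Lakehouse) vs a single shared copy.
# All numbers are illustrative placeholders, not real pricing.
data_tb = 100                    # hypothetical retained data volume
s3_price_per_tb_month = 23.0     # ~ $0.023/GB-month (S3 Standard list price)

dual_copy_monthly = 2 * data_tb * s3_price_per_tb_month    # streaming + table copies
single_copy_monthly = 1 * data_tb * s3_price_per_tb_month  # one copy serves both

print(dual_copy_monthly, single_copy_monthly)  # 4600.0 2300.0
```

And that's before counting the connector infrastructure needed to keep the two copies in sync, which is its own operational cost.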
> Redpanda recently introduced leader pinning, but this only benefits setups where producers are confined to a single AZ—not applicable to our multi-AZ benchmark.
Redpanda has leadership pinning (producers) and follower fetching (consumers). I suspect a significant amount of cost is improper shaping of traffic.
> Interzone traffic - replication: 10GB/s * $0.02/GB(in+out) * 3600 = $720
With follower fetching you shouldn't have cross-AZ charges on read, only on replication. In 15 seconds of looking at this piece I cut out $360/hour...no offense but this reeks of bad faith benchmarketing...
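For what it's worth, the arithmetic is easy to check. A quick sketch (the replication numbers come from the quoted post; the 5 GB/s cross-AZ read figure is a hypothetical I chose to illustrate the scale of the read-side charge, not a number from the benchmark):

```python
# Back-of-envelope check of the cross-AZ line items.
rate_gb_s = 10        # GB/s of produce traffic (from the post)
price_per_gb = 0.02   # $/GB inter-AZ, in + out combined (AWS)
hour = 3600           # seconds

# Replication line item, as quoted: $720/hour.
replication_per_hour = rate_gb_s * price_per_gb * hour
print(round(replication_per_hour))  # 720

# With follower fetching (KIP-392), consumers read from a replica in
# their own AZ, so the read-side cross-AZ charge drops to ~$0. If,
# hypothetically, 5 GB/s of reads crossed AZ boundaries, that alone
# would cost:
hypothetical_read_per_hour = 5 * price_per_gb * hour
print(round(hypothetical_read_per_hour))  # 360
```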
(https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-s3...)
1. It is leaderless by design, so there is no single lead broker you need to route traffic through. This eliminates the majority of the inter-zone traffic.
2. It is lakehouse-native by design. It doesn't just use object storage as the storage layer; it also stores data in open table formats, so streaming data can be made available in open table formats (Iceberg or Delta) after ingestion. One example is the integration with S3 Tables: https://aws.amazon.com/blogs/storage/seamless-streaming-to-a... This simplifies the Kafka-to-Iceberg integration.
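To illustrate what the leaderless point means for the inter-zone bill: with a leader-based design, most produced bytes cross zones once to reach the leader and again for replication, while a leaderless client can write from its own zone to object storage, which isn't billed as inter-AZ transfer within a region. A purely illustrative model (not Ursa's actual implementation):

```python
# Illustrative model: cross-AZ bytes transferred per produced byte,
# assuming 3 AZs and producers spread evenly across them.
azs = 3

# Leader-based: 2/3 of produce traffic originates outside the leader's
# AZ, and the leader then replicates every byte to the 2 other AZs.
leader_based = (azs - 1) / azs + (azs - 1)

# Leaderless + object storage: the client writes to object storage
# from its own zone; same-region S3 traffic carries no inter-AZ fee.
leaderless = 0.0

print(round(leader_based, 2), leaderless)  # 2.67 0.0
```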
Pulsar has been widely adopted in many mission-critical, business-facing systems like billing, payments, and transaction processing, or as a unified platform that consolidates an enterprise's diverse streaming & messaging use cases. It has seen a lot of adoption, from F500 companies and hyperscalers to startups.
Kafka is used in data ingestion and streaming pipelines. The Kafka protocol itself is great; however, the implementation has its own challenges.
Both Pulsar and Kafka are great open source projects and their protocols are designed for different use cases. We have seen many different companies use both technologies.
Ursa is the underlying streaming engine that we re-implemented to be leaderless and lakehouse-native, so that we can better leverage current cloud infrastructure and natively integrate with the broader lakehouse ecosystem. It is the engine we use to support both protocols in our product offerings.
If I may ask a philosophical question: when would you consider your product to have "succeeded"? Would it be when someone uses it for something important, or some money-related benchmark, or what exactly?
Wishing the Ursa team peace and success. Maybe don't ever enshittify your product as so many do. I'll watch from the sidelines since I don't even have a use for Kafka, but I would recommend having a Discord or some other way to actually form a community. I recommend Matrix, but there are folks who prefer Discord too.
Anyways, have fun building new things!
So far as I understand, both Kafka and Pulsar use (leader-based) consensus protocols to deliver some of their features and guarantees. So to match these, you must either have developed a leaderless consensus protocol, or modified the guarantees you offer, or else still utilise a leader-based consensus protocol somewhere?
From one of your other answers, you mention you rely on Apache BookKeeper, which appears to be leader-based?
I ask because I am aware of only one industry leaderless consensus protocol under development (and I am working on it), and it is always fun to hear about related work.
[1] https://www.cs.cmu.edu/~dga/papers/epaxos-sosp2013.pdf
[2] https://github.com/apache/cassandra-accord
[3] https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-15...
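For anyone following along: whether the protocol is leader-based or leaderless, the core safety argument in this family of protocols (Paxos, Raft, EPaxos, Accord) rests on quorum intersection: any two majority quorums over n replicas share at least one node, so conflicting decisions can't both reach a majority. A quick brute-force check of that property:

```python
from itertools import combinations

def majority_quorums(n):
    """All majority quorums (size floor(n/2) + 1) over replicas 0..n-1."""
    size = n // 2 + 1
    return list(combinations(range(n), size))

# Verify that every pair of majority quorums intersects, for small n.
for n in range(1, 8):
    for a in majority_quorums(n):
        for b in majority_quorums(n):
            assert set(a) & set(b), f"disjoint majority quorums for n={n}"

print("every pair of majority quorums intersects for n = 1..7")
```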
There's an OK high-level overview, "WarpStream is dead, long live AutoMQ" — riffing off what WarpStream did, but applied to Kafka. While I loosely got the idea, I had to dig a lot deeper into the docs for things to really click. https://github.com/AutoMQ/automq/wiki/WarpStream-is-dead,-lo...
There may be reasons it's a bad fit, but I'm expecting object-storage database SlateDB someday makes a very fine streaming system too!! https://github.com/slatedb/slatedb
I believe object storage is shaping the future architecture of cloud databases. The first big shift happened in the data warehouse space, where we saw the move from Teradata and Greenplum to Snowflake accelerate around 2016. Snowflake’s adoption of object storage as its primary storage layer not only reduced costs but also unlocked true elasticity.
Now, we’re seeing a similar trend in the streaming world. If I recall correctly, Ursa was the first to GA an object-storage-based streaming service, with Kafka (WarpStream) and AutoMQ following afterward.
I also believe the next generation of OLTP databases will use object storage as their main storage layer. This blog post shares some insights into this trend and the unique challenges of implementing object storage correctly for OLTP workloads, which are much more latency-sensitive.
https://www.eloqdata.com/blog/2025/07/16/data-substrate-bene...
netpaladinx•21h ago
x0x0•20h ago
I have not tried it, and full disclosure, I really like Kafka: it's one of the pieces of software that has been rock solid for me. I built a project where it quietly ingested low GB/s of data with year-long uptimes.
sijieg•18h ago
davidkj•18h ago
I break all of the costs down in the following e-book. https://streamnative.io/ebooks/reducing-kafka-costs-with-lea...
x0x0•17h ago
And while I like Kafka, nobody would claim it likes being scaled up and down dynamically, so probably built-in tolerance for that as well? We ran Kafka on-prem so that wasn't an issue for us, and given the nature of the service, didn't have a lot of usage variance.
This: https://www.youtube.com/watch?v=bb-_4r1N6eg was an interesting watch, btw.