Show HN: Walrus – a Kafka alternative written in Rust

160•janicerk•2mo ago

Comments

roncohen•2mo ago

As someone who myself worked on a hobby-level Rust based Kafka alternative that used Raft for metadata coordination for ~8 months: nice work!

Wasn't immediately clear to me if the data-plane level replication also happens through Raft or something home-rolled? Getting consistency and reliability right with something home-rolled is challenging.

Notes:

- Would love to see it in an S3-backed mode, either entirely diskless like WarpStream or as tiered storage.

- Love the simplified API. If possible, adding a Kafka compatible API interface is probably worth it to connect to the broader ecosystem.

Best of luck!

seanhunter•2mo ago

It says on the github page

   " It provides fault-tolerant streaming with automatic leadership rotation, segment-based partitioning, and Raft consensus for metadata coordination."

So I guess that's a "yes" to raft?

zbentley•2mo ago

GP asked about data plane consensus, not metadata/control plane.

EdwardDiego•2mo ago

They asked about data plane replication - e.g., leader -> followers. Unless I misunderstood them.

nubskr•2mo ago

Hi, the creator here, I think its a good idea to have S3 backed storage mode, its kinda tricky to do it for the 'active' block which we are currently writing to, but totally doable for historical data.

Also about the kafka API, I tried to implement that earlier, I had a sort of `translation` layer for that earlier, but it gets pretty complicated to maintain that because kafka is offset based, while walrus is message based.

EdwardDiego•2mo ago

TBH I don't think anyone can utilise S3 for the active segment, I didn't dig into Warpstream too much, but I vaguely recall they only offloaded to S3 once the segment was rolled.

zellyn•2mo ago

The Developer Voices interview where Kris Jenkins talks to Ryan Worl is one of the best, and goes into a surprising amount of detail: https://www.youtube.com/watch?v=xgzmxe6cj6A

tl;dr they write to s3 once every 250ms to save costs. IIRC, they contend that when you keep things organized by writing to different files for each topic, it's the Linux disk cache being clever that turns the tangle of disk block arrangement into a clean view per file. They wrote their own version of that, so they can cheaply checkpoint heavily interleaved chunks of data while their in-memory cache provides a clean per-topic view. I think maybe they clean up later async, but my memory fails me.

I don't know how BufStream works.

The thing that really stuck with me from that interview is the 10x cost reduction you can get if you're willing and able to tolerate higher latency and increased complexity and use S3. Apparently they implemented that inside Datadog ("Labrador" I think?), and then did it again with WarpStream.

I highly recommend the whole episode (and the whole podcast, really).

nubskr•2mo ago

s3 charges per 1,000 Update requests, not sure how it's sustainable to do it every 250ms tbh, especially in multi tenant mode where you can have thousands of 'active' blocks being written to

zellyn•2mo ago

Guess it beats doing it every 250ms for every topic…

EdwardDiego•2mo ago

Thanks, I have been highly remiss in catching up on that podcast :)

k_bx•2mo ago

There's also Iggy https://github.com/apache/iggy

Never tried it, but looks promising

tormeh•2mo ago

Looks like it has a solid amount of contributors. Exciting! Some other attempts like Fluvio seem to have lost momentum.

spetz•2mo ago

Thank you for the mention! BTW, we're currently working on VSR (Viewstamped Replication) to provide the proper clustering :)

carverauto•2mo ago

iggy is amazing

oulipo2•2mo ago

Nice! How does it compare to Redpanda, NATS, etc?

Zambyte•2mo ago

Funnily enough I have been toying around with implementing a NATS compatible server using the new Zig Io interfaces (including the 0.16.0 preview) :)

teleforce•2mo ago

For Kafka alternative written in C++ there's Redpanda [1],[2].

Redpanda claim of better performance but benchmarks showed no clear winner [3].

It will be interesting to test them together on the performance benchmarks.

I've got the feeling it's not due to programming language implementation of Scala/Java (Kafka), C++ (Redpanda) and Rust (Walrus).

It's the very architecture of Kafka itself due to the notorious head of line problem (check the top most comments [4].

[1] Redpanda – A Kafka-compatible streaming platform for mission-critical workloads (120 comments):

https://news.ycombinator.com/item?id=25075739

[2] Redpanda website:

https://www.redpanda.com/

[3] Kafka vs. Redpanda performance – do the claims add up? (141 comments):

https://news.ycombinator.com/item?id=35949771

[4] What If We Could Rebuild Kafka from Scratch? (220 comments):

https://news.ycombinator.com/item?id=43790420

nubskr•2mo ago

In the current benchmarks, I only have Kafka and rocksdb wal, will surely try to add redpanda there as well, curious how walrus would hold up against seastar based systems.

chaotic-good•2mo ago

I don't see any mentions of p99 latency in the benchmark results. Pushing gigabytes per second is not that difficult on modern hardware. Doing so with reasonable latency is what's challenging. Also, instead of using custom benchmarks it's better to just use the OMB (open-messaging benchmark).

EdwardDiego•2mo ago

> It's the very architecture of Kafka itself due to the notorious head of line problem

Except a consumer can discard an unprocessable record? I'm not certain I understand how HOL applies to Kafka, but keen to learn more :)

thinkharderdev•2mo ago

> Except a consumer can discard an unprocessable record?

It's not the unproccessable records that are the problem it is the records that are very slow to process (for whatever reason).

drob518•2mo ago

Or it’s I/O-bound.

jeberle•2mo ago

That was my first reaction. It's not like Java is terribly slow, so rewriting it in a slightly faster language seems like an empty exercise.

pixelpoet•2mo ago

Yeah but how is a C++ based project supposed to hit the HN frontpage?

selkin•2mo ago

By having a use-after-free bug that caused a noticeable and catastrophic incident.

pixelpoet•2mo ago

Like that Cloudflare outage from the other day? In any case, I'm sure there's good reason to believe that, as Rust gains popularity, there totally won't be proportionally more high profile issues in stuff made with it.

selkin•2mo ago

You could write a lot of bugs in Rust. Or in any memory-safe language. Just not some bugs. It's just that starting a non-toy project using a non-safe language today is really in the "you could, but didn't stop think if you should" category.

pixelpoet•2mo ago

Fair enough, and I do appreciate Rust's contributions to the spectrum of programming language considerations. Personally I'd like to become a Zig guy, though.

There's also Fil-C BTW, and in normal C++ there are GCC or Clang (I forget which) extensions for detecting threading issues, even good old Valgrind is under-appreciated and under-used. In general one wants to adopt best practices and be proactive, rather than relying on the language to solve all problems (of course).

lionkor•2mo ago

Fun anecdote; a couple years ago I started writing a Kafka alternative in C++ with a friend. I got pretty far, but abandoned the project.

We called it `tuberculosis`, or `tube` for short; of course, that is what killed Kafka.

sgt•2mo ago

Imagine talking to your clients about tech stacks and "we're running tuberculosis" comes up... while people are dying from it.

lionkor•2mo ago

You just say "well, the alternative was Kafka" and they'd surely get it. Or not. Either way we imagined it to be hilarious.

ramses0•2mo ago

t10s, pronounced "tíos" or a stuttering "t- tents" on your geo. :-D

kylecazar•2mo ago

"Consumption" works too :)

Assuming topics are consumed in your version, a la Kafka.

dbacar•2mo ago

Is it me or is this analogy a bit too dark/insensitive?

gethly•2mo ago

I never understood the popularity of Kafka. It's just a queue with persistent storage(ie. not in-memory queu with ram-size limited capacity) after all.

sumtechguy•2mo ago

Most of the other ones at the time it was pop and the data was gone. You had to jump thru some hoops to make it work as persistent. Not 'hard' but just more annoying. Kafka has that out of the box. Where kafka starts to come apart is how to set it up. Its configuration is a bit tedious to setup.

mrkeen•2mo ago

A queue with persistent storage is like a ledger whose entries don't vanish when you read them, or a git branch whose commits stick around for longer than 24-72 hours.

It's popular because it didn't have any competition while it built up its ecosystem. And even though there's competitors now, I haven't had time to check them out, and they still brand themselves as "Kafka-alternatives".

lucyjojo•2mo ago

a huge amount of distributed programs work exceedingly well with the processing on a log structure. and kafka did it better than other at the times.

Barathkanna•2mo ago

Walrus isn’t trying to replace Kafka, but it does beat Kafka in a few narrow areas. It’s a lightweight Rust-based distributed log with a fast WAL engine and modern I/O (io_uring), so the operational overhead is much lower than running a full Kafka stack. If you just want a simple, fast log without JVM tuning, controllers, or the entire Kafka ecosystem, Walrus is a lot easier to run. Kafka still wins on ecosystem, connectors, and massive scale, but Walrus is appealing for teams that want the core idea without the complexity. Really impressed by the direction here, great work!!.

alexmorley•2mo ago

IMO for those requirements Redpanda has been the go-to for the last 5 years or so but I agree this is still a nice take and potentially even lighter.

ertucetin•2mo ago

We need Rust alternative not written in Rust

enether•2mo ago

For Rust-based Kafka alternatives, I like Tansu[1]. It at least provides Kafka API parity, and critically also gives users a pluggable backend (embedded SQLite, S3 for low cost diskless type workloads and Postgres because just use Postgres)

It’s nice to try and out innovate Kafka, but I fear the network effect can’t be beaten unless the alternative is 10x better.

Something like Warpstream’s architecture[2] had a shot at dethroning Kafka, but critically even they adopted the Kafka API. Sure enough, Apache Kafka introduced a competing feature[3] within two years of warpstreams launch too.

[1] - https://github.com/tansu-io/tansu [2] - https://www.warpstream.com/ [3] - https://topicpartition.io/blog/kip-1150-diskless-topics-in-a...

WD-42•2mo ago

Are those dips in the write throughput comparison for Kafka GC pauses?

Natfan•2mo ago

i would love to see a simple HTTP API built on top of this

like how postgrest works with postgres

throw10920•2mo ago

What value does this provide other than being written in Rust? I'm pretty sure all of the five "key features" listed are either present in alternatives or already in Kafka itself.

optician_owl•2mo ago

That's funny.

- Let's create a new Kafka in rust. Yeah!

- Let's create a Kafka client that's ready to use. Blah.

yencabulator•2mo ago

I see no mention of recovering redundancy of stored data after a node failure.

aaronrobert•1mo ago

It seems that the data is written to only a single node. But if that's the case, how is fault-tolerant ensured? In terms of data synchronization, how is this different from setting the replication factor to one in Kafka or Pulsar?

Show HN: A luma dependent chroma compression algorithm (image compression)

Show HN: I saw this cool navigation reveal, so I made a simple HTML+CSS version

Show HN: Browser based state machine simulator and visualizer

Show HN: Django-rclone: Database and media backups for Django, powered by rclone

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

Show HN: Kappal – CLI to Run Docker Compose YML on Kubernetes for Local Dev

Show HN: If you lose your memory, how to regain access to your computer?

Show HN: I spent 4 years building a UI design tool with only the features I use

Show HN: Witnessd – Prove human authorship via hardware-bound jitter seals

Show HN: PalettePoint – AI color palette generator from text or images

Show HN: Smooth CLI – Token-efficient browser for AI agents

Show HN: R3forth, a ColorForth-inspired language with a tiny VM

Show HN: Artifact Keeper – Open-Source Artifactory/Nexus Alternative in Rust

Show HN: I built a <400ms latency voice agent that runs on a 4gb vram GTX 1650"

Show HN: BioTradingArena – Benchmark for LLMs to predict biotech stock movements

Show HN: Slack CLI for Agents

Show HN: Stacky – certain block game clone

Show HN: A toy compiler I built in high school (runs in browser)

Show HN: Gigacode – Use OpenCode's UI with Claude Code/Codex/Amp

Show HN: ARM64 Android Dev Kit

Show HN: Nginx-defender – realtime abuse blocking for Nginx

Show HN: Micropolis/SimCity Clone in Emacs Lisp

Show HN: MCP App to play backgammon with your LLM

Show HN: I'm 75, building an OSS Virtual Protest Protocol for digital activism

Show HN: Horizons – OSS agent execution engine

Show HN: I built Divvy to split restaurant bills from a photo

Show HN: Daily-updated database of malicious browser extensions

Show HN: Falcon's Eye (isometric NetHack) running in the browser via WebAssembly

Show HN: Local task classifier and dispatcher on RTX 3080

Show HN: Slop News – HN front page now, but it's all slop

Show HN: A luma dependent chroma compression algorithm (image compression)

Show HN: I saw this cool navigation reveal, so I made a simple HTML+CSS version

Show HN: Browser based state machine simulator and visualizer

Show HN: Django-rclone: Database and media backups for Django, powered by rclone

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

Show HN: Kappal – CLI to Run Docker Compose YML on Kubernetes for Local Dev

Show HN: If you lose your memory, how to regain access to your computer?

Show HN: I spent 4 years building a UI design tool with only the features I use

Show HN: Witnessd – Prove human authorship via hardware-bound jitter seals

Show HN: PalettePoint – AI color palette generator from text or images

Show HN: Smooth CLI – Token-efficient browser for AI agents

Show HN: R3forth, a ColorForth-inspired language with a tiny VM

Show HN: Artifact Keeper – Open-Source Artifactory/Nexus Alternative in Rust

Show HN: I built a <400ms latency voice agent that runs on a 4gb vram GTX 1650"

Show HN: BioTradingArena – Benchmark for LLMs to predict biotech stock movements

Show HN: Slack CLI for Agents

Show HN: Stacky – certain block game clone

Show HN: A toy compiler I built in high school (runs in browser)

Show HN: Gigacode – Use OpenCode's UI with Claude Code/Codex/Amp

Show HN: ARM64 Android Dev Kit

Show HN: Nginx-defender – realtime abuse blocking for Nginx

Show HN: Micropolis/SimCity Clone in Emacs Lisp

Show HN: MCP App to play backgammon with your LLM

Show HN: I'm 75, building an OSS Virtual Protest Protocol for digital activism

Show HN: Horizons – OSS agent execution engine

Show HN: I built Divvy to split restaurant bills from a photo

Show HN: Daily-updated database of malicious browser extensions

Show HN: Falcon's Eye (isometric NetHack) running in the browser via WebAssembly

Show HN: Local task classifier and dispatcher on RTX 3080

Show HN: Slop News – HN front page now, but it's all slop

Show HN: Walrus – a Kafka alternative written in Rust

Comments