
Show HN: Matterbeam, a company-wide write-ahead log for your data

https://matterbeam.com

1•mikepk•55m ago

Hey HN. I'm Michael, founder of Matterbeam. Been chewing on the core ideas of it for over ten years, building toward it for three.

Demo: https://www.youtube.com/watch?v=YuhujARUmhA
Whitepaper: https://matterbeam.com/whitepaper

Short version: companies build their data infra on point-to-point pipelines and one place to put all the data. Source A goes to warehouse B. Team C wants the same data shaped differently? Build another pipeline. Eventually you end up with a mess of brittle ETL nobody wants to touch.

Matterbeam puts existing ideas together in a different way. Source data is collected as immutable, time-ordered facts in a log. Destinations replay and transform those facts, from any point in time, into the target they need. One source, many uses.
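To make the replay idea concrete, here is a minimal in-memory sketch. It is not Matterbeam's API or implementation; `Fact`, `FactLog`, and `materialize` are hypothetical names made up for illustration, and a real system would persist the log rather than hold it in a list.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterator


@dataclass(frozen=True)
class Fact:
    """One immutable record in the log (hypothetical shape)."""
    ts: int        # event time, e.g. epoch seconds
    source: str    # where the fact came from
    payload: dict  # the captured data


class FactLog:
    """Append-only, time-ordered log, kept in memory for illustration only."""

    def __init__(self) -> None:
        self._facts: list[Fact] = []

    def append(self, fact: Fact) -> None:
        # Reject out-of-order appends so every replay sees the same sequence.
        if self._facts and fact.ts < self._facts[-1].ts:
            raise ValueError("facts must be appended in time order")
        self._facts.append(fact)

    def replay(self, since: int = 0) -> Iterator[Fact]:
        # Yield every fact at or after `since`; the log is never modified.
        return (f for f in self._facts if f.ts >= since)


def materialize(log: FactLog, transform: Callable[[Fact], Any], since: int = 0) -> list:
    """Replay the log from `since` and shape each fact for one destination."""
    return [transform(f) for f in log.replay(since)]


log = FactLog()
log.append(Fact(100, "crm", {"user": "ada", "plan": "free"}))
log.append(Fact(200, "crm", {"user": "bob", "plan": "pro"}))
log.append(Fact(300, "crm", {"user": "ada", "plan": "pro"}))

# Destination 1: warehouse-style rows covering the full history.
warehouse_rows = materialize(
    log, lambda f: (f.ts, f.payload["user"], f.payload["plan"])
)

# Destination 2: added later, replays only from ts=200, shaped as text.
search_docs = materialize(
    log, lambda f: f"{f.payload['user']} is on {f.payload['plan']}", since=200
)
```

Both destinations read the same facts; neither needed its own pipeline back to the source, and the second could be added after the fact without touching the first.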

My last startup was acquired by Pluralsight in 2014. I ended up leading product architecture and data there for about five years, working with really brilliant product and data people who I would have said were doing everything _right_. Yet no one in the company was happy with data. It made me wonder whether something more fundamental was broken.

A key inspiration came from Martin Kleppmann's 2015 talk "Turning the Database Inside Out" (https://www.youtube.com/watch?v=fU9hR3kiOK0). Most databases internally do something interesting: they keep a write-ahead log (durable, append-only, time-ordered) as the source of truth, and derive structures from it (B-trees, indexes, materialized views), each optimized to serve a different read pattern.
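A toy version of that database pattern, as a sketch only: the log is the single source of truth, and each read structure is a function of it that can be thrown away and rebuilt. The tuple layout and the function names `build_latest_index` and `build_plan_counts` are assumptions for illustration, not how any particular database lays out its WAL.

```python
from collections import defaultdict

# The "write-ahead log": append-only (sequence, key, value) writes.
wal = [
    (1, "ada", "free"),
    (2, "bob", "pro"),
    (3, "ada", "pro"),
]


def build_latest_index(entries):
    """Derived structure 1: key -> latest value, for point lookups."""
    index = {}
    for _seq, key, value in entries:
        index[key] = value  # later writes overwrite earlier ones
    return index


def build_plan_counts(entries):
    """Derived structure 2: a materialized view of users per current plan."""
    counts = defaultdict(int)
    for plan in build_latest_index(entries).values():
        counts[plan] += 1
    return dict(counts)


latest = build_latest_index(wal)      # serves "what plan is ada on?"
plan_counts = build_plan_counts(wal)  # serves "how many users per plan?"

# Derived structures are disposable: rebuild from any prefix of the log
# to see the state as of an earlier point.
latest_as_of_seq_2 = build_latest_index(wal[:2])
```

The two structures answer different questions from the same writes, and rebuilding from a prefix of the log gives the state at an earlier point.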

What if you took that pattern and blew it up to org scale? Your uses become materializations. Warehouse, RAG vector DB, graph DB: any new use gets created when needed, with a late transform and a new emitter.

A few comparisons: We aren't Kafka. Kafka is lower-level. My first attempt at this was at Pluralsight, using Kafka as the log. It was crazy expensive and complicated to operate. For Matterbeam we built cloud-native: object storage gives durability, ephemeral compute avoids coordination, and most jobs don't need 100ms latency. That let us avoid a lot of Kafka's complexity.

We aren't Fivetran. Fivetran is a managed pipeline. We're a utility. One customer replaced Fivetran when they brought us in. It saved them money, but that wasn't the goal. Suddenly projects they had estimated at five months started taking two days. A two-year migration compressed into months. Their PMs started asking to use Matterbeam for everything.

We aren't a warehouse or lake. Snowflake and Databricks are great at what they're great at, but the push to centralize all data in these systems was a mistake. We aim to be the layer underneath. Basically, we want to fulfill the original promise of the data lake: collect without a use case, materialize when you figure out what you need, in the shape and system you need.

What's broken: This doesn't fit cleanly into "what does this replace" buckets. Most people agree data is broken but then lament "data is hard" or some form of "my team isn't doing it right." Nobody's actively looking to solve the deeper problem. Hard to find new customers even with glowing testimonials.

Connector coverage. Fivetran has hundreds. We have way fewer in production. We're working on it, we're using AI, and you can write your own pretty quickly. Still, if your stack needs fifty SaaS integrations on day one, we struggle.

We're early. Handful of paid customers. Not large-enterprise-ready: no SOC 2, HIPAA, etc. yet.

Also, conscious decision not to be open source. Long list of reasons, separate post.

I'd love feedback on: How would you position or market this? It feels like category creation, which I know is hard.

Does the mental model land, or is there a piece where you go WAT?

If you've built CDC-into-warehouse, Kafka-plus-schema-registry, or rolled a data backbone, what's the part you'd have wanted an easier solution for?

Blog, testimonials, marketing video on the site. I'll be watching the thread. Be brutal, I can take it (I think).
