demo: https://www.youtube.com/watch?v=YuhujARUmhA whitepaper: https://matterbeam.com/whitepaper
Short version: companies build their data infra on point-to-point pipelines and one place to put all the data. Source A goes to warehouse B. Team C wants the same data shaped differently? Build another pipeline. Eventually you have a mess of brittle ETL nobody wants to touch.
Matterbeam puts existing ideas together in a different way. Source data is collected as immutable, time-ordered facts in a log. Destinations replay and transform those facts, from any point in time, into the target they need. One source, many uses.
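To make the log-and-replay idea concrete, here is a minimal in-memory sketch in Python. It is an illustration only, not Matterbeam's actual API: `Fact`, `FactLog`, `replay`, and the integer `ts` field are hypothetical names, and a real system would sit on durable storage rather than a list.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Fact:
    """One recorded fact. Hypothetical shape, for illustration only."""
    ts: int        # logical timestamp; strictly increasing in this sketch
    source: str
    payload: dict  # treated as read-only by convention


class FactLog:
    """Append-only, time-ordered log kept in memory."""

    def __init__(self) -> None:
        self._facts: list[Fact] = []

    def append(self, source: str, payload: dict, ts: int) -> None:
        # Facts are only ever added at the end, never updated or deleted.
        if self._facts and ts <= self._facts[-1].ts:
            raise ValueError("facts must be appended in time order")
        self._facts.append(Fact(ts, source, payload))

    def replay(self, since: int = 0) -> Iterator[Fact]:
        # A destination can start reading from any point in time.
        for fact in self._facts:
            if fact.ts >= since:
                yield fact


def current_plans(facts) -> dict:
    """A transform applied at read time: latest plan per user."""
    plans = {}
    for fact in facts:
        plans[fact.payload["user"]] = fact.payload["plan"]
    return plans


log = FactLog()
log.append("crm", {"user": "ada", "plan": "free"}, ts=1)
log.append("crm", {"user": "ada", "plan": "pro"}, ts=2)
log.append("crm", {"user": "bob", "plan": "free"}, ts=3)

all_plans = current_plans(log.replay())             # full history
recent_plans = current_plans(log.replay(since=3))   # only facts from ts=3 on
```

The source is written once. Each destination decides where to start reading and how to shape what it reads.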
My last startup was acquired by Pluralsight in 2014. I ended up leading product architecture and data there for about five years. I worked with really brilliant product and data people who I would have said were doing everything _right_. Yet no one in the company was happy with data. It made me wonder whether something more fundamental was broken.
A key inspiration came from Martin Kleppmann's 2015 talk "Turning the Database Inside Out" (https://www.youtube.com/watch?v=fU9hR3kiOK0). Most databases internally do something interesting: they keep a write-ahead log (durable, append-only, time-ordered) as the source of truth, and build derived structures (B-trees, indexes, materialized views), each optimized to serve a different read pattern.
What if you took that pattern and blew it up to org scale? Your uses become materializations: a warehouse, a RAG vector db, a graph db, or any new use, each created when needed with a late transform and a new emitter.
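A rough sketch of "uses as materializations" in Python. Everything here is hypothetical and simplified (the `LOG` shape, `materialize`, the transform and emit functions); it only shows how two differently shaped targets can be derived from the same facts, with the second added later without touching the source.

```python
from collections import defaultdict

# The shared log: append-only, time-ordered facts (hypothetical shape).
LOG = [
    {"ts": 1, "type": "follow", "src": "ada", "dst": "bob"},
    {"ts": 2, "type": "follow", "src": "bob", "dst": "cy"},
    {"ts": 3, "type": "follow", "src": "ada", "dst": "cy"},
]


def materialize(log, transform, emit):
    """Replay the log, apply a late transform, hand each record to an emitter."""
    for fact in log:
        for record in transform(fact):
            emit(record)


# Materialization 1: warehouse-style rows.
rows = []


def to_row(fact):
    return [(fact["ts"], fact["src"], fact["dst"])]


materialize(LOG, to_row, rows.append)

# Materialization 2: graph-style adjacency sets, added later.
# Only a new transform and a new emitter are needed; the log is unchanged.
graph = defaultdict(set)


def to_edge(fact):
    return [(fact["src"], fact["dst"])]


def emit_edge(edge):
    src, dst = edge
    graph[src].add(dst)


materialize(LOG, to_edge, emit_edge)
```

The transform runs at read time, so a new use never requires reshaping or re-extracting the source.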
A few comparisons: We aren't Kafka. Kafka is lower-level. My first attempt at this was at Pluralsight, using Kafka as the log. It was crazy expensive and complicated to operate. For Matterbeam we built cloud-native: object storage gives durability, ephemeral compute avoids coordination, and we don't need 100ms latency for most jobs. That let us avoid a lot of Kafka's complexity.
We aren't Fivetran. Fivetran is a managed pipeline. We're a utility. One customer replaced Fivetran when they brought us in. That saved them money, but that wasn't the goal. Suddenly projects they had estimated at five months were taking two days. A two-year migration compressed into months. Their PMs started asking to use Matterbeam for everything.
We aren't a warehouse or lake. Snowflake and Databricks are great at what they're great at. But the push to centralize all data in these systems was a mistake. We aim to be the layer underneath. Basically, we want to fulfill the original promise of the data lake: collect without a use case, materialize when you figure out what you need, in the shape and system you need.
What's broken: This doesn't fit cleanly into "what does this replace" buckets. Most people agree data is broken but then lament "data is hard" or some form of "my team isn't doing it right." Nobody's actively looking to solve the deeper problem. Hard to find new customers even with glowing testimonials.
Connector coverage. Fivetran has hundreds. We have way fewer in production. We're working on it, we're using AI, and you can write your own pretty quickly. Still, if your stack needs fifty SaaS integrations on day one, we struggle.
We're early. Handful of paid customers. Not large-enterprise-ready: no SOC 2, HIPAA, etc. yet.
Also, conscious decision not to be open source. Long list of reasons, separate post.
I'd love feedback on: How would you position or market this? It feels like category creation, which I know is hard.
Does the mental model land, or is there a piece where you go WAT?
If you've built CDC-into-warehouse, Kafka-plus-schema-registry, or rolled a data backbone, what's the part you'd have wanted an easier solution for?
Blog, testimonials, marketing video on the site. I'll be watching the thread. Be brutal, I can take it (I think).