I was reading an article earlier today, and it brought me back to a question I’ve heard over and over again in real data/infra teams:
Do we just accept vendor lock-in because it’s convenient,
or do we take the pain and build an open, multi-engine metadata stack?
For context (not my product, just what triggered the thought):
https://medium.com/p/35cc5b15b24e
I’m not trying to argue Gravitino vs. Unity Catalog (UC) here — I’m more interested in the architectural mindset behind these two approaches.
On the vendor-integrated side, the upsides are obvious:
smoother UX
one place for lineage/policies
fewer moving parts
But so are the downsides:
cost keeps creeping up
you end up tied to one engine/format
migrations basically don’t happen in real life
And on the open/composable side:
Spark/Trino/Flink/Ray all first-class
Iceberg/Hudi/Delta can actually coexist
Metadata isn’t tied to compute
But again:
inconsistent metadata models everywhere
no unified governance layer
someone eventually owns a pile of glue code forever
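To make "metadata isn't tied to compute" concrete, here's a minimal sketch of what the open/composable approach looks like in practice, assuming an Iceberg REST catalog service (the endpoint `http://catalog.internal:8181` is a made-up placeholder). Both Spark and Trino point at the same catalog, so table metadata lives outside either engine:

```properties
# Spark: spark-defaults.conf (or equivalent SparkSession config)
spark.sql.catalog.lake=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.lake.type=rest
spark.sql.catalog.lake.uri=http://catalog.internal:8181

# Trino: etc/catalog/lake.properties
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.uri=http://catalog.internal:8181
```

Either engine can then read/write the same tables, and you can swap engines without migrating metadata — but note this only covers the table catalog; lineage, ACLs, and policies still need a home somewhere, which is exactly the glue-code problem above.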
So I’m curious: what actually works in practice?
If your company had to make this choice:
Did you go all-in on a vendor, or build something open?
Did the decision age well after a year or two?
Has anyone actually avoided metadata sprawl without getting locked in?
Where do lineage, ACLs, policies, and the “source of truth” actually live in your setup?
Really interested in what folks think, especially if you're juggling multiple engines, table formats, and clouds.
wey-gu•11m ago