I want to believe. (Really! I think it's hugely underestimated tech.)
These tools were pretty cool and an enormous amount of work was put into them.
The ontologies were extremely extensible. There just wasn't enough of an ecosystem putting them into practice and demonstrating their utility.
Their examples are nice:
https://github.com/Netflix-Skunkworks/uda/blob/9627a97fcd972...
Imagine trying to convey this example in RDF/XML (that's more like 2005).
RDFa and microdata stuff for sharing got pretty far, but those are often simpler vocabularies (at least when seen from the outside; maybe the folks who index that shit have something nicer going on, idk).
Honestly, I feel kind of relieved seeing Netflix use this stuff. I suggested using this kind of tech several times to model knowledge in systems that had exactly this knowledge-representation problem, but I always had a hard time when people said "if it's so good, why does no big player use it?".
Attempts to do this that survived either restricted themselves to being very abstract or limited their scope to specific use cases.
IMO they can be very valuable in terms of reduced friction and costs, if you do it right and have enough rigor/discipline in the organization. Netflix might.
Wikidata? 1.65 billion graph nodes and counting under a common vocabulary.
On a more serious note, scaling of a distributed system and the associated teams necessitates handling one’s data systematically. Fixing it afterwards looks painful.
The business contract with a consolidated data definition is that everyone in the business, no matter which domain, can rely on it. But think about the red tape that introduces. Whenever you need to define or update a data definition, now you don't have to think just about your own use case, but about all of the potential use cases throughout the organization, and you likely need to get sign-off from a wide variety of stakeholders, because any change, however small, is by definition an org-wide change.
It's the data form of the classic big-org problem, "Why does it take two months to change the color of a button?"
Granted, in most cases, having data definitions duplicated, with the potential for drift, is going to be the more insidious problem. But sometimes you just want to get a small, isolated change out the door without having to go through several levels of cross-domain approval committees.
If your data/service/api is used by a lot of other people in the org, you have to work with them to make sure your change doesn't break them. That's true regardless of the architecture.
It really doesn't, and that's not the point. This is for business entities that are larger than teams.
It's way worse to have a million different schemas with no way to share information. And then you have people everywhere banging on your door asking for your representation, you have to help them, you have to update it in their systems. God forbid you've got to migrate things...
If your entity type happens to be one that is core to the business, it's almost a neverending struggle. And when you find different teams took your definition and twisted it, when you're supposed to be the source of truth, and teams downstream of them consume it in the bastardized way...
This project sounds like a dream. I hope it goes well for Netflix and that they can evangelize it more.
Unfortunately it came about at exactly the same time as NoSQL and Big Data, which are basically the opposite. They let you be really loose with your model, and if some data gets lost or misunderstood, hey, no biggie. It's easier to patch it later than to develop a strong model to start with.
But am I bitter about it? No, why do you ask? Twitch, twitch.
It’s a classic pattern in public sector applications, where it’s partially deliberate.
big brain developers often not like this at all and invent many abstractions start of project
grug tempted to reach for club and yell "big brain no maintain code! big brain move on next architecture committee leave code for grug deal with!"
but grug learn control passions, major difference between grug and animal
instead grug try to limit damage of big brain developer early in project by giving them thing like UML diagram (not hurt code, probably throw away anyway) or by demanding working demo tomorrow
working demo especially good trick: force big brain make something to actually work to talk about and code to look at that do thing, will help big brain see reality on ground more quickly
remember! big brain have big brain! need only be harness for good and not in service of spirit complexity demon on accident, many times seen
Yes, it is "fundamentally a business problem," but we believe it can be solved with technology. We think we have a more systematic way to adopt and deploy model-first knowledge graphs in the enterprise.
> But think about the red tape that introduces.
We are very intentional about UDA not becoming more red tape. UDA lives alongside all the other systems. There will never be a mandate for everything to be in UDA.
But we sure want to make it easy for those teams who want their business models to exist everywhere, to be connected to the business, and to be easy to discover, extend, and link to.
(I'm one of UDA's architects.)
It doesn't read, from the article, like they're denying it's a business problem. The models they're defining seem to span all roles; engineering is only one of them.
Not really a problem that can be overcome in probably 99% of companies. Lots of consultancy money to be made for the sake of ego and inflexibility though.
Maybe let the business units be loose but make the sense-making central. Any individual unit can eventually tidy things up (SEO!), but everything will work regardless. The UX effect might be that you can't find something decent to watch, but that is an entirely different problem, solved by not using Netflix and going to the theatre!
To me this feels related to the monorepo-or-not discussions?
From a modeling perspective, there is certainly inherent complexity in representing data from domain models in different ways. One can argue, though, that this is a feature and not a bug: not every use case needs the same level of nuance and complexity. Representational models are usually optimized for particular read scenarios, and this approach seems to argue against that, favoring uniformity over contextual handling of information. It will most likely scale better in places where the level of understanding needed from the domain model is fairly uniform; in my experience, though, use cases get complicated precisely when they refuse to simplify concepts that are very complex and nuanced in their core domain model.
High-level overview:
- https://www.w3.org/DesignIssues/LinkedData.html from TimBL
- https://www.w3.org/DesignIssues/ReadWriteLinkedData.html from TimBL
- https://www.w3.org/DesignIssues/Footprints.html from TimBL
Similar recent attempts:
- https://www.uber.com/en-SE/blog/dragon-schema-integration-at... an attempt in the similar direction at Uber
- https://www.slideshare.net/joshsh/transpilers-gone-wild-intr... continuation of the Uber Dragon effort at LinkedIn
- https://www.palantir.com/docs/foundry/ontology/overview/
Standards and specs in support of such architectures:
- http://www.lotico.com/index.php/Next_Generation_RDF_and_SPAR... (RDF is the only standard in the world for graph data that is widely used; combining graph API responses from N endpoints is a straightforward graph union vs N-way graph merge for JSON/XML/other tree-based formats). Also see https://w3id.org/jelly/jelly-jvm/ if you are looking for a binary RDF serialization.
- https://www.w3.org/TR/shacl/ (needs tooling, see above)
- https://www.odata.org/ (in theory has means to reuse definitions, does not seem to work in practice)
- https://www.w3.org/TR/ldp/ (great foundation, too few features - some specs like paging never reached Recommendation status)
- https://open-services.net/ (builds atop W3C LDP; full disclosure: I'm involved in this one)
- https://www.w3.org/ns/hydra/ (focus on describing arbitrary affordances; not related to LinkedIn Hydra in any way)
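The graph-union point made in the RDF item above can be sketched without any RDF library by modeling each endpoint's response as a set of (subject, predicate, object) triples. All the entity and predicate names below are made up for illustration:

```python
# Minimal sketch: RDF graphs as sets of (subject, predicate, object) triples.
# Combining responses from N endpoints is a plain set union -- no merge keys,
# no conflict rules, order-independent. (All URIs here are invented.)

endpoint_a = {
    ("ex:movie/1", "ex:title", "Stranger Things"),
    ("ex:movie/1", "ex:genre", "ex:genre/scifi"),
}
endpoint_b = {
    ("ex:movie/1", "ex:title", "Stranger Things"),  # duplicate triple is harmless
    ("ex:movie/1", "ex:releaseYear", "2016"),
}
endpoint_c = {
    ("ex:genre/scifi", "ex:label", "Science Fiction"),
}

merged = endpoint_a | endpoint_b | endpoint_c  # graph union = set union

# Contrast: merging three JSON trees with overlapping keys needs an N-way,
# schema-aware merge policy; here the duplicate simply collapses.
assert len(merged) == 4
assert ("ex:movie/1", "ex:releaseYear", "2016") in merged
```

This is exactly why combining N graph responses stays O(union) regardless of how many sources you add, whereas tree formats need pairwise merge logic.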
Upper models:
- https://basic-formal-ontology.org/ - the gold standard. See https://www.youtube.com/watch?v=GWkk5AfRCpM for the tutorial
- https://www.iso.org/standard/87560.html - Industrial Data Ontology. There is a lot of activity around this one, but I lean towards BFO. See https://rds.posccaesar.org/WD_IDO.pdf for the unpaywalled draft and https://www.youtube.com/watch?v=uyjnJLGa4zI&list=PLr0AcmG4Ol... for the videos
I guess in their world they’d add a new model for whatever they want to change and then phase out use of the old one before removing it.
Versioning is permission to break things.
Although it is not implemented in UDA yet, the plan is to embrace the same model as Federated GraphQL, which has proved to work very well for us (think 500+ federated GraphQL schemas). In a nutshell, UDA will actively manage deprecation cycles, as we have the ability to track the consumers of the projected models.
I manage a much smaller federation where I work, and we have a lot of the same ideals I think in terms of having some centralized types that the rest of the business recognizes across the board. Right now we accomplish that within a set of “core” subgraphs that define these types, while our more product-focused ones implement their own sets of types, queries and mutations and can extend the core ones as it makes sense to.
The "Domain" in `upper:DomainModel` is the same D as in DDD (Domain-Driven Design) and in DGS (Domain Graph Service).
> in DDD it's kind of expected that the same concept will be represented in a different way by each system
In UDA, those concepts would explicitly co-exist in different domains. "Being the same" becomes a subjective thing.
I feel the dream. But we went to that place 25 years ago, and we saw that it was stupid.
Tell you what, I'll do a raffle. Leave a comment telling me that I just don't get it. One lucky winner will get my copy of this book https://www.amazon.com/Unified-Modeling-Language-Addison-Wes.... You pay shipping.
When your brain is constantly locked into big-O notation and you are only worrying about N being larger than a billion, it becomes really easy to justify running a high quality representation of the domain into the dirt over arbitrary performance concerns. E.g., storing a bunch of tiny fields in one JSON blob column is going to be faster for many cases, but it totally screws up downstream use cases by making custom views of the data more expensive. The query of concern might only hit once a day, but the developers probably aren't thinking at that level of detail.
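A toy illustration of that blob-column tradeoff (hypothetical schema, stdlib `sqlite3` only): with the blob layout the application must parse every row to filter on a buried field, while with real columns the engine does the filtering and could use an index:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events_blob (payload TEXT)")          # everything in one JSON blob
conn.execute("CREATE TABLE events_cols (kind TEXT, value INT)")  # fields as real columns

rows = [{"kind": "play", "value": i} for i in range(100)]
conn.executemany("INSERT INTO events_blob VALUES (?)",
                 [(json.dumps(r),) for r in rows])
conn.executemany("INSERT INTO events_cols VALUES (?, ?)",
                 [(r["kind"], r["value"]) for r in rows])

# Blob layout: the database can't see inside the payload, so the
# application fetches and parses every row just to filter.
hits_blob = [r for (p,) in conn.execute("SELECT payload FROM events_blob")
             if (r := json.loads(p))["value"] > 90]

# Column layout: the engine filters (and could use an index) before
# any row reaches application code.
hits_cols = conn.execute(
    "SELECT kind, value FROM events_cols WHERE value > 90").fetchall()

assert len(hits_blob) == len(hits_cols) == 9
```

Same answer either way; the difference is who pays for the scan, and whether downstream consumers can ask new questions cheaply.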
The really tragic part is that the modern RDBMS is typically capable of figuring out acceptable query plans even given the most pathetically naive domain models. I think in general there is a severe (and growing) misunderstanding regarding what something like MSSQL/Oracle/DB2 can accomplish - even in an enterprise as large as Netflix.
Edit: The more I read this article the more I hear this voice https://www.youtube.com/watch?v=y8OnoxKotPQ
Shouldn’t it be?
When people embark on 'universal' data definitions, conversations of the type "But is it reaaalllly a Movie??" are an endless source of confusion.
The core problem seems to be development in isolation. Put another way: microservices. This post hints at microservices having complete autonomy over their data storage and developing their own GraphQL models. The first is normal for microservices (but an indictment at the same time). The second is... weird.
The whole point of GraphQL is to create a unified view of something, not to have 23 different versions of "Movie". Attributes are optional. Pull what you need. Common subsets of data can be organized in fragments. If you're not doing that, why are you using GraphQL?
So I worked at Facebook and may be a bit biased here because I encountered a couple of ex-Netflix engineers in my time who basically wanted to throw away FB's internal infrastructure and reinvent Netflix microservices.
Anyway, at FB there is a Video GraphQL object. There aren't 23 or 7 or even 2.
Data storage for most things was via a write-through, in-memory graph database called TAO that persisted things to sharded MySQL servers. On top of this, you'd use EntQL to add a bunch of behavior to TAO, like permissions, privacy policies, observers, and such. And again, there was one Video entity. There were offline data pipelines that would generally process logging data (i.e., outside TAO).
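The write-through pattern described above can be sketched roughly like this. This is not TAO's API — just a generic illustration with invented names:

```python
# Hedged sketch of a write-through cache over sharded storage -- not TAO,
# just the general shape. Writes hit the backing shard and the in-memory
# cache together; reads are served from memory when possible.

class WriteThroughGraphStore:
    def __init__(self, num_shards):
        self.cache = {}                                    # in-memory view
        self.shards = [dict() for _ in range(num_shards)]  # stand-ins for sharded MySQL

    def _shard_for(self, key):
        return self.shards[hash(key) % len(self.shards)]

    def write(self, key, value):
        self._shard_for(key)[key] = value  # persist first...
        self.cache[key] = value            # ...then update the cached copy

    def read(self, key):
        if key in self.cache:                 # hot path: served from memory
            return self.cache[key]
        return self._shard_for(key).get(key)  # cache miss: fall through to the shard

store = WriteThroughGraphStore(num_shards=4)
store.write("video:42", {"title": "Example"})
assert store.read("video:42") == {"title": "Example"}
```

The payoff of write-through (versus write-back) is that the cache never holds data the shards don't, so a cache wipe only costs latency, not correctness.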
Maybe someone more experienced with microservices can speak to this: does UDA make sense? Is it solving an actual problem? Or just a self-created problem?
Yeah, maybe 10 years ago, but today Netflix is one of the top production companies on the planet. In the article, they even point to how this addresses their issues in content engineering:
https://netflixtechblog.com/netflix-studio-engineering-overv...
https://netflixtechblog.com/globalizing-productions-with-net...
(So their micro services can work together usefully and efficiently -- I would guess that currently the communication burden between microservice teams is high and still is not that effective.)
> The whole point of GraphQL is to create a unified view of something
It can do that, but that's not really the point of GraphQL. I suppose you're saying that's how it was used at FB. That's fine, IMO, but it sounds like this NF team decided to use something more abstract for the same purpose.
I can't comment on their choices without doing a bunch more analysis, but in my own experience I've found off-the-shelf data modeling formats have too much flexibility in some places (forcing you to add additional custom controls or require certain usage patterns) and not enough in others (forcing you to add custom extensions). The nice thing about your own format is you can make it able to express everything you want and nothing you don't. And have a well-defined projection to Graphql (and sqlite and oracle and protobufs and xml and/or whatever other thing you're using).
GraphQL is great at federating APIs, and is a standardized API protocol. It is not a data modeling language. We actually tried really hard with GraphQL first.
I am definitely interested to read more and implement it myself as well. Would also be more than happy to skip the whole GraphQL end of it.
Netflix benefits from a large GraphQL ecosystem with federation, which is why it's so central in UDA from day 1. But adding a projection to "REST" would be very easy.
Of course, details on "Upper", PDM, and Sphere are, well, missing, but at least I have concepts to focus on :)
Definitely coming soon ;-)
Don't find a linguist who understands grammatical structure and claims to be able to map the source language to some neutral intermediate structure and map that to the target language.
This is a fallacy I notice everywhere, but I don't know how to name it. Maybe the "linguist translator" fallacy?
https://www.uber.com/blog/dragon-schema-integration-at-uber-...
Unfortunately it never got open sourced but Joshua left for LinkedIn and started working on the LambdaGraph project and the Hydra language that are open sourced.
You can find more information on this fascinating work here:
https://github.com/CategoricalData/hydra
I think these approaches, including all the semantic web stuff from 10+ years ago, suffered from the added overhead of agreeing on and formalising semantics, and then of course maintaining them.
I wonder if LLMs can help with that part today.
If we can get compound interest across development teams by giving them a common toolset and skillset that covers different applications but the same data semantics, maybe not every data contract will have to be reduced to DTOs that can be POSTed or otherwise forced to be a least common denominator just so it can fit past a network or other IPC barrier.
For that, I'm grateful Netflix is working on this and publicizing the interesting work.
I've been in a couple of the "we need to unify the data tables to serve everyone" exercises before I decided to focus on other parts of the software stack, and a lot of it just seemed like: the video game people model it differently because they're doing different analysis, and if you unify the base layer to support everybody's type of analysis, it doesn't change that there's still a bunch of independent, not-talking-to-each-other analysis going on. (This is specifically different from the much LARGER sort of problem, which is more a copypasta one — Finance's accounting doesn't agree with Legal's accounting and nobody knows who's right. That's one dataset needed in multiple places, versus multiple datasets needed in different places.)
I think this mostly sidesteps that - they aren't forcing everyone to migrate to the same things, AFAICT - and is just about making it easy to access more broadly. Is that right?
And confusion-reducing definition things - "everyone uses the same official definitions for business concepts" - I'm all for. Seen a lot of that pain for sure.
This resonates. Moreover, it's very easy for architects to assume that because different areas of the business use data about the 'same' thing, the thing must be the same.
But often the analysis requires a slightly different thing. Like: we want a master list of prisons. But is a prison a building, a collection of prisoners (such that the male prison and the female prison on the same site are different prisons), or the institution with that name managed under a particular contract?
If I had to guess, this is how eng pitched it to the business to carve out the time to build this tooling. As with all these internally built schemas, UIs, tooling, etc., they're never gonna post how much this is actually used relative to the workarounds DS and eng use in their day-to-day.
It's not conceptually a knowledge graph in the same way, but you can introspect essentially everything about your application. Resources can be given data layers which define how they map to underlying storage, and you can use all of this information purely statically, to derive additional things from, or you can just... well, use it, e.g. `Ash.read(Resource)` yielding the table data. Our query engine has the same semantics they describe, where you don't explicitly join, etc.
```elixir
MyApp.Post
|> Ash.Query.filter(author.type == :admin)
|> Ash.read!()
```
You can generate charts and graphs, including things like policy flow charts.
---
Ultimately I've found that modeling tools like UML that can't simultaneously execute the model (i.e., act as the application itself) are always insufficient and/or have massive impedance mismatches once the rubber meets the road. The point is to effectively reimagine this as "what if we used these modeling principles, declaratively, from the ground up?"
There is no singular universal consistent definition of any concept like "actor" or "movie" or whatever else. These are all concepts that are well-defined only within a specific domain. The business domain concept of an "actor" is well and good and probably the most important and top-level user-facing concept of that term -- that doesn't mean that this business-domain definition is somehow authoritative, or comprehensive, or in any way some kind of superset-composite description of any/all other domain definitions of that same term.
Reconciliation of domain-specific concepts like these requires higher-level coordination across separate domains, it's not something you can do within individual domains or domain-specific services. If discrete domains/services needed to abide the same business-defined concept of whatever concepts, then that would subvert the main purpose of having separate domains/teams in the first place.. !
Each team still owns its local RDF graph for concepts like actor or movie. What UDA adds is a shared graph of mappings that translate between those local models whenever another team needs them.
Traditionally such translations live in scattered adapter code, which hides lineage and adds opacity - particularly as systems proliferate. By expressing the mappings as RDF triples inside UDA’s knowledge graph, they become versioned, queryable, and reusable. No more spelunking through layers of service code to understand how one team’s actor becomes another’s.
As a result, discrete teams/domains remain independent while the interconnections/relationships become first-class and introspectable. This enables coordination without centralization.
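A rough sketch of the mappings-as-triples idea (the predicate and entity names below are invented, not UDA's actual vocabulary): once the translation between two teams' models is expressed as data, lineage questions become queries instead of code archaeology:

```python
# Hypothetical sketch: two teams' local notions of the same business concept,
# plus mapping triples recording how one translates to the other. The only
# grounded name here is upper:DomainModel (mentioned in the thread);
# everything else (casting:, payroll:, map:projectsTo) is made up.

graph = {
    # Each team models its concept locally.
    ("casting:Actor", "rdf:type", "upper:DomainModel"),
    ("payroll:Payee", "rdf:type", "upper:DomainModel"),
    # The cross-domain mapping is just more triples in the same graph.
    ("casting:Actor", "map:projectsTo", "payroll:Payee"),
    ("casting:Actor.stageName", "map:projectsTo", "payroll:Payee.legalName"),
}

def mappings_from(source_prefix):
    """Lineage query: where does this team's model flow to?"""
    return [(s, o) for (s, p, o) in graph
            if p == "map:projectsTo" and s.startswith(source_prefix)]

assert ("casting:Actor", "payroll:Payee") in mappings_from("casting:")
```

Because the mapping lives in the graph rather than in adapter code, it can be versioned, queried in both directions, and reused by any consumer.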
Sure you do.
And the types of engineers writing on Medium are the ones they want to recruit, too.
https://scribe.rip/uda-unified-data-architecture-6a6aee261d8...
and SEO
Part of marketing is knowing your audience. And plenty of marketing people exist with deep tech experience.