I want to believe. (Really! I think it's hugely underestimated tech.)
These tools were pretty cool and an enormous amount of work was put into them.
The ontologies were extremely extensible. There just wasn't enough of an ecosystem putting them into practice and demonstrating their utility.
Their examples are nice:
https://github.com/Netflix-Skunkworks/uda/blob/9627a97fcd972...
Imagine trying to convey this example in RDF/XML (that's more like 2005).
RDFa and microdata stuff for sharing got pretty far, but those are often simpler vocabularies (at least when seen from the outside; maybe the folks who index that shit have something nicer going on, idk).
Honestly, I feel kind of relieved seeing Netflix use this stuff. I suggested using this kind of tech several times to model knowledge in systems that had exactly this knowledge-representation problem, but I always had a hard time when people said "if it's so good, why does no big player use it?".
Attempts to do this that survived either restricted themselves to being very abstract or limited their scope to specific use cases.
IMO they can be very valuable in terms of reduced friction and costs, if you do it right and have enough rigor/discipline in the organization. Netflix might.
Wikidata? 1.65 billion graph nodes and counting under a common vocabulary.
On a more serious note, scaling of a distributed system and the associated teams necessitates handling one’s data systematically. Fixing it afterwards looks painful.
The business contract with a consolidated data definition is that everyone in the business, no matter which domain, can rely on it. But think about the red tape that introduces. Whenever you need to define or update a data definition, now you don't have to think just about your own use case, but about all of the potential use cases throughout the organization, and you likely need to get sign-off from a wide variety of stakeholders, because any change, however small, is by definition an org-wide change.
It's the data form of the classic big-org problem, "Why does it take two months to change the color of a button?"
Granted, in most cases, having data definitions duplicated, with the potential for drift, is going to be the more insidious problem. But sometimes you just want to get a small, isolated change out the door without having to go through several levels of cross-domain approval committees.
If your data/service/api is used by a lot of other people in the org, you have to work with them to make sure your change doesn't break them. That's true regardless of the architecture.
It really doesn't, and that's not the point. This is for business entities that are larger than teams.
It's way worse to have a million different schemas with no way to share information. And then you have people everywhere banging on your door asking for your representation, you have to help them, you have to update it in their systems. God forbid you've got to migrate things...
If your entity type happens to be one that is core to the business, it's almost a neverending struggle. And when you find different teams took your definition and twisted it, when you're supposed to be the source of truth, and teams downstream of them consume it in the bastardized way...
This project sounds like a dream. I hope it goes well for Netflix and that they can evangelize it more.
Unfortunately it came about at exactly the same time as NoSQL and Big Data, which are basically the opposite. They let you be really loose with your model, and if some data gets lost or misunderstood, hey, no biggie. It's easier to patch it later than to develop a strong model to start with.
But am I bitter about it? No, why do you ask? Twitch, twitch.
It’s a classic pattern in public sector applications, where it’s partially deliberate.
big brain developers often not like this at all and invent many abstractions start of project
grug tempted to reach for club and yell "big brain no maintain code! big brain move on next architecture committee leave code for grug deal with!"
but grug learn control passions, major difference between grug and animal
instead grug try to limit damage of big brain developer early in project by giving them thing like UML diagram (not hurt code, probably throw away anyway) or by demanding working demo tomorrow
working demo especially good trick: force big brain make something to actually work to talk about and code to look at that do thing, will help big brain see reality on ground more quickly
remember! big brain have big brain! need only be harness for good and not in service of spirit complexity demon on accident, many times seen
Yes, it is "fundamentally a business problem," but we believe it can be solved with technology. We think we have a more systematic way to adopt and deploy model-first knowledge graphs in the enterprise.
> But think about the red tape that introduces.
We are very intentional about UDA not becoming more red tape. UDA lives alongside all the other systems. There will never be a mandate for everything to be in UDA.
But we sure want to make it easy for those teams who want their business models to exist everywhere, to be connected to the business, and to be easy to discover, extend, and link to.
(I'm one of UDA's architects.)
It doesn't read, from the article, like they're denying it's a business problem. The models they're defining seem to span all roles; engineering is only one of them.
Not really a problem that can be overcome in probably 99% of companies. Lots of consultancy money to be made for the sake of ego and inflexibility though.
Maybe let the business units be loose but make the sense-making central. Any individual unit can eventually tidy things up (SEO!), but everything will work regardless. The UX effect might be that you can't find something decent to watch, but that is an entirely different problem, solved by not using Netflix and going to the theatre!
To me this feels related to the monorepo-or-not discussions?
From a modeling perspective, there is certainly inherent complexity in representing data from domain models in different ways. One can argue, though, that this is a feature and not a bug: not every use case needs the same level of nuance and complexity. Representational models are usually optimized for particular read scenarios, and this approach seems to argue against that, favoring uniformity over contextual handling of information. It will most likely scale better in places where the level of understanding needed from the domain model is fairly uniform; in my experience, though, use cases get complicated precisely when they refuse to simplify concepts that are very complex and nuanced in their core domain model.
High-level overview:
- https://www.w3.org/DesignIssues/LinkedData.html from TimBL
- https://www.w3.org/DesignIssues/ReadWriteLinkedData.html from TimBL
- https://www.w3.org/DesignIssues/Footprints.html from TimBL
Similar recent attempts:
- https://www.uber.com/en-SE/blog/dragon-schema-integration-at... an attempt in the similar direction at Uber
- https://www.slideshare.net/joshsh/transpilers-gone-wild-intr... continuation of the Uber Dragon effort at LinkedIn
- https://www.palantir.com/docs/foundry/ontology/overview/
Standards and specs in support of such architectures:
- http://www.lotico.com/index.php/Next_Generation_RDF_and_SPAR... (RDF is the only standard in the world for graph data that is widely used; combining graph API responses from N endpoints is a straightforward graph union vs N-way graph merge for JSON/XML/other tree-based formats). Also see https://w3id.org/jelly/jelly-jvm/ if you are looking for a binary RDF serialization.
- https://www.w3.org/TR/shacl/ (needs tooling, see above)
- https://www.odata.org/ (in theory has means to reuse definitions, does not seem to work in practice)
- https://www.w3.org/TR/ldp/ (great foundation, too few features - some specs like paging never reached Recommendation status)
- https://open-services.net/ (builds atop W3C LDP; full disclosure: I'm involved in this one)
- https://www.w3.org/ns/hydra/ (focus on describing arbitrary affordances; not related to LinkedIn Hydra in any way)
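The graph-union point made in the RDF item above can be sketched without any RDF library by modeling each endpoint's response as a set of (subject, predicate, object) triples. All the entity and predicate names below are made up for illustration:

```python
# Minimal sketch: RDF graphs as sets of (subject, predicate, object) triples.
# Combining responses from N endpoints is a plain set union -- no merge keys,
# no conflict rules, order-independent. (All URIs here are invented.)

endpoint_a = {
    ("ex:movie/1", "ex:title", "Stranger Things"),
    ("ex:movie/1", "ex:genre", "ex:genre/scifi"),
}
endpoint_b = {
    ("ex:movie/1", "ex:title", "Stranger Things"),  # duplicate triple is harmless
    ("ex:movie/1", "ex:releaseYear", "2016"),
}
endpoint_c = {
    ("ex:genre/scifi", "ex:label", "Science Fiction"),
}

merged = endpoint_a | endpoint_b | endpoint_c  # graph union = set union

# Contrast: merging three JSON trees with overlapping keys needs an N-way,
# schema-aware merge policy; here the duplicate simply collapses.
assert len(merged) == 4
assert ("ex:movie/1", "ex:releaseYear", "2016") in merged
```

This is exactly why combining N graph responses stays O(union) regardless of how many sources you add, whereas tree formats need pairwise merge logic.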
Upper models:
- https://basic-formal-ontology.org/ - the gold standard. See https://www.youtube.com/watch?v=GWkk5AfRCpM for the tutorial
- https://www.iso.org/standard/87560.html - Industrial Data Ontology. There is a lot of activity around this one, but I lean towards BFO. See https://rds.posccaesar.org/WD_IDO.pdf for the unpaywalled draft and https://www.youtube.com/watch?v=uyjnJLGa4zI&list=PLr0AcmG4Ol... for the videos
I guess in their world they’d add a new model for whatever they want to change and then phase out use of the old one before removing it.
Versioning is permission to break things.
Although it is not implemented in UDA yet, the plan is to embrace the same model as Federated GraphQL, which has proved to work very well for us (think 500+ federated GraphQL schemas). In a nutshell, UDA will actively manage deprecation cycles, as we have the ability to track the consumers of the projected models.
I manage a much smaller federation where I work, and we have a lot of the same ideals I think in terms of having some centralized types that the rest of the business recognizes across the board. Right now we accomplish that within a set of “core” subgraphs that define these types, while our more product-focused ones implement their own sets of types, queries and mutations and can extend the core ones as it makes sense to.
The "Domain" in `upper:DomainModel` is the same D as in DDD (Domain-Driven Design) and in DGS (Domain Graph Service).
> in DDD it's kind of expected that the same concept will be represented in a different way by each system
In UDA, those concepts would explicitly co-exist in different domains. "Being the same" becomes a subjective thing.
I feel the dream. But we went to that place 25 years ago, and we saw that it was stupid.
Tell you what, I'll do a raffle. Leave a comment telling me that I just don't get it. One lucky winner will get my copy of this book https://www.amazon.com/Unified-Modeling-Language-Addison-Wes.... You pay shipping.
When your brain is constantly locked into big-O notation and you are only worrying about N being larger than a billion, it becomes really easy to justify running a high quality representation of the domain into the dirt over arbitrary performance concerns. E.g., storing a bunch of tiny fields in one JSON blob column is going to be faster for many cases, but it totally screws up downstream use cases by making custom views of the data more expensive. The query of concern might only hit once a day, but the developers probably aren't thinking at that level of detail.
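A toy illustration of that blob-column tradeoff (hypothetical schema, stdlib `sqlite3` only): with the blob layout the application must parse every row to filter on a buried field, while with real columns the engine does the filtering and could use an index:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events_blob (payload TEXT)")          # everything in one JSON blob
conn.execute("CREATE TABLE events_cols (kind TEXT, value INT)")  # fields as real columns

rows = [{"kind": "play", "value": i} for i in range(100)]
conn.executemany("INSERT INTO events_blob VALUES (?)",
                 [(json.dumps(r),) for r in rows])
conn.executemany("INSERT INTO events_cols VALUES (?, ?)",
                 [(r["kind"], r["value"]) for r in rows])

# Blob layout: the database can't see inside the payload, so the
# application fetches and parses every row just to filter.
hits_blob = [r for (p,) in conn.execute("SELECT payload FROM events_blob")
             if (r := json.loads(p))["value"] > 90]

# Column layout: the engine filters (and could use an index) before
# any row reaches application code.
hits_cols = conn.execute(
    "SELECT kind, value FROM events_cols WHERE value > 90").fetchall()

assert len(hits_blob) == len(hits_cols) == 9
```

Same answer either way; the difference is who pays for the scan, and whether downstream consumers can ask new questions cheaply.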
The really tragic part is that the modern RDBMS is typically capable of figuring out acceptable query plans even given the most pathetically naive domain models. I think in general there is a severe (and growing) misunderstanding regarding what something like MSSQL/Oracle/DB2 can accomplish - even in an enterprise as large as Netflix.
Edit: The more I read this article the more I hear this voice https://www.youtube.com/watch?v=y8OnoxKotPQ
Shouldn’t it be?
When people embark on 'universal' data definitions, conversations of the type "But is it reaaalllly a Movie??" are an endless source of confusion.
The core problem seems to be development in isolation. Put another way: microservices. This post hints at microservices having complete autonomy over their data storage and developing their own GraphQL models. The first is normal for microservices (but an indictment at the same time). The second is... weird.
The whole point of GraphQL is to create a unified view of something, not to have 23 different versions of "Movie". Attributes are optional. Pull what you need. Common subsets of data can be organized in fragments. If you're not doing that, why are you using GraphQL?
So I worked at Facebook and may be a bit biased here because I encountered a couple of ex-Netflix engineers in my time who basically wanted to throw away FB's internal infrastructure and reinvent Netflix microservices.
Anyway, at FB there is a Video GraphQL object. There aren't 23 or 7 or even 2.
Data storage for most things was via a write-through, in-memory graph database called TAO that persisted things to sharded MySQL servers. On top of this, you'd use EntQL to add a bunch of behavior to TAO, like permissions, privacy policies, observers, and such. And again, there was one Video entity. There were offline data pipelines that would generally process logging data (i.e., outside TAO).
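The write-through pattern described above can be sketched roughly like this. This is not TAO's API — just a generic illustration with invented names:

```python
# Hedged sketch of a write-through cache over sharded storage -- not TAO,
# just the general shape. Writes hit the backing shard and the in-memory
# cache together; reads are served from memory when possible.

class WriteThroughGraphStore:
    def __init__(self, num_shards):
        self.cache = {}                                    # in-memory view
        self.shards = [dict() for _ in range(num_shards)]  # stand-ins for sharded MySQL

    def _shard_for(self, key):
        return self.shards[hash(key) % len(self.shards)]

    def write(self, key, value):
        self._shard_for(key)[key] = value  # persist first...
        self.cache[key] = value            # ...then update the cached copy

    def read(self, key):
        if key in self.cache:                 # hot path: served from memory
            return self.cache[key]
        return self._shard_for(key).get(key)  # cache miss: fall through to the shard

store = WriteThroughGraphStore(num_shards=4)
store.write("video:42", {"title": "Example"})
assert store.read("video:42") == {"title": "Example"}
```

The payoff of write-through (versus write-back) is that the cache never holds data the shards don't, so a cache wipe only costs latency, not correctness.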
Maybe someone more experienced with microservices can speak to this: does UDA make sense? Is it solving an actual problem? Or just a self-created problem?
Yeah, maybe 10 years ago, but today Netflix is one of the top production companies on the planet. In the article, they even point to how this addresses their issues in content engineering:
https://netflixtechblog.com/netflix-studio-engineering-overv...
https://netflixtechblog.com/globalizing-productions-with-net...
(So their micro services can work together usefully and efficiently -- I would guess that currently the communication burden between microservice teams is high and still is not that effective.)
> The whole point of GraphQL is to create a unified view of something
It can do that, but that's not really the point of GraphQL. I suppose you're saying that's how it was used at FB. That's fine, IMO, but it sounds like this NF team decided to use something more abstract for the same purpose.
I can't comment on their choices without doing a bunch more analysis, but in my own experience I've found off-the-shelf data modeling formats have too much flexibility in some places (forcing you to add additional custom controls or require certain usage patterns) and not enough in others (forcing you to add custom extensions). The nice thing about your own format is you can make it able to express everything you want and nothing you don't. And have a well-defined projection to Graphql (and sqlite and oracle and protobufs and xml and/or whatever other thing you're using).
GraphQL is great at federating APIs, and is a standardized API protocol. It is not a data modeling language. We actually tried really hard with GraphQL first.
I am definitely interested to read more and implement it myself as well. Would also be more than happy to skip the whole GraphQL end of it.
Netflix benefits from a large GraphQL ecosystem with federation, which is why it's so central in UDA from day 1. But adding a projection to "REST" would be very easy.
Of course, details on "Upper", PDM, and Sphere are, well, missing, but at least I have concepts to focus on :)
Definitely coming soon ;-)
Don't find a linguist who understands grammatical structure and claims to be able to map the source language to some neutral intermediate structure and map that to the target language.
This is a fallacy I notice everywhere, but I don't know how to name it. Maybe the "linguist translator" fallacy?
https://www.uber.com/blog/dragon-schema-integration-at-uber-...
Unfortunately it never got open sourced but Joshua left for LinkedIn and started working on the LambdaGraph project and the Hydra language that are open sourced.
You can find more information on this fascinating work here:
https://github.com/CategoricalData/hydra
I think these approaches, including all the semantic web stuff from 10+ years ago, suffered from the added overhead of agreeing on and formalising semantics, and then of course maintaining them.
I wonder if LLMs can help with that part today.
If we can get compound interest across development teams by giving them a common toolset and skillset that covers different applications but the same data semantics, maybe not every data contract will have to be reduced to DTOs that can be POSTed or otherwise forced to be a least common denominator just so it can fit past a network or other IPC barrier.
For that, I'm grateful Netflix is working on this and publicizing the interesting work.
I've been in a couple of the "we need to unify the data tables to serve everyone" exercises before I decided to focus on other parts of the software stack, and a lot of it just seemed like: the video game people model it differently because they're doing different analysis, and if you unify the base layer to support everybody's type of analysis, it doesn't change that there's still a bunch of independent, not-talking-to-each-other analysis going on. (This is specifically different from the much LARGER sort of problem, which is more a copypasta one — Finance's accounting doesn't agree with Legal's accounting and nobody knows who's right. That's one dataset needed in multiple places, versus multiple datasets needed in different places.)
I think this mostly sidesteps that - they aren't forcing everyone to migrate to the same things, AFAICT - and is just about making it easy to access more broadly. Is that right?
And confusion-reducing definition things - "everyone uses the same official definitions for business concepts" - I'm all for. Seen a lot of that pain for sure.
This resonates. Moreover, it's very easy for architects to assume that because different areas of the business use data about the 'same' thing, the thing must be the same.
But often the analysis requires a slightly different thing. Like: we want a master list of prisons. But is a prison a building, a collection of prisoners (such that the male prison and the female prison on the same site are different prisons), or the institution with that name managed under a particular contract?
If I had to guess, this is how eng pitched it to the business to carve out the time to build this tooling. As with all these internally built schemas, UIs, tooling, etc., they're never gonna post how much this is actually used relative to the workarounds DS and eng use in their day-to-day.
It's not conceptually a knowledge graph in the same way, but you can introspect essentially everything about your application. Resources can be given data layers which define how they map to underlying storage, and you can use all of this information purely statically, to derive additional things from, or you can just... well, use it, e.g. `Ash.read(Resource)` yielding the table data. Our query engine has the same semantics they describe, where you don't explicitly join, etc.
```elixir
MyApp.Post
|> Ash.Query.filter(author.type == :admin)
|> Ash.read!()
```
You can generate charts and graphs, including things like policy flow charts.
---
Ultimately I've found that modeling tools like UML that can't simultaneously execute the model (i.e., act as the application itself) are always insufficient and/or have massive impedance mismatches once the rubber meets the road. The point is to effectively reimagine this as "what if we used these modeling principles, declaratively, from the ground up?"
There is no singular universal consistent definition of any concept like "actor" or "movie" or whatever else. These are all concepts that are well-defined only within a specific domain. The business domain concept of an "actor" is well and good and probably the most important and top-level user-facing concept of that term -- that doesn't mean that this business-domain definition is somehow authoritative, or comprehensive, or in any way some kind of superset-composite description of any/all other domain definitions of that same term.
Reconciliation of domain-specific concepts like these requires higher-level coordination across separate domains, it's not something you can do within individual domains or domain-specific services. If discrete domains/services needed to abide the same business-defined concept of whatever concepts, then that would subvert the main purpose of having separate domains/teams in the first place.. !
Each team still owns its local RDF graph for concepts like actor or movie. What UDA adds is a shared graph of mappings that translate between those local models whenever another team needs them.
Traditionally such translations live in scattered adapter code, which hides lineage and adds opacity - particularly as systems proliferate. By expressing the mappings as RDF triples inside UDA’s knowledge graph, they become versioned, queryable, and reusable. No more spelunking through layers of service code to understand how one team’s actor becomes another’s.
As a result, discrete teams/domains remain independent while the interconnections/relationships become first-class and introspectable. This enables coordination without centralization.
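A rough sketch of the mappings-as-triples idea (the predicate and entity names below are invented, not UDA's actual vocabulary): once the translation between two teams' models is expressed as data, lineage questions become queries instead of code archaeology:

```python
# Hypothetical sketch: two teams' local notions of the same business concept,
# plus mapping triples recording how one translates to the other. The only
# grounded name here is upper:DomainModel (mentioned in the thread);
# everything else (casting:, payroll:, map:projectsTo) is made up.

graph = {
    # Each team models its concept locally.
    ("casting:Actor", "rdf:type", "upper:DomainModel"),
    ("payroll:Payee", "rdf:type", "upper:DomainModel"),
    # The cross-domain mapping is just more triples in the same graph.
    ("casting:Actor", "map:projectsTo", "payroll:Payee"),
    ("casting:Actor.stageName", "map:projectsTo", "payroll:Payee.legalName"),
}

def mappings_from(source_prefix):
    """Lineage query: where does this team's model flow to?"""
    return [(s, o) for (s, p, o) in graph
            if p == "map:projectsTo" and s.startswith(source_prefix)]

assert ("casting:Actor", "payroll:Payee") in mappings_from("casting:")
```

Because the mapping lives in the graph rather than in adapter code, it can be versioned, queried in both directions, and reused by any consumer.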
Sure you do.
And the types of engineers writing on Medium are the ones they want to recruit, too.
https://scribe.rip/uda-unified-data-architecture-6a6aee261d8...
and SEO
Part of marketing is knowing your audience. And plenty of marketing people exist with deep tech experience.