You have way fewer tools for actually providing nice, straightforward APIs. I appreciate that Pydantic gives you type safety, but at some point the actual ease of writing correct code goes beyond type safety.
Just real straightforward stuff like loading in user input becomes a whole song and dance, because Pydantic is an extremely basic validation tool… the hacks in DRF, like request contexts, are useful (see the sketch below)!
I’ve seen many projects do this and it feels like such a step back in offering simple-to-maintain APIs. Maybe I’m just biased cuz I “get” DRF (and did lose half a day recently to weird DRF behavior…)
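For reference, the DRF request-context hack mentioned above looks roughly like this (a sketch; the serializer and field names are made up, not anyone's production code):

# Sketch of the DRF "request context" pattern; MyThingSerializer is hypothetical.
from rest_framework import serializers

class MyThingSerializer(serializers.Serializer):
    name = serializers.CharField()
    owner = serializers.SerializerMethodField()

    def get_owner(self, obj):
        # The view passes the request in via context, so the serializer
        # can adapt its output to the current user.
        request = self.context.get("request")
        return request.user.username if request else None

# In the view:
# serializer = MyThingSerializer(thing, context={"request": request})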
I can stretch my imagination about Astral monetizing their tools, but this one is too difficult
In 2022, the project evolved into a commercial entity called Pydantic Services Inc., founded by Samuel Colvin and Adrian Garcia Badaracco, to build products around the open-source library. The company raised $4.7 million in seed funding in February 2023, led by Sequoia Capital, with participation from Partech, Irregular Expressions, and other investors. This was followed by a $12.5 million Series A round in October 2024, again led by Sequoia Capital and including Partech Partners, bringing the total funding to approximately $17.2 million across rounds. The Series A funding coincided with the launch of Pydantic Logfire, a commercial observability platform for backend applications, aimed at expanding beyond the core open-source validation framework. As of mid-2025, no additional funding rounds have been publicly reported.
https://techcrunch.com/2023/02/16/sequoia-backs-open-source-...
In a more just world, Python's typing story would be closer to TypeScript's, and we could have a fully realized tool like it that supports the asymmetric nature of serializing/deserializing and offers nice abstractions through the stack.
Right now Pydantic for me is like “you can validate a straightforward data structure! Now it’s up to you to actually build up a useful data structure from the straightforward one”. Other tools give me both in one go. At the cost of safety (that you can contain, but you gotta do it right)
But if I had to roll the clock back I'd recommend marshmallow and that entire ecosystem. It's definitely way less bloated than Pydantic currently, and only lacks some features. Beyond that, just use plain-old dataclasses.
I have had good success with DRF model serializers in like Django projects with 100+ apps (was the sprawling nature of the apps itself a problem? Sure, maybe). Got the job done
As with anything, you gotta build your own wrappers around these things to get value in larger projects though.
For example, some systems interact with several different vendor, tracking, and payment systems that are all kinda the same, but also kinda different. Here it makes sense to have an internal domain model and to normalize all of these other systems into your domain model at a very early stage. Otherwise complexity rises very, very quickly, with n things interacting with n other things.
On the other hand, for a lot of our smaller and simpler systems that output JSON based off a database for other systems... it's a realistic question whether maintaining the domain model and API translation for every endpoint on every change is actually less work than ripping out the API modelling framework, which happens once every few years, if at all. Some teams would probably rewrite from scratch with new knowledge, especially if they have API tests available.
The biggest benefit you get is being able to have much more flexibility around validation when the input model (Pydantic here) isn't the same as the database model. The canonical example here would be something like a user, where the validation rules vary depending on context: at signup you might be creating a new stub user where only a username and password are required, but you also want a password confirmation. At a different point you're updating the user's profile, and in that case you have a bunch of fields that might be required now, but password isn't one of them and the username can't be changed.
By having distinct input models you make that all much easier to reason about than having a single model which represents the database record, but also the input form, and has a bunch of flags on it to indicate which context you’re talking about.
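A minimal sketch of what that looks like with two separate Pydantic input models (the names and fields here are invented for illustration):

from pydantic import BaseModel, model_validator

class UserSignup(BaseModel):
    username: str
    password: str
    password_confirmation: str

    @model_validator(mode="after")
    def passwords_match(self):
        # Signup-only rule: confirmation must match.
        if self.password != self.password_confirmation:
            raise ValueError("passwords do not match")
        return self

class UserProfileUpdate(BaseModel):
    # No username (can't be changed) and no password (separate flow);
    # profile fields are what matter here.
    display_name: str
    bio: str | None = None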
This wasn't mentioned, but the constant validation on construction also costs something. Sometimes it's a cost you're willing to pay (again, dealing with external inputs), sometimes it's extraneous because e.g. a typechecker would suffice to catch discrepancies at build time.
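For data you already trust, Pydantic v2 lets you skip that cost with model_construct, at your own risk (a sketch):

from pydantic import BaseModel

class Point(BaseModel):
    x: int
    y: int

p1 = Point(x=1, y=2)                  # validated on construction
p2 = Point.model_construct(x=1, y=2)  # no validation, noticeably cheaper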
Is it? I read the blog a couple of times and never was able to divine any kind of thesis beyond the title, but as you said, the content never actually explains why.
Perhaps there is a reason, but I didn’t walk away from the post with it.
Now I mainly do Python and I don't see that kind of boilerplate duplication anywhere near as much as I used to. Not going to say the same kind of thing never happens in Python, but the frequency of it sure seems to have declined a lot; often you get a smattering of it in a big Python project rather than it having been done absolutely everywhere.
The thesis is simple:
1) A DTO is a projection or a view of a given entity.
2) The "domain entity" itself is a projection of the actual storage in a database table.
3) At different layers (vertical separation), the representation of this conceptual entity changes
4) In different entry/exit points (horizontal separation), the projection of the entity may also change.
In some cases, the domain entity can be used in different modules/routes and is projected to the API with different shapes -- fewer properties, more properties, transformed properties, etc.

Typically, when code has a very well-defined domain layer and separation of the DTO and storage representation, the code has a very predictable quality: if you are working with a `User` domain entity, it behaves consistently across all of your code and in different modules. Sometimes a developer intermixes a database `User` or a DTO `User`, and all of a sudden the code behaves unpredictably; you suddenly have to be cognizant of whether the `user` instance you're handling is a `DBUser`, a `UserDTO`, or the domain entity. It has extra properties, missing properties, missing functions, can't be passed into some methods, etc.
Does this matter? I think it depends on 1) the size of the team, 2) how much re-use of the modules is needed, 3) the nature of the service. For a small team, it's overkill. For a module that will be reused by many teams, it has long term dividends. For a one-off, lightweight service, it probably doesn't matter. But for sure, for some core behaviors, having a delineated domain model really makes life easy when working with multiple teams reusing a module.
I find that the code I've worked with over the years that I like has this quality. So if I'm responsible for writing some very core service or shared module, I will take the extra effort to separate my models -- even if there's more duplication required on my behalf because it makes the code more predictable to use if everything inside of the service expects to have only one specific shape and set of behaviors and project shapes outwards as needed for the use case (DTO and storage).
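Roughly what that separation tends to look like in practice (a sketch; the class and field names are invented):

from dataclasses import dataclass
from pydantic import BaseModel

@dataclass
class DBUser:              # storage projection: mirrors the table
    id: int
    email: str
    password_hash: str

@dataclass
class User:                # domain entity: what the service logic works with
    id: int
    email: str

class UserDTO(BaseModel):  # API projection: what leaves the service
    id: int
    email: str

def to_domain(row: DBUser) -> User:
    return User(id=row.id, email=row.email)

def to_dto(user: User) -> UserDTO:
    return UserDTO(id=user.id, email=user.email)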
External representations have the goal of serving your consumers:
- make it easy to provide
- make it simple to understand
- make it familiar
- deal with security and authentication
- be easily serializable through your communication layer
On the other hand, internal representations have the goal of helping you with your private calculations:
- make it performant
- make it work with different subsystems such as persistence, caching, queuing
- provide convenience shortcuts or precalculations for your own benefits
Sometimes they overlap, or the system is not big enough that it matters.
But the bigger or older the system gets, the less likely they will.
However, I often pass around pydantic objects if I have them, and I do this until it becomes a problem. And I rarely reach that point.
It's like using Python until you have performance problems.
Practicality beats premature optimization.
You can translate many things into a Thing; model_validate will help you with that (with validation context, etc.).
You can translate your Thing into multiple output formats with model_dump / model_dump_json.
In your model, you shall put every check required to ensure that some input is, indeed, a Thing.
And from there, you can use this object everywhere, certain that this is, indeed, a Thing, and that it has all the properties that make a thing a Thing.
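A sketch of that flow with Pydantic v2 (the field names and the context key are invented):

from pydantic import BaseModel, ValidationInfo, field_validator

class Thing(BaseModel):
    name: str
    size: int

    @field_validator("size")
    @classmethod
    def check_size(cls, v: int, info: ValidationInfo) -> int:
        # The validation context lets the same model validate
        # differently depending on the caller.
        limit = (info.context or {}).get("max_size", 100)
        if v > limit:
            raise ValueError(f"size above {limit}")
        return v

thing = Thing.model_validate({"name": "box", "size": 3}, context={"max_size": 10})
as_dict = thing.model_dump()       # one output shape
as_json = thing.model_dump_json()  # another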
Outside of I/O, the whole machinery has little use. And since pydantic models are used by introspection to build APIs, automatic deserializer and arg parsing, making it fit the I/O is where the money is.
Also, remember that despite all the improved perf of pydantic recently, they are still more expensive than dataclasses, themselves more expensive than plain classes. They are 8 times more expensive to instantiate than regular classes, but above all, attribute access is 50% slower.
Now I get that in Python this is not a primary concern, but still, pydantic is not a free lunch.
I'd say it's also important to state what it conveys. When I see a Pydantic objects, I expect some I/O somewhere. Breaking this expectation would take me by surprise and lower my trust of the rest of the code. Unless you are deep in defensive programming, there is no reason to validate input far from the boundaries of the program.
Apart from what has been said, I find pydantic interesting even in the middle of my code: it can be seen as an overpowered assert
It helps make sure that the complex data structure returned by some method is valid (for instance).
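For instance (a sketch, with invented names), TypeAdapter makes a decent one-liner assert on an internal return value:

from pydantic import BaseModel, TypeAdapter

class Row(BaseModel):
    id: int
    score: float

rows_adapter = TypeAdapter(list[Row])

def compute_scores(raw: list[dict]) -> list[Row]:
    result = [{"id": r["id"], "score": r["points"] / 10} for r in raw]
    # Blows up here, close to the bug, instead of three layers later.
    return rows_adapter.validate_python(result)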
If you have two layers of types, then it becomes much easier to ensure that the interface is stable over time. But the downside is that it will take longer to write and maintain the code.
From the typing lens, it may be useful to consider it through Rice's theorem, and the oversimplification that typing converts a semantic property into a trivial property. (Damas-Hindley-Milner inference usually takes advantage of a pathological case; it is not formally trivial.)
There are no hard and fast rules IMHO, because the Rice, Rice-Shapiro, and Kreisel-Lacombe-Shoenfield-Tseitin theorems relate to generalized solutions, as do most undecidable problems.
But Kreisel-Lacombe-Shoenfield-Tseitin deals with programs that are expected to HALT, yet it is still undecidable if one fixed program is equivalent to a fixed other program that always terminates.
When you start stacking framework, domain, and language restrictions, the restrictions form a type of coupling, but as the decisions about integration vs disintegration are always tradeoffs it will always be context specific.
Combinators (maybe not the Y combinator) and finding normal forms is probably a better lens than my attempt at the flawed version above.
If you consider using plain objects as the adapter part of the hex pattern, and notice how a service mesh is less impressive but often clearer in hex form, it may help build intuition for where the appropriate application of the author's suggestions may fit.
But it really is primarily decoupling of restrictions IMHO. Sometimes the tradeoffs go the other way and often they change over time.
If pydantic packages valid input, use that for as long as you can.
Loading stuff from the db, you need validation again: either go from the binary response to one validated type with pydantic, or use an ORM object that already validates (sketched below).
Then stop having any extra data types.
Keeping pydantic only at the edge and then abandoning it by reshaping it into another data type is a weird exercise. It might make sense if you have N input types and 1 computation flow but I don’t see how in the world of duck typing you’d need an extra unified data type for that.
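If you do revalidate on the way out of the database, Pydantic v2 can read straight off ORM objects; a minimal sketch, assuming some ORM session and a hypothetical UserORM class:

from pydantic import BaseModel, ConfigDict

class User(BaseModel):
    model_config = ConfigDict(from_attributes=True)
    id: int
    email: str

# orm_user = session.get(UserORM, 42)   # whatever your ORM returns
# user = User.model_validate(orm_user)  # reads attributes, validates once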
You shouldn't need to validate data coming from the database. IMO, the perceived need to do so is a natural consequence of teams abandoning traditional RDBMS best practices like normalization and constraints in favor of heavy denormalization and strings for everything.
If you strictly follow 3NF (or higher, when necessary), it is literally impossible to have referential integrity violations. There may be some other edge cases that can be difficult to enforce, but a huge variety of data bugs simply don’t exist if you don’t treat the RDBMS as a dumb KV store.
If you have one person maintaining a CRUD app, splitting out DTOs and APIs and all of these abstractions is completely unnecessary. Usually, you don't even know yet what the right abstraction is, and making a premature wrong abstraction is WAY worse. Building stuff because you might need it later is a massive momentum killer.
But at some point when the project has grown (if it grows, which it won’t if you spend all your time making wrong abstractions early on), the API team doesn’t want their stuff broken because someone changed a pydantic model. So you start to need separation, not because it’s great or because it’s “the right way” but because it will collapse if you don’t. It’s the least bad option.
Where I'm with you, is that you should take care of your boundaries and muddling the line between your Pydantic domain models and your CRUD models will be painful at some point. If your domain model is changing fast compared to the API you're exposing, that could be an issue.
But that's not a "Pydantic in the domain layer" issue, that's a separation of concerns issue.
> That’s when concerns like loose coupling and separation of responsibilities start to matter more.
1. We were using .dict to introduce pydantic into the mix of other entity schemes, and handling this change later was a significant pain in the neck. Some Python introspection mechanism that could facilitate deep object recasting might've been nice, if possible.
from pydantic import BaseModel

class MyModel(BaseModel):
    name: str

    # Keep old Pydantic v1 .dict() call sites working by delegating
    # to v2's model_dump().
    def dict(self, *args, **kwargs):
        return self.model_dump(*args, **kwargs)
PS: thank you, I can think on my own, and even failing that, ChatGPT is not in closed beta any more.
Come on. We know you’ve seen JavaScript.
The argument against using API models internally is something I agree with but it’s a separate question.
I've authored tens of thousands of lines of Python code in that time - both for research tools and for "production".
I use type hints everywhere in the Python I write but it's simply not enough.
This issue is political and not so much technical, as TypeScript demonstrates how you can add a beautifully orthogonal and comprehensive type system to a dynamic language, thus improving the language's ergonomics and scalability.
The political aspect is the fact that early Python promoters decided that sanity checking arguments was not "pythonic", and this dogma/ideology has persisted to this day. The only philosophical basis for this position was that Python offered no support for simple type checking. And apparently if you didn't/don't "appreciate" this philosophy, it reflected poorly on your software engineering abilities or skill with Python.
To be fair, Python isn't the only language of that era, where promoters went to great lengths to invent alternate-reality bubbles to avoid facing the fact that their pet language had some deep flaws - and actually Perl and C++ circles were even worse and more inward facing.
So the "pythonic" approach suggests having functions just accept anything, whether it makes sense or not, and allowing your code to blow up somewhere deep in some library that you probably didn't even know you were using.
So instead of an error like "illegal create_user(name: str) call: name should be a str but was a float", it's apparently better (more "pythonic") to not provide such feedback to users of your functions and instead make them deal with an exception in a 40-line stack trace with something like "illegal indexing of float by dict object" in some library source file your users haven't even heard of.
And yes, I include TypeScript with Java there, because it has its own version of the Java class ecosystem hell; we just don't notice it yet. Look at any TypeScript library that's reasonably complicated and try to deduce what some of those input types actually do or mean - be honest. Heck, a few weeks back someone posted how they solved a complicated combinatorial problem using TypeScript's type system alone.
As to the first problem, I recommend the "Parse, don't validate" post [0]. The essential idea is to stop using god objects that do it all, and to use specific types to make contracts about what is known. Separate out concerns so there is an UnvalidatedUser (not serialized and lacking a primary key) and a ValidatedUser (committed to the database, has a unique username, etc.); a rough sketch follows the link below. Basic type hinting should get you the rest of the way to cleaning up code paths where you get some type certainty.
[0] https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...
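In Python terms, that split might look something like this (a sketch, not from the post; names invented):

from dataclasses import dataclass

@dataclass
class UnvalidatedUser:
    username: str      # not yet checked for uniqueness, no primary key

@dataclass
class ValidatedUser:
    id: int            # primary key: only exists once committed
    username: str

def register(raw: UnvalidatedUser) -> ValidatedUser:
    # ...check uniqueness, insert into the database...
    # Downstream code can require ValidatedUser and never re-check.
    return ValidatedUser(id=1, username=raw.username)  # hypothetical id from the insert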
BUT when doing heavy computation (C++, not Python!) don't forget to convert to plain vectors; protobufs are horribly inefficient.
Protobufs only really make sense if either A: you control both ends of the serialized line, or B: the other end of the line expects protobufs.
There are many [de]serialization scenarios where you are interfacing with a third party API. (HTTP/JSON web API, a given IC's comm protocol as defined in its datasheet etc)
It might still be challenging to convince proto to output exactly what you want.
JSON: UTF-8 text serialization format, with brackets, commas, fields represented as strings, etc.
Protobuf: binary serialization format that makes liberal use of varints, including for field numbers, lengths, etc. Kind of verbose, but not heinous.
So, you could start and end your journey with the same structs and serialize with either. If you try to send a protobuf to an HTTP API that expects JSON, it won't work! If you try to send JSON to an ESP32 running ESP-Hosted, likewise.
It seems like the author doesn't like depending on `pydantic`, simply because it's a third-party dependency. To solve this they introduce another, more obscure, third-party dependency called `dacite`, which converts `pydantic` models to `dataclasses`.
It's more likely that `dacite` is going to break your application than `pydantic`, a library used by millions of users in huge projects, ever will. Not to mention the complexity overhead introduced by this nonsense mapping.
Not simply. This is one of the most important reasons NOT to propagate something through your code. How many millions of codebases use it is irrelevant.
It is relevant, because it speaks to the reliability of the dependency. `pydantic` has 24.7k GitHub stars and was last updated 52 minutes ago.
Adding a random dependency like `dacite`, which has 1.9k GitHub stars, which no one has ever heard of, and which was last updated 4 months ago, introduces way more complexity and sources of instability than propagating `pydantic`.
While it uses pydantic, SQLModel was not written by those guys.
"Why are there no laws requiring device manufacturers to open source all software and hardware for consumer devices no longer sold?"
I think it's because people (us here included) love to yap and argue about problems instead of just implementing them and iterating on solutions in an organized manner. A good way these days to go about it would be to forgo the facade of civility and use your public name to publicly tell your politician to just fuck it, do it badly, and have a plan to UNfuck it after you fuck it up, until the fucking problem is fucking solved.
Same goes for UBI and other semi-infuriating issues that seem to (and probably do) have obvious solutions that we just don't try.
I can’t relate yet. Itch how? It doesn’t really go into what the problem is they’re solving.
Thank me later.
And don't forget, you get to duplicate this shit on the frontend too.
And what is a modern app if we aren't doing event-driven microservice architecture? That won't scale!!!! So now I also have to worry about my Avro schema/Protobufs/whateverthefuck. But how does everyone else know about the schema? Avro schema registry! Otherwise we won't know what data is on the wire!
And so on and so on into infinity until I have to tell a PM that adding a column will take me 5 pull requests and 8 deploys amounting to several days of work.
Congratulations on making your own small contribution to a fucking ridiculous clown fiesta.
IshKebab•9h ago
He doesn't even say why you should tediously duplicate everything instead of just using the Pydantic objects - just "You know you don’t want that"! No I don't.
The only reason I've heard is performance... but... you're using Python. You don't give a shit about performance.
photios•7h ago
You're going from a straightforward "Pydantic everywhere" solution to a weird concoction of:
1. Pydantic models
2. "Poor man's Pydantic models" (dataclasses)
3. Obscure third party dependencies (Dacite)
Thanks, I'll pass.
ensignavenger•5h ago
Pydantic docs do clearly state that multiple levels of nesting of Pydantic objects can make it much slower, so it isn't particularly surprising that such models were slow.
franktankbank•5h ago
That's dumb. You may not care about max performance, but you've got some threshold where shit gets obviously way too slow to be workable. I've worked with a library heavy on pydantic where it was the bottleneck.
hxtk•5h ago
I worked on a project with a codebase on the order of millions of lines, and many times a response was made by taking an ORM object or an app internal data structure and JSON serializing it. We had a frequent problem where we’d make some change to how we process a data structure internally and oops, breaking API change. Or worse yet, sensitive data gets added to a structure typically processed with that data, not realizing it gets serialized by a response handler.
It was hard to catch this in code review because it was hard to even know when a type might be involved in generating a response elsewhere in the code base.
Switching to a schema-first API design meant that if you were making a change to a response data type, you knew it. And the CODEOWNERS file also knew it, and would bring the relevant parties into the code review. Suddenly those classes of problems went away.
slt2021•3h ago
Maybe it is true if you artificially limit yourself to a single-instance, single-thread model, due to the GIL.
But because nowadays apps can easily be scaled out across many instances, this argument is irrelevant.
One may say that Python has a large overhead when using a lot of objects, or that it has the GIL, but people have learned how to serve millions of users with Python easily.
dontlaugh•2h ago
And you may be able to scale to many users, worst case with more machines. But it'll still cost you a lot more than a faster language would. That is extremely relevant, even today.