The issue I see with this approach is when developers stop at this first level of type implementation. Everything is a type and nothing works well together, tons of types seem to be subtle permutations of each other, things get hard to reason about, etc.
In systems like that I would actually rather be writing a weakly typed dynamic language like JS or a strongly typed dynamic language like Elixir. However, if the developers continue pushing logic into type-controlled flows, e.g. move conditional logic into union types with pattern matching, leverage delegation, etc., the experience becomes pleasant again. Just as an example (probably not the actual best solution), the "DewPoint" function could just take either type and just work.
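To make that concrete, here is a minimal sketch in TypeScript (the names and the simplified dew-point formula are illustrative, not a real implementation): a discriminated union lets one function accept either temperature type and "just work".

```typescript
// Hypothetical temperature types joined in a union.
type Celsius = { unit: "C"; value: number };
type Fahrenheit = { unit: "F"; value: number };
type Temperature = Celsius | Fahrenheit;

// Pattern matching (here: an exhaustive switch) handles both cases.
function toCelsius(t: Temperature): number {
  switch (t.unit) {
    case "C": return t.value;
    case "F": return (t.value - 32) * 5 / 9;
  }
}

// dewPoint accepts either representation; a rough rule-of-thumb formula.
function dewPoint(t: Temperature, relativeHumidity: number): number {
  return toCelsius(t) - (100 - relativeHumidity) / 5;
}

console.log(dewPoint({ unit: "F", value: 68 }, 50)); // 10
console.log(dewPoint({ unit: "C", value: 20 }, 50)); // 10
```

The conditional logic lives in one place (`toCelsius`), and the compiler checks that every variant of the union is handled.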
> 1 + "1"
(irb):1:in 'Integer#+': String can't be coerced into Integer (TypeError)
from (irb):1:in '<main>'
from <internal:kernel>:168:in 'Kernel#loop'
from /Users/george/.rvm/rubies/ruby-3.4.2/lib/ruby/gems/3.4.0/gems/irb-1.14.3/exe/irb:9:in '<top (required)>'
from /Users/george/.rvm/rubies/ruby-3.4.2/bin/irb:25:in 'Kernel#load'
from /Users/george/.rvm/rubies/ruby-3.4.2/bin/irb:25:in '<main>'
Static typing / dynamic typing refers to whether types are checked at compile time or runtime. "Static" = compile time (e.g. C, C++, Rust). "Dynamic" = runtime (e.g. JavaScript, Ruby, Excel).
Strong / weak typing refers to how "wibbly wobbly" the type system is. x86 assembly language is "weakly typed" because registers don't have types. You can do (more or less) any operation with the value in any register. Like, you can treat a register value as a float in one instruction and then as a pointer during the next instruction.
Ruby is strongly typed because all values in the system have types, and types affect what you can do. If you treat a number like it's an array in Ruby, you get an error. (But the error happens at runtime because Ruby is dynamically typed - type checking only happens at runtime!)
Sure it stops you from running into "'1' + 2" issues, but won't stop you from yeeting VeryRawUnvalidatedResponseThatMightNotBeAuthorized to a function that takes TotalValidatedRequestCanUseDownstream. You won't even notice an issue until:
- you manually validate
- you call a method that is unavailable on the wrong object.
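For contrast, a sketch of how a statically checked language can catch this class of mistake, using TypeScript "branded" types (all names here are hypothetical, echoing the comment above):

```typescript
// An unvalidated response and a validated request, distinguished by a brand.
type RawResponse = { body: string };
type ValidatedRequest = { body: string; __validated: true };

// The only way to obtain a ValidatedRequest is to go through validation.
function validate(r: RawResponse): ValidatedRequest | null {
  return r.body.length > 0 ? { ...r, __validated: true } : null;
}

function useDownstream(req: ValidatedRequest): string {
  return `processing ${req.body}`;
}

const raw: RawResponse = { body: "hello" };
// useDownstream(raw);  // compile-time error: raw is missing the brand
const checked = validate(raw);
if (checked !== null) console.log(useDownstream(checked)); // "processing hello"
```

The point is not the brand trick itself but that "yeeting" the raw value downstream becomes a compile error instead of a latent runtime surprise.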
Related Stack Overflow post: https://stackoverflow.com/questions/2690544/what-is-the-diff...
So yeah I think we should just give up these terms as a bad job. If people mean "static" or "dynamic" then they can say that, those terms have basically agreed-upon meanings, and if they mean things like "the type system prohibits [specific runtime behavior]" or "the type system allows [specific kind of coercion]" then it's best to say those things explicitly with the details filled in.
It says:
> I give the following general definitions for strong and weak typing, at least when used as absolutes:
> Strong typing: A type system that I like and feel comfortable with
> Weak typing: A type system that worries me, or makes me feel uncomfortable
irb(main):001:0> a = 1
=> 1
irb(main):002:0> a = '1'
=> "1"
It doesn't seem that strong to me. [1]

[1] https://doc.rust-lang.org/book/ch03-01-variables-and-mutabil...
let a = 1;
let a = '1';
Strong typing is about whether I can do 1 + '1'; rebinding a variable name to a value of a different type has nothing to do with it being strongly typed.
https://news.ycombinator.com/item?id=42367644
A month before that:
https://news.ycombinator.com/item?id=41630705
I've given up since then.
This would allow for some nice properties. It would also enable a bunch of small optimisations in our languages that we can't have today. Eg, I could make an integer that must fall within my array bounds. Then I don't need to do bounds checking when I index into my array. It would also allow a lot more peephole optimisations to be made with Option.
Weirdly, rust already kinda supports this within a function thanks to LLVM magic. But it doesn't support it for variables passed between functions.
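TypeScript can approximate the "integer that must fall within my array bounds" idea today for fixed-size arrays, using a literal-union index type (a sketch with hypothetical names, not a general solution for dynamic lengths):

```typescript
// A fixed-length tuple and the only indices that are valid for it.
type Vec3 = [number, number, number];
type Vec3Index = 0 | 1 | 2;

// No runtime bounds check needed: i is provably in range at compile time.
function at(v: Vec3, i: Vec3Index): number {
  return v[i];
}

const v: Vec3 = [10, 20, 30];
console.log(at(v, 2)); // 30
// at(v, 3); // compile-time error: 3 is not assignable to Vec3Index
```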
*Checks watch*
We're going on 45 years now.
But yeah maybe expressive enough refinement typing leads to hard to write and slow type inference engines
I think the reasons are predominantly social, not theoretical.
For every engineer out there that gets excited when I say the words "refinement types" there are twenty that either give me a blank stare or scoff at the thought, since they a priori consider any idea that isn't already in their favorite (primitivistic) language either too complicated or too useless.
Then they go and reinvent it as a static analysis layer on top of the language and give it their own name and pat themselves on the back for "inventing" such a great check. They don't read computer science papers.
procedure Sum_Demo is
   subtype Index is Integer range 0 .. 10;
   subtype Small is Integer range 0 .. 10;
   Arr : array(Index) of Integer := (others => 0);
   X : Small := 0;
   I : Integer := Integer'Value(Integer'Image(X)); -- runtime evaluation
begin
   for J in 1 .. 11 loop
      I := I + 1;
   end loop;
   Arr(I) := 42; -- possible out-of-bounds access if I = 11
end Sum_Demo;

This compiles, and the compiler will tell you: "warning: Constraint_Error will be raised at run time".
It's a stupid example for sure. Here's a more complex one:
procedure Sum_Demo is
subtype Index is Integer range 0 .. 10;
subtype Small is Integer range 0 .. 10;
Arr : array(Index) of Integer := (others => 0);
X : Small := 0;
I : Integer := Integer'Value(Integer'Image(X)); -- runtime evaluation
begin
for J in 1 .. 11 loop
I := I + 1;
end loop;
Arr(I) := 42; -- Let's crash it
end Sum_Demo;
This again compiles, but if you run it:

raised CONSTRAINT_ERROR : sum_demo.adb:13 index check failed

It's a cute feature, but it's useless for anything complex.
Among the popular languages like Go, Rust, or Python, TypeScript has the most powerful type system.
How about a type with a number constrained between 0 and 10? You can already do this in typescript.
type onetonine = 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
You can even programmatically define functions at the type level. So you can create a function that outputs a union of the numbers from 0 to N-1:

type Range<N extends number, A extends number[] = []> =
  A['length'] extends N ? A[number] : Range<N, [...A, A['length']]>;
The issue here is that it's a bit awkward: you want these types to compose, right? If I add two constrained numbers, say one with a max value of 3 and another with a max value of 2, the result should have a max value of 5. TypeScript doesn't support this by default with ordinary addition. But you can create a function that does this:

// Build a tuple of length L
type BuildTuple<L extends number, T extends unknown[] = []> =
T['length'] extends L ? T : BuildTuple<L, [...T, unknown]>;
// Add two numbers by concatenating their tuples
type Add<A extends number, B extends number> =
[...BuildTuple<A>, ...BuildTuple<B>]['length'];
// Create a union: 0 | 1 | 2 | ... | N-1
type Range<N extends number, A extends number[] = []> =
A['length'] extends N ? A[number] : Range<N, [...A, A['length']]>;
function addRanges<
A extends number,
B extends number
>(
a: Range<A>,
b: Range<B>
): Range<Add<A, B>> {
return (a + b) as Range<Add<A, B>>;
}
The issue is that to create these functions you have to use tuples to do addition at the type level, and you need to use recursion as well. TypeScript recursion stops at 100, so there are limits.

Additionally, it's not intrinsic to the type system. You'd need something like Peano numbers built into the number system, and built in by default into the entire language, for this to work perfectly. That means the code inside the function is not type checked, but if you assume that code is correct, then this function type checks when composed with the other primitives of your program.
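A self-contained, runnable sketch of the machinery above (the types are erased at runtime, so the runtime behaviour is just ordinary addition; only the static checks are new):

```typescript
// Build a tuple of length L.
type BuildTuple<L extends number, T extends unknown[] = []> =
  T['length'] extends L ? T : BuildTuple<L, [...T, unknown]>;

// Add two number literals by concatenating their tuples.
type Add<A extends number, B extends number> =
  [...BuildTuple<A>, ...BuildTuple<B>]['length'];

// Union 0 | 1 | ... | N-1.
type Range<N extends number, A extends number[] = []> =
  A['length'] extends N ? A[number] : Range<N, [...A, A['length']]>;

const a: Range<4> = 3;  // ok: 0 | 1 | 2 | 3
const b: Range<3> = 2;  // ok: 0 | 1 | 2
// Add<4, 3> is the literal type 7, so the sum has type Range<7> = 0..6.
const sum = (a + b) as Range<Add<4, 3>>;
console.log(sum); // 5
```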
I get an error that I can't assign something that seems to me assignable, and to figure out why I need to study functions at type level using tuples and recursion. The cure is worse than the disease.
If you trust the type, then it's fine. The code is safer. In the world of the code itself, things are easier.
Of course like what you're complaining about, this opens up the possibility of more bugs in the world of types, and debugging that can be a pain. Trade offs.
In practice people usually don't go crazy with type-level functions. They can do small stuff, but usually nothing super crazy. So TypeScript by design sort of fits the complexity dynamic you're looking for. Yes, you can write type-level functions that are super complex, but the language is not designed around it and it doesn't promote that style either. But you CAN go a little deeper with types than, say, a language with less power in the type system, like Rust.
I'll take a modern Hindley-Milner variant any day. Sophisticated enough to model nearly any type information you'll have need of, without blurring the lines or admitting the temptation of encoding complex logic in it.
In practice nobody goes too crazy with it. You have a problem with a feature almost nobody uses. It's there and Range<N> is like the upper bound of complexity I've seen in production but that is literally extremely rare as well.
There is no "temptation" to encode complex logic in it at all, as the language doesn't promote these features. They're just available if needed. It's not well known, but TypeScript types can easily be used 1-to-1 with any Hindley-Milner variant. It's the reputational baggage of JS and frontend that keeps this fact from being well known.
In short: TypeScript is more powerful than Hindley-Milner, a subset of it has one-to-one parity with it, and the parts that are more powerful than Hindley-Milner aren't popular or widely used, nor does the flow of the language itself promote their usage. The feature is just there if you need it.
If you want a language where you do this stuff in practice take a look at Idris. That language has these features built into the language AND it's an ML style language like haskell.
While I'm not entirely convinced myself whether it is worth the effort, it offers the ability to express "a number greater than 0". Using type narrowing and intersection types, open/closed intervals emerge naturally from that. Just check `if (a > 0 && a < 1)` and its type becomes `(>0)&(<1)`, so the interval (0, 1).
I also built a simple playground that has a PoC implementation: https://nikeee.github.io/typescript-intervals/
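A minimal sketch of the branding idea behind this (not the linked playground's actual implementation): a runtime check narrows a plain number into a branded interval type.

```typescript
// Hypothetical brand for the open interval (0, 1).
type OpenUnitInterval = number & { __brand: "gt0_lt1" };

// The runtime check is the only way to produce the branded type.
function asOpenUnitInterval(a: number): OpenUnitInterval | null {
  return a > 0 && a < 1 ? (a as OpenUnitInterval) : null;
}

const p = asOpenUnitInterval(0.25);
console.log(p); // 0.25
console.log(asOpenUnitInterval(1.5)); // null
```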
My specific use case is pattern matching http status codes to an expected response type, and today I'm able to work around it with this kind of construct https://github.com/mnahkies/openapi-code-generator/blob/main... - but it's esoteric, and feels likely to be less efficient to check than what you propose / a range type.
There's runtime checking as well in my implementation, but it's a priority for me to provide good errors at build time
subset OneToTen of Int where 1..10;
raws-ec2 --eip --nsu launch
raws-ec2 --q connect | pbcopy

https://raku.land/zef:librasteve/CLI::AWS::EC2-Simple

mainly I find Raku (and the community) much -Ofun
(deftype One-To-Ten ()
'(Integer 1 10))
https://play.rust-lang.org/?version=stable&mode=debug&editio...
rust-analyzer gives an error directly in IDE.
type
Foo = range[1 .. 10]
Bar = range[0.0 .. 1.0] # float works too
var f:Foo = 42 # Error: cannot convert 42 to Foo = range 1..10(int)
var p = Positive 22 # Positive and Natural types are pre-defined
[0] - https://nim-lang.org/docs/manual.html#types-subrange-types

Otherwise you could have type-level asserts more generally. Why stop at a range check when you could check a regex too? This makes the difficulty more clear.
For the simplest range case (pure assignment) you could just use an enum?
That power of course does come with a price: there does not exist a static analyzer that automatically checks things, even though you can pretty much generate beautiful tests based on specs. I think e.g. Rust teams can have more junior devs safely contribute because the compiler enforces discipline, which means less variability in code quality. Clojure teams need higher baseline discipline but can move incredibly fast when everyone's aligned.
It's saddening to see when Clojure gets outright dismissed for being "untyped", even though it absolutely can change one's perspective about type systems.
I still follow TDD-with-a-test for all new features, all edge cases and all bugs that I can't trigger failure by changing the type system for.
However, red-green-refactor-with-the-type-system is usually quick and can be used to provide hard guarantees against entire classes of bug.
It is always great when something is so elegantly typed that I struggle to think of how to write a failing test.
What drives me nuts is when there are tests left around that are basically testing the compiler and never were "red" then "greened". It makes me wonder if there is some subtle edge case I am missing.
Now I just think of types as the test suite’s first line of defense. Other commenters who mention the power of types for documentation and refactoring aren’t wrong, but I think that’s because types are tests… and good tests, at almost any level, enable those same powers.
However, I'm convinced that they're both part of the same class of thing, and that "TDD" or red/green/refactor or whatever you call it works on that class, not specifically just on tests.
Documentation is a funny one too - I use my types to generate API and other sorts of reference docs and tests to generate how-to docs. There is a seemingly inextricable connection between types and reference docs, tests and how-to docs.
This is where the concept of “Correct by construction” comes in. If any of your code has a precondition that a UUID is actually unique then it should be as hard as possible to make one that isn’t. Be it by constructors throwing exceptions, inits returning Err or whatever the idiom is in your language of choice, the only way someone should be able to get a UUID without that invariant being proven is if they really *really* know what they’re doing.
(Sub UUID and the uniqueness invariant for whatever type/invariants you want, it still holds)
This is one of the basic features of object-oriented programming that a lot of people tend to overlook these days in their repetitive rants about how horrible OOP is.
One of the key things OO gives you is constructors. You can't get an instance of a class without having gone through a constructor that the class itself defines. That gives you a way to bundle up some data and wrap it in a layer of validation that can't be circumvented. If you have an instance of Foo, you have a firm guarantee that the author of Foo was able to ensure the Foo you have is a meaningful one.
Of course, writing good constructors is hard because data validation is hard. And there are plenty of classes out there with shitty constructors that let you get your hands on broken objects.
But the language itself gives you direct mechanism to do a good job here if you care to take advantage of it.
Functional languages can do this too, of course, using some combination of abstract types, the module system, and factory functions as convention. But it's a pattern in those languages where it's a language feature in OO languages. (And as any functional programmer will happily tell you, a design pattern is just a sign of a missing language feature.)
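The constructor-as-gatekeeper idea can be sketched in a few lines of TypeScript (the Email type and its validation rule are hypothetical): a private constructor plus a static factory means every instance you can possibly hold has passed validation.

```typescript
class Email {
  // Private constructor: outside code cannot bypass validation.
  private constructor(readonly value: string) {}

  // The only way to obtain an Email; returns null on invalid input.
  static parse(s: string): Email | null {
    return s.includes("@") ? new Email(s) : null;
  }
}

const ok = Email.parse("user@example.com");
const bad = Email.parse("not-an-email");
console.log(ok?.value); // "user@example.com"
console.log(bad);       // null
```

Whether the factory returns null, a Result, or throws is an idiom choice; the guarantee comes from the constructor being unreachable.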
Does this count as a missing language feature by requiring a "factory pattern" to achieve that?
Convention in OOP languages is (un?)fortunately to just throw an exception though.
Nothing stops you from returning Result<CorrectObject, ConstructorError> from a CorrectObject::new(..) function, because it's just a regular function. Struct field visibility takes care of you not being able to construct an incorrect CorrectObject.
Throwing an error is doing exactly that though; it's exactly the same thing in theory.
What you are asking for is just more syntactic sugar around error handling; otherwise all of that already exists in most languages. If you are talking about performance, that can easily be optimized at compile time for those short throw/catch syntactic sugar blocks.
Java even forces you to handle those errors in code, so don't say that these are silent there is no reason they need to be.
What sucks about OOP is that it also holds your hand into antipatterns you don't necessarily want, like adding behavior to what you really just wanted to be a simple data type because a class is an obvious junk drawer to put things.
And, like your example of a problem in FP, you have to be eternally vigilant with your own patterns to avoid antipatterns like when you accidentally create a system where you have to instantiate and collaborate multiple classes to do what would otherwise be a simple `transform(a: ThingA, b: ThingB, c: ThingC): ThingZ`.
Finally, as "correct by construction" goes, doesn't it all boil down to `createUUID(string): Maybe<UUID>`? Even in an OOP language you probably want `UUID.from(string): Maybe<UUID>`, not `new UUID(string)` that throws.
One way to think about exceptions is that they are a pattern matching feature that privileges one arm of the sum type with regards to control flow and the type system (with both pros and cons to that choice). In that sense, every constructor is `UUID.from(string): MaybeWithThrownNone<UUID>`.
In other words, exceptions are for cases where the programmer screwed up. While programmers screwing up isn't unusual at all, programmers like to think that they don't make mistakes, and thus in their eye it is unusual. That is what sets it apart from environmental failures, which are par for the course.
To put it another way, it is for signalling at runtime what would have been a compiler error if you had a more advanced compiler.
Just Java (and Javascript by extension, as it was trying to copy Java at the time), really. You do have a point that Java programmers have infected other languages with their bad habits. For example, Ruby was staunchly in the "return errors as values and leave exception handling for exceptions" before Rails started attracting Java developers, but these days all bets are off. But the "purists" don't advocate for it.
In my book, that's the most important difference with C, Zig or Go-style languages, that consider that data structures are mostly descriptions of memory layout.
In Haskell:
1. Create a module with some datatype
2. Don't export the datatype's constructors
3. Export factory functions that guarantee invariants
How is that more complicated than creating a class and adding a custom constructor? Especially if you have multiple datatypes in the same module (which in e.g. Java would force you to add multiple files, and if there's any shared logic, well, that will have to go into another extra file - thankfully some more modern OOP languages are more pragmatic here).
(Most) OOP languages treat a module (an importable, namespaced subunit of a program) and a type as the same thing, but why is this necessary? Languages like Haskell break this correspondence.
Now, what I'm missing from Haskell-type languages is parameterised modules. In OOP, we can instantiate classes with dependencies (via dependency injection) and then call methods on that instance without passing all the dependencies around, which is very practical. In Haskell, you can simulate that with currying, I guess, but it's just not as nice.
'null' (and to a large extent mutability) drives a gigantic hole through whatever you're trying to prove with correct-by-construction.
You can sometimes annotate against mutability in OO, but even then you're probably not going to get given any persistent collections to work with.
The OO literature itself recommends against using constructors like that, opting for static factory pattern instead.
Welcome to typescript. Where generics are at the heart of our generic generics that throw generics of some generic generic geriatric generic that Bob wrote 8 years ago.
Because they can’t reason with the architecture they built, they throw it at the type system to keep them in line. It works most of the time. Rust’s is beautiful at barking at you that you’re wrong. Ultimately it’s us failing to design flexibility amongst ever increasing complexity.
Remember when “Components” where “Controls” and you only had like a dozen of them?
Remember when a NN was only a few hundred thousand parameters?
As complexity increases with computing power, so must our understanding of it in our mental model.
However you need to keep that mental model in check, use it. If it’s typing, do it. If it’s rigorous testing, write your tests. If it’s simulation, run it my friend. Ultimately, we all want better quality software that doesn’t break in unexpected ways.
You might go with:
type Expression = Value | Plus | Minus | Multiply | Divide;
interface Value { type: "value"; value: number; }
interface Plus { type: "plus"; left: Expression; right: Expression; }
interface Minus { type: "minus"; left: Expression; right: Expression; }
interface Multiply { type: "multiply"; left: Expression; right: Expression; }
interface Divide { type: "divide"; left: Expression; right: Expression; }
And so on.

That looks nice, but when you try to pattern match on it and have your pattern matching return the types that are associated with the specific operation, it won't work. The reason is that TypeScript does not natively support GADTs. Libs like ts-pattern use some tricks to get close-ish at least.
And while this might not be very important for most application developers, it is very important for library authors, especially to make libraries interoperable with each other and extend them safely and typesafe.
You can always enforce nominal types if you really need it.
The danger of that is of course that you provide a ladder over the wall you just built and instead of
temperature_in_f = temperature_in_c.to_fahrenheit()
They now go the shortcut route via the numeric representation and may forget the conversion factor. In that case I'd argue it is best to always represent temperature as one unit (Kelvin or Celsius, depending on the math you need to do with it) and then just add a .display(Unit::Fahrenheit) method that returns a string. If you really want to convert to TemperatureF for a calculation, you would have to use a dedicated method that converts from one type to the other.

The unit thing is of course just an example; for this, finished libraries like Python's pint (https://pint.readthedocs.io/en/stable/) exist.
One thing to consider as well is that you can mix up absolute values ("it is 28°C outside") and temperature deltas ("this is 2°C warmer than the last measurement"). If you're controlling high energy heaters mixing those up can ruin your day, which is why you could use different types for absolutes and deltas (or a flag within one type). Datetime libraries often do that as well (in python for example you have datetime for absolute and timedelta for relative time)
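The absolute-vs-delta split can be sketched with two small types (hypothetical TypeScript, not a real units library): an absolute temperature can only be shifted by a delta, and display handles unit conversion.

```typescript
// A relative temperature difference in Celsius degrees.
class TempDeltaC {
  constructor(readonly degrees: number) {}
}

// An absolute temperature; note there is deliberately no plus(TemperatureC).
class TemperatureC {
  constructor(readonly degrees: number) {}
  plus(d: TempDeltaC): TemperatureC {
    return new TemperatureC(this.degrees + d.degrees);
  }
  display(unit: "C" | "F"): string {
    return unit === "C"
      ? `${this.degrees}°C`
      : `${this.degrees * 9 / 5 + 32}°F`;
  }
}

// "2°C warmer than the last measurement of 28°C":
const outside = new TemperatureC(28).plus(new TempDeltaC(2));
console.log(outside.display("C")); // "30°C"
console.log(outside.display("F")); // "86°F"
```

Adding two absolutes simply has no method, so the heater-controlling mix-up from the comment above can't type-check.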
It's a step past normal "strong typing", but I've loved this concept for a while and I'd love to have a name to refer to it by so I can help refer others to it.
[1] https://doc.rust-lang.org/rust-by-example/generics/new_types...
https://en.wikipedia.org/wiki/Strongly_typed_identifier
> The strongly typed identifier commonly wraps the data type used as the primary key in the database, such as a string, an integer or universally unique identifier (UUID).
Different people draw the line in different places for this. I've never tried writing code that takes every domain concept, no matter how small, and made a type out of it. It's always been on my bucket list though to see how it works out. I just never had the time or in-the-moment inclination to go that far.
Some languages like C++ made a contracts concept where you could make these checks more formal.
As some people indicated, the auto-casting in many languages can make the implementation of these primitive-based types complicated and fragile, and provide more nuisance than value.
Relevant terms are "Value object" (1) and avoiding "Primitive obsession" where everything is "stringly typed".
Strongly typed ids should be Value Objects, but not all value objects are ids. e.g. I might have a value object that represents an x-y co-ordinate, as I would expect an object with value (2,3) to be equal to a different object with the same value.
To keep building on history, I'd suggest Hungarian types.
Meta-programming also introduced a notation which was the precursor to Hungarian Notation (page 44,45), so painted types technically pre-date Hungarian Notation.
https://web.archive.org/web/20170313211616/http://www.parc.c...
Relevant quote:
> These examples show that the idea of types is independent of how the objects belonging to the type are represented. All scalar quantities appearing above - column numbers, indices and so forth, could be represented as integers, yet the set of operations defined for them, and therefore their types, are different. We shall denote the assignment of objects to types, independent of their representations, by the term painting. When an object is painted, it acquires a distinguishing mark (or color) without changing its underlying representation. A painted type is a class of values from an underlying type, collectively painted a unique color. Operations on the underlying type are available for use on painted types as the operations are actually performed on the underlying representation; however, some operations may not make sense within the semantics of the painted type or may not be needed. The purpose of painting a type is to symbolize the association of the values belonging to the type with a certain set of operations and the abstract objects represented by them.
"There exists an identifiable programming style based on the widespread use of type information handled through mechanical typechecking techniques. This typeful programming style is in a sense independent of the language it is embedded in; it adapts equally well to functional, imperative, object-oriented, and algebraic programming, and it is not incompatible with relational and concurrent programming."
[1] Luca Cardelli, Typeful Programming, 1991. http://www.lucacardelli.name/Papers/TypefulProg.pdf
Moreover: you can separate types based on admitted values and perform runtime checks. Percentage, Money, etc.
https://lukasschwab.me/blog/gen/deriving-safe-id-types-in-go...
In general, I think this largely falls when you have code that wants to just move bytes around intermixed with code that wants to do some fairly domain specific calculations. I don't have a better way of phrasing that, at the moment. :(
There are cases where you have the data in hand but now you have to look for how to create or instantiate the types before you can do anything with it, and it can feel like a scavenger hunt in the docs unless there's a cookbook/cheatsheet section.
One example is where you might have to use createVector(x, y, z): Vector when you already have { x, y, z }. And only then can you createFace(vertices: Vector[]): Face even though Face is just { vertices }. And all that because Face has a method to flip the normal or something.
Another example is a library like Java's BouncyCastle where you have the byte arrays you need, but you have to instantiate like 8 different types and use their methods on each other just to create the type that lets you do what you wish was just `hash(data, "sha256")`.
Using the right architecture, you could make it so your core domain type and logic uses the strictly typed aliases, and so that a library that doesn't care about domain specific stuff converts them to their higher (lower?) type and works with that. Clean architecture style.
Unfortunately, that involves a lot of conversion code.
I know what a UUID (or a String) is. I don't know what an AccountID, UserID, etc. is. Now I need to know what those are (and how to make them, etc. as well) to use your software.
Maybe an elaborate type system worth it, but maybe not (especially if there are good tests.)
I generally agree that it's easy to over-do, but can be great if you have a terse, dense, clear language/framework/docs, so you can instantly learn about UserID.
Yes, that’s exactly the point. If you don’t know how to acquire an AccountID you shouldn’t just be passing a random string or UUID into a function that accepts an AccountID hoping it’ll work, you should have acquired it from a source that gives out AccountIDs!
never escape anything, either
just hand my users a raw SQL connection
It is however useful to return a UUID type, instead of a [16]byte, or a HTMLNode instead of a string etc. These discriminate real, computational differences. For example the method that gives you a string representation of an UUID doesn't care about the surrounding domain it is used in.
Distinguishing a UUID from an AccountID, or UserID is contextual, so I rather communicate that in the aggregate. Same for Celsius and Fahrenheit. We also wouldn't use a specialized type for date times in every time zone.
I now know I never know whether "a UUID" is stored or represented as a GUIDv1 or a UUIDv4/UUIDv7.
I know it's supposed to be "just 128 bits", but somehow, I had a bunch of issues running old Java servlets+old Java persistence+old MS SQL stack that insisted, when "converting" between java.util.UUID to MS SQL Transact-SQL uniqueidentifier, every now and then, that it would be "smart" if it flipped the endianess of said UUID/GUID to "help me". It got to a point where the endpoints had to manually "fix" the endianess and insert/select/update/delete for both the "original" and the "fixed" versions of the identifiers to get the expected results back.
(My educated guess it's somewhat similar to those problems that happens when your persistence stack is "too smart" and tries to "fix timezones" of timestamps you're storing in a database for you, but does that wrong, some of the time.)
They are generated with different algorithms, if you find these distinctions to be semantically useful to operations, carry that distinction into the type.
Seems like 98% of the time it wouldn’t matter.
Presumably you need to know what an Account and a User are to use that software in the first place. I can't imagine a reasonable person easily understanding a getAccountById function which takes one argument of type UUID, but having trouble understanding a getAccountById function which takes one argument of type AccountId.
What he means is that by introducing a layer of indirection via a new type you hide the physical reality of the implementation (int vs. string).
The physical type matters if you want to log it, save to a file etc.
So now for every such type you add a burden of having to undo that indirection.
At which point "is it worth it?" is a valid question.
You made some (but not all) mistakes impossible but you've also introduced that indirection that hides things and needs to be undone by the programmer.
> There is a UI for memorialising users, but I assured her that the pros simply ran a bit of code in the PHP debugger. There’s a function that takes two parameters: one the ID of the person being memorialised, the other the ID of the person doing the memorialising. I gave her a demo to show her how easy it was....And that’s when I entered Clowntown....I first realised something was wrong when I went back to farting around on Facebook and got prompted to login....So in case you haven’t guessed what I got wrong yet, I managed to get the arguments the wrong way round. Instead of me memorialising my test user, my test user memorialised me.
I'd much rather deal with the 2nd version than the first. It's self-documenting and prevents errors like calling "foo(userId, accountId)" letting the compiler test for those cases. It also helps with more complex data structures without needing to create another type.
Map<UUID, List<UUID>>
Map<AccountId, List<UserId>>
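The same argument-swap protection is easy to sketch with branded string types in TypeScript (names are hypothetical):

```typescript
// Brands make structurally identical strings nominally distinct.
type AccountId = string & { __brand: "AccountId" };
type UserId = string & { __brand: "UserId" };

function foo(accountId: AccountId, userId: UserId): string {
  return `account=${accountId} user=${userId}`;
}

const acc = "acc-1" as AccountId;
const usr = "usr-9" as UserId;
console.log(foo(acc, usr)); // ok
// foo(usr, acc);           // compile-time error: brands don't match

// The richer data structure from the comment above, made self-documenting:
const usersByAccount = new Map<AccountId, UserId[]>();
usersByAccount.set(acc, [usr]);
```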
It's literally the opposite. A string is just a bag of bytes you know nothing about. An AccountID is probably... wait for it... an ID of an Account. If you have the need to actually know the underlying representation you are free to check the definition of the type, but you shouldn't need to know that in 99% of contexts you'll want to use an AccountID in.
> Now I need to know what those are (and how to make them, etc. as well) to use your software.
You need to know what all the types are no matter what. It's just easier when they're named something specific instead of "a bag of bytes".
> https://grugbrain.dev/#grug-on-type-systems
Linking to that masterpiece is borderline insulting. Such a basic and easy to understand usage of the type system is precisely what the grug brain would advocate for.
The main problem with these is how you actually get the verification needed when data comes in from outside the system. Check with the database every time you want to turn a string/uuid into an ID type? That can get prohibitively expensive.
Coming from C++, this kind of type wrapped in a class makes sense. But they are also a maintenance burden with further issues, where often proper variable naming matters. Likely a good balance is the key.
That is, this is perfectly acceptable C:
int x = 10;
myId id = x; // no problems
In Go the equivalent would be an error because it will not, automatically, convert from one type to another just because it happens to be structurally identical. This forces you to be explicit in your conversion. So even though the type happens to be an int, an arbitrary int or other types which are structurally ints cannot be accidentally converted to a myId unless you somehow include an explicit but unintended conversion.

This helped me! Especially because you started with typedef from C, therefore I could relate. Others just downvote and don't explain.
type userID int64
func Work(u userID) {...}
Work(1) // Go accepts this
I think I recalled that correctly. Since things like that were most of what I was doing I didn't feel the safety benefit in many places, but had to remember to cast the type in others (iirc, saving to a struct field manually).

I teach Go a few times a year, and this comes up a few times a year. I've not got a good answer why this is consistent with such an otherwise-explicit language.
Go will not automatically cast a variable of one type to another. That still has to be done explicitly.
func main() {
var x int64 = 1
Func(SpecialInt64(x)) // this will work
Func(x) // this will not work
}
type SpecialInt64 int64
func Func(x SpecialInt64) {
}
https://go.dev/play/p/4eNQOJSmGqD

When you write 42 in Go, it’s not an int32 or int64 or some more specific type. It’s an untyped constant that is automatically inferred to have the correct type. This applies even for user-defined numeric types.
Almost nothing is a number. A length is not a number, an age is not a number, a phone number is not a number - sin(2inches) is meaningless, 30years^2 is meaningless, phone#*2 is meaningless, and 2inches+30years is certainly meaningless - but most of our languages permit us to construct, and use, and confuse these meaningless things.
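A quick Python sketch of that idea, using hypothetical `Inches` and `Years` wrappers that only add to their own kind; with bare numbers, `2 + 30` would succeed silently:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Inches:
    value: float
    def __add__(self, other: "Inches") -> "Inches":
        if not isinstance(other, Inches):
            return NotImplemented  # refuse to mix units
        return Inches(self.value + other.value)

@dataclass(frozen=True)
class Years:
    value: float
    def __add__(self, other: "Years") -> "Years":
        if not isinstance(other, Years):
            return NotImplemented
        return Years(self.value + other.value)

assert Inches(2) + Inches(3) == Inches(5)
try:
    Inches(2) + Years(30)  # meaningless, and now a TypeError
except TypeError:
    pass
```

A static checker catches the mix-up before runtime; the `NotImplemented` fallback makes it a hard error even for untyped callers.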
type UserID = string & { readonly __tag: unique symbol }
which always feels a bit hacky.

I think it's a much better idea to do:
type UserID = { readonly __tag: unique symbol }
Now clients of `UserID` no longer know anything about the representation. Like with the original approach you need a bit of casting, but that can be neatly encapsulated as it would be in the original approach anyway.

This is going to have the biggest impact on my coding style this year.
Think of the complaints around function coloring with async, how it's "contagious". Checked exceptions have the same function color problem. You either call the potential thrower from inside a try/catch or you declare that the caller will throw an exception.
Incidentally, for exceptions, Java had (b), but for a long time didn't have (a) (although I think this changed?), leading to (b) being abused.
In fact, at each layer, if you want to propagate an error, you have to convert it to one specific to that layer.
That's the point! The whole reason for checked exceptions is to gain the benefit of knowing if a function starts throwing an exception that it didn't before, so you can decide how to handle it. It's a good thing, not a bad thing! It's no different from having a type system which can tell you if the arguments to a function change, or if its return type does.
And if you change a function deep in the call stack to return a different type on the happy path? Same thing. Yet, people don't complain about that and give up on statically type checking return values.
I honestly think the main reason that some people will simultaneously enjoy using Result/Try/Either types in languages like Rust while also maligning checked exceptions is because of the mental model and semantics around the terminology. I.e., "checked exception" and "unchecked exception" are both "exceptions", so our brains lumped those two concepts together; whereas returning a union type that has a success variant and a failure variant means that our brains are more willing to lump the failure return and the successful return together.
To be fair, I do think it's a genuine design flaw to have checked and unchecked exceptions both named and syntactically handled similarly. The return type approach is a better semantic model for modelling expected business logic "failure" modes.
So Java's checked exceptions force you to write verbose and pointless code in all the wrong places (the "in the middle" code that can't handle and doesn't care about the exception).
It doesn't, you can just declare that the function throws these as well, you don't have to handle it directly.
This is annoying enough to deal with in concrete code, but interfaces make it a nightmare.
To solve this, Rust does allow you to just Box<dyn Error> (or equivalents like anyhow). And Go has the Error interface. People who list out all concrete error types are just masochists.
It took until version 1.13 to have something better, and even now too many people still do errors.New("....."), because that's how the Go world is.
A problem easily solved by writing business logic in pure java code without any IO and handling the exceptions gracefully at the boundary.
First, the library author cannot reasonably define what is and isn't a checked exception in their public API. That really is up to the decision of the client. This wouldn't be such a big deal if it weren't so verbose to handle exceptions though: if you could trivially convert an exception to another type, or even declare it as runtime, maybe at the module or application level, you wouldn't be forced to handle them in these ways.
Second, to signature brittleness, standard advice is to create domain specific exceptions anyways. Your code probably shouldn't be throwing IOExceptions. But Java makes converting exceptions unnecessarily verbose... see above.
Ultimately, I love checked exceptions. I just hate the ergonomics around exceptions in Java. I wish designers focused more on fixing that than throwing the baby out with the bathwater.
Personally I use checked exceptions whenever I can't use Either<> and avoid unchecked like the plague.
Yeah, it's pretty sad Java language designer just completely deserted exception handling. I don't think there's any kind of improvement related to exceptions between Java 8 and 24.
To me they seem completely isomorphic?
Can we build tools that helps us work with the boundary between isosemantic and isomorphic? Like any two things that are isosemantic should be translatable between each other. And so it represents an opportunity to make the things isomorphic.
try/catch has significantly more complex call sites because it affects control flow.
But after experimenting a bit with checked exceptions, I realized how neglected exceptions are in Java:
- There's no way to handle checked exceptions other than a try-catch block
- They play very badly with APIs that use functional interfaces. Many APIs don't provide a checked throws variant
- A catch block can't use a generic / parameterized type; you need to catch Exception or Throwable then operate on it at runtime
After rolling my own Either<L,R>, it felt like a customizable typesafe macro for exception handling. It addresses all the annoyances I had with checked exception handling, and it plays nicely with exhaustive pattern matching using `sealed`.
Granted, it has the drawback that sometimes I have to explicitly spell out types due to local type inference failing to do so. But so far it has been a pleasant experience of handling error gracefully.
Semantically, from the CS point of view of language semantics and type-system modelling, they are equivalent in purpose, as you are quite rightly asking.
https://news.ycombinator.com/item?id=44551088
https://news.ycombinator.com/item?id=44432640
> Your code probably shouldn't be throwing IOExceptions. But Java makes converting exceptions unnecessarily verbose
The problem just compounds too. People start checking things that they can’t handle from the functions they’re calling. The callers upstream can’t possibly handle an error from the code you’re calling, they have no idea why it’s being called.
I also hate IOException. It’s so extremely unspecific. It’s the worst way to do exceptions. Did the entire disk die, or was the file just not found, or do I not have permissions to write to it? IOException has no meaning.
Part of me secretly hopes Swift takes over because I really like its error handling.
Checked exceptions feel like a bad mix of error returns and colored functions to me.
I also think it's a bit cleaner to have nicely pattern-matched handler blocks than bespoke handling at every level. That said, if unwrapped error results have a robust layout then it's probably pretty equivalent.
Maybe you mean requests are failing on uncaught exceptions, in which case I'd say it's working well.
Or if they are unable to work, because they keep getting a maintenance page, as the load balancer redirects them after several HTTP 500 responses.
Anyway, you prefer critical workflow like payment to show a success but actually be an unhandled error?
I prefer a happy customer, and not having to deal with support calls.
But for one, Java checked exceptions don't work with generics.
readonly struct Id32<M> {
    public int Value { get; }
    public Id32(int value) => Value = value;
}
Then you can do:
public sealed class MFoo { }
public sealed class MBar { }
And:
Id32<MFoo> x;
Id32<MBar> y;
This gives you integer ids that can’t be confused with each other. It can be extended to IdGuid and IdString and supports new unique use cases simply by creating new M-prefixed “marker” types, which is done in a single line.

I’ve also done variations of this in TypeScript and Rust.
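The same marker-type trick can be sketched in Python, with the caveat that the `Id32[MFoo]`/`Id32[MBar]` distinction is enforced only by a static checker such as mypy, since Python erases generics at runtime (all names below are hypothetical):

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

M = TypeVar("M")

@dataclass(frozen=True)
class Id32(Generic[M]):
    value: int

# Marker types: one line each, never instantiated.
class MFoo: ...
class MBar: ...

x: Id32[MFoo] = Id32(1)
y: Id32[MBar] = Id32(2)
# mypy would reject: x = y  (Id32[MBar] is not Id32[MFoo])
assert x.value == 1 and y.value == 2
```

Adding a new id kind is again a one-line marker class, with no new wrapper code.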
The name means "Value Object Generator" as it uses Source generation to generate the "Value object" types.
That readme has links to similar libraries and further reading.
I want it for a case where it seems very well suited - all customer ids are strings, but only very specific strings are customer ids. And there are other string ids around as well.
IMHO Migration won't be hard - you could allow casts to/from the primitive type while you change code. Temporarily disallowing these casts will show you where you need to make changes.
I don't know yet how "close to the edges" you would have to go back to the primitive types in order for json and db serialisation to work.
But it would be easier to get in place in a new "green field" codebase. I pitched it as a refactoring, but the other people were well, "antithetical" is a good word.
I prefer the generated code to be part of the code repo. That's why I use code templates instead of source generators. But a properly constructed ID type has a non-trivial amount of code: https://github.com/vborovikov/pwsh/blob/main/Templates/ItemT...
That is correct, I've looked at the generated code and it's non-trivial, especially when validation, serialisation and casting concerns are present. And when you have multiple id types, and the allowed casts can change over time (i.e. lock it down when the migration is complete)
That's why I'd want it to be common, tested code.
Once you have several of these types, and they have validation and other concerns then the cost-benefit might flip.
FYI, In modern c#, you could try using "readonly record struct" in order to get lots of equality and other concerns generated for you. It's like a "whole library" but it's a compiler feature.
However I disagree in this case - if you have the problem that the library solves and it is ergonomic, then why not use it. Your "5-line-of-code example" does not cover validation, serialisation, and casting concerns. As another commenter put it: "a properly constructed ID type has a non-trivial amount of code".
If you don't need more lines of code than that, then do your thing. But in the example that I looked at, I definitely would. As I said elsewhere in the thread, it is where all customer ids are strings, but only very specific strings are customer ids.
The larger point is that people who write c# and are reading this thread should know that these toolkits exist - that url links to other similar libraries and further reading. So they can then make their own informed choices.
[1] enum class from C++11, classic enums have too many implicit conversions to be of any use.
They're fairly useful still (and since C++11 you can specify their underlying type), you can use them as namespaced macro definitions
Kinda hard to do "bitfield enums" with enum class
I think Rich Hickey was completely right, this is all information and we just need to get better at managing information like we are supposed to.
The downside of this approach is that these systems are tremendously brittle, as changing requirements make you contort your original data model to fit the new requirements.
Most OOP devs have seen at least one library with over 1000 classes. Rust doesn't solve this problem no matter how much I love it. It's the same problem: comparing two things that are the same but just have different types requires a bunch of glue code, which can itself lead to new bugs.
Data as code seems to be the right abstraction. Schemas give validation à la carte while still allowing information to be passed, merged, and managed using generic tools rather than needing to build a whole API for every new type you define in your mega monolith.
This is an important concept to keep in mind. It applies to programming, it applies to politics, it applies to nearly every situation you can think of. Any time you find yourself wishing that everyone would just do X and the world would be a better place, realize that that is never going to happen, and that some people will choose to do Y — and some of them will even be right to do so, because you do not (and cannot) know the specific needs of every human being on the planet, so X will not actually be right for some of them.
Uhuh, so my age and my weight are the same (integers), but just have different types. Okay.
Not because it's a bad idea. Quite the contrary. I've sung the praises of it myself.
But because it's like the most basic way you can use a type system to prevent bugs. In both the sense used in the article, and in the sense that it is something you have to do to get the even more powerful tools that type systems offer brought to bear on the problem.
And yet, in the real world, I am constantly explaining this to people and constantly fighting uphill battles to get people to do it, and not bypass it by using primitives as much as possible then bashing it into the strict type at the last moment, or even just trying to remove the types.
Here on HN we debate the finer points of whether we should be using dependent typing, and in the real world I'm just trying to get people to use a Username type instead of a string type.
Not always. There are some exceptions. And considered over my entire career, the trend is positive overall. But there's still a lot of basic explanations about this I have to give.
I wonder what the trend of LLM-based programming will result in after another few years. Will the LLMs use this technique themselves, or will people lean on LLMs to "just" fix the problems from using primitive types everywhere?
+ int doTheThing(bool,bool,int,int);
Die a little bit inside.
https://github.com/Mk-Chan/libchess/blob/master/internal/Met... https://github.com/Mk-Chan/libchess/blob/master/Square.h
(TypeScript's Zod and Clojure's Malli are counterexamples, although not official offerings.)
Following OP's example, what prevents you from getting an AccountID parsed as a UserID at runtime, in production? In production it's all UUIDs, indistinguishable from one another.
A truly safe approach would use distinct value prefixes – one per object type. Slack does this I believe.
That's part of the point of being static. If we can statically determine properties of the system and use that information in the derived machine code (or byte code or whatever), then we may be able to discard that information at runtime (though there are reasons not to discard it).
> Following OP's example, what prevents you from getting an AccountID parsed as a UserID at runtime, in production? In production it's all UUIDs, indistinguishable from one another.
If you're receiving information from the outside and converting it into data in your system you have to parse and validate it. If the UUID does not correspond to a UserID in your database or whatever, then the attempted conversion should fail. You'd have a guard like this:
if user_db.contains(UserID(uuid)) {
    return UserID(uuid)
}
// signal an error or return a None, zero value, null, etc.
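A runnable version of that guard in Python, assuming a hypothetical in-memory set standing in for the user database:

```python
from dataclasses import dataclass
from uuid import UUID, uuid4

@dataclass(frozen=True)
class UserId:
    value: UUID

# Hypothetical stand-in for the user database lookup.
known_users: set[UUID] = set()

def parse_user_id(raw: UUID) -> "UserId | None":
    # The untrusted uuid becomes a UserId only if it actually exists;
    # otherwise the caller is forced to handle the failure.
    if raw in known_users:
        return UserId(raw)
    return None

good, bad = uuid4(), uuid4()
known_users.add(good)
assert parse_user_id(good) == UserId(good)
assert parse_user_id(bad) is None
```

Everything downstream of `parse_user_id` can then trust that a `UserId` refers to a real user, pushing the validation cost to the boundary.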
Static typing is just a tool, aiming to help with a subset of all possible problems you may find. If you think it's an absolute oracle of every possible problem you may find, sorry, that's just not true, and trivially demonstrable.
Your example already is a runtime check that makes no particular use of the type system. It's a simple "set contains" check (value-oriented, not type-oriented) which also is far more expensive than simply verifying the string prefix of a Slack-style object identifier.
Ultimately I'm not even saying that types are bad, or that static typing is bad. If you truly care about correctness, you'd use all layers at your disposition - static and dynamic.
I think Rich Hickey has a point that bugs like this almost certainly get caught by running the program. If they make it into production it usually results in an obscure edge case.
I’m sure there are exceptions but unless you’re designing for the worst case (safety critical etc) rather than average case (web app), types come with a lot of trade offs.
I’ve been on the fence about types for a long time, but having built systems fast at a startup for years, I now believe dynamic typing is superior. Folks I know who have built similar systems and are excellent coders also prefer dynamic typing.
In my current startup we use typescript because the other team members like it. It does help replace comments when none are available, and it stops some bugs, but it also makes the codebase very hard to read and slows down dev.
A high quality test suite beats everything else hands down.
No types anywhere, so making a change is SCARY! And all the original engineers have usually moved on. Fun times. Types are a form of forced documentation after all, and help catch an entire class of bugs. If you’re really lucky, the project has good unit tests.
I think dynamic typing is wonderful for making software quickly, and it can be a force multiplier for startups. I also enjoy it when creating small services or utilities. But for a large web app, you’ll pay a price eventually. Or more accurately…the poor engineer that inherits your code in 10 years will pay the price. God bless them if they try to do a medium sized refactor without types lol. I’ve been on both ends of the spectrum here.
Pros and cons. There’s _always_ a tradeoff for the business.
But most startups aren’t building for 10 years out. If you use a lot of typing, you’ll probably die way before then. But yeah if you’re building a code base for the long term then use types unless you’re disciplined enough to write comments and good code.
As for refactoring, that is exactly what test suites are for.
That is certainly correct... but that doesn't make it a good thing. One wants to catch bugs before the program is running, not after.
There is no duck, just primitive types organized duck-wise.
The sooner you embrace the truth of mereological nihilism the better your abstractions will be.
Almost everything at every layer of abstraction is structure.
Understanding this will allow you to still use types, just not abuse them because you think they are "real".
> The Ecstasy type system is called the Turtles Type System, because the entire type system is bootstrapped on itself, and -- lacking primitives -- solely on itself. An Int, for example, is built out of an Array of Bit, and a Bit is built out of an IntLiteral (i.e. 0 or 1), which is built out of a String, which is an Array of Char, and a Char is built out of an Int. Thus, an Int is built out of many Ints. It's turtles, the whole way down.
[1]:https://xtclang.blogspot.com/2019/06/an-introduction-to-ecst...
Being forced to think early on types has a payoff at the medium complexity scale
from typing import NewType
UserId = NewType("UserId", int)
Also, you can still do integer things with them, such as
> nonsense = UserId(1) + UserId(2)
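That last line is exactly right: `NewType` exists only for the static checker and is erased at runtime, as this sketch shows:

```python
from typing import NewType

UserId = NewType("UserId", int)

# UserId is an identity function at runtime; the wrapper exists only
# for static checkers like mypy, which would flag the addition below.
nonsense = UserId(1) + UserId(2)
assert nonsense == 3
assert type(UserId(5)) is int
```

So `NewType` gives zero-cost naming with static-only enforcement; a dataclass wrapper is the heavier option when you also want runtime distinctness.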
they constantly try to escape
from the darkness outside & within
by dreaming of type systems so perfect
that no one will need to be good
but the strings that are will shadow
the abstract datatype that pretends to be
Wrapper structs are the idiomatic way to achieve this, and with ExpressibleByStringLiteral are pretty ergonomic, but I wonder if there's a case for something like a "strong" typealias ("typecopy"?) that indicates e.g. "this is just a String but it's a particular kind of String and shouldn't be mixed with other Strings".
I guess the examples in TFA are golang? It's kind of nice that you don't have to define those wrapper types, they do make things a bit more annoying.
In C++ you have to be extra careful even with wrapper classes, because types are allowed to implicitly convert by default. So if Foo has a constructor that takes a single int argument, then you can pass an int anywhere Foo is expected. Fine as long as you remember to mark your constructors as explicit.
In OOP languages as long as the type you want to specialize isn't final you can just create a subclass. It's cheap (no additional wrappers or boxes), easy, and you can specialize behavior if you want to.
Unfortunately for various good reasons Java makes String final, and String is one of the most useful types to specialize on.
MyType extends String;
void foo(String s);
foo(new MyType()); // is valid
Leading to the original problem. I don't want to represent MyType as a String because it's not. You can have:
StringUtils.trim(String foo);
but
myApp.doSomething(AnotherMyType amt);
The latter is saying "I need not any string but a specific kind of string", forcing callers to write:
StringUtils.trim(MyType.toString());
But if you have a function that works with different types you should make it more reusable.
It’s a good marker to yourself or to a review agent
The examples were a bit less contrived than this, encoding business rules where you'd want nickname for most UI but real name for official notifications, and the type system prevented future devs from using the wrong one when adding new UI or emails.
Also relevant https://refactoring.guru/smells/primitive-obsession
That refactoring guru raccoon reminds me of Minix for some reason.
type UserId = string & { readonly __tag: unique symbol };
In Python you can use `NewType` from the typing module:
from typing import NewType
from uuid import UUID
UserId = NewType("UserId", UUID)
type UserId = UUID
Especially and particularly attributes/fields/properties in an enterprise solution.
You want to associate various metadata - including at runtime - with a _value_ and use that as attribute/field/property in a container.
You want to be able to transport and combine these values in different ways, especially if your business domain is subject to many changes.
If you are tempted to use "classes" for this, you will sign up for significant pain later down the road.
I have tried to bring that to the prolog world [2] but I don't think my fellow prolog programmers are very receptive to the idea ^^.
My biggest problem has been people not specifying their units. On our own code end I'm constantly getting people to suffix variables with the units. But there's still data from clients, standard library functions, etc. where the units aren't specified!
Suppose you make two simple types, one for Kelvin K and the other for Fahrenheit F or degrees D.
And you implement the conversions between them in the types.
But then you have something like
d: D = 10;
for i = 1...100000:
    k = f_Take_D_Return_K(d)
    d = g_Take_K_Return_D(k)
end

Then you will implicitly have many, many automatic conversions that are not useful. How do you handle this? Is it easily caught by the compiler when the functions are way more complex?
My response is: these conversions are unlikely to be the slow step in your code, don’t worry about it.
I do agree though, that it would be nice if the compiler could simplify the math to remove the conversions between units. I don’t know of any languages that can do that.
For example, it's not my case but it's like having to convert between two image representations (matrix multiply each pixel) every time.
I'm scared that this kind of 'automatic conversion' slowness will be extremely difficult to debug and to monitor.
On your case about swapping between image representations: let’s say you’re doing an FFT to transform between real and reciprocal representations of an image - you probably have to do that transformation in order to do the work you need doing in reciprocal space. There’s no getting around it. Or am I misunderstanding?
Please don’t take my response as criticism, I’m genuinely interested here, and enjoying the discussion.
When I tried to refactor using types, this kind of problem became obvious. And it forced more conversions than intended.
So I'm really curious because, a part from rewriting everything, I don't see how to avoid this problem. It's more natural for some applications to have the data format 1 and for others the data format 2. And forcing one over the other would make the application slow.
The problem arises only in 'hybrid' pipelines when new scientist need to use some existing functions some of them in the first data format, and the others in the other.
As a simple example, you can write rotations in a software in many ways, some will use matrix multiply, some Euler angles, some quaternions, some geometric algebra. It depends on the application at hand which one works the best as it maps better with the mental model of the current application. For example geometric algebra is way better to think about a problem, but sometimes Euler angles are output from a physical sensor. So some scientists will use the first, and the others the second. (of course, those kind of conversions are quite trivial and we don't care that much, but suppose each conversion is very expensive for one reason or another)
I didn't find it a criticism :)
type ID {
    AsString string
    AsInt int
    AsWhatever whatever
}

function newID():
    return new ID {
        AsString: calculateAsString(),
        AsInt: calculateAsInt(),
        AsWhatever: calculateAsWhatever(),
    }
This does assume every representation will always be used, but if that's not the case it's a matter of using some manner of a generic only-once executor, like Go's sync.Once.

I agree that would be a good solution, despite that my data is huge, but it assumes the data doesn't change, or doesn't change that much.
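A lazy variant of that sketch in Python, using `functools.cached_property` as the only-once executor so each representation is computed at most once (the `ID` class is hypothetical):

```python
from functools import cached_property

class ID:
    """Hypothetical id whose representations are computed at most once."""

    def __init__(self, n: int):
        self._n = n

    @cached_property
    def as_string(self) -> str:
        # Computed lazily on first access, then cached on the instance.
        return str(self._n)

    @cached_property
    def as_int(self) -> int:
        return self._n

i = ID(42)
assert i.as_string == "42"
assert i.as_int == 42
```

Unused representations cost nothing; as noted above, this only works if the underlying data is immutable, since the cache is never invalidated.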
[<Measure>] type degC;
[<Measure>] type K;
let degrees_to_kelvin (degrees : float<degC>) : float<K> =
degrees * 1.0<K/degC> + 273.15<K>
let d = 10.0<degC>
let k : float<K> = degrees_to_kelvin d
The .NET runtime only sees `float`, as the measures have been erased, and constant folding will remove the `*1` that we used to change the measure. The `degrees_to_kelvin` call may also be inlined by the JIT compiler. We could potentially add `[<MethodImpl(MethodImplOptions.AggressiveInlining)>]` to force it to inline when possible, then constant folding may reduce the whole expression down to its result in the binary.

The downside to adding the SI into the type system is the SI is not a sound type system. For example:
[<Measure>] type m
[<Measure>] type s
[<Measure>] type kg
[<Measure>] type N = kg*m/s^2
[<Measure>] type J = kg*m^2/s^2
[<Measure>] type Nm = N*m
let func_expecting_torque (t : float<Nm>) = ...
let x = 10.0<J>
func_expecting_torque x
The type system will permit this: using torque where energy is expected, and vice-versa, because they have the same SI unit, but they don't represent the same thing, and ideally it should be rejected. A potential improvement is to include Siano's Orientational Analysis[1], which can resolve this particular unsoundness because the orientations of Nm and J would be incompatible.

To the best of my knowledge, the earliest specific term for the concept is painted types (Simonyi, 1976)[0], but I believe this was a new term for a concept that was already known. Simonyi himself quotes (Hoare, 1970)[1]. Hoare doesn't provide a specific term like painted type for the type being defined, but he describes new types as being built from constituent types, where a singular constituent type is known as the base type.
Simonyi uses the term underlying type rather than base type. Hoare alludes to painted types by what they contain - though he doesn't explicitly require that the new type has the same representation as its base type - so they're not necessarily equivalent, though in practice they often are. Simonyi made this explicit, which is what we expect from newtype in Haskell - and specifically why we'd use `newtype` instead of `data` with a single (base) constituent.
If you're aware of any other early references, please share them.
---
[0]:https://web.archive.org/web/20170313211616/http://www.parc.c...
> These examples show that the idea of types is independent of how the objects belonging to the type are represented. All scalar quantities appearing above - column numbers, indices and so forth, could be represented as integers, yet the set of operations defined for them, and therefore their types, are different. We shall denote the assignment of objects to types, independent of their representations, by the term painting. When an object is painted, it acquires a distinguishing mark (or color) without changing its underlying representation. A painted type is a class of values from an underlying type, collectively painted a unique color. Operations on the underlying type are available for use on painted types as the operations are actually performed on the underlying representation; however, some operations may not make sense within the semantics of the painted type or may not be needed. The purpose of painting a type is to symbolize the association of the values belonging to the type with a certain set of operations and the abstract objects represented by them.
[1]:https://www.cs.cornell.edu/courses/cs4860/2018fa/lectures/No...
> In most cases, a new type is defined in terms of previously defined constituent types; the values of such a new type are data structures, which can be built up from component values of the constituent types, and from which the component values can subsequently be extracted. These component values will belong to the constituent types in terms of which the structured type was defined. If there is only one constituent type, it is known as the base type.
In the example, they are (it seems) converting between Celsius and Fahrenheit, using floating point. There is the possibility of minor rounding errors, although if you are converting between Celsius and Kelvin with integers only then these rounding errors do not occur.
In some cases, a function might be able to work with any units as long as the units match.
> Public and even private functions should often avoid dealing in floats or integers alone
In some cases it makes sense to use those types directly, e.g. many kinds of purely mathematical functions (such as checking if a number is prime). When dealing with physical measurements, bit fields, ID numbers, etc, it does make sense to have types specifically for those things, although the compiler should allow overriding the requirement of the more specific type in specific cases via an explicit operator.
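A minimal TypeScript sketch of this idea, using a branded number type per unit; the brand fields and helper names (`celsius`, `fahrenheit`, `toFahrenheit`) are invented for illustration:

```typescript
// Sketch only: branded number types for temperature units.
type Celsius = number & { readonly __unit: "C" };
type Fahrenheit = number & { readonly __unit: "F" };

const celsius = (n: number): Celsius => n as Celsius;
const fahrenheit = (n: number): Fahrenheit => n as Fahrenheit;

// Conversion must go through an explicit function; a Celsius value
// can't silently flow into a Fahrenheit-typed parameter.
function toFahrenheit(c: Celsius): Fahrenheit {
  return fahrenheit((c * 9) / 5 + 32);
}

toFahrenheit(celsius(100)); // 212
```

At runtime these are plain numbers, so there is no overhead; the brand exists only for the type checker, which is the "explicit operator" escape hatch in practice.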
There is another article about string types, but I think the underlying problem is the use of text-based formats, which leads to many of these problems, including the need for escaping, etc.
> In any nontrivial codebase, this inevitably leads to bugs when, for example, a string representing a user ID gets used as an account ID
Inevitably is a strong word. I can't recall the last time I've seen such a bug in the wild.
> or when a critical function accepts three integer arguments and someone mixes up the correct order when calling it.
Positional arguments suck and we should rely on named/keyword arguments? I understand the line of reasoning here, but the examples are bad. Those aren't good reasons to introduce new types. If you follow this advice, you'll end up with an insufferable codebase where 80% of the LoC is type casting.
Types are like database schemas. You should spend a lot of time thinking about semantics, not simply introduce new types because you want to avoid (hypothetical) programmer errors.
"It is better to have 100 functions operate on one data structure than to have 10 functions operate on 10 data structures." (Alan Perlis)
The compiler checks that the type is correct wherever you use it. It is also documentation.
Still have tests! But types are great.
But sadly, in practice I don't often use a type per ID type, because it is not idiomatic in the codebases I work on. Moving a codebase to that style is a project of its own if it wasn't built that way from the outset. Also, most programming languages don't make it ergonomic.
func AddMessage(u UserId, m MessageId)
If it's just func AddMessage(userId, messageId string)
it's very easy to accidentally call it as AddMessage(messageId, userId)
and then best-case you are wasting time figuring out a test failure, and worst-case trying to track down the bug in production. Versus an instant compile error.
I have seen errors like this many times, both written by myself and others. I think it's great to use the type system to eliminate this class of error!
(Especially in languages like Go that make it very low-friction to define the newtype.)
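The same guard can be sketched in TypeScript with branded string types; the brand fields and names here are invented for illustration:

```typescript
// Sketch: branded string types make the two IDs incompatible to the
// type checker, even though both are plain strings at runtime.
type UserId = string & { readonly __brand: "UserId" };
type MessageId = string & { readonly __brand: "MessageId" };

const asUserId = (s: string): UserId => s as UserId;
const asMessageId = (s: string): MessageId => s as MessageId;

function addMessage(userId: UserId, messageId: MessageId): string {
  return `${userId}:${messageId}`;
}

addMessage(asUserId("user-1"), asMessageId("msg-9")); // ok
// addMessage(asMessageId("msg-9"), asUserId("user-1")); // compile error
```

Swapping the arguments fails to compile, which is exactly the instant feedback described above.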
Another benefit if you're working with any sort of static data system is it makes it very easy to validate the data -- e.g. just recursively scan for instances of FooId and make sure they are actually foo, instead of having to write custom logic or schema for everywhere a FooId might occur.
type UserId = `user:${uuid}`;
type OrgId = `org:${uuid}`;
This had the benefit that we could add validation (basic begins-with kind of logic) and it was obvious upon visual inspection (e.g. in logs/debugging).
1. https://www.typescriptlang.org/docs/handbook/2/template-lite...
I think it's a pretty good idea. I'm just wondering how this translated to other systems.
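A small sketch of how those template literal types behave; `fetchUser` is a hypothetical function, and I've used `${string}` in place of a uuid type for brevity:

```typescript
// Sketch: with template literal types the compiler checks the prefix
// itself, no brand field needed.
type UserId = `user:${string}`;
type OrgId = `org:${string}`;

function fetchUser(id: UserId): string {
  return `fetching ${id}`;
}

fetchUser("user:123"); // ok: the literal matches the pattern
// fetchUser("org:123"); // compile error: not assignable to UserId
```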
The only drawback was marshalling the types when they come out of the db layer. Since the db library's types were plain strings, we had to hard-cast them to the correct types; really my only pain point. That isn't such a big deal, but it means some object creation and memory waste, like:
```
// pseudo code:
const results = dbclient.getObjectsByFilter( ... );
return results.map(result => ({
  id: result.id as ObjectId,
  ...
}));
```
We normally didn't do it, but at that point you could have a `function isObjectId(id: string): id is ObjectId { return id.startsWith("object:"); }` wrapper for formal verification (and maybe throw exceptions on bad keys). And we were probably doing some type conversions anyway (e.g. `new Date(result.createdAt)`). If we were reading stuff from the client or network, we would often do the verification step with proper error handling.
But depending on the format it can sometimes be tricky to narrow a string back down to that format.
We have type guards to do that narrowing. (see: https://www.typescriptlang.org/docs/handbook/2/narrowing.htm..., but their older example is a little easier to read: https://www.typescriptlang.org/docs/handbook/advanced-types....)
If writing the check is too tricky, sometimes it can just be easier to track the type of a value with the value (if you can be told the type externally) with tagged unions (AKA: Discriminated unions). See: https://www.typescriptlang.org/docs/handbook/typescript-in-5...
And if the formats themselves are generated at runtime, you can use the `unique` keyword to make sure different kinds of data are treated as separate (see: https://www.typescriptlang.org/docs/handbook/symbols.html#un...).
You can combine `unique symbol` with tagged unions and type predicates to make it easier to tell them apart.
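A minimal tagged-union sketch, with invented names, showing why the narrowing becomes a plain switch instead of a tricky string-format check:

```typescript
// Sketch of a tagged (discriminated) union: the "kind" field travels
// with the value, so the compiler narrows each branch automatically.
type UserRef = { kind: "user"; id: string };
type OrgRef = { kind: "org"; id: string };
type Ref = UserRef | OrgRef;

function describe(ref: Ref): string {
  switch (ref.kind) {
    case "user":
      return `user ${ref.id}`; // ref narrowed to UserRef here
    case "org":
      return `org ${ref.id}`; // ref narrowed to OrgRef here
  }
}

describe({ kind: "org", id: "7" }); // "org 7"
```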
from typing import NewType
UserId = NewType('UserId', int)
some_id = UserId(524313)
The goal is to encode the information you learn while parsing your data into your type system. This unlocks so many capabilities: better error handling, making illegal states unrepresentable, better compiler checking, better autocompletion etc.
[1]https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...
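The "parse, don't validate" idea from that post can be sketched in TypeScript like this; `parseUserId` and `greet` are invented names, and the brand is only a compile-time marker:

```typescript
// Sketch of "parse, don't validate": the parser runs once at the
// boundary and its return type records what was learned, so
// downstream code can't forget to check.
type UserId = string & { readonly __brand: "UserId" };

function parseUserId(raw: string): UserId {
  if (!raw.startsWith("user:")) {
    throw new Error(`not a user id: ${raw}`);
  }
  return raw as UserId;
}

function greet(id: UserId): string {
  return `hello, ${id}`;
}

greet(parseUserId("user:ada")); // "hello, user:ada"
```

Illegal states become unrepresentable: the only way to obtain a `UserId` is through the parser, so every `UserId` in the program has already been checked.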
The same argument gets brought up in favor of dynamic typing. The point of typing is that you don't need all those repetitive tests.
Moreover, the coding feedback loop gets shorter since there's no need to wait until the tests run to find out a string was passed in instead of an int (or UserID).
https://beartype.readthedocs.io/en/latest/
Or see their page about performance: https://beartype.readthedocs.io/en/latest/faq/#faq-realtime
See for example in wdoc, my advanced personal RAG library:
https://github.com/thiswillbeyourgithub/wdoc/blob/main/wdoc/...
So you can have for example:
```
import numpy as np

def func(inputA: np.ndarray, inputB: np.ndarray) -> np.ndarray:
    return np.concatenate((inputA, inputB))
```
But then you want to modify the code for some reason way later and do this:
```
def func(inputA: np.ndarray, inputB: np.ndarray) -> np.ndarray:
    if len(inputB.shape) == 1:
        return np.concatenate((inputA, inputB))
    else:
        return inputB
```
Now imagine that inputB is received directly from some library you imported, so pyright might not be able to check its type, and inputB is actually a list. Then version 1 will never crash, but its types are wrong, since inputB is a list.
Version 2, on the other hand, will crash, since lists don't have a shape attribute. Notice also how func returns inputB, propagating the wrong type.
Sure, that means the code still works until you modify version 1, but any developer or LLM that reads func would get the wrong idea about how to modify such code. Also, this example is trivial, but it can become much, much more complicated, of course.
This would not be caught by pyright, but beartype would catch it. I'm basically using beartype absolutely everywhere I can, and it really made me way more confident in my code.
If you're not convinced I'm super curious about your reasoning!
PS: also, exception handling in Python is quite slow (raising an exception is expensive, even though entering a try block is cheap), so figuring out types in advance is AFAIK always a good idea performance-wise.
Basically it's pure-Python, heavily optimized code that calls "isinstance(a, b)" all the time, everywhere. If there is a mismatch, it crashes.
Note that you can also set it to warn instead of crash.
[1] https://www.velopen.com/blog/adding-type-safety-to-object-id...
A strong enough type system would be a lot more useful.