As I understand it, the primary purpose of newtypes is to work around typeclass issues like those in the examples at the end of the article. They are specifically designed to be zero-cost, because you don't want to pay a runtime penalty just to work around the typeclass instance already being taken for the type you want an instance for. Making an abstract data type by not exporting the data constructors can be done with or without newtype.
I think OCaml handles this with modules (and functors), but the concepts are similar. For most cases, where there's one obvious instance you want, having Haskell pick the instance is less of a hassle.
But while I did nearly half of my career in either OCaml or Haskell, I did all of my OCaml programming and most of my Haskell programming before the recent surge of really good autocompletion systems / AI; and I notice how much they help with Rust.
So the ergonomics of ML-style modules might be perfectly acceptable now, when you have an eager assistant filling in the busy work for obvious cases. Not sure.
In other words, the full range of Int?
Is newtype still bad?
In other words, how much of this criticism has to do with newtype not providing sub-ranging for enumerable types?
It seems that it could be extended to do that.
Correct fields by...name? By structure? I'm trying to understand.
let full_name = (person: { first: string, last: string }) => person.first + " " + person.last
Then you can use this function on any data type that satisfies that signature, regardless of whether it's User, Dog, Manager, etc. I think it also makes more sense in immutable functional languages like Clojure. Oddly enough I like it in Go too, despite Go being very different from Clojure.
It seems ok in upcoming languages with polymorphic sum types (eg Roc “tags”) though?
Reading TFA now, Python's NewType seems to be equivalent to Haskell's newtype. Yes, it's a hack for the type checker to work around existing language semantics, and it feels unergonomic at times when "Parse, Don't Validate" needs to fall back to plain validation, but I wouldn't call it either weird or arbitrary.
The kind of "branding" I'm talking about is a hack only needed for structural typing systems. Consider something inspired by the C locale API, for example:
class RealLocale:
    name: str
const C_LOCALE = RealLocale("C");
# Each of these can be passed to *some*, but not all, locale functions,
# which will check for them by identity before doing the logic for `RealLocale`.
singleton NO_LOCALE # used for both "error" and "query"
singleton THREAD_LOCALE
singleton GLOBAL_LOCALE
singleton ENV_LOCALE
In a structural typing system, it is impossible to write a function that takes a union including more than one of `{NO_LOCALE, THREAD_LOCALE, GLOBAL_LOCALE, ENV_LOCALE}`, since they have no contents and thus cannot be distinguished. You have to hack around it by some kind of casting and/or adding a member that isn't actually present or useful at runtime. And this kind of need is quite common. So I maintain that structural typing is not a serious type system proposal.
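A minimal TypeScript sketch of the problem and the usual brand hack, loosely following the locale example above (the names here are hypothetical):

```typescript
// Contentless sentinels: structurally, each constant is just `{}`, so a
// parameter typed as a union of them collapses to `{}` and the checker
// cannot tell them apart. The workaround is a phantom "brand" member that
// exists only in the type checker; the casts are a lie at runtime.
type NoLocale = { readonly __brand: "NO_LOCALE" };
type ThreadLocale = { readonly __brand: "THREAD_LOCALE" };

const NO_LOCALE = {} as NoLocale;
const THREAD_LOCALE = {} as ThreadLocale;

function describeLocale(loc: NoLocale | ThreadLocale): string {
  // Checked by identity, as the C locale API does.
  return loc === NO_LOCALE ? "no locale" : "thread locale";
}

console.log(describeLocale(NO_LOCALE)); // "no locale"
```

Without the `__brand` members, `NoLocale` and `ThreadLocale` would be the same type and the union signature above would accept any non-null object.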
(Then again, the main proponent of structural typing is TypeScript, which even in $CURRENTYEAR still lacks a way to specify ubiquitous needs like "integer" or "valid array index".)
I'm not saying the nominal approach to types is wrong or bad, I just find my way of thinking is better suited for structural systems. I'm thinking less about the semantics around product_id vs user_id and more about what transforms are relevant - the semantics show up in the domain layer.
Take a vec3 for example: in a structural system you could apply a function designed for a vec2 to it, which has practical applications.
But it's never "just data". My password is different in many ways from my username. Don't you ever log/print it by accident! So even if the two are structurally the same, we MUST treat them differently. Hence any approach that only ever looks at things structurally is deeply flawed in the context of safe software development.
> The difference is that happens in the domain layer instead of the type layer
This view greatly reduces the usefulness of the type layer though, as that's the only automated tool that can help the domain layer with handling cases like this.
Can a human encode something different by that than what they intended to encode? Certainly. But it's got the highest cost-benefit of any approach to double-checking your code I've found.
What are those layers you're talking about? In my domain-logic code I use types, of course, so there is no dedicated "type layer".
a { user_pw: PasswordString }
This is what it means to model the domain using types. It is not a separate layer, it is actually using the type system to model domain entities.
type PasswordString struct {
pw string
}
func (p PasswordString) String() string { ... }
I guess the point is that you can model your domain using data as well as types. But you haven't hidden the information; it's still a string. You can put the string in a wrapper struct, but in a structural system that's not really any different from putting it in a list or map: the data is still exposed, and if someone writes code to e.g. log objects by just enumerating all their fields (which is a very natural thing to do in those systems), it will naturally print out your password, and there's not really any way to prevent that.
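A small TypeScript sketch of that failure mode (the names are hypothetical): wrapping the password in a structural record does nothing to hide it from a generic field-enumerating logger.

```typescript
// A structural "wrapper": the password is still a plain visible field.
type PasswordString = { pw: string };

const password: PasswordString = { pw: "hunter2" };

// A generic logger that just enumerates fields -- a very natural thing
// to write against structural data.
function logFields(obj: Record<string, unknown>): string {
  return Object.entries(obj)
    .map(([key, value]) => `${key}=${String(value)}`)
    .join(" ");
}

console.log(logFields(password)); // "pw=hunter2" -- the secret leaks
```

Nothing in the structural type system lets `PasswordString` opt out of being enumerated; that requires a nominal notion of privacy.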
> I guess the point is that you can model your domain using data as well as types.
You want both in your toolbox though. Restricting yourself to only having types that are essentially lists, maps, or a handful of primitives has most of the same downsides as restricting yourself to not using types at all.
I guess my point is that a structural type system can still allow for encapsulation.
Those sound like decidedly non-structural features. And couldn't you undermine them by passing it to a function that expects a different `struct { pw string }` and logs its contents?
And yeah, structurally typed languages often have nominal features. They come in useful in a lot of scenarios! Unless you're talking about something like Clojure which is not statically typed.
But that domain layer should make use of the type system! That's where the type system is most useful!
I think type-theoretic safety is a completely different thing to the use of types (and names) in software-as-domain-modeling (for example, but not necessarily OO modelling). At different times, for different people, one perspective is more important than the other. It is important not to confuse the perspectives, and to value both of them, but also to recognise their strengths and weaknesses.
One theme that sometimes emerges is that the type theory people don't care about names at all. Not even field names. Taken to the extreme, Customer(name: str, age: int) is just a Tuple[str, int]. The words "Customer", "name", and "age" have no place in the code base.
My take is that when you are dealing with computer-scientific abstract things, e.g. a List of T, there is no need to reference domain entities; placeholder names like T, x, xs make sense. On the other hand, if you're writing an application that models domain semantics (e.g. business rules), writing the software amounts to modelling the domain, and it should be easy to correlate software entities with the real-world entities they model. To do this we use descriptive words: names, domain events, activities and so on, e.g. List[Customer], not List[Tuple[str, int]].

Then again, you could replace all of the type names with A, B, C, ... and all the variable names with w, x, y, .... The example would end up as X[Y[Z, W]], the software would work exactly the same, and you might even get some insights into the structure of the system. However, if you're in the business of building, say, a user management system, this is not going to fly for very long with your workmates or your client, and you will have trouble onboarding new developers.
You are right, I should have gone further with the example and used Customer{ age: Age, name: PersonName}.
To see where structural type systems fall down, think about a bad case: dealing with native state, where you have a private long field with a pointer hiding in it, used in native calls. Any "type" that provides that long will fit, leading to segfaults. A nominal type system allows you to make assurances behind the class name.
Anyways, this was a big deal in the late 90s, eg see opaque types https://en.wikipedia.org/wiki/Opaque_data_type.
class Foo {
public bar = 1;
private _value = 'hello';
static doSomething(f: Foo) {
console.log(f._value);
}
}
class MockFoo { public bar = 1; }
let mock = new MockFoo();
Foo.doSomething(mock); // Type error: '_value' is missing in MockFoo
Which is why you'd generally use interfaces, either declared or inline. In the pointer example, if the long field is private then it's not part of the public interface, so you shouldn't run into that issue, no?
You can do a lot just by hiding the private state and providing methods that operate on that private state in the type (using interfaces for example), but that approach doesn’t allow for binary methods (you need to reveal private state on a non-receiver in a method).
interface Foo {
fun doFoo()
}
You can call doFoo() on some value, and that value can refer to private state that doesn't appear in Foo. However, if you want to see the private data of an argument, that private data has to appear in the signature (or you need nominal typing). The easiest example is an equality method that compares private state.
interface Foo {
fun equals(otherFoo: Foo): Boolean
}
The receiver of the equals call can still refer to its private data, but whatever you provided for otherFoo is only guaranteed to have the equals method. You might be able to deal with this using an opaque type:
interface FooModule {
export type t
fun equals(foo: t, otherFoo: t): boolean
makeFoo(): t
}
But you really aren't doing structural typing anymore, and are basically using t like you would a name. Also, preferred by who?
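The binary-method problem above can be sketched in TypeScript (the `Point` type is my hypothetical example): the receiver can see its own private state, but the argument is only guaranteed to satisfy the interface, so an `instanceof` check, a nominal escape hatch, ends up doing the real work.

```typescript
interface Eq {
  equals(other: Eq): boolean;
}

class Point implements Eq {
  constructor(private x: number, private y: number) {}

  equals(other: Eq): boolean {
    // `other` is only guaranteed to have `equals`; reading other.x / other.y
    // requires first proving it is a Point -- a nominal check in disguise.
    return other instanceof Point && other.x === this.x && other.y === this.y;
  }
}

console.log(new Point(1, 2).equals(new Point(1, 2))); // true
```

Any other `Eq` passed in simply falls through to `false`, because its private state is invisible through the structural interface.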
Structural typing does not preclude having some names that prevent mix-ups. Haskell's `data` keyword doesn't let you confuse structurally identical things.
> If someone encodes "RGBColor" and "LinearRGBColor", both structs with 3 floats, your type system wouldn't provide any errors if a LinearRGB color is passed into an RGB calculation.
It 100% would, unless you were silly enough to use a bare tuple to do it. Again, defining a type with `data` in Haskell wouldn’t get confused.
Haskell doesn't let you confuse structurally identical things because it is nominal, not structural.
When talking about types like `Meter` and `Yard` in a structural system, the "type" of the data is also data. In a nominal system that data is encoded in the type system, but that's not the only place it can be. For example, if I asked you how far the nearest gas station is, you wouldn't respond with "10", but rather "10 minutes", or "10 kilometres", etc. Both the value and unit of measurement are relevant data, and thus both of those would be part of the structural type as well.
{ unit: yard, value: 20 }
This is real, concrete data that you can see all at once. You can feed it into different functions, create aliases for it (unlike objects, where you'd need to make snapshots or copies because they might change), compare it with other data to check equality, transmit it across networks, and work with it easily in other programming languages, since they all understand basic types. When you stick with this kind of data, you can use general-purpose functions that work on any data rather than being locked into specific methods tied to particular types or interfaces - methods that won't exist when you move to different languages or systems.

In a nominal system you might end up with a generic Measurement<T> type that contains the unit inside, which can help with code reuse, but it's not at the same level as pure data.
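A TypeScript sketch of the unit-as-data view (the names and the choice of supported units are illustrative):

```typescript
// The unit travels with the value as ordinary data, not as a nominal type.
type Measurement = { unit: "yard" | "km"; value: number };

// A plain function over plain data: easy to serialize, easy to compare,
// easy to reimplement in any language.
function toKm(m: Measurement): Measurement {
  return m.unit === "yard"
    ? { unit: "km", value: m.value * 0.0009144 } // 1 yard = 0.9144 m
    : m;
}

console.log(toKm({ unit: "yard", value: 1000 })); // { unit: "km", value: ~0.9144 }
```

The mix-up protection here is runtime data (`m.unit`) rather than a compile-time name, which is exactly the trade being discussed.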
A function `fn convertYardsToKm(value: i32) -> i32` doesn't fail when you give it a weight.
Whereas in Rust you'd write something like this:
#[derive(Copy, Clone)]
struct Yard(i32);
#[derive(Copy, Clone)]
struct Km(i32);
#[derive(Copy, Clone)]
struct Lbs(i32);
#[derive(Copy, Clone)]
struct Kg(i32);
and your function becomes `fn convertYardsToKm(value: Yard) -> Km`. You can group them in an enum:
enum Measurement {
    Yard(Yard),
    Km(Km),
    Lbs(Lbs),
    Kg(Kg),
}
(Note that it would be nice if we could refer to `Measurement::Yard` as a type instead of having to add a distinct `Yard` type.) That way there is no confusion about what you're putting in and what type the output is. Mixing units up has resulted in, for example, an emergency landing https://en.wikipedia.org/wiki/Gimli_Glider#Miscalculation_du... and the loss of a Mars probe: https://en.wikipedia.org/wiki/Mars_Climate_Orbiter
;)
A nominal type system can be built on top of a structural type system with zero runtime overhead, but not vice versa (you'd have to add tags, which take additional memory).
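The zero-overhead direction is easy to sketch in TypeScript with the usual branding trick (the unit names are hypothetical): the brands exist only for the checker, and at runtime the values are plain numbers.

```typescript
// Nominal types layered on a structural system, at zero runtime cost:
// the `__brand` member never exists at runtime.
type Meters = number & { readonly __brand: "Meters" };
type Yards = number & { readonly __brand: "Yards" };

const meters = (n: number): Meters => n as Meters; // a cast, no allocation
const yards = (n: number): Yards => n as Yards;

function metersToYards(m: Meters): Yards {
  return yards(m * 1.09361);
}

const distance = metersToYards(meters(100)); // ok
// metersToYards(yards(100));                // rejected by the checker
console.log(distance);
```

Going the other way, distinguishing cases of a structural union on a nominal runtime, forces you to store a discriminant tag in every value.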
The problem with nominal type systems is that they need to support parametrized types; otherwise it's hard or even impossible to write reusable/generic code.
Range checking can be very annoying to deal with if you take it too seriously. This comes up when writing a property-testing framework: it's easy to generate test data that causes out-of-memory errors, just pass in maximum-length strings everywhere. Your code accepts any string, right? That's what the type signature says!
In practice, setting compile-time limits on string sizes for the inputs to every internal function would be unreasonable. When using dynamically allocated memory, the maximum input size is really a system property: how much memory does the system have? Limits on input sizes need to be set at system boundaries.
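As a sketch of that boundary discipline (the limit and all names here are made up): validate sizes once where data enters the system, and let internal functions accept plain strings without re-checking.

```typescript
// Hypothetical cap enforced once, at the system boundary.
const MAX_NAME_LENGTH = 256;

function parseName(raw: string): string {
  if (raw.length > MAX_NAME_LENGTH) {
    throw new Error(`name exceeds ${MAX_NAME_LENGTH} characters`);
  }
  return raw;
}

// Internal code takes ordinary strings and never worries about size.
function greet(name: string): string {
  return `hello, ${name}`;
}

console.log(greet(parseName("ada"))); // "hello, ada"
```

A property test can then target `parseName` with oversized inputs, while internal functions are tested only with values that already passed the boundary.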
When I wrote this blog post, I used a very simple datatype because it was an extraordinarily simple example, but given many of the comments here, it seems it may have been too simple (and thus too contrived). It is only an illustration; don’t read into it too much.
Why would you do this? As soon as you go "outside" the type you lose typechecker guarantees.
The whole point of the article is showing where the compiler can tell you when you're writing code that fails to consider some cases, and how use of `newtype` loses some of these guarantees.
If you need to use your OneToFive someplace that actually wants an integer, that happens at the boundary. Everything else is safer.
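In TypeScript-ish terms (the names are mine, not from the article), that boundary looks like a smart constructor: parse the raw integer once, then keep the refined type everywhere else.

```typescript
// A subrange expressed as a union of literal types.
type OneToFive = 1 | 2 | 3 | 4 | 5;

// The only place a raw integer is inspected; everything past this
// boundary works with the refined OneToFive type.
function toOneToFive(n: number): OneToFive | null {
  return n === 1 || n === 2 || n === 3 || n === 4 || n === 5
    ? (n as OneToFive)
    : null;
}

console.log(toOneToFive(3)); // 3
console.log(toOneToFive(9)); // null
```

Going back out is free: a `OneToFive` already is a number, so only the inbound direction needs a check.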
In such languages that's the equivalent of a newtype in Haskell.
Providing a proof of program correctness is pretty challenging even in languages that support it. In most cases careful checking of invariants at runtime (where not possible at compile time) and crashing loudly and early is sufficient for reliable-enough software.
"It's just an example!" Well, if you cannot come up with a good example, maybe you don't have a point.
Really good examples will be rather domain-specific, so it’s perfectly understandable why Alexis would trust her readers to be able to imagine uses that suit their needs.
weinzierl•6mo ago
They do not solve every problem that constructive data modeling does, but in my opinion they cover a large portion of what actually occurs in everyday programs. Since they are zero-cost, I'd say their cost-benefit ratio is pretty good.
Ada and Pascal also handled the "encode the range in the type" case nicely for decades.
OptionOfT•6mo ago
https://github.com/rust-lang/rust/commit/58645e06d9121ae3765...